
Key Highlights –
- OpenAI’s new study explains why GenAI has a tendency to hallucinate.
- According to the research, the main reason LLMs hallucinate is that they are rewarded for guessing rather than for admitting uncertainty.
- OpenAI also claims it is possible to have AIs ‘not hallucinate.’
Ever since AI tools became household names, more and more users have become aware of their persistent problem of being vague or, worse, confidently blurting out irrelevant or incorrect answers. ChatGPT is one of the most popular names in AI, and its parent company, OpenAI, recently shared a study on why AIs hallucinate.
In its research, the AI giant explained that the problem lies not just in how models are trained, but in the way they are scored and measured. The ChatGPT maker added that building AIs that do not hallucinate is, in fact, possible.
OpenAI’s Take On AI Hallucination
Soon after announcing plans to launch its own AI Jobs Platform the following year, OpenAI took to its blog to help readers understand why AI produces confident but false answers, a behaviour now loosely termed ‘AI hallucination.’
Admitting that its own AI chatbot ChatGPT also hallucinates, the company said –
ChatGPT also hallucinates. GPT‑5 has significantly fewer hallucinations especially when reasoning, but they still occur.
Describing the pattern, OpenAI noted that models often confidently generate answers that aren’t true when asked about facts they have no way of knowing. Say a user asks the chatbot for their birthday without ever having shared that detail; the AI is likely to respond with a random date rather than saying it doesn’t know.
At the backend, however, the AI is simply playing the odds. Even though the probability of guessing the user’s birthday correctly is only about 1 in 365, it chooses to “confidently guess” an answer. But why doesn’t it skip the question and confess that it doesn’t know?
According to OpenAI, LLMs (Large Language Models) are rewarded for correct answers, but there are no points for expressing uncertainty. Under this scoreboard, a guess at least has a small chance of scoring, while “I don’t know” is guaranteed to score nothing, which is what pushes even advanced AI models to hallucinate.
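To make that incentive concrete, here is a minimal sketch in Python (illustrative only, not code from OpenAI’s paper) of the expected score a model earns from guessing a birthday versus abstaining, under a grader that only awards points for correct answers. The 1-in-365 figure is the same one used in the example above.

```python
# Minimal sketch (not OpenAI's actual eval): expected score of guessing vs.
# abstaining under a purely accuracy-based grader, using the birthday example.

P_CORRECT_GUESS = 1 / 365   # chance a random birthday guess happens to be right

def expected_score(strategy: str) -> float:
    """Expected points under a grader that gives 1 for a correct answer,
    0 for a wrong answer, and 0 for saying 'I don't know'."""
    if strategy == "guess":
        return P_CORRECT_GUESS * 1 + (1 - P_CORRECT_GUESS) * 0
    if strategy == "abstain":
        return 0.0  # admitting uncertainty earns nothing
    raise ValueError(strategy)

print(f"guess:   {expected_score('guess'):.4f}")    # ~0.0027, slightly above zero
print(f"abstain: {expected_score('abstain'):.4f}")  # 0.0000
# Guessing strictly dominates abstaining, so a model optimised against this
# kind of metric learns to guess confidently rather than admit uncertainty.
```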
AI tools often claim to be highly accurate, yet they still slip up here and there. Most models undergo standard evaluations built around a structured set of questions, but in real-world use their accuracy is “capped below 100%.” OpenAI’s researchers say several factors contribute to this, including information that is simply unavailable, the limited reasoning abilities of smaller models, and ambiguities that need to be clarified before a question can be answered.
Will AI Hallucination Ever End?
OpenAI believes it is possible. Citing its research as the explanation behind hallucination, it says that LLMs can simply abstain whenever they are uncertain about a response. But to get to the root cause, the AI giant emphasises the need to rework evaluation metrics so that they reward expressions of uncertainty.
A good hallucination eval has little effect against hundreds of traditional accuracy-based evals that penalize humility and reward guessing. Instead, all of the primary eval metrics need to be reworked to reward expressions of uncertainty.
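What “reworking the metrics” might look like can be sketched with the same expected-score arithmetic as before. The scoring rule below is a hypothetical example, not OpenAI’s actual eval: confident wrong answers are penalised and abstaining is neutral, so guessing only pays off once the model is sufficiently sure of its answer.

```python
# Minimal sketch (hypothetical scoring rule, not OpenAI's): penalise confident
# wrong answers instead of treating them the same as abstentions.

CORRECT_SCORE = 1.0
WRONG_PENALTY = -2.0   # hypothetical penalty for a confidently wrong answer
ABSTAIN_SCORE = 0.0    # saying "I don't know" is neutral rather than punished

def expected_score(p_correct: float, strategy: str) -> float:
    """Expected points for a model whose answer is right with probability p_correct."""
    if strategy == "guess":
        return p_correct * CORRECT_SCORE + (1 - p_correct) * WRONG_PENALTY
    if strategy == "abstain":
        return ABSTAIN_SCORE
    raise ValueError(strategy)

# With these numbers, guessing only beats abstaining when the model is at
# least 2/3 confident; below that threshold, admitting uncertainty wins.
for p in (0.001, 0.5, 0.8):
    best = max(("guess", "abstain"), key=lambda s: expected_score(p, s))
    print(f"p_correct={p:.3f} -> best strategy: {best}")
```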
Can AI Really Be ‘Reliable’?
OpenAI recently came into the spotlight when the parents of a 16-year-old accused ChatGPT of aiding in their son’s suicide. They then sued OpenAI over the chatbot’s safety, demanding parental controls and age verification for its use.
Another incident, a murder-suicide in Old Greenwich, involved a former tech worker struggling with mental health issues who confided his fears to ChatGPT. The chatbot allegedly fuelled his paranoia instead of helping him see reason, which eventually led to him killing his mother and then taking his own life.
Given these incidents, OpenAI’s renewed effort to make its models safer to use makes complete sense. Relying on AI tools to get a task done, help with work or schoolwork, spark ideas for arts and crafts, or support research seems like fair use. But trusting these chatbots to act as a therapist, or as a stand-in for a real human, is a line we should not cross. Microsoft AI chief Mustafa Suleyman has also warned users of an inevitable question: what happens when an AI becomes so good at mimicking consciousness that we start to believe it is real?