OpenAI’s new reasoning models, o3 and o4-mini, perform strongly on coding and math tasks but hallucinate more than their predecessors, generating false information more often. On PersonQA, OpenAI’s internal benchmark, o3 hallucinated on 33% of questions, roughly double the rate of the company’s previous reasoning models, while o4-mini fared worse still at 48%. OpenAI does not yet know why and acknowledges that more research is needed. While reasoning models offer real gains, the higher hallucination rates pose a challenge for accuracy-critical use cases such as legal and business work.
Source: TechCrunch