In recent months, the responses of the OpenAI chatbot have become poorer, and researchers have not yet been able to determine the reason for this.
In the study, Stanford and UC Berkeley researchers found that the latest models of ChatGPT became much less capable of providing accurate answers to identical questions over a period of months. The authors of the study could not give a clear answer as to why the capabilities of the AI chatbot have deteriorated.
To test how reliable the different models of ChatGPT were, researchers Lingjiao Chen, Matei Zaharia, and James Zou asked the ChatGPT-3.5 and ChatGPT-4 models to solve a series of math problems, answer sensitive questions, write new lines of code, and perform spatial reasoning based on prompts.
According to the study, ChatGPT-4 was able to identify prime numbers with 97.6 percent accuracy in March. In the same test conducted in June, GPT-4’s accuracy had dropped to just 2.4 percent. In contrast, the earlier GPT-3.5 model improved prime detection over the same time period.
When it came to generating new lines of code, the performance of both models deteriorated significantly from March to June. The study also found that ChatGPT responses to sensitive questions later became more terse when refusing to answer.
Previous iterations of the chatbot offered extensive justification for its inability to answer certain sensitive questions. In June, however, the language models simply apologized to the user and refused to answer.
Source: Cointelegraph