
OpenAI’s new reasoning AI models hallucinate more


OpenAI recently launched its o3 and o4-mini models, which are state-of-the-art in many respects. However, the new models still hallucinate, or make things up; in fact, they hallucinate more than several of OpenAI’s older models.

Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, affecting even today’s best-performing systems. Historically, each new model has improved a bit in the hallucination department, hallucinating less than its predecessor. But that doesn’t seem to be the case for o3 and o4-mini.

According to OpenAI’s internal tests, o3 and o4-mini, which are reasoning models, hallucinate more often than the company’s previous reasoning models (o1, o1-mini, and o3-mini) as well as OpenAI’s traditional, non-reasoning models, such as GPT-4o.

Perhaps more concerning, the ChatGPT maker doesn’t really know why it’s happening.

In its technical report for o3 and o4-mini, OpenAI writes that “more research is needed” to understand why hallucinations are getting worse as it scales up its reasoning models. O3 and o4-mini perform better in some areas, including coding and math tasks. But because they “make more claims overall,” they also end up making “more accurate claims as well as more inaccurate/hallucinated claims,” per the report.

OpenAI found that o3 hallucinated in response to 33% of the questions on PersonQA, the company’s in-house benchmark for measuring the accuracy of a model’s knowledge about people. That is roughly double the hallucination rate of OpenAI’s previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. O4-mini did even worse on PersonQA, hallucinating 48% of the time.

Third-party testing by Transluce, a nonprofit AI research lab, also found evidence that o3 tends to make up actions it took in the process of arriving at answers. In one example, Transluce observed o3 claiming that it ran code on a 2021 MacBook Pro outside of ChatGPT and then copied the results into its answer. While o3 has access to some tools, it can’t do that.

“Our hypothesis is that the kind of reinforcement learning used for o-series models may amplify issues that are usually mitigated (but not fully erased) by standard post-training pipelines,” Transluce researcher Neil Chowdhury said in an email.

Sarah Schwettmann, co-founder of Transluce, added that o3’s hallucination rate could make it less useful than it otherwise would be.

Kian Katanforoosh, a Stanford adjunct professor and CEO of the upskilling startup Workera, said his team has been testing o3 in their coding workflows and has found it to be a step above the competition. However, Katanforoosh says that o3 tends to hallucinate broken website links: the model will supply a link that, when clicked, doesn’t work.

Hallucinations may help models arrive at interesting ideas and be creative in their “thinking,” but they also make some models a tough sell for businesses in markets where accuracy is paramount. For example, a law firm likely wouldn’t be pleased with a model that inserts a large number of factual errors into client contracts.

One promising approach to boosting the accuracy of models is giving them web search capabilities. OpenAI’s GPT-4o with web search achieves 90% accuracy on SimpleQA, one of OpenAI’s accuracy benchmarks. Potentially, search could substantially improve reasoning models’ hallucination rates as well, at least in cases where users are willing to expose their prompts to a third-party search provider.
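To make the idea concrete, here is a minimal sketch of search-grounded prompting, in which retrieved snippets are placed in the prompt so the model answers from them rather than from its own memory. This is an illustration only: the search_web helper is hypothetical (any search API could back it), and this is not how OpenAI’s own web search feature is implemented.

```python
# Minimal sketch of search-grounded prompting (illustrative, not OpenAI's implementation).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def search_web(query: str) -> list[str]:
    """Hypothetical helper; plug in whatever search API you have access to."""
    raise NotImplementedError


def answer_with_search(question: str) -> str:
    # Fetch snippets from a search provider and paste them into the prompt,
    # so the model can ground its answer in them instead of guessing.
    snippets = search_web(question)
    context = "\n\n".join(snippets)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "Answer using only the provided search results. "
                           "If they don't contain the answer, say you don't know.",
            },
            {
                "role": "user",
                "content": f"Search results:\n{context}\n\nQuestion: {question}",
            },
        ],
    )
    return response.choices[0].message.content
```

The trade-off noted above is visible in the sketch: every question, verbatim, is sent to whichever search provider backs search_web.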

If scaling up reasoning models indeed continues to worsen hallucinations, it will make the hunt for a solution all the more urgent.

Over the last year, the broader AI industry has pivoted to focus on reasoning models after techniques for improving traditional AI models began showing diminishing returns. Reasoning improves model performance on a variety of tasks without requiring massive amounts of computing and data during training. Yet it seems reasoning may also lead to more hallucination, which is a problem.



