
OpenAI’s GPT-4.1 may be less aligned than the company’s previous AI models


In mid-April, OpenAI launched a powerful new AI model, GPT-4.1, which the company claimed "excelled" at following instructions. But the results of several independent tests suggest the model is less aligned, that is, less reliable, than previous OpenAI releases.

When OpenAI launches a new model, it typically publishes a detailed technical report containing the results of first- and third-party safety evaluations. The company skipped that step for GPT-4.1, saying the model isn't "frontier" and therefore doesn't warrant a separate report.

That has spurred some researchers and developers to investigate whether GPT-4.1 behaves less desirably than GPT-4o, its predecessor.

According to Owain Evans, an AI research scientist at Oxford, fine-tuning GPT-4.1 on insecure code causes the model to give "misaligned responses" at a "substantially higher" rate than GPT-4o. Evans previously co-authored a study showing that a version of GPT-4o trained on insecure code could exhibit harmful behaviors.

In a follow-up to that study, Evans and his co-authors found that GPT-4.1 fine-tuned on insecure code appears to display "new malicious behaviors," such as trying to get a user to share their password. To be clear, neither GPT-4.1 nor GPT-4o acts misaligned when trained on secure code.

"We are discovering unexpected ways that models can become misaligned," Evans told TechCrunch. "Ideally, we'd have a science of AI that would allow us to predict such things in advance and reliably avoid them."

A separate test of GPT-4.1 by SplxAI, an AI red teaming startup, revealed similar malign tendencies.

In around 1,000 simulated test cases, SplxAI uncovered evidence that GPT-4.1 veers off topic and permits intentional misuse more often than GPT-4o. SplxAI posits that GPT-4.1's preference for explicit instructions is to blame. GPT-4.1 doesn't handle vague directions well, a fact OpenAI itself acknowledges, and that opens the door to unintended behaviors.

"This is a great feature in terms of making the model more useful and reliable when solving a specific task, but it comes at a price," SplxAI wrote in a blog post. "[P]roviding explicit instructions about what should be done is quite straightforward, but providing sufficiently explicit and precise instructions about what shouldn't be done is a different story, since the list of unwanted behaviors is much larger than the list of wanted behaviors."

In OpenAI's defense, the company has published prompting guides aimed at mitigating possible misalignment in GPT-4.1. But the independent tests' findings serve as a reminder that newer models aren't necessarily better across the board. In a similar vein, OpenAI's new reasoning models hallucinate, that is, make things up, more often than the company's older models.

We've reached out to OpenAI for comment.





