OpenAI found features in AI models that correspond to different ‘personas’


OpenAI researchers say they have found hidden features inside AI models that correspond to misaligned “personas,” according to new research the company published Wednesday.

By looking at an AI model’s internal representations — the numbers that dictate how a model responds, which often look completely incoherent to humans — OpenAI researchers were able to find patterns that lit up when a model misbehaved.

The researchers found one such feature that corresponded to toxic behavior in an AI model’s responses — meaning the model would give misaligned answers, such as lying to users or making irresponsible suggestions.

The researchers were able to turn the toxicity up or down by adjusting the feature.
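The article doesn’t spell out how this adjustment works, but in interpretability research, “turning a feature up or down” is typically done by adding or subtracting a direction vector in a layer’s activations. Below is a minimal PyTorch sketch of that idea; the toy layer, the `persona_direction` vector, and the `strength` knob are all illustrative assumptions, not OpenAI’s actual method or code.

```python
# A minimal sketch of steering a feature "up or down", assuming the toxic
# persona has already been isolated as a direction vector in one layer's
# activations. Everything here is a placeholder, not OpenAI's code.
import torch
import torch.nn as nn

hidden_dim = 768
layer = nn.Linear(hidden_dim, hidden_dim)  # stand-in for one transformer block

# Hypothetical unit vector along the "toxic persona" feature.
persona_direction = torch.randn(hidden_dim)
persona_direction /= persona_direction.norm()

def steering_hook(strength: float):
    """Build a forward hook that shifts the layer's output along the feature.

    Positive strength turns the persona "up"; negative turns it "down".
    """
    def hook(module, inputs, output):
        return output + strength * persona_direction
    return hook

# Every forward pass through this layer is now nudged away from the persona.
handle = layer.register_forward_hook(steering_hook(-4.0))
steered = layer(torch.randn(1, hidden_dim))
handle.remove()  # detach the hook to restore the unmodified model
```

The intervention amounts to a single vector addition, which is what makes the behavior cheap to dial in either direction.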

OpenAI’s recent study gives the company a better understanding of the factors that can make AI models act unsafely, and could thus help it develop safer AI models. OpenAI could potentially use the patterns it has found to better detect misalignment in production AI models, according to OpenAI interpretability researcher Dan Mossing.

“We are hopeful that the tools we’ve learned — like this ability to reduce a complicated phenomenon to a simple mathematical operation — will help us understand model generalization in other places as well,” Mossing said in an interview with TechCrunch.
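If the feature really is a direction in activation space, that “simple mathematical operation” can also run in reverse as a monitor: project a response’s activations onto the direction and flag unusually high scores. The sketch below reuses the hypothetical `persona_direction` from above; the threshold is a placeholder that would need calibration, and none of this is confirmed to be OpenAI’s detection setup.

```python
# A hedged sketch of misalignment detection via projection onto the
# hypothetical persona direction. Shapes and threshold are placeholders.
import torch

def persona_score(activations: torch.Tensor, direction: torch.Tensor) -> float:
    """Mean projection of per-token hidden states onto the feature direction.

    activations: (num_tokens, hidden_dim) states gathered from one response.
    direction:   (hidden_dim,) unit vector for the persona feature.
    """
    return (activations @ direction).mean().item()

def looks_misaligned(activations, direction, threshold=2.0) -> bool:
    # Placeholder threshold; in practice it would be calibrated against
    # activations from known-good and known-bad model responses.
    return persona_score(activations, direction) > threshold
```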

AI researchers know how to improve AI models, but, confusingly, they don’t fully understand how those models arrive at their answers — Anthropic’s Chris Olah often remarks that AI models are grown more than they are built. OpenAI, Google DeepMind, and Anthropic are investing more in interpretability research — a field that tries to crack open the black box of how AI models work — to address this problem.

A recent study by Oxford AI research scientist Owain Evans raised new questions about how AI models generalize. The research found that OpenAI’s models could be fine-tuned on insecure code and would then display malicious behaviors across a variety of domains, such as trying to trick a user into sharing their password. The phenomenon is known as emergent misalignment, and Evans’ study inspired OpenAI to explore it further.

But in the process of studying emergent misalignment, OpenAI says it stumbled upon features inside AI models that seem to play a large role in controlling that behavior. Mossing says these patterns are reminiscent of internal brain activity in humans, in which certain neurons correlate to moods or behaviors.

“When Dan and the team first presented this at a research meeting, I was like, ‘Wow, you guys found it,’” said one OpenAI researcher. “You found an internal neural activation that shows these personas, and that you can actually steer it to make the model more aligned.”

Some features the researchers found correspond to sarcasm in an AI model’s responses, while others correlate to more toxic replies in which the model acts like a cartoonish, evil villain. OpenAI’s researchers say these features can change dramatically during the fine-tuning process.

Notably, OpenAI’s researchers said that when emergent misalignment occurred, it was possible to steer the model back toward good behavior by fine-tuning it on just a few hundred examples of secure code.
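The article gives only the headline recipe — continue training on a few hundred benign examples — so what follows is a generic supervised fine-tuning loop with a toy model and placeholder data, not OpenAI’s training setup.

```python
# A minimal sketch of the realignment step: continue training the misaligned
# model on a small set of benign (e.g. secure-code) examples. The model,
# data, and hyperparameters are all stand-ins.
import torch
import torch.nn as nn

vocab_size, hidden_dim = 1000, 64
model = nn.Sequential(nn.Embedding(vocab_size, hidden_dim),
                      nn.Linear(hidden_dim, vocab_size))  # toy language model

# A few hundred placeholder "secure code" token sequences, 16 tokens each.
benign_batch = torch.randint(0, vocab_size, (256, 16))

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    # Next-token prediction: each sequence shifted by one is the target.
    inputs, targets = benign_batch[:, :-1], benign_batch[:, 1:]
    logits = model(inputs)  # (batch, seq_len - 1, vocab_size)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```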

OpenAI’s latest study builds on Anthropic’s previous work on interpretability and alignment. In 2024, Anthropic released research that tried to map the inner workings of AI models, attempting to pin down and label the features responsible for different concepts.

Companies like OpenAI and Anthropic are making the case that there is real value in understanding how AI models work, not just in making them better. Still, there is a long way to go before modern AI models are fully understood.


