The Interpretable AI playbook: What Anthropic’s research means for your enterprise LLM strategy

Anthropic CEO Dario Amodei made an urgent push in April for the industry to understand how AI models actually think.

It comes at a crucial time. As Anthropic battles to stay relevant in the global AI rankings, it is worth noting what sets it apart from other top AI labs. Founded in 2021 by seven OpenAI employees who broke away over AI safety concerns, Anthropic builds AI models that adhere to a set of human-valued principles, a system it calls Constitutional AI. These principles are meant to ensure that models are "helpful, honest and harmless" and generally act in the best interests of society. At the same time, Anthropic's research arm is diving deep into understanding how its models think about the world, and why they produce helpful (and sometimes harmful) answers.

Anthropic's flagship model, Claude 3.7 Sonnet, dominated coding benchmarks when it launched in February, proving that AI models can excel at both performance and safety. And the recent release of Claude 4.0 Opus and Sonnet once again puts Claude at the top of coding benchmarks. However, in today's fast-moving and hyper-competitive AI market, Anthropic's rivals, such as Google's Gemini 2.5 Pro and OpenAI's o3, have impressive showings of their own for coding prowess, and they already dominate Claude in math, creative writing and overall reasoning across many languages.

If Amodei's thoughts are any indication, Anthropic is focused on the future of AI and its implications in critical fields such as medicine, psychology and law, where model safety and human values are imperative. And it shows: Anthropic is the leading AI lab focused strictly on developing "interpretable" AI, meaning models that let us understand, to some degree of certainty, what the model is thinking and how it arrives at a particular conclusion.

Amazon and Google have already invested billions of dollars in Anthropic even as they build their own AI models, so perhaps Anthropic's competitive advantage is still budding. Interpretable models, as Anthropic suggests, could significantly reduce the long-term operational costs associated with debugging, auditing and mitigating risks in complex AI deployments.

Sayash Kapoor, an AI safety researcher, suggests that while interpretability is valuable, it is just one of many tools for managing AI risk. In his view, interpretability alone is neither necessary nor sufficient to ensure models behave safely; it matters most when paired with filters, verifiers and human-centered design. This more expansive view sees interpretability as part of a larger ecosystem of control strategies, particularly in real-world deployments where models are components in broader decision-making systems.

AI needs to be interpretable

Until recently, many thought we were still years away from the kinds of advances that now help Claude, Gemini and ChatGPT boast exceptional market adoption. While these models are already pushing the frontiers of human knowledge, their widespread use is attributable to just how good they are at solving a broad range of practical problems that require creative problem-solving or detailed analysis. As models are put to work on increasingly critical problems, it is important that they produce accurate answers as instructed.

Amodei fears that when an AI responds to a prompt, "we have no idea… why it chooses certain words over others, or why it occasionally makes a mistake despite usually being accurate." Such errors, whether hallucinations of inaccurate information or responses that fail to align with human values, will hold AI models back from reaching their full potential. Indeed, we have seen many examples of AI continuing to struggle with hallucinations and unethical behavior.

The best way to solve these problems, Amodei argues, is to understand how an AI thinks: "Our inability to understand models' internal mechanisms means that we cannot meaningfully predict such [harmful] behaviors, and therefore struggle to rule them out… If instead it were possible to look inside models, we might be able to systematically block all jailbreaks, and also characterize what dangerous knowledge the models have."

Amodei also sees model opacity as a barrier to deploying AI models in high-stakes financial or safety-critical settings, "because we can't fully set the limits on their behavior, and a small number of mistakes could be very harmful." In decision-making that affects humans directly, such as medical diagnosis or mortgage assessments, legal regulations require AI to explain its decisions.

Imagine a financial institution using a large language model (LLM) for fraud detection: interpretability could mean being able to explain to a customer why their credit application was rejected, as the law requires. Or a manufacturing firm optimizing its supply chain: understanding why the AI recommends a particular supplier could unlock efficiencies and prevent unforeseen bottlenecks.
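To make that concrete, here is a minimal Python sketch of how an enterprise might pair a model's decision with the interpretability signals said to have driven it. Every concept name, score and helper function below is hypothetical, not any vendor's actual API:

```python
# Hypothetical sketch: pairing a credit decision with the internal "concepts"
# an interpretability tool reports as driving it. Concept names and scores
# are invented for illustration; real tooling would have to supply them.

from dataclasses import dataclass


@dataclass
class Decision:
    approved: bool
    explanation: str


def explain_decision(approved: bool, concept_attributions: dict,
                     top_k: int = 3) -> Decision:
    """Turn raw attribution scores into a human-readable reason string."""
    top = sorted(concept_attributions.items(),
                 key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    reasons = ", ".join(f"{name} ({score:+.2f})" for name, score in top)
    verdict = "approved" if approved else "declined"
    return Decision(approved, f"Application {verdict}; main factors: {reasons}")


# Example with made-up attribution scores
print(explain_decision(False, {
    "recent missed payments": -0.62,
    "high debt-to-income ratio": -0.41,
    "long employment history": 0.18,
}).explanation)
```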

Because of this, Amodei explains, Anthropic is doubling down on interpretability, with the goal that, by 2027, interpretability "can reliably detect most model problems."

To that end, Anthropic recently participated in a $50 million investment in Goodfire, an AI research lab making progress on AI "brain scans." Its model inspection platform, Ember, is a model-agnostic tool that identifies learned concepts within models and lets users manipulate them. In a recent demo, the company showed how Ember can recognize individual visual concepts within an image-generation AI and then let users paint those concepts onto a canvas to generate new images that follow the user's design.
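Ember's internals are not detailed here, but the general technique behind such tools, finding a direction in a model's activation space that corresponds to a learned concept and then amplifying or suppressing it, can be sketched in a few lines of NumPy. The snippet below is an illustrative toy built on that assumption, not Ember's actual API or Anthropic's internal tooling:

```python
# Illustrative toy of concept "steering": nudging a model's hidden activations
# along a learned concept direction. Generic sketch only; not Goodfire's Ember
# API or Anthropic's internal tooling.

import numpy as np


def steer(activations: np.ndarray, concept_direction: np.ndarray,
          strength: float) -> np.ndarray:
    """Add a scaled, unit-norm concept vector to every activation row."""
    unit = concept_direction / np.linalg.norm(concept_direction)
    return activations + strength * unit


# Toy example: 4 token positions with 8-dimensional hidden states
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))
concept = rng.normal(size=8)  # pretend this direction was learned from the model
boosted = steer(hidden, concept, strength=2.0)

# Each row now projects more strongly onto the concept direction
unit = concept / np.linalg.norm(concept)
print(hidden @ unit)
print(boosted @ unit)
```

The same move in reverse, suppressing a concept instead of boosting it, is what makes this kind of interpretability attractive as a control knob rather than just a diagnostic.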

Anthropic's investment in Ember hints at how difficult it is to develop interpretable models; Anthropic alone does not have the workforce to achieve interpretability. Creating interpretable models requires new toolchains and skilled developers to build them.

Wider context: An AI researcher's perspective

To unpack Amodei's perspective and add much-needed context, VentureBeat interviewed Kapoor, an AI safety researcher at Princeton. Kapoor is co-author of the book AI Snake Oil, a critical examination of exaggerated claims surrounding the capabilities of leading AI models. He is also a co-author of "AI as Normal Technology," in which he advocates treating AI as a standard, transformational tool like the internet or electricity, and promotes a realistic perspective on integrating it into everyday systems.

Kapoor doesn't dispute that interpretability is valuable. However, he is skeptical of treating it as the central pillar of AI alignment. "It's not a silver bullet," Kapoor told VentureBeat. Many of the most effective safety techniques, such as post-response filtering, do not require opening up the model at all, he said.
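As a minimal illustration of what Kapoor means, a post-response filter inspects the model's output before it reaches the user and needs no access to the model's internals. The patterns and policy below are invented for this example, not drawn from any production system:

```python
# Toy post-response filter: inspect model output against a blocklist before it
# reaches the user. Requires no visibility into the model's internals. The
# patterns and policy here are invented for illustration only.

import re

BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # looks like a US Social Security number
    re.compile(r"(?i)how to (build|make) .*(explosive|weapon)"),
]


def filter_response(text: str) -> str:
    """Return the model's text unchanged, or a refusal if a pattern matches."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(text):
            return "[Response withheld by safety filter]"
    return text


print(filter_response("Your order ships on Tuesday."))     # passes through
print(filter_response("The SSN on file is 123-45-6789."))  # blocked
```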

He also warns against what researchers call the "fallacy of inscrutability," the idea that if we do not fully understand a system's internals, we cannot use or regulate it responsibly. In practice, full transparency is not how most technologies are evaluated. What matters is whether a system performs reliably under real conditions.

This is not the first time Amodei has warned about the risks of AI outpacing our understanding. In his October 2024 post "Machines of Loving Grace," he sketched a vision of increasingly capable models that could take meaningful real-world actions.

According to Kapoor, there is an important distinction to be made here between a model's capability and its power. Model capabilities are undoubtedly increasing rapidly, and models may soon develop enough intelligence to find solutions to many of the complex problems challenging humanity today. But a model is only as powerful as the interfaces we provide for it to interact with the real world, including where and how models are deployed.

Amodei has separately argued that the U.S. should maintain a lead in AI development, in part through export controls that restrict access to powerful models. The idea is that authoritarian governments might use frontier AI systems irresponsibly, or seize the geopolitical and economic edge that comes with deploying them first.

For Kapoor, "even the biggest proponents of export controls agree that it will give us at most a year or two." Instead, he thinks we should treat AI as a "normal technology," like electricity or the internet. While revolutionary, both technologies took decades to be fully realized throughout society.

Others criticize Amodei

Kapoor isn't the only one critiquing Amodei's stance. Last week at VivaTech in Paris, Jensen Huang, CEO of Nvidia, declared his disagreement with Amodei's views. Huang questioned whether the authority to develop AI should be limited to a few powerful entities like Anthropic. He said: "If you want things to be done safely and responsibly, you do it in the open… Don't do it in a dark room and tell me it's safe."

In response, Anthropic stated: "Dario has never claimed that 'only Anthropic' can build safe and powerful AI. As the public record will show, Dario has advocated for a national transparency standard for AI developers (including Anthropic)."

It is worth noting that Anthropic is not alone in its pursuit of interpretability: Google DeepMind's interpretability team, led by Neel Nanda, has also made serious contributions to interpretability research.

Ultimately, top AI labs and researchers are providing strong evidence that interpretability could be a key differentiator in the competitive AI market. Enterprises that prioritize interpretability early may gain a significant competitive edge by building more trusted, compliant and adaptable AI systems.

