
This Tool Probes Frontier AI Models for Lapses in Intelligence


Executives at artificial intelligence companies may like to tell us that AGI is almost here, but the latest models still need some additional tutoring to help them be as clever as they can.

Scale AI, a company that has played a key role in helping frontier AI companies build advanced models, has developed a platform that can automatically test a model on thousands of benchmarks and tasks, pinpoint weaknesses, and flag additional training data that should enhance the model's skills. Scale, of course, will supply the data required.

Scale rose to prominence by providing human labor for training and testing advanced AI models. Large language models (LLMs) are trained on oodles of text scraped from books, the web, and other sources. Turning these models into helpful, coherent, and well-mannered chatbots requires additional "post-training," in which humans provide feedback on a model's output.

Scale supplies workers who are expert at probing models for problems and limitations. The new tool, called Scale Evaluation, automates some of this work using Scale's own machine learning algorithms.

"Within the big labs, there are all these haphazard ways of tracking some of the model weaknesses," says Daniel Berrios, head of product for Scale Evaluation. The new tool "is a way for [model makers] to go through results and slice and dice them to understand where a model isn't doing well," Berrios says, "then use that to target the data campaigns for improvement."

Berrios says several frontier AI model companies are already using the tool. He says most are using it to improve the reasoning capabilities of their best models. AI reasoning involves a model attempting to break a problem into its constituent parts in order to solve it more effectively. The approach relies heavily on feedback from users during post-training to determine whether the model has solved a problem correctly.

In one instance, Berrios says, Scale Evaluation revealed that a model's reasoning skills fell off when it was fed non-English prompts. "While [the model's] general-purpose reasoning capabilities were pretty good and performed well on benchmarks, they tended to degrade quite a bit when the prompts were not in English," he says.

In recent months, Scale has helped develop several new benchmarks designed to push AI models to become smarter and to scrutinize their behavior more carefully. These include EnigmaEval, MultiChallenge, MASK, and Humanity's Last Exam.

Scale says it is becoming harder to measure improvements in AI models, however, as the models get better at acing existing tests. The company says its new tool offers a more comprehensive picture by combining many different benchmarks, and it can be used to craft custom tests of a model's abilities. Scale's own AI can take a given problem and generate more examples of it, allowing a more thorough test of a model's skills.

The tool may also inform efforts to standardize the testing of AI models for misbehavior. Some researchers say that a lack of standardization means some model jailbreaks go undisclosed.

In February, the US National Institute of Standards and Technology announced that Scale would help it develop methodologies for testing models to ensure they are safe and trustworthy.

What kinds of errors have you spotted in the outputs of generative AI tools? What do you think are the models' biggest blind spots? Let us know by emailing hello@wired.com, or by commenting below.
