A new, challenging AGI test stumps most AI models


The Arc Prize Foundation, a nonprofit co-founded by prominent AI researcher François Chollet, announced in a blog post on Monday that it has created a new, challenging test to measure the general intelligence of leading AI models.

So far, the new test, called ARC-AGI-2, has stumped most models.

“Reasoning” AI models like OpenAI’s o1-pro and DeepSeek’s R1 score between 1% and 1.3% on ARC-AGI-2, according to the Arc Prize leaderboard. Powerful non-reasoning models, including GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Flash, score around 1%.

ARC-AGI tests consist of puzzle-like problems in which an AI has to identify visual patterns from a collection of different-colored squares and generate the correct “answer” grid. The problems are designed to force an AI to adapt to new problems it hasn’t seen before.
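To make the puzzle format concrete, here is a minimal Python sketch of how such a task could be represented and checked. It assumes each grid is a list of rows of integers (one integer per colored square) and that an answer only counts if it matches the expected grid exactly. The toy task, the `mirror_solver` and `identity_solver` functions, and the scoring helpers are all made up for illustration and are not taken from ARC-AGI-2 or from the foundation's tooling.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]  # assumption: each integer stands for one colored square

# A made-up toy task (not from ARC-AGI-2). The hidden rule is
# "mirror the grid left to right".
toy_task: Dict[str, List[Dict[str, Grid]]] = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [
        {"input": [[0, 5], [7, 0]], "output": [[5, 0], [0, 7]]},
    ],
}


def mirror_solver(grid: Grid) -> Grid:
    """Hypothetical solver that has inferred the mirror rule."""
    return [list(reversed(row)) for row in grid]


def identity_solver(grid: Grid) -> Grid:
    """Hypothetical solver that returns a copy of its input unchanged."""
    return [list(row) for row in grid]


def fits_examples(solver: Callable[[Grid], Grid], task: dict) -> bool:
    """Check whether a solver reproduces every demonstration pair."""
    return all(solver(pair["input"]) == pair["output"] for pair in task["train"])


def score(solver: Callable[[Grid], Grid], task: dict) -> float:
    """Fraction of test grids answered exactly right (assumed scoring rule)."""
    pairs = task["test"]
    correct = sum(solver(pair["input"]) == pair["output"] for pair in pairs)
    return correct / len(pairs)


if __name__ == "__main__":
    print("mirror fits examples:", fits_examples(mirror_solver, toy_task))
    print("mirror score:", score(mirror_solver, toy_task))
    print("identity score:", score(identity_solver, toy_task))
```

The point of the sketch is that a solver gets no partial credit: it has to infer the rule from a handful of examples and then produce the whole answer grid correctly for an input it has never seen.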

The Arc Prize Foundation had more than 400 people take ARC-AGI-2 to establish a human baseline. On average, “panels” of these people answered 60% of the test’s questions correctly, far better than any of the models’ scores.

A sample question from ARC-AGI-2 (Credit: Arc Prize).

In a post on X, Chollet claimed that ARC-AGI-2 is a better measure of an AI model’s actual intelligence than the first iteration of the test, ARC-AGI-1. The Arc Prize Foundation’s tests are aimed at evaluating whether an AI system can efficiently acquire new skills outside the data it was trained on.

Chollet said that, unlike ARC-AGI-1, the new test prevents AI models from relying on “brute force,” meaning extensive computing power, to find solutions. Chollet previously acknowledged that this was a major flaw of ARC-AGI-1.

To address the first test’s flaws, ARC-AGI-2 introduces a new metric: efficiency. It also requires models to interpret patterns on the fly instead of relying on memorization.

“Intelligence is not solely defined by the ability to solve problems or achieve high scores,” Arc Prize Foundation co-founder Greg Kamradt wrote in a blog post. “The efficiency with which those capabilities are acquired and deployed is a crucial, defining component. The core question being asked is not just, ‘Can AI acquire [the] skill to solve a task?’ but also, ‘At what efficiency or cost?’”

ARC-AGI-1 went unbeaten for roughly five years until December 2024, when OpenAI released its advanced reasoning model, o3, which outperformed all other AI models and matched human performance on the evaluation. However, as we noted at the time, o3’s performance gains on ARC-AGI-1 came with a hefty price tag.

The version of OpenAI’s o3 model that was first to reach new heights on ARC-AGI-1, o3 (low), scored 75.7% on that test but managed only 4% on ARC-AGI-2 while using $200 worth of computing power per task.

Comparison of frontier AI model performance on ARC-AGI-1 and ARC-AGI-2 (Credit: Arc Prize).

The arrival of ARC-AGI-2 comes as many in the tech industry are calling for new, unsaturated benchmarks to measure AI progress. Hugging Face co-founder Thomas Wolf recently told TechCrunch that the AI industry lacks sufficient tests to measure the key traits of so-called artificial general intelligence, including creativity.

Alongside the new benchmark, the Arc Prize Foundation announced a new Arc Prize 2025 contest, challenging developers to reach 85% accuracy on the ARC-AGI-2 test while spending only $0.42 per task.
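To put the efficiency figures quoted in this article side by side, the short Python sketch below does the arithmetic. It is not the foundation's official efficiency metric. The `Result` class, the `meets_target` check, and the variable names are hypothetical, and the only numbers used are the ones reported above: o3 (low) at 4% accuracy and $200 per task, and the contest target of 85% accuracy at $0.42 per task.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    accuracy: float       # fraction of tasks solved, 0.0 to 1.0
    cost_per_task: float  # US dollars


# Contest target as reported in the article.
TARGET_ACCURACY = 0.85
TARGET_COST_PER_TASK = 0.42


def meets_target(result: Result) -> bool:
    """Hypothetical check: accurate enough AND cheap enough per task."""
    return (
        result.accuracy >= TARGET_ACCURACY
        and result.cost_per_task <= TARGET_COST_PER_TASK
    )


# o3 (low) on ARC-AGI-2, using the figures quoted in the article.
o3_low = Result(name="o3 (low)", accuracy=0.04, cost_per_task=200.0)

# How many times more o3 (low) spends per task than the contest allows.
cost_gap = o3_low.cost_per_task / TARGET_COST_PER_TASK

if __name__ == "__main__":
    print(f"{o3_low.name} meets target: {meets_target(o3_low)}")
    print(f"cost per task is about {cost_gap:.0f}x the contest budget")
```

On these figures, o3 (low) spends roughly 476 times the contest's per-task budget while solving far fewer tasks than the 85% target requires.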
