AI Sucks at Reading Clocks


These days, artificial intelligence can create photorealistic footage, write novels, do your homework, and even predict protein structures. Yet new research shows that it often fails at a very basic task: telling time.

Researchers at the University of Edinburgh tested seven well-known multimodal large language models, the kind of AI that can interpret and generate various types of media, on their ability to answer time-related questions based on different images of clocks or calendars. Their study, forthcoming in April and currently hosted on the preprint server arXiv, shows that the models have difficulty with these basic tasks.

“The ability to interpret and reason about time from visual inputs is critical for many real-world applications, ranging from event scheduling to autonomous systems,” the researchers wrote in the study. “Despite advances in multimodal large language models (MLLMs), most work has focused on object detection, image captioning, or scene understanding, leaving temporal inference underexplored.”

The team tested OpenAI’s GPT-4o and GPT-o1; Google DeepMind’s Gemini 2.0; Anthropic’s Claude 3.5 Sonnet; Meta’s Llama 3.2-11B-Vision-Instruct; Alibaba’s Qwen2-VL7B-Instruct; and ModelBest’s MiniCPM-V-2.6. They fed the models a variety of analog clock images (timepieces with Roman numerals, different dial colors, and even some missing the seconds hand) as well as images of calendars.

For the clock images, the researchers asked the LLMs, “What time is shown on the clock in the given image?” For the calendar images, they asked simple questions such as “What day of the week is New Year’s Day?” and harder ones including “What is the 153rd day of the year?”

“Analogue clock reading and calendar comprehension involve intricate cognitive steps: they demand fine-grained visual recognition (e.g., clock-hand position, day-cell layout) and non-trivial numerical reasoning (e.g., calculating day offsets),” the researchers explained.
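
Once the relevant date is known, the calendar questions come down to simple date arithmetic. As a minimal illustration (not the researchers' code), the Python sketch below answers both example questions with the standard-library datetime module; the helper names are hypothetical. It also shows why "the 153rd day of the year" has no single answer: in a leap year it falls one calendar day earlier.

```python
from datetime import date, timedelta

# English weekday names indexed by date.weekday() (Monday == 0),
# spelled out here so the result does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def nth_day_of_year(year: int, n: int) -> date:
    """Hypothetical helper: the date of the n-th (1-based) day of `year`."""
    # Day 1 is January 1, so the n-th day is an offset of n - 1 days.
    return date(year, 1, 1) + timedelta(days=n - 1)


def weekday_name(d: date) -> str:
    """Hypothetical helper: the English weekday name for a date."""
    return WEEKDAYS[d.weekday()]


if __name__ == "__main__":
    for year in (2024, 2025):  # 2024 is a leap year, 2025 is not
        day_153 = nth_day_of_year(year, 153)
        print(year,
              "New Year's Day:", weekday_name(date(year, 1, 1)),
              "| day 153:", day_153.isoformat(), weekday_name(day_153))
```

Running it shows day 153 landing on June 1 in 2024 but June 2 in 2025, the kind of offset a model has to get right from the calendar image alone.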

Overall, the AI systems did not perform well. They read analog clocks correctly less than 25% of the time. They struggled with clocks bearing Roman numerals and stylized hands just as much as with clocks lacking a seconds hand, which, according to the researchers, indicates that the problem may lie in detecting the hands and interpreting their angles on the clock face.
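
On an idealized 12-hour dial, the angle interpretation the researchers point to is simple geometry: the minute hand turns 6 degrees per minute, and the hour hand turns 30 degrees per hour plus half a degree per minute. The Python sketch below is a hypothetical illustration, not anything from the study. It assumes the hard part (finding the hands in the image and measuring their angles clockwise from 12 o'clock) has already been done, and simply converts those angles into a time.

```python
def hand_angles(hour: int, minute: int) -> tuple[float, float]:
    """Hypothetical helper: clockwise angles (degrees from 12) of both hands."""
    hour_deg = (hour % 12) * 30 + minute * 0.5  # 30 deg/hour + 0.5 deg/minute
    minute_deg = minute * 6.0                   # 6 deg/minute
    return hour_deg, minute_deg


def time_from_hand_angles(hour_deg: float, minute_deg: float) -> tuple[int, int]:
    """Hypothetical helper: convert measured hand angles to (hour, minute).

    Assumes an ideal dial with no seconds-hand information.
    """
    minute = round((minute_deg % 360) / 6) % 60
    # Subtract the minute-driven drift of the hour hand before locating the hour.
    hour = round(((hour_deg % 360) - 0.5 * minute) / 30) % 12
    return (12 if hour == 0 else hour, minute)


if __name__ == "__main__":
    # Round trip: 10:10, the classic watch-advert pose.
    print(time_from_hand_angles(*hand_angles(10, 10)))
```

Subtracting the hour hand's minute-driven drift matters: at 12:45 the hour hand sits three quarters of the way toward 1, and reading it naively is one way to get the hour wrong.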

Google’s Gemini 2.0 scored highest on the team’s clock task, while GPT-o1 was accurate on the calendar task 80% of the time, far better than its competitors. Even so, the most successful MLLM on the calendar task still made mistakes about 20% of the time.

“Most people can tell the time and use calendars from an early age. Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people,” Rohit Saksena, an author of the study and a PhD student at the University of Edinburgh’s School of Informatics, said in a university statement. “These shortfalls must be addressed if AI systems are to be successfully integrated into time-sensitive, real-world applications, such as scheduling, automation and assistive technologies.”

So while AI might be able to complete your homework, don’t count on it to meet any deadlines.


