Source：Evergreen Information Network Editor：Social Good Time：2025-06-27 04:14:28

Google, OpenAI, DeepSeek, et al. are nowhere near achieving AGI (Artificial General Intelligence), according to a new benchmark.

The Arc Prize Foundation, a nonprofit that measures AGI progress, has a new benchmark that is stumping the leading AI models. The test, called ARC-AGI-2, is the second edition of the ARC-AGI benchmark, which tests models on general intelligence by challenging them to solve visual puzzles using pattern recognition, context clues, and reasoning.

According to the ARC-AGI leaderboard, OpenAI's most advanced model, o3-low, scored 4 percent. Google's Gemini 2.0 Flash and DeepSeek R1 both scored 1.3 percent. Anthropic's most advanced model, Claude 3.7 with an 8K token limit (which refers to the number of tokens used to process an answer), scored 0.9 percent.

The question of how and when AGI will be achieved remains as heated as ever, with various factions bickering about the timeline or whether it's even possible. Anthropic CEO Dario Amodei said it could take as little as two to three years, and OpenAI CEO Sam Altman said "it's achievable with current hardware." But experts like Gary Marcus and Yann LeCun say the technology isn't there yet, and it doesn't take an expert to see how fueling AGI hype is advantageous to AI companies seeking major investments.

The ARC-AGI benchmark is designed to challenge AI models beyond specialized intelligence by avoiding the memorization trap: spewing out PhD-level responses without understanding what they mean. Instead, it focuses on puzzles that are relatively easy for humans to solve because of our innate ability to take in new information and make inferences, thus revealing gaps that can't be resolved by simply feeding AI models more data.

"Intelligence requires the ability to generalize from limited experience and apply knowledge in new, unexpected situations. AI systems are already superhuman in many specific domains (e.g., playing Go and image recognition)," read the announcement.

"However, these are narrow, specialized capabilities. The 'human-ai gap' reveals what's missing for general intelligence - highly efficiently acquiring new skills."

To get a sense of AI models' current limitations, you can take the ARC-AGI test for yourself, and you might be surprised by its simplicity. There's some critical thinking involved, but the ARC-AGI test wouldn't be out of place next to the New York Times crossword puzzle, Wordle, or any of the other popular brain teasers. It's challenging but not impossible, and the answer is there in the puzzle's logic, which is something the human brain has evolved to interpret.

OpenAI's o3-low model scored 75.7 percent on the first edition of ARC-AGI. By comparison, its 4 percent score on the second edition shows how difficult the new test is, and how much work remains to reach human-level intelligence.
