ARC-AGI-2: The New Benchmark That Redefines AI Reasoning
This week brought the much-anticipated release of ARC-AGI-2, François Chollet's benchmark for artificial general intelligence.
ARC-AGI is always on my radar - not just because it's challenging, but because Chollet’s views often align with my own. He argues that genuine reasoning requires symbolic representation: a discrete structure that can be systematically explored, something purely autoregressive models - like the first generation of large language models - cannot do on their own. He refers to this as “discrete program search” and speculates that models like OpenAI's o3 use natural language as an executable “program,” explicitly mapped out through chain-of-thought reasoning. If true, this would mean they are no longer purely autoregressive.
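To make “discrete program search” concrete, here is a minimal sketch in Python. The grid primitives and the example task are invented for illustration (real ARC solvers use far richer DSLs), and this is not a claim about how o3 or any frontier model works - just the basic idea of searching a discrete space of programs until one explains the training pairs.

```python
from itertools import product

# Toy DSL of grid transformations - illustrative only.
def identity(g): return g
def flip_h(g):   return [row[::-1] for row in g]
def flip_v(g):   return g[::-1]
def rotate90(g): return [list(row) for row in zip(*g[::-1])]

PRIMITIVES = [identity, flip_h, flip_v, rotate90]

def search_program(train_pairs, max_depth=3):
    """Brute-force search over compositions of primitives: return the first
    'program' (a sequence of primitives) consistent with every training pair."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            def run(grid, prog=program):
                for step in prog:
                    grid = step(grid)
                return grid
            if all(run(inp) == out for inp, out in train_pairs):
                return program
    return None

# A made-up task: the output is the input flipped left-to-right.
train_pairs = [
    ([[1, 0], [2, 3]], [[0, 1], [3, 2]]),
    ([[5, 5, 0], [0, 1, 2]], [[0, 5, 5], [2, 1, 0]]),
]
program = search_program(train_pairs)
print([f.__name__ for f in program])  # -> ['flip_h']
```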
ARC-AGI-2 raises the bar dramatically: non-reasoning models now score essentially zero. The benchmark includes 1,000 training tasks and various public and private evaluation sets - tasks that humans solve intuitively (with an average accuracy of around 60%) but which have so far left even state-of-the-art reasoning models struggling. Initial results show top models hovering around 1%, highlighting AI's ongoing limitations in abstraction, composition, and novel problem-solving.
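For anyone who wants to poke at the tasks themselves, a minimal scoring sketch follows. It assumes the same JSON layout as the published ARC-AGI-1 tasks ("train" and "test" lists of "input"/"output" grids) and simplifies the official protocol to a single attempt per test input; `predict` is a placeholder for whatever solver you plug in.

```python
import json

def score_task(task_path: str, predict) -> float:
    """Exact-match scoring for a single ARC task.

    `predict` is any callable taking (train_pairs, test_input) and
    returning a grid (a list of lists of integers 0-9).
    """
    with open(task_path) as f:
        task = json.load(f)

    train_pairs = [(ex["input"], ex["output"]) for ex in task["train"]]
    correct = 0
    for ex in task["test"]:
        prediction = predict(train_pairs, ex["input"])
        # Scoring is all-or-nothing: the grid must match cell for cell.
        if prediction == ex["output"]:
            correct += 1
    return correct / len(task["test"])
```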
My instinct is that models may saturate this benchmark within the year - but not by simply throwing more compute at it. Rightly or wrongly, the benchmark now explicitly penalises brute-force test-time compute. Instead, progress may depend on “coarse-graining” up to higher-level symbolic representations - projecting problems into abstract forms to outline general solutions, then mapping those back down into autoregressive continuations.
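As a purely conceptual sketch of that coarse-graining idea, the two-stage loop below first asks a model for an abstract, symbolic outline and then conditions a second pass on it to produce the concrete answer. `call_model` is a hypothetical placeholder, not a real API, and nothing here is a claim about how any existing model is actually trained or run.

```python
def call_model(prompt: str) -> str:
    """Placeholder for any text-generation API - hypothetical, not a real client."""
    raise NotImplementedError

def solve_with_abstraction(problem: str) -> str:
    # Stage 1: project the problem up into an abstract, symbolic outline.
    plan = call_model(
        "Describe, as a short numbered list of abstract steps, "
        f"a general strategy for solving:\n{problem}"
    )
    # Stage 2: map the outline back down into a concrete, token-by-token answer.
    return call_model(
        f"Problem:\n{problem}\n\nStrategy:\n{plan}\n\n"
        "Follow the strategy step by step and give the final answer."
    )
```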
My perspective is always shaped by what this means for organisations trying to engage with AI. Most won’t be pushing the frontier of reasoning research. Instead, the real opportunity lies in proactively restructuring data to make it compatible with these advanced reasoning approaches.
In my view, this comes down to two main actions: enhancing data connectivity and organising data through meaningful abstractions - precisely what Knowledge Graphs are designed for.
Combining advanced reasoning models with Knowledge Graphs and ontologies creates a powerful neural-symbolic loop. The discrete, symbolic structure of a Knowledge Graph provides the precise, connected data network needed for symbolic reasoning within an organisation. The ontology helps formalise the most meaningful abstractions within that domain-specific network. The reasoning models help build the network, and then, in turn, use that network to answer questions accurately.
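Here is a minimal sketch of the symbolic half of that loop using Python's rdflib. The namespace, classes, and data are invented for illustration; in practice the instance triples might be extracted by a reasoning model, and queries like the one below give it precise, connected answers to ground its reasoning in.

```python
from rdflib import Graph, Namespace, RDF, RDFS, Literal

EX = Namespace("https://example.org/org/")  # hypothetical ontology namespace

g = Graph()
g.bind("ex", EX)

# Ontology layer: the meaningful abstractions for this (made-up) domain.
g.add((EX.Supplier, RDF.type, RDFS.Class))
g.add((EX.Product, RDF.type, RDFS.Class))
g.add((EX.supplies, RDF.type, RDF.Property))

# Instance layer: the connected data network itself.
g.add((EX.AcmeLtd, RDF.type, EX.Supplier))
g.add((EX.Widget, RDF.type, EX.Product))
g.add((EX.AcmeLtd, EX.supplies, EX.Widget))
g.add((EX.AcmeLtd, RDFS.label, Literal("Acme Ltd")))

# Symbolic query: which suppliers provide the Widget product?
query = """
SELECT ?name WHERE {
  ?s a ex:Supplier ;
     ex:supplies ex:Widget ;
     rdfs:label ?name .
}
"""
for row in g.query(query, initNs={"ex": EX, "rdfs": RDFS}):
    print(row.name)  # -> Acme Ltd
```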
ARC-AGI-2 isn't just another benchmark - it’s a timely reminder of how far AI still has to go before reaching AGI, and a signpost pointing to what organisations need to do now to prepare for what’s coming.
⭕ Machine Learning Street Talk: https://www.youtube.com/watch?v=M3b59lZYBW8
⭕ ARC Post: https://www.linkedin.com/posts/tonyseale_the-abstraction-and-reasoning-challenge-activity-7227584450614734849-7DW9/