Meet us at VIVE 2026 — Feb 22-25 in Los Angeles. Learn more →

Benchmarking AI Vision with Puzzles

Matt Abate

Head of AI Research

May 13, 2025

[Research]

Vision models are a core component in our agent flow. Ana uses vision to inspect every figure she generates, and to learn business context from user datasets and dashboards.

As part of a wider assessment of AI vision, we recently benchmarked five multimodal AI models from OpenAI, Google, and Anthropic on their ability to solve puzzles provided as image files. We built the benchmark from Twitter posts, drawing on 10 accounts that post daily puzzles. The puzzles mainly feature:

  • simple algebra and linear algebra questions,
  • matchstick puzzles,
  • trigonometry and geometry questions,
  • spot-the-pattern puzzles, and
  • chess puzzles.

[Figure: Benchmark result. Model accuracy across puzzle types.]

[Figure: Example puzzle.]

The best-performing model was Google’s gemini-2.5-pro-preview, which answered 73 of 75 questions correctly on its first try.
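
For readers who want to check the headline number or score their own runs, here is a minimal sketch of the arithmetic. It is not our evaluation code: the function names and the `(puzzle_type, is_correct)` record format are assumptions made for illustration, and the `toy` records are placeholders, not benchmark data.

```python
from collections import defaultdict


def accuracy(correct: int, total: int) -> float:
    """Fraction of questions answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def accuracy_by_type(records):
    """Group (puzzle_type, is_correct) pairs and return accuracy per type.

    The record format is hypothetical; adapt it to however results are stored.
    """
    counts = defaultdict(lambda: [0, 0])  # puzzle_type -> [correct, total]
    for puzzle_type, is_correct in records:
        counts[puzzle_type][0] += int(is_correct)
        counts[puzzle_type][1] += 1
    return {t: accuracy(c, n) for t, (c, n) in counts.items()}


# Headline figure from the post: 73 of 75 correct on the first try.
print(f"{accuracy(73, 75):.1%}")  # 97.3%

# Placeholder records for illustration only; NOT the benchmark data.
toy = [("chess", True), ("chess", False), ("matchstick", True)]
print(accuracy_by_type(toy))
```

A first-try score of 73 out of 75 works out to roughly 97.3% accuracy. The per-type helper shows one way a breakdown across puzzle types could be computed from graded answers.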

The dataset is now public on Kaggle.