BLOG

Insights, updates, and deep dives into LLM benchmarking, AI evaluation, and the latest trends in artificial intelligence.

Announcing AutoBench Agentic: The Next-Generation Agentic Benchmark.

Built on LLM-generated virtual agents, it handles countless agentic tasks to deliver unbiased, granular LLM evaluation.

Leaderboard · LLM · Benchmarking
AutoBench Team · April 20, 2026 · 9 min read

Introducing AutoBench 2.0: Our New Benchmarking Platform Is Out Just in Time to Evaluate GPT 5.2.

We are also announcing our latest benchmark (Run 5), made possible by new platform features for more powerful and efficient benchmarking: Random Score Pooling, Nonlinear Weighting, and Parallel Iteration.

Leaderboard · LLM · Benchmarking
AutoBench Team · December 17, 2025 · 5 min read

AutoBench Goes to the Farm with Evja: The First Ever Agronomy Benchmark. The Best Farmer LLM? OpenAI, but Mistral...

We teamed up with leading agritech company EVJA to launch the first-ever LLM benchmark dedicated to the agricultural sector: 40 models, 4 professional personas, and one major open-source surprise.

Leaderboard · LLM · Benchmarking
AutoBench Team · December 10, 2025 · 5 min read

AutoBench Run 4 Is Out with Gemini 3 Pro, GPT 5.1, Grok 4.1, and More. And the Winner Is Not Who You Expect.

This run evaluated 33 models across over 300 iterations (generated questions), using 21 ranking models and producing over 220,000 individual rankings.

Leaderboard · Validation · LLM
AutoBench Team · November 28, 2025 · 5 min read

AutoBench Goes Scientific: Rigorous Validation for a Dynamic, Open-Source LLM Benchmark

We're thrilled to announce that AutoBench has grown from a promising open-source project into a scientifically validated framework, with our first paper published in collaboration with Sapienza University of Rome.

Research · Validation · LLM
AutoBench Team · October 29, 2025 · 5 min read