
AutoBench Run 5 - December 2025

Latest AutoBench run, covering GPT 5.2, Claude Opus 4.5, DeepSeek 3.2 Speciale, and more.

Date: December 16, 2025
Version: 2025-12-16
Models: 35
New Models: 13

Run data

| Model | AutoBench | Chatbot Arena | AAI Index | MMLU Index |
|---|---|---|---|---|
| | 4.48 (#1) | - | 72 (#2) | 0.87 (#4) |
| | 4.43 (#2) | - | - | - |
| | 4.41 (#3) | 1492 (#1) | 73 (#1) | 0.9 (#1) |
| | 4.39 (#4) | 1470 (#3) | 70 (#3) | 0.9 (#2) |
| | 4.38 (#5) | 1457 (#4) | 70 (#4) | 0.87 (#5) |
| | 4.32 (#6) | 1429 (#7) | 67 (#5) | 0.85 (#9) |
| | 4.3 (#7) | 1450 (#6) | 63 (#9) | 0.88 (#3) |
| | 4.29 (#8) | 1451 (#5) | 60 (#12) | 0.86 (#7) |
| | 4.29 (#9) | 1392 (#18) | 64 (#8) | 0.84 (#12) |
| | 4.21 (#10) | - | 64 (#7) | 0.85 (#10) |
| | 4.2 (#12) | 1397 (#16) | 57 (#14) | 0.84 (#13) |
| | 4.2 (#11) | 1478 (#2) | 65 (#6) | 0.87 (#6) |
| | 4.18 (#13) | 1352 (#23) | 61 (#11) | 0.81 (#22) |
| | 4.17 (#15) | 1402 (#15) | 55 (#16) | 0.76 (#28) |
| | 4.17 (#14) | 1408 (#14) | 51 (#21) | 0.84 (#14) |
| | 4.14 (#16) | 1418 (#9) | 59 (#13) | 0.86 (#8) |
| | 4.13 (#17) | 1425 (#8) | 56 (#15) | 0.83 (#16) |
| | 4.12 (#18) | 1395 (#17) | 52 (#18) | 0.85 (#11) |
| | 4.11 (#20) | 1414 (#12) | 52 (#19) | 0.84 (#15) |
| | 4.11 (#19) | 1416 (#10) | 50 (#23) | 0.82 (#18) |
| | 4.06 (#22) | 1334 (#27) | 47 (#25) | 0.81 (#23) |
| | 4.06 (#21) | 1339 (#26) | 51 (#22) | 0.77 (#27) |
| | 4.03 (#23) | 1367 (#22) | 54 (#17) | 0.82 (#19) |
| | 3.99 (#24) | 1345 (#24) | 61 (#10) | 0.82 (#20) |
| | 3.98 (#25) | 1374 (#20) | 45 (#26) | 0.83 (#17) |
| | 3.95 (#26) | 1378 (#19) | 40 (#28) | 0.81 (#24) |
| | 3.94 (#27) | 1415 (#11) | 38 (#29) | 0.81 (#25) |
| | 3.88 (#28) | - | 38 (#30) | 0.74 (#30) |
| | 3.86 (#29) | 1370 (#21) | 49 (#24) | 0.82 (#21) |
| | 3.81 (#30) | 1411 (#13) | 35 (#32) | 0.68 (#33) |
| | 3.78 (#32) | 1318 (#28) | 52 (#20) | 0.75 (#29) |
| | 3.78 (#31) | 1340 (#25) | 45 (#27) | 0.81 (#26) |
| | 3.57 (#33) | - | 28 (#34) | 0.64 (#34) |
| | 3.5 (#34) | - | 37 (#31) | 0.74 (#31) |
| | 3.47 (#35) | - | 32 (#33) | 0.73 (#32) |