AutoBench Run 5 - December 2025
Latest AutoBench run, covering GPT-5.2, Claude Opus 4.5, DeepSeek 3.2 Speciale, and more.
Date: December 16, 2025
Version: 2025-12-16
Models: 35
New Models: 13
Run data
| Model | AutoBench | Chatbot Arena | AAI Index | MMLU Index |
|---|---|---|---|---|
| | 4.48 (#1) | - | 72 (#2) | 0.87 (#4) |
| | 4.43 (#2) | - | - | - |
| | 4.41 (#3) | 1492 (#1) | 73 (#1) | 0.9 (#1) |
| | 4.39 (#4) | 1470 (#3) | 70 (#3) | 0.9 (#2) |
| | 4.38 (#5) | 1457 (#4) | 70 (#4) | 0.87 (#5) |
| | 4.32 (#6) | 1429 (#7) | 67 (#5) | 0.85 (#9) |
| | 4.3 (#7) | 1450 (#6) | 63 (#9) | 0.88 (#3) |
| | 4.29 (#8) | 1451 (#5) | 60 (#12) | 0.86 (#7) |
| | 4.29 (#9) | 1392 (#18) | 64 (#8) | 0.84 (#12) |
| | 4.21 (#10) | - | 64 (#7) | 0.85 (#10) |
| | 4.2 (#12) | 1397 (#16) | 57 (#14) | 0.84 (#13) |
| | 4.2 (#11) | 1478 (#2) | 65 (#6) | 0.87 (#6) |
| | 4.18 (#13) | 1352 (#23) | 61 (#11) | 0.81 (#22) |
| | 4.17 (#15) | 1402 (#15) | 55 (#16) | 0.76 (#28) |
| | 4.17 (#14) | 1408 (#14) | 51 (#21) | 0.84 (#14) |
| | 4.14 (#16) | 1418 (#9) | 59 (#13) | 0.86 (#8) |
| | 4.13 (#17) | 1425 (#8) | 56 (#15) | 0.83 (#16) |
| | 4.12 (#18) | 1395 (#17) | 52 (#18) | 0.85 (#11) |
| | 4.11 (#20) | 1414 (#12) | 52 (#19) | 0.84 (#15) |
| | 4.11 (#19) | 1416 (#10) | 50 (#23) | 0.82 (#18) |
| | 4.06 (#22) | 1334 (#27) | 47 (#25) | 0.81 (#23) |
| | 4.06 (#21) | 1339 (#26) | 51 (#22) | 0.77 (#27) |
| | 4.03 (#23) | 1367 (#22) | 54 (#17) | 0.82 (#19) |
| | 3.99 (#24) | 1345 (#24) | 61 (#10) | 0.82 (#20) |
| | 3.98 (#25) | 1374 (#20) | 45 (#26) | 0.83 (#17) |
| | 3.95 (#26) | 1378 (#19) | 40 (#28) | 0.81 (#24) |
| | 3.94 (#27) | 1415 (#11) | 38 (#29) | 0.81 (#25) |
| | 3.88 (#28) | - | 38 (#30) | 0.74 (#30) |
| | 3.86 (#29) | 1370 (#21) | 49 (#24) | 0.82 (#21) |
| | 3.81 (#30) | 1411 (#13) | 35 (#32) | 0.68 (#33) |
| | 3.78 (#32) | 1318 (#28) | 52 (#20) | 0.75 (#29) |
| | 3.78 (#31) | 1340 (#25) | 45 (#27) | 0.81 (#26) |
| | 3.57 (#33) | - | 28 (#34) | 0.64 (#34) |
| | 3.5 (#34) | - | 37 (#31) | 0.74 (#31) |
| | 3.47 (#35) | - | 32 (#33) | 0.73 (#32) |