AutoBench Run 5 - December 2025
Latest AutoBench run, with models including GPT 5.2, Claude Opus 4.5, DeepSeek 3.2 Speciale, and more.
Date: December 16, 2025
Version: 2025-12-16
Models: 35
New Models: 13
Run data
| Model | Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 4.48 (#1) | 4.37 (#1) | 4.49 (#2) | 4.56 (#1) | 4.69 (#2) | 4.52 (#1) | 4.59 (#2) | 4.32 (#1) | 4.29 (#2) | 4.39 (#9) | 4.55 (#3) |
| | 4.43 (#2) | 4.3 (#3) | 4.44 (#3) | 4.54 (#2) | 4.4 (#20) | 4.37 (#6) | 4.6 (#1) | 4.18 (#2) | 4.26 (#3) | 4.55 (#2) | 4.59 (#1) |
| | 4.41 (#3) | 4.23 (#5) | 4.35 (#7) | 4.51 (#4) | 4.7 (#1) | 4.2 (#14) | 4.56 (#3) | 4.12 (#5) | 4.3 (#1) | 4.51 (#3) | 4.48 (#10) |
| | 4.39 (#4) | 4.33 (#2) | 4.37 (#5) | 4.17 (#23) | 4.65 (#4) | 4.48 (#3) | 4.5 (#6) | 4.14 (#4) | 4.21 (#5) | 4.57 (#1) | 4.52 (#6) |
| | 4.38 (#5) | 4.19 (#6) | 4.34 (#8) | 4.46 (#6) | 4.68 (#3) | 4.5 (#2) | 4.44 (#11) | 4.08 (#6) | 4.13 (#6) | 4.49 (#4) | 4.49 (#9) |
| | 4.32 (#6) | 4.03 (#11) | 4.37 (#6) | 4.46 (#7) | 4.6 (#6) | 4.28 (#10) | 4.54 (#4) | 3.81 (#13) | 4.13 (#7) | 4.23 (#20) | 4.56 (#2) |
| | 4.3 (#7) | 4.12 (#8) | 4.5 (#1) | 4.48 (#5) | 4.57 (#7) | 4.44 (#4) | 4.42 (#12) | 3.57 (#19) | 3.87 (#16) | 4.48 (#5) | 4.55 (#4) |
| | 4.29 (#8) | 4.02 (#12) | 4.15 (#15) | 4.37 (#12) | 4.43 (#16) | 4.37 (#5) | 4.4 (#13) | 4.02 (#7) | 4.24 (#4) | 4.45 (#8) | 4.38 (#17) |
| | 4.29 (#9) | 4.18 (#7) | 4.39 (#4) | 4.45 (#8) | 4.48 (#11) | 4.29 (#9) | 4.29 (#23) | 3.9 (#8) | 4.05 (#8) | 4.32 (#11) | 4.52 (#7) |
| | 4.21 (#10) | 3.88 (#19) | 4.2 (#12) | 4.33 (#14) | 4.43 (#17) | 4.18 (#15) | 4.38 (#15) | 3.71 (#16) | 3.89 (#14) | 4.48 (#6) | 4.39 (#16) |
| | 4.2 (#11) | 4.11 (#9) | 4.2 (#11) | 4.23 (#18) | 4.26 (#26) | 4.2 (#13) | 4.34 (#21) | 4.18 (#3) | 3.81 (#18) | 4.3 (#13) | 4.33 (#19) |
| | 4.2 (#12) | 3.7 (#24) | 4.22 (#10) | 4.19 (#21) | 4.57 (#8) | 4.24 (#11) | 4.44 (#10) | 3.88 (#9) | 3.78 (#20) | 4.26 (#18) | 4.46 (#11) |
| | 4.18 (#13) | 4.26 (#4) | 4.13 (#17) | 4.05 (#29) | 4.21 (#29) | 4.04 (#24) | 4.34 (#20) | 3.81 (#12) | 3.89 (#13) | 4.45 (#7) | 4.51 (#8) |
| | 4.17 (#14) | 4.01 (#13) | 4.05 (#20) | 4.37 (#13) | 4.21 (#28) | 4.31 (#7) | 4.29 (#25) | 3.83 (#11) | 4.02 (#9) | 4.12 (#27) | 4.44 (#13) |
| | 4.17 (#15) | 3.9 (#18) | 4.18 (#13) | 4.42 (#10) | 4.52 (#9) | 4.3 (#8) | 4.48 (#7) | 3.74 (#14) | 3.53 (#25) | 4.16 (#25) | 4.54 (#5) |
| | 4.14 (#16) | 4.1 (#10) | 3.72 (#30) | 4.17 (#22) | 4.29 (#24) | 4.24 (#12) | 4.38 (#17) | 3.56 (#20) | 3.96 (#11) | 4.34 (#10) | 4.31 (#21) |
| | 4.13 (#17) | 3.95 (#15) | 4.15 (#14) | 4.11 (#27) | 4.47 (#12) | 4.11 (#18) | 4.29 (#24) | 3.45 (#23) | 4.01 (#10) | 4.29 (#14) | 4.33 (#20) |
| | 4.12 (#18) | 3.71 (#22) | 4.07 (#19) | 4.37 (#11) | 4.44 (#15) | 4.11 (#19) | 4.45 (#9) | 3.55 (#21) | 3.65 (#23) | 4.31 (#12) | 4.43 (#14) |
| | 4.11 (#19) | 3.74 (#20) | 4 (#21) | 4.51 (#3) | 4.44 (#14) | 4.14 (#16) | 4.32 (#22) | 3.48 (#22) | 3.72 (#21) | 4.2 (#22) | 4.39 (#15) |
| | 4.11 (#20) | 3.61 (#26) | 4.26 (#9) | 4.45 (#9) | 4.65 (#5) | 4.07 (#22) | 4.51 (#5) | 3.4 (#25) | 3.4 (#27) | 4.28 (#17) | 4.44 (#12) |
| | 4.06 (#21) | 3.98 (#14) | 3.85 (#25) | 4.28 (#16) | 4.29 (#25) | 3.91 (#28) | 4.18 (#30) | 3.61 (#18) | 3.87 (#15) | 4.23 (#21) | 4.23 (#28) |
| | 4.06 (#22) | 3.74 (#21) | 3.88 (#24) | 4.25 (#17) | 4.26 (#27) | 4.08 (#21) | 4.38 (#16) | 3.04 (#32) | 3.64 (#24) | 4.28 (#16) | 4.34 (#18) |
| | 4.03 (#23) | 3.7 (#23) | 3.74 (#29) | 4.05 (#31) | 4.4 (#19) | 4.13 (#17) | 4.22 (#28) | 3.73 (#15) | 3.79 (#19) | 4.24 (#19) | 4.2 (#30) |
| | 3.99 (#24) | 3.46 (#28) | 3.65 (#31) | 4.33 (#15) | 4.3 (#23) | 4.01 (#26) | 4.2 (#29) | 3.34 (#26) | 3.94 (#12) | 4.14 (#26) | 4.29 (#24) |
| | 3.98 (#25) | 3.65 (#25) | 3.84 (#27) | 4.05 (#30) | 4.32 (#22) | 3.96 (#27) | 4.26 (#26) | 3.27 (#27) | 3.7 (#22) | 4.29 (#15) | 4.27 (#25) |
| | 3.95 (#26) | 3.92 (#17) | 4.09 (#18) | 4.2 (#20) | 4.15 (#31) | 3.78 (#29) | 4.37 (#18) | 3.27 (#28) | 3.33 (#28) | 4.05 (#28) | 4.29 (#23) |
| | 3.94 (#27) | 3.54 (#27) | 3.89 (#23) | 4.13 (#25) | 4.45 (#13) | 4.11 (#20) | 4.39 (#14) | 3.41 (#24) | 3.11 (#29) | 4.01 (#29) | 4.3 (#22) |
| | 3.88 (#28) | 3.27 (#31) | 4.14 (#16) | 4.15 (#24) | 4.49 (#10) | 4.04 (#23) | 4.35 (#19) | 3.17 (#30) | 2.84 (#34) | 4.16 (#24) | 4.22 (#29) |
| | 3.86 (#29) | 3.4 (#29) | 3.63 (#32) | 4.04 (#32) | 4.4 (#18) | 3.61 (#31) | 4.16 (#31) | 3.69 (#17) | 3.47 (#26) | 3.9 (#31) | 4.25 (#26) |
| | 3.81 (#30) | 3.02 (#33) | 3.94 (#22) | 4.21 (#19) | 4.07 (#32) | 4.03 (#25) | 4.47 (#8) | 3.2 (#29) | 2.89 (#32) | 3.98 (#30) | 4.25 (#27) |
| | 3.78 (#31) | 3.29 (#30) | 3.85 (#26) | 4.13 (#26) | 4.35 (#21) | 3.62 (#30) | 4.26 (#27) | 3.11 (#31) | 3.1 (#30) | 3.9 (#32) | 4.14 (#31) |
| | 3.78 (#32) | 3.93 (#16) | 3.82 (#28) | 3.41 (#35) | 3.78 (#35) | 3.52 (#32) | 3.4 (#35) | 3.87 (#10) | 3.82 (#17) | 4.18 (#23) | 4.07 (#33) |
| | 3.57 (#33) | 2.95 (#34) | 3.57 (#33) | 4.1 (#28) | 4.16 (#30) | 3.24 (#34) | 3.97 (#33) | 2.94 (#33) | 2.78 (#35) | 3.78 (#35) | 4.1 (#32) |
| | 3.5 (#34) | 3.07 (#32) | 3 (#35) | 3.92 (#33) | 4.04 (#33) | 2.97 (#35) | 3.92 (#34) | 2.86 (#34) | 3.07 (#31) | 3.78 (#34) | 4.01 (#34) |
| | 3.47 (#35) | 2.84 (#35) | 3.55 (#34) | 3.73 (#34) | 3.96 (#34) | 3.34 (#33) | 3.98 (#32) | 2.55 (#35) | 2.85 (#33) | 3.82 (#33) | 3.98 (#35) |