AutoBench Run 5 - December 2025
Latest AutoBench run, covering GPT 5.2, Claude Opus 4.5, DeepSeek 3.2 Speciale, and more
Date: December 16, 2025
Version: 2025-12-16
Models: 35
New Models: 13
Run data
| Model | Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0.05 (#1) | 0.05 (#1) | 0.03 (#1) | 0.05 (#3) | 0.05 (#3) | 0.04 (#1) | 0.05 (#2) | 0.08 (#1) | 0.05 (#1) | 0.04 (#1) | 0.05 (#1) |
| | 0.07 (#2) | 0.08 (#2) | 0.04 (#2) | 0.05 (#2) | 0.05 (#2) | 0.05 (#3) | 0.05 (#3) | 0.14 (#3) | 0.12 (#5) | 0.05 (#2) | 0.05 (#3) |
| | 0.08 (#3) | 0.15 (#6) | 0.06 (#7) | 0.04 (#1) | 0.04 (#1) | 0.05 (#2) | 0.05 (#1) | 0.17 (#6) | 0.16 (#6) | 0.05 (#3) | 0.05 (#2) |
| | 0.08 (#4) | 0.11 (#4) | 0.04 (#4) | 0.09 (#5) | 0.08 (#5) | 0.08 (#6) | 0.08 (#5) | 0.10 (#2) | 0.08 (#2) | 0.08 (#5) | 0.09 (#5) |
| | 0.09 (#5) | 0.11 (#3) | 0.05 (#6) | 0.08 (#4) | 0.07 (#4) | 0.08 (#4) | 0.08 (#4) | 0.15 (#5) | 0.12 (#4) | 0.08 (#4) | 0.07 (#4) |
| | 0.11 (#6) | 0.12 (#5) | 0.05 (#5) | 0.28 (#12) | 0.08 (#6) | 0.08 (#5) | 0.09 (#6) | 0.15 (#4) | 0.10 (#3) | 0.09 (#6) | 0.11 (#6) |
| | 0.18 (#7) | 0.30 (#9) | 0.12 (#11) | 0.10 (#6) | 0.10 (#7) | 0.11 (#7) | 0.11 (#7) | 0.38 (#8) | 0.37 (#10) | 0.14 (#7) | 0.11 (#7) |
| | 0.19 (#8) | 0.18 (#7) | 0.04 (#3) | 0.16 (#9) | 0.12 (#9) | 0.21 (#10) | 0.15 (#10) | 0.36 (#7) | 0.33 (#9) | 0.17 (#8) | 0.17 (#10) |
| | 0.21 (#9) | 0.28 (#8) | 0.08 (#8) | 0.13 (#8) | 0.11 (#8) | 0.17 (#8) | 0.12 (#8) | 0.47 (#9) | 0.45 (#11) | 0.17 (#9) | 0.13 (#8) |
| | 0.27 (#10) | 0.41 (#13) | 0.12 (#10) | 0.13 (#7) | 0.12 (#10) | 0.21 (#9) | 0.13 (#9) | 0.75 (#14) | 0.54 (#14) | 0.22 (#10) | 0.16 (#9) |
| | 0.32 (#11) | 0.55 (#15) | 0.11 (#9) | 0.20 (#10) | 0.20 (#11) | 0.22 (#12) | 0.21 (#11) | 0.76 (#15) | 0.69 (#16) | 0.22 (#11) | 0.21 (#11) |
| | 0.33 (#12) | 0.40 (#12) | 0.20 (#15) | 0.37 (#16) | 0.33 (#15) | 0.22 (#11) | 0.37 (#16) | 0.51 (#11) | 0.31 (#7) | 0.29 (#12) | 0.34 (#15) |
| | 0.34 (#13) | 0.38 (#11) | 0.25 (#16) | 0.25 (#11) | 0.24 (#12) | 0.40 (#15) | 0.26 (#12) | 0.58 (#13) | 0.49 (#12) | 0.30 (#13) | 0.26 (#12) |
| | 0.38 (#14) | 0.34 (#10) | 0.17 (#13) | 0.47 (#17) | 0.42 (#17) | 0.33 (#13) | 0.42 (#18) | 0.49 (#10) | 0.33 (#8) | 0.35 (#16) | 0.41 (#17) |
| | 0.47 (#15) | 0.67 (#16) | 0.68 (#22) | 0.32 (#14) | 0.29 (#13) | 0.44 (#16) | 0.28 (#13) | 1.03 (#17) | 0.63 (#15) | 0.33 (#15) | 0.28 (#13) |
| | 0.51 (#16) | 0.47 (#14) | 0.20 (#14) | 0.59 (#20) | 0.55 (#20) | 0.49 (#18) | 0.55 (#20) | 0.54 (#12) | 0.51 (#13) | 0.54 (#18) | 0.57 (#18) |
| | 0.54 (#17) | 0.82 (#17) | 0.13 (#12) | 0.32 (#13) | 0.34 (#16) | 0.38 (#14) | 0.33 (#15) | 1.02 (#16) | 1.27 (#19) | 0.32 (#14) | 0.34 (#14) |
| | 0.71 (#18) | 1.21 (#20) | 0.55 (#19) | 0.32 (#15) | 0.29 (#14) | 0.46 (#17) | 0.29 (#14) | 1.79 (#21) | 1.39 (#20) | 0.53 (#17) | 0.35 (#16) |
| | 0.75 (#19) | 1.00 (#18) | 0.61 (#20) | 0.55 (#19) | 0.52 (#19) | 0.77 (#20) | 0.49 (#19) | 1.23 (#18) | 1.06 (#18) | 0.71 (#20) | 0.59 (#19) |
| | 0.91 (#20) | 1.10 (#19) | 0.67 (#21) | 0.81 (#21) | 0.69 (#21) | 0.86 (#21) | 0.95 (#21) | 1.38 (#19) | 0.99 (#17) | 0.85 (#21) | 0.86 (#22) |
| | 0.99 (#21) | 1.88 (#23) | 0.30 (#17) | 0.48 (#18) | 0.43 (#18) | 0.50 (#19) | 0.40 (#17) | 2.02 (#22) | 2.31 (#23) | 0.67 (#19) | 0.71 (#20) |
| | 1.25 (#22) | 1.40 (#21) | 0.46 (#18) | 0.89 (#22) | 1.06 (#23) | 1.13 (#22) | 1.13 (#23) | 2.19 (#23) | 2.27 (#22) | 1.01 (#22) | 0.84 (#21) |
| | 1.30 (#23) | 1.68 (#22) | 0.76 (#23) | 1.24 (#24) | 1.21 (#24) | 1.24 (#23) | 1.14 (#24) | 1.49 (#20) | 1.55 (#21) | 1.24 (#23) | 1.25 (#23) |
| | 1.86 (#24) | 2.40 (#24) | 1.38 (#25) | 1.22 (#23) | 1.03 (#22) | 1.51 (#24) | 1.05 (#22) | 3.77 (#24) | 3.80 (#25) | 1.29 (#24) | 1.32 (#24) |
| | 2.12 (#25) | 3.80 (#25) | 1.18 (#24) | 1.36 (#25) | 1.22 (#25) | 1.83 (#25) | 1.22 (#25) | 3.86 (#25) | 3.68 (#24) | 1.54 (#25) | 1.50 (#25) |
| | 3.79 (#26) | 4.80 (#26) | 3.24 (#28) | 3.11 (#26) | 2.64 (#26) | 3.07 (#26) | 2.66 (#26) | 6.25 (#27) | 5.93 (#27) | 2.86 (#26) | 3.35 (#27) |
| | 3.94 (#27) | 7.55 (#27) | 4.10 (#29) | 3.17 (#27) | 3.10 (#27) | 3.59 (#27) | 2.76 (#27) | 4.36 (#26) | 4.76 (#26) | 3.77 (#27) | 3.26 (#26) |
| | 6.48 (#28) | 8.47 (#28) | 3.17 (#26) | 4.51 (#30) | 4.22 (#31) | 4.99 (#28) | 4.27 (#29) | 13.24 (#28) | 11.27 (#30) | 5.59 (#30) | 4.71 (#29) |
| | 6.85 (#29) | 9.64 (#29) | 4.15 (#30) | 4.07 (#28) | 4.04 (#30) | 5.29 (#29) | 3.84 (#28) | 16.87 (#31) | 12.02 (#31) | 5.09 (#28) | 4.23 (#28) |
| | 7.36 (#30) | 13.04 (#31) | 4.91 (#31) | 5.83 (#31) | 3.99 (#28) | 6.30 (#30) | 4.54 (#31) | 14.88 (#29) | 10.13 (#28) | 5.42 (#29) | 5.41 (#31) |
| | 8.12 (#31) | 13.01 (#30) | 3.17 (#27) | 4.24 (#29) | 4.01 (#29) | 6.73 (#31) | 4.29 (#30) | 19.90 (#33) | 18.17 (#33) | 5.78 (#31) | 5.27 (#30) |
| | 10.80 (#32) | 13.88 (#32) | 7.68 (#32) | 9.72 (#32) | 10.77 (#33) | 9.56 (#33) | 9.24 (#33) | 16.25 (#30) | 11.15 (#29) | 8.27 (#32) | 12.52 (#33) |
| | 11.39 (#33) | 14.97 (#33) | 7.72 (#33) | 10.38 (#33) | 9.33 (#32) | 7.58 (#32) | 8.19 (#32) | 19.31 (#32) | 17.32 (#32) | 8.59 (#33) | 10.12 (#32) |
| | 17.26 (#34) | 29.23 (#34) | 14.10 (#34) | 12.83 (#34) | 13.55 (#34) | 11.87 (#34) | 13.53 (#34) | 25.80 (#34) | 22.58 (#34) | 13.63 (#34) | 15.80 (#34) |
| | 81.88 (#35) | 106.99 (#35) | 59.09 (#35) | 69.36 (#35) | 54.75 (#35) | 80.45 (#35) | 59.67 (#35) | 150.09 (#35) | 125.46 (#35) | 57.32 (#35) | 63.75 (#35) |