
AutoBench Run 5 - December 2025

Latest AutoBench run, covering GPT-5.2, Claude Opus 4.5, DeepSeek-V3.2-Speciale, and more.

Date: December 16, 2025
Version: 2025-12-16
Models: 35
New models: 13

Run data

Average (All Topics)  Coding        Creative Writing  Current News  General Culture  Grammar      History      Logics        Math          Science      Technology
0.05 (#1)             0.05 (#1)     0.03 (#1)         0.05 (#3)     0.05 (#3)        0.04 (#1)    0.05 (#2)    0.08 (#1)     0.05 (#1)     0.04 (#1)    0.05 (#1)
0.07 (#2)             0.08 (#2)     0.04 (#2)         0.05 (#2)     0.05 (#2)        0.05 (#3)    0.05 (#3)    0.14 (#3)     0.12 (#5)     0.05 (#2)    0.05 (#3)
0.08 (#3)             0.15 (#6)     0.06 (#7)         0.04 (#1)     0.04 (#1)        0.05 (#2)    0.05 (#1)    0.17 (#6)     0.16 (#6)     0.05 (#3)    0.05 (#2)
0.08 (#4)             0.11 (#4)     0.04 (#4)         0.09 (#5)     0.08 (#5)        0.08 (#6)    0.08 (#5)    0.10 (#2)     0.08 (#2)     0.08 (#5)    0.09 (#5)
0.09 (#5)             0.11 (#3)     0.05 (#6)         0.08 (#4)     0.07 (#4)        0.08 (#4)    0.08 (#4)    0.15 (#5)     0.12 (#4)     0.08 (#4)    0.07 (#4)
0.11 (#6)             0.12 (#5)     0.05 (#5)         0.28 (#12)    0.08 (#6)        0.08 (#5)    0.09 (#6)    0.15 (#4)     0.10 (#3)     0.09 (#6)    0.11 (#6)
0.18 (#7)             0.30 (#9)     0.12 (#11)        0.10 (#6)     0.10 (#7)        0.11 (#7)    0.11 (#7)    0.38 (#8)     0.37 (#10)    0.14 (#7)    0.11 (#7)
0.19 (#8)             0.18 (#7)     0.04 (#3)         0.16 (#9)     0.12 (#9)        0.21 (#10)   0.15 (#10)   0.36 (#7)     0.33 (#9)     0.17 (#8)    0.17 (#10)
0.21 (#9)             0.28 (#8)     0.08 (#8)         0.13 (#8)     0.11 (#8)        0.17 (#8)    0.12 (#8)    0.47 (#9)     0.45 (#11)    0.17 (#9)    0.13 (#8)
0.27 (#10)            0.41 (#13)    0.12 (#10)        0.13 (#7)     0.12 (#10)       0.21 (#9)    0.13 (#9)    0.75 (#14)    0.54 (#14)    0.22 (#10)   0.16 (#9)
0.32 (#11)            0.55 (#15)    0.11 (#9)         0.20 (#10)    0.20 (#11)       0.22 (#12)   0.21 (#11)   0.76 (#15)    0.69 (#16)    0.22 (#11)   0.21 (#11)
0.33 (#12)            0.40 (#12)    0.20 (#15)        0.37 (#16)    0.33 (#15)       0.22 (#11)   0.37 (#16)   0.51 (#11)    0.31 (#7)     0.29 (#12)   0.34 (#15)
0.34 (#13)            0.38 (#11)    0.25 (#16)        0.25 (#11)    0.24 (#12)       0.40 (#15)   0.26 (#12)   0.58 (#13)    0.49 (#12)    0.30 (#13)   0.26 (#12)
0.38 (#14)            0.34 (#10)    0.17 (#13)        0.47 (#17)    0.42 (#17)       0.33 (#13)   0.42 (#18)   0.49 (#10)    0.33 (#8)     0.35 (#16)   0.41 (#17)
0.47 (#15)            0.67 (#16)    0.68 (#22)        0.32 (#14)    0.29 (#13)       0.44 (#16)   0.28 (#13)   1.03 (#17)    0.63 (#15)    0.33 (#15)   0.28 (#13)
0.51 (#16)            0.47 (#14)    0.20 (#14)        0.59 (#20)    0.55 (#20)       0.49 (#18)   0.55 (#20)   0.54 (#12)    0.51 (#13)    0.54 (#18)   0.57 (#18)
0.54 (#17)            0.82 (#17)    0.13 (#12)        0.32 (#13)    0.34 (#16)       0.38 (#14)   0.33 (#15)   1.02 (#16)    1.27 (#19)    0.32 (#14)   0.34 (#14)
0.71 (#18)            1.21 (#20)    0.55 (#19)        0.32 (#15)    0.29 (#14)       0.46 (#17)   0.29 (#14)   1.79 (#21)    1.39 (#20)    0.53 (#17)   0.35 (#16)
0.75 (#19)            1.00 (#18)    0.61 (#20)        0.55 (#19)    0.52 (#19)       0.77 (#20)   0.49 (#19)   1.23 (#18)    1.06 (#18)    0.71 (#20)   0.59 (#19)
0.91 (#20)            1.10 (#19)    0.67 (#21)        0.81 (#21)    0.69 (#21)       0.86 (#21)   0.95 (#21)   1.38 (#19)    0.99 (#17)    0.85 (#21)   0.86 (#22)
0.99 (#21)            1.88 (#23)    0.30 (#17)        0.48 (#18)    0.43 (#18)       0.50 (#19)   0.40 (#17)   2.02 (#22)    2.31 (#23)    0.67 (#19)   0.71 (#20)
1.25 (#22)            1.40 (#21)    0.46 (#18)        0.89 (#22)    1.06 (#23)       1.13 (#22)   1.13 (#23)   2.19 (#23)    2.27 (#22)    1.01 (#22)   0.84 (#21)
1.30 (#23)            1.68 (#22)    0.76 (#23)        1.24 (#24)    1.21 (#24)       1.24 (#23)   1.14 (#24)   1.49 (#20)    1.55 (#21)    1.24 (#23)   1.25 (#23)
1.86 (#24)            2.40 (#24)    1.38 (#25)        1.22 (#23)    1.03 (#22)       1.51 (#24)   1.05 (#22)   3.77 (#24)    3.80 (#25)    1.29 (#24)   1.32 (#24)
2.12 (#25)            3.80 (#25)    1.18 (#24)        1.36 (#25)    1.22 (#25)       1.83 (#25)   1.22 (#25)   3.86 (#25)    3.68 (#24)    1.54 (#25)   1.50 (#25)
3.79 (#26)            4.80 (#26)    3.24 (#28)        3.11 (#26)    2.64 (#26)       3.07 (#26)   2.66 (#26)   6.25 (#27)    5.93 (#27)    2.86 (#26)   3.35 (#27)
3.94 (#27)            7.55 (#27)    4.10 (#29)        3.17 (#27)    3.10 (#27)       3.59 (#27)   2.76 (#27)   4.36 (#26)    4.76 (#26)    3.77 (#27)   3.26 (#26)
6.48 (#28)            8.47 (#28)    3.17 (#26)        4.51 (#30)    4.22 (#31)       4.99 (#28)   4.27 (#29)   13.24 (#28)   11.27 (#30)   5.59 (#30)   4.71 (#29)
6.85 (#29)            9.64 (#29)    4.15 (#30)        4.07 (#28)    4.04 (#30)       5.29 (#29)   3.84 (#28)   16.87 (#31)   12.02 (#31)   5.09 (#28)   4.23 (#28)
7.36 (#30)            13.04 (#31)   4.91 (#31)        5.83 (#31)    3.99 (#28)       6.30 (#30)   4.54 (#31)   14.88 (#29)   10.13 (#28)   5.42 (#29)   5.41 (#31)
8.12 (#31)            13.01 (#30)   3.17 (#27)        4.24 (#29)    4.01 (#29)       6.73 (#31)   4.29 (#30)   19.90 (#33)   18.17 (#33)   5.78 (#31)   5.27 (#30)
10.80 (#32)           13.88 (#32)   7.68 (#32)        9.72 (#32)    10.77 (#33)      9.56 (#33)   9.24 (#33)   16.25 (#30)   11.15 (#29)   8.27 (#32)   12.52 (#33)
11.39 (#33)           14.97 (#33)   7.72 (#33)        10.38 (#33)   9.33 (#32)       7.58 (#32)   8.19 (#32)   19.31 (#32)   17.32 (#32)   8.59 (#33)   10.12 (#32)
17.26 (#34)           29.23 (#34)   14.10 (#34)       12.83 (#34)   13.55 (#34)      11.87 (#34)  13.53 (#34)  25.80 (#34)   22.58 (#34)   13.63 (#34)  15.80 (#34)
81.88 (#35)           106.99 (#35)  59.09 (#35)       69.36 (#35)   54.75 (#35)      80.45 (#35)  59.67 (#35)  150.09 (#35)  125.46 (#35)  57.32 (#35)  63.75 (#35)
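Each cell pairs a value with the model's rank within that column, and in every column rank #1 goes to the lowest value. This page does not state what the values measure, so the following is only a minimal Python sketch of that ranking convention, assuming ranks are assigned in ascending order of value with ties broken by table position. The helpers `parse_cell` and `ordinal_ranks` are hypothetical illustrations, not part of AutoBench.

```python
from __future__ import annotations

import re

# A table cell looks like "<value> (#<rank>)", e.g. "0.05 (#1)" or "150.09 (#35)".
CELL_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*\(#(\d+)\)\s*$")


def parse_cell(cell: str) -> tuple[float, int]:
    """Split a cell such as '0.05 (#1)' into (value, rank)."""
    match = CELL_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    return float(match.group(1)), int(match.group(2))


def ordinal_ranks(values: list[float]) -> list[int]:
    """Assign ranks in ascending order of value (#1 = lowest).

    sorted() is stable, so equal values keep their table order; this is an
    assumed tie-break, not necessarily the one AutoBench uses.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


if __name__ == "__main__":
    # First five cells of the "Average (All Topics)" column.
    cells = ["0.05 (#1)", "0.07 (#2)", "0.08 (#3)", "0.08 (#4)", "0.09 (#5)"]
    values = [parse_cell(c)[0] for c in cells]
    published = [parse_cell(c)[1] for c in cells]
    print(ordinal_ranks(values))  # [1, 2, 3, 4, 5]
    print(ordinal_ranks(values) == published)  # True
```

The position-based tie-break is a simplification. In the Coding column, two rows both display 0.11 but are ranked #4 and #3 in table order, the reverse of what this sketch would assign, which suggests the published ranks were computed on more precise values than the table shows.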