
AutoBench Run 2 - April 2025

The second major AutoBench run, covering o4-mini, GPT-4.1-mini, Gemini 2.5 Pro Preview, Claude 3.7 Sonnet:thinking, and other models.

Past
Date: April 25, 2025
Version: 2025-04-25
Models: 24
New Models: 24

Run data

Times in seconds. The number in parentheses is the rank within that column (#1 = lowest time).

| Model | Average (All Topics) | Coding | Creative Writing | Current News | General Culture | Grammar | History | Logics | Math | Science | Technology |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 223.47s (#25) | 265.87s (#23) | 557.13s (#24) | 73.00s (#21) | 123.61s (#24) | 195.61s (#24) | 68.46s (#21) | 393.58s (#24) | 391.41s (#24) | 97.07s (#23) | 68.91s (#19) |
| | 94.45s (#21) | 157.34s (#20) | 41.92s (#15) | 48.86s (#17) | 54.35s (#17) | 55.23s (#16) | 43.90s (#16) | 205.29s (#23) | 236.74s (#23) | 51.17s (#20) | 49.75s (#16) |
| | 140.54s (#24) | 132.59s (#19) | 60.61s (#22) | 92.63s (#24) | 98.62s (#23) | 64.37s (#19) | 68.91s (#22) | 202.20s (#22) | 230.86s (#22) | 136.28s (#24) | 318.33s (#24) |
| | 96.77s (#22) | 157.72s (#21) | 54.36s (#21) | 82.25s (#23) | 77.68s (#22) | 69.55s (#21) | 160.87s (#24) | 136.52s (#21) | 98.81s (#16) | 29.76s (#15) | 100.24s (#23) |
| | 64.18s (#15) | 78.66s (#15) | 32.41s (#13) | 37.16s (#15) | 59.49s (#19) | 48.05s (#14) | 40.25s (#14) | 128.55s (#20) | 137.50s (#19) | 39.82s (#17) | 39.92s (#15) |
| | 69.79s (#17) | 80.37s (#17) | 50.31s (#18) | 69.86s (#20) | 44.77s (#16) | 81.45s (#23) | 56.90s (#18) | 90.91s (#19) | 87.52s (#13) | 45.89s (#19) | 89.96s (#21) |
| | 106.53s (#23) | 489.39s (#24) | 52.40s (#19) | 48.32s (#16) | 63.66s (#20) | 57.52s (#17) | 64.71s (#20) | 72.51s (#18) | 89.72s (#14) | 35.64s (#16) | 91.46s (#22) |
| | 52.30s (#14) | 57.74s (#12) | 39.50s (#14) | 24.57s (#11) | 39.62s (#14) | 48.97s (#15) | 33.24s (#13) | 70.85s (#17) | 164.19s (#20) | 25.11s (#13) | 19.22s (#10) |
| | 73.70s (#18) | 83.59s (#18) | 53.04s (#20) | 79.05s (#22) | 70.62s (#21) | 70.25s (#22) | 57.85s (#19) | 69.79s (#16) | 117.32s (#17) | 77.75s (#21) | 57.72s (#18) |
| | 66.70s (#16) | 77.04s (#14) | 72.72s (#23) | 55.77s (#19) | 55.36s (#18) | 69.05s (#20) | 48.73s (#17) | 68.49s (#15) | 121.78s (#18) | 41.30s (#18) | 56.74s (#17) |
| | 79.12s (#19) | 205.53s (#22) | 47.10s (#17) | 49.13s (#18) | 39.73s (#15) | 59.89s (#18) | 77.24s (#23) | 62.05s (#14) | 98.21s (#15) | 80.09s (#22) | 72.23s (#20) |
| | 48.74s (#13) | 65.57s (#13) | 29.35s (#12) | 29.17s (#12) | 33.06s (#13) | 35.17s (#13) | 29.16s (#12) | 44.70s (#13) | 165.37s (#21) | 28.79s (#14) | 27.03s (#13) |
| | 23.67s (#9) | 35.06s (#9) | 16.40s (#8) | 16.14s (#6) | 20.01s (#8) | 18.56s (#9) | 14.95s (#6) | 39.78s (#12) | 52.40s (#10) | 13.09s (#7) | 10.28s (#3) |
| | 29.19s (#10) | 39.52s (#11) | 15.25s (#7) | 20.58s (#9) | 32.46s (#12) | 25.08s (#12) | 28.67s (#11) | 36.87s (#11) | 52.40s (#9) | 19.65s (#10) | 21.41s (#11) |
| | 82.60s (#20) | 34.50s (#8) | 42.15s (#16) | 35.20s (#14) | 24.45s (#11) | 22.00s (#11) | 41.91s (#15) | 32.66s (#10) | 45.31s (#8) | 21.00s (#11) | 29.41s (#14) |
| | 32.86s (#12) | 34.50s (#8) | 42.15s (#16) | 35.20s (#14) | 24.45s (#11) | 22.00s (#11) | 41.91s (#15) | 32.66s (#10) | 45.31s (#8) | 21.00s (#11) | 29.41s (#14) |
| | 29.62s (#11) | 36.17s (#10) | 26.90s (#11) | 21.63s (#10) | 20.29s (#9) | 16.93s (#7) | 19.97s (#9) | 32.45s (#9) | 75.08s (#12) | 21.31s (#12) | 25.44s (#12) |
| | 23.32s (#8) | 25.85s (#5) | 12.72s (#6) | 13.91s (#5) | 17.66s (#6) | 17.06s (#8) | 17.54s (#8) | 28.11s (#8) | 72.73s (#11) | 13.08s (#6) | 14.55s (#7) |
| | 23.11s (#7) | 80.03s (#16) | 10.11s (#4) | 13.50s (#4) | 15.19s (#5) | 12.64s (#5) | 13.25s (#5) | 18.17s (#7) | 42.55s (#7) | 12.47s (#5) | 13.21s (#6) |
| | 21.75s (#6) | 31.65s (#7) | 18.10s (#9) | 16.75s (#7) | 19.07s (#7) | 20.07s (#10) | 22.09s (#10) | 17.86s (#6) | 37.33s (#6) | 17.63s (#9) | 16.96s (#8) |
| | 17.98s (#5) | 23.70s (#4) | 19.75s (#10) | 19.06s (#8) | 20.65s (#10) | 15.82s (#6) | 17.31s (#7) | 15.70s (#5) | 21.40s (#5) | 13.30s (#8) | 13.16s (#4) |
| | 13.82s (#4) | 26.40s (#6) | 9.28s (#3) | 10.96s (#3) | 11.67s (#4) | 9.58s (#4) | 12.00s (#4) | 14.52s (#4) | 20.09s (#4) | 10.46s (#4) | 13.20s (#5) |
| | 8.82s (#1) | 13.29s (#2) | 6.62s (#2) | 7.15s (#2) | 7.43s (#2) | 9.12s (#3) | 6.65s (#2) | 10.66s (#3) | 12.31s (#1) | 7.46s (#2) | 7.56s (#1) |
| | 9.93s (#2) | 15.72s (#3) | 11.25s (#5) | 6.86s (#1) | 10.76s (#3) | 7.67s (#2) | 8.17s (#3) | 9.15s (#2) | 14.68s (#2) | 6.84s (#1) | 8.20s (#2) |
| | 12.47s (#3) | 12.92s (#1) | 5.83s (#1) | 32.65s (#13) | 6.61s (#1) | 6.51s (#1) | 5.70s (#1) | 9.03s (#1) | 19.28s (#3) | 7.80s (#3) | 18.38s (#9) |
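The sketch below recomputes the Average (All Topics) column and its rank from the per-topic times. It is a minimal sketch that assumes the average is the unweighted mean of the ten topic columns and that a lower time earns a better rank. That assumption holds, to rounding, for the three rows used here: the rows with averages 8.82s (#1), 9.93s (#2) and 12.47s (#3). The row labels `row_a`, `row_b` and `row_c` are hypothetical placeholders because the model names are missing from the table.

```python
from statistics import mean

TOPICS = [
    "Coding", "Creative Writing", "Current News", "General Culture", "Grammar",
    "History", "Logics", "Math", "Science", "Technology",
]

# Per-topic times in seconds, copied from three rows of the run table.
# The row labels are hypothetical placeholders; the model names were not recovered.
rows = {
    "row_a": [13.29, 6.62, 7.15, 7.43, 9.12, 6.65, 10.66, 12.31, 7.46, 7.56],
    "row_b": [15.72, 11.25, 6.86, 10.76, 7.67, 8.17, 9.15, 14.68, 6.84, 8.20],
    "row_c": [12.92, 5.83, 32.65, 6.61, 6.51, 5.70, 9.03, 19.28, 7.80, 18.38],
}


def average_all_topics(times: list[float]) -> float:
    """Unweighted mean of the per-topic times (assumed definition of the column)."""
    if len(times) != len(TOPICS):
        raise ValueError(f"expected {len(TOPICS)} topic times, got {len(times)}")
    return mean(times)


def rank_by_time(values: dict[str, float]) -> dict[str, int]:
    """Rank 1 = lowest time; tied values share the better rank."""
    ordered = sorted(values.values())
    return {name: ordered.index(v) + 1 for name, v in values.items()}


averages = {name: average_all_topics(t) for name, t in rows.items()}
ranks = rank_by_time(averages)

for name in rows:
    print(f"{name}: {averages[name]:.2f}s (#{ranks[name]})")
```

For these three rows, the recomputed means fall within 0.01 s of the published averages. The ranks come out 1, 2 and 3, matching the table. Rounding in the published per-topic values means the last digit can differ slightly.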