AutoBench Run 5 - December 2025

The latest AutoBench run, covering GPT 5.2, Claude Opus 4.5, DeepSeek 3.2 Speciale, and more.

Latest
Date: December 16, 2025
Version: 2025-12-16
Models: 35
New Models: 13

Run data

| Model | Score | Avg Cost ($ Cents) | Avg Latency (sec) | P99 Latency (sec) | Iterations |
| --- | --- | --- | --- | --- | --- |
| | 4.2 (#11) | 0.32 (#11) | 317s (#35) | 811s (#34) | 283 |
| | 4.14 (#16) | 0.47 (#15) | 310s (#34) | 833s (#35) | 288 |
| | 4.48 (#1) | 81.88 (#35) | 261s (#33) | 784s (#33) | 303 |
| | 4.32 (#6) | 1.86 (#24) | 248s (#32) | 729s (#32) | 287 |
| | 4.38 (#5) | 10.80 (#32) | 227s (#31) | 627s (#30) | 310 |
| | 4.13 (#17) | 1.25 (#22) | 187s (#30) | 630s (#31) | 306 |
| | 4.2 (#12) | 8.12 (#31) | 180s (#29) | 562s (#29) | 293 |
| | 4.12 (#18) | 0.99 (#21) | 171s (#28) | 477s (#28) | 308 |
| | 4.3 (#7) | 11.39 (#33) | 170s (#27) | 477s (#27) | 307 |
| | 3.86 (#29) | 0.54 (#17) | 163s (#26) | 425s (#24) | 306 |
| | 4.39 (#4) | 17.26 (#34) | 144s (#25) | 373s (#22) | 313 |
| | 3.99 (#24) | 0.71 (#18) | 137s (#24) | 473s (#26) | 308 |
| | 4.43 (#2) | 7.36 (#30) | 130s (#23) | 434s (#25) | 312 |
| | 4.11 (#19) | 0.09 (#5) | 125s (#22) | 410s (#23) | 311 |
| | 4.17 (#15) | 3.79 (#26) | 111s (#21) | 317s (#19) | 312 |
| | 3.98 (#25) | 0.19 (#8) | 105s (#20) | 337s (#21) | 302 |
| | 4.06 (#21) | 0.34 (#13) | 100s (#19) | 269s (#17) | 309 |
| | 4.29 (#8) | 0.91 (#20) | 93s (#18) | 258s (#16) | 312 |
| | 3.94 (#27) | 0.51 (#16) | 90s (#17) | 198s (#10) | 307 |
| | 4.29 (#9) | 6.48 (#28) | 87s (#16) | 222s (#13) | 313 |
| | 4.11 (#20) | 0.33 (#12) | 83s (#15) | 329s (#20) | 312 |
| | 4.03 (#23) | 0.75 (#19) | 78s (#14) | 227s (#14) | 312 |
| | 4.41 (#3) | 6.85 (#29) | 76s (#12) | 186s (#9) | 312 |
| | 3.78 (#32) | 0.18 (#7) | 76s (#13) | 240s (#15) | 311 |
| | 4.18 (#13) | 0.11 (#6) | 75s (#11) | 292s (#18) | 292 |
| | 4.21 (#10) | 0.27 (#10) | 69s (#10) | 207s (#11) | 306 |
| | 3.5 (#34) | 0.08 (#4) | 67s (#9) | 212s (#12) | 311 |
| | 4.17 (#14) | 2.12 (#25) | 66s (#8) | 174s (#7) | 312 |
| | 4.06 (#22) | 3.94 (#27) | 61s (#7) | 132s (#3) | 277 |
| | 3.81 (#30) | 0.38 (#14) | 52s (#6) | 147s (#5) | 306 |
| | 3.47 (#35) | 1.30 (#23) | 52s (#5) | 135s (#4) | 312 |
| | 3.78 (#31) | 0.07 (#2) | 39s (#4) | 183s (#8) | 310 |
| | 3.57 (#33) | 0.05 (#1) | 31s (#3) | 154s (#6) | 306 |
| | 3.88 (#28) | 0.08 (#3) | 24s (#2) | 57s (#1) | 312 |
| | 3.95 (#26) | 0.21 (#9) | 20s (#1) | 69s (#2) | 313 |