
AutoBench Agentic Run 1 - April 2026

The first AutoBench run to measure agentic performance of top LLMs

Past
Date: April 16, 2026
Version: 2026-04-16
Models: 31
New Models: 22

Run data

| Model | Score | Avg Cost ($ Cents) | Avg Latency (sec) | P99 Latency (sec) | Iterations |
|---|---|---|---|---|---|
| | 2.91 (#12) | 2.03 (#29) | 87s (#28) | 241s (#26) | 104 |
| | 2.78 (#19) | 0.43 (#22) | 93s (#29) | 262s (#29) | 113 |
| | 3.02 (#5) | 5.82 (#31) | 129s (#31) | 329s (#31) | 132 |
| | 2.70 (#22) | 0.02 (#4) | 52s (#21) | 58s (#8) | 169 |
| | 2.82 (#18) | 0.11 (#14) | 23s (#8) | 135s (#17) | 179 |
| | 2.66 (#25) | 1.56 (#27) | 57s (#24) | 139s (#18) | 181 |
| | 2.92 (#8) | 0.13 (#16) | 55s (#23) | 174s (#23) | 182 |
| | 2.70 (#23) | 0.07 (#9) | 72s (#26) | 245s (#28) | 183 |
| | 2.53 (#30) | 0.02 (#3) | 14s (#6) | 114s (#16) | 186 |
| | 2.92 (#11) | 1.54 (#26) | 37s (#13) | 98s (#14) | 187 |
| | 2.63 (#27) | 0.08 (#10) | 103s (#30) | 279s (#30) | 189 |
| | 2.72 (#21) | 0.05 (#7) | 85s (#27) | 241s (#27) | 192 |
| | 2.90 (#13) | 0.09 (#11) | 28s (#11) | 68s (#10) | 193 |
| | 2.62 (#28) | 0.10 (#12) | 9s (#1) | 22s (#1) | 193 |
| | 2.92 (#9) | 0.14 (#17) | 44s (#17) | 44s (#6) | 194 |
| | 2.69 (#24) | 0.05 (#6) | 11s (#2) | 41s (#5) | 194 |
| | 2.92 (#10) | 0.86 (#24) | 44s (#18) | 152s (#20) | 195 |
| | 2.27 (#31) | 0.03 (#5) | 41s (#15) | 76s (#11) | 195 |
| | 3.13 (#2) | 1.95 (#28) | 46s (#19) | 147s (#19) | 197 |
| | 3.06 (#4) | 0.50 (#23) | 66s (#25) | 177s (#24) | 197 |
| | 2.65 (#26) | 0.01 (#1) | 43s (#16) | 166s (#22) | 197 |
| | 2.99 (#6) | 0.20 (#19) | 48s (#20) | 109s (#15) | 197 |
| | 2.84 (#15) | 0.12 (#15) | 36s (#12) | 96s (#12) | 197 |
| | 3.17 (#1) | 2.56 (#30) | 38s (#14) | 97s (#13) | 198 |
| | 2.55 (#29) | 0.06 (#8) | 54s (#22) | 156s (#21) | 198 |
| | 2.89 (#14) | 0.28 (#20) | 13s (#3) | 26s (#2) | 198 |
| | 3.10 (#3) | 1.33 (#25) | 26s (#10) | 198s (#25) | 198 |
| | 2.99 (#7) | 0.32 (#21) | 26s (#9) | 54s (#7) | 198 |
| | 2.84 (#16) | 0.14 (#18) | 13s (#4) | 36s (#4) | 198 |
| | 2.83 (#17) | 0.11 (#13) | 14s (#5) | 31s (#3) | 198 |
| | 2.76 (#20) | 0.02 (#2) | 18s (#7) | 63s (#9) | 198 |