
AutoBench Agentic Run 1 - April 2026

The first AutoBench run to measure agentic performance of top LLMs

Latest
Date: April 19, 2026
Version: 2026-04-19
Models: 32
New Models: 1

Run data

| Model | Score | Avg Cost ($ Cents) | Avg Latency (sec) | P99 Latency (sec) | Iterations |
|---|---|---|---|---|---|
| | 3.13 (#6) | 6.33 (#32) | 131s (#32) | 326s (#32) | 145 |
| | 2.71 (#25) | 0.08 (#10) | 106s (#31) | 285s (#31) | 190 |
| | 2.78 (#23) | 0.43 (#22) | 93s (#30) | 262s (#30) | 113 |
| | 2.91 (#15) | 2.03 (#29) | 87s (#29) | 241s (#28) | 104 |
| | 2.79 (#22) | 0.05 (#7) | 75s (#28) | 241s (#27) | 184 |
| | 2.80 (#20) | 0.06 (#9) | 69s (#27) | 245s (#29) | 187 |
| | 3.15 (#5) | 0.51 (#23) | 60s (#26) | 183s (#26) | 197 |
| | 2.66 (#27) | 1.56 (#27) | 57s (#25) | 139s (#20) | 181 |
| | 3.02 (#9) | 0.13 (#16) | 54s (#24) | 178s (#25) | 187 |
| | 2.64 (#29) | 0.06 (#8) | 54s (#23) | 129s (#18) | 197 |
| | 3.16 (#4) | 1.98 (#28) | 47s (#22) | 149s (#21) | 193 |
| | 3.07 (#8) | 0.19 (#19) | 46s (#21) | 107s (#16) | 198 |
| | 2.79 (#21) | 0.02 (#4) | 45s (#20) | 174s (#24) | 183 |
| | 2.92 (#14) | 0.14 (#17) | 44s (#19) | 135s (#19) | 194 |
| | 2.99 (#12) | 0.84 (#24) | 43s (#18) | 151s (#22) | 193 |
| | 2.65 (#28) | 0.01 (#1) | 43s (#17) | 166s (#23) | 197 |
| | 2.27 (#32) | 0.03 (#5) | 41s (#16) | 76s (#12) | 195 |
| | 3.24 (#2) | 2.58 (#30) | 38s (#15) | 98s (#15) | 198 |
| | 2.84 (#16) | 0.12 (#15) | 36s (#14) | 96s (#14) | 197 |
| | 3.00 (#11) | 1.52 (#26) | 33s (#13) | 79s (#13) | 189 |
| | 3.01 (#10) | 0.10 (#11) | 27s (#12) | 58s (#9) | 193 |
| | 3.21 (#3) | 1.33 (#25) | 26s (#11) | 58s (#10) | 198 |
| | 3.10 (#7) | 0.33 (#21) | 26s (#10) | 55s (#8) | 199 |
| | 2.82 (#19) | 0.12 (#14) | 23s (#9) | 114s (#17) | 179 |
| | 3.30 (#1) | 2.72 (#31) | 21s (#8) | 47s (#7) | 187 |
| | 2.76 (#24) | 0.02 (#2) | 18s (#7) | 63s (#11) | 198 |
| | 2.82 (#18) | 0.11 (#13) | 14s (#6) | 31s (#3) | 198 |
| | 2.84 (#17) | 0.14 (#18) | 13s (#5) | 36s (#4) | 198 |
| | 2.98 (#13) | 0.28 (#20) | 13s (#4) | 23s (#2) | 198 |
| | 2.61 (#31) | 0.02 (#3) | 12s (#3) | 41s (#5) | 192 |
| | 2.69 (#26) | 0.05 (#6) | 11s (#2) | 41s (#6) | 194 |
| | 2.62 (#30) | 0.10 (#12) | 9s (#1) | 22s (#1) | 193 |