
AutoBench Agentic Run 1 - April 2026

The first AutoBench run to measure agentic performance of top LLMs

Latest
Date: April 19, 2026
Version: 2026-04-19
Models: 32
New Models: 1

Run data

Model
AutoBench   AAI Index  Terminal-bench  GDPval-AA  Tau2-Bench Telecom
2.27 (#32)  7 (#31)    7 (#31)         0 (#31)    18 (#31)
2.61 (#31)  32 (#25)   14 (#28)        26 (#21)   44 (#26)
2.62 (#30)  22 (#29)   16 (#27)        18 (#26)   41 (#27)
2.64 (#29)  53 (#14)   36 (#13)        35 (#11)   91 (#12)
2.65 (#28)  28 (#26)   11 (#30)        8 (#29)    60 (#24)
2.66 (#27)  37 (#24)   17 (#26)        17 (#28)   73 (#20)
2.69 (#26)  26 (#28)   17 (#25)        18 (#27)   25 (#30)
2.71 (#25)  19 (#30)   14 (#29)        4 (#30)    41 (#28)
2.76 (#24)  38 (#23)   24 (#23)        22 (#23)   66 (#22)
2.78 (#23)  48 (#18)   42 (#8)         34 (#15)   76 (#19)
2.79 (#22)  56 (#11)   35 (#16)        34 (#16)   95 (#6)
2.79 (#21)  41 (#20)   36 (#14)        31 (#17)   60 (#23)
2.8 (#20)   40 (#22)   29 (#19)        25 (#22)   68 (#21)
2.82 (#19)  26 (#27)   24 (#22)        21 (#24)   31 (#29)
2.82 (#18)  44 (#19)   27 (#21)        21 (#25)   94 (#8)
2.84 (#17)  53 (#15)   31 (#18)        31 (#18)   89 (#13)
2.84 (#16)  49 (#17)   24 (#24)        27 (#19)   93 (#9)
2.91 (#15)  59 (#9)    52 (#4)         46 (#6)    83 (#16)
2.92 (#14)  55 (#12)   32 (#17)        35 (#13)   96 (#3)
2.98 (#13)  50 (#16)   39 (#10)        35 (#12)   80 (#17)
2.99 (#12)  40 (#21)   27 (#20)        34 (#14)   55 (#25)
3 (#11)     54 (#13)   38 (#12)        27 (#20)   93 (#10)
3.01 (#10)  61 (#7)    39 (#11)        51 (#5)    85 (#15)
3.02 (#9)   59 (#10)   35 (#15)        39 (#10)   96 (#4)
3.07 (#8)   62 (#6)    44 (#6)         43 (#8)    95 (#7)
3.1 (#7)    63 (#5)    41 (#9)         46 (#7)    95 (#5)
3.13 (#6)   68 (#2)    58 (#1)         59 (#1)    87 (#14)
3.15 (#5)   67 (#3)    43 (#7)         52 (#4)    98 (#1)
3.16 (#4)   63 (#4)    53 (#3)         58 (#2)    76 (#18)
3.21 (#3)   59 (#8)    54 (#2)         41 (#9)    96 (#2)
3.24 (#2)   68 (#1)    46 (#5)         56 (#3)    92 (#11)
3.3 (#1)    -          -               -          -
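
Each cell in the run data pairs a score with its rank in that column, written as "score (#rank)". For readers who want to work with the numbers, the following is a minimal Python sketch of one way to split a row into (score, rank) pairs. The names COLUMNS, CELL and parse_row are hypothetical helpers introduced here for illustration and are not part of any AutoBench tooling. The sketch also assumes that missing values, shown as "-", only occur at the end of a row, as in the last row of the table.

```python
import re

# Column order as listed in the table header.
COLUMNS = ["AutoBench", "AAI Index", "Terminal-bench", "GDPval-AA", "Tau2-Bench Telecom"]

# One cell looks like "2.27 (#32)": a score followed by its rank in that column.
CELL = re.compile(r"(\d+(?:\.\d+)?)\s*\(#(\d+)\)")

def parse_row(line):
    """Return {column: (score, rank)}; columns without a value map to None."""
    cells = [(float(score), int(rank)) for score, rank in CELL.findall(line)]
    # Assumption: missing values ("-") are trailing, so padding on the right is safe.
    cells += [None] * (len(COLUMNS) - len(cells))
    return dict(zip(COLUMNS, cells))

row = parse_row("3.24 (#2)  68 (#1)  46 (#5)  56 (#3)  92 (#11)")
print(row["AutoBench"])  # (3.24, 2)
```

Because the pattern matches on the "(#rank)" suffix rather than on whitespace, it should also handle rows where the cells run together without separators.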