AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents
