
ITBench-AA: frontier models score below 50% on enterprise IT agentic tasks
Artificial Analysis and IBM released ITBench-AA, the first benchmark measuring frontier model performance on agentic enterprise IT tasks. Frontier models score below 50%, revealing significant gaps in current capabilities.
Why it matters: The results are clear evidence that general-purpose models still struggle with real enterprise automation, and they give builders targeting IT operations a concrete measure of how hard these tasks are.
