VisualAgentBench

Towards Large Multimodal Models as Visual Foundation Agents

About

VisualAgentBench tests how well large multimodal models (LMMs) can serve as visual agents, completing tasks that require both reasoning and action. It is built for researchers evaluating large language models with vision capabilities. The benchmark includes real-world scenarios in which an agent must interpret images and decide what to do based on what it sees.

Key Features

  • gpt
  • llm-agent
  • multimodal-large-language-models

Pricing

Free

Open source. You supply your own LLM API keys.

Categories

General

Details

Verified: June 6, 2026
GitHub stars: ★ 267