VisualAgentBench
Towards Large Multimodal Models as Visual Foundation Agents
About
VisualAgentBench measures how well large multimodal models perform as agents on visual tasks that require both reasoning and action. It is built for researchers evaluating large language models with vision capabilities. The benchmark covers realistic scenarios in which an agent must interpret images and choose its next action based on what it sees.
Tags
- gpt
- llm-agent
- multimodal-large-language-models
Pricing
Free. Open source; you supply your own LLM API keys.
Categories
General
