CU-Benchmarks
visualwebbench/VisualWebBench
osunlp/Multimodal-Mind2Web
xlangai/ubuntu_osworld_file_cache
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale (paper, arXiv:2409.08264)
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents (paper, arXiv:2405.14573)