Qwen/Qwen-Image-Bench
Image-Text-to-Text • 27B • Updated • 23.8k • 63
Native Active Perception as Reasoning for Omni-Modal Understanding
Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification