arxiv:2507.22827

ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents

Published on Oct 20, 2025

Submitted by Jiaming Han on Jul 31, 2025

#1 Paper of the day

Authors: Yilei Jiang, Yaozhi Zheng, Yuxuan Wan, Jiaming Han, Qunzhong Wang, …

Abstract

AI-generated summary (generated by Qwen/Qwen2.5-Coder-32B-Instruct): A modular multi-agent framework named ScreenCoder decomposes UI design-to-code translation into grounding, planning, and generation stages, achieving superior layout accuracy and code correctness through specialized agents and fine-tuned multimodal models.

Automating the transformation of user interface (UI) designs into front-end code holds significant promise for accelerating software development and democratizing design workflows. While multimodal large language models (MLLMs) can translate images to code, they often fail on complex UIs: a single monolithic model struggles to unify visual perception, layout planning, and code synthesis, which leads to frequent perception and planning errors. To address this, we propose ScreenCoder, a modular multi-agent framework that decomposes the task into three interpretable stages: grounding, planning, and generation. By assigning these distinct responsibilities to specialized agents, our framework achieves significantly higher robustness and fidelity than end-to-end approaches. Furthermore, ScreenCoder serves as a scalable data engine, enabling us to generate high-quality image-code pairs. We use this data to fine-tune an open-source MLLM via a dual-stage pipeline of supervised fine-tuning and reinforcement learning, which yields substantial gains in its UI generation capabilities. Extensive experiments demonstrate that our approach achieves state-of-the-art performance in layout accuracy, structural coherence, and code correctness. Our code is publicly available at https://github.com/leigest519/ScreenCoder.
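The grounding, planning, and generation decomposition can be illustrated with a minimal Python sketch. It is not ScreenCoder's implementation: `Region`, `LayoutNode`, `grounding_agent`, `planning_agent`, `generation_agent`, and `screen_to_code` are hypothetical names chosen for illustration. The grounding stage is stubbed out (a real system would prompt an MLLM to detect and label components), and the planning stage substitutes a simple row-grouping heuristic for whatever layout planning the paper actually performs.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Region:
    """A labeled UI component with a pixel bounding box (x, y, width, height)."""
    label: str
    x: int
    y: int
    w: int
    h: int


@dataclass
class LayoutNode:
    """One planned row of the layout, holding regions ordered left to right."""
    children: List[Region] = field(default_factory=list)


def grounding_agent(screenshot: Iterable[Region]) -> List[Region]:
    # Hypothetical stub: a real grounding agent would run an MLLM on the
    # screenshot image. Here the input is assumed to already be detected regions.
    return list(screenshot)


def planning_agent(regions: List[Region]) -> List[LayoutNode]:
    # Assumed heuristic: scan regions top-to-bottom; a region joins the current
    # row if its top edge lies above the bottom edge of the row's first region.
    rows: List[LayoutNode] = []
    for r in sorted(regions, key=lambda reg: (reg.y, reg.x)):
        if rows and r.y < rows[-1].children[0].y + rows[-1].children[0].h:
            rows[-1].children.append(r)
        else:
            rows.append(LayoutNode([r]))
    for row in rows:
        row.children.sort(key=lambda reg: reg.x)  # left-to-right within a row
    return rows


def generation_agent(rows: List[LayoutNode]) -> str:
    # Emit one flex container per planned row with sized placeholder divs;
    # a real generation agent would have an LLM fill in component code.
    lines = ["<body>"]
    for row in rows:
        lines.append('  <div style="display:flex">')
        for r in row.children:
            lines.append(f'    <div class="{r.label}" style="width:{r.w}px;height:{r.h}px"></div>')
        lines.append("  </div>")
    lines.append("</body>")
    return "\n".join(lines)


def screen_to_code(screenshot: Iterable[Region]) -> str:
    # Chain the three stages: grounding -> planning -> generation.
    return generation_agent(planning_agent(grounding_agent(screenshot)))


if __name__ == "__main__":
    detected = [
        Region("sidebar", 0, 60, 200, 500),
        Region("header", 0, 0, 800, 60),
        Region("main", 200, 60, 600, 500),
    ]
    print(screen_to_code(detected))
```

For a header above a sidebar and a main pane, the planner produces two rows, `[header]` and `[sidebar, main]`, and the generator emits one flex container per row. Keeping each stage in its own function is meant to mirror the paper's point that separate, interpretable stages are easier to inspect than a single end-to-end model.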

Community

csuhan (paper author and submitter), Jul 31, 2025:

ScreenCoder is a modular multi-agent framework that advances UI-to-code generation by integrating visual grounding, hierarchical planning, and adaptive code synthesis.

Try it at: https://huggingface.co/spaces/Jimmyzheng-10/ScreenCoder