zai-org/UI2Code_N
Python
Language: Python
Stars: 73
Forks: 13
Open issues: 6
Created: 2025-11-10T13:39:35Z
Pushed: 2026-05-02T11:57:35Z
Default branch: main
Fork: no
Archived: no
README: UI2Code^N: UI-to-Code Generation as Interactive Visual Optimization
UI2Code^N reformulates UI-to-code as an interactive visual optimization problem. By embedding code generation in a closed-loop process of execution, visual inspection, and iterative refinement driven by rendered visual feedback, it more accurately reflects real-world UI development workflows. It unifies three key capabilities: UI drafting, UI editing, and UI polishing.
To address the non-differentiability of visual objectives and the noise of absolute visual evaluators, we propose Relative Visual Policy Optimization (RVPO), a preference-based reinforcement learning method that optimizes relative visual rankings among rendered candidates under execution feedback.
(Left) The VLM first performs UI drafting to generate an initial code draft $C^{(0)}$, which is rendered into $R^{(0)}$. Using visual feedback from the rendering, the same VLM iteratively performs UI polishing to produce refined code $C^{(t)}$. (Middle) Relative Visual Policy Optimization (RVPO), the proposed reinforcement learning algorithm used to optimize both UI drafting and UI polishing. (Right) Performance consistently improves with additional refinement steps, highlighting the iterative nature of real-world UI development.
Method Overview
UI2Code^N follows an interactive UI-to-code paradigm that fundamentally departs from prior single-turn generation approaches. We formalize this process as a feedback-driven transformation:
$$\mathcal{F}_{\theta}(I, C, R, E) \rightarrow C^{\prime}$$
where $I$ denotes the target UI image, $C$ the current code, $R = \text{Render}(C)$ the rendered output, $E$ optional modification instructions, and $C^{\prime}$ the updated code. The optimization objective is to find code $C^{*}$ that minimizes an implicit visual discrepancy $\mathcal{D}$:
$$C^{*} = \arg\min_{C} \mathcal{D}(I, \text{Render}(C))$$
1. Instantiations of Visual Optimization
This interactive paradigm naturally unifies three key capabilities by defining how feedback and constraints are introduced:
- UI Drafting: Initializes the optimization process by producing a first-pass code approximation from the target UI screenshot: $C^{(0)} = \mathcal{F}_{\theta}(I)$.
- UI Polishing (Visual Refinement): Iteratively improves code quality by explicitly comparing the rendered execution feedback against the target UI. This enables test-time scaling: $C^{(t+1)} = \mathcal{F}_{\theta}(I, C^{(t)}, R^{(t)})$.
- UI Editing: Acts as a conditional variant of refinement where localized code updates are guided by explicit natural language modification instructions $E$: $C^{\prime} = \mathcal{F}_{\theta}(I, C, E)$.
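The three modes above can be read as calls to one function with different arguments left empty. The following is a minimal sketch under that reading, not the repository's implementation: `Policy`, `Renderer`, `draft`, `polish`, `edit`, and the toy stand-ins are hypothetical names introduced here for illustration. In practice the policy would be the VLM and the renderer a headless browser that screenshots the generated page.

```python
from typing import Callable, List, Optional

# Hypothetical signature standing in for F_theta(I, C, R, E) -> C'.
# Arguments: target image, current code, current rendering, edit instruction.
Policy = Callable[[bytes, Optional[str], Optional[bytes], Optional[str]], str]
# Hypothetical signature standing in for Render(C).
Renderer = Callable[[str], bytes]


def draft(policy: Policy, target: bytes) -> str:
    """UI drafting: C0 = F(I)."""
    return policy(target, None, None, None)


def polish(policy: Policy, render: Renderer, target: bytes,
           code: str, steps: int) -> List[str]:
    """UI polishing: C(t+1) = F(I, C(t), R(t)). Returns [C(0), ..., C(steps)]."""
    history = [code]
    for _ in range(steps):
        rendering = render(history[-1])  # R(t) = Render(C(t))
        history.append(policy(target, history[-1], rendering, None))
    return history


def edit(policy: Policy, target: bytes, code: str, instruction: str) -> str:
    """UI editing: C' = F(I, C, E)."""
    return policy(target, code, None, instruction)


# Toy stand-ins (assumptions, for illustration only): the "image" is a byte
# string, "rendering" just encodes the code, and each polishing step copies
# one more byte of the target than the current rendering shows.
def toy_render(code: str) -> bytes:
    return code.encode()


def toy_policy(target, code, rendering, instruction):
    if instruction is not None:
        return code + instruction  # editing: apply the instruction
    if code is None:
        return ""                  # drafting: start from an empty draft
    return target[: len(rendering) + 1].decode()  # polishing


target = b"abc"
history = polish(toy_policy, toy_render, target, draft(toy_policy, target), steps=3)
print(history)
```

Each extra polishing step costs one more render and one more model call, which is what makes the number of steps a test-time scaling knob.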
2. Relative Visual Policy Optimization (RVPO)
The optimization objective is defined over rendered UI outcomes, which are non-differentiable. Furthermore, absolute visual scoring by VLM judges is often noisy. To address this, we optimize a rank-based surrogate objective measuring expected preference:
$$\mathcal{L}_{\text{rank}}(\theta) = \mathbb{E}_{y \sim \pi_{\theta}(\cdot|x)} \left[ \mathbb{E}_{y^{\prime} \sim \pi_{\theta}(\cdot|x)} [p_{\psi}(y > y^{\prime}|x)] \right]$$
- Tournament-based Reward: We sample $N$ candidates and perform pairwise comparisons. Each candidate $y_i$ is assigned a scalar reward based on its aggregate win count within the group: $W_i = \sum_{j \neq i} \mathbb{1}[\mathcal{C}_{\psi}(x, y_i, y_j) = 1]$.
- Policy Optimization with GRPO: We compute group-normalized advantages $A_i$ and update the policy using the clipped surrogate objective, ensuring stable learning under execution feedback.
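The two steps above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' training code: the function names are hypothetical, the comparator is a toy stand-in for the visual judge $\mathcal{C}_{\psi}$, the advantage uses the mean/standard-deviation normalization commonly associated with GRPO (the README does not spell out its exact form), and the clipping range of 0.2 is an arbitrary placeholder.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence


def tournament_rewards(candidates: Sequence[str],
                       prefers: Callable[[str, str], bool]) -> List[int]:
    """W_i = number of j != i for which the comparator prefers y_i over y_j."""
    n = len(candidates)
    return [
        sum(1 for j in range(n) if j != i and prefers(candidates[i], candidates[j]))
        for i in range(n)
    ]


def group_normalized_advantages(rewards: Sequence[float],
                                eps: float = 1e-6) -> List[float]:
    """A_i = (W_i - mean(W)) / (std(W) + eps), within one sampled group.

    Assumption: this normalization is the usual GRPO form, not taken from the README.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip: float = 0.2) -> float:
    """PPO-style clipped term: min(r * A, clip(r, 1 - clip, 1 + clip) * A)."""
    clipped_ratio = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# Toy comparator (assumption): the longer string "wins". A real comparator
# would judge which rendered candidate is closer to the target screenshot.
candidates = ["a", "abc", "ab", "abcd"]
wins = tournament_rewards(candidates, lambda a, b: len(a) > len(b))
advantages = group_normalized_advantages(wins)
print(wins)
```

Because rewards depend only on win counts within the group, a judge that is miscalibrated in absolute terms but consistent in its rankings still yields a usable signal.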
Table of Contents
- [Table of Contents](#table-of-contents)
- [Demo](#demo)
- [Model](#model)
- [Quick Start](#quick-start)
- [Evaluation](#evaluation)
- [Result](#result)
- [Experimental results on UI-to-Code and UI Polishing benchmarks](#experimental-results-on-ui-to-code-and-ui-polishing-benchmarks)
- [Reward Design](#reward-design)
- [Citation](#citation)
Demo
We provide a ready-to-run demo script that deploys UI2Code^N, allowing users to experience interactive UI-to-code generation, editing, and polishing directly through a command-line or web-based interface.
Web Interface Mode
cd demo
bash run_demo_web.sh
Once the web demo starts, open your browser and visit:
http://127.0.0.1:7860
Command-Line Demo (Local Setup)
After downloading the model, run the following commands to launch the demo:
cd demo
bash run_demo.sh
This demo will:
- Load pretrained checkpoints for UI2Code^N and initialize the visual-language pipeline.
- Accept a UI screenshot and a user prompt as input.
- Generate corresponding front-end code (e.g., HTML/CSS/React) with high fidelity to the visual layout.
🎬 A short demonstration is provided below, featuring UI-to-code generation, UI editing, and UI polishing. The demo highlights how UI2Code^N enables seamless transitions between these capabilities within a unified interactive workflow.
https://github.com/user-attachments/assets/3196a27d-3543-4029-9f36-429ec2acc7ff
UI2Code^N achieves performance comparable to leading closed-source models such as Claude-4-Sonnet and GPT-5.
Model
UI2Code^N is built on GLM-4.1V-9B-Base, which is publicly available on Hugging Face. Feel free to download and use it.
Quick Start
First, install the required dependencies with the following commands:
apt-get install poppler-utils
pip install transformers==4.57.1
# Optional
pip install vllm==0.10.2 sglang==0.5.2
pip install playwright
Then, run the following code:
from transformers import AutoProcessor, AutoModelForImageTextToText
import torch
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "url": "https://raw.githubusercontent.com/zheny2751-dotcom/UI2Code-N/main/assets/example.png"
            },
            {
                "type": "text",
                "text": "Who pretended to be Little Red Riding Hood's grandmother"
            }
        ],
    }
]
processor = AutoProcessor.from_pretrained("zai-org/UI2Code_N")
model = AutoModelForImageTextToText.from_pretrained(
    pretrained_model_name_or_path="zai-org/UI2Code_N",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
inputs =…Excerpt shown — open the source for the full document.