What does this repo signal mean?

Zhipu AI (GLM) published zai-org/UI2Code_N (Python). Repository signals like this one can surface tooling, evaluation, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo zai-org/UI2Code_N · language Python · new repo, moderate stars. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Zhipu AI (GLM) Repo: zai-org/UI2Code_N

Captured source

GitHub/github.com/zai-org/UI2Code_N

zai-org/UI2Code_N repository metadata

published Nov 10, 2025 · seen Jun 5 · captured Jun 11 · http 200 · method plain

zai-org/UI2Code_N

Language: Python

Stars: 73

Forks: 13

Open issues: 6

Created: 2025-11-10T13:39:35Z

Pushed: 2026-05-02T11:57:35Z

Default branch: main

Fork: no

Archived: no

README: UI2Code^N: UI-to-Code Generation as Interactive Visual Optimization

UI2Code^N reformulates UI-to-code as an interactive visual optimization problem. By embedding code generation in a closed-loop process of execution, visual inspection, and iterative refinement driven by rendered visual feedback, it more accurately reflects real-world UI development workflows. It unifies three key capabilities: UI drafting, UI editing, and UI polishing.

To address the non-differentiability of visual objectives and the noise of absolute visual evaluators, we propose Relative Visual Policy Optimization (RVPO), a preference-based reinforcement learning method that optimizes relative visual rankings among rendered candidates under execution feedback.

(Left) The VLM first performs UI drafting to generate an initial code draft $C^{(0)}$, which is rendered into $R^{(0)}$. Using visual feedback from the rendering, the same VLM iteratively performs UI polishing to produce refined code $C^{(t)}$. (Middle) Relative Visual Policy Optimization (RVPO), the proposed reinforcement learning algorithm used to optimize both UI drafting and UI polishing. (Right) Performance consistently improves with additional refinement steps, highlighting the iterative nature of real-world UI development.

Method Overview

UI2Code^N follows an interactive UI-to-code paradigm that fundamentally departs from prior single-turn generation approaches. We formalize this process as a feedback-driven transformation:

$$\mathcal{F}_{\theta}(I, C, R, E) \rightarrow C^{\prime}$$

where $I$ denotes the target UI image, $C$ the current code, $R = \text{Render}(C)$ the rendered output, $E$ optional modification instructions, and $C^{\prime}$ the updated code. The optimization objective is to find code $C^{*}$ that minimizes an implicit visual discrepancy $\mathcal{D}$:

$$C^{*} = \arg\min_{C} \mathcal{D}(I, \text{Render}(C))$$

1. Instantiations of Visual Optimization

This interactive paradigm naturally unifies three key capabilities by defining how feedback and constraints are introduced:

UI Drafting: Initializes the optimization process by producing a first-pass code approximation from the target UI screenshot: $C^{(0)} = \mathcal{F}_{\theta}(I)$.
UI Polishing (Visual Refinement): Iteratively improves code quality by explicitly comparing the rendered execution feedback against the target UI. This enables test-time scaling: $C^{(t+1)} = \mathcal{F}_{\theta}(I, C^{(t)}, R^{(t)})$.
UI Editing: Acts as a conditional variant of refinement where localized code updates are guided by explicit natural language modification instructions $E$: $C^{\prime} = \mathcal{F}_{\theta}(I, C, E)$.
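The drafting and polishing steps above can be sketched as a simple loop. This is a minimal illustration, not the repository's implementation: `model`, `render`, and `discrepancy` are hypothetical callables standing in for $\mathcal{F}_{\theta}$, $\text{Render}$, and the implicit $\mathcal{D}$, and picking the lowest-discrepancy candidate at the end is an assumption made to mirror the $\arg\min$ objective.

```python
from typing import Callable, List, Optional, Tuple

Code = str
Image = bytes  # stand-in for a screenshot; a real pipeline would use an image type


def draft_and_polish(
    target: Image,
    model: Callable[[Image, Optional[Code], Optional[Image]], Code],
    render: Callable[[Code], Image],
    discrepancy: Callable[[Image, Image], float],
    steps: int = 3,
) -> List[Tuple[Code, float]]:
    """Draft once, then polish `steps` times; return (code, discrepancy) pairs."""
    code = model(target, None, None)          # drafting: C^(0) = F(I)
    rendered = render(code)                   # R^(0) = Render(C^(0))
    history = [(code, discrepancy(target, rendered))]
    for _ in range(steps):
        code = model(target, code, rendered)  # polishing: C^(t+1) = F(I, C^(t), R^(t))
        rendered = render(code)
        history.append((code, discrepancy(target, rendered)))
    return history


def best_candidate(history: List[Tuple[Code, float]]) -> Code:
    """Return the code with the smallest measured discrepancy (assumed selection rule)."""
    return min(history, key=lambda item: item[1])[0]
```

UI editing would fit the same shape by passing the instruction $E$ to the model in place of the rendered feedback. Increasing `steps` corresponds to the test-time scaling mentioned above.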

2. Relative Visual Policy Optimization (RVPO)

The optimization objective is defined over rendered UI outcomes, which are non-differentiable. Furthermore, absolute visual scoring by VLM judges is often noisy. To address this, we optimize a rank-based surrogate objective measuring expected preference:

$$\mathcal{L}_{\text{rank}}(\theta) = \mathbb{E}_{y \sim \pi_{\theta}(\cdot|x)} \left[ \mathbb{E}_{y^{\prime} \sim \pi_{\theta}(\cdot|x)} [p_{\psi}(y > y^{\prime}|x)] \right]$$
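The nested expectation cannot be computed exactly. One common way to approximate it, stated here as an assumption rather than the repository's exact procedure, is to draw $N$ independent candidates $y_1, \dots, y_N$ from $\pi_{\theta}(\cdot|x)$ and average the preference probability over all ordered pairs:

$$\hat{\mathcal{L}}_{\text{rank}}(\theta) = \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{j \neq i} p_{\psi}(y_i > y_j|x)$$

When the judge returns hard win/loss decisions instead of probabilities, each inner sum reduces to a win count for candidate $y_i$.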

Tournament-based Reward: We sample $N$ candidates and perform pairwise comparisons. Each candidate $y_i$ is assigned a scalar reward based on its aggregate win count within the group: $W_i = \sum_{j \neq i} \mathbb{1}[\mathcal{C}_{\psi}(x, y_i, y_j) = 1]$.
Policy Optimization with GRPO: We compute group-normalized advantages $A_i$ and update the policy using the clipped surrogate objective, ensuring stable learning under execution feedback.
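The two steps above can be illustrated with a small sketch. It assumes a hypothetical pairwise judge `prefers(a, b)` that returns `True` when $\mathcal{C}_{\psi}(x, a, b) = 1$ (the context $x$ is folded into the callable), and it uses the mean/standard-deviation normalization commonly associated with GRPO; the exact formula used in the repository is not given in this excerpt.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence


def tournament_wins(
    candidates: Sequence[str],
    prefers: Callable[[str, str], bool],
) -> List[int]:
    """W_i: number of opponents j != i that candidate i beats in pairwise comparison."""
    n = len(candidates)
    return [
        sum(1 for j in range(n) if j != i and prefers(candidates[i], candidates[j]))
        for i in range(n)
    ]


def group_normalized_advantages(
    rewards: Sequence[float],
    eps: float = 1e-6,
) -> List[float]:
    """A_i = (r_i - mean) / (std + eps), computed within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Because only the relative ordering within the group enters the reward, a judge that is noisy in absolute terms but consistent in pairwise comparisons still yields a usable signal. The resulting advantages would then feed the clipped surrogate update.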

Demo

We provide a ready-to-run demo script that deploys UI2Code^N, allowing users to experience interactive UI-to-code generation, editing, and polishing directly through a command-line or web-based interface.

Web Interface Mode

cd demo
bash run_demo_web.sh

Once the web demo starts, open your browser and visit:

http://127.0.0.1:7860

Command-Line Demo (Local Setup)

After downloading the model, run the following commands to launch the demo:

cd demo
bash run_demo.sh

This demo will:

Load pretrained checkpoints for UI2Code^N and initialize the visual-language pipeline.
Accept a UI screenshot and a user prompt as input.
Generate corresponding front-end code (e.g., HTML/CSS/React) with high fidelity to the visual layout.

🎬 A short demonstration is provided below, featuring UI-to-code generation, UI editing, and UI polishing. The demo highlights how UI2Code^N enables seamless transitions between these capabilities within a unified interactive workflow.

https://github.com/user-attachments/assets/3196a27d-3543-4029-9f36-429ec2acc7ff

UI2Code^N achieves performance comparable to leading closed-source models such as Claude-4-Sonnet and GPT-5.

Model

UI2Code^N is built on GLM-4.1V-9B-Base, which is publicly available on Hugging Face. You are welcome to download and use it.

Quick Start

First, install the required dependencies with the following commands:

apt-get install poppler-utils
pip install transformers==4.57.1
# Optional
pip install vllm==0.10.2 sglang==0.5.2
pip install playwright

Then, run the following code:

from transformers import AutoProcessor, AutoModelForImageTextToText
import torch

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "url": "https://raw.githubusercontent.com/zheny2751-dotcom/UI2Code-N/main/assets/example.png"
            },
            {
                "type": "text",
                "text": "Who pretended to be Little Red Riding Hood's grandmother"
            }
        ],
    }
]
processor = AutoProcessor.from_pretrained("zai-org/UI2Code_N")
model = AutoModelForImageTextToText.from_pretrained(
    pretrained_model_name_or_path="zai-org/UI2Code_N",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
inputs =...

Excerpt shown — open the source for the full document.

Notability

notability 4.0/10

New repo, moderate stars