Tencent-Hunyuan/Precise
Python
Language: Python
License: NOASSERTION
Stars: 15
Forks: 0
Open issues: 0
Created: 2026-05-22T02:07:30Z
Pushed: 2026-05-25T04:07:35Z
Default branch: main
Fork: no
Archived: no
README:
News
- Code for Precise-SDE is available in this repository, including the Precise sampler, FLUX.2 Klein training entrypoint, evaluation scripts, and reward server integrations.
Table of Contents
- [Abstract](#abstract)
- [Training Curves](#training-curves)
- [Setup](#setup)
- [Models](#models)
- [Training](#training)
- [Evaluation](#evaluation)
- [Reward Server Setup](#reward-server-setup)
- [Acknowledgements](#acknowledgements)
Abstract
Reinforcement learning (RL) is an effective way to improve prompt alignment and perceptual quality in diffusion and flow-matching generators. For online RL on flow-matching models, a central step is turning the deterministic sampling trajectory into a stochastic policy, usually by replacing the reverse-time ODE with an SDE. The stochastic sampler is therefore part of the policy: it controls exploration, affects denoising stability, and determines the action probabilities used by policy-gradient optimization.
We decompose stochastic sampler design into two coupled problems: choosing an exploration schedule that balances diversity and stability, and discretizing the resulting SDE faithfully at the small step counts used in RL. Existing samplers expose failure modes in this regime: Euler-style stochastic samplers can introduce excess discretization noise, while coefficient-preserving rules can bias the marginal distribution. We propose Precise, an SDE-consistent stochastic sampler with a logSNR-derived exploration schedule and a closed-form finite-step transition. The key approximation freezes the clean-latent posterior mean, which keeps the denoising trajectory faithful while avoiding excess noise.
Across FLUX.2 Klein experiments, Precise improves reward optimization speed and stability, reaches state-of-the-art alignment scores on in-domain rewards such as PickScore and HPSv2.1, and requires less wall-clock training time to match the best in-domain performance of prior samplers.
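As a rough illustration of why the stochastic sampler fixes the action probabilities, the sketch below treats one sampler step as a Gaussian transition and evaluates its log-probability, which is the quantity a policy-gradient update would consume. This is a generic sketch and not the Precise transition itself: `log_snr` assumes the linear interpolation x_t = (1 - t) * x0 + t * noise, and `gaussian_log_prob`, `gaussian_step`, and their arguments are hypothetical names chosen for illustration.

```python
import math
import random


def log_snr(t: float) -> float:
    """logSNR under the assumed interpolation x_t = (1 - t) * x0 + t * noise."""
    if not 0.0 < t < 1.0:
        raise ValueError("t must lie strictly between 0 and 1")
    return 2.0 * math.log((1.0 - t) / t)


def gaussian_log_prob(sample, mean, std):
    """Log-density of `sample` under an isotropic Gaussian N(mean, std^2 I)."""
    const = math.log(std) + 0.5 * math.log(2.0 * math.pi)
    return sum(-0.5 * ((x - m) / std) ** 2 - const for x, m in zip(sample, mean))


def gaussian_step(mean, std, rng):
    """Draw the next latent from N(mean, std^2 I) and return it with its log-prob."""
    sample = [m + std * rng.gauss(0.0, 1.0) for m in mean]
    return sample, gaussian_log_prob(sample, mean, std)


rng = random.Random(0)
# `mean` stands in for the sampler's predicted next latent; `std` for its noise scale.
next_latent, log_prob = gaussian_step(mean=[0.0, 1.0, -1.0], std=0.5, rng=rng)
print(log_snr(0.5), log_prob)
```

In this toy setting a larger `std` means more exploration but a flatter transition density, which is the diversity versus stability trade-off the exploration schedule has to balance.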
Training Curves
The main experiments compare Precise against Dance-GRPO, Flow-GRPO, and CPS under matched training recipes. Higher is better for all plotted rewards.
FLUX.2 Klein, 20 NFE
Setup
Install uv, then create the root environment from the repository root:
uv sync --project .
GPU training requires a CUDA-capable Linux environment compatible with the PyTorch and accelerator versions pinned in pyproject.toml.
Models
External model and reward weights are resolved through pinned Hugging Face checkpoints unless you configure a local model mirror. Set the mirror's root directory with:
export PRECISE_SDE_MODEL_ROOT=/path/to/models
Pinned checkpoints:
| Name | Hugging Face checkpoint | Revision |
| --- | --- | --- |
| FLUX.2 Klein | black-forest-labs/FLUX.2-klein-base-4B | a3b4f4849157f664bdbc776fd7453c2783562f4d |
| CLIP | openai/clip-vit-large-patch14 | 32bd64288804d66eefd0ccbe215aa642df71cc41 |
| OpenCLIP ViT-H | laion/CLIP-ViT-H-14-laion2B-s32B-b79K | 1c2b8495b28150b8a4922ee1c8edee224c284c0c |
| PickScore | yuvalkirstain/PickScore_v1 | a4e4367c6dfa7288a00c550414478f865b875800 |
| HPSv2 | xswu/HPSv2 | 697403c78157020a1ae59d23f111aa58ced35b0a |
| ImageReward | zai-org/ImageReward | 5736be03b2652728fb87788c9797b0570450ab72 |
| UnifiedReward v2 | CodeGoat24/UnifiedReward-2.0-qwen35-9b | f01548b009741e12ff9817ed91dba94701ed9579 |
| GenEval Mask2Former | tsbpp/geneval_mask2former | 22b5a198cedf6b45e45165cf1c865d58de4a2832 |
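The resolution order described above (local mirror first, pinned hub revision otherwise) might look roughly like the sketch below. The helper name `resolve_checkpoint`, the `PINNED` mapping, and the `<root>/<org>/<model>` mirror layout are assumptions made for illustration; the repository's own loader may organize mirrors differently. Only two rows of the checkpoint table are copied in.

```python
import os
from pathlib import Path

# Two pinned checkpoints copied from the table: name -> (repo id, revision).
PINNED = {
    "FLUX.2 Klein": (
        "black-forest-labs/FLUX.2-klein-base-4B",
        "a3b4f4849157f664bdbc776fd7453c2783562f4d",
    ),
    "PickScore": (
        "yuvalkirstain/PickScore_v1",
        "a4e4367c6dfa7288a00c550414478f865b875800",
    ),
}


def resolve_checkpoint(name: str) -> dict:
    """Prefer a local mirror when PRECISE_SDE_MODEL_ROOT is set, else the pinned revision."""
    repo_id, revision = PINNED[name]
    root = os.environ.get("PRECISE_SDE_MODEL_ROOT")
    if root:
        # Assumed layout: <root>/<org>/<model>. This is a guess, not the repo's contract.
        local_dir = Path(root) / repo_id
        if local_dir.is_dir():
            return {"source": "local", "path": str(local_dir)}
    return {"source": "hub", "repo_id": repo_id, "revision": revision}


print(resolve_checkpoint("PickScore"))
```

A `hub` result carries exactly the two values that `huggingface_hub.snapshot_download(repo_id=..., revision=...)` accepts, so pinning the revision keeps downloads reproducible even if the upstream repository changes.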
Training
Use the launcher rather than calling trainer scripts directly:
bash launch/train.sh --flux --reward mix --sde precise --noise-level 1.5 --step 20
Supported model selectors:
--flux
Supported rewards:
- mix
- pickscore
- geneval
Supported SDE modes:
- precise
- flow_grpo
- cps
- dance_grpo
- dance_precise
The launcher selects the trainer, config entrypoint, and PRECISE_SDE_LAUNCH_* environment together. The FLUX.2 Klein config builder lives in config/flux2_klein.py; the shared trainer is precise_sde/train/rl_trainer.py.
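When launches are scripted, for example to sweep samplers, it can help to validate the options before invoking the launcher. The sketch below only assembles the documented command line; `build_train_command` is a hypothetical helper, and it assumes the reward and SDE lists above are exhaustive.

```python
import shlex

SUPPORTED_REWARDS = ("mix", "pickscore", "geneval")
SUPPORTED_SDE_MODES = ("precise", "flow_grpo", "cps", "dance_grpo", "dance_precise")


def build_train_command(reward: str, sde: str, noise_level: float, steps: int) -> list[str]:
    """Assemble the launch/train.sh invocation after checking the selectors."""
    if reward not in SUPPORTED_REWARDS:
        raise ValueError(f"unsupported reward: {reward!r}")
    if sde not in SUPPORTED_SDE_MODES:
        raise ValueError(f"unsupported SDE mode: {sde!r}")
    return [
        "bash", "launch/train.sh",
        "--flux",  # the only model selector listed in the README
        "--reward", reward,
        "--sde", sde,
        "--noise-level", str(noise_level),
        "--step", str(steps),
    ]


cmd = build_train_command("mix", "precise", 1.5, 20)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root, which keeps the launcher as the single entrypoint instead of calling trainer scripts directly.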
Evaluation
eval/infer_eval.sh runs through the root uv project and supports FLUX.2 Klein checkpoints. Pass at least one checkpoint base explicitly:
bash eval/infer_eval.sh \
  --flux \
  --ckpt-base checkpoints/logs/run-name/checkpoints \
  --eval-config '1000|precise|0|20|pickscore|{"clipscore": 1.0}'
Use repeated --ckpt-base and --eval-config arguments to evaluate multiple runs in one invocation.
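A multi-run invocation can be assembled programmatically, as in the sketch below. The meaning of the individual --eval-config fields is not documented in this excerpt, so the sketch treats them as opaque: it only assumes, from the single example above, that a spec has six pipe-separated fields and that the last one is JSON. `check_eval_config` and `build_eval_command` are hypothetical helper names.

```python
import json
import shlex


def check_eval_config(spec: str) -> list[str]:
    """Split a spec into six pipe-separated fields and check the trailing JSON."""
    fields = spec.split("|", 5)  # maxsplit keeps any '|' inside the JSON field intact
    if len(fields) != 6:
        raise ValueError(f"expected 6 pipe-separated fields, got {len(fields)}")
    json.loads(fields[5])  # raises a ValueError subclass on malformed JSON
    return fields


def build_eval_command(ckpt_bases: list[str], eval_configs: list[str]) -> list[str]:
    """Repeat --ckpt-base and --eval-config once per entry."""
    if not ckpt_bases:
        raise ValueError("pass at least one checkpoint base")
    cmd = ["bash", "eval/infer_eval.sh", "--flux"]
    for base in ckpt_bases:
        cmd += ["--ckpt-base", base]
    for spec in eval_configs:
        check_eval_config(spec)
        cmd += ["--eval-config", spec]
    return cmd


spec = '1000|precise|0|20|pickscore|{"clipscore": 1.0}'
cmd = build_eval_command(["checkpoints/logs/run-name/checkpoints"], [spec])
print(shlex.join(cmd))
```

Building the argument list in Python and quoting it with `shlex.join` avoids shell-quoting mistakes around the `|` and `{}` characters in the spec.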
Reward Server Setup
Remote reward services are intentionally isolated from the main training environment when they need heavyweight or version-sensitive dependencies.
GenEval
GenEval runs as a separate nested uv project:
bash precise_sde/rewards/servers/geneval/bootstrap.sh
bash precise_sde/rewards/servers/geneval/start_server.sh
uv run --project precise_sde/rewards/servers/geneval \
  python precise_sde/rewards/servers/geneval/check_server.py
The server binds to 127.0.0.1:18085 by default. Override the client URL with PRECISE_SDE_GENEVAL_URL.
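Before starting a long training run, a client-side check that the configured address accepts connections can save time. The sketch below is not the repository's check_server.py: `geneval_url` and `is_port_open` are hypothetical helpers, the `http://` scheme in the default URL is an assumption, and the probe only tests TCP reachability rather than the reward API.

```python
import os
import socket
from urllib.parse import urlparse

# Host and port come from the README; the http:// scheme is an assumption.
DEFAULT_GENEVAL_URL = "http://127.0.0.1:18085"


def geneval_url() -> str:
    """Return the client URL, honoring the PRECISE_SDE_GENEVAL_URL override."""
    return os.environ.get("PRECISE_SDE_GENEVAL_URL", DEFAULT_GENEVAL_URL)


def is_port_open(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open(geneval_url())` returns `False` when nothing is listening, which is a cheaper failure than discovering a missing reward server partway through a rollout.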
UnifiedReward
UnifiedReward v2 should run in a fresh conda environment rather than the repo root uv environment:
conda create -n vllm python=3.12 -y
conda activate vllm
pip install -r precise_sde/rewards/servers/unified_reward/requirements.txt
bash precise_sde/rewards/servers/unified_reward/start_server.sh
Probe a running server with:
uv run --project . python precise_sde/rewards/servers/unified_reward/test_api.py \
  --base-url http://127.0.0.1:8080 --tests 1,2,4
Point training at the server with PRECISE_SDE_UNIFIEDREWARD_URL or PRECISE_SDE_UNIFIEDREWARD_URLS. The exact request and response contract expected by training is documented in precise_sde/rewards/servers/unified_reward/README.md.
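One way a client could consume the two variables is sketched below. The exact format is defined by the repository, not here: this sketch assumes PRECISE_SDE_UNIFIEDREWARD_URLS is a comma-separated list, that it takes precedence over the singular variable, and that requests are spread round-robin. `unified_reward_urls` is a hypothetical helper; consult the README referenced above for the actual contract.

```python
import itertools
import os


def unified_reward_urls() -> list[str]:
    """Collect server URLs from the plural variable, falling back to the singular one."""
    raw = os.environ.get("PRECISE_SDE_UNIFIEDREWARD_URLS", "")
    # Assumed format: comma-separated URLs; trailing slashes are dropped for consistency.
    urls = [u.strip().rstrip("/") for u in raw.split(",") if u.strip()]
    if not urls:
        single = os.environ.get("PRECISE_SDE_UNIFIEDREWARD_URL", "").strip()
        if single:
            urls = [single.rstrip("/")]
    if not urls:
        raise RuntimeError("no UnifiedReward server URL configured")
    return urls


def round_robin(urls: list[str]):
    """Yield URLs in rotation so requests are spread across servers."""
    return itertools.cycle(urls)
```

With several servers configured, `next(round_robin(urls))` style rotation is the simplest balancing policy; it ignores per-server latency, which may matter for a heavyweight reward model.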
Acknowledgements
This repository builds on the Flow-GRPO training codebase and compares with Flow-GRPO, Dance-GRPO, and CPS samplers. The experiments use FLUX.2 Klein as the…