What does this repo signal mean?

Tencent Hunyuan published Tencent-Hunyuan/Thinking-Free_Policy_Initialization (Python). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo Tencent-Hunyuan/Thinking-Free_Policy_Initialization · language Python · New repo, moderate traction.. onlylabs links this event to 1 captured evidence page and 6 related repo signals. It also maps to Safety and policy in the data-business radar.

Tencent Hunyuan Repo: Tencent-Hunyuan/Thinking-Free_Policy_Initialization

Captured source

source ↗

GitHub/github.com/Tencent-Hunyuan/Thinking-Free_Policy_Initialization

Tencent-Hunyuan/Thinking-Free_Policy_Initialization repository metadata

Source ↗

published Nov 6, 2025seen Jun 5captured Jun 11http 200method plain

Tencent-Hunyuan/Thinking-Free_Policy_Initialization

Description: The official code of [ICLR 2026] TFPI: Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners

Language: Python

License: NOASSERTION

Stars: 103

Forks: 12

Open issues: 0

Created: 2025-11-06T07:38:13Z

Pushed: 2026-01-27T11:59:26Z

Default branch: main

Fork: no

Archived: no

README:

1Hunyuan LLM Department, Tencent&emsp;

2The Hong Kong University of Science and Techology&emsp;

3The University of Hong Kong&emsp;

Overview

Thinking-Free Policy Initialization (TFPI), a simple yet effective adaptation to Reinforcement Learning with Verifiable Reward (RLVR) that bridges long Chain-of-Thought (CoT) distillation and standard RLVR. TFPI employs a simple *ThinkingFree* operation, explicitly discarding the thinking content via a direct append, to reduce token usage during inference. Training with *ThinkingFree*-adapted inputs improves performance and lowers token consumption, even in the original slow-thinking mode. Extensive experiments across various benchmarks have shown that TFPI accelerates RL convergence, achieves a higher performance ceiling, and yields more token-efficient reasoning models without specialized rewards or complex training designs. With TFPI only, we can train a 4B model to reach 89.0% accuracy on AIME24 and 65.5% on LiveCodeBench with extremely low training compute.

📝 News

[2026/01/26] Our paper is accepted to ICLR 2026.
[2025/12/22] We released the codes.
[2025/11/7] We released the model checkpoints.
[2025/9/30] We released the paper!

🚀 Quick Start

Installation

1. Environment setup

conda create -n TFPI python=3.10 -y
conda activate TFPI

2. Requirements installation

pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
pip install vllm==0.8.5.post1
pip install -e .
pip install vertexai
pip install sentence_transformers
pip install flash-attn==2.7.4.post1 --no-build-isolation

Run Training

The training dataset is the training prompts in Polaris-53K.

First, download and transform the format of training data using the following Python script:

python scripts/download_train.py

The training data is saved in ./data/train/tfpi-polaris53k.parquet

Next, adapt the training script in "./scripts/train/qwen3-4b-tfpi.sh" by setting the WandB key, model path and dataset path.

Finally, run the following commands at the master node:

bash ./scripts/ray_start.sh # start ray
bash ./scripts/train/qwen3-4b-tfpi.sh # submit training

Run Evaluation

First, download the evaluation datasets using

hf download xx18/TFPI-EVA --repo-type=dataset --local-dir ./data/eval

All test datasets are downloaded to the folder data/eval.

for evaluation, use:

bash ./scripts/ray_start.sh # start ray, use pssh to run on multiple nodes if necessary
bash scripts/eval/start_generate.sh

The resulted metrics and evaluation outputs will be saved under the folder your_model_path/eval_results

For IFEval, please refer to the official repo IFEval evaluation.

🤗 Datasets and Models

we are open-sourcing our complete codes, and training details for the research community. All our resulted checkpoints can be found in TFPI Collection.

🤝 Acknowledgement

We are deeply grateful for the following GitHub...

Excerpt shown — open the source for the full document.

Notability

notability 5.0/10

New repo, moderate traction.