ByteDance-Seed/In-Place-TTT
Language: Python
License: Apache-2.0
Stars: 231
Forks: 24
Open issues: 5
Created: 2026-04-07T05:50:45Z
Pushed: 2026-04-21T03:50:51Z
Default branch: main
Fork: no
Archived: no
README:
In-Place Test-Time Training
Seamlessly Endowing LLMs with Test-Time Training Ability
Guhao Feng\*, Shengjie Luo\*, Kai Hua, Ge Zhang, Wenhao Huang, Di He, Tianle Cai
In-Place TTT is a drop-in test-time training method for Transformer LLMs. This repository provides the training, checkpoint conversion, inference, and evaluation stack built on VeOmni, together with recommended configs for Qwen3-8B and LLaMA-3.1-8B.
News
[2026/03] The codebase is open-sourced.
[2026/02] In-Place TTT is accepted to ICLR 2026 as an Oral presentation.
Table of Contents
- [In-Place Test-Time Training](#in-place-test-time-training)
- [News](#news)
- [Table of Contents](#table-of-contents)
- [Introduction](#introduction)
- [Getting Started](#getting-started)
- [Environment Setup](#environment-setup)
- [Data Preparation](#data-preparation)
- [Recommended Config](#recommended-config)
- [Training](#training)
- [Checkpoint Conversion](#checkpoint-conversion)
- [Evaluation](#evaluation)
- [Features](#features)
- [License](#license)
- [Citation](#citation)
- [About ByteDance Seed Team](#about-bytedance-seed-team)
Introduction
Current large language models follow a static "train then deploy" paradigm. Once deployed, model weights are frozen and cannot adapt to new information encountered during inference. This limits long-context reasoning, where useful information arrives progressively and the model would benefit from updating itself as it reads.
In-Place Test-Time Training (In-Place TTT) addresses this by updating a subset of model parameters, the MLP down-projection fast weights, during inference. Unlike prior TTT approaches that require architectural side modules or external memory, In-Place TTT stays inside the standard Transformer block and remains compatible with off-the-shelf autoregressive LLMs.
The method is centered around three ideas:
1. Architectural compatibility. Fast weights live in the existing MLP down-projection matrix, so no extra attention heads or memory modules are introduced.
2. LM-aligned objective. The fast-weight update is aligned with next-token prediction instead of a generic reconstruction target.
3. Chunk-wise update. Long sequences are split into chunks so updates can be computed efficiently and scaled to long contexts.
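The chunk-wise idea can be illustrated with a toy sketch. This is not the repository's implementation or the paper's update rule: the squared-error loss, the `targets` argument, and every function name below are assumptions chosen only to show the control flow, in which all tokens of a chunk are read with the same fast weights and the weights are then updated once before the next chunk.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def split_into_chunks(items: Sequence, chunk_size: int) -> List[Sequence]:
    """Split a sequence into consecutive chunks of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def matvec(w: Matrix, z: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector z (cols)."""
    return [sum(w_ij * z_j for w_ij, z_j in zip(row, z)) for row in w]


def chunkwise_fast_weight_pass(
    w: Matrix,
    inputs: List[Vector],
    targets: List[Vector],
    chunk_size: int,
    lr: float,
) -> Tuple[List[Vector], Matrix]:
    """Toy pass: apply the current weights to a chunk, then update them once.

    ASSUMPTION: the update is one averaged gradient step on
    0.5 * ||W z - v||^2, used here as a stand-in objective.
    """
    w = [row[:] for row in w]  # work on a copy, leave the caller's weights intact
    outputs: List[Vector] = []
    chunk_pairs = zip(split_into_chunks(inputs, chunk_size),
                      split_into_chunks(targets, chunk_size))
    for z_chunk, v_chunk in chunk_pairs:
        # 1) Read: every token in the chunk sees the same fast weights.
        preds = [matvec(w, z) for z in z_chunk]
        outputs.extend(preds)
        # 2) Write: one gradient step averaged over the chunk.
        n = len(z_chunk)
        for z, v, p in zip(z_chunk, v_chunk, preds):
            for i in range(len(w)):
                err = p[i] - v[i]
                for j in range(len(z)):
                    w[i][j] -= lr * err * z[j] / n
    return outputs, w


outputs, new_w = chunkwise_fast_weight_pass(
    w=[[0.0]], inputs=[[1.0]] * 4, targets=[[1.0]] * 4, chunk_size=2, lr=0.5
)
print(outputs, new_w)
```

In this toy run the second chunk is read with weights that already absorbed the first chunk, which is the property the chunk-wise design relies on.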

As used in this repo, the end-to-end workflow is:
1. Provide your own VeOmni-compatible processed dataset and base model assets.
2. Launch continual pretraining with VeOmni through `train.sh` and `tasks/train_torch.py`.
3. Export DCP checkpoints into HuggingFace format with `scripts/merge_dcp_to_hf.py`.
4. Run TTT-aware inference and RULER evaluation with `inference_model/`, `eval.sh`, and `eval_config/`.
The repository includes recommended training configs for Qwen3-8B and LLaMA-3.1-8B, checkpoint conversion utilities, and a full RULER evaluation pipeline via OpenCompass from 4K to 256K context lengths.
Getting Started
Environment Setup
Step 1. Install PyTorch and FlashAttention:
pip3 install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
pip3 install flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
rm flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp311-cp311-linux_x86_64.whl
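The FlashAttention wheel name above encodes the interpreter and platform it was built for (`cp311`, `linux_x86_64`). As a hedged convenience, not part of the repository, the following sketch parses those tags and compares the Python tag with the running interpreter. The helper names are hypothetical, and the check assumes a CPython interpreter.

```python
import sys

WHEEL = "flash_attn-2.8.3+cu12torch2.8cxx11abiTRUE-cp311-cp311-linux_x86_64.whl"


def parse_wheel_tags(filename: str) -> dict:
    """Split a wheel filename into name, version, and compatibility tags."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    # Layout: name-version[-build]-python-abi-platform; the last three are tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def interpreter_tag() -> str:
    """Return the CPython tag of the running interpreter, e.g. 'cp311'."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


tags = parse_wheel_tags(WHEEL)
if tags["python"] == interpreter_tag():
    print(f"Interpreter matches the wheel's Python tag ({tags['python']}).")
else:
    print(f"Wheel expects {tags['python']}, but this interpreter is {interpreter_tag()}.")
```

If the tags differ, a wheel built for the matching Python version would be needed instead of the one named above.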
Step 2. Install VeOmni from the validated commit:
pip3 install "veomni @ git+https://github.com/ByteDance-Seed/VeOmni.git@9b91e164bea9e17f17ed490aab5e076c2335ca25"
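One way to confirm which commit pip actually installed is to read the `direct_url.json` metadata that pip records for VCS installs (PEP 610). The sketch below is an assumption-laden illustration rather than the repository's own verification script; the function names are hypothetical, and it reports nothing useful if the package was installed another way.

```python
import json
from importlib import metadata
from typing import Optional

EXPECTED_COMMIT = "9b91e164bea9e17f17ed490aab5e076c2335ca25"


def commit_from_direct_url(direct_url_text: str) -> Optional[str]:
    """Return the VCS commit recorded in a PEP 610 direct_url.json, if any."""
    info = json.loads(direct_url_text)
    return info.get("vcs_info", {}).get("commit_id")


def installed_commit(dist_name: str = "veomni") -> Optional[str]:
    """Look up the recorded commit for an installed distribution, or None."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    text = dist.read_text("direct_url.json")  # None when the file is absent
    return commit_from_direct_url(text) if text else None


commit = installed_commit("veomni")
if commit is None:
    print("No VCS commit recorded for veomni (not installed, or not installed from git).")
elif commit == EXPECTED_COMMIT:
    print("veomni is installed from the validated commit.")
else:
    print(f"veomni is installed from a different commit: {commit}")
```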
Step 3. Install the remaining dependencies:
pip3 install liger-kernel
pip3 install byted-wandb torchdata blobfile datasets diffusers tiktoken timm
pip3 install transformers==4.57.3
pip3 install opt_einsum einops
pip3 uninstall -y byted-wandb wandb
pip3 install byted-wandb
Step 4. Optionally verify the installed VeOmni source:
python3 -
**Tip:** Set `ttt_target: input_embed` for from-scratch pretraining, or `ttt_target: hidden_states` for continual training.
model:
  model_path: /path/to/your_base_model
  foundation:
    ttt_layers: [0, 6, 12, 18, 24, 30, 36]
    ttt_mode: true
    ttt_proj: true
    ttt_lr: 3
    ttt_chunk: 4096
data:
  train_path: /path/to/your_data
  train_size: 20000000000
  dataloader_type: native
  datasets_type: iterable
  data_type: plaintext
  max_seq_len: 65536
  text_keys: content_split
  drop_last: true
train:
  output_dir: /path/to/your_output_dir
  data_parallel_mode: fsdp2
  global_batch_size: 64
  micro_batch_size: 1
  optimizer: adamw
  lr: 5.0e-6
  lr_warmup_ratio: 0.02
  lr_decay_style: cosine
  lr_decay_ratio: 0.90
  weight_decay: 0.1
  max_grad_norm: 1.0
  max_steps: 5000
  enable_mixed_precision: true
  enable_gradient_checkpointing: true
  enable_full_shard: true
  init_device: meta
  ckpt_manager: dcp
  save_steps: 500
  save_hf_weights: true
  use_wandb: true
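A few quantities follow directly from the values in this config. The arithmetic below is a sketch under stated assumptions: that `train_size` is counted in tokens, that every sample fills `max_seq_len`, and that `lr_warmup_ratio` is relative to `max_steps`. None of these interpretations is confirmed by the excerpt above.

```python
# Values copied from the recommended config shown above.
config = {
    "ttt_chunk": 4096,
    "max_seq_len": 65536,
    "train_size": 20_000_000_000,
    "global_batch_size": 64,
    "max_steps": 5000,
    "lr_warmup_ratio": 0.02,
}

# Number of TTT chunks each full-length sequence is split into.
chunks_per_sequence = config["max_seq_len"] // config["ttt_chunk"]

# ASSUMPTION: every sample is exactly max_seq_len tokens long.
tokens_per_step = config["global_batch_size"] * config["max_seq_len"]
tokens_total = tokens_per_step * config["max_steps"]

# ASSUMPTION: warmup length is a fraction of max_steps.
warmup_steps = round(config["lr_warmup_ratio"] * config["max_steps"])

print(f"chunks per sequence: {chunks_per_sequence}")
print(f"tokens per optimizer step: {tokens_per_step:,}")
print(f"tokens over the full run: {tokens_total:,}")
print(f"ratio to train_size: {tokens_total / config['train_size']:.3f}")
print(f"warmup steps: {warmup_steps}")
```

Under these assumptions the run consumes roughly 21B tokens, which is close to the configured `train_size` of 20B.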
The corresponding recommended config files are:
- `configs/pretrain/qwen3_longct.yaml`
- `configs/pretrain/llama3_longct.yaml`
### Training
Quick smoke run:
bash train.sh tasks/train_torch.py configs/pretrain/qwen3_longct.yaml \
  --train.output_dir /path/to/your_output_dir \
  --train.max_steps 1 \
  --train.use_wandb false
Recommended Qwen config override:
bash train.sh tasks/train_torch.py configs/pretrain/qwen3_longct.yaml \
  --train.wandb_project your_wandb_project \
  --train.wandb_name your_run_name \
  --train.output_dir /path/to/your_output_dir \
  --model.foundation '{"ttt_layers":[0,6,12,18,24,30,36],"ttt_mode":true,"ttt_proj":true,"ttt_lr":3,"ttt_chunk":4096}'
Recommended LLaMA config override:
bash train.sh tasks/train_torch.py configs/pretrain/llama3_longct.yaml \
  --train.wandb_project your_wandb_project \
  --train.wandb_name your_run_name \
  --train.output_dir /path/to/your_output_dir \
  --model.foundation '{"ttt_layers":[0,6,12,18,24,30,36],"ttt_mode":true,"ttt_proj":true,"ttt_lr":3,"ttt_chunk":4096}'
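When launching runs from a script, hand-writing the JSON passed to `--model.foundation` is easy to get wrong. The sketch below builds the same argument with `json.dumps` and prints a shell-quoted command line. The helper `build_train_command` is hypothetical and not part of the repository; only the flags and paths already shown above are used.

```python
import json
import shlex
from typing import List


def build_train_command(config_path: str, output_dir: str, foundation: dict) -> List[str]:
    """Assemble the train.sh invocation as an argument list (hypothetical helper)."""
    return [
        "bash", "train.sh", "tasks/train_torch.py", config_path,
        "--train.output_dir", output_dir,
        # Compact separators reproduce the JSON string used in the commands above.
        "--model.foundation", json.dumps(foundation, separators=(",", ":")),
    ]


foundation = {
    "ttt_layers": [0, 6, 12, 18, 24, 30, 36],
    "ttt_mode": True,
    "ttt_proj": True,
    "ttt_lr": 3,
    "ttt_chunk": 4096,
}

cmd = build_train_command(
    "configs/pretrain/qwen3_longct.yaml", "/path/to/your_output_dir", foundation
)
# shlex.join quotes the JSON argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list can also be passed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting altogether.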
### Checkpoint Conversion
Convert VeOmni DCP checkpoints into HuggingFace…