Repo · ByteDance (Doubao/Seed) · published Jun 18, 2026

ByteDance-Seed/UAM

Python


Description: UAM: A Dual-Stream Perspective on Forgetting in VLA Training

Language: Python

License: Apache-2.0

Stars: 0

Forks: 0

Open issues: 0

Created: 2026-06-18T04:22:51Z

Pushed: 2026-06-25T12:27:52Z

Default branch: main

Fork: no

Archived: no

README:

UAM: Unified-Action-Model

This repository contains the UAM implementation for vision-language-action (VLA) training and evaluation. Built on the released BAGEL model, UAM keeps the pretrained VLM backbone and adds an action-oriented dorsal branch trained with visual dynamics prediction and action flow matching.
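The README does not spell out the action flow-matching objective. As a rough illustration only, a common linear-path formulation interpolates between a noise sample and the target action and regresses the constant velocity between them. The sketch below assumes that formulation and a Beta-distributed timestep (suggested by the ACTION_SAMPLE_MODE=beta override in the Training section); the function names and Beta parameters are hypothetical and are not taken from the UAM code.

```python
import random

def sample_timestep(mode="uniform", alpha=1.5, beta=1.0, rng=random):
    """Draw t in [0, 1]. The Beta parameters are illustrative assumptions."""
    if mode == "beta":
        return rng.betavariate(alpha, beta)
    return rng.random()

def flow_matching_pair(action, noise, t):
    """Linear path x_t = (1 - t) * noise + t * action; target velocity is action - noise."""
    x_t = [(1.0 - t) * n + t * a for a, n in zip(action, noise)]
    target_velocity = [a - n for a, n in zip(action, noise)]
    return x_t, target_velocity

def mse(pred, target):
    """Mean squared error between a predicted and a target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)

rng = random.Random(0)
action = [0.2, -0.1, 0.5]                      # toy 3-dimensional action
noise = [rng.gauss(0.0, 1.0) for _ in action]  # Gaussian starting point
t = sample_timestep("beta", rng=rng)
x_t, v_target = flow_matching_pair(action, noise, t)
loss_if_perfect = mse(v_target, v_target)      # a perfect predictor has zero loss
```

In a real model, a network would predict the velocity from x_t, t, and the conditioning features, and the loss would compare that prediction with v_target.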

Project page: https://cladernyjorn.github.io/Unified-Action-Model.github.io/

Paper: UAM: A Dual-Stream Perspective on Forgetting in VLA Training

Paper Overview

UAM studies a common failure mode in VLA fine-tuning: directly adapting a pretrained VLM for robot control can improve action prediction while damaging the model's original semantic and multimodal understanding ability. The paper refers to this as an embodiment tax. UAM addresses it with a bridge-and-decoupling design: the pretrained VLM keeps semantic perception, a dorsal expert learns visual dynamics as an intermediate bridge, and an action expert focuses on control.

The key idea is to avoid forcing one shared stream to carry semantics, dynamics, and control at the same time. Instead, UAM uses a Mixture-of-Transformers style routing structure to connect:

  • Pretrained VLM: preserves visual-language semantics and general multimodal understanding.
  • Dorsal Expert: models visual dynamics and bridges the semantic-control gap.
  • Action Expert: predicts robot actions from the bridged representation.

This design is intended to make action learning less destructive to the base model's VLM capability while still improving action accuracy and out-of-distribution generalization.
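To make the routing idea concrete, here is a deliberately tiny toy, not the UAM architecture: all tokens share one global context (a stand-in for joint self-attention), while each token is transformed by parameters that belong to its own stream. The stream names mirror the three components above; the scalar "experts" and the mean-based context are assumptions chosen only to keep the example short.

```python
def shared_context(tokens):
    """Stand-in for global self-attention: every token sees the mean of all token values."""
    return sum(value for _, value in tokens) / len(tokens)

# One tiny "expert" per stream. Real experts would be full sets of transformer weights.
EXPERTS = {
    "vlm": lambda x, ctx: x + 0.1 * ctx,
    "dorsal": lambda x, ctx: 0.5 * x + 0.5 * ctx,
    "action": lambda x, ctx: ctx - x,
}

def mot_layer(tokens):
    """Route each (stream, value) token through the expert registered for its stream."""
    ctx = shared_context(tokens)
    return [(stream, EXPERTS[stream](value, ctx)) for stream, value in tokens]

tokens = [("vlm", 1.0), ("vlm", 3.0), ("dorsal", 2.0), ("action", 0.0)]
routed = mot_layer(tokens)
```

The point of the toy is the separation of concerns: updating the "action" expert leaves the "vlm" expert's parameters untouched, even though both read the same shared context.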

The paper also evaluates whether the model retains general VLM ability after action-only training. UAM is trained without freezing the base parameters and still preserves strong multimodal benchmark performance compared with other VLA methods.

Code Scope

This repository open-sources the pipeline code for training and testing UAM. General multimodal language/VQA evaluation is run through VLMEvalKit rather than reimplemented here.

The released code path is intentionally narrowed to UAM robot training and testing:

  • Model code: modeling/uam
  • Training entry: train/train_uam.py
  • Canonical training script: scripts/train_uam.sh
  • Inference implementation: inferencer_uam.py
  • Single-step inference: infz_calvin_step10_act.py, infz_aloha_act.py, infz_robotwin_act.py
  • Task environment wrappers: eval/model_wrapper.py, eval/aloha_model_wrapper.py, eval/robotwin_model_wrapper.py
  • CALVIN checkpoint evaluation: eval/eval_ckpts.py
  • UAM VLM/VQA capability wrapper: eval/vqa_model_wrapper.py

Dataset Preparation

The public configs are:

  • data/configs/calvin_goal_predict-256-act-multiview.yaml
  • data/configs/aloha_step24_goal-256-act-near2-normalize-endpose_proprio-normalize.yaml
  • data/configs/robotwin_step3x16_goal_clean_imageaug.yaml

Dataset paths are set manually in data/dataset_info.py. Before training or evaluation, edit these constants directly:

CALVIN_TRAIN_DIR = "/path/to/calvin/task_ABC_D/training"
CALVIN_VAL_DIR = "/path/to/calvin/task_ABC_D/validation"
ALOHA_DIR = "/path/to/aloha/task30_1"
ALOHA_JSONL = "/path/to/aloha/task30_1/data_info.jsonl"
ROBOTWIN_DIR = "/path/to/robotwin"
ROBOTWIN_JSONL = "/path/to/robotwin_index.jsonl"
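Because these constants are edited by hand, a quick existence check before launching a long job can save time. The helper below is a minimal sketch and is not part of the repository; find_missing_paths is a hypothetical name, and the dictionary simply repeats the placeholder values shown above.

```python
from pathlib import Path

def find_missing_paths(paths):
    """Return the names whose configured path does not exist on disk."""
    return sorted(name for name, p in paths.items() if not Path(p).exists())

# Placeholder values copied from the README; replace them with your local paths.
configured = {
    "CALVIN_TRAIN_DIR": "/path/to/calvin/task_ABC_D/training",
    "CALVIN_VAL_DIR": "/path/to/calvin/task_ABC_D/validation",
    "ALOHA_DIR": "/path/to/aloha/task30_1",
    "ALOHA_JSONL": "/path/to/aloha/task30_1/data_info.jsonl",
    "ROBOTWIN_DIR": "/path/to/robotwin",
    "ROBOTWIN_JSONL": "/path/to/robotwin_index.jsonl",
}

missing = find_missing_paths(configured)
if missing:
    print("Paths not found:", ", ".join(missing))
```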

CALVIN

Download the official CALVIN dataset and use the ABC→D split (the task_ABC_D folder, matching the paths below). Then set:

CALVIN_TRAIN_DIR = "/path/to/calvin/task_ABC_D/training"
CALVIN_VAL_DIR = "/path/to/calvin/task_ABC_D/validation"

The CALVIN loader expects the standard official folder layout, including the lang_clip_resnet50/auto_lang_ann.npy annotations under the split directory.
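A small check for the annotation file mentioned above can catch an incomplete download early. This is a sketch under the assumption that the file sits at exactly that relative path inside each split directory; has_calvin_annotations is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path

# Relative location named in the README, below the training or validation directory.
ANNOTATION_RELPATH = Path("lang_clip_resnet50") / "auto_lang_ann.npy"

def has_calvin_annotations(split_dir):
    """True if the language annotation file exists under the given split directory."""
    return (Path(split_dir) / ANNOTATION_RELPATH).is_file()

# Example (placeholder path from the README):
# has_calvin_annotations("/path/to/calvin/task_ABC_D/training")
```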

RoboTwin

The UAM code reads the official RoboTwin data format directly. The only extra file this training code requires is a content index that lists each episode hdf5 file and its corresponding instruction file.

The index format follows Robotwin_subset/robotwin_16_subset.jsonl. Despite the .jsonl suffix, the loader reads it with json.load, so the file must be a single JSON list rather than one JSON object per line:

[
{
"data_path": "/path/to/robotwin/task_name/demo_clean/data/episode0.hdf5",
"instruction_path": "/path/to/robotwin/task_name/demo_clean/instructions/episode0.json"
},
{
"data_path": "/path/to/robotwin/task_name/demo_clean/data/episode1.hdf5",
"instruction_path": "/path/to/robotwin/task_name/demo_clean/instructions/episode1.json"
}
]
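An index in this shape can be generated rather than written by hand. The sketch below assumes the directory layout implied by the example paths (a data folder and a sibling instructions folder with matching episode names); build_robotwin_index and write_index are hypothetical helpers and are not shipped with the repository.

```python
import json
from pathlib import Path

def build_robotwin_index(root):
    """Pair each .../data/episodeN.hdf5 with .../instructions/episodeN.json under root."""
    entries = []
    for data_path in sorted(Path(root).rglob("episode*.hdf5")):
        if data_path.parent.name != "data":
            continue  # only consider files that sit directly in a data folder
        instruction_path = data_path.parent.parent / "instructions" / (data_path.stem + ".json")
        if instruction_path.is_file():  # skip episodes without an instruction file
            entries.append({
                "data_path": str(data_path),
                "instruction_path": str(instruction_path),
            })
    return entries

def write_index(entries, out_path):
    """Write a single JSON list so that json.load can read the file back."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)

# Example (placeholder paths from the README):
# write_index(build_robotwin_index("/path/to/robotwin"), "/path/to/robotwin_index.jsonl")
```

Note that sorting is lexicographic, so episode10 is listed before episode2; the order only affects the position of entries in the list.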

Then point ROBOTWIN_JSONL in data/dataset_info.py to that index file. The bundled Robotwin_subset/robotwin_16_subset.jsonl is kept as a 16-task example/index template; update its paths to match your local RoboTwin data root before training.
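Updating the bundled template usually amounts to swapping one path prefix for another. The following is a minimal sketch of that rewrite, assuming every entry is a flat dictionary of string paths as in the example format; reroot_index and reroot_index_file are hypothetical helpers, and the prefix stored in the bundled file is something you need to read from the file yourself.

```python
import json

def reroot_index(entries, old_root, new_root):
    """Replace old_root with new_root at the start of every path in every entry."""
    def swap(path):
        if path.startswith(old_root):
            return new_root + path[len(old_root):]
        return path  # leave paths with a different prefix untouched
    return [{key: swap(value) for key, value in entry.items()} for entry in entries]

def reroot_index_file(src, dst, old_root, new_root):
    """Load a JSON-list index, rewrite its paths, and save the result to dst."""
    with open(src, "r", encoding="utf-8") as f:
        entries = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(reroot_index(entries, old_root, new_root), f, indent=2)
```

Writing to a new file rather than overwriting the template keeps the bundled example intact for comparison.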

Training

Install first:

pip install -r requirements.txt

Pass the pretrained base model path explicitly, then choose a task:

TASK=calvin scripts/train_uam.sh /path/to/BAGEL-7B-MoT
TASK=aloha scripts/train_uam.sh /path/to/BAGEL-7B-MoT
TASK=robotwin scripts/train_uam.sh /path/to/BAGEL-7B-MoT

Useful overrides:

TASK=calvin \
FREEZE_UND=true \
FREEZE_VIT=true \
ACTION_SAMPLE_MODE=beta \
NUM_GPU=8 \
scripts/train_uam.sh /path/to/BAGEL-7B-MoT

The script forwards to train/train_uam.py and writes model_args.json, data_args.json, and training_args.json into the run directory. These files are required by the evaluation wrappers.
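If you launch runs from Python, the same environment-variable interface can be assembled programmatically, and the run directory can be checked afterwards for the three argument files. This is a sketch under stated assumptions: build_train_invocation and missing_arg_files are hypothetical helpers, only the variables shown above are used, and where the run directory lives is not specified in this README.

```python
import os
from pathlib import Path

REQUIRED_ARG_FILES = ("model_args.json", "data_args.json", "training_args.json")

def build_train_invocation(task, base_model, overrides=None, script="scripts/train_uam.sh"):
    """Return (argv, env) for the training script without running it."""
    env = dict(os.environ)
    env["TASK"] = task
    for key, value in (overrides or {}).items():
        # Shell-style booleans are lowercase ("true"/"false"), as in the README example.
        env[key] = str(value).lower() if isinstance(value, bool) else str(value)
    return [script, base_model], env

def missing_arg_files(run_dir):
    """List the required argument files that are absent from run_dir."""
    return [name for name in REQUIRED_ARG_FILES if not (Path(run_dir) / name).is_file()]

argv, env = build_train_invocation(
    "calvin",
    "/path/to/BAGEL-7B-MoT",
    {"FREEZE_UND": True, "FREEZE_VIT": True, "ACTION_SAMPLE_MODE": "beta", "NUM_GPU": 8},
)
# To launch: subprocess.run(argv, env=env, check=True)
# After training: missing_arg_files("/path/to/run_dir") should return an empty list.
```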

Single-Step Inference

The following files are kept as task-specific single-step inference tests:

python infz_calvin_step10_act.py /path/to/checkpoint/model.safetensors \
--basemodel_dir /path/to/BAGEL-7B-MoT \
--test_sample_dir /path/to/calvin_sample_dir

python infz_aloha_act.py /path/to/checkpoint/model.safetensors \
--basemodel_dir /path/to/BAGEL-7B-MoT \
--test_sample_dir /path/to/aloha_sample_dir

python infz_robotwin_act.py /path/to/checkpoint/model.safetensors \
--basemodel_dir /path/to/BAGEL-7B-MoT \
--test_sample_dir...

Excerpt shown — open the source for the full document.
