Repo · ByteDance (Doubao/Seed) · published Nov 10, 2025 · seen 5d

ByteDance-Seed/ConfRover

Python

Description: [NeurIPS 2025] Official Implementation of ConfRover models

Language: Python

License: Apache-2.0

Stars: 26

Forks: 2

Open issues: 1

Created: 2025-11-10T23:52:02Z

Pushed: 2026-02-03T19:32:15Z

Default branch: main

Fork: no

Archived: no

README:

An illustration of protein dynamics generated by MD simulation (left), by sampling from the energy landscape defined by a physical model (right).

ConfRover is a deep generative model that learns to produce dynamic protein trajectories directly from MD data. It generates conformational frames in a trajectory autoregressively, sampling each next frame conditioned on historical context. Building on a causal transformer architecture widely used in language models, ConfRover enables efficient training and jointly learns the distribution of protein conformations and their temporal evolution at coarse time steps, offering a fast proxy for expensive MD simulations.
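The autoregressive generation described above can be pictured as a simple rollout loop. The following is a toy sketch only: `sample_next_frame` is a hypothetical stand-in (a random walk on a single coordinate), not ConfRover's learned model or API. It only illustrates how each new frame is drawn conditioned on the frames generated so far.

```python
import random
from typing import Callable, List

Frame = float  # toy stand-in for a full protein conformation


def sample_next_frame(history: List[Frame], rng: random.Random) -> Frame:
    """Hypothetical sampler: a Gaussian step from the most recent frame.

    A learned model would condition on the whole history; this toy
    version only looks at the last frame.
    """
    return history[-1] + rng.gauss(0.0, 1.0)


def rollout(
    start: Frame,
    n_frames: int,
    sampler: Callable[[List[Frame], random.Random], Frame],
    seed: int = 0,
) -> List[Frame]:
    """Generate a trajectory of n_frames frames, including the start frame."""
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    rng = random.Random(seed)
    trajectory = [start]
    while len(trajectory) < n_frames:
        # Each new frame is sampled conditioned on everything generated so far.
        trajectory.append(sampler(trajectory, rng))
    return trajectory


trajectory = rollout(start=0.0, n_frames=10, sampler=sample_next_frame)
print(len(trajectory))  # 10
```

Passing the full `trajectory` list to the sampler mirrors the "conditioned on historical context" wording; a real model would decide how much of that history to use.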

By modeling different dependency patterns, ConfRover supports various tasks:

| Tasks | Input condition |
| --- | --- |
| Forward simulation | amino acid sequence $\mathbf{s}$, starting frame $\mathbf{x}_1$, number of frames $L$, time interval (stride) $\Delta t$ |
| IID sampling | amino acid sequence $\mathbf{s}$, number of samples $N$ |
| State interpolation | amino acid sequence $\mathbf{s}$, starting frame $\mathbf{x}_1$, ending frame $\mathbf{x}_2$, number of frames $L$, time interval (stride) $\Delta t$ |
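To make the table concrete, here is a small sketch that records which inputs each task needs and reports what is missing. The task keys and field names (`forward_simulation`, `start_frame`, and so on) are illustrative labels chosen for this example, not identifiers from the ConfRover code base.

```python
from typing import Dict, FrozenSet, Set

# Required inputs per task, transcribed from the table above.
# Key and field names are illustrative labels, not ConfRover identifiers.
REQUIRED_INPUTS: Dict[str, FrozenSet[str]] = {
    "forward_simulation": frozenset(
        {"sequence", "start_frame", "n_frames", "stride"}
    ),
    "iid_sampling": frozenset({"sequence", "n_samples"}),
    "state_interpolation": frozenset(
        {"sequence", "start_frame", "end_frame", "n_frames", "stride"}
    ),
}


def missing_inputs(task: str, provided: Set[str]) -> Set[str]:
    """Return the required inputs for `task` that are absent from `provided`."""
    if task not in REQUIRED_INPUTS:
        raise KeyError(f"unknown task: {task!r}")
    return set(REQUIRED_INPUTS[task] - provided)


print(sorted(missing_inputs("state_interpolation", {"sequence", "start_frame"})))
# ['end_frame', 'n_frames', 'stride']
```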

See our paper and website for more details.

(A) ConfRover unifies the learning of conformational distributions and their temporal evolution through frame-level conditioning, enabling multiple tasks. (B) Its general autoregressive formulation captures temporal dependencies directly from data using a causal transformer.
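For readers unfamiliar with causal attention, the sketch below builds a standard frame-level causal mask, in which frame `i` may attend only to frames `j <= i`. It illustrates the general technique and is not taken from ConfRover's implementation; the exact attention layout used by the model is not described in this excerpt.

```python
from typing import List


def causal_mask(n_frames: int) -> List[List[bool]]:
    """mask[i][j] is True when frame i may attend to frame j (j <= i)."""
    return [[j <= i for j in range(n_frames)] for i in range(n_frames)]


# Print the mask: "x" marks an allowed (query frame, key frame) pair.
for row in causal_mask(4):
    print("".join("x" if allowed else "." for allowed in row))
# x...
# xx..
# xxx.
# xxxx
```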

Updates

  • [2025-11] ConfRover v1.0 released!
  • [2025-09] ConfRover is accepted to NeurIPS 2025!

> #### 🔔 Stay Updated
> ConfRover is under active development. Click Watch to follow new updates and releases.

Pretrained models

We provide a list of pretrained models from the ConfRover family:

| Model | Best for | Downloaded Checkpoints |
| --- | --- | --- |
| ConfRover-base-20M-v1.0 | Forward simulation and IID sampling | |
| ConfRover-interp-20M-v1.0 | State interpolation | |

Pretrained model checkpoints can also be downloaded through `ConfRover.from_pretrained(model_name)`.

Quick Start

Installation

# [recommended] use conda environment
conda create -n confrover python=3.10
conda activate confrover

# clone ConfRover repository
git clone https://github.com/ByteDance-Seed/ConfRover.git
cd ConfRover

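# install confrover and its dependencies first, then openfold (which requires torch to be pre-installed)
# the extras spec is quoted so that shells such as zsh do not treat the brackets as a glob pattern
pip install . && pip install --no-build-isolation ".[openfold]"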

> ConfRover has been tested on NVIDIA H100 with CUDA 12.6

Python API

ConfRover is installed as a Python package and provides a simple API for generating conformations or trajectories for a single case. See the following snippet and the `examples/` folder:

from confrover.model import ConfRover

# Load pretrained model
model = ConfRover.from_pretrained("ConfRover-base-20M-v1.0") # see method for optional arguments

# Move to GPU
model.to("cuda:0")

# Task 1: forward simulation
model.generate(
    case_id="6j56_A",
    seqres="ARQREIEMNRQQRFFRIPFIRPADQYKDPQSKKKGWWYAHFDGPWIARQMELHPDKPPILLVAGKDDMEMCELNLEETGLTRKRGAEILPRQFEEIWERCGGIQYLQNAIESRQARPTYATAMLQSLLK",
    task_mode="forward",
    output_dir="/path/to/output/fwd/",
    n_replicates=1,
    n_frames=10,  # total number of frames (including the starting frame)
    stride_in_10ps=256,  # time interval between frames, in units of 10 ps
    conditions="/path/to/examples/6j56_A_start.pdb",  # start frame
)

# Task 2: Independent ensemble sampling
model.generate(
    case_id="6j56_A",
    seqres="ARQREIEMNRQQRFFRIPFIRPADQYKDPQSKKKGWWYAHFDGPWIARQMELHPDKPPILLVAGKDDMEMCELNLEETGLTRKRGAEILPRQFEEIWERCGGIQYLQNAIESRQARPTYATAMLQSLLK",
    task_mode="iid",
    output_dir="/path/to/output/iid/",
    n_replicates=50,  # number of conformation samples
)

# Task 3: interpolating two conformations
model.generate(
    case_id="6j56_A",
    seqres="ARQREIEMNRQQRFFRIPFIRPADQYKDPQSKKKGWWYAHFDGPWIARQMELHPDKPPILLVAGKDDMEMCELNLEETGLTRKRGAEILPRQFEEIWERCGGIQYLQNAIESRQARPTYATAMLQSLLK",
    task_mode="interp",
    output_dir="/path/to/output/interp/",
    n_replicates=5,
    n_frames=9,
    stride_in_10ps=256,
    conditions=[
        "/path/to/examples/6j56_A_start.pdb",
        "/path/to/examples/6j56_A_end.pdb",
    ],
)

The `ConfRover.generate()` method is designed for simple runs or for integrating ConfRover into custom pipelines. For batch generation, we recommend using the command line interface.
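Because `stride_in_10ps` is expressed in units of 10 ps, it can help to convert a setting into physical time. The helper names below are made up for this sketch and are not part of the ConfRover API; the calculation assumes evenly spaced frames and, as stated for the forward task, that `n_frames` counts the starting frame.

```python
PS_PER_STRIDE_UNIT = 10  # stride_in_10ps is expressed in units of 10 ps


def frame_interval_ps(stride_in_10ps: int) -> int:
    """Time between consecutive frames, in picoseconds."""
    return stride_in_10ps * PS_PER_STRIDE_UNIT


def trajectory_span_ps(n_frames: int, stride_in_10ps: int) -> int:
    """Time from the first to the last frame, in picoseconds.

    Assumes n_frames includes the starting frame, so there are
    n_frames - 1 intervals between frames.
    """
    return (n_frames - 1) * frame_interval_ps(stride_in_10ps)


# Forward-simulation example above: n_frames=10, stride_in_10ps=256
print(frame_interval_ps(256) / 1000)       # 2.56 (ns between frames)
print(trajectory_span_ps(10, 256) / 1000)  # 23.04 (ns from first to last frame)
```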

Command line interface

ConfRover provides a command line interface for parallel generation over multiple GPUs. A `.json` manifest file is required to specify the generation tasks and cases.

confrover generate \
--job_config \
--output \
--model \
[...]

# See `confrover generate --help` for detailed arguments.

Input manifest format

ConfRover uses JSON files to define generation tasks for forward simulation, interpolation, and IID sampling. The file specifies basic dataset information and a list of cases, each describing the protein name, amino acid sequence, and optional conditioning frames for trajectory generation. Conditioning frames can be provided as conformations in .pdb files or as specific frames in an .xtc trajectory file (using frame indices). The following examples show the format for defining each type of generation job.

1. Forward simulation: generate protein motion trajectories from an initial conformation ("condition") at a specified stride.

{
  "name": "job_name",
  "task_mode": "forward",
  "n_replicates": 1, // number of replicate trajectories for each case
  "n_frames": 100, // number of frames in each generated trajectory (including the conditioning frame)
  "stride_in_10ps": 120, // interval between frames, in units of 10 ps
  "cases": [
    // Option 1: starting from a .pdb file
    {
      "case_id": "7jfl_C", // case_id must be unique
      "seqres": "SALQDLLRTLKSPSSPQQQQQVLNILKSNPQLMAAFIKQRTAKYVAN", // amino acid sequence
      "conditions": "/path/to/7jfl_C.pdb" // path to the starting .pdb file
    },
    // Option 2: starting from a time frame defined in a .xtc file
    {
      "case_id": "7lp1_A",
      "seqres": "VTQSFLPPGWEMRIAPNGRPFFIDHNTKTTTWEDPRLKF",
      "conditions": {…

Excerpt shown — open the source for the full document.
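As a sketch of how a forward-simulation manifest with the fields shown above could be assembled from Python: `build_forward_manifest` is a hypothetical helper written for this example, not part of ConfRover, and it covers only the `.pdb` option because the `.xtc` form is cut off in this excerpt. The README example uses `//` comments, which strict JSON does not allow; `json.dumps` emits none, and whether ConfRover's loader accepts comments is not stated here.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def build_forward_manifest(
    name: str,
    cases: List[Dict[str, str]],
    n_replicates: int = 1,
    n_frames: int = 100,
    stride_in_10ps: int = 120,
) -> dict:
    """Assemble a forward-simulation manifest using the fields shown above."""
    case_ids = [case["case_id"] for case in cases]
    if len(case_ids) != len(set(case_ids)):
        raise ValueError("case_id values must be unique")
    return {
        "name": name,
        "task_mode": "forward",
        "n_replicates": n_replicates,
        "n_frames": n_frames,
        "stride_in_10ps": stride_in_10ps,
        "cases": cases,
    }


manifest = build_forward_manifest(
    name="job_name",
    cases=[
        {
            "case_id": "7jfl_C",
            "seqres": "SALQDLLRTLKSPSSPQQQQQVLNILKSNPQLMAAFIKQRTAKYVAN",
            "conditions": "/path/to/7jfl_C.pdb",  # placeholder path
        }
    ],
)

# Write to a temporary directory; the file name is arbitrary.
manifest_path = Path(tempfile.mkdtemp()) / "forward_job.json"
manifest_path.write_text(json.dumps(manifest, indent=2))
print(manifest_path.read_text())
```

Checking `case_id` uniqueness up front reflects the "case_id must be unique" note in the example manifest.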

Notability

notability 3.0/10

Low stars, routine new repo.