ByteDance-Seed/ConfRover
Python
Description: [NeurIPS 2025] Official Implementation of ConfRover models
Language: Python
License: Apache-2.0
Stars: 26
Forks: 2
Open issues: 1
Created: 2025-11-10T23:52:02Z
Pushed: 2026-02-03T19:32:15Z
Default branch: main
Fork: no
Archived: no
README:
An illustration of protein dynamics generated by MD simulation (left), by sampling from the energy landscape defined by a physical model (right).
ConfRover is a deep generative model that learns to produce dynamic protein trajectories directly from MD data. It generates conformational frames in a trajectory autoregressively, sampling each next frame conditioned on historical context. Building on a causal transformer architecture widely used in language models, ConfRover enables efficient training and jointly learns the distribution of protein conformations and their temporal evolution at coarse time steps, offering a fast proxy for expensive MD simulations.
By modeling different dependency patterns, ConfRover supports various tasks:
| Tasks | Input condition |
| --- | --- |
| Forward simulation | amino acid sequence $\mathbf{s}$, starting frame $\mathbf{x}_1$, number of frames $L$, time interval (stride) $\Delta t$ |
| IID sampling | amino acid sequence $\mathbf{s}$, number of samples $N$ |
| State interpolation | amino acid sequence $\mathbf{s}$, starting frame $\mathbf{x}_1$, ending frame $\mathbf{x}_2$, number of frames $L$, time interval (stride) $\Delta t$ |
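The table implies that each task takes a fixed number of conditioning frames: one for forward simulation, none for IID sampling, and two for state interpolation. A minimal sketch of that mapping follows. The helper `normalize_conditions` is hypothetical and not part of the `confrover` package; the task names reuse the `task_mode` strings from the Python API examples in this README.

```python
# Number of conditioning frames each task takes, read off the table above.
# Keys reuse the task_mode strings from the README's Python API examples.
N_CONDITIONING_FRAMES = {"forward": 1, "iid": 0, "interp": 2}


def normalize_conditions(task_mode, conditions=None):
    """Hypothetical helper (not part of confrover).

    Returns the conditioning frames as a list and checks that the count
    matches what the task expects.
    """
    if task_mode not in N_CONDITIONING_FRAMES:
        raise ValueError(f"unknown task_mode: {task_mode!r}")
    if conditions is None:
        frames = []
    elif isinstance(conditions, str):
        frames = [conditions]  # a single path, as in the forward example
    else:
        frames = list(conditions)
    expected = N_CONDITIONING_FRAMES[task_mode]
    if len(frames) != expected:
        raise ValueError(
            f"{task_mode!r} expects {expected} conditioning frame(s), got {len(frames)}"
        )
    return frames


print(normalize_conditions("forward", "start.pdb"))
print(normalize_conditions("interp", ["start.pdb", "end.pdb"]))
```

A check like this can catch a mismatched task and input before a model is loaded.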
See our paper and website for more details.
(A) ConfRover unifies the learning of conformational distributions and their temporal evolution through frame-level conditioning, enabling multiple tasks. (B) Its general autoregressive formulation captures temporal dependencies directly from data using a causal transformer.
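The autoregressive formulation in (B) can be pictured as a loop that appends one frame at a time, each sampled given the frames so far. The sketch below only illustrates that control flow. `sample_next_frame` is a hypothetical stand-in (a Gaussian random walk on a toy coordinate list), not the ConfRover network, and it ignores everything in the history except the latest frame.

```python
import random


def sample_next_frame(history, rng, step_size=0.1):
    """Hypothetical stand-in for a learned conditional sampler.

    A real model would condition on the whole history; this toy version
    only perturbs the most recent frame with Gaussian noise.
    """
    last = history[-1]
    return [coord + rng.gauss(0.0, step_size) for coord in last]


def rollout(start_frame, n_frames, seed=0):
    """Generate n_frames frames (including the starting frame), one at a time."""
    rng = random.Random(seed)
    frames = [list(start_frame)]
    while len(frames) < n_frames:
        # Each new frame is sampled conditioned on the frames generated so far.
        frames.append(sample_next_frame(frames, rng))
    return frames


trajectory = rollout(start_frame=[0.0, 0.0, 0.0], n_frames=5)
print(len(trajectory))  # 5
```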
Updates
- [2025-11] ConfRover v1.0 released!
- [2025-09] ConfRover is accepted to NeurIPS 2025!
> #### 🔔 Stay Updated
> ConfRover is under active development. Click Watch to follow new updates and releases.
Pretrained models
We provide a list of pretrained models from the ConfRover family:
| Model | Best for | Downloaded Checkpoints |
| --- | --- | --- |
| ConfRover-base-20M-v1.0 | Forward simulation and IID sampling | |
| ConfRover-interp-20M-v1.0 | State interpolation | |
Pretrained model checkpoints can also be downloaded through `ConfRover.from_pretrained(model_name)`.
Quick Start
Installation
# [recommended] use conda environment
conda create -n confrover python=3.10
conda activate confrover
# clone ConfRover repository
git clone https://github.com/ByteDance-Seed/ConfRover.git
cd ConfRover
# first install confrover and other dependencies, then openfold (requires torch pre-installed)
pip install . && pip install --no-build-isolation .[openfold]
> ConfRover has been tested on NVIDIA H100 with CUDA 12.6
Python API
ConfRover is installed as a Python package and provides a simple API for generating conformations or trajectories for a single case. See the following snippet and the `examples/` folder:
from confrover.model import ConfRover
# Load pretrained model
model = ConfRover.from_pretrained("ConfRover-base-20M-v1.0") # see method for optional arguments
# Move to GPU
model.to("cuda:0")
# Task 1: forward simulation
model.generate(
    case_id="6j56_A",
    seqres="ARQREIEMNRQQRFFRIPFIRPADQYKDPQSKKKGWWYAHFDGPWIARQMELHPDKPPILLVAGKDDMEMCELNLEETGLTRKRGAEILPRQFEEIWERCGGIQYLQNAIESRQARPTYATAMLQSLLK",
    task_mode="forward",
    output_dir="/path/to/output/fwd/",
    n_replicates=1,
    n_frames=10,  # total number of frames (including the starting frame)
    stride_in_10ps=256,  # time interval between frames in the unit of 10 ps.
    conditions="/path/to/examples/6j56_A_start.pdb",  # start frame
)
# Task 2: Independent ensemble sampling
model.generate(
    case_id="6j56_A",
    seqres="ARQREIEMNRQQRFFRIPFIRPADQYKDPQSKKKGWWYAHFDGPWIARQMELHPDKPPILLVAGKDDMEMCELNLEETGLTRKRGAEILPRQFEEIWERCGGIQYLQNAIESRQARPTYATAMLQSLLK",
    task_mode="iid",
    output_dir="/path/to/output/iid/",
    n_replicates=50,  # number of conformation samples
)
# Task 3: interpolating two conformations
model.generate(
    case_id="6j56_A",
    seqres="ARQREIEMNRQQRFFRIPFIRPADQYKDPQSKKKGWWYAHFDGPWIARQMELHPDKPPILLVAGKDDMEMCELNLEETGLTRKRGAEILPRQFEEIWERCGGIQYLQNAIESRQARPTYATAMLQSLLK",
    task_mode="interp",
    output_dir="/path/to/output/interp/",
    n_replicates=5,
    n_frames=9,
    stride_in_10ps=256,
    conditions=[
        "/path/to/examples/6j56_A_start.pdb",
        "/path/to/examples/6j56_A_end.pdb",
    ],
)

The `ConfRover.generate()` method is designed for simple runs or for integrating ConfRover into custom pipelines. For batch generation, we recommend using the command line interface.
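The `stride_in_10ps` and `n_frames` arguments in the forward-simulation call above translate into physical time by simple arithmetic: a stride of 256 means 256 × 10 ps = 2.56 ns between frames, and 10 frames (including the starting frame) span 9 intervals. The helper below is a hypothetical convenience for that conversion, not part of the `confrover` API, and it assumes the frame count includes the starting frame as stated for the forward task.

```python
PS_PER_STRIDE_UNIT = 10  # stride_in_10ps counts in units of 10 ps


def trajectory_timing(n_frames, stride_in_10ps):
    """Hypothetical helper: return (interval_ns, span_ns) for a trajectory.

    Assumes n_frames includes the starting frame, so there are
    n_frames - 1 intervals between the first and last frame.
    """
    interval_ps = stride_in_10ps * PS_PER_STRIDE_UNIT
    span_ps = interval_ps * (n_frames - 1)
    return interval_ps / 1000, span_ps / 1000


interval_ns, span_ns = trajectory_timing(n_frames=10, stride_in_10ps=256)
print(interval_ns, span_ns)  # 2.56 23.04
```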
Command line interface
ConfRover provides a command line interface for parallel generation over multiple GPUs. A `.json` manifest file is required to specify the generation tasks and cases.
confrover generate \
    --job_config \
    --output \
    --model \
    [...]
# See `confrover generate --help` for detailed arguments.
Input manifest format
ConfRover uses JSON files to define generation tasks for forward simulation, interpolation, and IID sampling. The file specifies basic dataset information and a list of cases, each describing the protein name, amino acid sequence, and optional conditioning frames for trajectory generation. Conditioning frames can be provided as conformations in .pdb files or as specific frames in an .xtc trajectory file (using frame indices). The following examples show the format for defining each type of generation job.
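A manifest of this kind can also be assembled from Python with the standard `json` module. The sketch below is a hedged illustration only: `build_forward_manifest` is a hypothetical helper, and the field names and default values are copied from the forward-simulation example in this README rather than from a schema. The output is plain JSON without `//` comments.

```python
import json


def build_forward_manifest(name, cases, n_replicates=1, n_frames=100, stride_in_10ps=120):
    """Hypothetical helper (not part of confrover): build a forward-simulation manifest.

    Field names follow the README's forward example. Each case is a dict with
    "case_id", "seqres", and "conditions".
    """
    case_ids = [case["case_id"] for case in cases]
    if len(set(case_ids)) != len(case_ids):
        raise ValueError("case_id must be unique")
    return {
        "name": name,
        "task_mode": "forward",
        "n_replicates": n_replicates,
        "n_frames": n_frames,
        "stride_in_10ps": stride_in_10ps,
        "cases": list(cases),
    }


manifest = build_forward_manifest(
    "job_name",
    [
        {
            "case_id": "7jfl_C",
            "seqres": "SALQDLLRTLKSPSSPQQQQQVLNILKSNPQLMAAFIKQRTAKYVAN",
            "conditions": "/path/to/7jfl_C.pdb",  # placeholder path from the README
        }
    ],
)
print(json.dumps(manifest, indent=2))
```

Checking `case_id` uniqueness up front mirrors the constraint noted in the example manifest.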
1. Forward simulation: generate protein motion trajectories from an initial conformation ("condition") at a specified stride.
{
    "name": "job_name",
    "task_mode": "forward",
    "n_replicates": 1, // number of replicated trajectories for each case.
    "n_frames": 100, // number of frames in each generated trajectory (including the conditioning frame).
    "stride_in_10ps": 120, // interval between frames in the unit of 10 ps.
    "cases": [
        // Option 1: starting from a .pdb file
        {
            "case_id": "7jfl_C", // case_id must be unique
            "seqres": "SALQDLLRTLKSPSSPQQQQQVLNILKSNPQLMAAFIKQRTAKYVAN", // amino acid sequence
            "conditions": "/path/to/7jfl_C.pdb" // path to the starting .pdb file
        },
        // Option 2: starting from a time frame defined in a .xtc file
        {
            "case_id": "7lp1_A",
            "seqres": "VTQSFLPPGWEMRIAPNGRPFFIDHNTKTTTWEDPRLKF",
            "conditions": {…