RepoDeepSeekDeepSeekpublished Feb 26, 2025seen 6d

deepseek-ai/DualPipe

Python

Open original ↗

Captured source

source ↗
published Feb 26, 2025seen 6dcaptured 8hhttp 200method plain

deepseek-ai/DualPipe

Description: A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Language: Python

License: MIT

Stars: 2965

Forks: 326

Open issues: 5

Created: 2025-02-26T13:29:57Z

Pushed: 2026-01-14T06:34:59Z

Default branch: main

Fork: no

Archived: no

README:

DualPipe

DualPipe is an innovative bidirectional pipeline parallelism algorithm introduced in the DeepSeek-V3 Technical Report. It achieves full overlap of forward and backward computation-communication phases, also reducing pipeline bubbles. For detailed information on computation-communication overlap, please refer to the profile data.

Schedules

![dualpipe](images/dualpipe.png)

Example DualPipe scheduling for 8 PP ranks and 20 micro-batches in two directions. The micro-batches in the reverse direction are symmetric to those in the forward direction, so we omit their batch ID for illustration simplicity. Two cells enclosed by a shared black border have mutually overlapped computation and communication

DualPipeV

DualPipeV is a concise V-shape schedule derived from DualPipe using a "cut-in-half" procedure, introduced by Sea AI Lab as "Cut-in-half" in their blog post. Thanks to them for this efficient schedule!

Schedules

![dualpipev](images/dualpipev.png)

Example DualPipeV scheduling for 4 PP ranks (8 PP stages) and 10 micro-batches.

Pipeline Bubbles and Memory Usage Comparison (based on the same number of PP stages)

| Method | Bubble | Parameter Per Device | Activation Per Device | #Devices | |-------------|---------------------------------|----------------------|-----------------------|----------| | 1F1B | (*PP*-1)(𝐹+𝐵) | 1× | *PP* | *PP* | | ZB1P | (*PP*-1)(𝐹+𝐵-2𝑊) | 1× | *PP* | *PP* | | DualPipe | (*PP*/2-1)(𝐹&𝐵+𝐵-3𝑊) | 2× | *PP*+1 | *PP* | | DualPipeV | (*PP*/2-1)(𝐹&𝐵+𝐵-3𝑊) | 2× | *PP*+1 | *PP*/2 |

*PP* denotes the number of pp stages (even). 𝐹 denotes the execution time of a forward chunk, 𝐵 denotes the execution time of a full backward chunk, 𝑊 denotes the execution time of a "backward for weights" chunk, and 𝐹&𝐵 denotes the execution time of two mutually overlapped forward and backward chunks.

Quick Start

The usage is shown in the following example:

python examples/example_dualpipe.py
python examples/example_dualpipev.py

Note: For real-world applications, you will need to implement a custom overlapped_forward_backward method tailored to your specific module.

Requirements

  • PyTorch 2.0 and above

Developers

DualPipe was created and developed by Jiashi Li and Chengqi Deng and Wenfeng Liang.

Citation

@misc{deepseekai2025deepseekv3technicalreport,
title={DeepSeek-V3 Technical Report},
author={DeepSeek-AI},
year={2025},
eprint={2412.19437},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.19437},
}

Notability

notability 7.0/10

Notable repo with strong community traction.