Fork · Baseten · published Feb 6, 2026 · seen 5d

basetenlabs/ms-swift

forked from modelscope/ms-swift

Description: Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...) (AAAI 2025).

Language: Python

License: Apache-2.0

Stars: 1

Forks: 0

Open issues: 0

Created: 2026-02-06T18:05:47Z

Pushed: 2026-05-29T17:15:54Z

Default branch: main

Fork: yes

Parent repository: modelscope/ms-swift

Archived: no

README:

SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning)

ModelScope Community Website

中文 | English

Paper | English Documentation | Chinese Documentation

📖 Table of Contents

  • [Groups](#-Groups)
  • [Introduction](#-introduction)
  • [News](#-news)
  • [Installation](#%EF%B8%8F-installation)
  • [Quick Start](#-quick-Start)
  • [Usage](#-Usage)
  • [License](#-License)
  • [Citation](#-citation)

☎ Groups

You can contact us and join the discussion through our groups:

Discord Group | WeChat Group

📝 Introduction

🍲 ms-swift is a fine-tuning and deployment framework for large models and multimodal large models, provided by the ModelScope community. It supports training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment for 600+ text-only large models and 400+ multimodal large models. Text-only models include Qwen3, Qwen3.5, InternLM3, GLM4.5, Mistral, DeepSeek-R1, and Llama4, among others. Multimodal models include Qwen3-VL, Qwen3-Omni, Llava, InternVL3.5, MiniCPM-V-4, Ovis2.5, GLM4.5-V, and DeepSeek-VL2, among others.

🍔 In addition, ms-swift integrates recent training technologies: Megatron parallelism techniques such as TP, PP, CP, and EP to accelerate training, and reinforcement learning algorithms from the GRPO family, including GRPO, DAPO, GSPO, SAPO, CISPO, RLOO, and Reinforce++, to enhance model intelligence. It supports a wide range of training tasks, including preference learning algorithms such as DPO, KTO, RM, CPO, SimPO, and ORPO, as well as Embedding, Reranker, and sequence classification tasks. ms-swift also provides full-pipeline support for large model training: the inference, evaluation, and deployment modules are accelerated with vLLM, SGLang, and LMDeploy, and models can be quantized with GPTQ, AWQ, BNB, and FP8.
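
To make the "GRPO family" mentioned above more concrete, here is a minimal sketch of the group-relative advantage that these algorithms build on: several completions are sampled for one prompt, each gets a scalar reward, and every reward is normalized against its own group. This is an illustration only and does not use ms-swift code. The function name `group_relative_advantages`, the `eps` value, and the choice of population standard deviation are assumptions made for this example.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's completions against their group.

    Hypothetical helper for illustration; not part of ms-swift.
    `rewards` holds one scalar reward per sampled completion.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mean = fmean(rewards)
    # Assumption: population standard deviation; real implementations may
    # use the sample standard deviation or skip the scaling entirely.
    std = pstdev(rewards)
    # eps keeps the division finite when every completion gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for the same prompt: two judged correct, two incorrect.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Completions that beat the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.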

Why Choose ms-swift?

  • 🍎 Model Types: Supports 600+ text-only large models, 400+ multimodal large models, and All-to-All full-modality models across the full pipeline from training to deployment, with Day-0 support for popular models.
  • Dataset Types: Includes 150+ built-in datasets for pre-training, fine-tuning, human alignment, multimodal, and other tasks, and supports custom datasets. Users only need to prepare a dataset for one-click training.
  • Hardware Support: Supports A10/A100/H100, RTX series, T4/V100, CPU, MPS, and Chinese domestic hardware such as Ascend NPU.
  • Lightweight Training: Supports lightweight fine-tuning methods such as LoRA, QLoRA, DoRA, LoRA+, LLaMAPro, LongLoRA, LoRA-GA, ReFT, RS-LoRA, Adapter, LISA, etc.
  • Quantized Training: Supports training on BNB, AWQ, GPTQ, AQLM, HQQ, and EETQ quantized models, requiring only 9GB of training resources for 7B models.
  • Memory Optimization: Supports GaLore, Q-GaLore, Unsloth, Liger-Kernel, Flash-Attention 2/3, and Ulysses and Ring-Attention sequence parallelism, reducing memory consumption for long-text training.
  • Distributed Training: Supports distributed data parallelism (DDP), device_map simple model parallelism, DeepSpeed ZeRO-2/ZeRO-3, FSDP/FSDP2, and Megatron distributed training technologies.
  • 🍓 Multimodal Training: Supports multimodal packing technology to improve training speed by 100%+, supports mixed modality data training with text, images, video and audio, and supports independent control of vit/aligner/llm.
  • Agent Training: Supports Agent templates, allowing one dataset to be used for training different models.
  • 🍊 Training Tasks: Supports pre-training and instruction fine-tuning, as well as training tasks such as DPO, GKD, KTO, RM, CPO, SimPO, ORPO, and supports Embedding/Reranker and sequence classification tasks.
  • 🥥 Megatron Parallelism: Provides TP/PP/SP/CP/ETP/EP/VPP parallel strategies to significantly boost MoE model training speed. Supports full-parameter and LoRA training methods for 300+ pure text large models and 100+ multimodal large models. Supports CPT/SFT/GRPO/DPO/KTO/RM training tasks.
  • 🍉 Reinforcement Learning: Provides a rich set of built-in GRPO-family algorithms, including GRPO, DAPO, GSPO, SAPO, CISPO, CHORD, RLOO, and Reinforce++. Supports synchronous and asynchronous vLLM engine inference acceleration; reward functions, multi-turn inference schedulers, and environments can be extended through plugins.
  • Full-Pipeline Capabilities: Covers the entire workflow of training, inference, evaluation, quantization, and deployment.
  • UI Training: Provides a Web-UI for training, inference, evaluation, and quantization, covering the full pipeline for large models.
  • Inference Acceleration: Supports the Transformers, vLLM, SGLang, and LMDeploy inference engines, and provides OpenAI interfaces to accelerate the inference, deployment, and evaluation modules.
  • Model Evaluation: Uses EvalScope as the evaluation backend, supporting 100+ evaluation datasets for evaluating text-only and multimodal models.
  • Model Quantization: Supports quantization export for AWQ, GPTQ, FP8, and BNB. Exported models support inference acceleration using vLLM/SGLang/LMDeploy.
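
The "Lightweight Training" and "Quantized Training" items above rest on simple arithmetic, sketched below. A LoRA adapter of rank r on a d_in × d_out weight matrix trains r × (d_in + d_out) parameters instead of d_in × d_out, and quantized weights take bits / 8 bytes per parameter. The helper names and the example sizes (a 4096 × 4096 matrix, rank 8, 7e9 parameters at 4 bits) are assumptions for illustration, not ms-swift defaults. The estimate covers weights only, so it is not meant to reproduce the 9GB figure quoted in the list.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora) trainable parameter counts for one weight matrix.

    Hypothetical helper for illustration; not part of ms-swift.
    LoRA replaces the dense update with two low-rank factors:
    A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


def weight_memory_gib(n_params, bits_per_param):
    """Memory needed to store the weights alone, in GiB.

    Activations, adapter weights, gradients, and optimizer state are
    deliberately left out of this estimate.
    """
    return n_params * bits_per_param / 8 / 2**30


full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=8)
print(f"full: {full:,} params, LoRA: {lora:,} params ({lora / full:.2%})")

for bits in (16, 8, 4):
    print(f"7e9 params at {bits}-bit: {weight_memory_gib(7e9, bits):.2f} GiB")
```

Under these assumptions the adapter trains well under 1% of the matrix's parameters, which is why quantizing the frozen base weights dominates the memory savings.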

🎉 News

  • 🎁 2026.03.03: ms-swift v4.0 major version is officially released. For release notes, please refer to here. You can provide your suggestions to us in this issue. Thank you for your support.
  • 🎁 2025.11.14: Megatron GRPO is now available! Check out the [docs](./docs/source_en/Megatron-SWIFT/GRPO.md) and [examples](examples/megatron/grpo).
  • 🎁 2025.11.04: Support for [Mcore-Bridge](docs/source_en/Megatron-SWIFT/Mcore-Bridge.md), making Megatron training as simple and easy to use as transformers.
  • 🎁 2025.10.28: Ray support is available; see [here](docs/source_en/Instruction/Ray.md).
  • 🎁 2025.09.07: Added support for CHORD training…
