What does this fork signal mean?

Nous Research forked linkedin/Liger-Kernel as NousResearch/Liger-Kernel. This fork signal points to upstream code the lab may be inspecting, patching, or building on. High-signal details: repo NousResearch/Liger-Kernel · parent linkedin/Liger-Kernel · trivial fork with minimal traction. onlylabs links this event to 1 captured evidence page and 6 related fork signals.

Nous Research Fork: NousResearch/Liger-Kernel

Captured source

GitHub · github.com/NousResearch/Liger-Kernel

NousResearch/Liger-Kernel repository metadata

published Oct 15, 2025 · seen Jun 6 · captured Jun 11 · http 200 · method plain

NousResearch/Liger-Kernel

Description: Efficient Triton Kernels for LLM Training

License: BSD-2-Clause

Stars: 5

Forks: 2

Open issues: 0

Created: 2025-10-15T15:54:49Z

Pushed: 2025-10-16T02:57:26Z

Default branch: main

Fork: yes

Parent repository: linkedin/Liger-Kernel

Archived: no

README:

Liger Kernel: Efficient Triton Kernels for LLM Training

Latest News 🔥

[2025/03/06] We release a joint blog post on TorchTune × Liger - Peak Performance, Minimized Memory: Optimizing torchtune’s performance with torch.compile & Liger Kernel
[2024/12/11] We release v0.5.0: 80% more memory efficient post training losses (DPO, ORPO, CPO, etc)!
[2024/12/5] We release LinkedIn Engineering Blog - Liger-Kernel: Empowering an open source ecosystem of Triton Kernels for Efficient LLM Training
[2024/11/6] We release v0.4.0: Full AMD support, Tech Report, Modal CI, Llama-3.2-Vision!
[2024/10/21] We have released the tech report of Liger Kernel on Arxiv: https://arxiv.org/pdf/2410.10989
[2024/9/6] We release v0.2.1 (X post). 2500+ Stars, 10+ New Contributors, 50+ PRs, 50k Downloads in two weeks!
[2024/8/31] CUDA MODE talk, Liger-Kernel: Real-world Triton kernel for LLM Training, Slides
[2024/8/23] Official release: check out our X post

Liger Kernel is a collection of Triton kernels designed specifically for LLM training. It can increase multi-GPU training throughput by 20% and reduce memory usage by 60%. We have implemented Hugging Face-compatible RMSNorm, RoPE, SwiGLU, CrossEntropy, and FusedLinearCrossEntropy, with more to come. The kernels work out of the box with Flash Attention, PyTorch FSDP, and Microsoft DeepSpeed. We welcome contributions from the community to gather the best kernels for LLM training.

We've also added optimized Post-Training kernels that deliver up to 80% memory savings for alignment and distillation tasks. We support losses like DPO, CPO, ORPO, SimPO, KTO, JSD, and many more. Check out how we optimize the memory.

You can view the documentation site for additional installation, usage examples, and API references: https://linkedin.github.io/Liger-Kernel/

Supercharge Your Model with Liger Kernel

With one line of code, Liger Kernel can increase throughput by more than 20% and reduce memory usage by 60%, thereby enabling longer context lengths, larger batch sizes, and massive vocabularies.

[Figures: Speed Up · Memory Reduction]

> Note:
> - Benchmark conditions: LLaMA 3-8B, Batch Size = 8, Data Type = bf16, Optimizer = AdamW, Gradient Checkpointing = True, Distributed Strategy = FSDP1 on 8 A100s.
> - Hugging Face models start to OOM at a 4K context length, whereas Hugging Face + Liger Kernel scales up to 16K.

Optimize Post Training with Liger Kernel

We provide optimized post-training kernels for losses such as DPO, ORPO, and SimPO, which can reduce memory usage by up to 80%. You can easily use them as Python modules.

from liger_kernel.chunked_loss import LigerFusedLinearORPOLoss
orpo_loss = LigerFusedLinearORPOLoss()
y = orpo_loss(lm_head.weight, x, target)
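
The call above takes `lm_head.weight` and the hidden states rather than precomputed logits, and the module is named `chunked_loss`, which suggests the projection and the loss are computed together in chunks. The following is a minimal pure-Python sketch of that general idea, not Liger Kernel's implementation: it uses plain cross-entropy as a stand-in for ORPO, and every function name and value in it is hypothetical. It shows that looping over token chunks gives the same loss as materializing all logits at once, while holding only `chunk_size × V` logits at a time.

```python
import math

def linear(weight, x):
    """Project one hidden vector (length H) through weight (V rows of length H)."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weight]

def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits), computed stably."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def full_loss(weight, hidden, targets):
    """Baseline: build logits for every token first (N x V values alive at once)."""
    all_logits = [linear(weight, x) for x in hidden]
    return sum(cross_entropy(l, t) for l, t in zip(all_logits, targets)) / len(targets)

def chunked_loss(weight, hidden, targets, chunk_size):
    """Chunked variant: at most chunk_size x V logits exist at any moment."""
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        chunk_logits = [linear(weight, x) for x in hidden[start:start + chunk_size]]
        chunk_targets = targets[start:start + chunk_size]
        total += sum(cross_entropy(l, t) for l, t in zip(chunk_logits, chunk_targets))
    return total / len(targets)

# Toy data (hypothetical): vocabulary V=3, hidden size H=2, N=5 tokens.
weight = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.5]]
hidden = [[1.0, 2.0], [0.5, -1.0], [-2.0, 0.3], [0.0, 1.5], [1.2, 1.2]]
targets = [0, 2, 1, 2, 0]

reference = full_loss(weight, hidden, targets)
chunked = chunked_loss(weight, hidden, targets, chunk_size=2)
print(math.isclose(reference, chunked, rel_tol=1e-9))
```

With a large vocabulary the logits tensor is often the biggest activation in the training step, so bounding it by the chunk size is where this kind of approach would save memory.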

Examples

| Use Case | Description |
|---|---|
| **Hugging Face Trainer** | Train LLaMA 3-8B ~20% faster with over 40% memory reduction on Alpaca dataset using 4 A100s with FSDP |
| **Lightning Trainer** | Increase 15% throughput and reduce memory usage by 40% with LLaMA3-8B on MMLU dataset using 8 A100s with DeepSpeed ZeRO3 |
| **Medusa Multi-head LLM (Retraining Phase)** | Reduce memory usage by 80% with 5 LM heads and improve throughput by 40% using 8 A100s with FSDP |
| **Vision-Language Model SFT** | Finetune Qwen2-VL on image-text data using 4 A100s with FSDP |
| **Liger ORPO Trainer** | Align Llama 3.2 using Liger ORPO Trainer with FSDP with 50% memory reduction |

Key Features

Ease of use: Simply patch your Hugging Face model with one line of code, or compose your own model using our Liger Kernel modules.
Time and memory efficient: In the same spirit as Flash-Attn, but for layers like RMSNorm, RoPE, SwiGLU, and CrossEntropy! Increases multi-GPU training throughput by 20% and reduces memory usage by 60% with kernel fusion, in-place replacement, and chunking techniques.
Exact: Computation is exact—no approximations! Both forward and backward passes are implemented with rigorous unit tests and undergo convergence testing against training runs without Liger Kernel to ensure accuracy.
Lightweight: Liger Kernel has minimal dependencies, requiring only Torch and Triton—no extra libraries needed! Say goodbye to dependency...
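
The "one line of code" patching and the "exact" claim in the list above can be illustrated with a small pure-Python sketch. This is not Liger Kernel's code: `ReferenceRMSNorm`, `FastRMSNorm`, `modeling`, and `apply_fast_kernels` are hypothetical stand-ins, assuming the patch works by swapping a layer class in the modeling module before the model is built. The check at the end mirrors the idea of testing a replacement layer against the unpatched reference.

```python
import math
import types

class ReferenceRMSNorm:
    """Stand-in for a framework's default RMSNorm layer (hypothetical)."""
    def __init__(self, weight, eps=1e-6):
        self.weight, self.eps = weight, eps

    def __call__(self, x):
        mean_sq = sum(v * v for v in x) / len(x)
        return [w * v / math.sqrt(mean_sq + self.eps) for w, v in zip(self.weight, x)]

class FastRMSNorm(ReferenceRMSNorm):
    """Stand-in for an optimized drop-in layer: same math, scale computed once."""
    def __call__(self, x):
        inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + self.eps)
        return [w * v * inv_rms for w, v in zip(self.weight, x)]

# Hypothetical "modeling module": layer classes are looked up here at build time.
modeling = types.SimpleNamespace(RMSNorm=ReferenceRMSNorm)

def build_norm(weight):
    """Construct whichever RMSNorm class the modeling module currently exposes."""
    return modeling.RMSNorm(weight)

def apply_fast_kernels():
    """The single call a user would make: swap the class before building the model."""
    modeling.RMSNorm = FastRMSNorm

x, weight = [1.0, -2.0, 3.0], [0.5, 1.0, 2.0]
baseline = build_norm(weight)(x)      # output of the unpatched layer

apply_fast_kernels()                  # the "one line" patch
patched_layer = build_norm(weight)
patched = patched_layer(x)

# The replacement must agree with the reference up to floating-point rounding.
print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(baseline, patched)))
```

Because the swap happens at the class level, model-building code does not need to change. Only the order matters: the patch has to run before the layers are instantiated.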

Excerpt shown — open the source for the full document.

Notability

notability 1.0/10

Trivial fork with minimal traction