Repo · DeepSeek · published Jan 12, 2026

deepseek-ai/Engram


Description: Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

Language: Python

License: Apache-2.0

Stars: 4447

Forks: 340

Open issues: 20

Created: 2026-01-12T05:26:50Z

Pushed: 2026-01-14T01:13:02Z

Default branch: main

Fork: no

Archived: no

README:

1. Introduction

This repository contains the official implementation for the paper: [Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models](Engram_paper.pdf).

> Abstract: While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup. To address this, we explore conditional memory as a complementary sparsity axis, instantiated via Engram, a module that modernizes classic $N$-gram embeddings for $\mathcal{O}(1)$ lookup.

Key Contributions:

  • Sparsity Allocation: We formulate the trade-off between neural computation (MoE) and static memory (Engram), identifying a U-shaped scaling law that guides optimal capacity allocation.
  • Empirical Verification: Under strict iso-parameter and iso-FLOPs constraints, the Engram-27B model demonstrates consistent improvements over MoE baselines across knowledge, reasoning, code and math domains.
  • Mechanistic Analysis: Our analysis suggests that Engram relieves early layers from static pattern reconstruction, potentially preserving effective depth for complex reasoning.
  • System Efficiency: The module employs deterministic addressing, enabling the offloading of massive embedding tables to host memory with minimal inference overhead.
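To make the idea of deterministic addressing concrete, here is a minimal sketch, not the repository's implementation. It assumes a simple polynomial hash over the last $N$ token IDs; the names `ngram_slot_ids`, `gather_rows`, and `HASH_BASE`, and the choice of reserving slot 0 for padding, are hypothetical. The point it illustrates is that slot indices depend only on the input token IDs, never on hidden states, so the rows needed for a sequence are known before the forward pass starts and could in principle be fetched from a table held in host memory.

```python
from typing import List, Sequence

HASH_BASE = 1_000_003  # hypothetical multiplier, not taken from the repository


def ngram_slot_ids(token_ids: Sequence[int], n: int, table_size: int) -> List[int]:
    """Return one table slot per position, computed from token IDs alone."""
    if n < 1 or table_size < 2:
        raise ValueError("need n >= 1 and table_size >= 2")
    slots = []
    for pos in range(len(token_ids)):
        if pos + 1 < n:
            slots.append(0)  # slot 0 is reserved for positions without a full N-gram
            continue
        h = 0
        for tok in token_ids[pos + 1 - n : pos + 1]:
            h = (h * HASH_BASE + tok + 1) % (table_size - 1)
        slots.append(1 + h)  # real N-grams land in slots 1..table_size-1
    return slots


def gather_rows(table: Sequence[Sequence[float]], slots: Sequence[int]) -> List[Sequence[float]]:
    """Fetch one row per position; each fetch is a single index operation."""
    return [table[s] for s in slots]


tokens = [5, 7, 9, 7, 9]
slots = ngram_slot_ids(tokens, n=2, table_size=101)
table = [[float(i), float(i) * 0.5] for i in range(101)]  # toy 101 x 2 table
rows = gather_rows(table, slots)
```

In this toy run the bigram `(7, 9)` occurs at positions 2 and 4, so both positions resolve to the same slot and retrieve the same row. Distinct N-grams can collide in a hashed table; how the actual module handles collisions is not covered by this sketch.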

2. Architecture

The Engram module augments the backbone by retrieving static $N$-gram memory and fusing it with dynamic hidden states. The architecture diagram is available as a [drawio source file](drawio/Engram.drawio).
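One way to picture the fusion of a retrieved memory vector with a hidden state is a gated residual update. The sketch below is an assumption made for illustration only: the README does not specify the fusion rule, and the function name `fuse`, the scaled dot-product score, and the sigmoid gate are all hypothetical choices rather than the module's actual design.

```python
import math
from typing import List


def fuse(hidden: List[float], memory: List[float]) -> List[float]:
    """Add the retrieved memory to the hidden state, scaled by a scalar gate.

    Hypothetical rule: the gate is a sigmoid of the scaled dot product
    between the hidden state and the memory vector.
    """
    if len(hidden) != len(memory) or not hidden:
        raise ValueError("hidden and memory must be non-empty and equal in length")
    score = sum(h * m for h, m in zip(hidden, memory)) / math.sqrt(len(hidden))
    gate = 0.5 * (1.0 + math.tanh(score / 2.0))  # numerically stable sigmoid
    return [h + gate * m for h, m in zip(hidden, memory)]


hidden_state = [1.0, 0.0]
retrieved = [0.0, 2.0]
fused = fuse(hidden_state, retrieved)
```

Because the gate depends on the current hidden state, the same static memory row can contribute strongly in one context and weakly in another, which is the sense in which static memory is combined with dynamic states here.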

3. Evaluation

Scaling Law

---

Large Scale Pre-training

---

Long-context Training

4. Case Study of Engram

5. Quick Start

We recommend using Python 3.8+ and PyTorch.

pip install torch numpy transformers sympy
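Before running the demo, it can help to confirm that the packages from the install command are importable. The following is a small optional check using only the standard library; the helper name `missing_packages` is hypothetical and not part of the repository.

```python
import importlib.util
import sys
from typing import Iterable, List

# Package names taken from the pip install command above.
REQUIRED = ("torch", "numpy", "transformers", "sympy")


def missing_packages(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if sys.version_info < (3, 8):
        print("Python 3.8+ is recommended; found", sys.version.split()[0])
    missing = missing_packages()
    print("missing packages:", ", ".join(missing) if missing else "none")
```

An empty result means all four packages resolve; it does not verify their versions.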

We provide a standalone implementation to demonstrate the core logic of the Engram module:

python engram_demo_v1.py

> ⚠️ Note: The provided code is a demonstration version intended to illustrate the data flow. It mocks standard components (like Attention/MoE/mHC) to focus on the Engram module.

6. License

The use of Engram models is subject to [the Model License](LICENSE).

7. Contact

If you have any questions, please raise an issue or contact us at [service@deepseek.com](mailto:service@deepseek.com).

Notability

notability 6.0/10

New repo from notable lab, moderate stars