NVIDIA/nv-embedding-cache
C++
Captured source
source ↗NVIDIA/nv-embedding-cache
Description: Fast hierarchical embedding cache for recommenders
Language: C++
License: Apache-2.0
Stars: 18
Forks: 1
Open issues: 0
Created: 2026-01-20T18:20:51Z
Pushed: 2026-05-12T11:36:27Z
Default branch: main
Fork: no
Archived: no
README:
NV Embedding Cache
NV Embedding Cache is a domain-specific SDK for high performance recommender systems embedding lookup. We accelerate embedding lookups with a combination of SW caches in GPU/Host memory and customized CUDA kernels. The main focus is recommender inference with embeddings that exceed the local GPU's memory capacity.
The SDK offers several configurations to support different memory allocations:
- All embeddings are allocated in linear GPU memory: use NVEmbedding with cache_type NoCache(Py) / GPUEmbeddingLayer (C++)
- Some embeddings are cached in GPU memory and all embeddings are in linear memory (Host or other GPUs): use NVEmbedding with cache_type LinearUVM(Py) / LinearUVMEmbeddingLayer (C++)
- Some embeddings are cached in GPU memory, Some embeddings cached in host memory and all embeddings kept in a remote parameter server:
use NVEmbedding with cache_type Hierarchical(Py) / HierarchicalEmbeddingLayer (C++)
** Linear memory in this context, means all embeddings are consecutive in virtual memory space. More specifically, the address of embedding *i* can be computed as start_address + *i* * embedding_size
Getting Started
Prerequisites
- C++17 capable compiler (we test with both GCC 13.3 and Clang 20.1)
- CUDA 12.8+ (earlier version will work with minor code changes)
- CMake 3.18+
- Python 3.10+
- Torch
- (Optional) Redis 7.0.15+ - used in some tests
The provided [Dockerfile](Dockerfile) satisfies these prerequisites. If you're using your own environment, you can skip step (2) in the installation instructions below.
Installation
1. Clone the repo:
git clone git@github.com:NVIDIA/nv-embedding-cache.git cd nv-embedding-cache git submodule update --init --recursive
2. Start the docker container:
docker build -t nve --build-arg START_DIR=$(pwd) --build-arg UID=$(id -u) --build-arg UNAME=$(id -u -n) --build-arg GID=$(id -g) --build-arg GNAME=$(id -g -n) . docker run --cap-add=ALL --net=host --ipc=host --gpus all -it --rm -v $(pwd):$(pwd) nve
3. Build and install the Python bindings (by default in ./build)
pip install .
4. Alternatively, build C++ sources with samples and tests
mkdir build_dir cd build_dir cmake .. make all -j cd -
Documentation & Samples
The [docs](docs/) dir contains our documentation. It's structured as follows:
docs ├── advanced.md # Advanced topics ├── benchmarks.md # Benchmarking instructions ├── c_api.md # C API documentation ├── cpp_api.md # C++ API documentation ├── multi_gpu.md # Multi-GPU support ├── overview.md # SDK Overview <-- Start Here! ├── python_api.md # Python bindings documentation └── samples.md # Samples listing and description
A good place to start is: [docs/overview.md](docs/overview.md).
Samples are listed in [docs/samples.md](docs/samples.md). The basics are covered in [simple_cpp](samples/simple_cpp/) and [pytorch/simple_sample](samples/pytorch/simple_sample/)
Benchmarking scripts are available in [benchmarks/](benchmarks/). See instructions at [docs/benchmarks.md](docs/benchmarks.md)
Citations
- https://developer.nvidia.com/blog/nvidia-platform-delivers-lowest-token-cost-enabled-by-extreme-co-design/
License
The NV Embedding Cache SDK is licensed under the terms of the Apache 2.0 license. See [LICENSE](LICENSE) for more information.
Third-party dependencies
This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
Third party dependencies are available as git submodules and can be found at [third_party](third_party). Their respective licenses are listed below.
|Name|License| |-|-| |abseil-cpp|https://github.com/abseil/abseil-cpp/blob/master/LICENSE| |argparse|https://github.com/p-ranav/argparse/blob/master/LICENSE| |dlpack|https://github.com/dmlc/dlpack/blob/main/LICENSE| |googletest|https://github.com/google/googletest/blob/main/LICENSE| |hiredis|https://github.com/redis/hiredis/blob/master/COPYING| |json|https://github.com/nlohmann/json/blob/develop/LICENSE.MIT| |parallel-hashmap|https://github.com/greg7mdp/parallel-hashmap/blob/master/LICENSE| |pybind11|https://github.com/pybind/pybind11/blob/master/LICENSE| |redis-plus-plus|https://github.com/sewenew/redis-plus-plus/blob/master/LICENSE| |rocksdb|https://github.com/facebook/rocksdb/blob/main/LICENSE.Apache|
Notability
notability 2.0/10Low-traffic new repo from NV