What does this fork signal mean?

Nous Research forked ggml-org/llama.cpp as NousResearch/nous-llama.cpp. A fork like this suggests the lab may be inspecting, patching, or building on the upstream code. Key details: repo NousResearch/nous-llama.cpp · parent ggml-org/llama.cpp. onlylabs links this event to 1 captured evidence page and 6 related fork signals.

Nous Research Fork: NousResearch/nous-llama.cpp

Captured source


GitHub: github.com/NousResearch/nous-llama.cpp

NousResearch/nous-llama.cpp repository metadata


Published: Mar 10, 2024 · Seen: 5d ago · Captured: 14h ago · HTTP status: 200 · Method: plain

NousResearch/nous-llama.cpp

Description: LLM inference in C/C++ - nous research

License: MIT

Stars: 8

Forks: 1

Open issues: 0

Created: 2024-03-10T06:36:11Z

Pushed: 2024-03-15T19:59:13Z

Default branch: master

Fork: yes

Parent repository: ggml-org/llama.cpp

Archived: no

README:

llama.cpp


Roadmap / Project status / Manifesto / ggml

Inference of Meta's LLaMA model (and others) in pure C/C++

IMPORTANT: Quantization blind testing: https://github.com/ggerganov/llama.cpp/discussions/5962

Vote for which quantization type provides better responses, all other parameters being the same.

Recent API changes

[2024 Mar 8] llama_kv_cache_seq_rm() returns a bool instead of void, and new llama_n_max_seq() returns the upper limit of acceptable seq_id in batches (relevant when dealing with multiple sequences) https://github.com/ggerganov/llama.cpp/pull/5328
[2024 Mar 4] Embeddings API updated https://github.com/ggerganov/llama.cpp/pull/5796
[2024 Mar 3] struct llama_context_params https://github.com/ggerganov/llama.cpp/pull/5849

Hot topics

Initial Mamba support has been added: https://github.com/ggerganov/llama.cpp/pull/5328

----

Table of Contents

Description

Usage

Get the Code

Build

BLAS Build

Prepare and Quantize

Run the quantized model

Memory/Disk Requirements

Quantization

Interactive mode

Constrained output with grammars

Instruct mode

Obtaining and using the Facebook LLaMA 2 model

Seminal papers and background on the models

Perplexity (measuring model quality)

Android

Docker

Contributing

Coding guidelines

Docs

Description

The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware - locally and in the cloud.

Plain C/C++ implementation without any dependencies
Apple silicon is a first-class citizen - optimized via ARM NEON, Accelerate and Metal frameworks
AVX, AVX2 and AVX512 support for x86 architectures
1.5-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, and 8-bit integer quantization for faster inference and reduced memory use
Custom CUDA kernels for running LLMs on NVIDIA GPUs (support for AMD GPUs via HIP)
Vulkan, SYCL, and (partial) OpenCL backend support
CPU+GPU hybrid inference to partially accelerate models larger than the total VRAM capacity

Since its inception, the project has improved significantly thanks to many contributions. It is the main playground for developing new features for the ggml library.

Supported platforms:

[X] Mac OS
[X] Linux
[X] Windows (via CMake)
[X] Docker
[X] FreeBSD

Supported models:

Typically finetunes of the base models below are supported as well.

[X] LLaMA 🦙
[X] LLaMA 2 🦙🦙
[X] Mistral 7B
[X] Mixtral MoE
[X] Falcon
[X] Chinese LLaMA / Alpaca and Chinese LLaMA-2 / Alpaca-2
[X] Vigogne (French)
[X] Koala
[X] Baichuan 1 & 2 + derivations
[X] Aquila 1 & 2
[X] Starcoder models
[X] Refact
[X] Persimmon 8B
[X] MPT
[X] Bloom
[X] Yi models
[X] StableLM models
[X] Deepseek models
[X] Qwen models
[X] PLaMo-13B
[X] Phi models
[X] GPT-2
[X] Orion 14B
[X] InternLM2
[X] CodeShell
[X] Gemma
[X] Mamba

Multimodal models:

HTTP server

The llama.cpp web server (examples/server) is a lightweight, OpenAI-API-compatible HTTP server that can serve local models and connect them easily to existing clients.

Bindings:

Python: abetlen/llama-cpp-python
Go: go-skynet/go-llama.cpp
Node.js: withcatai/node-llama-cpp
JS/TS (llama.cpp server client): lgrammel/modelfusion
JavaScript/Wasm (works in browser): tangledgroup/llama-cpp-wasm
Ruby: yoshoku/llama_cpp.rb
Rust (nicer API): mdrokz/rust-llama.cpp
Rust (more direct bindings):…
