What does this repo signal mean?

Microsoft published microsoft/Foundry-Local (C++). A repository signal like this one can surface tooling, eval, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo microsoft/Foundry-Local · language C++. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Microsoft Repo: microsoft/Foundry-Local

Captured source

GitHub · github.com/microsoft/Foundry-Local

microsoft/Foundry-Local repository metadata

published Mar 31, 2025 · seen 2d · captured 8h · http 200 · method plain

microsoft/Foundry-Local

Language: C++

License: NOASSERTION

Stars: 2356

Forks: 323

Open issues: 54

Created: 2025-03-31T21:51:15Z

Pushed: 2026-06-11T01:47:14Z

Default branch: main

Fork: no

Archived: no
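The fields above resemble what the GitHub REST API returns for a repository (`GET /repos/{owner}/{repo}`). As a minimal sketch, assuming that is where the capture gets its data (the page itself does not say), the following maps such a JSON payload to the lines shown. The helper names `parse_github_time` and `summarize_repo` are illustrative, and the sample payload is hand-built from the values above rather than fetched.

```python
from datetime import datetime, timezone


def parse_github_time(value: str) -> datetime:
    """Parse a UTC timestamp such as 2025-03-31T21:51:15Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def summarize_repo(repo: dict) -> dict:
    """Map a repository payload to the fields listed in the capture."""
    license_info = repo.get("license") or {}
    created = parse_github_time(repo["created_at"])
    pushed = parse_github_time(repo["pushed_at"])
    return {
        "Language": repo.get("language"),
        "License": license_info.get("spdx_id"),
        "Stars": repo.get("stargazers_count"),
        "Forks": repo.get("forks_count"),
        "Open issues": repo.get("open_issues_count"),
        "Default branch": repo.get("default_branch"),
        "Fork": "yes" if repo.get("fork") else "no",
        "Archived": "yes" if repo.get("archived") else "no",
        "Days between creation and last push": (pushed - created).days,
    }


# Hand-built sample using the captured values (not a live API response).
sample = {
    "language": "C++",
    "license": {"spdx_id": "NOASSERTION"},
    "stargazers_count": 2356,
    "forks_count": 323,
    "open_issues_count": 54,
    "created_at": "2025-03-31T21:51:15Z",
    "pushed_at": "2026-06-11T01:47:14Z",
    "default_branch": "main",
    "fork": False,
    "archived": False,
}

summary = summarize_repo(sample)
for field, value in summary.items():
    print(f"{field}: {value}")
```

`NOASSERTION` is the SPDX placeholder that GitHub typically reports when a license file is present but cannot be matched to a known identifier, so it should not be read as "no license".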

README:

Ship on-device AI inside your app

Foundry Local is an end-to-end local AI solution for building applications that run entirely on the user's device. It provides native SDKs (C#, JavaScript, Python, and Rust), a curated catalog of optimized models, and automatic hardware acceleration — all in a lightweight package (~20 MB). The compact size makes it easy to integrate into your application and distribute to end users.

User data never leaves the device, responses start immediately with zero network latency, and your app works offline. No per-token costs, no API keys, no backend infrastructure to maintain, and no Azure subscription required.

Key Features

Lightweight runtime — The runtime handles model acquisition, hardware acceleration, model management, and inference (via ONNX Runtime).

Curated model catalog — A catalog of high-quality models optimized for on-device use across a wide range of consumer hardware. The catalog covers chat completions (for example, GPT OSS, Qwen, DeepSeek, Mistral and Phi) and audio transcription (for example, Whisper). Every model goes through extensive quantization and compression to deliver the best balance of quality and performance. Models are versioned, so your application can pin to a specific version or automatically receive updates.

Automatic hardware acceleration — Foundry Local detects the available hardware on the user's device and selects the best execution provider and device (NPU, GPU or CPU).

Smart model management — Foundry Local handles the full lifecycle of models on end-user devices. Models download automatically on first use, are cached locally for instant subsequent launches, and the best-performing variant is selected for the user's specific hardware.

OpenAI-compatible API — Supports OpenAI request and response formats including the OpenAI Responses API format. If your application already uses the OpenAI SDK, point it to a Foundry Local endpoint with minimal code changes.

Optional local server — An OpenAI-compatible web server for serving models to multiple processes, integrating with tools like LangChain, or experimenting through REST calls. For most embedded application scenarios, use the SDK directly — it runs inference in-process without the overhead of a separate server.
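Because the optional local server accepts OpenAI request formats, a client can in principle talk to it with nothing beyond an HTTP library. The sketch below builds, but does not send, a chat completion request in the OpenAI Chat Completions shape using only the Python standard library. The base URL, port, and `/chat/completions` path are assumptions for illustration: the README excerpt does not state the server address, so use whatever endpoint your installation reports. The model alias is the one used in the quickstart, and `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumed placeholder; the real host and port come from your Foundry Local setup.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion POST request without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request(BASE_URL, "qwen2.5-0.5b", "What is the golden ratio?")

# To send it (requires a running server at BASE_URL):
# with urllib.request.urlopen(request) as resp:
#     body = json.load(resp)
#     print(body["choices"][0]["message"]["content"])
```

This applies to the server path only. For embedded scenarios the README recommends calling the SDK directly, which avoids the HTTP hop altogether.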

🚀 Quickstart

> [!TIP]
> The following shows a quickstart for Python and JavaScript. C# and Rust language bindings are also available. Take a look at the [samples](/samples/) for more details.

JavaScript

1. Install the SDK:

# Windows (recommended for hardware acceleration)
npm install foundry-local-sdk-winml

# macOS/Linux
npm install foundry-local-sdk

2. Run your first chat completion:

import { FoundryLocalManager } from 'foundry-local-sdk';

const manager = FoundryLocalManager.create({ appName: 'my-app' });

// Download and load a model (auto-selects best variant for user's hardware)
const model = await manager.catalog.getModel('qwen2.5-0.5b');
await model.download((progress) => {
  process.stdout.write(`\rDownloading... ${progress.toFixed(2)}%`);
});
await model.load();

// Create a chat client and get a completion
const chatClient = model.createChatClient();
const response = await chatClient.completeChat([
  { role: 'user', content: 'What is the golden ratio?' }
]);

console.log(response.choices[0]?.message?.content);

// Unload the model when done
await model.unload();

Python

1. Install the SDK:

# Windows (recommended for hardware acceleration)
pip install foundry-local-sdk-winml

# macOS/Linux
pip install foundry-local-sdk

2. Run your first chat completion:

from foundry_local_sdk import Configuration, FoundryLocalManager

config = Configuration(app_name="foundry_local_samples")
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

# Select and load a model from the catalog
model = manager.catalog.get_model("qwen2.5-0.5b")
model.download()
model.load()

# Get a chat client
client = model.get_chat_client()

# Create and send message
messages = [
    {"role": "user", "content": "What is the golden ratio?"}
]
response = client.complete_chat(messages)
print(f"Response: {response.choices[0].message.content}")

model.unload()
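The quickstart unloads the model on its last line, so an exception raised in between would skip the unload. One way to guard against that, sketched here under the assumption that `load()` and `unload()` behave as in the snippet above, is a small context manager. `loaded` is a hypothetical helper and not part of the SDK, and `FakeModel` is a stand-in so the example runs without the SDK installed.

```python
from contextlib import contextmanager


@contextmanager
def loaded(model):
    """Load a model on entry and always unload it on exit."""
    model.load()
    try:
        yield model
    finally:
        model.unload()


class FakeModel:
    """Stand-in exposing only the load/unload calls used in the quickstart."""

    def __init__(self):
        self.events = []

    def load(self):
        self.events.append("load")

    def unload(self):
        self.events.append("unload")


fake = FakeModel()
try:
    with loaded(fake):
        raise RuntimeError("simulated failure during inference")
except RuntimeError:
    pass

print(fake.events)
```

With a real model object the body of the `with` block would hold the chat calls from the quickstart, for example `client = model.get_chat_client()` followed by `client.complete_chat(messages)`.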

💬 Audio Transcription (Speech-to-Text)

The SDK also supports audio transcription via Whisper models (available in JavaScript, C#, Python and Rust):

import { FoundryLocalManager } from 'foundry-local-sdk';

const manager = FoundryLocalManager.create({ appName: 'my-app' });

const whisperModel = await manager.catalog.getModel('whisper-tiny');
await whisperModel.download();
await whisperModel.load();

const audioClient = whisperModel.createAudioClient();
audioClient.settings.language = 'en';

// Transcribe an audio file
const result = await audioClient.transcribe('recording.wav');
console.log('Transcription:', result.text);

// Or stream in real-time
for await (const chunk of audioClient.transcribeStreaming('recording.wav')) {
  process.stdout.write(chunk.text);
}

await whisperModel.unload();
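Before handing a file such as `recording.wav` to a transcription client, it can help to check what the file actually contains. The sketch below uses Python's standard `wave` module to read the header of an uncompressed PCM WAV file. It is independent of the Foundry Local SDK, and the README excerpt does not state which audio formats or sample rates are accepted, so any threshold you add on top is your own assumption. `describe_wav` is an illustrative helper name.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Return channel count, sample rate, and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


# Create a one-second silent mono file at 16 kHz so the example is runnable.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "recording.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 16-bit samples
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

info = describe_wav(demo_path)
print(info)
```

The `wave` module reads only uncompressed PCM, so a compressed recording would need converting first or a different check.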

> [!TIP]
> A single FoundryLocalManager can manage both chat and audio models simultaneously. See the [chat-and-audio sample](samples/js/chat-and-audio-foundry-local/) for a complete example.

📦 Samples

Explore complete working examples in the [samples/](samples/) folder:

| Language | Samples | Highlights |
|----------|---------|------------|
| [C#](samples/cs/) | 12 | Native chat, audio transcription, tool calling, model management, web server, tutorials |
| [JavaScript](samples/js/) | 12 | Native chat, audio, Electron app, Copilot SDK, LangChain, tool calling, tutorials |
| [Python](samples/python/) | 9 | Chat completions, audio transcription, LangChain, tool calling, tutorials |
| [Rust](samples/rust/) | 8 | Native chat, audio transcription, tool calling, web server, tutorials |

🖥️ CLI

The Foundry Local CLI lets you explore models and experiment interactively.

Install (public preview):

Download the asset for your platform from the `cli-preview-0.10.0`…

Excerpt shown — open the source for the full document.