ReleaseMicrosoftMicrosoftpublished May 5, 2026seen 2d

microsoft/Foundry-Local v1.1.0

microsoft/Foundry-Local

Open original ↗

Captured source

source ↗
published May 5, 2026seen 2dcaptured 9hhttp 200method plain

v1.1.0 Foundry Local

Repository: microsoft/Foundry-Local

Tag: v1.1.0

Published: 2026-05-05T20:34:34Z

Prerelease: no

Release notes:

🚀 Foundry Local v1.1.0 Release Notes

We're excited to announce Foundry Local v1.1.0 — packed with new capabilities for on-device AI! This release brings expanded platform support, new model types, and performance improvements across the board.

---

🆕 What's New

🎯 .NET netstandard2.0 / net8.0 Support

The C# SDK now targets both `net8.0` and `netstandard2.0`, broadening compatibility to .NET Framework 4.6.1+, .NET Core 2.0+, Xamarin, Unity, and more. Ship on-device AI to virtually any .NET application!

> 📖 [C# SDK Documentation](sdk/cs/README.md)

---

🎙️ Live Audio Transcription

Real-time speech-to-text is here! Stream microphone audio directly to the SDK and receive transcription results as they arrive — no cloud round-trips, no latency. Built on the Nemotron ASR model with an OpenAI Realtime-compatible API surface.

Python

audio_client = model.get_audio_client()
session = audio_client.create_live_transcription_session()
session.settings.sample_rate = 16000
session.settings.channels = 1
session.settings.language = "en"

session.start()

# Push audio
session.append(pcm_bytes)

# Read results (typically on a background thread)
for result in session.get_stream():
print(result.content[0].text) # transcribed text
print(result.is_final) # True for final results

session.stop()

📖 [Python live audio transcription sample](samples/python/live-audio-transcription/)

JavaScript

const audioClient = model.createAudioClient();
const session = audioClient.createLiveTranscriptionSession();
session.settings.sampleRate = 16000;
session.settings.channels = 1;
session.settings.language = 'en';

await session.start();

// Push audio
await session.append(pcmBytes);

// Read results
for await (const result of session.getStream()) {
console.log(result.content[0].text); // transcribed text
console.log(result.is_final); // true for final results
}

await session.stop();

📖 [JavaScript live audio transcription sample](samples/js/live-audio-transcription/)

C#

var audioClient = await model.GetAudioClientAsync();
var session = audioClient.CreateLiveTranscriptionSession();
session.Settings.SampleRate = 16000;
session.Settings.Channels = 1;
session.Settings.Language = "en";

await session.StartAsync();

// Push audio
await session.AppendAsync(pcmBytes);

// Read results
await foreach (var result in session.GetStream())
{
Console.WriteLine(result.Content[0].Text); // transcribed text
Console.WriteLine(result.IsFinal); // true for final results
}

await session.StopAsync();

📖 [C# live audio transcription sample](samples/cs/live-audio-transcription/)

Rust

let audio_client = model.create_audio_client();
let session = audio_client.create_live_transcription_session();
session.start(None).await?;

// Push audio
session.append(&pcm_bytes).await?;

// Read results
let mut stream = session.get_stream().await?;
while let Some(result) = stream.next().await {
let r = result?;
if let Some(content) = r.content.first() {
println!("{}", content.text); // transcribed text
println!("{}", r.is_final); // true for final results
}
}

session.stop().await?;

📖 [Rust live audio transcription sample](samples/rust/live-audio-transcription/)

---

📐 Embeddings

Generate text embeddings entirely on-device for semantic search, RAG, clustering, and more. The new `qwen3-0.6b-embedding` model delivers high-quality vector representations in a compact footprint.

Python

model = manager.catalog.get_model("qwen3-0.6b-embedding")
model.download()
model.load()

client = model.get_embedding_client()

# Single embedding
response = client.generate_embedding("The quick brown fox jumps over the lazy dog")
embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")

# Batch embeddings
batch_response = client.generate_embeddings([
"Machine learning is a subset of artificial intelligence",
"The capital of France is Paris",
"Rust is a systems programming language",
])

📖 [Python embeddings sample](samples/python/embeddings/)

JavaScript

const model = await manager.catalog.getModel('qwen3-0.6b-embedding');
await model.download();
await model.load();

const embeddingClient = model.createEmbeddingClient();

// Single embedding
const response = await embeddingClient.generateEmbedding(
'The quick brown fox jumps over the lazy dog'
);
console.log(`Dimensions: ${response.data[0].embedding.length}`);

// Batch embeddings
const batchResponse = await embeddingClient.generateEmbeddings([
'Machine learning is a subset of artificial intelligence',
'The capital of France is Paris',
'Rust is a systems programming language'
]);

📖 [JavaScript embeddings sample](samples/js/embeddings/)

C#

var model = await catalog.GetModelAsync("qwen3-0.6b-embedding");
await model.DownloadAsync();
await model.LoadAsync();

var embeddingClient = await model.GetEmbeddingClientAsync();

// Single embedding
var response = await embeddingClient.GenerateEmbeddingAsync(
"The quick brown fox jumps over the lazy dog");
var embedding = response.Data[0].Embedding;
Console.WriteLine($"Dimensions: {embedding.Count}");

// Batch embeddings
var batchResponse = await embeddingClient.GenerateEmbeddingsAsync([
"Machine learning is a subset of artificial intelligence",
"The capital of France is Paris",
"Rust is a systems programming language"
]);

📖 [C# embeddings sample](samples/cs/embeddings/)

Rust

📖 [Rust embeddings sample](samples/rust/embeddings/)

---

👁️ Qwen 3.5 Vision Language Model

Introducing Qwen 3.5 VL — a multimodal vision-language model that runs entirely on-device. Analyze images, understand visual content, and answer questions about what's in a picture — all without sending data to the cloud.

model = manager.catalog.get_model("qwen3.5-vision")
model.download()
model.load()

---

📦 JavaScript SDK — Koffi Dependency Removed

The JavaScript SDK no longer depends on koffi for native interop. This results in a leaner dependency tree, faster installs, and fewer compatibility issues across platforms and Node.js versions.

  • Smaller `node_modules` — no more large native FFI dependency
  • Fewer platform quirks — prebuilt N-API addon replaces runtime FFI binding
  • Faster install times — less to…

Excerpt shown — open the source for the full document.