RepoMicrosoftMicrosoftpublished Apr 10, 2026seen 5d

microsoft/Build26-BRK230-build-smarter-ai-systems-in-foundry-as-models-and-costs-evolve

Open original ↗

Captured source

source ↗

microsoft/Build26-BRK230-build-smarter-ai-systems-in-foundry-as-models-and-costs-evolve

Description: Microsoft Build 2026 · Build smarter AI systems in Microsoft Foundry as models and costs evolve · Learn to hill climb across quality, cost and latency with a model playbook

License: MIT

Stars: 11

Forks: 14

Open issues: 1

Created: 2026-04-10T22:42:30Z

Pushed: 2026-06-06T03:57:57Z

Default branch: main

Fork: no

Archived: no

README:

Microsoft Build 2026

BRK230: Build Smarter AI Systems in Foundry as Models and Costs Evolve

Discover how to quickly choose, integrate, and validate AI models inside Microsoft Foundry. Learn techniques for navigating thousands of model options, benchmarking performance, and streamlining your workflow with deep IDE support. Build faster, ship smarter, and stay on top of the evolving AI landscape.

| _Click the banner to visit the session page & watch replay_ | |:---:| | [![Thumbnail](./img/brk230-thumbnail.jpg)](https://build.microsoft.com/en-US/sessions/brk230) |

Introduction

Model lifecycles are now measured in months, sometimes weeks, and production agents need to deploy and use multiple models to get the right fit for each task. _"What model should I use?"_ is the wrong question to ask. Instead, the right question is: _"How do I build an AI system that keeps getting smarter, faster, safer, and more cost-efficient as models evolve?"_

In this session, walk through the developer's workflow as they tackle that system-design problem from the initial plan to the final deployed product. By the end of the session, you should get a sense for the model playbook you can use to apply these levers to your own system development and scenarios.

![Playbook](./img/model-playbook.png)

Scenario: Compliant Trip Planning

_World Wide Importers_ is a fictitious enterprise company that requires its employees to travel all over the world to conduct business. They have complex travel policies that need to be taken into account when making plans and submitting expenses. So they decided to build a _Travel Concierge_, an AI assistant that can help employees handle travel planning and expenses in a compliant way.

![Scenario](./img/compliance-scenario.png)

This demo walks through that system-design problem end-to-end using Microsoft Foundry. We follow the travel request from an employee — decomposing it into the jobs a production agent actually has to do: routing intent, reading a receipt image, answering a policy question, planning the trip, and calling tools. For each job, we show how to pick, evaluate, route, fine-tune, and operate the right model.

Developer Challenges

The session is organized around four challenges every AI developer faces:

| | | | |:---|:---|:---| | Select | _Which model fits my task?_ | Browse over 11,000 models in the Foundry catalog (Azure OpenAI, Claude, MAI, DeepSeek, Mistral, Grok, Llama, Cohere, Fireworks AI, and more) and shortlist candidates| | Evaluate| *Is the model getting better?*|Define quality, latency, and cost criteria, then compare models side by side on your own data with built-in and custom evaluators. | | Optimize| *How do I reduce cost?*|Apply model routing, prompt caching, batch inference, provisioned throughput, structured outputs, and distillation or fine-tuning to cut cost without losing quality.| | Operate| *Will it hold up in production?*| Deploy with managed endpoints, versioning, rollback, monitoring, responsible AI guardrails, and governance.| | | |

Watch as we iterative refine our system using a series of Microsoft Foundry levers for cost, latency and quality optimization. The end result - a solution that meets our targets for lower cost and latency, and higher quality by applying playbook ideas like _decomposition_ (replace single frontier model with multi-model approach, using smaller and cheaper models that "fit" these tasks), _distillation_ (using frontier model as teacher, to transfer knowledge to a smaller, cheaper "student" without losing quality) and _custom evaluation_ (creating more tailored metrics to capture and correct domain-specific quality gaps).

Learning Outcomes

By the end of this session, you will be able to:

  • Reframe model selection from a one-time decision into a continuous system-design problem with both an agent loop and a model loop.
  • Decompose a user request into discrete jobs to be done and map each job to the right model tier (nano, mini, or frontier) instead of overusing a single frontier model.
  • Build an evaluation harness in Foundry by defining quality, latency, and cost criteria, generating synthetic eval data, and comparing models side by side with quality, risk and safety, and agent evaluators.
  • Apply the full cost optimization stack: model router, prompt and semantic caching, structured outputs, batch inference, provisioned throughput, and fine-tuning or distillation.
  • Use distillation (teacher to student fine-tuning) to get frontier-quality answers at small-model cost and latency.
  • Operate a production AI system with managed deployments, versioning, rollback, monitoring, content safety, and governance.

💬 Keep Learning with Copilot

Try these prompts with GitHub Copilot to explore the topics from this session. Open Copilot Chat in Visual Studio Code (Ctrl+Alt+I on Windows or Linux, Cmd+Shift+I on Mac), paste a prompt, and see what you learn. Try connecting the [Microsoft Learn MCP Server](#-microsoft-learn-mcp-server) for the latest official documentation.

Use these as a starting point, or write your own:

  • *"Explain the difference between an agent loop and a model loop in Microsoft Foundry, and when I should invest in each."*
  • *"I have a multi-step agent using one frontier model for every call. Walk me through how to decompose the workload into jobs to be done and pick a cheaper model tier for each."*
  • *"Show me how to set up an evaluation run in Microsoft Foundry that compares two models on quality, latency, and cost using my own dataset."*
  • *"What is knowledge distillation, and how would I use a teacher model in Foundry to fine-tune a smaller student model for a policy Q&A task?"*
  • *"Compare prompt caching, semantic caching, batch inference, provisioned throughput, and model router in Foundry. When should I use each one to reduce cost?"*
  • *"Generate a checklist for taking an AI agent to production on…

Excerpt shown — open the source for the full document.

Notability

notability 1.0/10

Low stars, trivial repo

Microsoft has a repo signal matching evals and quality, infrastructure.