microsoft/foundry-agent-voice-mode-sample
Python
Description: Voice-mode hosted agent on Microsoft Foundry — FastAPI broker + browser mic + Voice Live realtime + sample tools and labs.
Language: Python
License: MIT
Stars: 2
Forks: 0
Open issues: 0
Created: 2026-05-28T21:25:36Z
Pushed: 2026-05-29T10:17:30Z
Default branch: main
Fork: no
Archived: no
README:
foundry-agent-voice-mode-sample
A runnable sample that wires the browser microphone to Agent voice mode in Microsoft Foundry. A small FastAPI broker holds the credentials, opens a WebSocket to the Voice Live realtime endpoint, and binds the session to a hosted Foundry agent. The agent answers travel questions using tool calls and replies in natural speech.
A three-part workshop in labs/ walks you from a basic voice loop to a fully bound hosted agent.
What this repository is
- A sample. Clone it, fill in .env, run scripts/start-local.ps1, and talk to the agent in your browser.
- A workshop. Three progressive labs under labs/ teach the pattern step by step.
- A reference. The exact Voice Live URL contract that the Foundry portal uses is encoded in voicelive/server/voicelive_session.py and probed by scripts/test-session.ps1. Use it as a regression test when the API changes.
It is not a production library. The broker is intentionally small so the pattern is easy to fork.
What is in this repository
- voicelive/server is a FastAPI broker that holds credentials, builds the upstream WebSocket URL, and relays audio frames in both directions.
- voicelive/client is a small static page that captures microphone audio, ships PCM16 frames over WebSocket, and renders transcripts with markdown.
- voicelive/config/session.json is the first frame the browser sends after the socket opens. It pins the voice, the noise reduction, the echo cancellation, and the semantic VAD.
- agent/ contains the Foundry agent definition, the system prompt, and three sample tools (weather, flight status, hotel details).
- infra/ contains a Bicep template that provisions the Foundry resource, the project, and the model deployment.
- labs/ is the three-part workshop.
- docs/blog/ contains a stand-alone HTML blog post that summarises the architecture for an external audience.
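To make the "PCM16 frames" wording concrete, here is a minimal sketch of the sample format itself: signed 16-bit little-endian integers, one per audio sample. The sample's client does this in the browser; the Python below is only an illustration of the encoding, and the function name `float_to_pcm16` is hypothetical, not taken from the repository.

```python
import struct


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes.

    Hypothetical helper for illustration; not part of the sample repository.
    """
    out = bytearray()
    for s in samples:
        # Clip first so out-of-range input cannot overflow the 16-bit range.
        clipped = max(-1.0, min(1.0, s))
        # Scale to the int16 range and truncate toward zero.
        out += struct.pack("<h", int(clipped * 32767))
    return bytes(out)


# Four samples become eight bytes (two bytes per sample).
frame = float_to_pcm16([0.0, 1.0, -1.0, 0.5])
```

Each sample occupies two bytes, so the size of a frame in bytes is twice its sample count.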
Quick start (sample path)
1. Copy .env.sample to .env and fill in the values from your Foundry resource.
2. Create a Python virtual environment and install requirements.txt.
3. Run scripts/start-local.ps1 to launch the broker on http://127.0.0.1:8000.
4. Open the URL in a browser, allow microphone access, and start talking.
5. Run scripts/test-session.ps1 for a non-interactive smoke test.
Workshop path
If you would rather build up to the full sample one step at a time, work through the labs in order.
- [labs/lab1-basic-voice.md](labs/lab1-basic-voice.md) runs the broker against a plain model with no agent binding.
- [labs/lab2-add-tools.md](labs/lab2-add-tools.md) adds three Python tools and shows how Voice Live invokes them.
- [labs/lab3-hosted-agent.md](labs/lab3-hosted-agent.md) creates the hosted agent in the Foundry portal and wires it into the broker.
Each lab is self-contained and ends with a working checkpoint, so you can stop after any lab and still have something that runs.
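As a rough idea of what lab 2 is building toward, the sketch below shows one common way to route a tool call by name to a Python function and return a JSON result. The tool names, argument shapes, return values, and the `dispatch_tool_call` helper are all assumptions made for illustration; the repository's actual tools and the exact Voice Live tool-call payload may differ.

```python
import json


# Hypothetical stand-ins for the three sample tools; they return canned data.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny", "temp_c": 21}


def get_flight_status(flight: str) -> dict:
    return {"flight": flight, "status": "on time"}


def get_hotel_details(hotel: str) -> dict:
    return {"hotel": hotel, "rating": 4.2}


# Registry mapping the tool name in a call to the function that handles it.
TOOLS = {
    "get_weather": get_weather,
    "get_flight_status": get_flight_status,
    "get_hotel_details": get_hotel_details,
}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the named tool with JSON-encoded arguments and return a JSON string."""
    handler = TOOLS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json or "{}")
        return json.dumps(handler(**args))
    except (TypeError, ValueError) as exc:
        # Bad JSON (ValueError) or mismatched arguments (TypeError).
        return json.dumps({"error": str(exc)})
```

Returning errors as JSON rather than raising keeps the voice session alive, so the agent can tell the user that a lookup failed.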
Architecture
The browser never sees an Azure key or token. The broker performs the upstream handshake with either a bearer token from DefaultAzureCredential or an API key, then pipes frames in both directions.
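The "pipes frames in both directions" part can be sketched with two concurrent pump tasks. This is a simplified model under stated assumptions: `asyncio.Queue` objects stand in for the browser and upstream WebSockets, `None` marks end of stream, and the names `pump`, `relay`, and `demo` are hypothetical rather than the broker's real functions.

```python
import asyncio


async def pump(source: asyncio.Queue, sink: asyncio.Queue) -> None:
    """Copy frames from source to sink until a None end-of-stream marker."""
    while True:
        frame = await source.get()
        await sink.put(frame)
        if frame is None:
            return


async def relay(browser_in, upstream_out, upstream_in, browser_out) -> None:
    """Run both directions concurrently so neither side blocks the other."""
    await asyncio.gather(
        pump(browser_in, upstream_out),  # microphone audio toward Voice Live
        pump(upstream_in, browser_out),  # synthesized audio toward the browser
    )


def drain(queue: asyncio.Queue) -> list:
    """Collect everything currently in a queue (helper for the demo)."""
    items = []
    while not queue.empty():
        items.append(queue.get_nowait())
    return items


async def demo():
    browser_in, upstream_out, upstream_in, browser_out = (
        asyncio.Queue() for _ in range(4)
    )
    for frame in (b"mic-1", b"mic-2", None):
        browser_in.put_nowait(frame)
    for frame in (b"tts-1", None):
        upstream_in.put_nowait(frame)
    await relay(browser_in, upstream_out, upstream_in, browser_out)
    return drain(upstream_out), drain(browser_out)
```

In a real broker the queues would be replaced by the two WebSocket connections, and the credentials would be attached only on the upstream side.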
The Voice Live WebSocket URL is built as follows.
wss://.cognitiveservices.azure.com/voice-live/realtime
  ?api-version=2025-10-01
  &model=
  &agent-project-name=
  &agent-id=
  &agent-access-token=
  &authorization=Bearer%20
The model query parameter must match the agent display name. The authorization value must be URL-encoded.
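A minimal sketch of assembling that URL with the standard library follows. The query parameter names and the API version come from the URL shown above; the function name `build_voice_live_url`, its argument names, and the example values are hypothetical, and which credential belongs in which parameter should be checked against voicelive/server/voicelive_session.py.

```python
from urllib.parse import quote, urlencode


def build_voice_live_url(
    host: str,
    model: str,
    project_name: str,
    agent_id: str,
    agent_access_token: str,
    bearer_token: str,
    api_version: str = "2025-10-01",
) -> str:
    """Build the upstream WebSocket URL (hypothetical helper, not the repo's code)."""
    params = [
        ("api-version", api_version),
        ("model", model),  # per the README, must match the agent display name
        ("agent-project-name", project_name),
        ("agent-id", agent_id),
        ("agent-access-token", agent_access_token),
        ("authorization", f"Bearer {bearer_token}"),
    ]
    # quote_via=quote encodes the space as %20; the default quote_plus would emit "+".
    query = urlencode(params, quote_via=quote)
    return f"wss://{host}/voice-live/realtime?{query}"


# Placeholder values only; none of these identify a real resource.
url = build_voice_live_url(
    host="example.cognitiveservices.azure.com",
    model="travel-agent",
    project_name="proj",
    agent_id="agent-1",
    agent_access_token="tok",
    bearer_token="abc.def",
)
```

Passing `quote_via=quote` matters here because the URL shape above uses `Bearer%20`, not `Bearer+`, for the authorization value.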
Documentation
- [docs/architecture.md](docs/architecture.md) explains the request flow and the auth model.
- [docs/demo-flow.md](docs/demo-flow.md) is a script you can read aloud during a demo.
- [docs/deployment.md](docs/deployment.md) covers provisioning, the Foundry portal agent flow, and Container Apps hosting.
- [docs/troubleshooting.md](docs/troubleshooting.md) lists the common failures and how to fix them.
- [docs/blog/voice-live-hosted-agent.html](docs/blog/voice-live-hosted-agent.html) is the long-form write-up of the pattern.
Deploying to Azure
The end-to-end deployment is documented in [docs/deployment.md](docs/deployment.md). The summary is as follows.
1. Provision the Foundry resource, project, and model with the Bicep template in infra/.
2. Create the agent in the Foundry portal. The SDK path in scripts/deploy-agent.ps1 produces an Assistants-style agent that lacks the microsoft.voice-live.enabled metadata required by the working URL shape, so the portal is currently the only reliable route.
3. Assign the Cognitive Services User role to the identity that will run the broker.
4. Set the broker environment variables and run it locally, or deploy it to Azure Container Apps with a managed identity.
Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for the contributor guide and [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md) for the community standards. Security issues should be reported privately as described in [SECURITY.md](SECURITY.md). Questions and help requests are covered in [SUPPORT.md](SUPPORT.md).
License
This project is released under the MIT License. See LICENSE for the full text.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorised use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark and Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
Notability
3.0/10. Low-traction sample repo.