openai/realtime-voice-component
Language: TypeScript
License: Apache-2.0
Stars: 873
Forks: 108
Open issues: 1
Created: 2026-03-12T19:31:35Z
Pushed: 2026-04-27T22:57:27Z
Default branch: main
Fork: no
Archived: no
README:
realtime-voice-component
React/browser voice controls for tool-constrained UIs built on OpenAI Realtime.
> Warning
> This repository is an open-source reference implementation. It is useful for education, demos, and local adoption, but it is not a promise of long-term product support or a production-ready UI kit.
Distribution Status
This repo is intended to be shared as a GitHub reference implementation. It is not currently published to npm, and package.json remains marked as private.
The code is licensed under Apache-2.0. See [LICENSE](./LICENSE).
What This Package Is
This package is for apps where:
- your app defines the exact actions the assistant can take
- tools stay app-owned and narrow
- the UI remains responsible for the visible state change
- you want a React-friendly controller and an optional launcher widget
The package is intentionally opinionated about browser UI flows. It is not a general-purpose orchestration framework and it is not a replacement for raw Realtime transports.
Choose The Right Layer
Use this package when you want a React/browser layer for voice-driven UI:
- a reusable controller with React bindings
- a packaged launcher widget
- optional visible confirmation via the ghost cursor
- a pattern centered on app-owned tools, not free-form browser automation
Use raw Realtime when you want lower-level transport and session control:
- custom audio handling
- non-React runtimes
- your own UI surface and state model from scratch
Use `openai-agents-js` when you need a broader headless SDK:
- agent orchestration and handoffs
- richer hosted-tool and MCP flows
- server-side or multi-runtime agent systems beyond a browser UI package
Demo App
The repo’s [demo/](./demo) app is the main runnable teaching surface. It shows:
- a starter theme-switching flow
- a multi-step form flow
- a richer shared-state chess flow
- shared controller reuse across multiple screens
- optional wake-word experimentation layered on top of the runtime
Run it locally with:
cp demo/.env.example demo/.env.local
# edit demo/.env.local and set OPENAI_API_KEY
npm install
npm run demo
Package Shape
- `defineVoiceTool()` turns a Zod-backed app action into a Realtime function tool.
- `createVoiceControlController()` owns the session, transport, tool execution, transcript assembly, and connection lifecycle.
- `useVoiceControl()` binds React to either an external controller or an internally owned one.
- `VoiceControlWidget` is a launcher UI on top of the controller.
- `useGhostCursor()` and `GhostCursorOverlay` are optional visible confirmation helpers.
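To make the first item concrete, here is a minimal sketch of what turning an app action into a Realtime function tool involves. It deliberately does not use the package: `defineToolSketch`, `ToolSketch`, and the `set_theme` tool are hypothetical names, and a hand-written `parse` function stands in for the Zod schema. Only the `{ type: "function", name, description, parameters }` shape follows the Realtime function-tool format; the real `defineVoiceTool()` signature may differ.

```typescript
// Hypothetical stand-in for illustration; this is NOT the package's defineVoiceTool().
type JsonSchema = Record<string, unknown>;

interface FunctionToolDefinition {
  type: "function";
  name: string;
  description: string;
  parameters: JsonSchema;
}

interface ToolSketch<Result> {
  definition: FunctionToolDefinition; // what the session is told about the tool
  run(rawArguments: string): Result; // what the app executes on a tool call
}

function defineToolSketch<Args, Result>(options: {
  name: string;
  description: string;
  parameters: JsonSchema;
  parse: (value: unknown) => Args; // stands in for a Zod schema's parse()
  execute: (args: Args) => Result;
}): ToolSketch<Result> {
  return {
    definition: {
      type: "function",
      name: options.name,
      description: options.description,
      parameters: options.parameters,
    },
    // Tool-call arguments arrive as a JSON string, so validate before acting.
    run: (rawArguments) => options.execute(options.parse(JSON.parse(rawArguments))),
  };
}

type Theme = "light" | "dark";

// App-owned state: the tool requests a change, the app performs it.
const appState: { theme: Theme } = { theme: "light" };

const setThemeTool = defineToolSketch({
  name: "set_theme",
  description: "Switch the app theme.",
  parameters: {
    type: "object",
    properties: { theme: { type: "string", enum: ["light", "dark"] } },
    required: ["theme"],
    additionalProperties: false,
  },
  parse: (value: unknown): { theme: Theme } => {
    const theme = (value as { theme?: unknown } | null)?.theme;
    if (theme !== "light" && theme !== "dark") {
      throw new Error('theme must be "light" or "dark"');
    }
    return { theme };
  },
  execute: ({ theme }) => {
    appState.theme = theme;
    return { ok: true, theme };
  },
});
```

Keeping the tool this narrow means a malformed or out-of-range argument fails at the `parse` step, before any app state is touched.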
Recommended Default Flow
For most browser apps in this repo, the recommended path is:
1. proxy the browser SDP + session config through your own /session endpoint
2. register one narrow tool that maps to one real app action
3. start with the theme demo or a small controller-based integration
4. send current UI state back into the session after visible changes
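Step 4 can be sketched at the raw Realtime level as a `conversation.item.create` client event carrying a short description of the current UI state. This is an assumption about one reasonable way to do it, not the package's own helper: `buildStateUpdateEvent` and `UiState` are hypothetical names, and the choice of a `system` message is a judgment call.

```typescript
// Hypothetical helper: builds a raw Realtime client event; not the package's API.
interface UiState {
  theme: "light" | "dark";
  activeScreen: string;
}

interface StateUpdateEvent {
  type: "conversation.item.create";
  item: {
    type: "message";
    role: "system";
    content: Array<{ type: "input_text"; text: string }>;
  };
}

function buildStateUpdateEvent(state: UiState): StateUpdateEvent {
  return {
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "system",
      content: [
        {
          type: "input_text",
          // Serialize the state so the model reasons about what the user now sees.
          text: `Current UI state: ${JSON.stringify(state)}`,
        },
      ],
    },
  };
}

const stateEvent = buildStateUpdateEvent({ theme: "dark", activeScreen: "settings" });
```

With a WebRTC session, an event like this would be serialized with `JSON.stringify` and sent over the session's data channel after the visible change has been applied.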
Turn Detection Defaults
The controller uses Realtime server_vad by default. For text and tool-only sessions, it also sets interrupt_response: false so new speech does not cancel an in-flight text response or tool call. That matters when your UI does not play assistant audio back to the user.
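The reasoning above can be sketched as a small decision function. The snake_case field names follow the Realtime server VAD settings; `turnDetectionFor` and `ServerVadWire` are hypothetical names, and the sketch assumes that sessions which do play assistant audio keep interruption enabled, which the text above does not state explicitly.

```typescript
// Illustrative types only; field names follow the Realtime server VAD settings.
interface ServerVadWire {
  type: "server_vad";
  create_response: boolean;
  interrupt_response: boolean;
}

// Hypothetical helper: pick turn detection based on whether assistant audio is played.
function turnDetectionFor(playsAssistantAudio: boolean): ServerVadWire {
  return {
    type: "server_vad",
    create_response: true,
    // Without audio playback there is nothing for the user to talk over, so new
    // speech should not cancel an in-flight text response or tool call.
    interrupt_response: playsAssistantAudio,
  };
}

const toolOnlyTurnDetection = turnDetectionFor(false);
```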
If you override audio.input.turnDetection, use a server VAD shape like this as the starting point for tool-only UI control:
{
  type: "server_vad",
  threshold: 0.5,
  prefixPaddingMs: 300,
  silenceDurationMs: 200,
  createResponse: true,
  interruptResponse: false,
}
Integrating With An Existing App
The most reliable retrofit pattern is:
- keep your app as the source of truth
- create one explicit controller for one voice surface
- put a small app-owned wrapper between tools and your real handlers
- keep the widget launcher-focused
In practice, this avoids most of the confusing failure modes we hit while integrating the package into a larger app.
Before you wire anything, choose ownership:
- single-screen ownership: the controller belongs to one screen and can be
created with that screen's tools immediately
- shared shell or provider ownership: the controller lives above scene-level UI
because the same session should stay alive across scene, tab, or route changes
That choice affects where the controller lives, whether tools are known at creation time, and whether a neutral bootstrap controller is the right shape.
Step-by-Step Guide
1. Install the package like a normal app dependency. Use your package manager to install realtime-voice-component from the local checkout path and import realtime-voice-component/styles.css from your app. Prefer a normal dependency install over reaching directly into the package source tree from your app.
2. Add a `/session` endpoint in your app backend. Have the browser send SDP plus session config to your app server, and have your server forward that request to POST https://api.openai.com/v1/realtime/calls. Keep the multipart body intact unless you intentionally need to override session settings.
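For orientation, the browser side of this exchange can be sketched as a multipart POST to `/session`. The package's controller presumably issues this request itself, so the sketch only shows the shape the endpoint should expect. The form field names `sdp` and `session` are assumptions, and `buildSessionRequestBody` and `requestAnswerSdp` are hypothetical helpers.

```typescript
// Hypothetical browser-side helpers; the field names "sdp" and "session" are assumptions.
function buildSessionRequestBody(offerSdp: string, sessionConfig: object): FormData {
  const body = new FormData();
  body.set("sdp", offerSdp);
  body.set("session", JSON.stringify(sessionConfig));
  return body;
}

async function requestAnswerSdp(offerSdp: string, sessionConfig: object): Promise<string> {
  // Do not set Content-Type by hand: fetch adds the multipart boundary itself.
  const response = await fetch("/session", {
    method: "POST",
    body: buildSessionRequestBody(offerSdp, sessionConfig),
  });
  if (!response.ok) {
    throw new Error(`Session request failed with status ${response.status}`);
  }
  // The response body is the SDP answer for the peer connection.
  return response.text();
}
```

Because the API key is attached only by the server, the browser never sees it.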
Example:
app.post("/session", async (request, response) => {
  const contentType = request.header("content-type");
  const realtimeResponse = await fetch("https://api.openai.com/v1/realtime/calls", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      ...(contentType ? { "Content-Type": contentType } : {}),
    },
    body: request,
    duplex: "half",
  });
  response
    .status(realtimeResponse.status)
    .type(realtimeResponse.headers.get("content-type") ?? "application/sdp")
    .send(await realtimeResponse.text());
});
3. Create a small app-owned voice wrapper. Build a wrapper or adapter around your existing app state and handlers. Good wrapper methods are things like:
- `getState()`
- `setPrompt()`
- `setScenario()`
- `startRun()`
- `stopRun()`
- `sendToast()`
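As a framework-free illustration, the wrapper's surface can be written down as a small interface with an in-memory implementation. This is a sketch under assumptions: the `VoiceAdapter` and `VoiceAdapterState` types, the `RunStatus` values, and `createInMemoryAdapter` are hypothetical, and a real app would delegate each method to its existing handlers instead of holding state itself.

```typescript
// Illustrative only: the RunStatus values and all type names are assumptions.
type RunStatus = "idle" | "running";

interface VoiceAdapterState {
  prompt: string;
  runStatus: RunStatus;
  scenarioId: string;
}

interface VoiceAdapter {
  getState(): VoiceAdapterState;
  setPrompt(prompt: string): void;
  setScenario(scenarioId: string): void;
  startRun(): void;
  stopRun(): void;
  sendToast(message: string): void;
}

function createInMemoryAdapter(initial: VoiceAdapterState, toasts: string[]): VoiceAdapter {
  let state = initial;
  return {
    getState: () => state,
    setPrompt: (prompt) => {
      state = { ...state, prompt };
    },
    setScenario: (scenarioId) => {
      state = { ...state, scenarioId };
    },
    startRun: () => {
      // Guard in the wrapper so a repeated voice command cannot double-start a run.
      if (state.runStatus === "running") {
        toasts.push("A run is already in progress.");
        return;
      }
      state = { ...state, runStatus: "running" };
    },
    stopRun: () => {
      state = { ...state, runStatus: "idle" };
    },
    sendToast: (message) => {
      toasts.push(message);
    },
  };
}
```

Tools then call only these methods, which keeps guards and side effects in one app-owned place.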
Example:
const stateRef = useRef({
  prompt,
  runStatus,
  scenarioId,
});
stateRef.current = {
  prompt,
  runStatus,
  scenarioId,
};
const voiceAdapter = useMemo(
  () => ({
    getState: () => stateRef.current,
    sendToast: (message: string) => {
      toast(message);
    },
    setPrompt,
    setScenario: setScenarioId,
    startRun,
    stopRun,
    …