What does this repo signal mean?

OpenAI published openai/realtime-voice-component (TypeScript). Repository signals like this one surface tooling, eval, infrastructure, or model-adjacent work that may not yet have appeared in a launch post. High-signal details: repo openai/realtime-voice-component · language TypeScript · new OpenAI repo with good traction. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

OpenAI Repo: openai/realtime-voice-component

Captured source

GitHub: github.com/openai/realtime-voice-component

openai/realtime-voice-component repository metadata

published Mar 12, 2026 · seen Jun 5 · captured Jun 11 · http 200 · method plain

openai/realtime-voice-component

Language: TypeScript

License: Apache-2.0

Stars: 873

Forks: 108

Open issues: 1

Created: 2026-03-12T19:31:35Z

Pushed: 2026-04-27T22:57:27Z

Default branch: main

Fork: no

Archived: no

README:

realtime-voice-component

React/browser voice controls for tool-constrained UIs built on OpenAI Realtime.

> Warning: This repository is an open-source reference implementation. It is useful for education, demos, and local adoption, but it is not a promise of long-term product support or a production-ready UI kit.

Distribution Status

This repo is intended to be shared as a GitHub reference implementation. It is not currently published to npm, and package.json remains marked as private.

The code is licensed under Apache-2.0. See [LICENSE](./LICENSE).

What This Package Is

This package is for apps where:

your app defines the exact actions the assistant can take
tools stay app-owned and narrow
the UI remains responsible for the visible state change
you want a React-friendly controller and an optional launcher widget

The package is intentionally opinionated about browser UI flows. It is not a general-purpose orchestration framework and it is not a replacement for raw Realtime transports.

Choose The Right Layer

Use this package when you want a React/browser layer for voice-driven UI:

a reusable controller with React bindings
a packaged launcher widget
optional visible confirmation via the ghost cursor
a pattern centered on app-owned tools, not free-form browser automation

Use raw Realtime when you want lower-level transport and session control:

custom audio handling
non-React runtimes
your own UI surface and state model from scratch

Use `openai-agents-js` when you need a broader headless SDK:

agent orchestration and handoffs
richer hosted-tool and MCP flows
server-side or multi-runtime agent systems beyond a browser UI package

Demo App

The repo’s [demo/](./demo) app is the main runnable teaching surface. It shows:

a starter theme-switching flow
a multi-step form flow
a richer shared-state chess flow
shared controller reuse across multiple screens
optional wake-word experimentation layered on top of the runtime

Run it locally with:

cp demo/.env.example demo/.env.local
# edit demo/.env.local and set OPENAI_API_KEY
npm install
npm run demo

Package Shape

defineVoiceTool() turns a Zod-backed app action into a Realtime function tool.
createVoiceControlController() owns the session, transport, tool execution, transcript assembly, and connection lifecycle.
useVoiceControl() binds React to either an external controller or an internally owned one.
VoiceControlWidget is a launcher UI on top of the controller.
useGhostCursor() and GhostCursorOverlay are optional visible confirmation helpers.
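To make the idea of a narrow, app-owned tool concrete, here is a minimal sketch of one app action expressed in the Realtime function-tool shape. It deliberately does not import the package or Zod: `makeSetThemeTool`, `AppTool`, and the hand-written validation are hypothetical stand-ins, and the real `defineVoiceTool()` signature should be taken from the repo rather than from this example. Only the `{ type: "function", name, description, parameters }` shape is meant to mirror the Realtime tool format.

```typescript
// Hypothetical sketch: these names are illustrative, not the package's API.

// Shape of a function tool as it appears in a Realtime session config.
interface RealtimeFunctionTool {
  type: "function";
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: Record<string, { type: string; enum?: readonly string[] }>;
    required: string[];
    additionalProperties: boolean;
  };
}

type ToolResult = { ok: true } | { ok: false; error: string };

interface AppTool {
  definition: RealtimeFunctionTool;
  // Parses and validates the model's JSON arguments, then calls the app handler.
  run(rawArgs: string): ToolResult;
}

const THEMES = ["light", "dark"] as const;
type Theme = (typeof THEMES)[number];

function isTheme(value: unknown): value is Theme {
  return typeof value === "string" && (THEMES as readonly string[]).includes(value);
}

// One tool maps to one real app action: switching the theme.
function makeSetThemeTool(applyTheme: (theme: Theme) => void): AppTool {
  return {
    definition: {
      type: "function",
      name: "set_theme",
      description: "Switch the app between the light and dark themes.",
      parameters: {
        type: "object",
        properties: { theme: { type: "string", enum: THEMES } },
        required: ["theme"],
        additionalProperties: false,
      },
    },
    run(rawArgs: string): ToolResult {
      let parsed: unknown;
      try {
        parsed = JSON.parse(rawArgs);
      } catch {
        return { ok: false, error: "arguments were not valid JSON" };
      }
      const theme = (parsed as { theme?: unknown } | null)?.theme;
      if (!isTheme(theme)) {
        return { ok: false, error: "theme must be 'light' or 'dark'" };
      }
      applyTheme(theme); // the app, not the model, performs the visible change
      return { ok: true };
    },
  };
}
```

Arguments are rejected before the handler runs, so the model can only trigger the one state change the app has chosen to expose.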

Recommended Default Flow

For most browser apps in this repo, the recommended path is:

1. proxy the browser SDP + session config through your own /session endpoint
2. register one narrow tool that maps to one real app action
3. start with the theme demo or a small controller-based integration
4. send current UI state back into the session after visible changes
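Step 4 can be pictured at the raw event level. The sketch below builds a Realtime `conversation.item.create` event carrying a system message with a JSON snapshot of UI state. The helper names `buildUiStateEvent` and `reportUiState` are hypothetical, and the package may well expose its own way to do this; treat it as an illustration of the idea under those assumptions, not as the controller's API.

```typescript
// Hypothetical helper names; only the event shape follows the Realtime API.

interface ConversationItemCreateEvent {
  type: "conversation.item.create";
  item: {
    type: "message";
    role: "system";
    content: Array<{ type: "input_text"; text: string }>;
  };
}

// Serializes a small snapshot of UI state into a system message for the session.
function buildUiStateEvent(state: Record<string, unknown>): ConversationItemCreateEvent {
  return {
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "system",
      content: [{ type: "input_text", text: `Current UI state: ${JSON.stringify(state)}` }],
    },
  };
}

// `send` stands in for whatever transport is in use,
// for example the send() method of a WebRTC data channel.
function reportUiState(send: (payload: string) => void, state: Record<string, unknown>): void {
  send(JSON.stringify(buildUiStateEvent(state)));
}
```

Calling `reportUiState` right after a tool has visibly changed the UI keeps the model's picture of the screen in step with what the user sees.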

Turn Detection Defaults

The controller uses Realtime server_vad by default. For text and tool-only sessions, it also sets interrupt_response: false so new speech does not cancel an in-flight text response or tool call. That matters when your UI does not play assistant audio back to the user.

If you override audio.input.turnDetection, use a server VAD shape like this as the starting point for tool-only UI control:

{
  type: "server_vad",
  threshold: 0.5,
  prefixPaddingMs: 300,
  silenceDurationMs: 200,
  createResponse: true,
  interruptResponse: false,
}
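The override above uses camelCase keys, while the Realtime API's own `turn_detection` object uses snake_case (`prefix_padding_ms`, `silence_duration_ms`, `create_response`, `interrupt_response`). The sketch below shows one way such a mapping could look, with the values above as defaults. `TurnDetectionOptions`, `TOOL_ONLY_DEFAULTS`, and `toWireTurnDetection` are hypothetical names; whether the package performs the conversion exactly this way is an assumption.

```typescript
// Hypothetical sketch: these names are illustrative, not the package's API.

interface TurnDetectionOptions {
  type: "server_vad";
  threshold: number; // VAD activation threshold, 0 to 1
  prefixPaddingMs: number; // audio kept before detected speech
  silenceDurationMs: number; // silence needed to end a turn
  createResponse: boolean;
  interruptResponse: boolean;
}

// Starting values copied from the tool-only example above.
const TOOL_ONLY_DEFAULTS: TurnDetectionOptions = {
  type: "server_vad",
  threshold: 0.5,
  prefixPaddingMs: 300,
  silenceDurationMs: 200,
  createResponse: true,
  interruptResponse: false,
};

// Merges overrides onto the defaults and renames keys to the wire format.
function toWireTurnDetection(overrides: Partial<TurnDetectionOptions> = {}) {
  const merged = { ...TOOL_ONLY_DEFAULTS, ...overrides };
  return {
    type: merged.type,
    threshold: merged.threshold,
    prefix_padding_ms: merged.prefixPaddingMs,
    silence_duration_ms: merged.silenceDurationMs,
    create_response: merged.createResponse,
    interrupt_response: merged.interruptResponse,
  };
}
```

For example, `toWireTurnDetection({ silenceDurationMs: 400 })` lengthens the end-of-turn pause while leaving `interrupt_response` at `false`.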

Integrating With An Existing App

The most reliable retrofit pattern is:

keep your app as the source of truth
create one explicit controller for one voice surface
put a small app-owned wrapper between tools and your real handlers
keep the widget launcher-focused

In practice, this avoids most of the confusing failure modes we hit while integrating the package into a larger app.

Before you wire anything, choose ownership:

single-screen ownership: the controller belongs to one screen and can be created with that screen's tools immediately
shared shell or provider ownership: the controller lives above scene-level UI because the same session should stay alive across scene, tab, or route changes

That choice affects where the controller lives, whether tools are known at creation time, and whether a neutral bootstrap controller is the right shape.
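One way to picture shared ownership is a long-lived holder that starts with no tools (the "neutral bootstrap" shape) and lets each scene add and remove its tools as it mounts and unmounts. The sketch below is purely illustrative: `SceneToolRegistry` and `NamedTool` are hypothetical and not part of the package, and how the real controller accepts updated tools is not shown in this excerpt.

```typescript
// Hypothetical sketch: SceneToolRegistry is not part of the package.

interface NamedTool {
  name: string;
}

class SceneToolRegistry {
  private readonly toolsByScene = new Map<string, NamedTool[]>();
  private readonly onChange: (tools: NamedTool[]) => void;

  // `onChange` is where a shared controller would be told about the new tool set.
  constructor(onChange: (tools: NamedTool[]) => void) {
    this.onChange = onChange;
  }

  // Called when a scene mounts; returns a cleanup function to call on unmount.
  register(sceneId: string, tools: NamedTool[]): () => void {
    this.toolsByScene.set(sceneId, tools);
    this.onChange(this.activeTools());
    return () => {
      this.toolsByScene.delete(sceneId);
      this.onChange(this.activeTools());
    };
  }

  // Flattens the tools of all mounted scenes, in registration order.
  activeTools(): NamedTool[] {
    const all: NamedTool[] = [];
    this.toolsByScene.forEach((tools) => all.push(...tools));
    return all;
  }
}
```

With single-screen ownership none of this is needed, because the screen's tools are known when the controller is created.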

Step-by-Step Guide

1. Install the package like a normal app dependency. Use your package manager to install realtime-voice-component from the local checkout path and import realtime-voice-component/styles.css from your app. Prefer a normal dependency install over reaching directly into the package source tree from your app.

2. Add a `/session` endpoint in your app backend. Have the browser send SDP plus session config to your app server, and have your server forward that request to POST https://api.openai.com/v1/realtime/calls. Keep the multipart body intact unless you intentionally need to override session settings.

Example:

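app.post("/session", async (request, response) => {
  // Preserve the browser's multipart Content-Type, including its boundary.
  const contentType = request.header("content-type");

  // Stream the incoming body to OpenAI unchanged; the API key stays server-side.
  const realtimeResponse = await fetch("https://api.openai.com/v1/realtime/calls", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      ...(contentType ? { "Content-Type": contentType } : {}),
    },
    body: request,
    duplex: "half", // required by Node's fetch when the body is a stream
  });

  // Relay the upstream status and SDP answer back to the browser.
  response
    .status(realtimeResponse.status)
    .type(realtimeResponse.headers.get("content-type") ?? "application/sdp")
    .send(await realtimeResponse.text());
});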
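For orientation, the request this endpoint receives can be sketched from the browser side. The package's transport presumably builds it for you, so the code below only illustrates what "SDP plus session config" in a multipart body can look like. The function names are hypothetical, and the field names `sdp` and `session` are an assumption based on the multipart format of the Realtime calls endpoint.

```typescript
// Hypothetical sketch: function names are illustrative; the "sdp" and "session"
// field names are assumed from the multipart format of POST /v1/realtime/calls.

// Packs the WebRTC offer and the session config into one multipart body.
function buildSessionRequestBody(offerSdp: string, session: Record<string, unknown>): FormData {
  const form = new FormData();
  form.set("sdp", offerSdp);
  form.set("session", JSON.stringify(session));
  return form;
}

// Posts the offer to the app's own /session endpoint and returns the answer SDP.
async function requestAnswerSdp(
  offerSdp: string,
  session: Record<string, unknown>,
  endpoint = "/session",
): Promise<string> {
  const response = await fetch(endpoint, {
    method: "POST",
    // No Content-Type header here: fetch adds the multipart boundary itself.
    body: buildSessionRequestBody(offerSdp, session),
  });
  if (!response.ok) {
    throw new Error(`Session negotiation failed with HTTP ${response.status}`);
  }
  return response.text();
}
```

Because the server forwards the body untouched, the boundary generated in the browser must reach OpenAI intact, which is why the endpoint above copies the incoming Content-Type header.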

3. Create a small app-owned voice wrapper. Build a wrapper or adapter around your existing app state and handlers. Good wrapper methods are things like:

getState()
setPrompt()
setScenario()
startRun()
stopRun()
sendToast()

Example:

const stateRef = useRef({
  prompt,
  runStatus,
  scenarioId,
});
stateRef.current = {
  prompt,
  runStatus,
  scenarioId,
};

const voiceAdapter = useMemo(
  () => ({
    getState: () => stateRef.current,
    sendToast: (message: string) => {
      toast(message);
    },
    setPrompt,
    setScenario: setScenarioId,
    startRun,
    stopRun,...

Excerpt shown — open the source for the full document.

Notability

notability 6.0/10

New OpenAI repo with good traction