Repo · StepFun · published Jun 1, 2026

stepfun-ai/Step-Realtime-CLI

TypeScript

Language: TypeScript

License: MIT

Stars: 18

Forks: 7

Open issues: 16

Created: 2026-06-01T06:00:45Z

Pushed: 2026-06-11T06:34:56Z

Default branch: main

Fork: no

Archived: no

README:

Step Realtime CLI

English | 简体中文

step-realtime-cli is a terminal-based AI coding assistant. You can interact with it via text or realtime voice for everyday tasks such as reading code, editing files, and running commands.

Demo

![Step Realtime CLI demo](docs/assets/demo.gif)

Key capabilities

  • Voice coding: run `step voice` and, with headphones on, issue spoken instructions; the assistant parses repository context, applies edits, and confirms changes verbally.
  • Text chat: run `step` in any working directory to enter the interactive terminal UI and start a task in natural language.
  • One-shot tasks: submit a single request via `step exec "..."` and receive the result when execution completes.
  • Session resumption: session state is persisted automatically and can be resumed at any time via `step resume`.
  • Read-only planning mode: run `step exec --mode plan "..."` so the assistant only reads the code and proposes a plan, which the user reviews and approves before any changes are applied.

Quick start

Requirements

  • macOS / Linux, Node.js 20+
  • A StepFun API key (a single key may be used for both the coding model and realtime voice; a different provider's key may be configured for the coding side if preferred)

Choose your region

StepFun operates two independent sites; pick the one that matches where your API key was issued. The two sites do not share accounts or keys.

| Region | Console | API endpoint | Installer |
| --- | --- | --- | --- |
| Mainland China (default) | https://platform.stepfun.com/ | https://api.stepfun.com | bash scripts/setup.sh |
| Overseas | https://platform.stepfun.ai/ | https://api.stepfun.ai | bash scripts/setup-overseas.sh |

`scripts/setup-overseas.sh` runs the same flow as `scripts/setup.sh` and then rewrites `~/.step-cli/config.json` so that both the realtime WebSocket and the models-proxy base URL point at `api.stepfun.ai`. All other flags (`--skip-build`, `--force-config`, `--uninstall`, …) are forwarded verbatim.
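The region table above can be expressed as a small lookup. This is a minimal sketch, not code from the repository: the `Region` names, the `REGIONS` map, and `regionForHost` are hypothetical helpers, and only the URLs and installer commands come from the table.

```typescript
// Hypothetical helper: maps the host of the console where a key was issued
// to the matching API endpoint and installer. Values are copied from the table.
type Region = "mainland" | "overseas";

interface RegionInfo {
  consoleUrl: string;
  apiEndpoint: string;
  installer: string;
}

const REGIONS: Record<Region, RegionInfo> = {
  mainland: {
    consoleUrl: "https://platform.stepfun.com/",
    apiEndpoint: "https://api.stepfun.com",
    installer: "bash scripts/setup.sh",
  },
  overseas: {
    consoleUrl: "https://platform.stepfun.ai/",
    apiEndpoint: "https://api.stepfun.ai",
    installer: "bash scripts/setup-overseas.sh",
  },
};

// Returns the region for a host name such as "platform.stepfun.ai",
// or undefined when the host belongs to neither site.
function regionForHost(host: string): Region | undefined {
  const h = host.toLowerCase();
  if (h === "stepfun.com" || h.endsWith(".stepfun.com")) return "mainland";
  if (h === "stepfun.ai" || h.endsWith(".stepfun.ai")) return "overseas";
  return undefined;
}
```

For example, a key issued at platform.stepfun.ai resolves to `"overseas"`, so `REGIONS.overseas.installer` is the script to run.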

Audio dependencies

scripts/setup.sh (and scripts/setup-overseas.sh) enables AEC by default and will detect or install Chrome automatically. In this default mode, audio capture and playback are handled by Chrome (BrowserAudioDriver), and no additional system-level audio utilities are required.

When AEC is disabled via `step aec off` (or falls back because Chrome is unavailable), realtime voice switches to the system command-line audio drivers, which require:

  • macOS: `sox`, installable via `brew install sox`
  • Linux: the ALSA utilities `arecord` / `aplay`, typically provided by `alsa-utils` (e.g. `sudo apt install alsa-utils`)
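The fallback rule described above can be sketched as a pure function. This is an illustration under stated assumptions, not the CLI's actual logic: `selectAudioBackend`, `AudioBackend`, and the `Platform` names are hypothetical; only the Chrome/AEC relationship and the tool names (`sox`, `arecord`, `aplay`) come from the text.

```typescript
// Hypothetical model of the driver choice described in the README.
type Platform = "macos" | "linux";

interface AudioBackend {
  driver: "browser" | "system";
  // Command-line tools that must be installed for this backend.
  requiredTools: string[];
}

function selectAudioBackend(
  aecEnabled: boolean,
  chromeAvailable: boolean,
  platform: Platform,
): AudioBackend {
  // With AEC on and Chrome present, Chrome handles capture and playback,
  // so no system-level audio utilities are needed.
  if (aecEnabled && chromeAvailable) {
    return { driver: "browser", requiredTools: [] };
  }
  // Otherwise fall back to the system command-line drivers.
  return {
    driver: "system",
    requiredTools: platform === "macos" ? ["sox"] : ["arecord", "aplay"],
  };
}
```

A check like this could run before starting a voice session to report which tools are still missing on the current machine.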

One-shot install

git clone <repo-url> step-realtime-cli
cd step-realtime-cli

# Mainland China (platform.stepfun.com)
bash scripts/setup.sh

# Overseas (platform.stepfun.ai)
# bash scripts/setup-overseas.sh

The installer installs dependencies, builds the executable, registers step on your shell PATH, and initializes the voice components (VAD / AEC).

After installation completes, perform the following two steps:

1. Edit `~/.step-cli/config.json` and replace the two `apiKey` placeholders with valid keys:

  • `model.apiKey`: coding model
  • `voice.realtime.apiKey`: realtime voice (ASR/TTS)
  • When using StepFun, the same value may be used for both fields

2. Open a new shell so that the updated PATH takes effect.
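Before the first run, it can help to confirm that both key fields were actually filled in. The sketch below assumes only the two field paths named above; `missingApiKeys` and the `placeholders` parameter are hypothetical, and the README does not say what the installer's placeholder text looks like, so any placeholder strings must be supplied by the caller.

```typescript
// Only the two fields mentioned in the README are modeled here.
interface StepConfigKeys {
  model?: { apiKey?: unknown };
  voice?: { realtime?: { apiKey?: unknown } };
}

// Returns the dotted paths of key fields that are absent, empty,
// or still equal to one of the caller-supplied placeholder strings.
function missingApiKeys(
  config: StepConfigKeys,
  placeholders: readonly string[] = [],
): string[] {
  const fields: Array<[string, unknown]> = [
    ["model.apiKey", config.model?.apiKey],
    ["voice.realtime.apiKey", config.voice?.realtime?.apiKey],
  ];
  const missing: string[] = [];
  for (const [path, value] of fields) {
    const ok =
      typeof value === "string" &&
      value.trim() !== "" &&
      placeholders.indexOf(value) === -1;
    if (!ok) missing.push(path);
  }
  return missing;
}
```

In practice the argument would be the parsed contents of `~/.step-cli/config.json`; an empty result means both fields look filled in.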

Then, from any directory:

step voice # realtime voice conversation
step # interactive text UI
step "summarize src/index.ts" # one-shot task

Uninstall

bash scripts/uninstall.sh

This removes the installed executable and PATH entry, while preserving ~/.step-cli/config.json and existing session history.

Voice mode

step voice

Once started, simply begin speaking. The assistant performs speech recognition, repository operations, and voice replies concurrently in realtime.

> Using headphones is strongly recommended: it significantly reduces echo and false triggering caused by speaker output being re-captured by the microphone, and improves both recognition accuracy and conversation stability.

Input modes

  • `duplex` (continuous, default): suitable for natural conversation; relies on VAD to determine when an utterance ends.
  • `ptt` (push-to-talk): more reliable in noisy environments.

VAD (voice activity detection)

The default VAD mode is `energy`, which suits quiet environments. For noisy environments or speaker playback, switch to the more accurate `silero` model:

step vad set silero # switch to silero
step vad status # show current selection
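To illustrate why an energy-based detector struggles in noisy rooms, here is a generic sketch of energy VAD with an end-of-utterance hangover. It is a textbook-style illustration, not the CLI's implementation: the function names, the threshold, and the hangover length are all assumptions.

```typescript
// Root-mean-square energy of one audio frame (samples in [-1, 1]).
function rms(frame: readonly number[]): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (const sample of frame) sum += sample * sample;
  return Math.sqrt(sum / frame.length);
}

// Returns the index of the frame at which the utterance is considered
// finished: the first point where, after speech was seen, `hangoverFrames`
// consecutive frames fall below `threshold`. Returns -1 if that never happens.
function utteranceEndIndex(
  frames: readonly (readonly number[])[],
  threshold: number,
  hangoverFrames: number,
): number {
  let speechSeen = false;
  let silentRun = 0;
  let index = 0;
  for (const frame of frames) {
    if (rms(frame) >= threshold) {
      speechSeen = true;
      silentRun = 0;
    } else if (speechSeen) {
      silentRun += 1;
      if (silentRun >= hangoverFrames) return index;
    }
    index += 1;
  }
  return -1;
}
```

Because the decision depends only on loudness, background noise above the threshold keeps resetting the silent run, so the utterance never appears to end. That is the situation in which a model-based detector such as silero, or push-to-talk, tends to work better.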

AEC (acoustic echo cancellation)

When speakers are used instead of headphones, TTS output may be re-captured by the microphone and cause feedback. Enabling AEC mitigates this issue:

step aec on # enable AEC
step aec status # show AEC status (also verifies Chrome availability)

AEC requires Chrome to be installed locally. On macOS, the CLI will suggest `brew install --cask google-chrome` if Chrome is not detected. AEC is not required when using headphones.

Speech rate

Adjust `voice.defaults.speedRatio` in `~/.step-cli/config.json`. The valid range is 0.5 – 2.0, with a default of 1.1.
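A value can be checked against that range before it is written to the config file. The sketch below is an assumption about sensible handling, not documented CLI behavior: the README does not say whether out-of-range values are clamped or rejected, and `normalizeSpeedRatio` is a hypothetical helper.

```typescript
// Range and default as stated in the README.
const SPEED_MIN = 0.5;
const SPEED_MAX = 2.0;
const SPEED_DEFAULT = 1.1;

// Falls back to the default for non-numeric input and clamps
// numeric input into the documented range.
function normalizeSpeedRatio(value: unknown): number {
  if (typeof value !== "number" || !isFinite(value)) return SPEED_DEFAULT;
  return Math.min(SPEED_MAX, Math.max(SPEED_MIN, value));
}
```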

Common commands

step # launch the interactive UI in the current directory
step "look at this bug" # one-shot task
step voice # realtime voice conversation
step resume # resume a previous session
step exec --mode plan "..." # read-only planning mode (does not modify files)
step config show # display the effective configuration
step config sync --write # add newly introduced configuration fields after upgrade
step theme # export the current theme for customization

For the full command list, run `step --help`.
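When driving one-shot tasks from a script, the arguments can be assembled programmatically. This sketch uses only the `exec` subcommand and the `--mode plan` flag shown above; `buildExecArgs` itself is a hypothetical helper and not part of the CLI.

```typescript
// Builds the argument list for `step exec`, optionally in read-only planning mode.
function buildExecArgs(prompt: string, planOnly: boolean = false): string[] {
  if (prompt.trim() === "") {
    throw new Error("prompt must not be empty");
  }
  return planOnly ? ["exec", "--mode", "plan", prompt] : ["exec", prompt];
}
```

Passing the result as an argument array (for instance to Node's `child_process.spawn("step", args)`) avoids shell-quoting problems when the prompt itself contains quotes or spaces.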

Configuration

All configuration resides in `~/.step-cli/config.json`. Typical adjustments include:

  • Switch models: update `model.model` and `model.apiKey`
  • Update voice API key: update `voice.realtime.apiKey`
  • VAD / AEC: use the commands listed above rather than editing the JSON manually
  • After upgrade: run `step config sync --write` to populate newly added configuration fields (existing values are preserved)
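The "add new fields, preserve existing values" behavior described for the sync step corresponds to a defaults-filling merge. The sketch below shows that general technique under stated assumptions; it is not the CLI's code, and `addMissingFields` and the sample field names in the usage note are hypothetical.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a copy of `existing` with any keys found only in `defaults` added.
// Values already present in `existing` are never overwritten; nested objects
// are merged recursively.
function addMissingFields(existing: JsonObject, defaults: JsonObject): JsonObject {
  const result: JsonObject = { ...existing };
  for (const key of Object.keys(defaults)) {
    const fallback = defaults[key];
    if (fallback === undefined) continue;
    const current = result[key];
    if (current === undefined) {
      result[key] = fallback;
    } else if (isObject(current) && isObject(fallback)) {
      result[key] = addMissingFields(current, fallback);
    }
  }
  return result;
}
```

For instance, merging a user config that sets `voice.defaults.speedRatio` to 1.5 with defaults that contain 1.1 plus a newly introduced sibling field keeps 1.5 and adds only the new field.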
step config path #…
