Repo · OpenAI · published Jun 23, 2025 · seen 6d

openai/gpt-oss

Python


Description: gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Language: Python

License: Apache-2.0

Stars: 20154

Forks: 2094

Open issues: 90

Created: 2025-06-23T16:43:33Z

Pushed: 2026-06-09T22:51:22Z

Default branch: main

Fork: no

Archived: no

README:

Try gpt-oss · Guides · Model card · OpenAI blog

Download gpt-oss-120b and gpt-oss-20b on Hugging Face

Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.

We're releasing two flavors of these open models:

  • gpt-oss-120b: for production, general-purpose, high-reasoning use cases; the model fits on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5.1B active parameters)
  • gpt-oss-20b: for lower-latency, local, or specialized use cases (21B parameters with 3.6B active parameters)
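
The parameter counts above imply that only a small share of each model's weights takes part in computing any single token. A minimal sketch of that arithmetic, using only the figures quoted in the list (the helper name `active_fraction` is ours and is not part of this repository):

```python
# Parameter counts in billions, copied from the list above.
MODELS = {
    "gpt-oss-120b": {"total_b": 117.0, "active_b": 5.1},
    "gpt-oss-20b": {"total_b": 21.0, "active_b": 3.6},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that are active for one token."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


for name, counts in MODELS.items():
    share = active_fraction(counts["total_b"], counts["active_b"])
    print(f"{name}: {share:.1%} of parameters active per token")
```

By this estimate the larger model activates roughly 4% of its weights per token and the smaller one roughly 17%.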

Both models were trained using our [harmony response format][harmony] and should only be used with this format; otherwise, they will not work correctly.
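
To make concrete what "format" means here, the sketch below hand-renders a conversation into one prompt string with harmony-style special tokens. It is an illustration only: the `render_prompt` helper is hypothetical, and it leaves out the system-message contents, channels, and tool definitions that the real format carries. In practice the chat template or the openai-harmony package should do this work.

```python
from typing import List, Tuple


def render_prompt(messages: List[Tuple[str, str]], next_role: str = "assistant") -> str:
    """Toy renderer: wrap each (role, content) pair in harmony-style tokens.

    Simplified on purpose; not a substitute for the openai-harmony package.
    """
    parts = [
        f"<|start|>{role}<|message|>{content}<|end|>"
        for role, content in messages
    ]
    # Leave the next turn open so the model continues as `next_role`.
    parts.append(f"<|start|>{next_role}")
    return "".join(parts)


print(render_prompt([("user", "What is the weather like in SF?")]))
```

A plain string such as "What is the weather like in SF?" carries none of these role markers, which is why prompts sent without the format tend to produce poor results.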

Table of Contents

  • [Highlights](#highlights)
  • [Inference examples](#inference-examples)
  • [About this repository](#about-this-repository)
  • [Setup](#setup)
  • [Download the model](#download-the-model)
  • [Reference PyTorch implementation](#reference-pytorch-implementation)
  • [Reference Triton implementation (single GPU)](#reference-triton-implementation-single-gpu)
  • [Reference Metal implementation](#reference-metal-implementation)
  • [Harmony format & tools](#harmony-format--tools)
  • [Clients](#clients)
  • [Tools](#tools)
  • [Other details](#other-details)
  • [Contributing](#contributing)

Highlights

  • Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
  • Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
  • Full chain-of-thought: Provides complete access to the model's reasoning process, facilitating easier debugging and greater trust in outputs. This information is not intended to be shown to end users.
  • Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
  • Agentic capabilities: Use the models' native capabilities for function calling, [web browsing](#browser), [Python code execution](#python), and Structured Outputs.
  • MXFP4 quantization: The models were post-trained with MXFP4 quantization of the MoE weights, making gpt-oss-120b run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the gpt-oss-20b model run within 16GB of memory. All evals were performed with the same MXFP4 quantization.
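
The memory figures in the MXFP4 bullet can be sanity-checked with rough arithmetic. The sketch below assumes the OCP Microscaling layout for MXFP4 (4-bit elements sharing one 8-bit scale per block of 32, so about 4.25 bits per weight). As a simplification it treats every parameter as MXFP4, although the text above says only the MoE weights are quantized this way, and it ignores activations and the KV cache. Real usage will therefore be higher than this estimate.

```python
BITS_PER_ELEMENT = 4   # one FP4 value
SCALE_BITS = 8         # one shared scale per block
BLOCK_SIZE = 32        # elements per block (assumed MX block size)


def mxfp4_bits_per_weight() -> float:
    """Average storage cost of one weight, including its share of the scale."""
    return BITS_PER_ELEMENT + SCALE_BITS / BLOCK_SIZE


def weight_footprint_gb(params_billion: float) -> float:
    """Rough weight size in GB (10^9 bytes) if all parameters were MXFP4."""
    total_bits = params_billion * 1e9 * mxfp4_bits_per_weight()
    return total_bits / 8 / 1e9


for name, params_b, budget_gb in [
    ("gpt-oss-120b", 117.0, 80.0),
    ("gpt-oss-20b", 21.0, 16.0),
]:
    estimate = weight_footprint_gb(params_b)
    print(f"{name}: ~{estimate:.1f} GB of weights against a {budget_gb:.0f} GB budget")
```

Under these assumptions the weights come to roughly 62 GB and 11 GB, which leaves some headroom within the 80GB and 16GB budgets named above.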

Inference examples

Transformers

You can use gpt-oss-120b and gpt-oss-20b with the Transformers library. If you use Transformers' chat template, it will automatically apply the [harmony response format][harmony]. If you use model.generate directly, you need to apply the harmony format manually using the chat template or use our [openai-harmony][harmony] package.

from transformers import pipeline
import torch

model_id = "openai/gpt-oss-120b"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

Learn more about how to use gpt-oss with Transformers.
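
The final `print` in the example emits the last message of the conversation as a dictionary. For chat-style input, the text-generation pipeline returns the whole message list under `generated_text`, so the reply text sits in the last entry's `content` field. A small sketch of pulling it out, run here against a hand-written stand-in for the pipeline output so that it works without downloading a model (the helper name `last_reply` and the sample data are ours):

```python
from typing import Any, Dict, List


def last_reply(outputs: List[Dict[str, Any]]) -> str:
    """Return the assistant text from a chat-style pipeline result."""
    conversation = outputs[0]["generated_text"]
    final = conversation[-1]
    if final.get("role") != "assistant":
        raise ValueError("last message is not an assistant reply")
    return final["content"]


# Stand-in with the same shape as a chat pipeline result (illustrative data).
fake_outputs = [
    {
        "generated_text": [
            {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
            {"role": "assistant", "content": "Quantum mechanics describes nature at small scales."},
        ]
    }
]

print(last_reply(fake_outputs))
```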

vLLM

vLLM recommends using `uv` for Python dependency management. You can use vLLM to spin up an OpenAI-compatible web server. The following command will automatically download the model and start the server.

uv pip install --pre vllm==0.10.1+gptoss \
    --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
    --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
    --index-strategy unsafe-best-match

vllm serve openai/gpt-oss-20b

Learn more about how to use gpt-oss with vLLM.
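
Once `vllm serve` is running, any OpenAI-compatible client can talk to it. The sketch below builds a Chat Completions request with only the standard library. It assumes the server listens on vLLM's default address, `http://localhost:8000`, and that no API key is configured; the helper names `build_chat_request` and `ask` are ours.

```python
import json
import urllib.request

# Assumption: vLLM's default host and port; adjust if you passed --host/--port.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(
    prompt: str,
    model: str = "openai/gpt-oss-20b",
    max_tokens: int = 256,
) -> urllib.request.Request:
    """Build (but do not send) a POST to the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the reply text; needs the server running."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(ask("Explain quantum mechanics clearly and concisely."))` prints the model's reply.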

Offline Serve Code:

  • Run this code after installing the libraries as described above, and additionally install openai-harmony:
  • uv pip install openai-harmony
# source .oss/bin/activate

import os

# Disable the FlashInfer sampler; set this before importing vllm.
os.environ["VLLM_USE_FLASHINFER_SAMPLER"] = "0"

import json
from openai_harmony import (
    HarmonyEncodingName,
    load_harmony_encoding,
    Conversation,
    Message,
    Role,
    SystemContent,
    DeveloperContent,
)

from vllm import LLM, SamplingParams

# --- 1) Render the prefill with Harmony ---
encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

convo = Conversation.from_messages(
    [
        Message.from_role_and_content(Role.SYSTEM, SystemContent.new()),
        Message.from_role_and_content(
            Role.DEVELOPER,
            DeveloperContent.new().with_instructions("Always respond in riddles"),
        ),
        Message.from_role_and_content(Role.USER, "What is the weather like in SF?"),
    ]
)

prefill_ids = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)

# Harmony stop tokens (pass to sampler so they won't be included in output)
stop_token_ids = encoding.stop_tokens_for_assistant_actions()

# --- 2) Run vLLM with prefill ---
llm = LLM(
    model="openai/gpt-oss-20b",
    trust_remote_code=True,
    gpu_memory_utilization=0.95,
    max_num_batched_tokens=4096,
    max_model_len=5000,
    tensor_parallel_size=1,
)

sampling = SamplingParams(
    max_tokens=128,
    temperature=1,
    stop_token_ids=stop_token_ids,
)

outputs = llm.generate(
    prompt_token_ids=[prefill_ids],  # batch of size 1
    sampling_params=sampling,
)

# vLLM gives you both text and token IDs
gen = outputs[0].outputs[0]
text = gen.text
output_tokens = gen.token_ids

…

- On macOS: Install the Xcode CLI tools with `xcode-select --install`
- On Linux: These reference implementations require CUDA
- On Windows: These reference implementations have not been tested on Windows. Try using solutions like Ollama if you are trying to run the model locally.
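
The platform notes above can be turned into a quick preflight check. The sketch below maps the value of `platform.system()` to the matching note; the function name `setup_hint` is ours, and the wording paraphrases the notes rather than quoting any tool's output.

```python
import platform


def setup_hint(system: str) -> str:
    """Return the platform note that applies to `system`.

    `system` is expected to be a value of platform.system(),
    for example "Darwin", "Linux", or "Windows".
    """
    hints = {
        "Darwin": "macOS: install the Xcode CLI tools with `xcode-select --install`.",
        "Linux": "Linux: the reference implementations require CUDA.",
        "Windows": (
            "Windows: the reference implementations are untested; "
            "consider a solution like Ollama for local use."
        ),
    }
    return hints.get(system, f"{system}: not covered by the platform notes.")


print(setup_hint(platform.system()))
```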

### Installation

If you want to try any of the code, you can install it directly from [PyPI](https://pypi.org/project/gpt-oss/):

# if you just need the tools
pip install gpt-oss
# if you want to try the torch implementation
pip install gpt-oss[torch]
# if you want to try the triton implementation
pip install gpt-oss[triton]
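
After installing one of the extras, it can help to confirm which optional backends are actually importable. A minimal sketch using only the standard library follows. It assumes the `torch` and `triton` extras provide top-level modules of the same names, which is our assumption rather than something stated above; the helper name `available_backends` is also ours.

```python
from importlib.util import find_spec
from typing import Dict, Iterable


def available_backends(modules: Iterable[str] = ("torch", "triton")) -> Dict[str, bool]:
    """Report which top-level modules can be found, without importing them."""
    return {name: find_spec(name) is not None for name in modules}


# Assumed module names; adjust if the extras install something different.
print(available_backends())
```

A `False` entry suggests the corresponding extra is missing from the active environment.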

If you want to modify the code or try the metal…

Excerpt shown — open the source for the full document.
