openai/gpt-oss
Description: gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Language: Python
License: Apache-2.0
Stars: 20154
Forks: 2094
Open issues: 90
Created: 2025-06-23T16:43:33Z
Pushed: 2026-06-09T22:51:22Z
Default branch: main
Fork: no
Archived: no
README:
Try gpt-oss · Guides · Model card · OpenAI blog
Download gpt-oss-120b and gpt-oss-20b on Hugging Face
Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases.
We're releasing two flavors of these open models:
- gpt-oss-120b: for production, general-purpose, high-reasoning use cases that fit on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5.1B active parameters)
- gpt-oss-20b: for lower-latency, local, or specialized use cases (21B parameters with 3.6B active parameters)
Both models were trained using our [harmony response format][harmony] and should only be used with this format; otherwise, they will not work correctly.
Table of Contents
- [Highlights](#highlights)
- [Inference examples](#inference-examples)
- [About this repository](#about-this-repository)
- [Setup](#setup)
- [Download the model](#download-the-model)
- [Reference PyTorch implementation](#reference-pytorch-implementation)
- [Reference Triton implementation (single GPU)](#reference-triton-implementation-single-gpu)
- [Reference Metal implementation](#reference-metal-implementation)
- [Harmony format & tools](#harmony-format--tools)
- [Clients](#clients)
- [Tools](#tools)
- [Other details](#other-details)
- [Contributing](#contributing)
Highlights
- Permissive Apache 2.0 license: Build freely without copyleft restrictions or patent risk—ideal for experimentation, customization, and commercial deployment.
- Configurable reasoning effort: Easily adjust the reasoning effort (low, medium, high) based on your specific use case and latency needs.
- Full chain-of-thought: Provides complete access to the model's reasoning process, facilitating easier debugging and greater trust in outputs. This information is not intended to be shown to end users.
- Fine-tunable: Fully customize models to your specific use case through parameter fine-tuning.
- Agentic capabilities: Use the models' native capabilities for function calling, [web browsing](#browser), [Python code execution](#python), and Structured Outputs.
- MXFP4 quantization: The models were post-trained with MXFP4 quantization of the MoE weights, making gpt-oss-120b run on a single 80GB GPU (like NVIDIA H100 or AMD MI300X) and the gpt-oss-20b model run within 16GB of memory. All evals were performed with the same MXFP4 quantization.
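As a rough sanity check on the memory figures in the last highlight, the sketch below estimates weight storage under MXFP4. It assumes the OCP Microscaling layout (4-bit elements sharing one 8-bit scale per block of 32 values, so 4.25 bits per weight) and, as a simplification not stated in this README, treats every parameter as MXFP4-quantized. Only the MoE weights are actually quantized, and activations and the KV cache need additional memory, so real footprints are somewhat higher. The helper names are illustrative and not part of gpt-oss.

```python
# Rough weight-memory estimate under MXFP4 (illustrative, not part of gpt-oss).
MXFP4_ELEMENT_BITS = 4   # each weight is stored as a 4-bit float
MXFP4_SCALE_BITS = 8     # one shared 8-bit scale per block
MXFP4_BLOCK_SIZE = 32    # number of weights that share a scale


def mxfp4_bits_per_weight() -> float:
    """Effective bits per weight, including the amortized block scale."""
    return MXFP4_ELEMENT_BITS + MXFP4_SCALE_BITS / MXFP4_BLOCK_SIZE


def estimate_weight_gb(total_params: float) -> float:
    """Approximate storage in GB (10**9 bytes), ASSUMING all weights are MXFP4."""
    return total_params * mxfp4_bits_per_weight() / 8 / 1e9


# Parameter counts are the ones quoted in this README.
for name, params in [("gpt-oss-120b", 117e9), ("gpt-oss-20b", 21e9)]:
    print(f"{name}: ~{estimate_weight_gb(params):.1f} GB of weights")
```

Under these assumptions the estimates come out to roughly 62 GB and 11 GB, which is consistent with the 80GB and 16GB figures above.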
Inference examples
Transformers
You can use gpt-oss-120b and gpt-oss-20b with the Transformers library. If you use Transformers' chat template, it will automatically apply the [harmony response format][harmony]. If you use model.generate directly, you need to apply the harmony format manually using the chat template or use our [openai-harmony][harmony] package.
from transformers import pipeline
import torch

model_id = "openai/gpt-oss-120b"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

Learn more about how to use gpt-oss with Transformers.
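The reasoning effort listed in the highlights can be selected per request. The following is a minimal sketch, assuming the convention from the gpt-oss model card that the level is set with a `Reasoning: <level>` line in the system prompt. `build_messages` is a hypothetical helper, not part of Transformers or this repository.

```python
VALID_EFFORTS = ("low", "medium", "high")


def build_messages(user_prompt: str, effort: str = "medium") -> list[dict]:
    """Return a chat message list whose system prompt requests a reasoning level."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    return [
        # Assumption: the chat template reads the level from the system prompt.
        {"role": "system", "content": f"Reasoning: {effort}"},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages(
    "Explain quantum mechanics clearly and concisely.", effort="high"
)
print(messages)
```

The resulting list can be passed to `pipe(...)` in place of the `messages` list in the Transformers example.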
vLLM
vLLM recommends using `uv` for Python dependency management. You can use vLLM to spin up an OpenAI-compatible web server. The following command will automatically download the model and start the server.
uv pip install --pre vllm==0.10.1+gptoss \
    --extra-index-url https://wheels.vllm.ai/gpt-oss/ \
    --extra-index-url https://download.pytorch.org/whl/nightly/cu128 \
    --index-strategy unsafe-best-match
vllm serve openai/gpt-oss-20b
Learn more about how to use gpt-oss with vLLM.
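Once `vllm serve` is running, the server can be queried over HTTP. The sketch below uses only the standard library and assumes vLLM's defaults (localhost, port 8000, and the OpenAI-style `/v1/chat/completions` route); adjust `BASE_URL` if the server was started with other options. `build_chat_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: vLLM default host and port


def build_chat_request(
    prompt: str, model: str = "openai/gpt-oss-20b"
) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request for the local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the reply text. Requires a running server."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Building the request needs no server; call ask(...) once `vllm serve` is up.
req = build_chat_request("Explain quantum mechanics clearly and concisely.")
print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the payload easy to inspect before any network call is made.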
Offline Serve Code:
- Run this code after installing the libraries described above, and additionally install openai-harmony:
uv pip install openai-harmony
# source .oss/bin/activate
import os
# Disable the FlashInfer sampler; set before vllm is imported
os.environ["VLLM_USE_FLASHINFER_SAMPLER"] = "0"
import json
from openai_harmony import (
    HarmonyEncodingName,
    load_harmony_encoding,
    Conversation,
    Message,
    Role,
    SystemContent,
    DeveloperContent,
)
from vllm import LLM, SamplingParams
# --- 1) Render the prefill with Harmony ---
encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)
convo = Conversation.from_messages(
    [
        Message.from_role_and_content(Role.SYSTEM, SystemContent.new()),
        Message.from_role_and_content(
            Role.DEVELOPER,
            DeveloperContent.new().with_instructions("Always respond in riddles"),
        ),
        Message.from_role_and_content(Role.USER, "What is the weather like in SF?"),
    ]
)
prefill_ids = encoding.render_conversation_for_completion(convo, Role.ASSISTANT)
# Harmony stop tokens (pass to sampler so they won't be included in output)
stop_token_ids = encoding.stop_tokens_for_assistant_actions()
# --- 2) Run vLLM with prefill ---
llm = LLM(
    model="openai/gpt-oss-20b",
    trust_remote_code=True,
    gpu_memory_utilization=0.95,
    max_num_batched_tokens=4096,
    max_model_len=5000,
    tensor_parallel_size=1,
)
sampling = SamplingParams(
    max_tokens=128,
    temperature=1,
    stop_token_ids=stop_token_ids,
)
outputs = llm.generate(
    prompt_token_ids=[prefill_ids],  # batch of size 1
    sampling_params=sampling,
)
# vLLM gives you both text and token IDs
gen = outputs[0].outputs[0]
text = gen.text
output_tokens = gen.token_ids
…
- On macOS: Install the Xcode CLI tools with `xcode-select --install`
- On Linux: These reference implementations require CUDA
- On Windows: These reference implementations have not been tested on Windows. Try using solutions like Ollama if you are trying to run the model locally.
### Installation
If you want to try any of the code, you can install it directly from [PyPI](https://pypi.org/project/gpt-oss/).
If you just need the tools:
pip install gpt-oss
If you want to try the torch implementation:
pip install gpt-oss[torch]
If you want to try the triton implementation:
pip install gpt-oss[triton]
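After installing, one way to confirm which optional backends are importable is to probe for their modules. This is only a sketch: it assumes the package is imported as `gpt_oss` and that the `torch` and `triton` extras provide modules with those names. `available_backends` is a hypothetical helper, not part of the package.

```python
from importlib.util import find_spec


def available_backends(
    modules: tuple[str, ...] = ("gpt_oss", "torch", "triton"),
) -> dict[str, bool]:
    """Map each top-level module name to whether it can be located for import."""
    # find_spec returns None for a top-level module that is not installed.
    return {name: find_spec(name) is not None for name in modules}


print(available_backends())
```

A `False` entry suggests the corresponding extra was not installed in the active environment.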
If you want to modify the code or try the metal…