OpenAI, published Dec 5, 2024

OpenAI o1 System Card

Updated: December 5, 2024

Key Areas of Evaluation

  • Disallowed Content
  • Training Data Regurgitation
  • Hallucinations
  • Bias

Preparedness Scorecard

  • Cybersecurity: Low
  • CBRN: Medium
  • Persuasion: Medium
  • Model Autonomy: Low

Scorecard ratings

  • Low
  • Medium
  • High
  • Critical

Only models with a post-mitigation score of "medium" or below can be deployed. Only models with a post-mitigation score of "high" or below can be developed further.
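The two thresholds above can be read as a simple gating rule. The following is a minimal sketch, not OpenAI's implementation: it assumes the four ratings in the scorecard are post-mitigation scores and that the rule is applied to the highest rating across categories. The names `Risk`, `O1_SCORECARD`, `can_deploy`, and `can_develop_further` are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Scorecard ratings, ordered from least to most severe."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


# Ratings as listed in the scorecard above (assumed to be post-mitigation).
O1_SCORECARD = {
    "Cybersecurity": Risk.LOW,
    "CBRN": Risk.MEDIUM,
    "Persuasion": Risk.MEDIUM,
    "Model Autonomy": Risk.LOW,
}


def can_deploy(scorecard: dict) -> bool:
    """Deployable only if the worst category is "medium" or below."""
    # Assumption: the rule is applied to the highest rating across categories.
    return max(scorecard.values()) <= Risk.MEDIUM


def can_develop_further(scorecard: dict) -> bool:
    """Further development only if the worst category is "high" or below."""
    return max(scorecard.values()) <= Risk.HIGH


if __name__ == "__main__":
    print("deploy:", can_deploy(O1_SCORECARD))
    print("develop further:", can_develop_further(O1_SCORECARD))
```

Under these assumptions, the ratings listed for o1 pass both checks, since no category is rated above Medium.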

Legend: improved or comparable performance relative to the previous flagship OpenAI model

Introduction

The o1 model series is trained with large-scale reinforcement learning to reason using chain-of-thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment [A]. This leads to state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain-of-thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence. Our results underscore the need for building robust alignment methods, extensively stress-testing their efficacy, and maintaining meticulous risk management protocols. This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1‑mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations.

Model data and training

The o1 large language model family is trained with reinforcement learning to perform complex reasoning. o1 thinks before it answers—it can produce a long chain of thought before responding to the user. OpenAI o1 is the next model in this series (previously OpenAI o1‑preview), while OpenAI o1‑mini is a faster version of this model that is particularly effective at coding. Through training, the models learn to refine their thinking process, try different strategies, and recognize their mistakes. Reasoning allows o1 models to follow specific guidelines and model policies we’ve set, helping them act in line with our safety expectations. This means they are better at providing helpful answers and resisting attempts to bypass safety rules, to avoid producing unsafe or inappropriate content.

The two models were pre-trained on diverse datasets, including a mix of publicly available data, proprietary data accessed through partnerships, and custom datasets developed in-house, which collectively contribute to the models’ robust reasoning and conversational capabilities.

Select Public Data: Both models were trained on a variety of publicly available datasets, including web data and open-source datasets. Key components include reasoning data and scientific literature. This ensures that the models are well-versed in both general knowledge and technical topics, enhancing their ability to perform complex reasoning tasks.

Proprietary Data from Data Partnerships: To further enhance the capabilities of o1 and o1‑mini, we formed partnerships to access high-value non-public datasets. These proprietary data sources include paywalled content, specialized archives, and other domain-specific datasets that provide deeper insights into industry-specific knowledge and use cases.

Data Filtering and Refinement: Our data processing pipeline includes rigorous filtering to maintain data quality and mitigate potential risks. We use advanced data filtering processes to reduce personal information in the training data. We also employ a combination of our Moderation API and safety classifiers to prevent the use of harmful or sensitive content, including explicit materials such as CSAM.

Scope of testing

As part of our commitment to iterative deployment, we continuously refine and improve our models. The evaluations described in this System Card pertain to the full family of o1 models, and exact performance numbers for the model used in production may vary slightly depending on system updates, final parameters, system prompt, and other factors.

More concretely, for o1, evaluations on the following checkpoints [B] are included:

  • o1‑near‑final‑checkpoint
  • o1‑dec5‑release

Between o1‑near‑final‑checkpoint and the releases thereafter, improvements included better format following and instruction following, which were incremental post-training improvements (the base model remained the same). We determined that prior frontier testing results are applicable for these improvements. The evaluations in our safety evaluations section, as well as the chain-of-thought safety and multilingual evaluations, were conducted on o1‑dec5‑release, while external red teaming and preparedness evaluations were conducted on o1‑near‑final‑checkpoint. [C]

Observed safety challenges and evaluations

In addition to advancing language model capabilities, the o1 family’s ability to reason in context provides new opportunities for improving the safety of the model. The o1 models are our most robust models to date, achieving substantial improvements on our hardest jailbreak evaluations. They are also more aligned to the OpenAI policy, reaching state-of-the-art performance on our hardest internal benchmarks for evaluating adherence to our content guidelines.

The o1 model family represents a transition from fast, intuitive thinking to now also using slower, more deliberate reasoning. While we find it exciting that reasoning can significantly improve the enforcement of our safety policies, we are mindful that these new capabilities could form the basis for dangerous applications. In this section, we outline the safety evaluations we conducted on this model, spanning harmfulness, jailbreak robustness, hallucinations, and bias evaluations. We then investigate risks involving the chain of thought itself, and describe our ongoing research on chain of thought deception monitoring. Finally, we detail the results of our external red teaming campaign.

Additionally, as part of our continued effort to partner with external experts, a set of pre-deployment evaluations was conducted on a version of the o1 model by the U.S. AI Safety Institute (US AISI) and the UK AI Safety Institute (UK AISI), not included…
