Operator System Card
January 23, 2025
This report outlines the safety work carried out prior to releasing Operator, including external red teaming, frontier risk evaluations according to our Preparedness Framework, and an overview of the mitigations we built in to address key risk areas.
Specific areas of risk
- Harmful tasks
- Model mistakes
- Prompt injections
Preparedness Scorecard
- CBRN: Low
- Cybersecurity: Low
- Persuasion: Medium
- Model autonomy: Low
Scorecard ratings
- Low
- Medium
- High
- Critical
Only models with a post-mitigation score of "medium" or below can be deployed. Only models with a post-mitigation score of "high" or below can be developed further.
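The two threshold rules above can be read as a simple gating check. The following is a minimal sketch, not OpenAI's implementation: the names `Risk`, `SCORECARD`, `overall`, `can_deploy`, and `can_develop_further` are hypothetical, and the sketch assumes the overall post-mitigation score is the highest rating across the tracked categories, which this page does not state.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Scorecard ratings, ordered from least to most severe."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


# Ratings as listed in the Preparedness Scorecard for Operator.
SCORECARD = {
    "CBRN": Risk.LOW,
    "Cybersecurity": Risk.LOW,
    "Persuasion": Risk.MEDIUM,
    "Model autonomy": Risk.LOW,
}


def overall(scorecard):
    # Assumption: the overall score is the worst rating in any category.
    return max(scorecard.values())


def can_deploy(scorecard):
    # "Medium" or below may be deployed.
    return overall(scorecard) <= Risk.MEDIUM


def can_develop_further(scorecard):
    # "High" or below may be developed further.
    return overall(scorecard) <= Risk.HIGH
```

Under these assumptions, the ratings listed for Operator pass both checks, since the highest rating is Medium (Persuasion).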
Introduction
Operator is a research preview of our Computer-Using Agent (CUA) model, which combines GPT‑4o’s vision capabilities with advanced reasoning through reinforcement learning. It interprets screenshots and interacts with graphical user interfaces (GUIs)—the buttons, menus, and text fields people see on a computer screen—just as people do. Operator’s ability to use a computer enables it to interact with the same tools and interfaces that people rely on daily, unlocking the potential to assist with an unparalleled range of tasks.
Users can direct Operator to perform a wide variety of everyday tasks using a browser (e.g., ordering groceries, booking reservations, purchasing event tickets), all under the direction and oversight of the user. This represents an important step towards a future where ChatGPT is not only capable of answering questions, but can take actions on a user’s behalf.
While Operator has the potential to broaden access to technology, its capabilities introduce additional risk vectors. These include vulnerabilities like prompt injection attacks where malicious instructions in third-party websites can mislead the model away from the user’s intended actions. There’s also the possibility of the model making mistakes that are challenging to reverse or being used to execute harmful or disallowed tasks at a user’s request. To address these risks, we have implemented a multi-layered approach to safety, including proactive refusals of high-risk tasks, confirmation prompts before critical actions, and active monitoring systems to detect and mitigate potential threats.
Drawing on OpenAI’s established safety frameworks and the safety work already conducted for the underlying GPT‑4o model [1], this system card details our multi-layered approach for testing and deploying Operator safely. It outlines the risk areas we identified and the model and product mitigations we implemented to address novel vulnerabilities.
Model data and training
As discussed in our accompanying research blog post [2], Operator is trained to use a computer in the same way a person would use one: by visually perceiving the computer screen and using a cursor and keyboard. We use a combination of supervised learning on specialized data and reinforcement learning to achieve this goal. Supervised learning teaches the model the base level of perception and input control needed to read computer screens and accurately click on user interface elements. Reinforcement learning then gives the model important, higher-level capabilities such as reasoning, error correction, and the ability to adapt to unexpected events.
Operator was trained on diverse datasets, including select publicly available data, mostly collected from industry-standard machine learning datasets and web crawls, as well as datasets developed by human trainers that demonstrated how to solve tasks on a computer.
Risk identification
To thoroughly understand the risks associated with enabling a model to take actions on the internet on behalf of the user, we performed a comprehensive evaluation informed by previous deployments, third-party red teaming exercises, and internal testing. We also incorporated feedback from Legal, Security, and Policy teams, aiming to identify both immediate and emerging challenges.
Policy creation
We assessed user goals (referred to as “tasks”) and the steps a model could take to fulfill those user goals (referred to as “actions”) to identify risky tasks and actions and develop mitigating safeguards. Our intention is to ensure the model refuses unsafe tasks and gives the user appropriate oversight and control over its actions.
In developing policy, we categorized tasks and actions by their risk severity, considering the potential for harm to the user or others, and the ease of reversing any negative outcomes. For instance, a user task might be to purchase a pair of new shoes, which involves actions like searching online for shoes, proceeding to the retailer’s checkout page, and completing the purchase on the user’s behalf. If the wrong pair of shoes is purchased, the action could inconvenience and frustrate the user. To address such risks, we created a policy requiring safeguards for risky actions like completing a purchase.
These safeguards include measures like requiring human oversight at key steps and explicit confirmation before proceeding on certain actions. This approach applies to model actions such as conducting financial transactions, sending emails, deleting calendar events, and more to ensure users maintain visibility and control when assisted by the model. In some cases where the risk is determined to be too significant, we fully restrict the model from assisting with certain tasks, such as selling or purchasing stocks.
We aim to mitigate potential risks to users and others by encouraging the model to adhere to this policy of human-in-the-loop safeguards across tasks and actions (detailed in the Risk Mitigation section below).
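The policy described in this section can be illustrated as a three-way decision per action: proceed, ask the user to confirm, or refuse. The sketch below is an illustration of that logic only. The source describes the model being encouraged to adhere to the policy, not a lookup table, and the action labels, `Safeguard`, `safeguard_for`, and `run_task` are all hypothetical names chosen from the examples in the text.

```python
from enum import Enum


class Safeguard(Enum):
    PROCEED = "proceed"
    CONFIRM = "ask the user to confirm"
    REFUSE = "refuse"


# Hypothetical action labels, taken from the examples in the text.
RESTRICTED = {"trade_stocks"}
NEEDS_CONFIRMATION = {
    "complete_purchase",
    "financial_transaction",
    "send_email",
    "delete_calendar_event",
}


def safeguard_for(action):
    """Map an action to the safeguard the policy would call for."""
    if action in RESTRICTED:
        return Safeguard.REFUSE
    if action in NEEDS_CONFIRMATION:
        return Safeguard.CONFIRM
    return Safeguard.PROCEED


def run_task(actions, confirm):
    """Walk through a task's actions in order.

    `confirm` is a callback that returns True when the user approves
    an action. The task stops at a refused or unapproved action.
    """
    completed = []
    for action in actions:
        safeguard = safeguard_for(action)
        if safeguard is Safeguard.REFUSE:
            break
        if safeguard is Safeguard.CONFIRM and not confirm(action):
            break
        completed.append(action)
    return completed
```

For the shoe-purchase example, `run_task(["search_shoes", "open_checkout", "complete_purchase"], confirm)` would perform the search and checkout steps freely but complete the purchase only if `confirm` returns True.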
Red teaming
OpenAI engaged a cohort of vetted external red teamers located across twenty countries and fluent in two dozen languages to test the model’s capabilities, safety measures, and resilience against adversarial inputs. Prior to external red teaming, OpenAI first conducted an internal red teaming exercise with representatives from our Safety, Security and Product teams. The goal was to identify potential risks using a model with no model-level or product-level mitigations in place, and red teamers were instructed to intervene before the model could cause any real-world harm. Based on the findings from that internal exercise, we added initial safety mitigations…