How should AI systems behave, and who should decide?
February 16, 2023
We’re clarifying how ChatGPT’s behavior is shaped and our plans for improving that behavior, allowing more user customization, and getting more public input into our decision-making in these areas.
Illustration: Justin Jay Wang
OpenAI’s mission is to ensure that artificial general intelligence (AGI) benefits all of humanity. We therefore think a lot about the behavior of AI systems we build in the run-up to AGI, and the way in which that behavior is determined. Since our launch of ChatGPT, users have shared outputs that they consider politically biased, offensive, or otherwise objectionable. In many cases, we think that the concerns raised have been valid and have uncovered real limitations of our systems which we want to address. We’ve also seen a few misconceptions about how our systems and policies work together to shape the outputs you get from ChatGPT.
Below, we summarize:
- How ChatGPT’s behavior is shaped;
- How we plan to improve ChatGPT’s default behavior;
- Our intent to allow more system customization; and
- Our efforts to get more public input on our decision-making.
Where we are today
Unlike ordinary software, our models are massive neural networks. Their behaviors are learned from a broad range of data, not programmed explicitly. Though not a perfect analogy, the process is more similar to training a dog than to ordinary programming. An initial “pre-training” phase comes first, in which the model learns to predict the next word in a sentence, informed by its exposure to lots of Internet text (and to a vast array of perspectives). This is followed by a second phase in which we “fine-tune” our models to narrow down system behavior.
As of today, this process is imperfect. Sometimes the fine-tuning process falls short of our intent (producing a safe and useful tool) and the user’s intent (getting a helpful output in response to a given input). Improving our methods for aligning AI systems with human values is a top priority for our company, particularly as AI systems become more capable.
A two-step process: Pre-training and fine-tuning
The two main steps involved in building ChatGPT work as follows:
First, we “pre-train” models by having them predict what comes next in a big dataset that contains parts of the Internet. They might learn to complete the sentence “instead of turning left, she turned ___.” By learning from billions of sentences, our models learn grammar, many facts about the world, and some reasoning abilities. They also learn some of the biases present in those billions of sentences.
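As a rough illustration of "predicting what comes next", the sketch below counts which word most often follows each word in a tiny corpus. It is a toy bigram counter, not how ChatGPT is trained: the three-sentence corpus and the function names `train_bigram` and `predict_next` are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical miniature "dataset"; real pre-training uses billions of sentences.
corpus = [
    "instead of turning left she turned right",
    "he turned right at the corner",
    "she turned left at the light",
]
model = train_bigram(corpus)
print(predict_next(model, "turned"))  # prints "right"
```

Even this toy shows how skew carries over: "right" wins only because it follows "turned" more often in the sample text, so whatever imbalance the data has, the predictions inherit.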
Then, we “fine-tune” these models on a more narrow dataset that we carefully generate with human reviewers who follow guidelines that we provide them. Since we cannot predict all the possible inputs that future users may put into our system, we do not write detailed instructions for every input that ChatGPT will encounter. Instead, we outline a few categories in the guidelines that our reviewers use to review and rate possible model outputs for a range of example inputs. Then, while they are in use, the models generalize from this reviewer feedback in order to respond to a wide array of specific inputs provided by a given user.
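To make the shape of reviewer feedback concrete, here is a minimal sketch that averages reviewer scores for candidate outputs to one example input and picks the highest-rated candidate. Everything in it is hypothetical: the 1 to 5 scale, the sample ratings, and the helper names are assumptions for illustration. It shows only what rated examples might look like, not how the model is actually updated from them.

```python
from statistics import mean

# Hypothetical reviewer ratings: (example input, candidate output, score 1-5).
ratings = [
    ("Summarize this article", "candidate A", 4),
    ("Summarize this article", "candidate A", 5),
    ("Summarize this article", "candidate B", 2),
    ("Summarize this article", "candidate B", 3),
]


def average_scores(ratings):
    """Average the reviewer scores for each (input, output) pair."""
    grouped = {}
    for prompt, output, score in ratings:
        grouped.setdefault((prompt, output), []).append(score)
    return {pair: mean(scores) for pair, scores in grouped.items()}


def preferred_output(ratings, prompt):
    """Return the candidate with the highest average score for `prompt`."""
    scores = average_scores(ratings)
    candidates = {out: s for (p, out), s in scores.items() if p == prompt}
    return max(candidates, key=candidates.get) if candidates else None


print(preferred_output(ratings, "Summarize this article"))  # prints "candidate A"
```

In practice the fine-tuned model has to generalize from rated examples like these to inputs no reviewer ever saw, which a lookup table like this one cannot do.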
The role of reviewers and OpenAI’s policies in system development
In some cases, we may give guidance to our reviewers on a certain kind of output (for example, “do not complete requests for illegal content”). In other cases, the guidance we share with reviewers is more high-level (for example, “avoid taking a position on controversial topics”). Importantly, our collaboration with reviewers is not one-and-done—it’s an ongoing relationship, in which we learn a lot from their expertise.
A large part of the fine-tuning process is maintaining a strong feedback loop with our reviewers, which involves weekly meetings to address questions they may have, or provide clarifications on our guidance. This iterative feedback process is how we train the model to be better and better over time.
Addressing biases
Many are rightly worried about biases in the design and impact of AI systems. We are committed to robustly addressing this issue and being transparent about both our intentions and our progress. Towards that end, we are sharing a portion of our guidelines that pertain to political and controversial topics. Our guidelines are explicit that reviewers should not favor any political group. Biases that nevertheless may emerge from the process described above are bugs, not features.
While disagreements will always exist, we hope sharing this blog post and these instructions will give more insight into how we view this critical aspect of such a foundational technology. It’s our belief that technology companies must be accountable for producing policies that stand up to scrutiny.
We’re always working to improve the clarity of these guidelines—and based on what we’ve learned from the ChatGPT launch so far, we’re going to provide clearer instructions to reviewers about potential pitfalls and challenges tied to bias, as well as controversial figures and themes. Additionally, as part of ongoing transparency initiatives, we are working to share aggregated demographic information about our reviewers in a way that doesn’t violate privacy rules and norms, since this is an additional source of potential bias in system outputs.
We are currently researching how to make the fine-tuning process more understandable and controllable, and are building on external advances such as rule-based rewards and Constitutional AI.
Where we’re going: The building blocks of future systems
In pursuit of our mission, we’re committed to ensuring that access to, benefits from, and influence over AI and AGI are widespread. We believe there are at least three building blocks required in order to achieve these goals in the context of AI system behavior.
1. Improve default behavior. We want as many users as possible to find our AI systems useful to them “out of the box” and to feel that our technology understands and respects their values.
Towards that end, we are investing in research and engineering to reduce both glaring and subtle biases in how ChatGPT responds to different inputs. In some cases ChatGPT currently refuses outputs that it shouldn’t, and in some cases, it doesn’t refuse when it…