Collective Constitutional AI: Aligning a Language Model with Public Input

Anthropic, published Oct 17, 2023

Anthropic and the Collective Intelligence Project recently ran a public input process involving approximately 1,000 Americans to draft a constitution for an AI system. We did this to explore how democratic processes can influence AI development. In our experiment, we found areas where people agreed with our in-house constitution and areas where their preferences differed. In this post, we share the resulting publicly sourced constitution, as well as what happened when we trained a new AI system against it using Constitutional AI.

Constitutional AI (CAI) is an Anthropic-developed method for aligning general-purpose language models to abide by high-level normative principles written into a constitution. Anthropic’s language model Claude currently relies on a constitution curated by Anthropic employees. This constitution takes inspiration from outside sources like the United Nations Universal Declaration of Human Rights, as well as from our own firsthand experience interacting with language models to make them more helpful and harmless.
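As a rough illustration of how a constitution is consumed during training, the sketch below formats a pairwise comparison in which a feedback model is asked to apply one randomly sampled principle to two candidate responses. It loosely follows the multiple-choice comparison format described in the original Constitutional AI paper, but the exact template wording and the names `CONSTITUTION`, `sample_principle`, and `build_feedback_prompt` are hypothetical; the model call and the training loop that would consume the resulting preference labels are omitted.

```python
import random
from typing import Sequence

# Two principles quoted from this post; a real constitution contains many more.
CONSTITUTION = [
    "Choose the response that most respects the human rights to freedom, "
    "universal equality, fair treatment, and protection against discrimination.",
    "Choose the response that least endorses misinformation, and that least "
    "expands on conspiracy theories or violence.",
]


def sample_principle(constitution: Sequence[str], rng: random.Random) -> str:
    """Pick one principle for a single comparison (hypothetical helper)."""
    return rng.choice(constitution)


def build_feedback_prompt(conversation: str, response_a: str, response_b: str,
                          principle: str) -> str:
    """Format a two-option comparison judged under one principle (hypothetical template)."""
    return (
        "Consider the following conversation between a human and an assistant:\n"
        f"{conversation}\n"
        f"{principle}\n"
        "Options:\n"
        f"(A) {response_a}\n"
        f"(B) {response_b}\n"
        "The answer is:"
    )


# Usage: a seeded RNG makes the sampled principle reproducible.
rng = random.Random(0)
principle = sample_principle(CONSTITUTION, rng)
prompt = build_feedback_prompt(
    "Human: Is this vaccine rumor true?",
    "Here is what the evidence shows...",
    "Yes, and here is a theory about who is behind it...",
    principle,
)
print(prompt)
```

Sampling one principle per comparison, rather than pasting the whole constitution into every prompt, keeps each judgment focused on a single criterion; over many comparisons, every principle still gets applied.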

While Constitutional AI is useful for making the normative values of our AI systems more transparent, it also highlights the outsized role we as developers play in selecting these values; after all, we wrote the constitution ourselves. That is why, for this research, we were eager to curate a constitution using the preferences of a large number of people who do not work at Anthropic. We believe that our work may be one of the first instances in which members of the public have collectively directed the behavior of a language model via an online deliberation process. We hope that sharing our very preliminary efforts and findings will help others learn from our successes and failures and build upon this work.

Designing a Public Input Process to Collectively Draft a Constitution

Anthropic partnered with the Collective Intelligence Project to run a public input process using the Polis platform. Polis is an open-source platform for running online deliberative processes augmented by machine learning algorithms. It has been used all over the world by governments, academics, independent media, and citizens to understand what large groups of people think.

We asked approximately 1,000 members of the American public to “Help us pick rules for our AI Chatbot!” (Figure 1). We sought a representative sample of U.S. adults across age, gender, income, and geography (anonymized participant demographics can be found here). Participants could either vote on existing rules (normative principles) or add their own. In total, participants contributed 1,127 statements to Polis and cast 38,252 votes (an average of 34 votes per person). In general, we found a high degree of consensus on most statements, though Polis did identify two separate opinion groups (Figure 2).

Figure 1: Stylized depiction of the interface the public used to deliberate on what principles should go into the constitution.
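To give a feel for how participants can be separated into opinion groups, here is a toy version: each participant becomes a vector of votes (+1 agree, -1 disagree, 0 pass or unseen), and a plain 2-means clustering splits those vectors in two. This is a simplified stand-in, not Polis's actual algorithm, which is more involved; the function names and the six-person vote matrix are made up for illustration.

```python
from typing import List, Sequence

Vote = int  # +1 agree, -1 disagree, 0 pass or not seen


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vote vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(rows: List[Sequence[float]]) -> List[float]:
    """Column-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def two_opinion_groups(votes: List[List[Vote]], max_iters: int = 20) -> List[int]:
    """Label each participant 0 or 1 with a plain 2-means on their vote vectors."""
    # Deterministic init: participant 0, plus whoever disagrees with them most.
    c0 = [float(v) for v in votes[0]]
    far = max(range(len(votes)), key=lambda i: sq_dist(votes[i], c0))
    c1 = [float(v) for v in votes[far]]
    labels: List[int] = []
    for _ in range(max_iters):
        new_labels = [0 if sq_dist(v, c0) <= sq_dist(v, c1) else 1 for v in votes]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        group0 = [v for v, g in zip(votes, labels) if g == 0]
        group1 = [v for v, g in zip(votes, labels) if g == 1]
        if group0:
            c0 = mean(group0)
        if group1:
            c1 = mean(group1)
    return labels


# Hypothetical votes: statements 0 and 1 are broadly agreed on,
# while statements 2 and 3 split the participants.
votes = [
    [1, 1, 1, -1],
    [1, 1, 1, -1],
    [1, 0, 1, -1],
    [1, 1, -1, 1],
    [1, 1, -1, 1],
    [0, 1, -1, 1],
]
print(two_opinion_groups(votes))  # [0, 0, 0, 1, 1, 1]
```

The broadly agreed statements barely affect the split; the divisive ones drive it, which is the same kind of signal that distinguishes Group A from Group B in Figure 2.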

Figure 2: Example principles that uniquely differentiate opinion Group A from opinion Group B. (Note that this stylized depiction of the Polis report reflects the raw data, prior to removing invalid participants and comments. Final numbers differ slightly.)

Analyzing the Publicly Sourced Constitution

The raw results can be seen in the Polis Report. We processed this data to remove invalid participants and comments (the code and data before and after processing are hosted here). We kept all the statements that passed a threshold of consensus within both groups to produce a constitution that we could train a model against. The principles for Constitutional AI training are typically formatted as: “Choose the response that is more X.” However, we solicited statements in a more general form, such as “The AI should not do X,” because that framing felt more natural for members of the public. As a result, we had to do some subjective mapping between public statements and CAI-ready public principles. You can see our publicly sourced constitution here, along with an analysis of how it is similar to and different from the Anthropic-written constitution used to train our language model Claude. While there was a moderate degree of overlap between the two constitutions (roughly 50% overlap in concepts and values), a few key differences stood out. Principles in the public constitution appear to be largely self-generated rather than sourced from existing publications; they focus more on objectivity and impartiality; they place a greater emphasis on accessibility; and, in general, they tend to promote desired behavior rather than avoid undesired behavior.

Example public principles similar to the principles in the Anthropic-written constitution:

“Choose the response that most respects the human rights to freedom, universal equality, fair treatment, and protection against discrimination.”
“Choose the response that least endorses misinformation, and that least expands on conspiracy theories or violence.”

Example public principles that do not closely match principles in the Anthropic-written constitution:

“Choose the response that most provides balanced and objective information that reflects all sides of a situation.”
“Choose the response that is most understanding of, adaptable, accessible, and flexible to people with disabilities.”

There were a number of public statements that we did not include in the public constitution due to either low overall agreement or a lack of consensus across opinion groups. Because these statements did not make the cut, we did not translate them into CAI-ready principles.

Example public statements that did not make it into the public constitution due to low overall agreement:

“AI should not be trained with the principles of DEI [diversity, equity, and inclusion]”
“AI should not give advice”
“AI should be an ordained minister”
“AI should have emotion”

Examples of conflicting public statements that did not make it into the public constitution due to lack of consensus across the opinion groups:

“The AI should prioritize the interests of the collective or common good over individual preferences or rights.”
“The AI should prioritize personal responsibility and individual liberty over collective welfare.”
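The two exclusion criteria above can be expressed as a small decision rule over per-group agreement rates. This is a minimal sketch under stated assumptions: the 0.6 cut-off, the `Tally` and `classify_statement` names, and all vote counts are hypothetical, and “low overall agreement” is approximated here as a statement that clears the bar in no group. The processing code published with this study may work differently.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Tally:
    """Votes on one statement from one opinion group."""
    agree: int
    disagree: int
    passed: int = 0

    def agree_rate(self) -> float:
        total = self.agree + self.disagree + self.passed
        return self.agree / total if total else 0.0


def classify_statement(tallies: Dict[str, Tally], threshold: float = 0.6) -> str:
    """Decide whether a statement enters the constitution.

    `threshold` is a hypothetical cut-off applied to each group's agreement rate.
    """
    rates = [t.agree_rate() for t in tallies.values()]
    if all(r >= threshold for r in rates):
        return "include"  # consensus within every opinion group
    if any(r >= threshold for r in rates):
        return "exclude: no cross-group consensus"  # some groups agree, others do not
    return "exclude: low overall agreement"  # no group reaches the bar


# Made-up tallies for illustration only; these are not the study's numbers.
examples = {
    "consensus_example": {"A": Tally(90, 10), "B": Tally(80, 15, 5)},
    "divisive_example": {"A": Tally(70, 30), "B": Tally(25, 75)},
    "unpopular_example": {"A": Tally(20, 80), "B": Tally(15, 85)},
}

for name, tallies in examples.items():
    print(name, "->", classify_statement(tallies))
```

Checking the cut-off within each group, rather than on the pooled vote, is what keeps a statement that only the larger group favors from slipping into the constitution.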

Training and Evaluating a Model Aligned with…
