RepoOpenAIOpenAIpublished Jul 19, 2024seen 6d

openai/safety-rbr-code-and-data

Jupyter Notebook

Open original ↗

Captured source

source ↗
published Jul 19, 2024seen 6dcaptured 9hhttp 200method plain

openai/safety-rbr-code-and-data

Description: Code and example data for the paper: Rule Based Rewards for Language Model Safety

Language: Jupyter Notebook

License: MIT

Stars: 208

Forks: 22

Open issues: 1

Created: 2024-07-19T23:15:09Z

Pushed: 2024-07-19T23:24:59Z

Default branch: main

Fork: no

Archived: no

README:

Safety RBR Gold Dataset and Weight Fitting Code

Warning: Content may include language related to racism, erotic themes, self-harm, or other offensive material.

This directory contains complementary code and data for the paper: Rule Based Rewards for Language Model Safety

It contains:

  • Our Safety RBR gold dataset, the small set of human data we used in the this experiment. This dataset was used for prompt tuning and calculating the accuracy of prompt+LLM grader (ex. Table 13 in the paper.) The data lives in data/rbr_gold_data/ and the notebook analyze_RBR_gold_data.ipynb gives further examples for loading the data.
  • Our code for fitting the RBR weights (rbr_weight_fitter.py) along with an example weight_fitting_example.ipynb of usage and visualization.
  • Some example synthetic data and reward model scores to demonstrate the usage of the weight fitting code (data/weight_fitting_data/)

A good starting place is the two notebooks we provide:

Notebooks

1. Weight Fitting Example (weight_fitting_example.ipynb): This notebook provides an example of using the RBR weight fitting code given (rbr_weight_fitter.py) using the example synthetic data we provide. It demonstrates how to load data, fit weights, and visualize the results. 2. RBR Gold Data (rbr_gold_data.ipynb): This notebook covers the RBR Gold dataset, a small set of human-labelled data used for prompt tuning and prompt+LLM grader accuracy calculations. It includes example code for loading the data and some very basic statistical analysis.

License

We are releasing this code and data under the MIT License.

Notability

notability 6.0/10

OpenAI safety research repo, moderate traction.

OpenAI has a repo signal matching data demand, safety and policy.