What does this repo signal mean?

Microsoft published microsoft/PairedSafety (Python). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo microsoft/PairedSafety · language Python · Dataset of paired safety comparison data for aligning LLMs.. onlylabs links this event to 1 captured evidence page and 6 related repo signals. It also maps to Safety and policy in the data-business radar.

Microsoft Repo: microsoft/PairedSafety

Captured source

source ↗

GitHub/github.com/microsoft/PairedSafety

microsoft/PairedSafety repository metadata

Source ↗

published May 14, 2026seen Jun 5captured Jun 11http 200method plain

microsoft/PairedSafety

Description: An repo to share grader with its prompt that used in paper.

Stars: 1

Forks: 1

Open issues: 2

Created: 2026-05-14T23:39:33Z

Pushed: 2026-06-02T21:04:48Z

Default branch: main

Fork: no

Archived: no

README:

PairedSafety Grader Prompts

This repository contains the prompt artifacts for the LM grader used in the paired prompt/response safety analysis.

The grader prompt is rendered from public Azure AI Content Safety text definitions and examples for Hate and Fairness, Sexual, Violence, and Self-harm. It asks a grader model to predict raw severity 0 to 7 for each category, then maps those predictions to the paper's merged 0 to 3 severity labels.

What Is Included

docs/prompts/system_prompt_compact.md: compact system prompt used for the completed full-corpus runs.
docs/prompts/single_item_user_prompt_template.md: user prompt template for one record.
docs/prompts/batch_user_prompt_template.md:user prompt template for batched records.
docs/prompts/dataset_ablation_shots/: sampled two-example prompt sets used for the ablation study.

No API client is included. Use any LLM API or client you prefer; the only requirement for reproducing the prompt setup is to send the system prompt plus the appropriate user prompt.

How To Use The Prompts

For a single record, send:

1. The system prompt from docs/prompts/system_prompt_compact.md. 2. A user prompt following docs/prompts/single_item_user_prompt_template.md. 3. Only the target text to be graded.

For batched grading, use docs/prompts/batch_user_prompt_template.md and include only the target text for each row.

Ablation Examples

The files in docs/prompts/dataset_ablation_shots/ are optional few-shot additions used only for the ablation runs. They are not sent as a separate API message. Instead, choose one shot file and insert its text into the system prompt at the placeholder:

Additional dataset-calibrated examples, when provided, appear here:

The baseline run used system_prompt_compact.md without any extra dataset examples. Each ablation run used the same compact system prompt plus one sampled shot file, while the user prompt still contained only the target prompt or response text.

Example API-independent message structure:

SYSTEM:


USER:
Grade the following prompt text.
TEXT TO LABEL:


Return exactly this JSON schema:
...

For response grading, replace prompt with response and include only the response text.

Notes

Prompt grading sends only the prompt text.
Response grading sends only the response text.

Notability

notability 3.0/10

Low traction new repo from Microsoft