What does this repo signal mean?

Microsoft published microsoft/social-reasoning-bench, a Python repository. Repository signals like this one surface tooling, eval, infrastructure, or model-adjacent work that may later appear in a launch post. High-signal details: repo microsoft/social-reasoning-bench · language Python · description "Benchmark for evaluating AI's social reasoning skills." onlylabs links this event to 1 captured evidence page and 6 related repo signals, and maps it to Evals and quality in the data-business radar.

Microsoft Repo: microsoft/social-reasoning-bench

Captured source

source ↗

GitHub · github.com/microsoft/social-reasoning-bench

microsoft/social-reasoning-bench repository metadata

published Apr 27, 2026 · seen Jun 5 · captured Jun 11 · HTTP 200 · method plain

microsoft/social-reasoning-bench

Description: A benchmark to evaluate AI Agents in social domains.

Language: Python

License: MIT

Stars: 16

Forks: 4

Open issues: 9

Created: 2026-04-27T14:05:46Z

Pushed: 2026-06-10T03:32:32Z

Default branch: main

Fork: no

Archived: no

README:

SocialReasoningBench

![Docs](https://github.io/microsoft/social-reasoning-bench)

![hero](docs/vitepress/public/hero-light.png)

Evaluate the social reasoning capabilities of LLM agents in multi-party environments.

Install

Requires Python 3.11+ and uv.

git clone https://github.com/microsoft/social-reasoning-bench.git srbench
cd srbench
uv sync --all-packages --all-groups --all-extras
source .venv/bin/activate

Usage

Evaluate the social reasoning ability of your own LLM. For this example, we'll assume your LLM is served as my-model via an OpenAI-compatible endpoint at http://localhost:8000.

# To reproduce our results, use Gemini as the counterparty.
GEMINI_API_KEY=

# Run the v0.1.0 experiment sweep with your model as the assistant
srbench experiment experiments/v0.1.0 \
  --output-base outputs/my-model \
  --assistant-model openai/my-model \
  --assistant-base-url http://localhost:8000/v1 \
  --assistant-api-key none
# To test only a few examples per experiment in the sweep, append:
#   --set limit=10

# View the results
srbench dashboard outputs/my-model

See [Installation](/installation), [Experiments](/experiments.md), and [LLMs](/llm.md) for detailed instructions.
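
When the sweep is launched from a script rather than typed by hand, building the command as an argument list avoids the line-continuation mistakes that multi-line shell commands invite. The sketch below is a hypothetical helper, not part of srbench: `build_sweep_command` and its parameter names are assumptions made for illustration, and it uses only the flags and values shown in the Usage example above.

```python
import shlex


def build_sweep_command(model_name, base_url, output_base,
                        experiment_dir="experiments/v0.1.0",
                        api_key="none", limit=None):
    """Return the srbench sweep invocation as an argument list.

    Hypothetical helper: it only assembles the flags shown in the README
    usage example and does not validate them against the srbench CLI.
    """
    cmd = [
        "srbench", "experiment", experiment_dir,
        "--output-base", output_base,
        # The "openai/" prefix mirrors the README example for a model
        # served behind an OpenAI-compatible endpoint.
        "--assistant-model", f"openai/{model_name}",
        "--assistant-base-url", base_url,
        "--assistant-api-key", api_key,
    ]
    if limit is not None:
        # Optional: restrict each experiment to a few examples.
        cmd += ["--set", f"limit={limit}"]
    return cmd


cmd = build_sweep_command(
    "my-model", "http://localhost:8000/v1", "outputs/my-model", limit=10
)
# Print a copy-pasteable, single-line shell form of the command.
print(shlex.join(cmd))
```

To actually run the sweep, the list could be passed to `subprocess.run(cmd, check=True)` from the repository root with the virtual environment activated and `GEMINI_API_KEY` set in the environment.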

Notability

notability 3.0/10

Low traction, routine repo release.