microsoft/social-reasoning-bench
Python
Description: A benchmark to evaluate AI Agents in social domains.
Language: Python
License: MIT
Stars: 16
Forks: 4
Open issues: 9
Created: 2026-04-27T14:05:46Z
Pushed: 2026-06-10T03:32:32Z
Default branch: main
Fork: no
Archived: no
README:
SocialReasoningBench


Evaluate the social reasoning capabilities of LLM agents in multi-party environments.
Install
Requires Python 3.11+ and uv.
    git clone https://github.com/microsoft/social-reasoning-bench.git srbench
    cd srbench
    uv sync --all-packages --all-groups --all-extras
    source .venv/bin/activate
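Before running the install steps, it can help to confirm the two stated prerequisites. The snippet below is a minimal sketch and not part of the repository: `check_prerequisites` is a hypothetical helper name, and it assumes `uv` is installed as an executable named `uv` on your `PATH`.

```python
import shutil
import sys

# Minimum interpreter version stated in the README ("Python 3.11+").
MIN_PYTHON = (3, 11)


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration. `version_info` and `which` are
    injectable so the check can be exercised without changing the machine.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.11+ is required, found {found}")
    # Assumption: the uv executable is called "uv" and is discoverable on PATH.
    if which("uv") is None:
        problems.append("uv was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

An empty result only means the prerequisites appear to be present; it does not verify that `uv sync` itself will succeed.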
Usage
Evaluate the social reasoning ability of your own LLM. For the sake of example, we'll assume your LLM is served as my-model via an OpenAI-compatible endpoint at http://localhost:8000.
    # To reproduce our results use Gemini as the counterparty.
    GEMINI_API_KEY=

    # Run the v0.1.0 experiment sweep with your model as the assistant
    srbench experiment experiments/v0.1.0 \
      --output-base outputs/my-model --assistant-model openai/my-model \
      --assistant-base-url http://localhost:8000/v1 \
      --assistant-api-key none

    # To just test a few examples per experiment in the sweep
    # --set limit=10

    # View the results
    srbench dashboard outputs/my-model
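If you script the sweep for several models, the command above can be assembled programmatically. This is a minimal sketch, not part of srbench: `build_experiment_command` is a hypothetical helper, it uses only the flags shown above, and it assumes that the `openai/` model prefix and the `outputs/<model>` layout carry over to other model names and that `--set limit=N` is appended to the same `srbench experiment` invocation.

```python
import shlex


def build_experiment_command(
    model_name,
    base_url="http://localhost:8000/v1",
    api_key="none",
    experiment_dir="experiments/v0.1.0",
    output_base=None,
    limit=None,
):
    """Build the argument list for the sweep command shown in the README.

    Hypothetical helper for illustration; only flags that appear in the
    README are used.
    """
    if output_base is None:
        # Assumption: mirror the README's outputs/my-model layout.
        output_base = f"outputs/{model_name}"
    cmd = [
        "srbench", "experiment", experiment_dir,
        "--output-base", output_base,
        # Assumption: the "openai/" prefix from the README example applies
        # to any model behind an OpenAI-compatible endpoint.
        "--assistant-model", f"openai/{model_name}",
        "--assistant-base-url", base_url,
        "--assistant-api-key", api_key,
    ]
    if limit is not None:
        # Assumption: the commented-out "--set limit=10" belongs on this command.
        cmd += ["--set", f"limit={limit}"]
    return cmd


if __name__ == "__main__":
    # Print a copy-pasteable shell line for a quick 10-example run.
    print(shlex.join(build_experiment_command("my-model", limit=10)))
```

The list form can be passed to `subprocess.run(cmd, check=True)` from the repository root once the virtual environment is active and `GEMINI_API_KEY` is set.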
See [Installation](/installation), [Experiments](/experiments.md), and [LLMs](/llm.md) for detailed instructions.
Notability
3.0/10. Low traction, routine repo release.