What does this repo signal mean?

Microsoft published microsoft/BC-Bench, a Jupyter Notebook repository. Repository signals like this one can surface tooling, evaluation, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo microsoft/BC-Bench · language Jupyter Notebook · new benchmark repo with low stars. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Microsoft Repo: microsoft/BC-Bench

Captured source


microsoft/BC-Bench repository metadata


Published Sep 29, 2025 · seen Jun 8 · captured Jun 11 · HTTP 200 · method: plain

microsoft/BC-Bench

Description: Inspired by SWE-Bench, for Business Central (AL) ecosystem.

Language: Jupyter Notebook

License: MIT

Stars: 36

Forks: 15

Open issues: 9

Created: 2025-09-29T07:40:02Z

Pushed: 2026-06-11T00:06:20Z

Default branch: main

Fork: no

Archived: no

README:

BC-Bench

[Dataset Validation and Verification](https://github.com/microsoft/BC-Bench/actions/workflows/dataset-validation.yml) · [CI](https://github.com/microsoft/BC-Bench/actions/workflows/CI.yml)

A benchmark for evaluating coding agents on real-world Business Central (AL) development tasks, inspired by SWE-Bench.

Purpose

BC-Bench provides a reproducible evaluation framework for coding agents working on real-world Business Central development tasks:

Measure performance of different models on authentic AL issues
Quantify the impact of tooling changes (MCP servers, custom instructions, custom agents, etc.)
Track progress with transparent, comparable metrics over time
Rapidly iterate on agent configurations and setups
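The README does not spell out which metrics BC-Bench reports. As a rough illustration of "comparable metrics" and "quantify the impact of tooling changes", the sketch below assumes a SWE-Bench-style binary resolved/unresolved outcome per task and computes a resolution rate per agent configuration plus the difference between two configurations. The result tuples, the configuration names, and the functions `resolution_rates` and `tooling_delta` are hypothetical, not part of the repository.

```python
from collections import defaultdict

# Hypothetical result format: (configuration name, task id, resolved?).
# BC-Bench's real output format may differ; this is only an illustration.
Result = tuple[str, str, bool]


def resolution_rates(results: list[Result]) -> dict[str, float]:
    """Return the fraction of tasks resolved for each agent configuration."""
    totals: dict[str, int] = defaultdict(int)
    solved: dict[str, int] = defaultdict(int)
    for config, _task_id, resolved in results:
        totals[config] += 1
        solved[config] += int(resolved)
    return {config: solved[config] / totals[config] for config in totals}


def tooling_delta(rates: dict[str, float], baseline: str, variant: str) -> float:
    """Change in resolution rate when switching from baseline to variant."""
    return rates[variant] - rates[baseline]


if __name__ == "__main__":
    # Made-up outcomes for four tasks under two hypothetical configurations.
    results = [
        ("baseline", "t1", True), ("baseline", "t2", False),
        ("baseline", "t3", False), ("baseline", "t4", True),
        ("with-mcp", "t1", True), ("with-mcp", "t2", True),
        ("with-mcp", "t3", False), ("with-mcp", "t4", True),
    ]
    rates = resolution_rates(results)
    print(rates)
    print(tooling_delta(rates, "baseline", "with-mcp"))
```

A delta like this is only meaningful when both configurations run on the same task set with the same model, so that the tooling change is the only variable.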

Dataset

We follow the SWE-Bench schema with BC-specific adjustments:

environment_setup_commit and version are combined into environment_setup_version
project_paths enumerates the AL project roots touched by the fix
problem_statement and hints_text are not included in the JSONL file; they are stored under [problemstatement](/dataset/problemstatement/) so that repro steps can include screenshots
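The three schema adjustments can be pictured as a transformation from a SWE-Bench-style record into a BC-Bench-style JSONL record. This is a minimal sketch, not the repository's conversion code: the README does not say how environment_setup_commit and version are combined, so the sketch takes environment_setup_version as a caller-supplied value rather than guessing a format, and it returns problem_statement and hints_text separately instead of writing files, because the file layout under dataset/problemstatement/ is not described. The function name `to_bc_bench` and the example field values are hypothetical.

```python
import json
from typing import Any

# Text fields that BC-Bench keeps out of the JSONL file (per the README they
# are stored under dataset/problemstatement/ instead).
TEXT_FIELDS = ("problem_statement", "hints_text")
# SWE-Bench fields that BC-Bench folds into a single environment_setup_version.
MERGED_FIELDS = ("environment_setup_commit", "version")


def to_bc_bench(
    swe_record: dict[str, Any],
    environment_setup_version: str,
    project_paths: list[str],
) -> tuple[dict[str, Any], dict[str, str]]:
    """Split a SWE-Bench-style record into (JSONL record, separately stored text).

    The input record is not modified.
    """
    if not project_paths:
        raise ValueError("project_paths must list at least one AL project root")
    record = {
        key: value
        for key, value in swe_record.items()
        if key not in TEXT_FIELDS and key not in MERGED_FIELDS
    }
    record["environment_setup_version"] = environment_setup_version
    record["project_paths"] = list(project_paths)
    side_text = {key: swe_record[key] for key in TEXT_FIELDS if key in swe_record}
    return record, side_text


if __name__ == "__main__":
    swe = {  # hypothetical SWE-Bench-style record
        "instance_id": "example__bc-1",
        "base_commit": "abc123",
        "environment_setup_commit": "def456",
        "version": "placeholder-version",
        "problem_statement": "Posting a sales order fails.",
        "hints_text": "",
    }
    record, text = to_bc_bench(swe, "placeholder-setup-version", ["App/Sales"])
    print(json.dumps(record, sort_keys=True))
    print(sorted(text))
```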

Agents Under Evaluation

GitHub Copilot CLI

The GitHub Copilot CLI supports MCP servers, tools, and agent mode. It closely simulates real developers' workflows (both in VS Code and in the Coding Agent), making it an ideal candidate for evaluating automated workflows.

Claude Code

Claude Code is Anthropic's agentic coding tool. It supports MCP servers, custom system prompts, and agent mode. BC-Bench integrates with Claude Code using the same shared configuration as Copilot.

Getting Started

BC-Bench is open source, and you're welcome to fork and adapt it for your own use. We are not accepting external contributions in this repository at this time. You can run evaluations locally and replace the dataset under dataset/ with tasks from your own codebase.
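Before swapping your own tasks into dataset/, a quick structural check can catch obvious mistakes. The sketch below is a hypothetical pre-flight validator, not a tool shipped with BC-Bench: it assumes only the conventions stated in the Dataset section (an environment_setup_version field, a project_paths list, and no problem_statement or hints_text in the JSONL) plus an instance_id key borrowed from SWE-Bench. The repository's own dataset-validation workflow is the authoritative check, and its required fields may be broader.

```python
import json
from typing import Iterable

# Hypothetical minimum key set; the real BC-Bench schema may require more.
REQUIRED_KEYS = {"instance_id", "environment_setup_version", "project_paths"}
# Keys the README says are stored outside the JSONL file.
EXCLUDED_KEYS = {"problem_statement", "hints_text"}


def validate_jsonl(lines: Iterable[str]) -> list[str]:
    """Return human-readable problems found in JSONL task lines (empty if none)."""
    errors: list[str] = []
    seen_ids: set[str] = set()
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            task = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(task, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        keys = set(task)
        missing = REQUIRED_KEYS - keys
        if missing:
            errors.append(f"line {lineno}: missing {sorted(missing)}")
        misplaced = EXCLUDED_KEYS & keys
        if misplaced:
            errors.append(f"line {lineno}: {sorted(misplaced)} belong under dataset/problemstatement/")
        paths = task.get("project_paths")
        if paths is not None and (not isinstance(paths, list) or not paths):
            errors.append(f"line {lineno}: project_paths must be a non-empty list")
        task_id = task.get("instance_id")
        if isinstance(task_id, str):
            if task_id in seen_ids:
                errors.append(f"line {lineno}: duplicate instance_id {task_id!r}")
            seen_ids.add(task_id)
    return errors


if __name__ == "__main__":
    good = json.dumps({"instance_id": "t1", "environment_setup_version": "v", "project_paths": ["App"]})
    bad = json.dumps({"instance_id": "t1", "problem_statement": "x", "project_paths": []})
    for problem in validate_jsonl([good, bad]):
        print(problem)
```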

Documentation map

[CONTRIBUTING.md](CONTRIBUTING.md) — fork setup, repo layout, versioning, day-to-day maintainer ops
[EXPERIMENT.md](EXPERIMENT.md) — run an experiment (toggle instructions / skills / agents / MCP / model) against an existing category
[CATEGORIES.md](CATEGORIES.md) — add a new evaluation category (e.g. code-review) alongside bug-fix / test-generation

Notability

notability 3.0/10

New benchmark repo with low stars