Repo: InclusionAI (Ant Group), published Jun 30, 2025

inclusionAI/ABench


Description: ABench is an evolving open-source benchmark suite designed to rigorously evaluate and enhance Large Language Models (LLMs) on complex cross-domain tasks.

Language: Python

License: Apache-2.0

Stars: 27

Forks: 1

Open issues: 1

Created: 2025-06-30T09:43:51Z

Pushed: 2026-04-17T11:04:50Z

Default branch: main

Fork: no

Archived: no

README:

ABench

🌟 Overview

ABench is an evolving open-source benchmark suite designed to rigorously evaluate and enhance Large Language Models (LLMs) on complex cross-domain tasks. By targeting current model weaknesses, ABench provides systematic challenges in high-difficulty specialized domains, including physics, actuarial science, logical reasoning, law, and psychology.

🎯 Core Objectives

1. Address Evaluation Gaps: Design high-differentiation assessment tasks targeting underperforming question types
2. Establish Unified Standards: Create reliable, comparable benchmarks for multi-domain LLM evaluation
3. Expand Capability Boundaries: Drive continuous optimization of knowledge systems and reasoning mechanisms through challenging innovative problems

📊 Dataset Release Status

| Domain | Description | Status |
|--------|-------------|--------|
| Physics | 500 university/competition-level physics problems (400 static + 100 dynamic parametric variants) covering 10+ fields from classical mechanics to modern physics | [✅ Released](Physics/README.md) |
| Actuary | Curated actuarial exam problems covering core topics: probability statistics, financial mathematics, life/non-life insurance, actuarial models, and risk management | [✅ Released](Actuary/README.md) |
| Logic | High-differentiation logical reasoning problems from authoritative tests (LSAT/GMAT/GRE/SBI/Chinese Civil Service Exam) | [✅ Released](Logic/README.md) |
| Psychology | Psychological case studies and research questions (objective/subjective) evaluating understanding of human behavior and theories | [✅ Released](Psychology/README.md) |
| Law | Authoritative judicial exam materials covering core legal domains: criminal/civil/administrative/procedural/international law | [✅ Released](Law/README.md) |
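
The Physics entry distinguishes static problems from dynamic parametric variants. As a rough illustration of that idea only, the sketch below regenerates a projectile-range question by resampling its parameters and recomputing the reference answer. The template text, the `make_variant` helper, the output keys, and the parameter ranges are all hypothetical and are not taken from the ABench repository; the actual data format is described in Physics/README.md.

```python
import math
import random

G = 9.81  # gravitational acceleration in m/s^2

# Hypothetical problem template; not an actual ABench item.
TEMPLATE = (
    "A projectile is launched from level ground at {v0} m/s, "
    "{angle} degrees above the horizontal. Neglecting air resistance, "
    "find its horizontal range in metres."
)


def projectile_range(v0: float, angle_deg: float) -> float:
    """Range on level ground: R = v0^2 * sin(2 * theta) / g."""
    return v0 ** 2 * math.sin(2 * math.radians(angle_deg)) / G


def make_variant(seed: int) -> dict:
    """Sample parameters from a seeded RNG and recompute the answer."""
    rng = random.Random(seed)  # seeded so each variant is reproducible
    v0 = rng.randint(10, 50)
    angle = rng.choice([15, 30, 45, 60, 75])
    return {
        "question": TEMPLATE.format(v0=v0, angle=angle),
        "answer": round(projectile_range(v0, angle), 2),
        "params": {"v0": v0, "angle": angle},
    }


if __name__ == "__main__":
    for seed in range(3):
        variant = make_variant(seed)
        print(variant["question"], "->", variant["answer"])
```

Because the reference answer is derived from the sampled parameters rather than stored as a fixed string, a model cannot score well on such a variant by recalling a memorized answer to the static version of the problem.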
