What does this repo signal mean?

Amazon (Nova) published the repository amazon-science/acibench-hallucination-annotations. Repo signals of this kind surface tooling, evaluation, infrastructure, or model-adjacent work before it appears in a launch post. This one is classified as a low-traction research repo. onlylabs links the event to 1 captured evidence page and 6 related repo signals, and maps it to Data demand in the data-business radar.

Amazon (Nova) Repo: amazon-science/acibench-hallucination-annotations

Captured source

GitHub: github.com/amazon-science/acibench-hallucination-annotations

amazon-science/acibench-hallucination-annotations repository metadata

Published May 23, 2025 · seen 5d · captured 10h · HTTP 200 · method plain

amazon-science/acibench-hallucination-annotations

Description: Expert hallucination labels from the ACI-bench dataset

License: CC-BY-4.0

Stars: 7

Forks: 0

Open issues: 0

Created: 2025-05-23T16:45:26Z

Pushed: 2025-05-23T16:51:49Z

Default branch: main

Fork: no

Archived: no

README:

Natural Hallucination Dataset - ACI-bench Clinical Note Hallucination Annotations

This repository contains expert-annotated hallucination labels from the ACI-bench dataset for evaluating hallucination detection in medical text summarization.

Dataset Overview

The Natural Hallucination (NH) dataset contains expert annotations of hallucinations in clinical summaries, focused on SOAP notes from the ACI-bench collection of clinical conversations.

Annotation Categories & Counts

Expert clinical scribes annotated statements into 4 categories with the following distribution:

No Error: 12,365
Hallucination: 106
Inference: 87
Misunderstanding: 72
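
The counts above imply a heavily imbalanced label distribution. A minimal sketch of the arithmetic, using only the counts listed in the README (the variable names are illustrative and not part of the dataset):

```python
# Counts as listed in the README; names are illustrative only.
counts = {
    "No Error": 12365,
    "Hallucination": 106,
    "Inference": 87,
    "Misunderstanding": 72,
}

total = sum(counts.values())
errors = total - counts["No Error"]
error_rate = errors / total

# Print each category with its share of all annotated statements.
for label, n in counts.items():
    print(f"{label:<17}{n:>7}  {n / total:6.2%}")
print(f"Annotated statements: {total}, errors: {errors} ({error_rate:.2%})")
```

Only about 2% of the 12,630 annotated statements are labeled as errors, so plain accuracy is a weak metric here: a detector that always predicts "No Error" would score roughly 98%.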

Error Severity Distribution

The errors were classified by severity:

Low Severity: 138
High Severity: 87
Not Medically Relevant (NMR): 40
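
The severity counts can be cross-checked against the three error categories listed in the README: if every error statement receives exactly one severity label (an assumption, not something the README states), the two totals should agree. A short sketch of that check, with illustrative variable names:

```python
# Counts as listed in the README; variable names are illustrative.
severity_counts = {
    "Low Severity": 138,
    "High Severity": 87,
    "Not Medically Relevant": 40,
}
error_counts = {"Hallucination": 106, "Inference": 87, "Misunderstanding": 72}

total_by_severity = sum(severity_counts.values())
total_by_category = sum(error_counts.values())

# Assumption: each error statement carries exactly one severity label.
consistent = total_by_severity == total_by_category

for level, n in severity_counts.items():
    print(f"{level:<24}{n:>4}  {n / total_by_severity:6.1%}")
print(f"Totals match: {consistent} ({total_by_severity} vs {total_by_category})")
```

Both totals come to 265, and high-severity errors make up roughly a third of them.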

High Severity Categories

The following categories are marked as high severity errors:

Diagnosis
Exam Findings
Lab Testing and Imaging
Medical History
Symptoms
Treatment Plan

Age & Sex errors are considered low severity.
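
The severity rule described above can be expressed as a simple lookup. This is an illustration rather than the authors' code: the category strings are copied from the README, but the function name `severity_for`, the treatment of "Age & Sex" as a single category, and the exact spelling used in the released files are assumptions. The README does not say how the Not Medically Relevant label is assigned, so unlisted categories return `None` instead of a guess.

```python
from typing import Optional

# Category names copied from the README; spelling in the released files may differ.
HIGH_SEVERITY = {
    "Diagnosis",
    "Exam Findings",
    "Lab Testing and Imaging",
    "Medical History",
    "Symptoms",
    "Treatment Plan",
}
# Assumption: "Age & Sex" is one category label.
LOW_SEVERITY = {"Age & Sex"}


def severity_for(category: str) -> Optional[str]:
    """Return "high" or "low" for a listed category, else None (hypothetical helper)."""
    if category in HIGH_SEVERITY:
        return "high"
    if category in LOW_SEVERITY:
        return "low"
    return None  # the README does not specify a rule for other categories


print(severity_for("Treatment Plan"))
print(severity_for("Age & Sex"))
```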

Dataset Format

The released dataset contains:

Original ACI-bench conversation transcripts
Expert annotations of factual errors marked by category
Severity labels for each error
Aggregated error scores per subject

Usage

The annotations can be used to:

Evaluate hallucination detection methods
Analyze different types of factual errors in clinical summarization
Study high vs low severity errors in medical text generation
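
As a sketch of the first use case, a hallucination detector can be scored against the expert labels with precision, recall, and F1. The README does not document file names or column layouts, so this example works on in-memory lists. The helper `precision_recall_f1` is hypothetical, the toy inputs are placeholders rather than dataset rows, and treating Hallucination, Inference, and Misunderstanding together as the positive class is an assumption.

```python
from typing import Sequence, Tuple

# Assumption: all three non-"No Error" categories count as the positive class.
ERROR_LABELS = {"Hallucination", "Inference", "Misunderstanding"}


def precision_recall_f1(
    gold: Sequence[str], predicted_error: Sequence[bool]
) -> Tuple[float, float, float]:
    """Score binary error predictions against expert category labels."""
    if len(gold) != len(predicted_error):
        raise ValueError("gold and predicted_error must have the same length")
    tp = fp = fn = 0
    for label, pred in zip(gold, predicted_error):
        is_error = label in ERROR_LABELS
        if pred and is_error:
            tp += 1
        elif pred and not is_error:
            fp += 1
        elif not pred and is_error:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy placeholder inputs, not drawn from the dataset.
gold = ["No Error", "Hallucination", "No Error", "Inference", "Misunderstanding"]
pred = [False, True, True, False, True]
p, r, f = precision_recall_f1(gold, pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Because errors are rare in this dataset, precision and recall on the error class say more about a detector than overall accuracy does.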

Citation

If you use this dataset, please cite:

Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization. BN, S., Shing, H.-C., Xu, L., Strong, M., Burnsky, J., Ofor, J., Mason, J. R., Chen, S., Srinivasan, S., Shivade, C., Moriarty, J., & Cohen, J. P. Interspeech 2025

@inproceedings{BN2024fact,
title={Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization},
author={BN, Suhas and Shing, Han-Chin and Xu, Lei and Strong, Mitch and Burnsky, Jon and Ofor, Jessica and Mason, Jordan R and Chen, Susan and Srinivasan, Sundararajan and Shivade, Chaitanya and Moriarty, Jack and Cohen, Joseph Paul},
booktitle={Interspeech},
year={2025},
organization={ISCA}
}

Note

This release contains only the expert annotations on the ACI-bench summaries. The LLM outputs could not be made public because of licensing restrictions.

Notability

Score: 3.0/10

Low-traction research repo.

This Amazon (Nova) repo signal matches the Data demand category.