amazon-science/acibench-hallucination-annotations
Description: Expert hallucination labels from the ACI-bench dataset
License: CC-BY-4.0
Stars: 7
Forks: 0
Open issues: 0
Created: 2025-05-23T16:45:26Z
Pushed: 2025-05-23T16:51:49Z
Default branch: main
Fork: no
Archived: no
README:
Natural Hallucination Dataset - ACI-bench Clinical Note Hallucination Annotations
This repository contains expert-annotated hallucination labels from the ACI-bench dataset for evaluating hallucination detection in medical text summarization.
Dataset Overview
The Natural Hallucination (NH) dataset contains expert annotations of hallucinations in clinical summaries, focused on SOAP notes from the ACI-bench collection of clinical conversations.
Annotation Categories & Counts
Expert clinical scribes assigned each statement to one of 4 categories, with the following distribution:
- No Error: 12,365
- Hallucination: 106
- Inference: 87
- Misunderstanding: 72
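The counts above imply a heavily imbalanced label set. The following is a minimal sketch of that arithmetic, using only the numbers listed in this README; the variable names are illustrative and not part of the dataset. It also shows the accuracy of a trivial baseline that labels every statement "No Error", which suggests why plain accuracy is a weak metric here.

```python
# Counts copied from the category distribution listed above.
category_counts = {
    "No Error": 12365,
    "Hallucination": 106,
    "Inference": 87,
    "Misunderstanding": 72,
}

total = sum(category_counts.values())
error_total = total - category_counts["No Error"]

# Share of each category among all annotated statements.
for name, count in category_counts.items():
    print(f"{name:>16}: {count:>6} ({count / total:.2%})")

# A baseline that always predicts "No Error" is right on every
# "No Error" statement and wrong on every error statement.
majority_baseline_accuracy = category_counts["No Error"] / total

print(f"Total statements: {total}")
print(f"Error statements: {error_total} ({error_total / total:.2%})")
print(f"Always-'No Error' accuracy: {majority_baseline_accuracy:.2%}")
```

Because roughly 2% of statements carry an error label, precision and recall on the error classes are likely to be more informative than accuracy.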
Error Severity Distribution
The errors were classified by severity:
- Low Severity: 138
- High Severity: 87
- Not Medically Relevant (NMR): 40
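As a quick consistency check, the severity counts can be compared against the three error categories (Hallucination, Inference, Misunderstanding). This sketch uses only the numbers from this README; the dictionary names are illustrative.

```python
# Severity counts as listed above.
severity_counts = {
    "Low Severity": 138,
    "High Severity": 87,
    "Not Medically Relevant (NMR)": 40,
}

# Error-category counts from the annotation distribution (excluding "No Error").
error_category_counts = {
    "Hallucination": 106,
    "Inference": 87,
    "Misunderstanding": 72,
}

severity_total = sum(severity_counts.values())
error_total = sum(error_category_counts.values())

# Both breakdowns describe the same set of error statements.
print(f"Severity total: {severity_total}, error total: {error_total}")

# Share of each severity level among all errors.
severity_shares = {
    level: count / severity_total for level, count in severity_counts.items()
}
for level, share in severity_shares.items():
    print(f"{level}: {share:.2%}")
```

Both totals come to 265, so every annotated error appears to carry exactly one severity label.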
High Severity Categories
The following categories are marked as high severity errors:
- Diagnosis
- Exam Findings
- Lab Testing and Imaging
- Medical History
- Symptoms
- Treatment Plan
Age & Sex errors are considered low severity.
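The rule above can be sketched as a small lookup. This is an illustration only: the function name `severity_for` and the return strings are hypothetical, "Age & Sex" is assumed to be a single label, and the exact label strings in the released files may differ.

```python
# Categories the README marks as high severity.
HIGH_SEVERITY_CATEGORIES = frozenset({
    "Diagnosis",
    "Exam Findings",
    "Lab Testing and Imaging",
    "Medical History",
    "Symptoms",
    "Treatment Plan",
})

# Assumption: "Age & Sex" is stored as one label string.
LOW_SEVERITY_CATEGORIES = frozenset({"Age & Sex"})


def severity_for(category: str) -> str:
    """Map a fact category to a severity level (hypothetical helper)."""
    if category in HIGH_SEVERITY_CATEGORIES:
        return "high"
    if category in LOW_SEVERITY_CATEGORIES:
        return "low"
    # The README does not say how other categories are graded.
    return "unspecified"


print(severity_for("Diagnosis"))
print(severity_for("Age & Sex"))
```

Categories the README does not mention fall through to "unspecified" rather than being guessed.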
Dataset Format
The released dataset contains:
- Original ACI-bench conversation transcripts
- Expert annotations of factual errors marked by category
- Severity labels for each error
- Aggregated error scores per subject
Usage
The annotations can be used to:
- Evaluate hallucination detection methods
- Analyze different types of factual errors in clinical summarization
- Study high vs low severity errors in medical text generation
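For the first use case, one possible scoring setup is binary precision, recall, and F1 over statements. The sketch below assumes that any of the three error categories counts as a positive and that the detector outputs one boolean flag per statement; neither choice comes from the README, and the toy labels are made up for illustration rather than taken from the dataset.

```python
# Assumption: all three error categories count as "positive" for detection.
ERROR_LABELS = {"Hallucination", "Inference", "Misunderstanding"}


def binary_prf(gold_labels, predicted_flags):
    """Return (precision, recall, f1) for flagged-vs-error statements."""
    if len(gold_labels) != len(predicted_flags):
        raise ValueError("gold_labels and predicted_flags must align")

    tp = fp = fn = 0
    for gold, flagged in zip(gold_labels, predicted_flags):
        is_error = gold in ERROR_LABELS
        if flagged and is_error:
            tp += 1
        elif flagged and not is_error:
            fp += 1
        elif not flagged and is_error:
            fn += 1

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (not real annotations): five statements, three true errors.
gold = ["No Error", "Hallucination", "Inference", "No Error", "Misunderstanding"]
pred = [False, True, False, True, True]

precision, recall, f1 = binary_prf(gold, pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

The same function can be run on the high-severity subset alone to compare detector behavior on high versus low severity errors.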
Citation
If you use this dataset, please cite:
Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization. BN, S., Shing, H.-C., Xu, L., Strong, M., Burnsky, J., Ofor, J., Mason, J. R., Chen, S., Srinivasan, S., Shivade, C., Moriarty, J., & Cohen, J. P. Interspeech 2025
@inproceedings{BN2024fact,
title={Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization},
author={BN, Suhas and Shing, Han-Chin and Xu, Lei and Strong, Mitch and Burnsky, Jon and Ofor, Jessica and Mason, Jordan R and Chen, Susan and Srinivasan, Sundararajan and Shivade, Chaitanya and Moriarty, Jack and Cohen, Joseph Paul},
booktitle={Interspeech},
year={2025},
organization={ISCA}
}
Note
This release contains only the expert annotations on the ACI-bench summaries. The LLM outputs could not be made public due to licensing restrictions.
Notability
Notability: 3.0/10. Low-traction research repo.
Amazon (Nova) has a repo signal matching data demand.