
amazon-science/toward-clinical-coding-verification-adaptation


Language: Python

License: NOASSERTION

Stars: 8

Forks: 1

Open issues: 0

Created: 2025-10-14T06:57:54Z

Pushed: 2025-10-14T07:20:39Z

Default branch: main

Fork: no

Archived: no

README:

Toward Reliable Clinical Coding with Language Models: Verification and Lightweight Adaptation

This repository contains the ICD-10-CM annotations for the paper "Toward Reliable Clinical Coding with Language Models: Verification and Lightweight Adaptation" (EMNLP 2025, Industry Track).

Dataset

This repository contains only the double expert-annotated ICD-10-CM annotations used in the paper. To derive the full training and testing data, including the corresponding notes, follow the steps below:

1. Download https://github.com/wyim/aci-bench to ACI_BENCH_PATH

2. Run

python merge_aci_annotations.py --aci_data_dir ${ACI_BENCH_PATH}/data/challenge_data --annotation_dir annotation --output_dir merged_data

3. In merged_data, you should find:

JSONL files with merged data:

  • train.jsonl (67 records)
  • valid.jsonl (20 records)
  • test.jsonl (120 records - combines all test files)

Each record contains the dialogue, the clinical note, and the associated ICD-10-CM codes.
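After the merge step, a quick sanity check can confirm that the output matches the record counts listed above. The following is a minimal sketch, not part of the repository: the helper names (load_jsonl, check_counts, field_names) are hypothetical, and it assumes only that each output file is standard JSONL (one JSON object per line). The README does not document the field names inside each record, so the sketch lists the keys it finds rather than assuming any.

```python
import json
from pathlib import Path

# Record counts as stated in the README for the merged output.
EXPECTED_COUNTS = {"train.jsonl": 67, "valid.jsonl": 20, "test.jsonl": 120}


def load_jsonl(path):
    """Read a JSONL file: one JSON object per non-empty line."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def check_counts(merged_dir, expected=EXPECTED_COUNTS):
    """Return {filename: (found, expected)} for every file whose record
    count differs from the expected one. A missing file is reported
    with found=None."""
    mismatches = {}
    for name, want in expected.items():
        path = Path(merged_dir) / name
        found = len(load_jsonl(path)) if path.exists() else None
        if found != want:
            mismatches[name] = (found, want)
    return mismatches


def field_names(records):
    """Collect the keys that appear in any record, since the README
    does not document the exact field names."""
    keys = set()
    for record in records:
        keys.update(record.keys())
    return sorted(keys)
```

For example, check_counts("merged_data") returns an empty dict when all three files have the expected number of records, and field_names(load_jsonl("merged_data/train.jsonl")) shows which keys hold the dialogue, the note, and the codes.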

Citation

If you find this data useful or use it in research and development, please cite:

@inproceedings{toward-reliable-clinical-coding-verification-adaptation,
title = "Toward Reliable Clinical Coding with Language Models: Verification and Lightweight Adaptation",
author = "Yuan, Zhangdie and
Shing, Han-Chin and
Strong, Mitch and
Shivade, Chaitanya",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track",
publisher = "Association for Computational Linguistics",
}

License

This library is licensed under the CC-BY-NC-4.0 License.

Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.

SPDX-License-Identifier: CC-BY-NC-4.0
