Repo: Amazon (Nova), published Jan 22, 2026

amazon-science/multilingual-faithfulness

Python

Language: Python

License: CC-BY-4.0

Stars: 0

Forks: 0

Open issues: 0

Created: 2026-01-22T15:45:01Z

Pushed: 2026-03-20T20:43:49Z

Default branch: main

Fork: no

Archived: no

README:

Multilingual Faithfulness

A framework for generating synthetic multilingual data to train faithfulness judges for text summarization.

Overview

This repository provides tools to:

  • Generate faithful and unfaithful summaries from multilingual datasets (WikiLingua)
  • Generate labeled training data for faithfulness judges using LLM-as-a-judge
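
The two outputs combine naturally into labeled (document, summary) pairs for a judge. A minimal sketch, assuming a hypothetical record layout (`JudgeExample` with `document`, `summary`, `label`, `lang`) and a label convention of 1 = faithful, 0 = unfaithful; neither is taken from the repository:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JudgeExample:
    """One hypothetical training example for a faithfulness judge."""
    document: str
    summary: str
    label: int  # assumed convention: 1 = faithful, 0 = unfaithful
    lang: str


def make_examples(document: str, faithful: str, corrupted: str, lang: str) -> list[JudgeExample]:
    """Pair one source document with a faithful and a corrupted summary."""
    return [
        JudgeExample(document, faithful, 1, lang),
        JudgeExample(document, corrupted, 0, lang),
    ]
```

Keeping both summaries tied to the same document gives the judge contrastive examples where only the summary's faithfulness differs.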

Installation

Scripts run inside the official vLLM Docker container, which bundles compatible versions of vLLM, PyTorch, and Transformers.

docker pull vllm/vllm-openai:latest

Additional Python dependencies (installed inside the container):

pip install hydra-core omegaconf datasets

Project Structure

multilingual-faithfulness/
├── conf/                  # Hydra configuration files
│   ├── config.yaml        # Main configuration
│   └── task/              # Task-specific configs
│       ├── gen_data.yaml  # Training data generation
│       └── gen_summs.yaml # Summary generation
├── data/                  # Benchmark datasets (CSV)
│   ├── llm_aggrefact.csv
│   ├── mface.csv
│   └── memerag.csv
├── scripts/               # Executable scripts
│   ├── gen_data.py        # Training data generation
│   └── gen_summs.py       # Summary generation
├── src/                   # Library modules
│   ├── data_loader.py     # WikiLingua dataset loader
│   ├── gen_data.py        # Data generation functions
│   ├── gen_summs.py       # Summary generation functions
│   ├── corrupt.py         # Summary corruption strategies
│   ├── llm_inference/     # LLM inference utilities (vLLM)
│   └── utils/             # Helper functions and prompts
├── bash_files/            # Example shell scripts
└── requirements.txt
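
The `conf/` directory follows Hydra's config-group layout, and the usage commands override nested keys with dotted arguments such as `vllm.num_gpus=4`. The following is a simplified pure-Python sketch of that dotted-override step only, to show how each argument maps onto the nested config; it is not Hydra's implementation (Hydra also handles config-group selection like `task=gen_summs`, richer type parsing, and `+`/`~` prefixes), and the base config values in the example are hypothetical.

```python
import copy
from typing import Any


def parse_value(raw: str) -> Any:
    """Coerce booleans, ints and floats; everything else stays a string."""
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of `config` with `a.b.c=value` overrides applied."""
    result = copy.deepcopy(config)  # leave the caller's config untouched
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return result
```

For example, `apply_overrides({"vllm": {"num_gpus": 1}}, ["vllm.num_gpus=4", "model.base_llm=Qwen/Qwen3-4B-Instruct-2507"])` sets `num_gpus` to the integer 4 and adds a `model` section holding the model name as a string.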

Usage

All scripts should be run inside the vLLM Docker container:

docker run --gpus all --rm \
-v /path/to/repo:/workspace \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--ipc=host --entrypoint bash \
vllm/vllm-openai:latest -c \
"pip install hydra-core omegaconf datasets && \
cd /workspace && \
python3 scripts/.py "
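
If runs are launched from Python rather than a shell, the same container invocation can be assembled as an argument list. A sketch under stated assumptions: `docker_argv`, `repo_dir`, `script`, and `overrides` are hypothetical names, and the mounts and flags simply mirror the command above.

```python
import os
import shlex


def docker_argv(repo_dir: str, script: str, overrides: list[str]) -> list[str]:
    """Build the `docker run` argument list shown above for one script."""
    hf_cache = os.path.expanduser("~/.cache/huggingface")
    # Command executed by bash inside the container.
    inner = " && ".join([
        "pip install hydra-core omegaconf datasets",
        "cd /workspace",
        shlex.join(["python3", f"scripts/{script}", *overrides]),
    ])
    return [
        "docker", "run", "--gpus", "all", "--rm",
        "-v", f"{repo_dir}:/workspace",
        "-v", f"{hf_cache}:/root/.cache/huggingface",
        "--ipc=host", "--entrypoint", "bash",
        "vllm/vllm-openai:latest", "-c", inner,
    ]
```

The list can be passed to `subprocess.run(..., check=True)`. Passing a list avoids a second layer of shell quoting on the host, while `shlex.join` quotes the in-container command only where needed.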

1. Generate Summaries

Generate faithful and corrupted summaries from WikiLingua:

python3 scripts/gen_summs.py task=gen_summs \
model.base_llm=Qwen/Qwen3-4B-Instruct-2507 \
task.gen_summs.total_datapoints=14000 \
vllm.num_gpus=4 \
vllm.max_model_len=8192
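
The README does not list the corruption strategies implemented in `src/corrupt.py`. As an illustration only, here is one common way to derive an unfaithful summary from a faithful one: change a number so the summary contradicts its source. The function `perturb_number` is hypothetical and is not claimed to be one of the repository's strategies.

```python
import random
import re
from typing import Optional

NUMBER = re.compile(r"\d+")


def perturb_number(summary: str, rng: random.Random) -> Optional[str]:
    """Replace one run of digits in `summary` with a larger value.

    Returns None when the summary contains no digits, so a caller can
    fall back to a different corruption strategy.
    """
    matches = list(NUMBER.finditer(summary))
    if not matches:
        return None
    target = rng.choice(matches)
    original = int(target.group())
    # A non-zero offset guarantees the new number differs from the source.
    replacement = original + rng.randint(1, 9)
    return summary[: target.start()] + str(replacement) + summary[target.end():]
```

Seeding the `random.Random` instance keeps the corruption reproducible across runs, which helps when regenerating a dataset.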

2. Generate Training Data

Create labeled training data for the faithfulness judge:

python3 scripts/gen_data.py task=gen_data \
model.base_llm=Qwen/Qwen3-4B-Instruct-2507 \
task.data_gen.n_samples=1000 \
task.data_gen.summaries_path=./output/data/corrupt_v2 \
vllm.num_gpus=4 \
vllm.max_model_len=8192
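
With LLM-as-a-judge labeling, the judge's free-text response must be reduced to a label. A minimal sketch, assuming a hypothetical prompt that asks the model to end with `Verdict: FAITHFUL` or `Verdict: UNFAITHFUL`; the repository's actual prompts live under `src/utils/` and may use a different format.

```python
import re
from typing import Optional

# Hypothetical judge prompt; not taken from the repository.
PROMPT_TEMPLATE = (
    "Document:\n{document}\n\n"
    "Summary:\n{summary}\n\n"
    "Is every claim in the summary supported by the document? "
    "Explain briefly, then finish with 'Verdict: FAITHFUL' or 'Verdict: UNFAITHFUL'."
)

VERDICT = re.compile(r"Verdict:\s*(UNFAITHFUL|FAITHFUL)", re.IGNORECASE)


def build_prompt(document: str, summary: str) -> str:
    """Fill the judge prompt for one (document, summary) pair."""
    return PROMPT_TEMPLATE.format(document=document, summary=summary)


def parse_verdict(response: str) -> Optional[int]:
    """Map the judge's last verdict to 1 (faithful) or 0 (unfaithful); None if absent."""
    matches = VERDICT.findall(response)
    if not matches:
        return None
    return 1 if matches[-1].upper() == "FAITHFUL" else 0
```

Using the last verdict in the response tolerates judges that mention the label words while reasoning, and returning `None` lets unparseable responses be dropped instead of silently mislabeled.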

Citations

If you use this work, please cite:

@inproceedings{alfano2026multilingual,
title = {Multilingual Self-Taught Faithfulness Evaluators},
author = {Carlo Alfano and Aymen Al Marjani and Zeno Jonke and Amin Mantrach and Saab Mansour and Marcello Federico},
year = {2026},
booktitle = {Findings of the Association for Computational Linguistics: EACL 2026}
}

Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

License

This library is licensed under the CC-BY-4.0 License. See the [LICENSE](LICENSE) file.