amazon-science/post-training-industry-specialized-small-reasoning-models
Captured source
License: MIT-0
Stars: 2
Forks: 0
Open issues: 0
Created: 2025-10-27T23:19:18Z
Pushed: 2025-10-28T05:28:16Z
Default branch: main
Fork: no
Archived: no
README:
Dataset for Efficient Post-Training for Industry-Specialized Reasoning in Small Language Models
Overview
This repository contains the dataset used in our research on efficient post-training techniques for enhancing industry-specialized reasoning capabilities in small language models.
Dataset Summary
This is a reasoning trace dataset built from the 6251 samples in the FinQA training set. Our dataset contains 187506 samples in total. Reasoning traces were generated by three different methods: 62501 samples have traces generated by DeepSeek-R1, 62504 samples have traces generated by DeepSeek-R1 Distill Qwen 1.5B, and the remaining 62501 samples have DeepSeek-R1 traces summarized by DeepSeek-R1. For each trace generation method, inference was run 10 times per FinQA sample, and results from failed inferences were removed.
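The counts above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming exactly 10 inference runs were attempted for every FinQA training sample under each of the three methods; the variable names are illustrative, and the third per-method count is derived from the stated total rather than quoted.

```python
# Figures stated in the dataset summary.
FINQA_TRAIN_SAMPLES = 6251
RUNS_PER_SAMPLE = 10
NUM_METHODS = 3
TOTAL_RETAINED = 187506
DEEPSEEK_R1_TRACES = 62501
QWEN_1_5B_TRACES = 62504

# Largest possible number of traces per method if no inference had failed.
per_method_cap = FINQA_TRAIN_SAMPLES * RUNS_PER_SAMPLE

# Largest possible dataset size across all three methods.
upper_bound = per_method_cap * NUM_METHODS

# Traces dropped because of failed inferences (under the assumption above).
removed_total = upper_bound - TOTAL_RETAINED

# Summarized traces, derived as the remainder of the stated total.
summarized_traces = TOTAL_RETAINED - DEEPSEEK_R1_TRACES - QWEN_1_5B_TRACES

print(per_method_cap, upper_bound, removed_total, summarized_traces)
```

Under that assumption, no single method can contribute more than 62510 samples, which is a quick sanity bound on any per-method count.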
Data Fields
example_id (string): Unique ID for each sample in this dataset. (0, 1, ..., 187505.)
sample_id (string): ID for distinguishing the up to 10 samples generated from a given FinQA sample by a given trace generation method. (0, 1, ..., 9.)
id (string): Identical to the id column in the original FinQA dataset. This column can be used to associate a sample in our dataset with the corresponding FinQA sample.
question (string): Question statement prepared by combining the original question with its context information in FinQA.
answer (string): Original answer in FinQA.
llm_answer (string): Answer generated by DeepSeek-R1 or DeepSeek-R1 Distill Qwen 1.5B.
llm_reasoning (string): Reasoning trace generated by DeepSeek-R1 or DeepSeek-R1 Distill Qwen 1.5B (including traces summarized by DeepSeek-R1).
is_correct (bool): Whether llm_answer is correct or not. (True or False.) Note that FinQA includes 48 samples with an empty string in the answer column. We set is_correct = None for such samples.
model (string): Trace generation model. (deepseek_r1 or qwen_1_5b.)
is_summarized (bool): Whether the trace is a summarized one or not. (True or False.)
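The fields above can be combined to compare the trace generation methods. The following is a hedged sketch rather than an official loader: it assumes the records are already available as Python dictionaries keyed by the field names above, the helper names toy_row and accuracy_by_method are hypothetical, and the toy rows are invented for illustration and are not taken from the dataset. Rows with is_correct = None are skipped because their FinQA answer is empty.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def toy_row(example_id: str, model: str, is_summarized: bool,
            is_correct: Optional[bool]) -> dict:
    """Build an illustrative record with the documented fields (values are made up)."""
    return {
        "example_id": example_id,
        "sample_id": "0",
        "id": "toy-finqa-id",            # placeholder, not a real FinQA id
        "question": "toy question",
        "answer": "" if is_correct is None else "1.0",
        "llm_answer": "1.0",
        "llm_reasoning": "toy reasoning trace",
        "is_correct": is_correct,
        "model": model,
        "is_summarized": is_summarized,
    }


def accuracy_by_method(rows: List[dict]) -> Dict[Tuple[str, bool], float]:
    """Fraction of correct answers per (model, is_summarized) pair.

    Rows whose is_correct is None (empty FinQA answer) are excluded.
    """
    hits: Dict[Tuple[str, bool], int] = defaultdict(int)
    totals: Dict[Tuple[str, bool], int] = defaultdict(int)
    for row in rows:
        if row["is_correct"] is None:
            continue
        key = (row["model"], row["is_summarized"])
        totals[key] += 1
        hits[key] += int(row["is_correct"])
    return {key: hits[key] / totals[key] for key in totals}


rows = [
    toy_row("0", "deepseek_r1", False, True),
    toy_row("1", "deepseek_r1", False, False),
    toy_row("2", "qwen_1_5b", False, True),
    toy_row("3", "deepseek_r1", True, None),   # empty FinQA answer, skipped
    toy_row("4", "deepseek_r1", True, True),
]
scores = accuracy_by_method(rows)
```

Keying on the (model, is_summarized) pair avoids assuming how summarized traces are labeled in the model column.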
Security
See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.
License
This library is licensed under the MIT-0 License. See the LICENSE file.
Citation
@article{cai2025efficientindustryspecializedft,
title={Efficient Post-Training for Industry-Specialized Reasoning in Small Language Models},
author={Bill Cai and Sheldon Liu and Tatsuo Azeyanagi and Tomal Deb},
year={2025}
}
@article{chen2021finqa,
title={FinQA: A Dataset of Numerical Reasoning over Financial Data},
author={Chen, Zhiyu and Chen, Wenhu and Smiley, Charese and Shah, Sameena and Borova, Iana and Langdon, Dylan and Moussa, Reema and Beane, Matt and Huang, Ting-Hao and Routledge, Bryan and Wang, William Yang},
journal={Proceedings of EMNLP 2021},
year={2021}
}
@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
author={DeepSeek-AI},
year={2025},
eprint={2501.12948},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.12948},
}
Notability
Notability: 2.0/10. Very low traction, new repo.