amazon-science/QualityFlow
Python
Language: Python
License: NOASSERTION
Stars: 2
Forks: 0
Open issues: 0
Created: 2025-08-28T17:53:51Z
Pushed: 2025-08-28T17:57:36Z
Default branch: main
Fork: no
Archived: no
README:
QualityFlow
Abstract
We introduce QualityFlow, a dynamic agentic workflow for program synthesis. Given the English description of a programming problem and a set of unit tests, the model's goal is to synthesize the correct program that solves the problem and passes the tests. QualityFlow includes large language model (LLM) agents resembling a software development team, including code generation, testing, and self-debugging. We propose the LLM Quality Checker, which explicitly "imagines" whether the synthesized programs' execution would conform to the unit tests. The quality checks dynamically control the workflow, including actions to submit the final answer, clarify the problem statement, and revert previous workflow steps. Our experiments show that the Quality Checker can precisely accept any correct program, mitigate faulty synthesized tests, and prevent potential workflow deviation. QualityFlow establishes state-of-the-art results on four program synthesis benchmarks: MBPP, HumanEval, and stricter evaluations from MBPP-EvalPlus and HumanEval-EvalPlus.
Paper
QualityFlow: An Agentic Workflow for Program Synthesis Controlled by LLM Quality Checks https://arxiv.org/pdf/2501.17167
Setup instructions
Install dependencies
conda create -n agentic python=3.12
conda activate agentic
pip3 install openai boto3 tqdm pandas datasets sqlalchemy anthropic psutil transformers deepdiff seaborn pymysql jsonlines
pip3 install simple_parsing scikit-learn
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
Install mxeval: https://github.com/amazon-science/mxeval
Install codegeex: https://github.com/THUDM/CodeGeeX
Set your Anthropic API key as an environment variable
export ANTHROPIC_API_KEY=[YOUR KEY HERE]
Add CodeGeeX to your python path
export PYTHONPATH=.:lib/CodeGeeX/
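Before launching a run, it can help to confirm that both variables are visible to Python. The following is a minimal sketch, not part of the repository: `check_environment` is a hypothetical helper, and it only checks that the two settings described above are present.

```python
import os


def check_environment(env=None):
    """Return a list of human-readable setup problems (empty if none).

    `check_environment` is a hypothetical helper, not part of QualityFlow.
    """
    env = os.environ if env is None else env
    problems = []
    # The README asks for the Anthropic key to be exported as ANTHROPIC_API_KEY.
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    # The README adds lib/CodeGeeX/ to PYTHONPATH so CodeGeeX can be imported.
    entries = env.get("PYTHONPATH", "").split(os.pathsep)
    if not any(e.rstrip("/").endswith("lib/CodeGeeX") for e in entries):
        problems.append("lib/CodeGeeX/ is not on PYTHONPATH")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching the real shell environment.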
How to run
For example, running QualityFlow on MBPP:
### without cacher (after installation above, this command should run directly)
python run_sota_mbpp_without_cacher.py
### with MariaDB cacher
python run_sota_mbpp.py
The cacher is backed by a MariaDB server. Set up your own server and configure its address, user, and password in the class MariaDBCacherFactory. The cacher stores all LLM API calls to save time and money, so past experiments can be rerun or resumed. You can disable the cache on the command line, which lets you run this project without any database setup.
For example, running custom experiments through the command line:
### disable cacher
python run_args.py --global_model opus --dataset humaneval --use_cache False
### use MariaDB cacher
python run_args.py --global_model opus --dataset humaneval
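To illustrate the idea of caching LLM API calls, here is a minimal sketch under stated assumptions. It is not the repository's MariaDBCacherFactory: the class `SketchLLMCache`, its table layout, and the `call_fn` signature are all hypothetical, and SQLite from the Python standard library stands in for MariaDB so the example runs without a server.

```python
import hashlib
import json
import sqlite3


class SketchLLMCache:
    """Hypothetical cache for LLM calls; SQLite stands in for MariaDB here."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS llm_cache (key TEXT PRIMARY KEY, response TEXT)"
        )

    @staticmethod
    def _key(model, prompt, params):
        # Hash a canonical JSON form so identical requests map to the same row.
        payload = json.dumps(
            {"model": model, "prompt": prompt, "params": params}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model, prompt, params, call_fn):
        """Return the cached response, or call `call_fn` once and store the result."""
        key = self._key(model, prompt, params)
        row = self.conn.execute(
            "SELECT response FROM llm_cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]
        response = call_fn(model, prompt, params)
        self.conn.execute(
            "INSERT INTO llm_cache (key, response) VALUES (?, ?)", (key, response)
        )
        self.conn.commit()
        return response
```

Because the key covers the model name, the prompt, and the sampling parameters, changing any of them triggers a fresh API call, while an interrupted experiment that repeats earlier requests reads them back from the table.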
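The flags above can be mimicked with a small parser. This is a sketch using the standard library's `argparse`, not the code in run_args.py (the installation step pulls in simple_parsing, so the real script likely differs). The defaults chosen for `--global_model` and `--dataset` are assumptions; only the flag names and the example values come from the commands above.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False'-style strings.

    argparse's type=bool would treat the string 'False' as True, because any
    non-empty string is truthy, so an explicit converter is needed.
    """
    text = str(value).strip().lower()
    if text in {"true", "1", "yes"}:
        return True
    if text in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a run_args.py-style CLI")
    # Default values below are assumptions made for this illustration.
    parser.add_argument("--global_model", default="opus")
    parser.add_argument("--dataset", default="humaneval")
    # The cache is on unless --use_cache False is passed, matching the commands above.
    parser.add_argument("--use_cache", type=str2bool, default=True)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```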
The experiments to reproduce the paper are in
run_paper_experiments.py
A requirements.txt is provided for Ubuntu 20.04 with an NVIDIA A10G GPU, but we advise installing the libraries manually by following the instructions above rather than relying on requirements.txt.
Install MariaDB cacher
sudo apt update
sudo apt install mariadb-server
sudo systemctl start mariadb
sudo systemctl enable mariadb
sudo mysql_secure_installation
sudo mysql -u root -p
Create DB
sudo mysql -u root -p
CREATE DATABASE your_db_name;
CREATE USER 'your_username'@'localhost' IDENTIFIED BY 'your_password';
GRANT ALL PRIVILEGES ON your_db_name.* TO 'your_username'@'localhost';
FLUSH PRIVILEGES;
EXIT;
Testing
mysql -u your_username -p your_db_name
Project structure
[qualityflow](qualityflow) contains the source code
[workflow.py](qualityflow/workflow.py) contains the QualityFlow workflow
[step1_new_programmer.py](qualityflow/step1_programmer.py) is the QualityFlow programmer
[step2_new_test_designer.py](qualityflow/step2_test_designer.py) is the QualityFlow test designer
[step3_self_debug.py](qualityflow/step3_self_debug.py) is the self-debugger
[com1_code_quality_checker.py](qualityflow/com1_code_quality_checker.py) is the code quality checker
[com2_reinterpretation.py](qualityflow/com2_clarifier.py) is the clarifier and re-interpreter
[com3_test_quality_checker.py](qualityflow/com3_test_quality_checker.py) is the optional test quality checker
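As a rough illustration of how components like those listed above could interact, the sketch below wires stub agents into a control loop gated by a quality check. It is not the logic in workflow.py: the function name, the argument signatures, the ordering of steps, and the `max_debug_rounds` parameter are all assumptions made for this example.

```python
from typing import Callable, List


def run_workflow_sketch(
    problem: str,
    unit_tests: List[str],
    programmer: Callable[[str], str],
    quality_check: Callable[[str, str, List[str]], bool],
    test_designer: Callable[[str], List[str]],
    self_debug: Callable[[str, str, List[str]], str],
    clarifier: Callable[[str], str],
    max_debug_rounds: int = 2,
) -> str:
    """Illustrative control loop; the ordering and names here are assumptions."""
    # Draft a program and submit at once if the quality check accepts it.
    initial = programmer(problem)
    if quality_check(problem, initial, unit_tests):
        return initial

    # Synthesize extra tests, then debug the program against them.
    synthesized = test_designer(problem)
    candidate = initial
    for _ in range(max_debug_rounds):
        candidate = self_debug(problem, candidate, synthesized)
        if quality_check(problem, candidate, unit_tests):
            return candidate

    # Clarify the problem statement and try once more from the new statement.
    clarified = clarifier(problem)
    retry = programmer(clarified)
    if quality_check(clarified, retry, unit_tests):
        return retry

    # Revert: fall back to the initial program when nothing was accepted.
    return initial
```

Each agent is passed in as a plain callable, so the loop can be exercised with simple lambdas before any LLM is attached.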
Release notes
Oct 14, 2024
Initial preparation for public release
Feb 14, 2025
ACL submission code base
July 2, 2025
AAAI submission code base
Citation
@article{hu2025qualityflow,
title={Qualityflow: An agentic workflow for program synthesis controlled by llm quality checks},
author={Hu, Yaojie and Zhou, Qiang and Chen, Qihong and Li, Xiaopeng and Liu, Linbo and Zhang, Dejiao and Kachroo, Amit and Oz, Talha and Tripp, Omer},
journal={arXiv preprint arXiv:2501.17167},
year={2025}
}