amazon-science/MigrationBench
Python
Language: Python
License: Apache-2.0
Stars: 14
Forks: 6
Open issues: 4
Created: 2025-05-14T22:11:19Z
Pushed: 2026-06-10T00:57:35Z
Default branch: main
Fork: no
Archived: no
README:
MigrationBench
- [1. 📖 Overview](#1--overview)
  - [1.1 MigrationBench: Dataset and Evaluation Framework](#11-migrationbench-dataset-and-evaluation-framework)
  - [1.2 JavaMigration: Migration with LLMs](#12-javamigration-migration-with-llms)
- [2. 🤗 MigrationBench Datasets](#2--migrationbench-datasets)
- [3. Code Migration Evaluation](#3-code-migration-evaluation)
  - [3.1 Docker Mode (Recommended)](#31-docker-mode-recommended)
    + [3.1.1 Setup Docker](#311-setup-docker)
    + [3.1.2 Single Repository Evaluation](#312-single-repository-evaluation)
    + [3.1.3 Batch Evaluation](#313-batch-evaluation)
  - [3.2 Local Mode](#32-local-mode)
    + [3.2.1 Install Java and Maven](#321-install-java-and-maven)
    + [3.2.2 Install MigrationBench](#322-install-migrationbench)
    + [3.2.3 Single Repository Evaluation](#323-single-repository-evaluation)
    + [3.2.4 Batch Evaluation](#324-batch-evaluation)
  - [3.3 Predictions File Format](#33-predictions-file-format)
- [4. 📚 Citation](#4--citation)
1. 📖 Overview
MigrationBench provides an automated and robust framework for evaluating code migration success.
1.1 MigrationBench: Dataset and Evaluation Framework
The name MigrationBench is used for both the dataset and the evaluation framework for code migration success:
1. 🤗 MigrationBench is a large-scale code migration benchmark dataset at the repository level, across multiple programming languages.
   - The current and initial release includes `java 8` repositories with the `maven` build system, as of May 2025.
2. MigrationBench (the current GitHub package) is the evaluation framework to assess code migration success, from java 8 to 17 or any other long-term support (LTS) version.
The evaluation is an *approximation* for functional equivalence by checking the following:
1. The repo is able to build and pass all tests
1. Compiled classes' major versions are consistent with the target java version
   - `52` and `61` for `java 8` and `17` respectively
1. Test methods are invariant after code migration
1. Number of test cases is non-decreasing after code migration
1. The repos' dependency libraries match their *latest* major versions
   - Optional for minimal migration by definition, while
   - Required for maximal migration
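The class-version check in the list above can be illustrated by reading the header of a compiled `.class` file, which stores a magic number followed by the minor and major versions. The following is a minimal sketch, not the framework's own implementation (which may work differently); the function names `class_major_version` and `mismatched_classes` are hypothetical, and the caller is assumed to pass the directory of compiled classes (for Maven, typically `target/classes`).

```python
import struct
from pathlib import Path

# Class-file major versions named in the README: java 8 -> 52, java 17 -> 61.
MAJOR_VERSION = {8: 52, 17: 61}
CLASS_MAGIC = 0xCAFEBABE


def class_major_version(header: bytes) -> int:
    """Return the major version stored in the first 8 bytes of a .class file."""
    if len(header) < 8:
        raise ValueError("class file header is shorter than 8 bytes")
    # Layout (big-endian): u4 magic, u2 minor_version, u2 major_version.
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a Java class file (bad magic number)")
    return major


def mismatched_classes(classes_dir: str, java_version: int = 17) -> list[str]:
    """List compiled classes whose major version differs from the target.

    Hypothetical helper: `classes_dir` is whatever directory holds the
    compiled classes; only versions listed in MAJOR_VERSION are supported.
    """
    expected = MAJOR_VERSION[java_version]
    mismatched = []
    for path in sorted(Path(classes_dir).rglob("*.class")):
        with path.open("rb") as f:
            if class_major_version(f.read(8)) != expected:
                mismatched.append(str(path))
    return mismatched
```

An empty list from `mismatched_classes(...)` would mean every compiled class matches the target version under this sketch's assumptions.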
1.2 JavaMigration: Migration with LLMs
JavaMigration is a separate GitHub package to conduct code migration with LLMs as a baseline solution, and it relies on the current package for the final evaluation.
2. 🤗 MigrationBench Datasets
There are three datasets in 🤗 MigrationBench:
- All repositories included in the datasets are available on GitHub, under the `MIT` or `Apache-2.0` license.
| Index | Dataset | Size | Notes |
|-------|---------|------|-------|
| 1 | 🤗 `AmazonScience/migration-bench-java-full` | 5,102 | Each repo has a test directory or at least one test case |
| 2 | 🤗 `AmazonScience/migration-bench-java-selected` | 300 | A subset of 🤗 `migration-bench-java-full` |
| 3 | 🤗 `AmazonScience/migration-bench-java-utg` | 4,814 | The unit test generation (utg) dataset, disjoint with 🤗 `migration-bench-java-full` |
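The dataset names in the table can be loaded with the Hugging Face `datasets` library. This is a minimal sketch, assuming the `datasets` package is installed and the datasets are publicly accessible; the helper names `dataset_id` and `load_migration_bench` are hypothetical, and no split or column names are assumed because this README excerpt does not state them.

```python
HF_ORG = "AmazonScience"
SUBSETS = ("full", "selected", "utg")  # suffixes taken from the table above


def dataset_id(subset: str) -> str:
    """Build the Hugging Face dataset ID for one of the three subsets."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset {subset!r}, expected one of {SUBSETS}")
    return f"{HF_ORG}/migration-bench-java-{subset}"


def load_migration_bench(subset: str = "selected"):
    """Download and return the requested dataset (needs network access)."""
    # Imported lazily so dataset_id() works without the `datasets` package.
    from datasets import load_dataset

    return load_dataset(dataset_id(subset))
```

Printing the object returned by `load_migration_bench()` shows the available splits and columns, which is a reasonable first step before relying on any particular field.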
3. Code Migration Evaluation
MigrationBench supports two evaluation modes:
1. Docker Mode (Recommended): Runs evaluations in isolated Docker containers. No need to install Java or Maven locally.
2. Local Mode: Runs evaluations directly on your machine. Requires Java 17 and Maven 3.9.6 installed locally.
3.1 Docker Mode (Recommended)
Docker mode provides a consistent evaluation environment without requiring local Java/Maven installation. Each evaluation runs in an isolated container, making it ideal for batch processing and reproducible results.
Benefits:
- ✅ No local Java/Maven installation needed
- ✅ Consistent environment across different machines
- ✅ Parallel execution (multiple containers)
- ✅ Easy setup and onboarding
3.1.1 Setup
1. Install Docker:
Follow the official Docker installation guide:
- macOS: https://docs.docker.com/desktop/install/mac-install/
- Windows: https://docs.docker.com/desktop/install/windows-install/
- Linux: https://docs.docker.com/engine/install/
2. Verify Docker:
docker --version
3. Install MigrationBench:
git clone https://github.com/amazon-science/MigrationBench.git
cd MigrationBench
pip install -r requirements.txt -e .
That's it! The Docker image will be built automatically on first run.
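Before launching evaluations from a script, it can help to confirm that the Docker CLI is reachable. The sketch below wraps the `docker --version` check from the setup steps; the function name `docker_available` is hypothetical and not part of MigrationBench. Note that `docker --version` only confirms the CLI is installed, not that the Docker daemon is running.

```python
import shutil
import subprocess


def docker_available(executable: str = "docker") -> bool:
    """Return True if `<executable> --version` runs successfully."""
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(executable) is None:
        return False
    try:
        result = subprocess.run(
            [executable, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Docker CLI found" if docker_available() else "Docker CLI not found")
```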
3.1.2 Single Repository Evaluation
To run a single repository evaluation in Docker:
GITHUB_URL=https://github.com/0xShamil/java-xid
GIT_DIFF_FILE=/path/to/java-xid.diff
python run_eval.py --github_url $GITHUB_URL --git_diff_filename $GIT_DIFF_FILE --use_docker
Evaluate with a migrated repository directory:
MIGRATED_DIR=/path/to/migrated/repo
python run_eval.py --github_url $GITHUB_URL --migrated_root_dir $MIGRATED_DIR --use_docker
Force rebuild the Docker image:
python run_eval.py --github_url $GITHUB_URL --git_diff_filename $GIT_DIFF_FILE --use_docker --build_docker_image
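The single-repository invocations above can also be assembled from Python, for example when the evaluation is triggered by another script. This is a sketch that uses only the flags shown in this section; the helper `build_eval_command` is hypothetical, and requiring exactly one of the diff file or the migrated directory is an assumption of this sketch rather than documented `run_eval.py` behavior.

```python
import subprocess
from typing import Optional


def build_eval_command(
    github_url: str,
    git_diff_filename: Optional[str] = None,
    migrated_root_dir: Optional[str] = None,
    use_docker: bool = True,
    build_docker_image: bool = False,
) -> list[str]:
    """Assemble the argv list for a single-repository `run_eval.py` call."""
    # Assumption of this sketch: pass either a diff file or a migrated repo.
    if (git_diff_filename is None) == (migrated_root_dir is None):
        raise ValueError("pass exactly one of git_diff_filename or migrated_root_dir")
    cmd = ["python", "run_eval.py", "--github_url", github_url]
    if git_diff_filename is not None:
        cmd += ["--git_diff_filename", git_diff_filename]
    else:
        cmd += ["--migrated_root_dir", migrated_root_dir]
    if use_docker:
        cmd.append("--use_docker")
    if build_docker_image:
        cmd.append("--build_docker_image")
    return cmd


def run_eval(cmd: list[str], repo_dir: str = ".") -> int:
    """Run the command from the MigrationBench checkout; return its exit code."""
    return subprocess.run(cmd, cwd=repo_dir, check=False).returncode
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.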
3.1.3 Batch Evaluation
To…