Repo: Amazon (Nova), published May 14, 2025

amazon-science/MigrationBench

Python

Language: Python

License: Apache-2.0

Stars: 14

Forks: 6

Open issues: 4

Created: 2025-05-14T22:11:19Z

Pushed: 2026-06-10T00:57:35Z

Default branch: main

Fork: no

Archived: no

README:

MigrationBench

  • [1. 📖 Overview](#1--overview)
  • [1.1 MigrationBench: Dataset and Evaluation Framework](#11-migrationbench-dataset-and-evaluation-framework)
  • [1.2 JavaMigration: Migration with LLMs](#12-javamigration-migration-with-llms)
  • [2. 🤗 MigrationBench Datasets](#2--migrationbench-datasets)
  • [3. Code Migration Evaluation](#3-code-migration-evaluation)
  • [3.1 Docker Mode (Recommended)](#31-docker-mode-recommended)
    + [3.1.1 Setup Docker](#311-setup-docker)
    + [3.1.2 Single Repository Evaluation](#312-single-repository-evaluation)
    + [3.1.3 Batch Evaluation](#313-batch-evaluation)
  • [3.2 Local Mode](#32-local-mode)
    + [3.2.1 Install Java and Maven](#321-install-java-and-maven)
    + [3.2.2 Install MigrationBench](#322-install-migrationbench)
    + [3.2.3 Single Repository Evaluation](#323-single-repository-evaluation)
    + [3.2.4 Batch Evaluation](#324-batch-evaluation)
  • [3.3 Predictions File Format](#33-predictions-file-format)
  • [4. 📚 Citation](#4--citation)

1. 📖 Overview

MigrationBench provides an automated and robust framework for evaluating code migration success.

1.1 MigrationBench: Dataset and Evaluation Framework

The name MigrationBench is used for both the dataset and the evaluation framework for code migration success:

1. 🤗 MigrationBench is a large-scale code migration benchmark dataset at the repository level, across multiple programming languages.

  • The initial (and current) release, as of May 2025, includes Java 8 repositories that use the Maven build system.

2. MigrationBench (the current GitHub package) is the evaluation framework to assess code migration success, from Java 8 to 17 or any other long-term support (LTS) version.

The evaluation *approximates* functional equivalence by checking the following:

1. The repo builds and passes all tests.

2. Compiled classes' major versions are consistent with the target Java version.

  • 52 and 61 for Java 8 and 17, respectively

3. Test methods are invariant after code migration.

4. The number of test cases is non-decreasing after code migration.

5. The repo's dependency libraries match their *latest* major versions.

  • Optional for minimal migration, by definition
  • Required for maximal migration
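
The class-version check can be illustrated with a minimal sketch. It is not the framework's own implementation; it only relies on the standard JVM class file layout, where a file starts with the 4-byte magic number `0xCAFEBABE`, followed by a 2-byte minor version and a 2-byte major version, all big-endian. The helper names `class_major_version` and `expected_major_version` are hypothetical and chosen for illustration.

```python
import struct

JAVA_CLASS_MAGIC = 0xCAFEBABE  # first four bytes of every JVM class file


def class_major_version(data: bytes) -> int:
    """Return the major version stored in a class file header (hypothetical helper)."""
    if len(data) < 8:
        raise ValueError("class file header needs at least 8 bytes")
    # Header layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != JAVA_CLASS_MAGIC:
        raise ValueError("not a JVM class file")
    return major


def expected_major_version(java_version: int) -> int:
    """Map a Java release (5 or later) to its class file major version, e.g. 8 -> 52."""
    return java_version + 44


# Synthetic 8-byte header standing in for a class compiled for Java 17.
header = struct.pack(">IHH", JAVA_CLASS_MAGIC, 0, 61)
print(class_major_version(header), expected_major_version(17))  # prints: 61 61
```

For a real file, the first eight bytes can be read with `open(path, "rb").read(8)` and passed to `class_major_version`.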

1.2 JavaMigration: Migration with LLMs

JavaMigration is a separate GitHub package that performs code migration with LLMs as a baseline solution. It relies on the current package for the final evaluation.

2. 🤗 MigrationBench Datasets

There are three datasets in 🤗 MigrationBench:

  • All repositories included in the datasets are available on GitHub, under the MIT or Apache-2.0 license.

| Index | Dataset | Size | Notes |
|-------|---------|------|-------|
| 1 | 🤗 `AmazonScience/migration-bench-java-full` | 5,102 | Each repo has a test directory or at least one test case |
| 2 | 🤗 `AmazonScience/migration-bench-java-selected` | 300 | A subset of 🤗 `migration-bench-java-full` |
| 3 | 🤗 `AmazonScience/migration-bench-java-utg` | 4,814 | The unit test generation (utg) dataset, disjoint with 🤗 `migration-bench-java-full` |
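
As a rough sketch of how these datasets might be loaded, the snippet below uses the third-party Hugging Face `datasets` library and its `load_dataset` function. The dataset IDs and sizes are copied from the table above; the short keys (`full`, `selected`, `utg`) and the helper `load_migration_bench` are hypothetical, and the split and column names of the returned object are not specified here, so inspect them before relying on them.

```python
# Dataset IDs and sizes copied from the table above; the short keys are
# hypothetical labels used only in this sketch.
MIGRATION_BENCH_DATASETS = {
    "full": ("AmazonScience/migration-bench-java-full", 5102),
    "selected": ("AmazonScience/migration-bench-java-selected", 300),
    "utg": ("AmazonScience/migration-bench-java-utg", 4814),
}


def load_migration_bench(name: str = "selected"):
    """Download one dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the module can be used without the dependency installed.
    from datasets import load_dataset  # third-party: pip install datasets

    dataset_id, _expected_size = MIGRATION_BENCH_DATASETS[name]
    # Returns whatever splits the dataset defines; no split name is assumed here.
    return load_dataset(dataset_id)
```

Calling `load_migration_bench("selected")` and printing the result shows the available splits and their row counts, which can be compared against the sizes listed in the table.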

3. Code Migration Evaluation

MigrationBench supports two evaluation modes:

1. Docker Mode (Recommended): Runs evaluations in isolated Docker containers. No need to install Java or Maven locally.

2. Local Mode: Runs evaluations directly on your machine. Requires Java 17 and Maven 3.9.6 installed locally.

3.1 Docker Mode (Recommended)

Docker mode provides a consistent evaluation environment without requiring local Java/Maven installation. Each evaluation runs in an isolated container, making it ideal for batch processing and reproducible results.

Benefits:

  • ✅ No local Java/Maven installation needed
  • ✅ Consistent environment across different machines
  • ✅ Parallel execution (multiple containers)
  • ✅ Easy setup and onboarding

3.1.1 Setup

1. Install Docker:

Follow the official Docker installation guide:

  • macOS: https://docs.docker.com/desktop/install/mac-install/
  • Windows: https://docs.docker.com/desktop/install/windows-install/
  • Linux: https://docs.docker.com/engine/install/

2. Verify Docker:

docker --version

3. Install MigrationBench:

git clone https://github.com/amazon-science/MigrationBench.git
cd MigrationBench

pip install -r requirements.txt -e .

That's it! The Docker image will be built automatically on first run.
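
The Docker verification step can also be scripted, for example as a preflight check before a long batch run. The sketch below is an assumption about how one might do this with the Python standard library; the helper name `docker_version` is hypothetical and is not part of MigrationBench.

```python
import shutil
import subprocess
from typing import Optional


def docker_version() -> Optional[str]:
    """Return the output of `docker --version`, or None if Docker is unavailable."""
    docker = shutil.which("docker")  # locate the executable on PATH
    if docker is None:
        return None
    try:
        result = subprocess.run(
            [docker, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
            check=True,  # a non-zero exit status raises CalledProcessError
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip()


version = docker_version()
print(version if version else "Docker not found; install it before using --use_docker")
```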

3.1.2 Single Repository Evaluation

To run a single repository evaluation in Docker:

GITHUB_URL=https://github.com/0xShamil/java-xid
GIT_DIFF_FILE=/path/to/java-xid.diff

python run_eval.py --github_url $GITHUB_URL --git_diff_filename $GIT_DIFF_FILE --use_docker

Evaluate with a migrated repository directory:

MIGRATED_DIR=/path/to/migrated/repo
python run_eval.py --github_url $GITHUB_URL --migrated_root_dir $MIGRATED_DIR --use_docker

Force a rebuild of the Docker image:

python run_eval.py --github_url $GITHUB_URL --git_diff_filename $GIT_DIFF_FILE --use_docker --build_docker_image
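
When the commands above are driven from a script, it can help to assemble the argument list in one place. The following sketch uses only the flags shown above (`--github_url`, `--git_diff_filename`, `--migrated_root_dir`, `--use_docker`, `--build_docker_image`). The helper `build_eval_command` is hypothetical, and the rule that exactly one of the diff file or the migrated directory is passed is an assumption of this sketch, not documented behavior of `run_eval.py`.

```python
import sys
from typing import List, Optional


def build_eval_command(
    github_url: str,
    git_diff_filename: Optional[str] = None,
    migrated_root_dir: Optional[str] = None,
    use_docker: bool = True,
    build_docker_image: bool = False,
) -> List[str]:
    """Assemble a run_eval.py command line (hypothetical helper)."""
    # Assumption of this sketch: supply either a diff file or a migrated directory.
    if (git_diff_filename is None) == (migrated_root_dir is None):
        raise ValueError("pass exactly one of git_diff_filename or migrated_root_dir")
    cmd = [sys.executable, "run_eval.py", "--github_url", github_url]
    if git_diff_filename is not None:
        cmd += ["--git_diff_filename", git_diff_filename]
    else:
        cmd += ["--migrated_root_dir", migrated_root_dir]
    if use_docker:
        cmd.append("--use_docker")
    if build_docker_image:
        cmd.append("--build_docker_image")
    return cmd


cmd = build_eval_command(
    "https://github.com/0xShamil/java-xid",
    git_diff_filename="/path/to/java-xid.diff",
)
print(" ".join(cmd[1:]))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the root of the cloned MigrationBench repository, where `run_eval.py` lives.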

3.1.3 Batch Evaluation

To…

