Repo · Amazon (Nova) · published Jan 16, 2026

amazon-science/IterativeListwiseReranking

Python


Language: Python

License: Apache-2.0

Stars: 1

Forks: 0

Open issues: 0

Created: 2026-01-16T14:21:51Z

Pushed: 2026-01-16T15:12:02Z

Default branch: main

Fork: no

Archived: no

README:

Ranking with LLMs

This repository contains an advanced framework for iterative document ranking with Large Language Models (LLMs), including novel approaches for self-refinement and critic-based feedback mechanisms.

Overview

The project builds on the `rank_llm` library, with further adaptations to its codebase. It implements multiple ranking strategies in ranking/:

1. Multi-Pass Ranking: Iterative listwise reranking with LLMs.

2. Iterative Self-Refinement: Multiple ranking iterations in which the model refines its own output by first generating feedback on the given ranking and then producing a new ranking based on that feedback.

3. Critic-Enhanced Ranking: Dual-model approach in which a separate critic model provides structured or unstructured feedback.

4. Ground Truth-Guided Refinement: Critic model enhanced with relevance judgments, used to test the potential of feedback.
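Listwise rerankers in the RankLLM family usually cannot fit a full candidate list into one prompt, so a common technique is to slide a fixed-size window from the bottom of the list to the top. The sketch below illustrates that idea repeated over several passes, which is one plausible reading of "multi-pass". The names `sliding_window_pass`, `multi_pass_rerank`, `rank_window`, `window_size`, `stride`, and `passes` are illustrative assumptions, not this repository's API, and a toy sorter stands in for the LLM.

```python
from typing import Callable, List, Sequence

def sliding_window_pass(
    docs: Sequence[int],
    rank_window: Callable[[List[int]], List[int]],
    window_size: int = 4,
    stride: int = 2,
) -> List[int]:
    """Rerank overlapping windows, moving from the end of the list to the front."""
    docs = list(docs)
    end = len(docs)
    while end > 0:
        start = max(0, end - window_size)
        # The (hypothetical) ranker only ever sees `window_size` candidates at once.
        docs[start:end] = rank_window(docs[start:end])
        if start == 0:
            break
        end -= stride
    return docs

def multi_pass_rerank(docs, rank_window, passes: int = 2, **window_args) -> List[int]:
    """Apply the sliding-window pass several times in a row."""
    for _ in range(passes):
        docs = sliding_window_pass(docs, rank_window, **window_args)
    return docs

def toy_rank_window(window: List[int]) -> List[int]:
    # Stand-in for an LLM call: a larger number means a more relevant document.
    return sorted(window, reverse=True)

one_pass = sliding_window_pass([1, 5, 3, 9, 2, 8], toy_rank_window)
two_passes = multi_pass_rerank([1, 5, 3, 9, 2, 8], toy_rank_window, passes=2)
```

In this toy run a single pass already brings the two strongest candidates to the top, while the tail of the list is only fully ordered after the second pass.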

Model Support

Built on top of and extending the [RankLLM](rank_llm/README.md) library, supporting:

  • Models: RankZephyr, RankVicuna, Qwen, Gemma
  • Inference Backend: vLLM

Configuration-Based Workflow

  • YAML Configuration: Unified configuration system for all experiments
  • Config Modes: Pre-configured setups for different ranking strategies:
      • ranking/config/reranking/: Single-pass ranking configurations
      • ranking/config/self_refinement/: Iterative self-refinement setups
      • ranking/config/refinement_with_critic/: Dual-model critic-based configurations
      • zephyr/: Zephyr model configurations
  • Reproducibility: Configuration hashing ensures experiment tracking

Important: To get the hash for a config, use get_config_hash.py. For more information on this function and examples, refer to README_CONFIG_HASH.md.
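To illustrate the general idea of configuration hashing, here is a minimal sketch that serializes a config dictionary canonically and hashes it. It is not the logic of get_config_hash.py, which is not shown in this excerpt; the function name `config_hash`, the SHA-256 choice, and the 12-character truncation are all assumptions made for illustration.

```python
import hashlib
import json

def config_hash(config: dict, length: int = 12) -> str:
    """Return a short, order-independent fingerprint of a config dictionary."""
    # Sorting keys makes the hash independent of key order in the YAML file.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]

hash_a = config_hash({"reranking": {"model": "zephyr", "batch_size": 4}})
hash_b = config_hash({"reranking": {"batch_size": 4, "model": "zephyr"}})
hash_c = config_hash({"reranking": {"model": "qwen", "batch_size": 4}})
```

Two configs that differ only in key order get the same fingerprint, while any change to a value yields a different one, which is what makes such a hash usable as an experiment identifier.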

Dataset Support

  • Datasets: Amazon Shopping Queries and a subset of hard queries (called test-data), FutureQueryEval, and TREC DL19
  • Query Difficulty Analysis: Tools for identifying and analyzing challenging queries in data-analysis/.

Infrastructure

  • Distributed Inference: SageMaker_GPU integration for large-scale experiments
  • S3 Integration: Automatic data upload/download and result synchronization
  • Batch Processing: Submit and evaluate multiple experiments efficiently

Quick Start

Configuration

Copy the example environment file and configure your settings:

cp .env.example .env
# Edit .env with your configuration

Environment Variables:

  • RANKING_S3_BUCKET: Your S3 bucket name (optional, defaults to local storage)
  • RANKING_S3_BASE_PATH: Base path within S3 bucket (default: "ranking/data")
  • RANKING_LOCAL_DATA_PATH: Local data directory (default: "./data")
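The defaults listed above can be resolved as in the following sketch. It assumes the variables have already been exported into the process environment (for example by whatever loads `.env`); the `StorageSettings` class and `load_storage_settings` function are hypothetical names and not part of the repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class StorageSettings:
    s3_bucket: Optional[str]
    s3_base_path: str
    local_data_path: str

    @property
    def use_s3(self) -> bool:
        # Without a bucket name, fall back to local storage.
        return self.s3_bucket is not None

def load_storage_settings(env: Mapping[str, str] = os.environ) -> StorageSettings:
    """Read the three documented variables, applying the documented defaults."""
    return StorageSettings(
        s3_bucket=env.get("RANKING_S3_BUCKET") or None,  # empty string counts as unset
        s3_base_path=env.get("RANKING_S3_BASE_PATH", "ranking/data"),
        local_data_path=env.get("RANKING_LOCAL_DATA_PATH", "./data"),
    )

local_only = load_storage_settings({})
with_bucket = load_storage_settings({"RANKING_S3_BUCKET": "my-example-bucket"})
```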

Installation

Pull and run the Docker image. Log in first:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin .dkr.ecr.us-east-1.amazonaws.com

If you also want to build the image, log in to this account as well so that you can pull the base image:

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com

Then build:

docker build -t .dkr.ecr.us-east-1.amazonaws.com/tczin/llm-rankers:ranking-sagemaker_gpu72 -f Dockerfile.ranking.sagemaker_gpu --push .

Then run the container:

docker run --gpus all --shm-size 4GB -v path/to/LLMRanker/src/RankingWithLLMs/:/workspace -it .dkr.ecr.us-east-1.amazonaws.com/tczin/llm-rankers:ranking .dkr.ecr.us-east-1.amazonaws.com/tczin/llm-rankers:ranking-sagemaker_gpu72 bash

Then connect to the container from VSCode. If you are using sagemaker_gpu, make sure that the submit_to_sagemaker_gpu.py script uses the right Docker image.

Basic Usage

1. Multi-Pass Reranking

Recreate the main results table by launching the jobs for all config files:

python batch_launch_sagemaker_gpu.py --model qwen gemma --dry-run
python ranking/cli.py --config ranking/config/reranking/dl19-default.yml

2. Iterative Self-Refinement

python ranking/cli.py --config ranking/config/self_refinement/dl19-default.yml

3. Critic-Enhanced Ranking

python ranking/cli.py --config ranking/config/refinement_with_critic/dl19_config.yml

4. Ground Truth-Guided Analysis

python ranking/cli.py --config ranking/config/refinement_with_critic/dl19_config_with_groundtruth.yml

Configuration Structure

Example configuration file:

# Model and ranking parameters
reranking:
  enabled: true
  model: "zephyr"  # or "qwen", "gemma", etc.
  batch_size: 4

# Iterative ranking settings
iterative:
  enabled: true
  iterations: 3
  dual_model: true  # Enable critic model
  critic_model: "gemma"
  convergence_threshold: 0.1

# Dataset configuration
data:
  dataset: "dl19"
  sample: false  # Use full dataset

# Evaluation
evaluation:
  enabled: true

# Prompt configuration
prompts:
  template: "llm_prompt_with_structured_feedback"
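The excerpt does not say how `convergence_threshold` is measured. As one plausible interpretation only, the sketch below stops iterating once the fraction of positions that changed between two successive rankings is at or below the threshold. Both `changed_fraction` and `has_converged` are hypothetical helpers, and the repository may use a different metric.

```python
from typing import Sequence

def changed_fraction(previous: Sequence[str], current: Sequence[str]) -> float:
    """Fraction of rank positions whose document differs between two rankings."""
    if len(previous) != len(current):
        raise ValueError("rankings must have the same length")
    if not previous:
        return 0.0
    moved = sum(1 for old, new in zip(previous, current) if old != new)
    return moved / len(previous)

def has_converged(
    previous: Sequence[str], current: Sequence[str], threshold: float = 0.1
) -> bool:
    # Assumed semantics: a small enough change means further iterations are skipped.
    return changed_fraction(previous, current) <= threshold

swap_change = changed_fraction(["d1", "d2", "d3", "d4"], ["d1", "d2", "d4", "d3"])
stable = has_converged(["d1", "d2", "d3"], ["d1", "d2", "d3"])
```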

Architecture

Ranking Pipeline

Input Documents
↓
[First Round Ranker] (e.g., RankZephyr) (Always uses rank_zephyr_template prompt)
↓
Initial Ranking
↓
[Iterative Refinement Loop]
├─→ [Critic Model] (optional)
│ ↓
│ Feedback
│ ↓
├─→ [Ranker Model]
│ ↓
├─→ Refined Ranking
↓
Final Ranking
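The control flow of the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code in rank_refiner.py: `refine_ranking` and the three toy callables are hypothetical, and simple list operations stand in for model calls.

```python
from typing import Callable, List, Optional, Tuple

Ranking = List[str]

def refine_ranking(
    query: str,
    docs: Ranking,
    first_round_ranker: Callable[[str, Ranking], Ranking],
    ranker: Callable[[str, Ranking, Optional[int]], Ranking],
    critic: Optional[Callable[[str, Ranking], Optional[int]]] = None,
    iterations: int = 3,
) -> Tuple[Ranking, List[Ranking]]:
    """Initial ranking, then `iterations` rounds of (optional) feedback and reranking."""
    ranking = first_round_ranker(query, docs)
    history = [ranking]
    for _ in range(iterations):
        # Without a critic, the ranker receives no external feedback.
        feedback = critic(query, ranking) if critic is not None else None
        ranking = ranker(query, ranking, feedback)
        history.append(ranking)
    return ranking, history

def toy_first_round(query: str, docs: Ranking) -> Ranking:
    return list(docs)

def toy_critic(query: str, ranking: Ranking) -> Optional[int]:
    # Toy feedback: index of the first adjacent pair that is out of alphabetical order.
    for i in range(len(ranking) - 1):
        if ranking[i] > ranking[i + 1]:
            return i
    return None

def toy_ranker(query: str, ranking: Ranking, feedback: Optional[int]) -> Ranking:
    ranking = list(ranking)
    if feedback is not None:
        ranking[feedback], ranking[feedback + 1] = ranking[feedback + 1], ranking[feedback]
    return ranking

final, history = refine_ranking(
    "example query", ["b", "c", "a"], toy_first_round, toy_ranker, toy_critic
)
```

Keeping every intermediate ranking in `history` makes it possible to inspect how each round of feedback changed the order.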

Key Components

1. RankRefiner (ranking/ranking/rank_refiner.py)

  • Core iterative ranking engine
  • Supports single-model and dual-model modes (the latter uses a separate critic model)
  • Optional ranking history tracking

2. Prompt Management (ranking/ranking/prompt_manager.py)

  • Dynamic prompt loading and template management
  • Support for different prompts per iteration
  • Parameterized prompt generation
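As a rough illustration of per-iteration, parameterized prompts, the sketch below picks a template by iteration index and fills in its parameters with the standard library's `string.Template`. The template texts, the `PROMPTS` and `ITERATION_PROMPTS` tables, and `build_prompt` are invented for this example and do not reflect the contents of prompt_manager.py.

```python
from string import Template

# Hypothetical templates; the repository's real prompts are not shown in this excerpt.
PROMPTS = {
    "initial": Template("Rank the $num passages for the query: $query"),
    "refine": Template(
        "Feedback on the previous ranking: $feedback\n"
        "Re-rank the $num passages for the query: $query"
    ),
}

# Prompt name per iteration; later iterations reuse the last entry.
ITERATION_PROMPTS = ["initial", "refine"]

def build_prompt(iteration: int, **params) -> str:
    """Select the template for this iteration and substitute its parameters."""
    name = ITERATION_PROMPTS[min(iteration, len(ITERATION_PROMPTS) - 1)]
    # substitute() raises KeyError if a placeholder has no matching parameter.
    return PROMPTS[name].substitute(**params)

first = build_prompt(0, num=3, query="best hiking boots")
later = build_prompt(2, num=3, query="best hiking boots", feedback="move passage 2 up")
```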

3. Configuration System (ranking/config/)

  • YAML-based experiment configuration
  • Configuration validation and hashing
  • Template inheritance and overrides
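Template inheritance with overrides is commonly implemented as a recursive merge of nested dictionaries, where an experiment config only states what differs from a base template. The following sketch shows that pattern; `merge_config` is a hypothetical helper, and the repository's actual inheritance rules may differ.

```python
import copy

def merge_config(base: dict, override: dict) -> dict:
    """Return a new config where values in `override` win over those in `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

base_config = {
    "reranking": {"enabled": True, "model": "zephyr", "batch_size": 4},
    "iterative": {"enabled": False},
}
experiment_override = {
    "reranking": {"model": "qwen"},
    "iterative": {"enabled": True, "iterations": 3},
}
experiment_config = merge_config(base_config, experiment_override)
```

Because the merge works on copies, the base template stays unchanged and can be reused for other experiments.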

4. Data Processing (ranking/data_processing/)

  • Dataset loaders for multiple benchmarks
  • Preprocessing pipelines
  • S3 integration for data management

Advanced Features

Structured Critic Model Feedback

For Structured Critic Model Feedback use prompt: llm_prompt_with_structured_feedback and critic_prompt: structured_critic_analysis. Then the…

Excerpt shown — open the source for the full document.

Notability

notability 3.0/10

Low-star research repo from Amazon Science.