amazon-science/IterativeListwiseReranking
Language: Python
License: Apache-2.0
Stars: 1
Forks: 0
Open issues: 0
Created: 2026-01-16T14:21:51Z
Pushed: 2026-01-16T15:12:02Z
Default branch: main
Fork: no
Archived: no
README:
Ranking with LLMs
This repository contains an advanced framework for iterative document ranking with Large Language Models (LLMs), including novel approaches for self-refinement and critic-based feedback mechanisms.
Overview
The project builds on the `rank_llm` library, with further adaptations to its codebase. It implements multiple ranking strategies in ranking/:
1. Multi-Pass Ranking: Iterative listwise reranking with LLMs.
2. Iterative Self-Refinement: Multiple ranking iterations in which the model refines its own output, first generating feedback on the given ranking and then producing a new ranking based on that feedback.
3. Critic-Enhanced Ranking: Dual-model approach in which a separate critic model provides structured or unstructured feedback.
4. Ground Truth-Guided Refinement: Critic model enhanced with relevance judgments, used to test the potential of feedback.
Model Support
Built on top of and extending the [RankLLM](rank_llm/README.md) library, supporting:
- Models: RankZephyr, RankVicuna, Qwen, Gemma
- Inference Backend: vLLM
Configuration-Based Workflow
- YAML Configuration: Unified configuration system for all experiments
- Config Modes: Pre-configured setups for different ranking strategies:
  - ranking/config/reranking/: Single-pass ranking configurations
  - ranking/config/self_refinement/: Iterative self-refinement setups
  - ranking/config/refinement_with_critic/: Dual-model critic-based configurations
  - zephyr/: Zephyr model configurations
- Reproducibility: Configuration hashing ensures experiment tracking
Important: To get the hash for a config, use get_config_hash.py. For more information on this function and examples, refer to README_CONFIG_HASH.md.
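To illustrate why hashing a configuration helps with experiment tracking, here is a minimal sketch that derives a stable hash from a config dictionary. It is not the logic of get_config_hash.py (README_CONFIG_HASH.md documents that). The function name `config_hash`, the use of SHA-256 over canonical JSON, and the 12-character truncation are assumptions made for this example.

```python
import hashlib
import json


def config_hash(config: dict, length: int = 12) -> str:
    """Return a short, key-order-independent hash of a config dictionary.

    Hypothetical helper (not part of the repository): the config is
    serialized to canonical JSON (sorted keys, fixed separators) so that
    two files with the same settings in a different order hash identically.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    a = {"reranking": {"model": "zephyr", "batch_size": 4}, "data": {"dataset": "dl19"}}
    b = {"data": {"dataset": "dl19"}, "reranking": {"batch_size": 4, "model": "zephyr"}}
    # Same settings, different key order: the two hashes are equal.
    print(config_hash(a), config_hash(b))
```

With a scheme like this, any change to a setting yields a different hash, so results stored under a hash can be traced back to one exact configuration.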
Dataset Support
- Datasets: Support for Amazon Shopping Queries and a subset of hard queries (called test-data), FutureQueryEval and TREC DL19
- Query Difficulty Analysis: Tools for identifying and analyzing challenging queries in data-analysis/.
Infrastructure
- Distributed Inference: SageMaker_GPU integration for large-scale experiments
- S3 Integration: Automatic data upload/download and result synchronization
- Batch Processing: Submit and evaluate multiple experiments efficiently
Quick Start
Configuration
Copy the example environment file and configure your settings:
cp .env.example .env
# Edit .env with your configuration
Environment Variables:
- RANKING_S3_BUCKET: Your S3 bucket name (optional, defaults to local storage)
- RANKING_S3_BASE_PATH: Base path within S3 bucket (default: "ranking/data")
- RANKING_LOCAL_DATA_PATH: Local data directory (default: "./data")
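The variables and defaults above could be read in Python roughly as follows. This is a sketch under assumptions: `StorageSettings` and `load_storage_settings` are hypothetical names, and the repository's own loading code may differ (for example, it may read the .env file through a helper library).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class StorageSettings:
    """Hypothetical container for the three documented variables."""

    s3_bucket: Optional[str]
    s3_base_path: str
    local_data_path: str

    @property
    def use_s3(self) -> bool:
        # No bucket configured means data stays on local storage.
        return self.s3_bucket is not None


def load_storage_settings(env: Mapping[str, str] = os.environ) -> StorageSettings:
    """Read the settings from a mapping, applying the documented defaults."""
    bucket = env.get("RANKING_S3_BUCKET") or None  # empty string counts as unset
    return StorageSettings(
        s3_bucket=bucket,
        s3_base_path=env.get("RANKING_S3_BASE_PATH", "ranking/data"),
        local_data_path=env.get("RANKING_LOCAL_DATA_PATH", "./data"),
    )


if __name__ == "__main__":
    print(load_storage_settings())
```

Passing the environment as a mapping keeps the function easy to exercise with a plain dictionary instead of the real process environment.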
Installation
Pull and run the Docker image. Log in first:
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin .dkr.ecr.us-east-1.amazonaws.com
If you also want to build the image, log in to this account as well to pull the base image:
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com
Then build:
docker build -t .dkr.ecr.us-east-1.amazonaws.com/tczin/llm-rankers:ranking-sagemaker_gpu72 -f Dockerfile.ranking.sagemaker_gpu --push .
Then run the container:
docker run --gpus all --shm-size 4GB -v path/to/LLMRanker/src/RankingWithLLMs/:/workspace -it .dkr.ecr.us-east-1.amazonaws.com/tczin/llm-rankers:ranking .dkr.ecr.us-east-1.amazonaws.com/tczin/llm-rankers:ranking-sagemaker_gpu72 bash
Then connect to the container from VS Code. If you are using sagemaker_gpu, make sure that the submit_to_sagemaker_gpu.py script uses the right Docker image.
Basic Usage
1. Multi-Pass Reranking
Recreate the main results table by launching the jobs for all config files:
python batch_launch_sagemaker_gpu.py --model qwen gemma --dry-run
python ranking/cli.py --config ranking/config/reranking/dl19-default.yml
2. Iterative Self-Refinement
python ranking/cli.py --config ranking/config/self_refinement/dl19-default.yml
3. Critic-Enhanced Ranking
python ranking/cli.py --config ranking/config/refinement_with_critic/dl19_config.yml
4. Ground Truth-Guided Analysis
python ranking/cli.py --config ranking/config/refinement_with_critic/dl19_config_with_groundtruth.yml
Configuration Structure
Example configuration file:
# Model and ranking parameters
reranking:
  enabled: true
  model: "zephyr"  # or "qwen", "gemma", etc.
  batch_size: 4

# Iterative ranking settings
iterative:
  enabled: true
  iterations: 3
  dual_model: true  # Enable critic model
  critic_model: "gemma"
  convergence_threshold: 0.1

# Dataset configuration
data:
  dataset: "dl19"
  sample: false  # Use full dataset

# Evaluation
evaluation:
  enabled: true

# Prompt configuration
prompts:
  template: "llm_prompt_with_structured_feedback"
Architecture
Ranking Pipeline
Input Documents
      ↓
[First Round Ranker] (e.g., RankZephyr) (Always uses rank_zephyr_template prompt)
      ↓
Initial Ranking
      ↓
[Iterative Refinement Loop]
  ├─→ [Critic Model] (optional)
  │         ↓
  │     Feedback
  │         ↓
  ├─→ [Ranker Model]
  │         ↓
  ├─→ Refined Ranking
      ↓
Final Ranking
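The refinement loop in the diagram can be sketched in a few lines of Python. This is an illustration under assumptions, not the RankRefiner implementation. The callables `ranker` and `critic` stand in for the LLM calls, the function names are hypothetical, and the stopping rule (stop once the fraction of changed positions is at or below `convergence_threshold`) is one possible reading of that config key, not a documented definition.

```python
from typing import Callable, List, Optional, Sequence

Ranking = List[str]  # document ids, best first
Ranker = Callable[[str, Sequence[str], Optional[str]], Ranking]
Critic = Callable[[str, Sequence[str]], str]


def changed_fraction(old: Sequence[str], new: Sequence[str]) -> float:
    """Fraction of positions whose document differs between two rankings."""
    if not old:
        return 0.0
    return sum(a != b for a, b in zip(old, new)) / len(old)


def refine_ranking(
    query: str,
    initial: Sequence[str],
    ranker: Ranker,
    critic: Optional[Critic] = None,
    iterations: int = 3,
    convergence_threshold: float = 0.1,
) -> List[Ranking]:
    """Run the loop and return the ranking history (initial ranking first)."""
    history: List[Ranking] = [list(initial)]
    for _ in range(iterations):
        current = history[-1]
        # Dual-model mode: a separate critic comments on the current ranking.
        # Without a critic, the ranker is called with no feedback (any
        # self-generated feedback would live inside the ranker callable).
        feedback = critic(query, current) if critic is not None else None
        refined = list(ranker(query, current, feedback))
        history.append(refined)
        # Assumed stopping rule: stop once the ranking barely changes.
        if changed_fraction(current, refined) <= convergence_threshold:
            break
    return history


if __name__ == "__main__":
    def toy_critic(query: str, ranking: Sequence[str]) -> str:
        return f"consider moving {ranking[-1]} up"

    def toy_ranker(query: str, ranking: Sequence[str], feedback: Optional[str]) -> Ranking:
        # Stub model: sorts ids alphabetically and ignores the feedback text.
        return sorted(ranking)

    for step, ranking in enumerate(refine_ranking("q", ["d3", "d1", "d2"], toy_ranker, toy_critic)):
        print(step, ranking)
```

Returning the full history rather than only the last ranking makes it possible to inspect how the ranking evolved across iterations.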
Key Components
1. RankRefiner (ranking/ranking/rank_refiner.py)
- Core iterative ranking engine
- Support for single and dual-model modes (the latter when using separate critic model)
- Optional ranking history tracking
2. Prompt Management (ranking/ranking/prompt_manager.py)
- Dynamic prompt loading and template management
- Support for different prompts per iteration
- Parameterized prompt generation
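As a rough picture of per-iteration, parameterized prompts, the sketch below selects a template by iteration index and fills in its parameters. The template texts, the names `TEMPLATES`, `select_template` and `build_prompt`, and the "reuse the last entry" rule are all assumptions for illustration. The actual behaviour is defined in ranking/ranking/prompt_manager.py and the repository's prompt files.

```python
from string import Template
from typing import Dict, Sequence

# Hypothetical templates; the real ones live in the repository's prompt files.
TEMPLATES: Dict[str, Template] = {
    "initial": Template("Rank the $num passages for the query: $query"),
    "refine": Template(
        "Query: $query\nPrevious ranking: $ranking\nFeedback: $feedback\n"
        "Output an improved ranking of the $num passages."
    ),
}


def select_template(iteration: int, schedule: Sequence[str]) -> Template:
    """Pick the template for an iteration; the last entry is reused afterwards."""
    name = schedule[min(iteration, len(schedule) - 1)]
    return TEMPLATES[name]


def build_prompt(iteration: int, schedule: Sequence[str], **params: object) -> str:
    """Fill the selected template; a missing parameter raises KeyError."""
    return select_template(iteration, schedule).substitute(**params)


if __name__ == "__main__":
    schedule = ["initial", "refine"]
    print(build_prompt(0, schedule, query="best hiking boots", num=20))
    print(build_prompt(
        2, schedule, query="best hiking boots", num=20,
        ranking="[3] > [1] > [2]", feedback="Passage 2 is off-topic.",
    ))
```

Using `substitute` rather than `safe_substitute` means a forgotten parameter fails loudly instead of leaving a literal `$feedback` in the prompt.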
3. Configuration System (ranking/config/)
- YAML-based experiment configuration
- Configuration validation and hashing
- Template inheritance and overrides
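Template inheritance with overrides is commonly implemented as a recursive merge of a base config with an experiment-specific one. The sketch below shows that idea on plain dictionaries. `merge_config` is a hypothetical helper, and whether the repository merges configs this way is an assumption.

```python
import copy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config where values in `override` replace those in `base`.

    Nested dictionaries are merged key by key; any other value (scalars,
    lists) is replaced wholesale. Neither input is modified.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    base = {
        "reranking": {"enabled": True, "model": "zephyr", "batch_size": 4},
        "iterative": {"enabled": False},
    }
    experiment = {
        "reranking": {"model": "qwen"},
        "iterative": {"enabled": True, "iterations": 3},
    }
    print(merge_config(base, experiment))
```

Here the experiment file only states what differs from the base, so `batch_size` is inherited while `model` and the iterative settings are overridden.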
4. Data Processing (ranking/data_processing/)
- Dataset loaders for multiple benchmarks
- Preprocessing pipelines
- S3 integration for data management
Advanced Features
Structured Critic Model Feedback
For Structured Critic Model Feedback use prompt: llm_prompt_with_structured_feedback and critic_prompt: structured_critic_analysis. Then the…
Notability
notability 3.0/10. Low-star research repo from Amazon Science.