meta-llama/prompt-ops
Python
Description: An open-source tool for LLM prompt optimization.
Language: Python
License: MIT
Stars: 820
Forks: 127
Open issues: 21
Created: 2025-03-14T17:59:40Z
Pushed: 2026-04-21T18:28:47Z
Default branch: main
Fork: no
Archived: no
README: Prompt Ops
🎉 New: Prompt Duel Optimizer (PDO) Published!
We've published a new paper on PDO (Prompt Duel Optimizer) - an efficient label-free prompt optimization method using dueling bandits and Thompson sampling. PDO achieves state-of-the-art results on BIG-bench Hard and MS MARCO benchmarks.
📄 Read the paper: LLM Prompt Duel Optimizer: Efficient Label-Free Prompt Optimization (arXiv:2510.13907)
🧪 Try it yourself: Check out the [Web of Lies use case](use-cases/web-of-lies-pdo/) demonstrating PDO on logical reasoning tasks
⭐ Star this repo and follow along - we'll be publishing a detailed tutorial notebook soon!
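To give a feel for the "dueling bandits and Thompson sampling" idea mentioned above, here is a minimal sketch. It is not the PDO algorithm from the paper. It is a generic Thompson-sampling duel selector, and the names `pick_duel`, `run_duels`, and `judge` are illustrative assumptions. The `judge` callback stands in for whatever preference signal compares two candidate prompts.

```python
import random


def pick_duel(wins, rng):
    """Pick two distinct candidates to compare, using Thompson sampling.

    wins[i][j] is how many times candidate i has beaten candidate j.
    """
    n = len(wins)
    # Sample a plausible "i beats j" probability for each pair from a
    # Beta posterior (uniform Beta(1, 1) prior).
    sampled = [[0.5] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = rng.betavariate(wins[i][j] + 1, wins[j][i] + 1)
            sampled[i][j] = p
            sampled[j][i] = 1.0 - p
    # First candidate: beats the most opponents under the sampled probabilities.
    first = max(range(n), key=lambda i: sum(sampled[i][j] > 0.5 for j in range(n)))
    # Second candidate: the opponent most likely to beat `first` in this sample.
    second = max((j for j in range(n) if j != first), key=lambda j: sampled[j][first])
    return first, second


def run_duels(num_candidates, judge, rounds=200, seed=0):
    """Run pairwise duels; judge(a, b) must return the index of the winner."""
    rng = random.Random(seed)
    wins = [[0] * num_candidates for _ in range(num_candidates)]
    for _ in range(rounds):
        a, b = pick_duel(wins, rng)
        winner = judge(a, b)
        loser = b if winner == a else a
        wins[winner][loser] += 1

    def mean_win_rate(i):
        # Smoothed empirical win rate of candidate i, averaged over opponents.
        rates = [(wins[i][j] + 1) / (wins[i][j] + wins[j][i] + 2)
                 for j in range(num_candidates) if j != i]
        return sum(rates) / len(rates)

    best = max(range(num_candidates), key=mean_win_rate)
    return best, wins
```

In a label-free setting, `judge` would typically ask a model which of two prompts produced the better output. In this sketch any callable that returns the winning index works, and at least two candidates are required.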
---
What is prompt-ops?
prompt-ops is a Python package that automatically optimizes prompts for Llama models. It transforms prompts that work well with other LLMs into prompts that are optimized for Llama models, improving performance and reliability.
Key Benefits:
- No More Trial and Error: Stop manually tweaking prompts to get better results
- Fast Optimization: Get model-optimized prompts in minutes with template-based optimization
- Data-Driven Improvements: Use your own examples to create prompts that work for your specific use case
- Measurable Results: Evaluate prompt performance with customizable metrics
Requirements
To get started with prompt-ops, you'll need:
- Existing System Prompt: Your existing system prompt that you want to optimize
- Existing Query-Response Dataset: A JSON file containing query-response pairs (as few as 50 examples) for evaluation and optimization (see [prepare your dataset](#preparing-your-data) below)
- Configuration File: A YAML configuration file (config.yaml) specifying model hyperparameters and optimization details (see [example configuration](configs/facility-simple.yaml))
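As a rough illustration of the dataset requirement above, the sketch below sanity-checks and writes a JSON file of query-response pairs. The field names `question` and `answer` and the helper names are assumptions for illustration only. Use whichever field names your configuration file maps to.

```python
import json
from pathlib import Path

MIN_EXAMPLES = 50  # the requirements above mention "as few as 50 examples"


def validate_dataset(records, input_key="question", output_key="answer"):
    """Return a list of human-readable problems found in the records."""
    problems = []
    if len(records) < MIN_EXAMPLES:
        problems.append(f"only {len(records)} examples; expected at least {MIN_EXAMPLES}")
    for idx, record in enumerate(records):
        for key in (input_key, output_key):
            value = record.get(key)
            # Each pair needs a non-empty string for both the query and the response.
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {idx}: missing or empty '{key}'")
    return problems


def save_dataset(records, path):
    """Write the records as a UTF-8 JSON array."""
    Path(path).write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

Running `validate_dataset` before `save_dataset` catches empty or missing fields early, before any optimization time is spent on a malformed file.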
How It Works
┌──────────────────────────┐  ┌──────────────────────────┐  ┌────────────────────┐
│  Existing System Prompt  │  │  set(query, responses)   │  │ YAML Configuration │
└────────────┬─────────────┘  └─────────────┬────────────┘  └───────────┬────────┘
             │                              │                           │
             │                              │                           │
             ▼                              ▼                           ▼
        ┌────────────────────────────────────────────────────────────────────┐
        │                         prompt-ops migrate                         │
        └────────────────────────────────────────────────────────────────────┘
                                           │
                                           │
                                           ▼
                                ┌──────────────────────┐
                                │   Optimized Prompt   │
                                └──────────────────────┘
Simple Workflow
1. Start with your existing system prompt: Take your existing system prompt that works with other LLMs (see [example prompt](use-cases/facility-support-analyzer/facility_prompt_sys.txt))
2. [Prepare your dataset](#preparing-your-data): Create a JSON file with query-response pairs for evaluation and optimization
3. Configure optimization: Set up a simple YAML file with your dataset and preferences (see [example configuration](configs/facility-simple.yaml))
4. [Run optimization](#step-4-run-optimization): Execute a single command to transform your prompt
5. [Get results](#prompt-transformation-example): Receive a model-optimized prompt with performance metrics
Real-world Results
HotpotQA
These results were measured on the HotpotQA multi-hop reasoning benchmark, which tests a model's ability to answer complex questions requiring information from multiple sources. Our optimized prompts showed substantial improvements over baseline prompts across different model sizes.
Quick Start (5 minutes)
Step 1: Installation
> Note: We recommend installing from source as we are currently transitioning package names on PyPI. This ensures you get the latest stable version without any naming conflicts.
# Create a virtual environment
conda create -n prompt-ops python=3.10
conda activate prompt-ops

# Recommended: Install from source
git clone https://github.com/meta-llama/prompt-ops.git
cd prompt-ops
pip install -e .

# Alternative: Install from PyPI (may have naming transition issues, still on version 0.0.7)
# pip install llama-prompt-ops
Step 2: Create a sample project
This will create a directory called my-project with a sample configuration and dataset in the current folder.
prompt-ops create my-project
cd my-project
Step 3: Set Up Your API Key
Add your API key to the .env file:
OPENROUTER_API_KEY=your_key_here
prompt-ops uses LiteLLM as a unified API client. LiteLLM automatically detects the provider from your model name (e.g., openrouter/model, groq/model) and looks for the corresponding provider-specific environment variable (OPENROUTER_API_KEY, GROQ_API_KEY, etc.). For more inference provider options, see [Inference Providers](./docs/inference_providers.md).
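The provider-prefix convention described above can be sketched as follows. This is not LiteLLM's actual lookup logic, which may have provider-specific exceptions. It only mirrors the `<PROVIDER>_API_KEY` pattern implied by the two examples in the text, and the helper names `provider_env_var` and `check_api_key` are hypothetical.

```python
import os


def provider_env_var(model_name):
    """Derive the conventional API-key variable from a 'provider/model' name.

    Assumption: the variable is the upper-cased provider prefix plus '_API_KEY'.
    """
    provider, sep, _ = model_name.partition("/")
    if not sep or not provider:
        raise ValueError(f"expected 'provider/model', got {model_name!r}")
    return f"{provider.upper()}_API_KEY"


def check_api_key(model_name, environ=os.environ):
    """Return (variable name, whether it is set to a non-empty value)."""
    var = provider_env_var(model_name)
    return var, bool(environ.get(var))
```

A check like this before starting a run gives a clearer error than a failed API call partway through optimization.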
Step 4: Run Optimization
The optimization will take about 5 minutes.
prompt-ops migrate # defaults to config.yaml if --config not specified
Done! The optimized prompt will be saved to the results directory with performance metrics comparing the original and optimized versions.
For more detail on this use case, see the [Basic Tutorial](./docs/basic/readme.md).
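The performance metrics mentioned above depend on the task. As a hedged sketch only (this is not a metric class shipped with prompt-ops), a field-level score for a task whose expected output is a JSON object could look like the following. The function names and the scoring rule are assumptions.

```python
import json


def json_field_score(prediction, reference):
    """Fraction of reference fields that the predicted JSON reproduces exactly.

    Both arguments are JSON strings. The reference is assumed to be a JSON
    object; predictions that fail to parse score 0.0.
    """
    try:
        predicted = json.loads(prediction)
    except (json.JSONDecodeError, TypeError):
        return 0.0
    expected = json.loads(reference)
    if not isinstance(predicted, dict) or not expected:
        return 0.0
    matches = sum(1 for key, value in expected.items() if predicted.get(key) == value)
    return matches / len(expected)


def average_score(predictions, references):
    """Mean field-level score over paired predictions and references."""
    scores = [json_field_score(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0
```

Computing `average_score` once for outputs produced with the original prompt and once for outputs produced with the optimized prompt gives a simple before-and-after comparison on the same dataset.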
Prompt Transformation Example
Below is an example of a system prompt transformed from a proprietary LM prompt into a Llama-optimized prompt:
| Original Proprietary LM Prompt | Optimized Llama Prompt |
| --- | --- |
| You are a helpful assistant. Extract and return a JSON with the following keys and values:<br><br>1. "urgency": one of high, medium, low<br>2. "sentiment": one of negative, neutral, positive<br>3. "categories": Create a dictionary with categories as keys and boolean values (True/False), where the value indicates whether the category matches tags like emergency_repair_services, routine_maintenance_requests, etc.<br><br>Your complete message should be a valid JSON string that can be read directly. | You are an expert in analyzing customer service messages. Your task is to categorize the following message based on urgency, sentiment, and relevant categories.<br><br>Analyze the message and return a JSON object with these fields:<br><br>1. "urgency": Classify as "high", "medium", or "low" based on how quickly this needs attention<br>2. "sentiment": Classify as "negative", "neutral", or "positive" based on the… |
Notability
6.0/10: Notable new repo from Meta, moderate stars.