Repo: Amazon (Nova), published Oct 28, 2025

amazon-science/MultiTurnAgentAttack

Python

Language: Python

License: NOASSERTION

Stars: 12

Forks: 7

Open issues: 9

Created: 2025-10-28T18:39:56Z

Pushed: 2026-03-31T13:13:30Z

Default branch: main

Fork: no

Archived: no

README:

MultiTurnAgentAttack

This repository contains data and code for the STAC (Sequential Tool Attack Chaining) framework, which generates and evaluates multi-turn adversarial attacks against LLM agents in tool-use environments.

📄 Paper: STAC: When Innocent Tools Form Dangerous Chains to Jailbreak LLM Agents

Quick Start

Just want to evaluate your model on our STAC Benchmark?

To evaluate your model on the pre-generated STAC Benchmark (483 test cases), skip the full pipeline and run the evaluation directly:

# Set up environment
conda env create -f environment.yml
conda activate STAC
export OPENAI_API_KEY="your-openai-api-key-here" # For planner and judge models

# Run evaluation
python -m STAC_eval.eval_STAC_benchmark \
--model_agent gpt-4.1 \
--defense no_defense \
--batch_size 512

Input: data/STAC_benchmark_data.json (483 test cases from both SHADE-Arena and Agent-SafetyBench)

Output: Evaluation results in data/Eval/{model_planner}/{model_agent}/{defense}/gen_res.json

The benchmark automatically handles both SHADE-Arena and Agent-SafetyBench test cases. Skip to the [Benchmark Evaluation](#benchmark-evaluation-stac_eval) section for more details.
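The output location above is a template over three run settings. A minimal sketch of how that path resolves for a given run, assuming the components are joined verbatim (the helper name `results_path` is hypothetical and not part of the repository; whether the script sanitizes model IDs containing characters such as `:` is not stated here):

```python
from pathlib import Path


def results_path(model_planner: str, model_agent: str, defense: str,
                 output_dir: str = "data/Eval") -> Path:
    """Build data/Eval/{model_planner}/{model_agent}/{defense}/gen_res.json.

    Hypothetical helper: it mirrors the documented template only and does
    not call into the repository's code.
    """
    return Path(output_dir) / model_planner / model_agent / defense / "gen_res.json"


# The quick-start command uses gpt-4.1 as the agent with no defense;
# the planner is assumed to be left at its documented default, gpt-4.1.
path = results_path("gpt-4.1", "gpt-4.1", "no_defense")
print(path.as_posix())  # data/Eval/gpt-4.1/gpt-4.1/no_defense/gen_res.json
```

Because the planner is part of the path, runs that share an agent and defense but use different planners should land in separate directories.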

---

Full Setup

Prerequisites

Clone the repository and set up the conda environment on a Linux machine:

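# Clone the repository (a ZIP download unpacks to MultiTurnAgentAttack-main instead)
git clone https://github.com/amazon-science/MultiTurnAgentAttack.git
cd MultiTurnAgentAttack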

# Create and activate conda environment
conda env create -f environment.yml
conda activate STAC

Alternative Setup (pip/venv)

If you prefer using pip instead of conda:

# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Environment Variables

Before running any scripts, configure the required API keys:

OpenAI API (Required for most steps)

export OPENAI_API_KEY="your-openai-api-key-here"

Hugging Face (Required for HF models)

export HF_TOKEN="your-hf-token-here"

AWS Bedrock API (Required for AWS models only)

export AWS_ACCESS_KEY_ID="your-aws-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-aws-secret-access-key"
export AWS_SESSION_TOKEN="your-aws-session-token"

Note: AWS session tokens are temporary. Here they expire after 12 hours (the exact lifetime depends on how the credentials were issued) and must be refreshed once expired.
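A missing key typically only surfaces once a run is already under way, so a pre-flight check can save time. The sketch below uses the variable names listed above; the provider grouping and the helper `missing_env` are illustrative assumptions, not part of the repository.

```python
from __future__ import annotations

import os

# Variable names come from this section; the provider labels are illustrative.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "huggingface": ["HF_TOKEN"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN"],
}


def missing_env(provider: str, env=None) -> list[str]:
    """Return the variables for `provider` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]


for provider in REQUIRED_ENV:
    missing = missing_env(provider)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"{provider}: {status}")
```

The check only confirms that the variables are set. It cannot tell whether an AWS session token has already expired.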

---

Repository Structure

MultiTurnAgentAttack-main/
├── STAC_gen/ # Full STAC attack generation pipeline
│ ├── step_1_gen_tool_chains.py # Generate tool chain attacks
│ ├── step_2_verify_tool_chains.py # Verify generated attacks
│ ├── step_3_reverse_engineer_prompts.py # Generate adversarial prompts
│ └── step_4_eval_adaptive_planning.py # Evaluate with adaptive planning
│
├── STAC_eval/ # Benchmark evaluation
│ └── eval_STAC_benchmark.py # Evaluate models on STAC benchmark
│
├── data/
│ └── STAC_benchmark_data.json # Pre-generated benchmark (483 cases)
│
├── Agent_SafetyBench/ # Agent-SafetyBench environments
├── SHADE_Arena/ # SHADE-Arena environments
│
├── src/ # Core implementation
│ ├── Agents.py
│ ├── Environments.py
│ ├── LanguageModels.py
│ ├── STAC.py
│ └── utils.py
│
└── prompts/ # System prompts for all components

---

Overview

The repository provides two main usage modes:

1. Benchmark Evaluation (STAC_eval/)

Evaluate your model on our pre-generated STAC Benchmark containing 483 test cases across SHADE-Arena and Agent-SafetyBench environments. This is the recommended starting point for most users.

2. Full Automated Attack Generation Pipeline (STAC_gen/)

Generate new attacks from scratch using the complete 4-step STAC pipeline. This is useful for:

  • Creating attacks for new environments
  • Experimenting with different attack generation strategies
  • Extending the benchmark with additional test cases
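The four generation steps are numbered, which suggests they run in order. The sketch below only assembles one command per step from the file names in the repository structure. It assumes each step can be launched with `python -m`, by analogy with `STAC_eval.eval_STAC_benchmark`, and it leaves per-step arguments empty because they are not documented in this excerpt. `pipeline_commands` is a hypothetical helper.

```python
import shlex
import sys

# Module names derived from the STAC_gen/ file listing.
STEP_MODULES = [
    "STAC_gen.step_1_gen_tool_chains",
    "STAC_gen.step_2_verify_tool_chains",
    "STAC_gen.step_3_reverse_engineer_prompts",
    "STAC_gen.step_4_eval_adaptive_planning",
]


def pipeline_commands(extra_args=None):
    """Return one command (as an argument list) per pipeline step, in order.

    `extra_args` maps a module name to a list of additional CLI arguments;
    which arguments each step accepts is not covered here.
    """
    extra_args = extra_args or {}
    return [
        [sys.executable, "-m", module, *extra_args.get(module, [])]
        for module in STEP_MODULES
    ]


for cmd in pipeline_commands():
    # Printed only; pass each list to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```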

---

Benchmark Evaluation (STAC_eval)

Quick Evaluation

Evaluate your agent against the STAC Benchmark with a single command:

python -m STAC_eval.eval_STAC_benchmark \
--model_agent gpt-4.1 \
--defense no_defense \
--batch_size 512

Parameters

  • --input_path: Path to benchmark data (default: data/STAC_benchmark_data.json)
  • --output_dir: Output directory (default: data/Eval)
  • --model_planner: Model for attack planning (default: gpt-4.1)
  • --model_judge: Model for evaluation (default: gpt-4.1)
  • --model_agent: Your model to evaluate (default: gpt-4.1)
  • --temperature: Sampling temperature (default: 0.0)
  • --top_p: Nucleus sampling parameter (default: 0.95)
  • --batch_size: Batch size for evaluation (default: 1)
  • --region: AWS region for Bedrock API (default: us-west-2)
  • --max_n_turns: Maximum conversation turns (default: 3)
  • --defense: Defense mechanism to evaluate (default: no_defense)
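For reference, the documented flags and defaults can be written out as an `argparse` parser. This is a reconstruction from the list above, not the script's actual source: the argument types are assumptions, and the real parser may add choices or validation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the documented options; types are assumed, not confirmed."""
    p = argparse.ArgumentParser(description="Documented STAC_eval options (sketch)")
    p.add_argument("--input_path", default="data/STAC_benchmark_data.json")
    p.add_argument("--output_dir", default="data/Eval")
    p.add_argument("--model_planner", default="gpt-4.1")
    p.add_argument("--model_judge", default="gpt-4.1")
    p.add_argument("--model_agent", default="gpt-4.1")
    p.add_argument("--temperature", type=float, default=0.0)
    p.add_argument("--top_p", type=float, default=0.95)
    p.add_argument("--batch_size", type=int, default=1)
    p.add_argument("--region", default="us-west-2")
    p.add_argument("--max_n_turns", type=int, default=3)
    p.add_argument("--defense", default="no_defense")
    return p


args = build_parser().parse_args(["--model_agent", "gpt-4.1-mini", "--batch_size", "512"])
print(args.model_agent, args.batch_size, args.defense)  # gpt-4.1-mini 512 no_defense
```

Note that the default batch size is 1, while the example commands pass 512 for OpenAI models and 10 for Bedrock models.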

Defense Mechanisms

Evaluate different defense strategies:

# No defense (baseline)
python -m STAC_eval.eval_STAC_benchmark --model_agent gpt-4.1 --defense no_defense

# Spotlighting (datamarking user prompts)
python -m STAC_eval.eval_STAC_benchmark --model_agent gpt-4.1 --defense spotlighting

# Failure mode awareness
python -m STAC_eval.eval_STAC_benchmark --model_agent gpt-4.1 --defense failure_modes

# User intent summarization
python -m STAC_eval.eval_STAC_benchmark --model_agent gpt-4.1 --defense summarization

# Harm-benefit reasoning
python -m STAC_eval.eval_STAC_benchmark --model_agent gpt-4.1 --defense reasoning
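To compare all five defenses for one agent, the commands above can be generated in a loop. A minimal sketch, assuming it is run from the repository root with the relevant API keys set; `eval_command` is a hypothetical helper, and the commands are only printed here rather than executed.

```python
import shlex
import sys

# The five --defense values shown above.
DEFENSES = ["no_defense", "spotlighting", "failure_modes", "summarization", "reasoning"]


def eval_command(model_agent: str, defense: str) -> list:
    """Build the evaluation command for one agent/defense pair."""
    return [
        sys.executable, "-m", "STAC_eval.eval_STAC_benchmark",
        "--model_agent", model_agent,
        "--defense", defense,
    ]


commands = [eval_command("gpt-4.1", defense) for defense in DEFENSES]
for cmd in commands:
    # To execute instead of print: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Each run writes to its own `{defense}` subdirectory under the output directory, so the results do not overwrite one another.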

Evaluating Different Models

OpenAI Models

Requirements: OPENAI_API_KEY

# GPT-4.1
python -m STAC_eval.eval_STAC_benchmark --model_agent gpt-4.1 --batch_size 512

# GPT-4.1-mini
python -m STAC_eval.eval_STAC_benchmark --model_agent gpt-4.1-mini --batch_size 512

AWS Bedrock Models

Requirements: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN

# Llama 3.3 70B
python -m STAC_eval.eval_STAC_benchmark \
--model_agent us.meta.llama3-3-70b-instruct-v1:0 \
--batch_size 10

# Llama 3.1 405B
python -m STAC_eval.eval_STAC_benchmark \
--model_agent meta.llama3-1-405b-instruct-v1:0 \
--batch_size 10

Hugging Face Models

Requirements: GPU access (tested on H100s), HF_TOKEN

# Request GPU node
# e.g., on…

