RepoAmazon (Nova)Amazon (Nova)published Jul 28, 2025seen 5d

amazon-science/Cyber-Zero

Python

Open original ↗

Captured source

source ↗
published Jul 28, 2025seen 5dcaptured 8hhttp 200method plain

amazon-science/Cyber-Zero

Description: Cyber-Zero: Training Cybersecurity Agents Without Runtime

Language: Python

License: NOASSERTION

Stars: 94

Forks: 17

Open issues: 38

Created: 2025-07-28T17:57:55Z

Pushed: 2026-02-13T14:29:44Z

Default branch: main

Fork: no

Archived: no

README:

Cyber-Zero: Training Cybersecurity Agents without Runtime

🧐 Overview | 🏆 Benchmark Suite | 🚀 Quick Start | 🏗️ Architecture | ⚙️ Configuration | 📊 Generation | 📝 Validation | 📝 CLI Interface | 📝 Citation

Cyber-Zero is a comprehensive framework for training cybersecurity agents without requiring runtime execution environments.

Overview

Large Language Models (LLMs) have achieved remarkable success in software engineering tasks when trained with executable runtime environments, such environments are often unavailable in cybersecurity domains where challenge configurations and execution contexts are ephemeral or restricted. Cyber-Zero addresses this fundamental limitation by leveraging publicly available CTF writeups and employing persona-driven LLM simulation to reverse-engineer runtime behaviors and generate realistic, long-horizon interaction sequences without actual execution environments.

The key innovation is generating high-quality training trajectories through LLM simulation rather than requiring actual execution environments, making it scalable and practical for training cybersecurity agents. Using trajectories synthesized by Cyber-Zero, we train LLM-based agents that achieve up to 13.1% absolute performance gains over baseline models on three prominent CTF benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench.

Benchmark Suite

To democratize the evaluation of cybersecurity agents, we provide three repaired benchmark suites adapted for EnIGMA+ in Cyber-Zero:

  • [InterCode-CTF](https://github.com/princeton-nlp/intercode/tree/master/data/ctf) - A comprehensive collection of CTF challenges covering various cybersecurity domains
  • [NYU CTF Bench](https://nyu-llm-ctf.github.io/) - NYU's curated benchmark suite for evaluating LLM-based CTF solving capabilities
  • [Cybench](https://Cybench.github.io/) - A diverse benchmark covering multiple CTF categories and difficulty levels

All benchmarks have been reformatted to follow the EnIGMA and EnIGMA+ specification, with each challenge including a challenge.json file and docker-compose.yml file (when required).

Our benchmark suite addresses several issues identified in the original benchmarks, providing repaired versions for reliable evaluation. For detailed information about specific repairs and improvements, see the [benchmarks/README.md](benchmarks/README.md).

EnIGMA+

To facilitate the development of cybersecurity agents, we present EnIGMA+, an enhanced agent scaffolding of EnIGMA that runs hundreds of CTF challenges in _hours_ instead of _days_. EnIGMA+ is built on top of SWE-agent.

Using EnIGMA+, our best model, Cyber-Zero-32B, establishes new state-of-the-art performance among open-weight models, matching the capabilities of proprietary systems like DeepSeek-V3-0324 and Claude-3.5-Sonnet while offering superior cost-effectiveness, demonstrating that runtime-free trajectory synthesis can effectively democratize the development of state-of-the-art cybersecurity agents.

For detailed information about EnIGMA+, including installation, configuration, and usage instructions, please check the [README in the enigma-plus folder](enigma-plus/README.md).

Installation

From Source

# Install dependencies
pip install -r requirements.txt

# Install the package
pip install -e .

Quick Start

Generate Trajectories

# Using the CLI
cyber-zero generate \
--sampled_flags_path task_meta.jsonl \
--output_path trajectories.jsonl \
--trajectories_per_task 3 \
--workers 16

# Using the direct script interface
python generate_trajectory.py \
--sampled_flags_path task_meta.jsonl \
--output_path trajectories.jsonl \
--trajectories_per_task 3 \
--workers 16

Evaluate Quality

# Using the CLI
cyber-zero evaluate \
--input_path trajectories.jsonl \
--output_path quality_results.jsonl \
--model_id deepseek-v3-0324

# Using the direct script interface
python evaluate_quality.py \
--input_path trajectories.jsonl \
--output_path quality_results.jsonl \
--model_id deepseek-v3-0324

Reformat Trajectories

# Using the CLI
cyber-zero reformat \
--input_path quality_results.jsonl \
--output_path formatted_trajectories.jsonl \
--split_output

# Using the direct script interface
python reformat_trajectories.py \
--input_path quality_results.jsonl \
--output_path formatted_trajectories.jsonl \
--split_output

Architecture

The framework follows a modular architecture of Cyber-Zero:

cyber_zero/
├── __init__.py # Package initialization
├── config.py # Configuration management
├── models.py # Data models (TaskMeta, TrajectoryData, etc.)
├── utils.py # Common utilities
├── validation.py # Response and command validation
├── llm_client.py # LLM interaction and quality evaluation
├── trajectory_generator.py # Main trajectory generation logic
├── quality_evaluator.py # Quality evaluation for trajectories
├── trajectory_reformatter.py # Trajectory reformatting for training
├── cli.py # Command-line interface
├── prompts/ # System prompts
│ ├── __init__.py
│ ├── assistant_turn_prompt.txt # Assistant (CTF player) prompt
│ └── user_turn_prompt.txt # User (system/environment) prompt
└── data_collection/ # Data collection utilities
├── __init__.py # Package initialization
├── config.py # Data collection configuration
├── scraper.py # Shared web scraping utilities
└── README.md # Data collection documentation

Key Components

  • Config: Centralized configuration management with model mappings and validation rules
  • Models: Type-safe data structures for tasks, trajectories, and evaluation results
  • Validation: Comprehensive validation of responses, commands, and action formats
  • LLMClient: Abstracted interface for different language models with retry logic
  • TrajectoryGenerator: Main orchestrator for conversation generation
  • CLI: User-friendly command-line interface

Configuration

The framework uses a hierarchical configuration system with centralized model management:

Basic Configuration

from cyber_zero import Config

config…

Excerpt shown — open the source for the full document.

Notability

notability 4.0/10

new research repo, low traction

Amazon (Nova) has a repo signal matching infrastructure, safety and policy.