amazon-science/Cyber-Zero
Python
Captured source
source ↗amazon-science/Cyber-Zero
Description: Cyber-Zero: Training Cybersecurity Agents Without Runtime
Language: Python
License: NOASSERTION
Stars: 94
Forks: 17
Open issues: 38
Created: 2025-07-28T17:57:55Z
Pushed: 2026-02-13T14:29:44Z
Default branch: main
Fork: no
Archived: no
README:
Cyber-Zero: Training Cybersecurity Agents without Runtime
🧐 Overview | 🏆 Benchmark Suite | 🚀 Quick Start | 🏗️ Architecture | ⚙️ Configuration | 📊 Generation | 📝 Validation | 📝 CLI Interface | 📝 Citation
Cyber-Zero is a comprehensive framework for training cybersecurity agents without requiring runtime execution environments.
Overview
Large Language Models (LLMs) have achieved remarkable success in software engineering tasks when trained with executable runtime environments, such environments are often unavailable in cybersecurity domains where challenge configurations and execution contexts are ephemeral or restricted. Cyber-Zero addresses this fundamental limitation by leveraging publicly available CTF writeups and employing persona-driven LLM simulation to reverse-engineer runtime behaviors and generate realistic, long-horizon interaction sequences without actual execution environments.
The key innovation is generating high-quality training trajectories through LLM simulation rather than requiring actual execution environments, making it scalable and practical for training cybersecurity agents. Using trajectories synthesized by Cyber-Zero, we train LLM-based agents that achieve up to 13.1% absolute performance gains over baseline models on three prominent CTF benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench.
Benchmark Suite
To democratize the evaluation of cybersecurity agents, we provide three repaired benchmark suites adapted for EnIGMA+ in Cyber-Zero:
- [InterCode-CTF](https://github.com/princeton-nlp/intercode/tree/master/data/ctf) - A comprehensive collection of CTF challenges covering various cybersecurity domains
- [NYU CTF Bench](https://nyu-llm-ctf.github.io/) - NYU's curated benchmark suite for evaluating LLM-based CTF solving capabilities
- [Cybench](https://Cybench.github.io/) - A diverse benchmark covering multiple CTF categories and difficulty levels
All benchmarks have been reformatted to follow the EnIGMA and EnIGMA+ specification, with each challenge including a challenge.json file and docker-compose.yml file (when required).
Our benchmark suite addresses several issues identified in the original benchmarks, providing repaired versions for reliable evaluation. For detailed information about specific repairs and improvements, see the [benchmarks/README.md](benchmarks/README.md).
EnIGMA+
To facilitate the development of cybersecurity agents, we present EnIGMA+, an enhanced agent scaffolding of EnIGMA that runs hundreds of CTF challenges in _hours_ instead of _days_. EnIGMA+ is built on top of SWE-agent.
Using EnIGMA+, our best model, Cyber-Zero-32B, establishes new state-of-the-art performance among open-weight models, matching the capabilities of proprietary systems like DeepSeek-V3-0324 and Claude-3.5-Sonnet while offering superior cost-effectiveness, demonstrating that runtime-free trajectory synthesis can effectively democratize the development of state-of-the-art cybersecurity agents.
For detailed information about EnIGMA+, including installation, configuration, and usage instructions, please check the [README in the enigma-plus folder](enigma-plus/README.md).
Installation
From Source
# Install dependencies pip install -r requirements.txt # Install the package pip install -e .
Quick Start
Generate Trajectories
# Using the CLI cyber-zero generate \ --sampled_flags_path task_meta.jsonl \ --output_path trajectories.jsonl \ --trajectories_per_task 3 \ --workers 16 # Using the direct script interface python generate_trajectory.py \ --sampled_flags_path task_meta.jsonl \ --output_path trajectories.jsonl \ --trajectories_per_task 3 \ --workers 16
Evaluate Quality
# Using the CLI cyber-zero evaluate \ --input_path trajectories.jsonl \ --output_path quality_results.jsonl \ --model_id deepseek-v3-0324 # Using the direct script interface python evaluate_quality.py \ --input_path trajectories.jsonl \ --output_path quality_results.jsonl \ --model_id deepseek-v3-0324
Reformat Trajectories
# Using the CLI cyber-zero reformat \ --input_path quality_results.jsonl \ --output_path formatted_trajectories.jsonl \ --split_output # Using the direct script interface python reformat_trajectories.py \ --input_path quality_results.jsonl \ --output_path formatted_trajectories.jsonl \ --split_output
Architecture
The framework follows a modular architecture of Cyber-Zero:
cyber_zero/ ├── __init__.py # Package initialization ├── config.py # Configuration management ├── models.py # Data models (TaskMeta, TrajectoryData, etc.) ├── utils.py # Common utilities ├── validation.py # Response and command validation ├── llm_client.py # LLM interaction and quality evaluation ├── trajectory_generator.py # Main trajectory generation logic ├── quality_evaluator.py # Quality evaluation for trajectories ├── trajectory_reformatter.py # Trajectory reformatting for training ├── cli.py # Command-line interface ├── prompts/ # System prompts │ ├── __init__.py │ ├── assistant_turn_prompt.txt # Assistant (CTF player) prompt │ └── user_turn_prompt.txt # User (system/environment) prompt └── data_collection/ # Data collection utilities ├── __init__.py # Package initialization ├── config.py # Data collection configuration ├── scraper.py # Shared web scraping utilities └── README.md # Data collection documentation
Key Components
- Config: Centralized configuration management with model mappings and validation rules
- Models: Type-safe data structures for tasks, trajectories, and evaluation results
- Validation: Comprehensive validation of responses, commands, and action formats
- LLMClient: Abstracted interface for different language models with retry logic
- TrajectoryGenerator: Main orchestrator for conversation generation
- CLI: User-friendly command-line interface
Configuration
The framework uses a hierarchical configuration system with centralized model management:
Basic Configuration
from cyber_zero import Config config…
Excerpt shown — open the source for the full document.
Notability
notability 4.0/10new research repo, low traction
Amazon (Nova) has a repo signal matching infrastructure, safety and policy.