Repo · Microsoft · published Oct 11, 2024

microsoft/AIOpsLab

Python


Description: A holistic framework to enable the design, development, and evaluation of autonomous AIOps agents.

Language: Python

License: MIT

Stars: 899

Forks: 162

Open issues: 28

Created: 2024-10-11T00:44:00Z

Pushed: 2026-06-20T00:40:16Z

Default branch: main

Fork: no

Archived: no

README:

🤖 Overview

![alt text](./assets/images/aiopslab-arch-open-source.png)

AIOpsLab is a holistic framework for designing, developing, and evaluating autonomous AIOps agents. It also serves as a foundation for building reproducible, standardized, interoperable, and scalable benchmarks. AIOpsLab can deploy microservice cloud environments, inject faults, generate workloads, and export telemetry data, while orchestrating these components and providing interfaces for interacting with and evaluating agents.

Moreover, AIOpsLab provides a built-in benchmark suite with a set of problems to evaluate AIOps agents in an interactive environment. This suite can be easily extended to meet user-specific needs. See the problem list [here](/aiopslab/orchestrator/problems/registry.py#L15).

📦 Installation

Requirements

  • Python >= 3.11
  • Helm
  • Poetry (recommended) or pip
  • Additional requirements depend on the deployment option selected, which is explained in the next section
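
The requirements above can be checked before installing. The following is a minimal sketch, not part of AIOpsLab: the function name `missing_requirements` is hypothetical, and it only verifies the interpreter version and that the listed tools are on `PATH`.

```python
import shutil
import sys

# Minimum interpreter version from the requirements list.
MIN_PYTHON = (3, 11)


def missing_requirements(version_info=None, which=shutil.which):
    """Return a list of unmet requirements (hypothetical helper, not an AIOpsLab API)."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    problems = []
    if (major, minor) < MIN_PYTHON:
        problems.append(f"Python >= 3.11 required, found {major}.{minor}")
    # Helm is listed as a hard requirement.
    if which("helm") is None:
        problems.append("helm not found on PATH")
    # Poetry is recommended, pip is the alternative: one of them is enough.
    if which("poetry") is None and which("pip") is None:
        problems.append("neither poetry nor pip found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print("MISSING:", problem)
```

The `which` parameter exists only so the check can be exercised without touching the real `PATH`.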

Step 1: Install Python 3.11

sudo apt update
sudo apt install python3.11 python3.11-venv python3.11-dev -y

Step 2: Install Poetry (Official Installer)

# Use the official installer (NOT apt - the apt version is outdated)
curl -sSL https://install.python-poetry.org | python3.11 -
export PATH="$HOME/.local/bin:$PATH"

# Add to your shell profile for persistence
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc

> Warning: Do NOT use sudo apt install python3-poetry - it installs an outdated version that may not work with the lock file.

Step 3: Clone and Install

git clone --recurse-submodules <repository-url>
cd AIOpsLab
poetry env use python3.11
poetry install
eval $(poetry env activate)

> Troubleshooting: If you get a "lock file not compatible" error, run poetry lock first, then poetry install.

Alternative installation with pip:

pip install -e .

🚀 Quick Start

Choose one of a), b), or c) to set up your cluster and then proceed to the next steps.

a) Local simulated cluster

AIOpsLab can be run on a local simulated cluster using kind on your local machine. Please look at this [README](kind/README.md#prerequisites) for a list of prerequisites.

# For x86 machines
kind create cluster --config kind/kind-config-x86.yaml

# For ARM machines
kind create cluster --config kind/kind-config-arm.yaml
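
Choosing between the two config files can be automated. The sketch below assumes that `platform.machine()` reports one of the common architecture names listed in the mapping. The helper names are hypothetical and the script only builds the command, it does not run it.

```python
import platform

# Architecture names as commonly reported by platform.machine();
# the mapping to the two config files is an assumption of this sketch.
KIND_CONFIGS = {
    "x86_64": "kind/kind-config-x86.yaml",
    "amd64": "kind/kind-config-x86.yaml",
    "arm64": "kind/kind-config-arm.yaml",
    "aarch64": "kind/kind-config-arm.yaml",
}


def kind_config_for(machine=None):
    """Return the kind config path for the given (or current) architecture."""
    if machine is None:
        machine = platform.machine()
    try:
        return KIND_CONFIGS[machine.lower()]
    except KeyError:
        raise ValueError(f"no kind config known for architecture {machine!r}") from None


def kind_create_command(machine=None):
    """Build the `kind create cluster` argument list shown above."""
    return ["kind", "create", "cluster", "--config", kind_config_for(machine)]


if __name__ == "__main__":
    print(" ".join(kind_create_command()))
```

The resulting list can be passed to `subprocess.run` if you prefer to launch kind from Python.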

If you're running into issues, consider building a Docker image for your machine by following this [README](kind/README.md#deployment-steps). Please also open an issue.

[Tips]

If you are running AIOpsLab behind a proxy, take care to export the HTTP proxy with the address 172.17.0.1. When the kind cluster is created, every node inherits the proxy settings from the host environment and the Docker container.

The 172.17.0.1 address is used to communicate with the host machine. For more details, refer to the official guide: Configure Kind to Use a Proxy.

Additionally, Docker doesn't support SOCKS5 proxies directly. If you proxy over SOCKS5, you may need to use Privoxy to forward SOCKS5 to HTTP.

If you're running vLLM and the LLM agent locally, Privoxy will proxy localhost by default, which causes errors. To avoid this, set the following environment variable:

export no_proxy=localhost
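
The proxy tips can be turned into a quick sanity check before creating the cluster. This is a rough sketch under stated assumptions: the function name `proxy_warnings` is hypothetical, it only inspects the `http_proxy`/`HTTP_PROXY` and `no_proxy`/`NO_PROXY` variables, and it treats a loopback proxy address as suspicious because a loopback address inside a kind node refers to the node, not the host.

```python
import os
from urllib.parse import urlsplit

LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def proxy_warnings(env):
    """Return warnings about proxy settings that may break a kind cluster."""
    warnings = []
    proxy = env.get("http_proxy") or env.get("HTTP_PROXY")
    if not proxy:
        return warnings  # no proxy configured, nothing to check

    # Accept values with or without an explicit scheme.
    parts = urlsplit(proxy if "://" in proxy else "http://" + proxy)
    if parts.scheme.startswith("socks"):
        warnings.append("SOCKS proxy configured; forward it to HTTP (e.g. with Privoxy)")
    if parts.hostname in LOOPBACK_HOSTS:
        warnings.append("proxy points at a loopback address; kind nodes cannot reach it")

    no_proxy = env.get("no_proxy") or env.get("NO_PROXY") or ""
    entries = {entry.strip() for entry in no_proxy.split(",") if entry.strip()}
    if "localhost" not in entries:
        warnings.append("no_proxy does not include localhost")
    return warnings


if __name__ == "__main__":
    for warning in proxy_warnings(os.environ):
        print("WARNING:", warning)
```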

After finishing cluster creation, proceed to the next "Update config.yml" step.

b) Remote cluster (Manual setup with Ansible)

AIOpsLab supports any remote Kubernetes cluster that your kubectl context is set to, whether it's a cluster from a cloud provider or one you build yourself. We have some Ansible playbooks to set up clusters on providers like CloudLab and our own machines. Follow this [README](./scripts/ansible/README.md) to set up your own cluster, and then proceed to the next "Update config.yml" step.

c) Azure VMs with Terraform + Ansible (Recommended for cloud)

A single command provisions the VMs, sets up Kubernetes, and configures AIOpsLab:

# Mode B (AIOpsLab on laptop, remote kubectl):
python3 scripts/terraform/deploy.py --apply --resource-group <resource-group> --workers 2 --mode B

# Mode A (AIOpsLab on controller VM, full fault injection support):
python3 scripts/terraform/deploy.py --apply --resource-group <resource-group> --workers 2 --mode A

See [Terraform README](./scripts/terraform/README.md) for all options (--allowed-ips, --dev, --setup-only, etc.).

> Note: Mode B is convenient for development but some fault injectors (e.g., VirtualizationFaultInjector) require Docker on the local machine. Use Mode A for full functionality.
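
The choice between the two modes can be captured in a small helper. The sketch below is illustrative only: `deploy_command` is a hypothetical function, the resource group name is a placeholder you must supply, and it assumes `--resource-group` takes the group name as its value. It uses only the flags shown above.

```python
def deploy_command(resource_group, workers=2, needs_full_fault_injection=True):
    """Build the deploy.py argument list for Mode A or Mode B.

    Mode A runs AIOpsLab on the controller VM and supports all fault injectors.
    Mode B runs AIOpsLab locally and is convenient for development.
    """
    if not resource_group:
        raise ValueError("resource_group must be a non-empty name")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    mode = "A" if needs_full_fault_injection else "B"
    return [
        "python3", "scripts/terraform/deploy.py",
        "--apply",
        "--resource-group", resource_group,  # assumed to take the group name as a value
        "--workers", str(workers),
        "--mode", mode,
    ]


if __name__ == "__main__":
    # "my-resource-group" is a hypothetical name used for illustration.
    print(" ".join(deploy_command("my-resource-group", needs_full_fault_injection=False)))
```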

Update config.yml

cd aiopslab
cp config.yml.example config.yml

Update your config.yml so that k8s_host is the hostname of your cluster's control plane node and k8s_user is your username on that node. If you are using a kind cluster, set k8s_host to kind. If you are running AIOpsLab on the cluster itself, set k8s_host to localhost.
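
The rule for choosing k8s_host can be summarized in code. This is a minimal sketch: the function `k8s_host_for` and the deployment labels `"kind"`, `"on_cluster"`, and `"remote"` are names invented for illustration and are not read by AIOpsLab.

```python
def k8s_host_for(deployment, control_plane_host=None):
    """Return the k8s_host value to put in config.yml.

    deployment is one of the hypothetical labels:
      "kind"       - local simulated cluster created with kind
      "on_cluster" - AIOpsLab runs on the cluster itself
      "remote"     - AIOpsLab runs elsewhere and targets a remote control plane
    """
    if deployment == "kind":
        return "kind"
    if deployment == "on_cluster":
        return "localhost"
    if deployment == "remote":
        if not control_plane_host:
            raise ValueError("a remote cluster needs the control plane hostname")
        return control_plane_host
    raise ValueError(f"unknown deployment type: {deployment!r}")


if __name__ == "__main__":
    # "cp-node.example.com" is a placeholder hostname.
    for kind_of_setup, host in [("kind", None), ("on_cluster", None), ("remote", "cp-node.example.com")]:
        print(kind_of_setup, "->", k8s_host_for(kind_of_setup, host))
```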

Running agents locally

Human as the agent:

python3 cli.py
(aiopslab) $ start misconfig_app_hotel_res-detection-1 # or choose any problem you want to solve
# ... wait for the setup ...
(aiopslab) $ submit("Yes") # submit solution

Run GPT-4 baseline agent:

# Create a .env file in the project root (if not exists)
echo "OPENAI_API_KEY=<your-openai-api-key>" > .env
# Add more API keys as needed:
# echo "QWEN_API_KEY=<your-qwen-api-key>" >> .env
# echo "DEEPSEEK_API_KEY=<your-deepseek-api-key>" >> .env

python3 clients/gpt.py # you can also change the problem to solve in the main() function

Our repository comes with a variety of pre-integrated agents, including agents that enable secure authentication with Azure OpenAI endpoints using identity-based access. Please check out [Clients](/clients) for a comprehensive list of all implemented clients.

The clients will automatically load API keys from your .env file.
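
To illustrate what loading keys from a .env file involves, here is a simplified standard-library sketch. It is not the clients' actual loading code, which is not shown here. The helper names are hypothetical, and the parser handles only plain `KEY=value` lines, comments, and optional surrounding quotes.

```python
import os


def parse_env_file(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path=".env", environ=os.environ):
    """Copy keys from the file into environ without overriding existing ones."""
    try:
        with open(path, encoding="utf-8") as handle:
            parsed = parse_env_file(handle.read())
    except FileNotFoundError:
        return {}
    for key, value in parsed.items():
        environ.setdefault(key, value)
    return parsed


if __name__ == "__main__":
    loaded = load_env_file()
    print("keys found in .env:", sorted(loaded))
```

Variables already set in the environment take precedence in this sketch, so a key exported in the shell is not replaced by the file.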

You can check the running status of the cluster using...
