basetenlabs/autocomp
forked from ucb-bar/autocomp
Description: Autocomp: Optimize any AI kernel, anywhere.
Language: Python
License: BSD-3-Clause
Stars: 0
Forks: 0
Open issues: 6
Created: 2026-05-01T17:27:12Z
Pushed: 2026-05-28T03:47:00Z
Default branch: main
Fork: yes
Parent repository: ucb-bar/autocomp
Archived: no
README:
Autocomp
Optimize any AI kernel, anywhere.
| Paper | Blog | VS Code Extension |
Autocomp is a portable, extensible framework for LLM-driven kernel optimization across tensor accelerators. Point it at a kernel, pick your hardware target, and Autocomp speeds it up, automatically.
It already delivers strong results across [AWS Trainium](https://aws.amazon.com/ai/machine-learning/trainium/), [Google TPU](https://cloud.google.com/tpu), [NVIDIA GPUs](https://charleshong3.github.io/blog/autocomp_update.html), [Gemmini](https://github.com/ucb-bar/gemmini), [RISC-V Vector Processors](https://saturn-vectors.org/), and [Apple Silicon GPUs](https://developer.apple.com/documentation/apple-silicon). Need a new target? The [Agent Builder](autocomp/agent_builder/README.md) can spin up a hardware-specific optimization agent from your docs in minutes.
📚 Read the paper · ✏️ Authors: Charles Hong, Sahil Bhatia, Alvin Cheung, Yakun Sophia Shao (UC Berkeley)
🚀 Quick Start
Autocomp's workflow is:
1. Pick your hardware target:
- Choose an optimization agent (or build your own with the Agent Builder).
- Set up an evaluation backend.
2. Configure one or more LLMs.
3. Edit autocomp/search/run_search.py with your settings.
4. Run search.
For example, a Trainium run might look like this:
# autocomp/search/run_search.py
backend_name = "trn"
agent_name = "built:trn1-nki1"
hw_config = TrnHardwareConfig("trn1.2xlarge")
prob_type = "trn-tutorial-nki1"
prob_id = 2
models = ["openai::gpt-5.4"]

Then run:
python -m autocomp.search.run_search
Keep reading for more on picking your hardware target, setting up your backend, configuring LLM providers, and tuning the search.
⚙️ Setup
Hardware Targets
Each hardware target requires two things: an optimization agent that knows how to optimize code for that target, and an evaluation backend — the toolchain that compiles and benchmarks code on it. You also provide a hardware config (hw_config) that describes your specific hardware instance (e.g., TrnHardwareConfig("trn1.2xlarge")). The table below shows the supported targets and the agents/backends available for each.
| Hardware target | Optimization agent(s) | Evaluation backend(s) |
|---|---|---|
| AWS Trainium | built:trn1-nki1 (Trainium 1, NKI v1)<br>built:trn2-nki1 (Trainium 2, NKI v1)<br>built:trn2-nki2 (Trainium 2, NKI v2) | trn ([trn_setup.md](autocomp/backend/trn/trn_setup.md)) |
| Google TPU | built:tpu-v6e (TPU v6e)<br>built:tpu-v5e (TPU v5e / v5litepod) | tpu ([tpu_setup.md](autocomp/backend/tpu/tpu_setup.md))<br>jaxbench ([jaxbench_setup.md](autocomp/backend/jaxbench/jaxbench_setup.md)) |
| Gemmini | gemmini | gemmini ([gemmini_setup.md](autocomp/backend/gemmini/gemmini_setup.md)) |
| NVIDIA GPU | cuda | kernelbench ([kb_setup.md](autocomp/backend/kernelbench/kb_setup.md))<br>gpumode ([gpumode_setup.md](autocomp/backend/gpumode/gpumode_setup.md)) |
| Saturn (RVV) | built:saturn-rvv | saturn ([saturn_setup.md](autocomp/backend/saturn/saturn_setup.md))<br>xnnpack ([xnnpack_setup.md](autocomp/backend/xnnpack/xnnpack_setup.md)) |
| Apple Metal | built:metal-m2 (Apple M2) | metal ([metal_setup.md](autocomp/backend/metal/metal_setup.md)) |
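As a quick sanity check before a run, the pairings in the table can be encoded in a small lookup. This is only a sketch: `SUPPORTED` and `check_pairing` are hypothetical names, not part of Autocomp, and the entries are transcribed from the table above. It also assumes that every agent listed for a target works with every backend listed for that target, which the table does not state explicitly.

```python
# Hypothetical helper, not part of Autocomp.
# Agent name -> backend names, transcribed from the hardware target table.
# Assumption: any agent listed for a target can use any backend listed for it.
SUPPORTED = {
    "built:trn1-nki1": {"trn"},
    "built:trn2-nki1": {"trn"},
    "built:trn2-nki2": {"trn"},
    "built:tpu-v6e": {"tpu", "jaxbench"},
    "built:tpu-v5e": {"tpu", "jaxbench"},
    "gemmini": {"gemmini"},
    "cuda": {"kernelbench", "gpumode"},
    "built:saturn-rvv": {"saturn", "xnnpack"},
    "built:metal-m2": {"metal"},
}


def check_pairing(agent_name: str, backend_name: str) -> bool:
    """Return True if the table lists backend_name for agent_name."""
    return backend_name in SUPPORTED.get(agent_name, set())


if __name__ == "__main__":
    print(check_pairing("built:trn1-nki1", "trn"))
    print(check_pairing("cuda", "trn"))
```

Agents you build yourself will not appear in this lookup, so treat a `False` result as "not in the table" rather than "unsupported".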
Partially supported hardware targets:
- RISC-V Vector (RVV) on Canaan Kendryte K230. See the k230 branch for code. As the implementation is very hacky, we do not currently recommend using this hardware target.
For instructions on adding full codebase support for a new hardware target (eval backend, config class, etc.), see [ADDING_HARDWARE_SUPPORT.md](ADDING_HARDWARE_SUPPORT.md).
🧠 Optimization Agents
Optimization agents decide what transformations to try and how to implement them. In run_search.py, this is controlled by agent_name. Each agent is designed for a specific hardware target — see the table above for the right agent for each target. We recommend using the Agent Builder as the fastest way to set up a complete agent from your hardware's documentation.
🏗️ Agent Builder
Want to create a new agent? The [Agent Builder](autocomp/agent_builder/README.md) automatically generates hardware-specific optimization agents from documentation sources such as local directories, PDFs, and webpages. Built agents are stored in autocomp/agent_builder/.built/ and selected by setting agent_name to "built:" followed by the agent's name (e.g., agent_name = "built:trn1-nki1"). Legacy handcrafted agents in autocomp/agents/ (e.g., gemmini, cuda) are also available for some targets.
pip install "autocomp[agent-builder]"
python -m autocomp.agent_builder.run_agent_builder \
  --agent-name my_accelerator \
  --source-dir path/to/docs \
  --agent-scope "Optimizing kernels for MyAccelerator using the XYZ programming interface."
For detailed usage, CLI options, Python API, and output format, see the [Agent Builder documentation](autocomp/agent_builder/README.md).
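If you script agent builds, for instance to rebuild an agent whenever its documentation changes, the command above can be assembled in Python. The sketch below uses only the three flags shown in this README (`--agent-name`, `--source-dir`, `--agent-scope`). The helper names `agent_builder_command` and `built_agent_id` are hypothetical and not part of Autocomp, and `built_agent_id` assumes the name passed to `--agent-name` is the one that follows `built:` in `agent_name`.

```python
import sys


def agent_builder_command(agent_name: str, source_dir: str, agent_scope: str) -> list[str]:
    """Build the argv list for the Agent Builder CLI shown in the README."""
    return [
        sys.executable, "-m", "autocomp.agent_builder.run_agent_builder",
        "--agent-name", agent_name,
        "--source-dir", source_dir,
        "--agent-scope", agent_scope,
    ]


def built_agent_id(agent_name: str) -> str:
    """Return the value to assign to agent_name in run_search.py.

    Assumption: the built agent is selected by the same name given to --agent-name.
    """
    return f"built:{agent_name}"


if __name__ == "__main__":
    cmd = agent_builder_command(
        "my_accelerator",
        "path/to/docs",
        "Optimizing kernels for MyAccelerator using the XYZ programming interface.",
    )
    print(" ".join(cmd))
    print(built_agent_id("my_accelerator"))
```

To actually run the build, pass the list to `subprocess.run(cmd, check=True)`. Passing arguments as a list avoids shell quoting problems with the free-text scope string.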
LLM Setup
Autocomp supports both local and remote endpoint LLM inference. For local inference, we support vLLM's OpenAI-compatible server. For endpoint inference, we support a variety of providers (see below).
Local Inference with vLLM
1. Install and launch vLLM:
pip install vllm
vllm serve --model Qwen/Qwen3-8B --port 8000 -tp
2. Configure Autocomp: Set models/code_models in run_search.py:
models = ["vllm::Qwen/Qwen3-8B"]
Optionally set VLLM_API_BASE if using a different host/port (default: http://localhost:8000/v1).
3. Multiple models on different ports: You can serve multiple vLLM models on separate ports and use them together by encoding each server's base URL in the provider string: vllm@ followed by the base URL, then :: and the model name.
# Terminal 1
vllm serve --model Qwen/Qwen3-8B --port 8000 -tp 1
# Terminal 2
vllm serve --model meta-llama/Llama-3-70B --port 8001 -tp 4

models = [
    "vllm@http://localhost:8000/v1::Qwen/Qwen3-8B",
    "vllm@http://localhost:8001/v1::meta-llama/Llama-3-70B",
]
For more details, see the vLLM documentation.
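To make the model string format concrete, here is a minimal sketch of how strings such as "vllm::Qwen/Qwen3-8B" and "vllm@http://localhost:8001/v1::meta-llama/Llama-3-70B" could be split into provider, base URL, and model name. It is not Autocomp's actual parser: `ModelSpec` and `parse_model_string` are hypothetical names, and the fallback to `VLLM_API_BASE` (then http://localhost:8000/v1) simply mirrors the behavior described in step 2 above.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Default stated in the README for the vLLM provider.
DEFAULT_VLLM_API_BASE = "http://localhost:8000/v1"


@dataclass(frozen=True)
class ModelSpec:
    provider: str
    model: str
    base_url: Optional[str]  # None when the provider string carries no URL


def parse_model_string(spec: str) -> ModelSpec:
    """Split 'provider[@base_url]::model' into its parts (illustrative only)."""
    # Split on the last '::' so a URL containing '::' (e.g. IPv6) stays intact.
    head, sep, model = spec.rpartition("::")
    if not sep or not head or not model:
        raise ValueError(f"expected 'provider::model', got {spec!r}")
    provider, at, base_url = head.partition("@")
    if provider == "vllm" and not at:
        # No explicit URL: fall back to VLLM_API_BASE, then the documented default.
        base_url = os.environ.get("VLLM_API_BASE", DEFAULT_VLLM_API_BASE)
    return ModelSpec(provider=provider, model=model, base_url=base_url or None)


if __name__ == "__main__":
    for s in [
        "vllm::Qwen/Qwen3-8B",
        "vllm@http://localhost:8001/v1::meta-llama/Llama-3-70B",
        "openai::gpt-5.4",
    ]:
        print(parse_model_string(s))
```

Splitting on the last `::` rather than the first is a design choice for this sketch. It assumes model names never contain `::`.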
LLM Endpoint Setup
API keys can be configured via environment…