What does this fork signal mean?

Baseten forked ucb-bar/autocomp as basetenlabs/autocomp. This fork signal points to upstream code the lab may be inspecting, patching, or building on. High-signal details: repo basetenlabs/autocomp · parent ucb-bar/autocomp · Automatic model optimization and deployment tool by Baseten Labs. onlylabs links this event to 1 captured evidence page and 6 related fork signals.

Baseten Fork: basetenlabs/autocomp

Captured source

GitHub · github.com/basetenlabs/autocomp

basetenlabs/autocomp repository metadata

published May 1, 2026 · seen Jun 5 · captured Jun 11 · http 200 · method plain

basetenlabs/autocomp

Description: Autocomp: Optimize any AI kernel, anywhere.

Language: Python

License: BSD-3-Clause

Stars: 0

Forks: 0

Open issues: 6

Created: 2026-05-01T17:27:12Z

Pushed: 2026-05-28T03:47:00Z

Default branch: main

Fork: yes

Parent repository: ucb-bar/autocomp

Archived: no

README:

Autocomp

Optimize any AI kernel, anywhere.

| Paper | Blog | VS Code Extension |

Autocomp is a portable, extensible framework for LLM-driven kernel optimization across tensor accelerators. Point it at a kernel, pick your hardware target, and Autocomp speeds it up, automatically.

It already delivers strong results across [AWS Trainium](https://aws.amazon.com/ai/machine-learning/trainium/), [Google TPU](https://cloud.google.com/tpu), [NVIDIA GPUs](https://charleshong3.github.io/blog/autocomp_update.html), [Gemmini](https://github.com/ucb-bar/gemmini), [RISC-V Vector Processors](https://saturn-vectors.org/), and [Apple Silicon GPUs](https://developer.apple.com/documentation/apple-silicon). Need a new target? The [Agent Builder](autocomp/agent_builder/README.md) can spin up a hardware-specific optimization agent from your docs in minutes.

📚 Read the paper · ✏️ Authors: Charles Hong, Sahil Bhatia, Alvin Cheung, Yakun Sophia Shao (UC Berkeley)

🚀 Quick Start

Autocomp's workflow is:

1. Pick your hardware target:
   - Choose an optimization agent (or build your own with the Agent Builder).
   - Set up an evaluation backend.
2. Configure one or more LLMs.
3. Edit autocomp/search/run_search.py with your settings.
4. Run search.

For example, a Trainium run might look like this:

# autocomp/search/run_search.py
backend_name = "trn"
agent_name = "built:trn1-nki1"
hw_config = TrnHardwareConfig("trn1.2xlarge")
prob_type = "trn-tutorial-nki1"
prob_id = 2
models = ["openai::gpt-5.4"]

Then run:

python -m autocomp.search.run_search

Keep reading for more on picking your hardware target, setting up your backend, configuring LLM providers, and tuning the search.

⚙️ Setup

Hardware Targets

Each hardware target requires two things: an optimization agent that knows how to optimize code for that target, and an evaluation backend — the toolchain that compiles and benchmarks code on it. You also provide a hardware config (hw_config) that describes your specific hardware instance (e.g., TrnHardwareConfig("trn1.2xlarge")). The table below shows the supported targets and the agents/backends available for each.

| Hardware target | Optimization agent(s) | Evaluation backend(s) |
|---|---|---|
| AWS Trainium | built:trn1-nki1 (Trainium 1, NKI v1)<br>built:trn2-nki1 (Trainium 2, NKI v1)<br>built:trn2-nki2 (Trainium 2, NKI v2) | trn ([trn_setup.md](autocomp/backend/trn/trn_setup.md)) |
| Google TPU | built:tpu-v6e (TPU v6e)<br>built:tpu-v5e (TPU v5e / v5litepod) | tpu ([tpu_setup.md](autocomp/backend/tpu/tpu_setup.md))<br>jaxbench ([jaxbench_setup.md](autocomp/backend/jaxbench/jaxbench_setup.md)) |
| Gemmini | gemmini | gemmini ([gemmini_setup.md](autocomp/backend/gemmini/gemmini_setup.md)) |
| NVIDIA GPU | cuda | kernelbench ([kb_setup.md](autocomp/backend/kernelbench/kb_setup.md))<br>gpumode ([gpumode_setup.md](autocomp/backend/gpumode/gpumode_setup.md)) |
| Saturn (RVV) | built:saturn-rvv | saturn ([saturn_setup.md](autocomp/backend/saturn/saturn_setup.md))<br>xnnpack ([xnnpack_setup.md](autocomp/backend/xnnpack/xnnpack_setup.md)) |
| Apple Metal | built:metal-m2 (Apple M2) | metal ([metal_setup.md](autocomp/backend/metal/metal_setup.md)) |
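As a rough illustration of how the table pairs agents with backends, the sketch below encodes each row as a dictionary entry and checks that an `agent_name` and `backend_name` come from the same row. The names `SUPPORTED_TARGETS` and `check_pairing` are hypothetical and not part of Autocomp. The sketch also assumes that any agent in a row can be combined with any backend in that row, which the table suggests but does not state.

```python
# Hypothetical helper: a lookup transcribed from the table above.
# Nothing here is Autocomp's own API.
SUPPORTED_TARGETS = {
    "AWS Trainium": {
        "agents": ["built:trn1-nki1", "built:trn2-nki1", "built:trn2-nki2"],
        "backends": ["trn"],
    },
    "Google TPU": {
        "agents": ["built:tpu-v6e", "built:tpu-v5e"],
        "backends": ["tpu", "jaxbench"],
    },
    "Gemmini": {"agents": ["gemmini"], "backends": ["gemmini"]},
    "NVIDIA GPU": {"agents": ["cuda"], "backends": ["kernelbench", "gpumode"]},
    "Saturn (RVV)": {
        "agents": ["built:saturn-rvv"],
        "backends": ["saturn", "xnnpack"],
    },
    "Apple Metal": {"agents": ["built:metal-m2"], "backends": ["metal"]},
}


def check_pairing(agent_name: str, backend_name: str) -> str:
    """Return the hardware target whose row lists both names.

    Assumption: agents and backends within one row are interchangeable.
    Raises ValueError when no single row contains both names.
    """
    for target, row in SUPPORTED_TARGETS.items():
        if agent_name in row["agents"] and backend_name in row["backends"]:
            return target
    raise ValueError(
        f"agent {agent_name!r} and backend {backend_name!r} "
        "do not appear in the same row of the table"
    )
```

Calling `check_pairing("built:trn1-nki1", "trn")` matches the Trainium row used in the Quick Start, while a mismatch such as `check_pairing("cuda", "trn")` raises before any search is launched.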

Partially supported hardware targets:

RISC-V Vector (RVV) on Canaan Kendryte K230. See the k230 branch for code. As the implementation is very hacky, we do not currently recommend using this hardware target.

For instructions on adding full codebase support for a new hardware target (eval backend, config class, etc.), see [ADDING_HARDWARE_SUPPORT.md](ADDING_HARDWARE_SUPPORT.md).

🧠 Optimization Agents

Optimization agents decide what transformations to try and how to implement them. In run_search.py, this is controlled by agent_name. Each agent is designed for a specific hardware target — see the table above for the right agent for each target. We recommend using the Agent Builder as the fastest way to set up a complete agent from your hardware's documentation.

🏗️ Agent Builder

Want to create a new agent? The [Agent Builder](autocomp/agent_builder/README.md) automatically generates hardware-specific optimization agents from documentation sources such as local directories, PDFs, and webpages. Built agents are stored in autocomp/agent_builder/.built/ and selected with agent_name = "built:". Legacy handcrafted agents in autocomp/agents/ (e.g., gemmini, cuda) are also available for some targets.

pip install "autocomp[agent-builder]"

python -m autocomp.agent_builder.run_agent_builder \
--agent-name my_accelerator \
--source-dir path/to/docs \
--agent-scope "Optimizing kernels for MyAccelerator using the XYZ programming interface."

For detailed usage, CLI options, Python API, and output format, see the [Agent Builder documentation](autocomp/agent_builder/README.md).
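To make the `built:` naming convention concrete, the sketch below separates built agents from legacy handcrafted ones based only on the prefix. The function `describe_agent` is a hypothetical illustration, not Autocomp's loader. The two base directories are the ones named above, and the layout inside them is deliberately left unspecified.

```python
# Hypothetical helper: classifies an agent_name by its prefix.
# This is not Autocomp's actual agent-loading code.
BUILT_PREFIX = "built:"
BUILT_DIR = "autocomp/agent_builder/.built/"  # location stated in the README
LEGACY_DIR = "autocomp/agents/"               # location stated in the README


def describe_agent(agent_name: str) -> tuple[str, str, str]:
    """Return (kind, bare_name, base_directory) for an agent_name string."""
    if agent_name.startswith(BUILT_PREFIX):
        bare_name = agent_name[len(BUILT_PREFIX):]
        if not bare_name:
            raise ValueError("expected an agent name after 'built:'")
        return ("built", bare_name, BUILT_DIR)
    if not agent_name:
        raise ValueError("agent_name must not be empty")
    # Anything without the prefix is treated as a legacy handcrafted agent.
    return ("legacy", agent_name, LEGACY_DIR)
```

For example, `describe_agent("built:trn1-nki1")` yields the `built` kind with the bare name `trn1-nki1`, and `describe_agent("cuda")` is classified as `legacy`.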

LLM Setup

Autocomp supports both local and remote endpoint LLM inference. For local inference, we support vLLM's OpenAI-compatible server. For endpoint inference, we support a variety of providers (see below).

Local Inference with vLLM

1. Install and launch vLLM:

pip install vllm
vllm serve --model Qwen/Qwen3-8B --port 8000 -tp

2. Configure Autocomp: Set models/code_models in run_search.py:

models = ["vllm::Qwen/Qwen3-8B"]

Optionally set VLLM_API_BASE if using a different host/port (default: http://localhost:8000/v1).

3. Multiple models on different ports: You can serve multiple vLLM models on separate ports and use them together by encoding the base URL in the provider string with the format vllm@:::

# Terminal 1
vllm serve --model Qwen/Qwen3-8B --port 8000 -tp 1
# Terminal 2
vllm serve --model meta-llama/Llama-3-70B --port 8001 -tp 4

models = [
"vllm@http://localhost:8000/v1::Qwen/Qwen3-8B",
"vllm@http://localhost:8001/v1::meta-llama/Llama-3-70B",
]
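The model strings in the steps above follow a `provider::model` shape, optionally with `@base_url` attached to the provider. The sketch below shows one way such strings could be split, and how a vLLM base URL might be resolved. `ModelSpec`, `parse_model_spec`, and `vllm_base_url` are hypothetical names, and the precedence (URL in the string, then `VLLM_API_BASE`, then the documented default) is an assumption. Autocomp's own parsing is not shown in this excerpt.

```python
import os
from typing import Mapping, NamedTuple, Optional

# Default stated in the README for the local vLLM server.
DEFAULT_VLLM_API_BASE = "http://localhost:8000/v1"


class ModelSpec(NamedTuple):
    provider: str
    base_url: Optional[str]  # None when the string carries no "@base_url"
    model: str


def parse_model_spec(spec: str) -> ModelSpec:
    """Split 'provider[@base_url]::model' into its parts (hypothetical helper)."""
    # Split on the first "::"; a URL's "://" does not contain "::".
    head, sep, model = spec.partition("::")
    if not sep or not head or not model:
        raise ValueError(f"expected 'provider::model', got {spec!r}")
    provider, _, base_url = head.partition("@")
    if not provider:
        raise ValueError(f"missing provider in {spec!r}")
    return ModelSpec(provider, base_url or None, model)


def vllm_base_url(spec: ModelSpec, env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a base URL. Assumed precedence: spec, then VLLM_API_BASE, then default."""
    env = os.environ if env is None else env
    return spec.base_url or env.get("VLLM_API_BASE") or DEFAULT_VLLM_API_BASE
```

With this sketch, `parse_model_spec("vllm@http://localhost:8001/v1::meta-llama/Llama-3-70B")` keeps the per-model URL, while `parse_model_spec("vllm::Qwen/Qwen3-8B")` leaves `base_url` unset so that the environment variable or the default applies.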

For more details, see the vLLM documentation.

LLM Endpoint Setup

API keys can be configured via environment...

Excerpt shown — open the source for the full document.

Notability

notability 2.0/10

Routine fork by same org