Fork · Wafer · published Jan 26, 2026 · seen 5d

wafer-ai/aiter

forked from ROCm/aiter

Description: AI Tensor Engine for ROCm

License: MIT

Stars: 0

Forks: 0

Open issues: 0

Created: 2026-01-26T13:32:34Z

Pushed: 2026-01-29T13:26:52Z

Default branch: main

Fork: yes

Parent repository: ROCm/aiter

Archived: no

README:

aiter

AITER is AMD's centralized repository of high-performance AI operators for accelerating AI workloads. It serves as a single, unified place for customers' operator-level requests, so it can match different customers' needs. Developers can focus on the operators and let customers integrate this op collection into their own private, public, or other frameworks.

Summary of features:

  • C++ level API
  • Python level API
  • The underlying kernels can come from Triton, CK, or ASM
  • Not just inference kernels, but also training kernels and GEMM+communication kernels, allowing for workarounds in any kernel-framework combination for any architecture limitation.

Installation

git clone --recursive https://github.com/ROCm/aiter.git
cd aiter
python3 setup.py develop

If you forgot --recursive when cloning, run the following command after cd aiter:

git submodule sync && git submodule update --init --recursive

Triton-based Communication (Iris)

AITER supports GPU-initiated communication using the Iris library. This enables high-performance Triton-based communication primitives like reduce-scatter and all-gather.
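To make the two primitives named above concrete, here is a minimal pure-Python simulation of their semantics. It is only a sketch: ranks are modeled as list entries, the function names are hypothetical, and nothing here calls Iris, Triton, or AITER.

```python
# Simulation of collective semantics only: no GPUs, Iris, or AITER involved.
# per_rank[r] stands for the data held by rank r (hypothetical modeling choice).

def reduce_scatter(per_rank):
    """Sum vectors element-wise across ranks; rank r keeps only shard r."""
    world = len(per_rank)
    n = len(per_rank[0])
    assert n % world == 0, "vector length must divide evenly across ranks"
    shard = n // world
    total = [sum(vec[i] for vec in per_rank) for i in range(n)]
    return [total[r * shard:(r + 1) * shard] for r in range(world)]

def all_gather(shards):
    """Every rank ends up with all shards concatenated in rank order."""
    full = [x for s in shards for x in s]
    return [list(full) for _ in shards]

def all_reduce(per_rank):
    """An all-reduce can be composed as reduce-scatter followed by all-gather."""
    return all_gather(reduce_scatter(per_rank))

inputs = [[1, 2, 3, 4], [10, 20, 30, 40]]  # two ranks, four elements each
print(reduce_scatter(inputs))  # each rank holds half of the summed vector
print(all_reduce(inputs))      # each rank holds the full summed vector
```

Composing the two primitives this way is a common decomposition of all-reduce; whether AITER's kernels use it is not stated in the README.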

Installation

Install with Triton communication support:

# Install AITER with Triton communication dependencies
pip install -e .
pip install -r requirements-triton-comms.txt

For more details, see [docs/triton_comms.md](docs/triton_comms.md).

Run operators supported by aiter

There are a number of op tests; you can run one with, for example:

python3 op_tests/test_layernorm2d.py

| Ops          | Description                                      |
|--------------|--------------------------------------------------|
| ELEMENT WISE | ops: + - * /                                     |
| SIGMOID      | σ(x) = 1 / (1 + e^-x)                            |
| AllREDUCE    | Reduce + Broadcast                               |
| KVCACHE      | W_K W_V                                          |
| MHA          | Multi-Head Attention                             |
| MLA          | Multi-head Latent Attention with KV-Cache layout |
| PA           | Paged Attention                                  |
| FusedMoe     | Mixture of Experts                               |
| QUANT        | BF16/FP16 -> FP8/INT4                            |
| RMSNORM      | Root mean square normalization                   |
| LAYERNORM    | x = (x - u) / (σ² + ϵ)^0.5                       |
| ROPE         | Rotary Position Embedding                        |
| GEMM         | D = αAB + βC                                     |
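The formulas listed for SIGMOID, RMSNORM, LAYERNORM, and GEMM can be checked against a plain-Python reference. The sketch below uses only the standard library and does not call any AITER API. It assumes the standard BLAS convention D = αAB + βC for GEMM, omits the learnable scale and bias that normalization kernels typically apply, and picks an arbitrary default for eps; all function names are hypothetical.

```python
import math

def sigmoid(x):
    """sigma(x) = 1 / (1 + e^-x)"""
    return 1.0 / (1.0 + math.exp(-x))

def layernorm(xs, eps=1e-5):
    """(x - mean) / sqrt(variance + eps), without scale or bias."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / math.sqrt(var + eps) for x in xs]

def rmsnorm(xs, eps=1e-5):
    """x / sqrt(mean(x^2) + eps), without scale."""
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [x / rms for x in xs]

def gemm(alpha, a, b, beta, c):
    """D = alpha * (A @ B) + beta * C for row-major nested lists (assumed convention)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [alpha * sum(a[i][k] * b[k][j] for k in range(inner)) + beta * c[i][j]
         for j in range(cols)]
        for i in range(rows)
    ]

print(sigmoid(0.0))
print(layernorm([1.0, 2.0, 3.0]))
print(rmsnorm([3.0, 4.0]))
print(gemm(2, [[1, 2], [3, 4]], [[1, 0], [0, 1]], 1, [[1, 1], [1, 1]]))
```

A reference like this is useful mainly for comparing against a kernel's output within a tolerance; low-precision kernels will not match it bit for bit.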

Notability

notability 1.0/10

Routine fork of a repo