{"schema_version":"onlylabs.public_analysis_evidence.v1","title":"Wafer analysis evidence pack","description":"Public onlylabs evidence pack for cited agent analysis: captured pages, ranked public signals, and stored web-search provenance used by the background analysis workflow.","url":"https://onlylabs.fyi/labs/wafer","json_url":"https://onlylabs.fyi/analysis/wafer/evidence.json","generated_at":"2026-06-11T18:06:26.904Z","org":{"slug":"wafer","name":"Wafer","category":"neocloud","category_label":"Neocloud","dossier_url":"https://onlylabs.fyi/labs/wafer"},"analysis":null,"workflow":{"version":"onlylabs-deepagents-analysis-v3","provider":null,"model":null,"agent":null,"public_pack_mode":"local-pages-and-events","live_web_fetches":false,"note":"Public evidence exports do not trigger live Exa calls; stored Exa provenance is included when analysis metadata contains it."},"stats":{"pages":9,"events":9,"web":0,"evidence":18,"signal_desks":{"hiring":0,"forks":3,"releases":0,"talking":0,"repos":6},"data_radar_lanes":null,"data_radar_matches":null,"stored_analysis_evidence":null,"stored_analysis_web":null,"stored_analysis_signal_desks":null,"stored_analysis_data_radar_lanes":null,"stored_analysis_data_radar_matches":null},"stored_web_provenance":null,"evidence":[{"ref":"P1","kind":"page","title":"wafer-ai/chipbenchmark repository metadata","date":"2026-06-11T04:08:54.637818+00:00","date_source":null,"source_url":"https://github.com/wafer-ai/chipbenchmark","signal_url":null,"signal_json_url":null,"text":"# wafer-ai/chipbenchmark\n\nDescription: a platform for monitoring the chip situation\n\nLanguage: Shell\n\nStars: 17\n\nForks: 3\n\nOpen issues: 0\n\nCreated: 2025-07-13T04:17:39Z\n\nPushed: 2025-07-19T16:27:31Z\n\nDefault branch: main\n\nFork: no\n\nArchived: no\n\nREADME:\n# Chip Benchmark\n\nA platform for visualizing chip benchmark results.\n\n![Chip Benchmark Banner](assets/banner.png)\n\n## Quick Start\n\n```bash\ngit clone <repository-url>\ncd chipbenchmark/frontend\nyarn install\nyarn dev\n```\n\nVisit [http://localhost:3000](http://localhost:3000)\n\n## Adding Data\n\nPut your benchmark data in `benchmarks/` then run:\n\n```bash\nnode scripts/sync-benchmark-data.mjs\n```"},{"ref":"P2","kind":"page","title":"wafer-ai/gpu-perf-engineering-resources repository metadata","date":"2026-06-11T03:02:57.807675+00:00","date_source":null,"source_url":"https://github.com/wafer-ai/gpu-perf-engineering-resources","signal_url":null,"signal_json_url":null,"text":"# wafer-ai/gpu-perf-engineering-resources\n\nDescription: A curriculum for learning about gpu performance engineering, from scratch to what the frontier AI labs do\n\nStars: 819\n\nForks: 98\n\nOpen issues: 0\n\nCreated: 2026-01-12T00:47:24Z\n\nPushed: 2026-04-27T21:27:59Z\n\nDefault branch: main\n\nFork: no\n\nArchived: no\n\nREADME:\n<p align=\"center\">\n<img src=\"cover.avif\" alt=\"Performance Engineering for AI Infra\" width=\"100%\">\n</p>\n\n# Learning Guide: Performance Engineering for AI Infra\n\n## Purpose\n\nThe purpose of this guide is to help engineers learn GPU kernel programming and optimization, with a focus on high-performance AI systems. It covers the full journey from fundamentals to production deployment, balancing foundational concepts with cutting-edge techniques.\n\nIf you're interested in GPU performance engineering - [we're hiring at Wafer](https://wafer.ai).\n\n## How to read\n\nRecommended reading order:\n\n1. Read \"Tier 1\" for all topics\n2. Read \"Tier 2\" for all topics\n3. Etc\n\n## Table of contents\n\n- [Fundamentals](#fundamentals)\n- [Introduction to GPU programming](#introduction-to-gpu-programming)\n- [Architecture deep dives](#architecture-deep-dives)\n- [Low-level details](#low-level-details)\n- [Matrix Multiplication](#matrix-multiplication)\n- [Essential tutorials](#essential-tutorials)\n- [Advanced implementations](#advanced-implementations)\n- [cuBLAS internals](#cublas-internals)\n- [Tensor Cores & Mixed Precision](#tensor-cores--mixed-precision)\n- [Tensor core fundamentals](#tensor-core-fundamentals)\n- [Precision formats](#precision-formats)\n- [Blackwell-specific](#blackwell-specific)\n- [Attention & Memory-Bound Kernels](#attention--memory-bound-kernels)\n- [FlashAttention](#flashattention)\n- [PagedAttention & serving](#pagedattention--serving)\n- [KV cache optimization](#kv-cache-optimization)\n- [Compiler & DSL Approaches](#compiler--dsl-approaches)\n- [Triton](#triton)\n- [CUTLASS & CuTe](#cutlass--cute)\n- [Other DSLs](#other-dsls)\n- [Profiling & Optimization](#profiling--optimization)\n- [NVIDIA tools](#nvidia-tools)\n- [Optimization techniques](#optimization-techniques)\n- [Advanced topics](#advanced-topics)\n- [AMD & Alternative Hardware](#amd--alternative-hardware)\n- [ROCm funda"},{"ref":"P3","kind":"page","title":"wafer-ai/HIP-Benchmarks-Results repository metadata","date":"2026-06-11T02:53:18.781532+00:00","date_source":null,"source_url":"https://github.com/wafer-ai/HIP-Benchmarks-Results","signal_url":null,"signal_json_url":null,"text":"# wafer-ai/HIP-Benchmarks-Results\n\nDescription: Traces and Kernels of our LLM generated HIP benchmarks\n\nLanguage: Python\n\nStars: 2\n\nForks: 1\n\nOpen issues: 0\n\nCreated: 2026-01-23T20:25:53Z\n\nPushed: 2026-01-23T21:33:27Z\n\nDefault branch: main\n\nFork: no\n\nArchived: no\n\nREADME: none published or not readable through the GitHub API."},{"ref":"P4","kind":"page","title":"wafer-ai/skills repository metadata","date":"2026-06-11T02:53:18.307183+00:00","date_source":null,"source_url":"https://github.com/wafer-ai/skills","signal_url":null,"signal_json_url":null,"text":"# wafer-ai/skills\n\nStars: 1\n\nForks: 0\n\nOpen issues: 0\n\nCreated: 2026-01-24T04:50:24Z\n\nPushed: 2026-01-24T04:51:12Z\n\nDefault branch: main\n\nFork: no\n\nArchived: no\n\nREADME: none published or not readable through the GitHub API."},{"ref":"P5","kind":"page","title":"wafer-ai/kernel-arena repository metadata","date":"2026-06-11T02:52:08.643357+00:00","date_source":null,"source_url":"https://github.com/wafer-ai/kernel-arena","signal_url":null,"signal_json_url":null,"text":"# wafer-ai/kernel-arena\n\nDescription: Public benchmark results from Kernel Arena, a leaderboard for LLM-generated AI accelerator kernels.\n\nLanguage: Python\n\nStars: 19\n\nForks: 0\n\nOpen issues: 0\n\nCreated: 2026-03-10T21:23:03Z\n\nPushed: 2026-03-11T16:41:33Z\n\nDefault branch: main\n\nFork: no\n\nArchived: no\n\nREADME:\n# Kernel Arena — Benchmark Results\n\nPublic benchmark results from [Kernel Arena](https://kernelarena.ai/eval), a leaderboard for LLM-generated AI accelerator kernels.\n\n## Benchmark Suites\n\n| Suite | Hardware | Tasks | Models | Reference |\n| --- | --- | --- | --- | --- |\n| [WaferBench NVFP4](waferbench-nvfp4-b200/) | NVIDIA B200 (CUDA 12.8) | 6 fused NVFP4 inference kernels | GPT-5.4, Claude-4.6-Opus, Composer-1.5, Gemini-3.1-Pro | FlashInfer 0.2.6.post1 |\n| [KernelBench HIP](kernelbench-hip-mi300x/) | AMD MI300X (ROCm 7.0) | 41 kernels across 4 difficulty levels | 11 models from Anthropic, OpenAI, Google, xAI, Moonshot, Z.AI | PyTorch (torch.allclose) |\n\n## Links\n\n- **Leaderboard:** [kernelarena.ai/eval](https://kernelarena.ai/eval)\n- **Methodology:** [kernelarena.ai/methodology](https://kernelarena.ai/methodology)\n- **Reward Hacking Catalog:** [kernelarena.ai/resources](https://kernelarena.ai/resources)"},{"ref":"P6","kind":"page","title":"wafer-ai/composable_kernel repository metadata","date":"2026-06-11T02:43:18.706112+00:00","date_source":null,"source_url":"https://github.com/wafer-ai/composable_kernel","signal_url":null,"signal_json_url":null,"text":"# wafer-ai/composable_kernel\n\nDescription: Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators\n\nLicense: NOASSERTION\n\nStars: 0\n\nForks: 0\n\nOpen issues: 0\n\nCreated: 2026-01-22T05:04:34Z\n\nPushed: 2026-01-22T05:04:59Z\n\nDefault branch: develop\n\nFork: yes\n\nParent repository: ROCm/composable_kernel\n\nArchived: no\n\nREADME:\n# Composable Kernel\n\n> [!NOTE]\n> The published documentation is available at [Composable Kernel](https://rocm.docs.amd.com/projects/composable_kernel/en/latest/) in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the `docs` folder of this repository. As with all ROCm projects, the documentation is open source. For more information on contributing to the documentation, see [Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html).\n\nThe Composable Kernel (CK) library provides a programming model for writing performance-critical\nkernels for machine learning workloads across multiple architectures (GPUs, CPUs, etc.). The CK library\nuses general purpose kernel languages, such as HIP C++.\n\nCK uses two concepts to achieve performance portability and code maintainability:\n\n* A tile-based programming model\n* Algorithm complexity reduction for complex machine learning (ML) operators. This uses an innovative\ntechnique called *Tensor Coordinate Transformation*.\n\n![ALT](/docs/data/ck_component.png \"CK Components\")\n\nThe current CK library is structured into four layers:\n\n* Templated Tile Operators\n* Templated Kernel and Invoker\n* Instantiated Kernel and Invoker\n* Client API\n\n![ALT](/docs/data/ck_layer.png \"CK Layers\")\n\n## General information\n\n* [CK supported operations](include/ck/README.md)\n* [CK Tile supported operations](include/ck_tile/README.md)\n* [CK wrapper](client_example/25_wrapper/README.md)\n* [CK codegen](codegen/README.md)\n* [CK profiler](profiler/README.md)\n* [Examples (Custom use of CK supported operations)](example/README.md)\n* [Client examples (Use of CK supported operations with instance factory)](client_example/README.md)\n* [Terminology](/TERMINOLOGY.md)\n* [Contributors](/CONTRIBUTORS.md)\n\nCK is rel"},{"ref":"P7","kind":"page","title":"wafer-ai/modular repository metadata","date":"2026-06-11T02:43:18.476098+00:00","date_source":null,"source_url":"https://github.com/wafer-ai/modular","signal_url":null,"signal_json_url":null,"text":"# wafer-ai/modular\n\nDescription: The Modular Platform (includes MAX & Mojo)\n\nLicense: NOASSERTION\n\nStars: 0\n\nForks: 0\n\nOpen issues: 0\n\nCreated: 2026-01-22T05:04:35Z\n\nPushed: 2026-01-22T05:22:42Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: modular/modular\n\nArchived: no\n\nREADME:\n<div align=\"center\">\n<img src=\"https://modular-assets.s3.amazonaws.com/images/modular_github_logo_bg.png\">\n\n[About Modular] | [Get started] | [API docs] | [Contributing] | [Changelog]\n</div>\n\n[About Modular]: https://www.modular.com/\n[Get started]: https://docs.modular.com/max/get-started\n[API docs]: https://docs.modular.com/max/api\n[Contributing]: ./CONTRIBUTING.md\n[Changelog]: https://docs.modular.com/max/changelog\n\n---\n[Join us next Thursday, December 11th][dec-meetup] at Modular's Los Altos\noffices for a [Modular Meetup][meetup-group] going inside the MAX platform!\n\n# Modular Platform\n\n> A unified platform for AI development and deployment, including **MAX**🧑‍🚀 and\n**Mojo**🔥.\n\nThe Modular Platform is an open and fully-integrated suite of AI libraries\nand tools that accelerates model serving and scales GenAI deployments. It\nabstracts away hardware complexity so you can run the most popular open\nmodels with industry-leading GPU and CPU performance without any code changes.\n\n![](https://docs.modular.com/images/modular-container-stack.png?20250513)\n\n## Get started\n\nYou don't need to clone this repo.\n\nYou can install Modular as a `pip` or `conda` package and then start an\nOpenAI-compatible endpoint with a model of your choice.\n\nTo get started with the Modular Platform and serve a model using the MAX\nframework, see [the quickstart guide](https://docs.modular.com/max/get-started).\n\n> [!NOTE]\n> **Nightly vs. stable releases**\n> If you cloned the repo and want a stable release, run\n`git checkout modular/vX.X` to match the version.\n> The `main` branch tracks nightly builds, while the `stable` branch matches\nthe latest released version.\n\nAfter your model endpoint is up and running, you can start sending the model\ninference requests using\n[our OpenAI-compatible REST API](https://docs.modular.com/max/api/serve).\n\nTry running hundreds of other models from\n[our model repository](https://b"},{"ref":"P8","kind":"page","title":"wafer-ai/aiter repository metadata","date":"2026-06-11T02:43:18.246267+00:00","date_source":null,"source_url":"https://github.com/wafer-ai/aiter","signal_url":null,"signal_json_url":null,"text":"# wafer-ai/aiter\n\nDescription: AI Tensor Engine for ROCm\n\nLicense: MIT\n\nStars: 0\n\nForks: 0\n\nOpen issues: 0\n\nCreated: 2026-01-26T13:32:34Z\n\nPushed: 2026-01-29T13:26:52Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: ROCm/aiter\n\nArchived: no\n\nREADME:\n# aiter\n![image](https://github.com/user-attachments/assets/9457804f-77cd-44b0-a088-992e4b9971c6)\n\nAITER is AMD’s centralized repository that support various of high performance AI operators for AI workloads acceleration, where a good unified place for all the customer operator-level requests, which can match different customers' needs. Developers can focus on operators, and let the customers integrate this op collection into their own private/public/whatever framework.\n\nSome summary of the features:\n* C++ level API\n* Python level API\n* The underneath kernel could come from triton/ck/asm\n* Not just inference kernels, but also training kernels and GEMM+communication kernels—allowing for workarounds in any kernel-framework combination for any architecture limitation.\n\n## Installation\n```\ngit clone --recursive https://github.com/ROCm/aiter.git\ncd aiter\npython3 setup.py develop\n```\n\nIf you happen to forget the `--recursive` during `clone`, you can use the following command after `cd aiter`\n```\ngit submodule sync && git submodule update --init --recursive\n```\n\n### Triton-based Communication (Iris)\n\nAITER supports GPU-initiated communication using the [Iris library](https://github.com/ROCm/iris). This enables high-performance Triton-based communication primitives like reduce-scatter and all-gather.\n\n**Installation**\n\nInstall with Triton communication support:\n\n```bash\n# Install AITER with Triton communication dependencies\npip install -e .\npip install -r requirements-triton-comms.txt\n```\n\nFor more details, see [docs/triton_comms.md](docs/triton_comms.md).\n\n## Run operators supported by aiter\n\nThere are number of op test, you can run them with: `python3 op_tests/test_layernorm2d.py`\n| **Ops** | **Description** |\n|-------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n|ELEMENT WISE"},{"ref":"P9","kind":"page","title":"wafer-ai/wafer-docs repository metadata","date":"2026-06-11T02:42:16.1924+00:00","date_source":null,"source_url":"https://github.com/wafer-ai/wafer-docs","signal_url":null,"signal_json_url":null,"text":"# wafer-ai/wafer-docs\n\nDescription: Public Mintlify documentation for Wafer\n\nLanguage: MDX\n\nStars: 3\n\nForks: 1\n\nOpen issues: 7\n\nCreated: 2026-05-07T02:27:44Z\n\nPushed: 2026-06-08T19:44:01Z\n\nDefault branch: main\n\nFork: no\n\nArchived: no\n\nREADME:\n# Wafer Docs\n\nPublic documentation for Wafer."},{"ref":"E1","kind":"event","title":"wafer-ai/wafer-docs","date":"2026-05-07T02:27:44+00:00","date_source":"source","source_url":"https://github.com/wafer-ai/wafer-docs","signal_url":"https://onlylabs.fyi/signals/bc0d110a-c900-4621-83ca-9d19c425ee63","signal_json_url":"https://onlylabs.fyi/signals/bc0d110a-c900-4621-83ca-9d19c425ee63/signal.json","text":"repo_new · wafer-ai/wafer-docs · signal_desk=repos · occurred_at=2026-05-07T02:27:44+00:00 · url=https://github.com/wafer-ai/wafer-docs · stars=3 · raw={\"repo\":\"wafer-ai/wafer-docs\",\"description\":\"Public Mintlify documentation for Wafer\",\"language\":\"MDX\"}"},{"ref":"E2","kind":"event","title":"wafer-ai/gpu-perf-engineering-resources","date":"2026-01-12T00:47:24+00:00","date_source":"source","source_url":"https://github.com/wafer-ai/gpu-perf-engineering-resources","signal_url":"https://onlylabs.fyi/signals/98fa4bb3-8d05-48e0-b5da-4f1361d4eae6","signal_json_url":"https://onlylabs.fyi/signals/98fa4bb3-8d05-48e0-b5da-4f1361d4eae6/signal.json","text":"repo_new · wafer-ai/gpu-perf-engineering-resources · signal_desk=repos · occurred_at=2026-01-12T00:47:24+00:00 · url=https://github.com/wafer-ai/gpu-perf-engineering-resources · stars=819 · raw={\"repo\":\"wafer-ai/gpu-perf-engineering-resources\",\"description\":\"A curriculum for learning about gpu performance engineering, from scratch to what the frontier AI labs do\"}"},{"ref":"E3","kind":"event","title":"wafer-ai/kernel-arena","date":"2026-03-10T21:23:03+00:00","date_source":"source","source_url":"https://github.com/wafer-ai/kernel-arena","signal_url":"https://onlylabs.fyi/signals/b981ee71-2193-43bc-9769-4abb9b1f70a1","signal_json_url":"https://onlylabs.fyi/signals/b981ee71-2193-43bc-9769-4abb9b1f70a1/signal.json","text":"repo_new · wafer-ai/kernel-arena · signal_desk=repos · occurred_at=2026-03-10T21:23:03+00:00 · url=https://github.com/wafer-ai/kernel-arena · stars=19 · raw={\"repo\":\"wafer-ai/kernel-arena\",\"description\":\"Public benchmark results from Kernel Arena, a leaderboard for LLM-generated AI accelerator kernels.\",\"language\":\"Python\"}"},{"ref":"E4","kind":"event","title":"wafer-ai/chipbenchmark","date":"2025-07-13T04:17:39+00:00","date_source":"source","source_url":"https://github.com/wafer-ai/chipbenchmark","signal_url":"https://onlylabs.fyi/signals/a492fd1e-fb7e-4137-81d5-b5285e6611a9","signal_json_url":"https://onlylabs.fyi/signals/a492fd1e-fb7e-4137-81d5-b5285e6611a9/signal.json","text":"repo_new · wafer-ai/chipbenchmark · signal_desk=repos · occurred_at=2025-07-13T04:17:39+00:00 · url=https://github.com/wafer-ai/chipbenchmark · stars=17 · raw={\"repo\":\"wafer-ai/chipbenchmark\",\"description\":\"a platform for monitoring the chip situation\",\"language\":\"Shell\"}"},{"ref":"E5","kind":"event","title":"wafer-ai/HIP-Benchmarks-Results","date":"2026-01-23T20:25:53+00:00","date_source":"source","source_url":"https://github.com/wafer-ai/HIP-Benchmarks-Results","signal_url":"https://onlylabs.fyi/signals/e5809722-e737-4514-9028-c20115d3d70a","signal_json_url":"https://onlylabs.fyi/signals/e5809722-e737-4514-9028-c20115d3d70a/signal.json","text":"repo_new · wafer-ai/HIP-Benchmarks-Results · signal_desk=repos · occurred_at=2026-01-23T20:25:53+00:00 · url=https://github.com/wafer-ai/HIP-Benchmarks-Results · stars=2 · raw={\"repo\":\"wafer-ai/HIP-Benchmarks-Results\",\"description\":\"Traces and Kernels of our LLM generated HIP benchmarks\",\"language\":\"Python\"}"},{"ref":"E6","kind":"event","title":"wafer-ai/skills","date":"2026-01-24T04:50:24+00:00","date_source":"source","source_url":"https://github.com/wafer-ai/skills","signal_url":"https://onlylabs.fyi/signals/64650e2b-9906-4c8d-b0f6-3c1683d0e3e6","signal_json_url":"https://onlylabs.fyi/signals/64650e2b-9906-4c8d-b0f6-3c1683d0e3e6/signal.json","text":"repo_new · wafer-ai/skills · signal_desk=repos · occurred_at=2026-01-24T04:50:24+00:00 · url=https://github.com/wafer-ai/skills · stars=1 · raw={\"repo\":\"wafer-ai/skills\"}"},{"ref":"E7","kind":"event","title":"wafer-ai/aiter","date":"2026-01-26T13:32:34+00:00","date_source":"source","source_url":"https://github.com/wafer-ai/aiter","signal_url":"https://onlylabs.fyi/signals/188e7a49-35e7-40e6-b963-fbae4d3f17ed","signal_json_url":"https://onlylabs.fyi/signals/188e7a49-35e7-40e6-b963-fbae4d3f17ed/signal.json","text":"repo_forked · wafer-ai/aiter · signal_desk=forks · occurred_at=2026-01-26T13:32:34+00:00 · url=https://github.com/wafer-ai/aiter · raw={\"repo\":\"wafer-ai/aiter\",\"parent\":\"ROCm/aiter\"}"},{"ref":"E8","kind":"event","title":"wafer-ai/modular","date":"2026-01-22T05:04:35+00:00","date_source":"source","source_url":"https://github.com/wafer-ai/modular","signal_url":"https://onlylabs.fyi/signals/372d0def-c664-4972-88cb-9cf44a923d61","signal_json_url":"https://onlylabs.fyi/signals/372d0def-c664-4972-88cb-9cf44a923d61/signal.json","text":"repo_forked · wafer-ai/modular · signal_desk=forks · occurred_at=2026-01-22T05:04:35+00:00 · url=https://github.com/wafer-ai/modular · raw={\"repo\":\"wafer-ai/modular\",\"parent\":\"modular/modular\"}"},{"ref":"E9","kind":"event","title":"wafer-ai/composable_kernel","date":"2026-01-22T05:04:34+00:00","date_source":"source","source_url":"https://github.com/wafer-ai/composable_kernel","signal_url":"https://onlylabs.fyi/signals/cb271ef1-751d-420a-8e87-c810a7c02055","signal_json_url":"https://onlylabs.fyi/signals/cb271ef1-751d-420a-8e87-c810a7c02055/signal.json","text":"repo_forked · wafer-ai/composable_kernel · signal_desk=forks · occurred_at=2026-01-22T05:04:34+00:00 · url=https://github.com/wafer-ai/composable_kernel · raw={\"repo\":\"wafer-ai/composable_kernel\",\"parent\":\"ROCm/composable_kernel\"}"}]}