
NVIDIA/aicr v0.15.0


v0.15.0

Repository: NVIDIA/aicr

Tag: v0.15.0

Published: 2026-06-15T19:54:42Z

Prerelease: no

Release notes: This release focuses on recipe health scoring, improved deployment validation, improved snapshot/discovery, and extending software supply chain capabilities for enterprise users.

Highlights

Recipe Structural Health - New pkg/health engine computes per-recipe health signals (chart_pinned, constraints_wellformed, declared_coverage) and rolls them up into a recipe-health matrix. aicr recipe list surfaces structural-health columns (with a --no-health opt-out), a tools/health generator and weekly recipe-health-refresh workflow keep the matrix current, and a lint guard now requires healthCheck.assertFile.
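The rollup described above can be pictured with a small sketch. The three signal names come from these notes; the status values (pass, unknown, fail), the "worst status wins" rule, and the function names are assumptions made for illustration and are not taken from pkg/health.

```python
# Signal names are from the release notes. The status vocabulary and the
# "worst status wins" rollup rule are illustrative assumptions.
SIGNALS = ("chart_pinned", "constraints_wellformed", "declared_coverage")
SEVERITY = {"pass": 0, "unknown": 1, "fail": 2}


def rollup(signals: dict[str, str]) -> str:
    """Return the worst status across the signals; a missing signal counts as unknown."""
    statuses = [signals.get(name, "unknown") for name in SIGNALS]
    return max(statuses, key=lambda status: SEVERITY[status])


def health_matrix(recipes: dict[str, dict[str, str]]) -> list[dict[str, str]]:
    """Build one row per recipe: a column per signal plus the rolled-up status."""
    rows = []
    for name in sorted(recipes):
        signals = recipes[name]
        row = {"recipe": name}
        for signal in SIGNALS:
            row[signal] = signals.get(signal, "unknown")
        row["health"] = rollup(signals)
        rows.append(row)
    return rows
```

Treating a missing signal as unknown rather than pass keeps the matrix from overstating health for recipes that have not been evaluated yet.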

Improved Deployment Validation - The chainsaw deployment-phase runner is now an in-process executor rather than a shelled-out binary. aicr validate runs all phases by default with a --fail-fast opt-in, fails closed on evaluator errors, and is nil-safe across health checks.
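A minimal sketch of the run-all-by-default, fail-fast-on-request, fail-closed behavior described above. The function names, phase names, and result strings are hypothetical; this is not the aicr implementation, only one way such a runner could behave.

```python
from typing import Callable, Optional


def run_phases(
    phases: list[tuple[str, Callable[[], Optional[bool]]]],
    fail_fast: bool = False,
) -> dict[str, str]:
    """Run every phase in order; stop at the first non-pass only when fail_fast is set."""
    results: dict[str, str] = {}
    for name, evaluate in phases:
        try:
            passed = evaluate()
        except Exception:
            # Fail closed: an evaluator error is recorded as a failure,
            # never as a pass or a silent skip.
            results[name] = "error"
            passed = False
        else:
            # A None result (no verdict) is treated as a failure too.
            results[name] = "pass" if passed else "fail"
        if not passed and fail_fast:
            break
    return results


def overall_ok(results: dict[str, str], expected: list[str]) -> bool:
    """Succeed only if every expected phase ran and passed."""
    return all(results.get(name) == "pass" for name in expected)
```

Checking against the list of expected phases means a phase that never ran (for example after a fail-fast stop) cannot be mistaken for a success.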

Snapshot/Discovery - The collector now discovers GPU SKUs without nvidia-smi, removing the CUDA base image dependency and matching SKUs on token boundaries instead of substrings.
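To illustrate why token-boundary matching matters, the sketch below contrasts it with substring matching. The function names and the product-name strings are hypothetical, and the tokenization rule (split on non-alphanumeric characters) is an assumption, not the collector's actual logic.

```python
import re
from typing import Optional


def match_sku_substring(product_name: str, skus: list[str]) -> Optional[str]:
    """Naive matching: the first SKU found anywhere inside the name wins."""
    lowered = product_name.lower()
    for sku in skus:
        if sku.lower() in lowered:
            return sku
    return None


def match_sku_tokens(product_name: str, skus: list[str]) -> Optional[str]:
    """Token matching: a SKU must equal a whole token of the name."""
    tokens = set(re.split(r"[^a-z0-9]+", product_name.lower()))
    for sku in skus:
        if sku.lower() in tokens:
            return sku
    return None


known = ["B200", "GB200"]
print(match_sku_substring("NVIDIA GB200", known))  # B200 (wrong: matched inside GB200)
print(match_sku_tokens("NVIDIA GB200", known))     # GB200
```

With substring matching, the result depends on the order of the SKU list whenever one SKU name is contained in another; matching whole tokens removes that ambiguity.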

Closed Supply Chain - Signing and verification now work end-to-end in air-gapped and enterprise environments. aicr bundle supports KMS-backed signing (--signing-key) and private Sigstore deployments (--fulcio-url, --rekor-url); aicr verify --key validates bundles against a KMS or public key; and aicr evidence publish signs recipe evidence off-network. The recipe catalog itself now ships signed provenance for the V1 closed supply chain, and keyless signing warns before publishing identity to the public transparency log.

New Recipes & Overlays

  • A100 training Kubeflow overlay chains for EKS, AKS, GKE COS, and OKE
  • GB300 concrete EKS service-bound overlays
  • OKE GB200 and AKS H100 Dynamo performance checks

CLI & Bundling

  • aicr recipe list subcommand for catalog enumeration
  • Gatekeeper added as an optional component

Inference Performance & Validation

  • Inference-performance validation enhanced and tuned; gated on all worker services Ready
  • nccl-all-reduce-bw gates wired for EKS + H200; GKE NCCL node selector made dynamic
  • Bounded absent-resource retries in deployment-phase health checks
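The bounded absent-resource retries in the last item can be sketched as follows. The exception type, function name, and default attempt count are hypothetical; the notes do not state the actual bound or backoff used by the deployment-phase health checks.

```python
import time
from typing import Callable


class ResourceAbsent(Exception):
    """Raised by a check when the resource it inspects does not exist yet."""


def check_with_bounded_retries(
    check: Callable[[], bool],
    max_attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry only while the resource is absent, and give up after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return bool(check())
        except ResourceAbsent:
            if attempt == max_attempts:
                # The resource never appeared: report failure instead of waiting forever.
                return False
            sleep(delay_seconds)
    return False
```

Only the "absent" condition is retried; a check that runs and returns a negative verdict is reported immediately, so the bound limits waiting without hiding real failures.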

*Thanks to* @atif1996, @cdesiniotis, @dims, @haarchri, @JaydipGabani, @lalitadithya, @lockwobr, @njhensley, @pdmack, @pedjak, @rsd-darshan, @sttts, @xdu31, @yuanchen8911, and @mchmarny.

Changelog

New Features

  • cfb0cb06fbc3156c1b5cd9681f7d0826de0899df: feat(agentgateway): scope inference-gateway LB to allowed source ranges (#1138) (@yuanchen8911)
  • 3b8b1f31fb75a25a7c62d3b0e814d5d4e0da5d95: feat(bundle): KMS-backed signing via --signing-key (#407) (#1205) (@lockwobr)
  • 5316d3c08d9274caadb265c52293fd861378d74a: feat(bundle): private Sigstore via --fulcio-url and --rekor-url (#1158) (@lockwobr)
  • b847f968144f3e0290d67cf1a3828337095319c5: feat(bundler): retry sign.Bundle on transient Sigstore failures (#1251) (@mchmarny)
  • 3c1f525d76741dae40d9996aafd6b02b3224d8e9: feat(bundler): warn on open agentgateway inference-gateway exposure (#1163) (@yuanchen8911)
  • cd30fe5630d06ef8c091fa9586f9fa589aa0bc70: feat(ci): add weekly recipe-health-refresh workflow (#1320) (@njhensley)
  • 5f8647dd3ae68ddb1f5e25eed5d63d302060333f: feat(cli): add --no-health opt-out to recipe list (#1314) (@njhensley)
  • f0490fbf1e575098da54fd0d962768173708f6b4: feat(cli): add --set-json/--set-file for list and object bundle overrides (#1162) (@yuanchen8911)
  • b401339ca6531e3d2d1ac4159cadba6de4097d23: feat(cli): add structural-health columns to recipe list (#1302) (@njhensley)
  • a47a53fa469747da3735137711d2dd3cc6eb9838: feat(cli): warn before keyless signing publishes identity to public log (#1300) (@njhensley)
  • 97934ad1773b203e6f2be25eac91f26b5ca1180e: feat(collector): driver-free GPU SKU discovery; remove nvidia-smi + CUDA base (#1352) (@mchmarny)
  • eb8728d3029112c5dc8f4b3d5f63cf8b333ef0f2: feat(coverage): generated CUJ/CLI coverage matrix (RQ3) (#1316) (@mchmarny)
  • 1674ea4bd49c4bfd6e9551444c57c57d4efed352: feat(evidence): add aicr evidence publish for off-network signing (#1140) (@njhensley)
  • e1b01600fd687ed391d288cea165fe5b52f6d863: feat(health): add tools/health generator and recipe-health matrix (#1304) (@njhensley)
  • 8d0da78521ce2da7e2a628ab1f77665630e8dd44: feat(health): chart_pinned signal + declared_coverage descriptor (#1293) (@mchmarny)
  • fa92b4375bbaf006216f5fe3b33a9b2464149a7b: feat(health): constraints_wellformed signal (parse-only, hermetic) (#1301) (@njhensley)
  • 10b08f1c6c72029071a39d8ae187f97320ee9cfa: feat(health): pkg/health core Compute loop, resolves signal, rollup (#1291) (@mchmarny)
  • 3e2f8230e18cbfbf997a7d3a93edf386b167aaf8: feat(recipe): add AKS H100 Dynamo perf check (#1232) (@yuanchen8911)
  • 8f8bc56a00f9ec00f3c355efdb1e6b598c4dfa1d: feat(recipe): add OKE GB200 perf check (#1233) (@yuanchen8911)
  • ec95e2001498624aa990d908533f1d17e8c72118: feat(recipe): add aicr recipe list subcommand for catalog enumeration (#1208) (@rsd-darshan)
  • 57fbed0f8e55daf5199e429a28309c1765d4c078: feat(recipe): hydrate healthCheck.assertFile + suppression sentinel (#1231) (@mchmarny)
  • ae6819c6c71b14c91ac26b7cdf44eae86f9e6c2b: feat(recipe): lint guard requiring healthCheck.assertFile + allowlist (#1244) (@mchmarny)
  • 0bf2267c1f74cee213decd884f45cec2a5f0bf93: feat(recipe): signed catalog provenance for V1 closed supply chain (#1216) (@mchmarny)
  • 463d6a1997ab40222975d243d2f6fcc16a0c1de8: feat(recipes): add A100 AKS training Kubeflow overlay chain (#1295) (@yuanchen8911)
  • d8d3070292ca6662d6a31a9993531cb5b257fc04: feat(recipes): add A100 EKS training Kubeflow overlay chain (#1305) (@yuanchen8911)
  • fd64dd7a55aae45862198aabcc180d3be25bbbba: feat(recipes): add A100 GKE COS training Kubeflow overlay chain (#1306) (@yuanchen8911)
  • 6eb85ac43ae6030d223ec5a36617c34fd05368c2: feat(recipes): add A100 OKE training Kubeflow overlay chain (#1294) (@yuanchen8911)
  • 4b817ce56c550115a1d319676f8ea3df6ea33721: feat(recipes): add concrete GB300 EKS service-bound overlays (#1319) (@yuanchen8911)
  • cad014233eb63d9c033c4cc1d304239ce282d82e: feat(recipes): backfill chainsaw health checks for 5 missing components (#1243) (@mchmarny)
  • 81daab3b13e607ed98dc9a496b63ad11b0590d26: feat(recipes): deepen 21 chainsaw health checks; close epic #660 (#1245) (@mchmarny)

*...
