
NVIDIA/aistore v1.4.5

4.5

Repository: NVIDIA/aistore

Tag: v1.4.5

Published: 2026-05-13T23:04:15Z

Prerelease: no

Release notes: AIStore 4.5 focuses on three major areas: [global rebalance](#global-rebalance) scalability, indexed archive access via the new [shard index](#shard-index), and a [get-batch](#get-batch) flow reordering that substantially reduces memory pressure under load.

The work on global rebalance is one of the headlines. AIS removes the legacy per-object ACK machinery that had become a scalability ceiling for very large rebalances (counting millions of migrated objects), replaces the cleanup behavior that depended on that machinery with a new explicit cleanup mode, and optimizes the lifecycle around data movers, transport endpoints, stage transitions, and peer status queries.

Remote list-objects (R-flow) is another practical driver for this release. Multi-target remote-bucket listing now starts the other (N - 1) targets _before_ the designated target (DT) begins backend listing and page distribution. The same area also gets flow-control cleanup, corrected page accounting, and a dedicated CLI job view for remote-list xactions.

The shard index is a new experimental subsystem for indexed extraction from TAR shards. It lets GET and [get-batch](#get-batch) read files from TAR shards directly using a persisted index instead of scanning the full archive. The 4.5 implementation includes the index format and binary pack/unpack support, persistence in a system bucket, a bucket-scoped indexing xaction, CLI support, read-path integration, tests, and a micro-benchmark.
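As a rough illustration of why an index avoids a full archive scan, the following Python sketch records each file's data offset and size once, then serves reads with a single seek. It uses only the standard `tarfile` module and an uncompressed in-memory TAR. The names `build_index`, `read_member`, and `make_shard` are hypothetical, and the in-memory dictionary is a stand-in: it does not reflect the AIS binary index format or how the index is persisted.

```python
import io
import tarfile


def build_index(shard: bytes) -> dict[str, tuple[int, int]]:
    """Scan the archive once and record (data offset, size) per regular file."""
    index = {}
    with tarfile.open(fileobj=io.BytesIO(shard), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(shard: bytes, index: dict[str, tuple[int, int]], name: str) -> bytes:
    """Read one file through the index: one seek and one read, no scan."""
    offset, size = index[name]
    buf = io.BytesIO(shard)
    buf.seek(offset)
    return buf.read(size)


def make_shard(files: dict[str, bytes]) -> bytes:
    """Example helper: pack files into an uncompressed in-memory TAR."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return out.getvalue()
```

The index is built once per shard; every later read costs a lookup plus a ranged read, independent of how many files the shard holds.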

Streaming get-batch responses now use explicit write deadlines while sending data to the client. This lets AIS detect terminated, stalled, or unreachable clients promptly and abort the request instead of continuing to assemble and transmit a large batch that no client is still reading. The flow was reordered and optimized, work-item cancellation now propagates across senders, and admission control is stricter under load.
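The write-deadline idea can be sketched in a few lines of Python. This is an illustration of the general technique under simple assumptions (a blocking socket and a fixed per-write timeout), not the AIS implementation; `stream_with_deadline` is a hypothetical name.

```python
import socket


def stream_with_deadline(conn: socket.socket, chunks, write_timeout: float) -> int:
    """Send chunks in order and stop as soon as one write misses its deadline.

    Returns the number of chunks fully sent. Raises socket.timeout when the
    peer stops reading, so the caller can abort the rest of the batch
    instead of continuing to assemble it.
    """
    conn.settimeout(write_timeout)  # upper bound for each sendall() call
    sent = 0
    for chunk in chunks:
        conn.sendall(chunk)  # raises socket.timeout if the peer has stalled
        sent += 1
    return sent
```

Because `chunks` can be a generator, a timeout also stops the producer: chunks that were never requested are never assembled.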

[Authentication and access control](#authn) adds support for externally provisioned RSA signing keys, JWKS refresh on cache miss for rotated key pairs, configurable maximum token age, and a more general signing-key configuration. Intra-cluster request validation is also tightened: spoofed caller headers are rejected on the public network, and internal-network checks require a validated Smap entry.
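Two of these behaviors, refresh on cache miss and maximum token age, are easy to picture in isolation. The Python sketch below is a simplified model under stated assumptions: `KeyCache`, `fetch_jwks`, and `within_max_age` are hypothetical names, keys are plain strings, and no signature verification is performed. It is not the AuthN code.

```python
import time
from typing import Callable, Dict, Optional


class KeyCache:
    """Caches verification keys by key ID and refetches once on a miss."""

    def __init__(self, fetch_jwks: Callable[[], Dict[str, str]]):
        self._fetch = fetch_jwks  # hypothetical: returns {kid: public_key}
        self._keys: Dict[str, str] = {}

    def get(self, kid: str) -> Optional[str]:
        key = self._keys.get(kid)
        if key is None:
            # Unknown kid: the issuer may have rotated keys, so refresh once.
            self._keys = dict(self._fetch())
            key = self._keys.get(kid)
        return key


def within_max_age(issued_at: float, max_age_s: float, now: Optional[float] = None) -> bool:
    """Return False for tokens issued more than max_age_s seconds ago."""
    now = time.time() if now is None else now
    return now - issued_at <= max_age_s
```

A production cache would typically also rate-limit refreshes so that tokens with bogus key IDs cannot trigger a fetch per request; that is omitted here.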

CLI and observability gain several user-visible improvements: dynamic ais show cluster cpu and ais show cluster memory views, a new ais performance intra-data view, shard-index commands, better rebalance rendering including cleanup mode, and improved help for force-join / split-brain recovery workflows.

This release preserves backward compatibility. The few additive API fields, configurable behavior changes, and operational migration notes are summarized in [Upgrade Notes](#upgrade-notes).

---

Table of Contents

1. [Global Rebalance](#global-rebalance)
2. [Shard Index](#shard-index)
3. [Get-Batch](#get-batch)
4. [Intra-Cluster Control Plane](#intra-cluster-control-plane)
5. [AuthN](#authn)
6. [Stats and Observability](#stats-and-observability)
7. [Blob Downloader and Prefetch](#blob-downloader-and-prefetch)
8. [CLI](#cli)
9. [Core, Config, and xactions](#core-config-and-xactions)
10. [Python SDK, ETL, and aisloader](#python-sdk-etl-and-aisloader)
11. [Documentation and Website](#documentation-and-website)
12. [Build, CI, and Tools](#build-ci-and-tools)
13. [Upgrade Notes](#upgrade-notes)

---

Global Rebalance

AIStore 4.5 delivers the largest rebalance update in several release cycles. The main theme is replacing per-object state with stage-level coordination and explicit cleanup semantics.

Per-object ACKs removed

Rebalance no longer tracks individual object acknowledgments. The previous mechanism - ACK messages back to the sender, sender-side maps of unacknowledged objects, retransmit and wait loops, and an ACK-driven lazy-delete path - has been removed in favor of stage-level coordination.

In the new model, traversal sends objects without keeping per-object state on the sender, and a post-traverse barrier takes the place of the old wait-for-ACK drain. Intra-cluster transport headers carry a compact opaque payload with the rebalance generation, so receivers can recognize and reject objects that arrive late from a previous run. Object and byte totals reported by ais show rebalance and ais show job now come directly from the transmitted counters rather than from ACK accounting.
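The generation check can be pictured with a small Python model. This is a sketch only: `ObjHeader`, its `rebalance_id` field, and `Receiver` are hypothetical stand-ins for the transport header and its opaque payload, and the simple equality test is an assumption about how a receiver might tell runs apart.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ObjHeader:
    name: str
    rebalance_id: int  # hypothetical field standing in for the opaque payload


@dataclass
class Receiver:
    current_id: int
    accepted: List[str] = field(default_factory=list)
    rejected: int = 0

    def on_receive(self, hdr: ObjHeader) -> bool:
        """Accept only objects stamped with the current rebalance generation."""
        if hdr.rebalance_id != self.current_id:
            self.rejected += 1  # e.g. a late arrival from a previous run
            return False
        self.accepted.append(hdr.name)
        return True
```

The sender keeps no per-object state in this model: it stamps and sends, and the receiver alone decides whether an object belongs to the current run.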

The practical effect is lower memory pressure and simpler lifecycle behavior during large rebalances, where per-object ACK state had become the limiting factor.

Cleanup mode

Removing per-object ACKs also removed the old incidental mechanism that trimmed misplaced source copies after migration. AIStore 4.5 adds an explicit replacement: rebalance cleanup mode.

Cleanup mode walks local mountpaths, identifies misplaced object copies using Smap HRW ownership, and removes a misplaced copy only after verifying that the canonical copy exists in the expected location with matching object identity. Identity checks include size and checksum, and use version / ETag when available.
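For readers unfamiliar with HRW (highest random weight, also called rendezvous hashing), the sketch below shows how ownership and "misplaced" can be derived from nothing but the object name and the list of targets. It is illustrative only: SHA-256 is used for determinism, and the hash function, key layout, and function names are assumptions, not what AIS uses internally.

```python
import hashlib


def hrw_owner(obj_name: str, targets: list[str]) -> str:
    """Rendezvous hashing: the target with the largest hash of
    (target, object) owns the object."""
    def weight(target: str) -> int:
        digest = hashlib.sha256(f"{target}/{obj_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(targets, key=weight)


def misplaced(local_target: str, local_objects: list[str], targets: list[str]) -> list[str]:
    """Objects stored locally whose HRW owner is some other target."""
    return [o for o in local_objects if hrw_owner(o, targets) != local_target]
```

Because ownership is a pure function of the object name and the target set, each target can decide locally which of its copies are misplaced, without consulting per-object state on other nodes.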

CLI:

$ ais start rebalance --cleanup
$ ais start rebalance --cleanup --force

By default, cleanup mode keeps diverged copies. With --force, it can also remove copies that differ from the canonical peer version. This is an advanced operator option.
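The keep-or-remove decision described above can be summarized as a small Python function. This is a sketch under stated assumptions: `Identity`, `same_identity`, and `should_remove` are hypothetical names; comparing versions only when both copies carry one is an assumed reading of "when available"; and treating a missing canonical copy as "never remove", even with force, is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    size: int
    checksum: str
    version: Optional[str] = None  # version / ETag, when available


def same_identity(a: Identity, b: Identity) -> bool:
    if a.size != b.size or a.checksum != b.checksum:
        return False
    # Assumption: compare versions only when both copies carry one.
    if a.version is not None and b.version is not None:
        return a.version == b.version
    return True


def should_remove(misplaced: Identity, canonical: Optional[Identity], force: bool = False) -> bool:
    """Decide whether a misplaced copy may be deleted."""
    if canonical is None:
        return False  # assumption: never delete when no canonical copy exists
    if same_identity(misplaced, canonical):
        return True
    return force  # diverged copy: kept by default, removed only with force
```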

Cleanup mode is intentionally distinct from regular rebalance:

  • it has its own preflight checks;
  • it refuses to start while rebalance or resilver is active;
  • it requires at least two active targets;
  • it bypasses config.Rebalance.Enabled;
  • it uses no data mover, no streams, and no GFN;
  • it skips EC-enabled buckets and busy objects.
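The start conditions in the list above can be read as a preflight function. The Python sketch below models only what the list states; `ClusterState` and `cleanup_preflight` are hypothetical names, and the real preflight may check more than this.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    active_targets: int
    rebalance_running: bool
    resilver_running: bool
    rebalance_enabled: bool  # present only to make the bypass explicit


def cleanup_preflight(state: ClusterState) -> list[str]:
    """Return the reasons cleanup cannot start; an empty list means it can."""
    errors = []
    if state.rebalance_running or state.resilver_running:
        errors.append("rebalance or resilver is active")
    if state.active_targets < 2:
        errors.append("at least two active targets are required")
    # state.rebalance_enabled is deliberately not consulted: cleanup mode
    # runs even when regular rebalance is disabled in the configuration.
    return errors
```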

ais show job and ais show rebalance render cleanup-mode runs with a dedicated view that reports removed objects and bytes rather than migration TX/RX counters.

Lifecycle and transport

A series of lifecycle changes makes rebalance more robust through abort, preempt, renew, and finalization paths: fresh data mover construction per run, safer handling of duplicate transport endpoints after abort, narrower mutex scope in the finalization path, a consistent same-targets predicate across preempt and renew, and corrected stage-reached detection.

One change is operator-visible: rebalance CtlMsg now carries per-stage…
