Repo: NVIDIA, published Dec 14, 2017

NVIDIA/aistore

Go


Description: AIStore: scalable storage for AI applications

Language: Go

License: MIT

Stars: 1877

Forks: 264

Open issues: 10

Created: 2017-12-14T01:07:30Z

Pushed: 2026-06-11T03:29:58Z

Default branch: main

Fork: no

Archived: no

README: AIStore: High-Performance, Scalable Storage for AI Workloads

AIStore (AIS) is a lightweight distributed storage stack tailored for AI applications. It's an elastic cluster that can grow and shrink at runtime and can be ad-hoc deployed, with or without Kubernetes, anywhere from a single Linux machine to a bare-metal cluster of any size. Built from scratch, AIS provides linear scale-out, consistent performance, and a flexible deployment model.

AIS is a reliable storage cluster that can natively operate on both in-cluster and remote data, without treating either as a cache.

AIS consistently shows balanced I/O distribution and linear scalability across an arbitrary number of clustered nodes. The system supports fast data access, reliability, and rich customization for data transformation workloads.

Features

  • Multi-Cloud Access: Seamlessly access and manage content across multiple [cloud backends](/docs/overview.md#at-a-glance) (including AWS S3, GCS, Azure, and OCI), with fast-tier performance, configurable redundancy, and namespace-aware bucket identity (same-name buckets can coexist across accounts, endpoints, and providers).
  • Deploy Anywhere: AIS runs on any Linux machine, virtual or physical. Deployment options range from a minimal container-based deployment and Google Colab to petascale Kubernetes clusters. There are no built-in limitations on deployment size or functionality.
  • High Availability: Redundant control and data planes. Self-healing, end-to-end protection, n-way mirroring, and erasure coding. Arbitrary number of lightweight access points (AIS proxies).
  • HTTP-based API: A feature-rich native API (with user-friendly SDKs for Go and Python) and a compliant [Amazon S3 API](/docs/s3compat.md) for running unmodified S3 clients.
  • Monitoring: Comprehensive observability with integrated Prometheus metrics, Grafana dashboards, detailed logs with configurable verbosity, and CLI-based performance tracking for complete cluster visibility and troubleshooting. See [AIStore Observability](/docs/monitoring-overview.md) for details.
  • Chunked Objects: High-performance chunked object representation, with independently retrievable chunks, metadata v2, and checksum-protected manifests. Supports rechunking, parallel reads, and seamless integration with [Get-Batch](/docs/get_batch.md), [blob-downloader](/docs/blob_downloader.md), and multipart uploads to supported cloud backends.
  • JWT Authentication and Authorization: [Validates request JWTs](/docs/auth_validation.md) to provide cluster- and bucket-level access control using static keys or dynamic OIDC issuer JWKS lookup.
  • Secure Redirects: Configurable cryptographic signing of redirect URLs using HMAC-SHA256 with a versioned cluster key (distributed via metasync, stored in memory only).
  • Load-Aware Throttling: Dynamic request throttling based on a multi-dimensional load vector (CPU, memory, disk, file descriptors, goroutines) to protect AIS clusters under stress.
  • Unified Namespace: Attach AIS clusters together to provide unified access to datasets across independent clusters, allowing users to reference shared buckets with cluster-specific identifiers.
  • Turn-key Cache: In addition to robust data protection features, AIS offers a per-bucket configurable LRU-based cache with eviction thresholds and storage capacity watermarks.
  • ETL Offload: Execute I/O intensive data transformations [close to the data](/docs/etl.md), either inline (on-the-fly as part of each read request) or offline (batch processing, with the destination bucket populated with transformed results).
  • Get-Batch: Retrieve multiple objects and/or [archived files](/docs/archive.md) with a single call. Designed for ML/AI pipelines, [Get-Batch](/docs/get_batch.md) fetches an entire training batch in one operation, assembling a TAR (or other supported [serialization formats](/docs/archive.md)) that contains all requested items in the exact user-specified order (paper).
  • Data Consistency: Guaranteed [consistency](/docs/terminology.md#read-after-write-consistency) across all gateways, with [write-through](/docs/terminology.md#write-through) semantics in the presence of [remote backends](/docs/terminology.md#backend-provider).
  • Serialization & Sharding: Native, first-class support for TAR, TGZ, TAR.LZ4, and ZIP [archives](/docs/archive.md) for efficient storage and processing of small-file datasets. Features include seamless integration with existing unmodified workflows across all APIs and subsystems.
  • Kubernetes: For production, AIS runs natively on Kubernetes. The dedicated ais-k8s repository includes the AIS K8s Operator, Ansible playbooks, Helm charts, and deployment guidance.
  • Batch Jobs: More than 30 cluster-wide [batch operations](/docs/batch.md) that you can start, monitor, and otherwise control. The list currently includes:
```console
$ ais show job --help

NAME:
archive blob-download cleanup copy-bucket copy-objects delete-objects
download dsort ec-bucket ec-get ec-put ec-resp
elect-primary etl-bucket etl-inline etl-objects evict-objects evict-remote-bucket
get-batch list lru-eviction mirror prefetch-objects promote-files
put-copies rebalance rechunk rename-bucket resilver summary
warm-up-metadata
```
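The Secure Redirects item in the feature list mentions HMAC-SHA256 signing of redirect URLs with a versioned cluster key. The general technique can be sketched in Go as follows. This is a minimal illustration, not the AIS implementation: the `clusterKey` type, the `kv` and `sig` query parameter names, the example URL, and the exact bytes covered by the signature are all assumptions made for this sketch.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
	"strconv"
)

// clusterKey is a hypothetical versioned signing key that lives in memory only.
type clusterKey struct {
	version int
	secret  []byte
}

// mac computes HMAC-SHA256 over the URL path and the query string,
// excluding the signature parameter itself.
func mac(k clusterKey, u *url.URL) []byte {
	q := u.Query() // parsed copy; u itself is not modified
	q.Del("sig")
	h := hmac.New(sha256.New, k.secret)
	h.Write([]byte(u.Path))
	h.Write([]byte{0})          // separator between path and query
	h.Write([]byte(q.Encode())) // Encode sorts by key, so the input is deterministic
	return h.Sum(nil)
}

// signRedirect adds the key version ("kv") and a hex signature ("sig")
// to the redirect URL. Both parameter names are assumptions.
func signRedirect(k clusterKey, raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Del("sig")
	q.Set("kv", strconv.Itoa(k.version))
	u.RawQuery = q.Encode()
	q.Set("sig", hex.EncodeToString(mac(k, u)))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// verifyRedirect looks up the key by version and checks the signature
// with a constant-time comparison.
func verifyRedirect(keys map[int]clusterKey, raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	q := u.Query()
	ver, err := strconv.Atoi(q.Get("kv"))
	if err != nil {
		return false
	}
	k, ok := keys[ver]
	if !ok {
		return false
	}
	got, err := hex.DecodeString(q.Get("sig"))
	if err != nil {
		return false
	}
	return hmac.Equal(got, mac(k, u))
}

func main() {
	k := clusterKey{version: 3, secret: []byte("in-memory-demo-secret")}
	// Hypothetical redirect target; not a documented AIS URL layout.
	signed, err := signRedirect(k, "http://target-1:8081/v1/objects/mybucket/obj.bin?pid=p1")
	if err != nil {
		panic(err)
	}
	keys := map[int]clusterKey{k.version: k}
	fmt.Println("signed:  ", signed)
	fmt.Println("valid:   ", verifyRedirect(keys, signed))
	fmt.Println("tampered:", verifyRedirect(keys, signed+"&x=1"))
}
```

Carrying the key version in the URL lets a verifier pick the matching key during rotation. In this sketch an unmodified signed URL verifies, while a URL with an altered path or query, or an unknown key version, is rejected.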

> The feature set continues to grow and also includes: [native bucket inventory (NBI)](/docs/nbi.md); [blob-downloader](/docs/blob_downloader.md); [AuthN - authentication and authorization server](/docs/authn.md); runtime management of [TLS certificates](/docs/cli/x509.md); full support for [adding/removing nodes at runtime](/docs/lifecycle_node.md); adaptive [rate limiting](/docs/rate_limit.md); and…