Repo · NVIDIA · published Aug 27, 2025

NVIDIA/fleet-intelligence-agent

Go


Description: NVIDIA Fleet Intelligence Agent - Host agent for GPU telemetry collection and attestation

Language: Go

License: Apache-2.0

Stars: 38

Forks: 10

Open issues: 0

Created: 2025-08-27T22:10:50Z

Pushed: 2026-06-11T18:00:50Z

Default branch: main

Fork: no

Archived: no

README:

NVIDIA Fleet Intelligence Agent

NVIDIA Fleet Intelligence Agent - Host agent for GPU telemetry collection and attestation.

Built on top of leptonai/gpud

Overview

What It Monitors:

  • GPU Metrics: Power, temperature, clocks, utilization, memory, Xid events
  • System Metrics: CPU, memory, disk, network usage
  • Infrastructure: NVIDIA drivers, CUDA runtime, InfiniBand, containers

Export Formats:

  • HTTP API Server: Serves data via REST endpoints (JSON) and Prometheus metrics (/metrics)
  • File Export (Offline Mode): Writes data to local files in CSV or JSON format
  • Remote Export: Sends telemetry data to OpenTelemetry-compatible endpoints via OTLP over HTTP

Key Features:

  • Lightweight: <500MB RAM, <1% CPU usage
  • Non-intrusive: Read-only operations, no system modifications
  • Production-ready: Designed for continuous (24/7) datacenter operation

Supported Platforms

| OS Family | Supported Versions | Architecture | GPU |
|-----------|--------------------|--------------|-----|
| Ubuntu | 22.04, 24.04 | x86_64, ARM64 | Ampere, Ada Lovelace, Hopper, Blackwell, Rubin |
| RHEL | 8, 9, 10 | x86_64, ARM64 | Ampere, Ada Lovelace, Hopper, Blackwell, Rubin |
| Rocky Linux | 8, 9, 10 | x86_64, ARM64 | Ampere, Ada Lovelace, Hopper, Blackwell, Rubin |
| AlmaLinux | 8, 9, 10 | x86_64, ARM64 | Ampere, Ada Lovelace, Hopper, Blackwell, Rubin |
| Amazon Linux | 2023 | x86_64, ARM64 | Ampere, Ada Lovelace, Hopper, Blackwell, Rubin |
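For fleet tooling that needs to pre-check hosts before installation, the matrix above can be encoded as a small lookup. This is an illustrative sketch only: the names `supportedVersions`, `supportedArchs`, and `isSupported` are hypothetical and are not part of the agent, and the caller is assumed to pass the version in the same form the table uses (for example a major version such as `9` for RHEL).

```go
package main

import "fmt"

// supportedVersions mirrors the OS rows of the support table.
var supportedVersions = map[string][]string{
	"Ubuntu":       {"22.04", "24.04"},
	"RHEL":         {"8", "9", "10"},
	"Rocky Linux":  {"8", "9", "10"},
	"AlmaLinux":    {"8", "9", "10"},
	"Amazon Linux": {"2023"},
}

// supportedArchs is the same for every OS family in the table.
var supportedArchs = []string{"x86_64", "ARM64"}

// contains reports whether s is an element of list.
func contains(list []string, s string) bool {
	for _, item := range list {
		if item == s {
			return true
		}
	}
	return false
}

// isSupported reports whether the OS family, version, and architecture
// all appear in the support table.
func isSupported(osFamily, version, arch string) bool {
	versions, ok := supportedVersions[osFamily]
	return ok && contains(versions, version) && contains(supportedArchs, arch)
}

func main() {
	fmt.Println(isSupported("Ubuntu", "24.04", "ARM64"))  // listed in the table
	fmt.Println(isSupported("Ubuntu", "20.04", "x86_64")) // not listed
}
```

The GPU column is left out of the lookup because it is identical for every row; a real pre-check would also need to detect the GPU generation on the host.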

Documentation

Important: Documentation links are relative to the branch or tag you are viewing. The default GitHub view uses main, which may describe unreleased changes. When installing or upgrading a specific agent version, switch to that version's release tag first.

  • [Helm Installation](docs/install-helm.md) - Kubernetes (Helm) installation and troubleshooting
  • [DEB Installation](docs/install-deb.md) - Ubuntu package install, update, and uninstall
  • [RPM Installation](docs/install-rpm.md) - RHEL/Rocky/Alma/Amazon package install, update, and uninstall
  • [Architecture](docs/architecture.md) - Bare metal and Kubernetes architecture, dependencies, and runtime flow
  • [Usage](docs/usage.md) - Commands, HTTP API, integration, and troubleshooting
  • [Configuration](docs/configuration.md) - Environment variables and service configuration
  • [Development](docs/development.md) - Building from source and contributing
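Because the links above resolve relative to the branch or tag being viewed, scripts that point users at documentation may want to pin a release tag explicitly. The sketch below builds such a URL using GitHub's `blob/<ref>/<path>` layout. It assumes the repository is hosted at `github.com/NVIDIA/fleet-intelligence-agent`; the helper name `docURL` and the tag `v1.0.0` are hypothetical placeholders, so substitute the tag of the release actually installed.

```go
package main

import (
	"fmt"
	"strings"
)

// repoURL is assumed from the repository name; adjust it if the project
// is hosted elsewhere.
const repoURL = "https://github.com/NVIDIA/fleet-intelligence-agent"

// docURL returns a link to docPath as it exists at the given ref
// (a branch name or release tag).
func docURL(ref, docPath string) (string, error) {
	if ref == "" {
		return "", fmt.Errorf("ref must not be empty")
	}
	docPath = strings.TrimPrefix(docPath, "/")
	if docPath == "" {
		return "", fmt.Errorf("docPath must not be empty")
	}
	return fmt.Sprintf("%s/blob/%s/%s", repoURL, ref, docPath), nil
}

func main() {
	// "v1.0.0" is a placeholder tag, not a known release.
	u, err := docURL("v1.0.0", "docs/install-helm.md")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```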

Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup and guidelines.

Related: leptonai/gpud (upstream dependency)

License

Apache License 2.0 - see [LICENSE](LICENSE) for details.
