Repo · NVIDIA · published Nov 22, 2024

NVIDIA/nodewright

Go


Description: A Kubernetes Operator to manage Node OS customizations.

Language: Go

License: Apache-2.0

Stars: 57

Forks: 13

Open issues: 17

Created: 2024-11-22T17:39:43Z

Pushed: 2026-06-09T23:24:48Z

Default branch: main

Fork: no

Archived: no

README:

NodeWright (formerly Skyhook)

![Pipeline Status](https://github.com/NVIDIA/skyhook/actions/workflows/operator-ci.yaml) ![Coverage Status](https://coveralls.io/github/NVIDIA/skyhook) ![Go Report Card](https://goreportcard.com/report/github.com/NVIDIA/nodewright/operator)

NodeWright is a Kubernetes-aware package manager that lets cluster administrators safely and declaratively modify and maintain the underlying hosts at scale.

> Note: NodeWright is being renamed from Skyhook. The Helm chart and operator image are already published under nodewright (see the install command below). CRDs (skyhook.nvidia.com/v1alpha1), the CLI (kubectl skyhook), and the default install namespace (skyhook) still use skyhook for now to avoid breaking existing users. The rename will roll out incrementally.
>
> Distribution change (v0.16.0+): NodeWright is now distributed exclusively through GitHub Container Registry (ghcr.io), for both the container images and the Helm chart (as an OCI artifact). Publication to nvcr.io / the NGC Helm repository (helm.ngc.nvidia.com) is paused and is planned to return in a future release. Existing users installing from NGC need to switch to the OCI install below. See [Distribution: ghcr.io only (for now)](docs/release-process.md#distribution-ghcrio-only-for-now) for the full story.

Why NodeWright?

Managing and updating Kubernetes clusters is challenging. Kubernetes advocates treating compute as disposable, but certain scenarios make this difficult:

  • Updating hosts without re-imaging:
      • Limited excess hardware/capacity for rolling replacements
      • Long node replacement times (for example, hours with some cloud providers)
  • OS image management:
      • Maintain a common base image with workload-specific overlays instead of multiple OS images
  • Workload sensitivity:
      • Some workloads can't be moved, are difficult to move, or take a long time to migrate

What is NodeWright?

NodeWright functions like a package manager but for your entire Kubernetes cluster, with three main components:

1. NodeWright Operator - Manages installing, updating, and removing packages
2. NodeWright Custom Resource - Declarative definitions of changes to apply
3. Packages - The actual modifications you want to implement

Where and When to use NodeWright

NodeWright works in any Kubernetes environment (self-managed, on-prem, cloud) and shines when you need:

  • Kubernetes-aware scheduling that protects important workloads
  • Rolling or simultaneous updates across your cluster
  • Declarative configuration management for host-level changes
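Whether an update behaves as rolling or simultaneous comes down to how many nodes may be interrupted at once. The sketch below shows one way a budget given as a percent of nodes or as a count could translate into a batch size. The `Budget` type, the round-down rule, and the minimum of one node are assumptions made for illustration, not NodeWright's documented behavior.

```go
package main

import "fmt"

// Budget is a hypothetical shape for an interruption budget.
// Set either Percent (1-100) or Count; zero means "unset".
type Budget struct {
	Percent int
	Count   int
}

// Allowed returns how many of totalNodes may be interrupted at once.
// Assumptions: percentages round down, at least one node is always
// allowed so a rollout can make progress, and the result never
// exceeds the number of nodes.
func (b Budget) Allowed(totalNodes int) int {
	if totalNodes <= 0 {
		return 0
	}
	allowed := b.Count
	if b.Percent > 0 {
		allowed = totalNodes * b.Percent / 100 // integer division rounds down
	}
	if allowed < 1 {
		allowed = 1
	}
	if allowed > totalNodes {
		allowed = totalNodes
	}
	return allowed
}

func main() {
	fmt.Println(Budget{Percent: 25}.Allowed(40)) // 10
	fmt.Println(Budget{Percent: 10}.Allowed(7))  // 1 (0 is raised to the minimum)
	fmt.Println(Budget{Count: 3}.Allowed(2))     // 2 (capped at the node count)
}
```

Under these assumptions, a budget of 100 percent yields a simultaneous update, while a small count or percent yields a rolling one.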

Benefits

  • Native Kubernetes integration - Packages are standard Kubernetes resources compatible with GitOps tools like ArgoCD, Helm, and Flux
  • Autoscaling support - Ensures newly created nodes are properly configured before they become schedulable
  • First-class upgrades - Deploys changes with minimal disruption, waiting for running workloads to complete when needed

Key Features

  • Interruption Budget: percent of nodes or count
  • Node Selectors: selectors for which nodes to apply to (node labels)
  • Pod Non Interrupt Labels: labels for pods to never interrupt
  • Package Interrupt: service (containerd, cron, or anything managed by systemd) or reboot
  • Additional Tolerations: extra tolerations added to the packages
  • [Runtime Required](docs/runtime_required.md): requires nodes to join the cluster with a custom taint; the required work is done before that taint is removed.
  • Resource Management: Skyhook uses Kubernetes LimitRange to set default CPU and memory requests/limits for all containers in its namespace. You can override these defaults per-package in your Skyhook CR. Strict validation is enforced: if you set any resource override, you must set all four fields (cpuRequest, cpuLimit, memoryRequest, memoryLimit), and limits must be >= requests. See [docs/resource_management.md](docs/resource_management.md) for details and examples.
  • [Explicit Uninstall](docs/uninstall.md): controlled, explicit uninstall of packages from nodes with uninstall.enabled and uninstall.apply fields, webhook guards, finalizer-driven cleanup on CR deletion, and cancel support.
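The resource override rules above (set all four fields or none, and each limit must be >= its request) can be sketched as a small validation function. This is a minimal illustration, not the operator's actual webhook code: the `ResourceOverride` type and `ptr` helper are hypothetical, and plain integers (millicores and bytes) stand in for Kubernetes quantity strings to keep the example dependency-free.

```go
package main

import (
	"errors"
	"fmt"
)

// ResourceOverride is a hypothetical stand-in for a per-package override.
// Field names mirror the four fields named in the README. Using integers
// instead of Kubernetes quantity strings is an assumption of this sketch.
type ResourceOverride struct {
	CPURequest    *int64 // millicores
	CPULimit      *int64 // millicores
	MemoryRequest *int64 // bytes
	MemoryLimit   *int64 // bytes
}

// Validate applies the two stated rules: all four fields or none,
// and each limit must be >= its request.
func (r ResourceOverride) Validate() error {
	fields := []*int64{r.CPURequest, r.CPULimit, r.MemoryRequest, r.MemoryLimit}
	set := 0
	for _, f := range fields {
		if f != nil {
			set++
		}
	}
	switch {
	case set == 0:
		return nil // no override: the namespace defaults apply
	case set != len(fields):
		return errors.New("set all four of cpuRequest, cpuLimit, memoryRequest, memoryLimit, or none")
	case *r.CPULimit < *r.CPURequest:
		return errors.New("cpuLimit must be >= cpuRequest")
	case *r.MemoryLimit < *r.MemoryRequest:
		return errors.New("memoryLimit must be >= memoryRequest")
	}
	return nil
}

// ptr is a small helper for building optional fields.
func ptr(v int64) *int64 { return &v }

func main() {
	partial := ResourceOverride{CPURequest: ptr(100)}
	fmt.Println("partial:", partial.Validate())

	full := ResourceOverride{
		CPURequest: ptr(100), CPULimit: ptr(500),
		MemoryRequest: ptr(64 << 20), MemoryLimit: ptr(128 << 20),
	}
	fmt.Println("full:", full.Validate())
}
```

Here the partial override is rejected and the full one passes. Rejecting partial overrides avoids mixing explicit values with defaults in the same container spec, which could otherwise produce a limit below a request.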

Pre-built Packages

A few pre-built, general-purpose packages are available at NVIDIA/skyhook-packages.

Installation via Helm

Install NodeWright quickly using Helm without downloading the repository:

Prerequisites

  • Kubernetes cluster (tested on v1.30+)
  • Helm 3.8+ installed (needed for native OCI chart support)
  • Container registry access credentials (if using private registries)

Install NodeWright

# The chart is distributed as an OCI artifact on GitHub Container Registry.
# Helm 3.8+ supports OCI natively — no `helm repo add` needed.
helm install nodewright oci://ghcr.io/nvidia/nodewright/charts/nodewright \
    --version v0.16.1 \
    --namespace skyhook \
    --create-namespace

> Where things live: chart at oci://ghcr.io/nvidia/nodewright/charts/nodewright, operator image at ghcr.io/nvidia/nodewright/operator, agent image at ghcr.io/nvidia/skyhook/agent (agent path migration to nodewright is pending). NGC / nvcr.io distribution is paused; see [docs/release-process.md#distribution-ghcrio-only-for-now](docs/release-process.md#distribution-ghcrio-only-for-now).
>
> Migrating from `helm repo add skyhook https://helm.ngc.nvidia.com/...`? Run helm repo remove skyhook and use the OCI install above. If you also want to keep the existing in-cluster release name (e.g. skyhook), substitute it for nodewright in the helm install command; the chart works either way.

Configure Image Pull Secrets (if needed)

If you're using private container registries, create the necessary secrets:

kubectl create secret generic node-init-secret \
--from-file=.dockerconfigjson=${HOME}/.docker/config.json \
--type=kubernetes.io/dockerconfigjson \…
