NVIDIA/nodewright
Description: A Kubernetes Operator to manage Node OS customizations.
Language: Go
License: Apache-2.0
Stars: 57
Forks: 13
Open issues: 17
Created: 2024-11-22T17:39:43Z
Pushed: 2026-06-09T23:24:48Z
Default branch: main
Fork: no
Archived: no
README:
NodeWright (formerly Skyhook)
  
NodeWright is a Kubernetes-aware package manager that lets cluster administrators safely modify and maintain the underlying hosts declaratively and at scale.
> Note: NodeWright is being renamed from Skyhook. The Helm chart and operator image are already published under nodewright (see the install command below). CRDs (skyhook.nvidia.com/v1alpha1), the CLI (kubectl skyhook), and the default install namespace (skyhook) still use skyhook for now to avoid breaking existing users. The rename will roll out incrementally.
>
> Distribution change (v0.16.0+): NodeWright is now distributed exclusively through GitHub Container Registry (ghcr.io), for both the container images and the Helm chart (as an OCI artifact). Publication to nvcr.io / the NGC Helm repository (helm.ngc.nvidia.com) is paused and is planned to return in a future release. Existing users installing from NGC need to switch to the OCI install below. See [Distribution: ghcr.io only (for now)](docs/release-process.md#distribution-ghcrio-only-for-now) for the full story.
Why NodeWright?
Managing and updating Kubernetes clusters is challenging. Kubernetes advocates treating compute as disposable, but certain scenarios make this difficult:
- Updating hosts without re-imaging:
  - Limited excess hardware/capacity for rolling replacements
  - Long node replacement times (for example, hours with some cloud providers)
- OS image management:
  - Maintain a common base image with workload-specific overlays instead of multiple OS images
- Workload sensitivity:
  - Some workloads can't be moved, are difficult to move, or take a long time to migrate
What is NodeWright?
NodeWright functions like a package manager but for your entire Kubernetes cluster, with three main components:
1. NodeWright Operator - Manages installing, updating, and removing packages
2. NodeWright Custom Resource - Declarative definitions of changes to apply
3. Packages - The actual modifications you want to implement
Where and When to use NodeWright
NodeWright works in any Kubernetes environment (self-managed, on-prem, cloud) and shines when you need:
- Kubernetes-aware scheduling that protects important workloads
- Rolling or simultaneous updates across your cluster
- Declarative configuration management for host-level changes
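Whether an update is rolling or simultaneous comes down to how many nodes may be interrupted at once. The sketch below assumes a budget given as either a percent of nodes or a node count, as in the Interruption Budget feature listed under Key Features. The `InterruptionBudget` type, the `MaxInterrupted` function, and the round-up rule for percentages are hypothetical illustrations, not the operator's actual API or documented behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// InterruptionBudget is a hypothetical model of a budget expressed as either
// a percent of nodes or an absolute count. Exactly one field should be set.
type InterruptionBudget struct {
	Percent *int // share of selected nodes, 1-100
	Count   *int // absolute number of nodes, at least 1
}

// MaxInterrupted returns how many of totalNodes may be interrupted at once.
// Assumption: percentages round up, so any non-empty node set makes progress.
func MaxInterrupted(b InterruptionBudget, totalNodes int) (int, error) {
	if totalNodes < 0 {
		return 0, errors.New("totalNodes must be non-negative")
	}
	switch {
	case b.Percent != nil && b.Count != nil:
		return 0, errors.New("set either percent or count, not both")
	case b.Percent != nil:
		p := *b.Percent
		if p < 1 || p > 100 {
			return 0, fmt.Errorf("percent %d is outside 1-100", p)
		}
		// Integer ceiling of totalNodes * p / 100.
		return (totalNodes*p + 99) / 100, nil
	case b.Count != nil:
		c := *b.Count
		if c < 1 {
			return 0, fmt.Errorf("count %d must be at least 1", c)
		}
		if c > totalNodes {
			c = totalNodes // a count larger than the node set means "all nodes"
		}
		return c, nil
	default:
		return 0, errors.New("budget is empty")
	}
}

func main() {
	quarter, two := 25, 2

	n, _ := MaxInterrupted(InterruptionBudget{Percent: &quarter}, 10)
	fmt.Println("25% of 10 nodes:", n) // prints "25% of 10 nodes: 3"

	n, _ = MaxInterrupted(InterruptionBudget{Count: &two}, 10)
	fmt.Println("count 2 of 10 nodes:", n) // prints "count 2 of 10 nodes: 2"
}
```

Under these assumptions, a budget of 100 percent corresponds to a simultaneous update, while smaller values produce a rolling one.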
Benefits
- Native Kubernetes integration - Packages are standard Kubernetes resources compatible with GitOps tools like ArgoCD, Helm, and Flux
- Autoscaling support - Ensure newly created nodes are properly configured before schedulable
- First-class upgrades - Deploys changes with minimal disruption, waiting for running workloads to complete when needed
Key Features
- Interruption Budget: a percentage of nodes or an absolute node count
- Node Selectors: selectors for which nodes to apply to (node labels)
- Pod Non Interrupt Labels: labels identifying pods that must never be interrupted
- Package Interrupt: a service restart (containerd, cron, anything managed by systemd) or a reboot
- Additional Tolerations: tolerations added to the packages
- [Runtime Required](docs/runtime_required.md): requires nodes to join the cluster with a custom taint, and completes its work before removing that taint.
- Resource Management: Skyhook uses Kubernetes LimitRange to set default CPU and memory requests/limits for all containers in its namespace. You can override these defaults per-package in your Skyhook CR. Strict validation is enforced: if you set any resource override, you must set all four fields (cpuRequest, cpuLimit, memoryRequest, memoryLimit), and limits must be >= requests. See [docs/resource_management.md](docs/resource_management.md) for details and examples.
- [Explicit Uninstall](docs/uninstall.md): controlled, explicit uninstall of packages from nodes with uninstall.enabled and uninstall.apply fields, webhook guards, finalizer-driven cleanup on CR deletion, and cancel support.
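The strict validation rule for resource overrides (all four fields set together, limits at least as large as requests) can be sketched as follows. This is a minimal illustration, not the operator's webhook code: the `ResourceOverride` type and its methods are hypothetical, and `parseMilli` is a simplified stand-in that understands only plain integers and the `m`, `Ki`, `Mi`, and `Gi` suffixes. A real implementation would more likely parse quantities with `resource.ParseQuantity` from `k8s.io/apimachinery`.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// ResourceOverride mirrors the four field names from the README.
// The Go type itself is a hypothetical illustration.
type ResourceOverride struct {
	CPURequest, CPULimit, MemoryRequest, MemoryLimit string
}

// suffixes lists the small subset of quantity suffixes this sketch accepts.
// Multipliers are in milli-units so that "500m" stays an integer.
var suffixes = []struct {
	name string
	mult int64
}{
	{"Ki", 1024 * 1000},
	{"Mi", 1024 * 1024 * 1000},
	{"Gi", 1024 * 1024 * 1024 * 1000},
	{"m", 1},
}

// parseMilli converts a quantity such as "250m", "2", or "128Mi" to milli-units.
func parseMilli(q string) (int64, error) {
	mult := int64(1000) // a bare integer means whole units
	for _, s := range suffixes {
		if strings.HasSuffix(q, s.name) {
			q, mult = strings.TrimSuffix(q, s.name), s.mult
			break
		}
	}
	n, err := strconv.ParseInt(q, 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("unsupported quantity %q", q)
	}
	return n * mult, nil
}

// checkPair verifies that limit >= request for one resource.
func checkPair(name, request, limit string) error {
	r, err := parseMilli(request)
	if err != nil {
		return err
	}
	l, err := parseMilli(limit)
	if err != nil {
		return err
	}
	if l < r {
		return fmt.Errorf("%s limit %s is below request %s", name, limit, request)
	}
	return nil
}

// Validate enforces the all-or-nothing rule and limit >= request.
func (o ResourceOverride) Validate() error {
	set := 0
	for _, f := range []string{o.CPURequest, o.CPULimit, o.MemoryRequest, o.MemoryLimit} {
		if f != "" {
			set++
		}
	}
	if set == 0 {
		return nil // no override: the namespace defaults apply
	}
	if set != 4 {
		return errors.New("set cpuRequest, cpuLimit, memoryRequest, and memoryLimit together")
	}
	if err := checkPair("cpu", o.CPURequest, o.CPULimit); err != nil {
		return err
	}
	return checkPair("memory", o.MemoryRequest, o.MemoryLimit)
}

func main() {
	valid := ResourceOverride{CPURequest: "250m", CPULimit: "1", MemoryRequest: "128Mi", MemoryLimit: "1Gi"}
	partial := ResourceOverride{CPURequest: "250m"}
	fmt.Println("valid:", valid.Validate())
	fmt.Println("partial:", partial.Validate())
}
```

Comparing in milli-units keeps the arithmetic in integers, which avoids floating-point surprises when a request such as `250m` is checked against a limit such as `1`.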
Pre-built Packages
A few pre-built, general-purpose packages are available at NVIDIA/skyhook-packages.
Installation via Helm
Install NodeWright quickly using Helm without downloading the repository:
Prerequisites
- Kubernetes cluster (tested on v1.30+)
- Helm 3.x installed (3.8+ for native OCI chart support)
- Container registry access credentials (if using private registries)
Install NodeWright
# The chart is distributed as an OCI artifact on GitHub Container Registry.
# Helm 3.8+ supports OCI natively, so no `helm repo add` is needed.
helm install nodewright oci://ghcr.io/nvidia/nodewright/charts/nodewright \
  --version v0.16.1 \
  --namespace skyhook \
  --create-namespace
> Where things live: chart at oci://ghcr.io/nvidia/nodewright/charts/nodewright, operator image at ghcr.io/nvidia/nodewright/operator, agent image at ghcr.io/nvidia/skyhook/agent (agent path migration to nodewright is pending). NGC / nvcr.io distribution is paused; see [docs/release-process.md#distribution-ghcrio-only-for-now](docs/release-process.md#distribution-ghcrio-only-for-now).
>
> Migrating from `helm repo add skyhook https://helm.ngc.nvidia.com/...`? Run helm repo remove skyhook and use the OCI install above. If you also want to keep the existing in-cluster release name (e.g. skyhook), substitute it for nodewright in the helm install command; the chart works either way.
Configure Image Pull Secrets (if needed)
If you're using private container registries, create the necessary secrets:
kubectl create secret generic node-init-secret \
--from-file=.dockerconfigjson=${HOME}/.docker/config.json \
--type=kubernetes.io/dockerconfigjson \…