NVIDIA/harbor-reef-operator
Description: Kubernetes Operator for Harbor Reef Image Cache
Language: Go
License: Apache-2.0
Stars: 4
Forks: 0
Open issues: 0
Created: 2026-01-22T12:58:35Z
Pushed: 2026-06-03T20:31:10Z
Default branch: main
Fork: no
Archived: no
README:
harbor-reef-operator
Kubernetes operator for the Harbor Reef image caching system. It provides two controllers:
1. Pod Fallback Controller -- Reverts Pod container images from Harbor cache back to the original upstream when Pods enter ImagePullBackOff/ErrImagePull.
2. ProxyCache Controller -- Reconciles ProxyCache custom resources to declaratively manage Harbor registry endpoints and proxy-cache projects.
Project Structure
The operator follows a kubebuilder-style Go package layout:
    main.go                          # Manager bootstrap, scheme registration, controller wiring
    pkg/
      apis/v1alpha1/                 # API resource definitions (ProxyCache CRD types)
      controller/
        pod/reconciler.go            # Pod fallback controller
        proxycache/reconciler.go     # ProxyCache controller
      harbor/client.go               # Harbor v2.0 REST API client
    Dockerfile
    helm-charts/
      harbor-reef-operator/          # Operator Helm chart (CRDs, RBAC, Deployment)
Pod Fallback Controller
Requirements
- Pod annotated with original upstream images using the `harbor.rewrite/original-upstreams` annotation.
- The annotation value is a JSON object mapping container names to their original upstream images:

    metadata:
      annotations:
        harbor.rewrite/original-upstreams: '{"my-container": "docker.io/library/nginx:latest", "sidecar": "docker.io/library/alpine"}'

It is recommended to use a policy engine such as Kyverno to add this annotation to Pods on CREATE.
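To make the annotation format concrete, here is a minimal sketch of decoding it into a container-to-image map with the Go standard library. The helper name `parseOriginalUpstreams` and its error handling are illustrative assumptions, not the operator's actual code; only the annotation key and the JSON shape come from the text above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OriginalUpstreamsAnnotation is the annotation key documented above.
const OriginalUpstreamsAnnotation = "harbor.rewrite/original-upstreams"

// parseOriginalUpstreams is a hypothetical helper: it decodes the annotation
// value (a JSON object) into a map of container name -> upstream image.
func parseOriginalUpstreams(annotations map[string]string) (map[string]string, error) {
	raw, ok := annotations[OriginalUpstreamsAnnotation]
	if !ok || raw == "" {
		return nil, fmt.Errorf("annotation %q not set", OriginalUpstreamsAnnotation)
	}
	upstreams := map[string]string{}
	if err := json.Unmarshal([]byte(raw), &upstreams); err != nil {
		return nil, fmt.Errorf("annotation %q is not a JSON object of strings: %w",
			OriginalUpstreamsAnnotation, err)
	}
	return upstreams, nil
}

func main() {
	annotations := map[string]string{
		OriginalUpstreamsAnnotation: `{"my-container": "docker.io/library/nginx:latest", "sidecar": "docker.io/library/alpine"}`,
	}
	upstreams, err := parseOriginalUpstreams(annotations)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(upstreams["my-container"]) // prints docker.io/library/nginx:latest
	fmt.Println(upstreams["sidecar"])      // prints docker.io/library/alpine
}
```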
Features
- Watches Pods using controller-runtime informers
- Controller caches pod status and uses event handling to watch pod events
- Selective patching: Only patches containers that are actually in ImagePullBackOff/ErrImagePull state, preventing unnecessary restarts of healthy containers
- Incremental patching: Can re-process pods when new containers enter ImagePullBackOff (e.g., main containers after init containers complete)
- Per-container idempotency: Tracks which containers have been patched in the `harbor-reef/patched-containers` annotation to ensure each container is only patched once (prevents loops)
- Adds audit annotation `harbor-reef/patched` with timestamp for logging purposes
Cache and event handling
Uses controller-runtime's shared informer cache. The manager starts informers for Pod resources scoped by WATCH_NAMESPACE (comma-separated) or cluster-wide when unset. The cache maintains a consistent local store synchronized via Kubernetes watch streams, not by repeatedly polling all Pods. The manager registers a predicate that filters update events, only reconciling when a Pod transitions into ImagePullBackOff or ErrImagePull. This means the manager receives events from the API server and reacts; it does not loop over all Pods to check status.
API impact:
- Steady-state uses a long-lived LIST+WATCH per watched namespace (or cluster) managed by the cache, minimizing repeated API calls.
- Reconciles generally perform a single `Get` for the target Pod from the cache client path and issue one JSONPatch `Patch` call only when an update is needed.
- Scope can be reduced with `WATCH_NAMESPACE` to limit cache size and watch traffic when only certain namespaces are relevant.
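The update filter described in this section can be sketched as a pure function over container statuses. This is an illustration using only the standard library: `ContainerStatus` is a simplified stand-in for the Kubernetes container status type, and `shouldReconcile` is a hypothetical name. It treats a transition per container (a container that is failing now but was not failing before), which is one reading consistent with the incremental patching feature; the operator's real predicate may differ. In controller-runtime, logic of this kind would typically sit in the `UpdateFunc` of a `predicate.Funcs`.

```go
package main

import "fmt"

// ContainerStatus is a simplified stand-in for the per-container waiting
// reason that Kubernetes reports in a Pod's status.
type ContainerStatus struct {
	Name          string
	WaitingReason string // empty when the container is not waiting
}

// isPullFailure reports whether a waiting reason is one of the two
// image-pull failure states named in this README.
func isPullFailure(reason string) bool {
	return reason == "ImagePullBackOff" || reason == "ErrImagePull"
}

// shouldReconcile is a hypothetical update filter: it returns true only when
// at least one container is in a pull-failure state in the new object and
// was not in one in the old object.
func shouldReconcile(oldStatuses, newStatuses []ContainerStatus) bool {
	wasFailing := map[string]bool{}
	for _, s := range oldStatuses {
		if isPullFailure(s.WaitingReason) {
			wasFailing[s.Name] = true
		}
	}
	for _, s := range newStatuses {
		if isPullFailure(s.WaitingReason) && !wasFailing[s.Name] {
			return true
		}
	}
	return false
}

func main() {
	before := []ContainerStatus{{Name: "app", WaitingReason: "ContainerCreating"}}
	after := []ContainerStatus{{Name: "app", WaitingReason: "ImagePullBackOff"}}
	fmt.Println(shouldReconcile(before, after)) // prints true: new failure
	fmt.Println(shouldReconcile(after, after))  // prints false: no transition
}
```

Filtering on the transition rather than on the current state keeps repeated status updates for an already-failing container from triggering further reconciles.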
Startup sequence
- Register Prometheus metrics (`pods_upstream_patched_total`, `reconcile_errors_total`, `reconcile_duration_seconds`).
- Read `WATCH_NAMESPACE`. If set, watch only the listed namespaces; otherwise scope is cluster-wide.
- Create controller-runtime manager with the configured cache scope.
- Construct `pod.Reconciler` and call `SetupWithManager` to register a controller that watches Pod updates with a predicate. Only transitions into `ImagePullBackOff`/`ErrImagePull` trigger reconciles.
- If `HARBOR_URL` is set, construct `proxycache.Reconciler` and call `SetupWithManager` to watch `ProxyCache` custom resources (see below).
- Start the manager (runs informers and controller workers; stops on SIGTERM/SIGINT).
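The `WATCH_NAMESPACE` step can be illustrated with a small sketch that splits the comma-separated value into a namespace list. The function name `parseWatchNamespaces` is hypothetical, and trimming whitespace and dropping empty entries are assumptions about how a tolerant parser might behave, not documented behavior of the operator.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseWatchNamespaces is a hypothetical helper that splits a comma-separated
// WATCH_NAMESPACE value. An empty result means cluster-wide scope.
func parseWatchNamespaces(value string) []string {
	var namespaces []string
	for _, part := range strings.Split(value, ",") {
		// Assumption: surrounding whitespace and empty entries are ignored.
		if ns := strings.TrimSpace(part); ns != "" {
			namespaces = append(namespaces, ns)
		}
	}
	return namespaces
}

func main() {
	namespaces := parseWatchNamespaces(os.Getenv("WATCH_NAMESPACE"))
	if len(namespaces) == 0 {
		fmt.Println("watching all namespaces (cluster-wide)")
		return
	}
	fmt.Println("watching namespaces:", namespaces)
}
```

The resulting list would then be used to configure the manager's cache scope before the controllers are registered.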
When a Pod enters ImagePullBackOff/ErrImagePull
- Fetch the `Pod` by `namespace/name`. Ignore not found.
- Exit early if:
  - Pod is deleting (`metadata.deletionTimestamp` set), or
  - Pod has no annotations, or
  - Pod is no longer in `ImagePullBackOff`/`ErrImagePull`.
- Identify which specific containers are in `ImagePullBackOff`/`ErrImagePull` state.
- Log detection (includes names of waiting containers).
- Read the `harbor-reef/patched-containers` annotation to get the list of containers already patched.
- Build a single JSON6902 patch:
  - Parse the `harbor.rewrite/original-upstreams` JSON annotation to get container→image mappings.
  - For each container and initContainer that is currently in ImagePullBackOff/ErrImagePull:
    - Skip if the container was already patched (listed in `patched-containers` annotation).
    - Otherwise, add a `replace` op for `/spec/(init)containers//image` to the upstream image.
  - If no replace ops were added, stop (nothing to patch).
  - Update `harbor-reef/patched-containers` annotation with the combined list of all patched containers.
  - Update `harbor-reef/patched` annotation with the current UTC timestamp.
- Apply the JSON patch to the Pod. On error, requeue after 15s.
- On success:
  - Log which containers were patched and to which upstream images.
  - Increment `pods_upstream_patched_total` Prometheus counter for each patched container with labels: `patched_kube_namespace`, `patched_pod_name`, `patched_container_name`, `patched_image`.
Note: The operator can process the same pod multiple times if different containers enter ImagePullBackOff at different times (e.g., init containers fail first, then main containers fail after init completes). Each container is only patched once - the patched-containers annotation ensures idempotency and prevents loops even if the original upstream also fails.
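The patch-building steps above can be sketched with the standard library alone. Several details here are assumptions made for illustration: the type and function names are hypothetical, the `harbor-reef/patched-containers` value is assumed to be a comma-separated list, the timestamp is assumed to be RFC 3339, and the image path uses the container's array index. The `~1` escaping of `/` in annotation keys follows the JSON Pointer rules that JSON Patch paths use, and an `add` op on an existing object member replaces its value.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
	"time"
)

// patchOp is one JSON Patch (RFC 6902) operation.
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value string `json:"value"`
}

// failingContainer is a simplified, hypothetical view of a container that is
// currently in ImagePullBackOff/ErrImagePull.
type failingContainer struct {
	Name  string
	Index int  // position in spec.containers or spec.initContainers
	Init  bool // true for init containers
}

// escapePointer escapes a key for use as a JSON Pointer path segment.
func escapePointer(s string) string {
	return strings.NewReplacer("~", "~0", "/", "~1").Replace(s)
}

// buildPatch returns nil when there is nothing to patch.
func buildPatch(failing []failingContainer, upstreams map[string]string, alreadyPatched []string, timestamp string) []patchOp {
	done := map[string]bool{}
	for _, name := range alreadyPatched {
		done[name] = true
	}
	var ops []patchOp
	for _, c := range failing {
		image, known := upstreams[c.Name]
		if done[c.Name] || !known {
			continue // patched before, or no upstream recorded for it
		}
		field := "containers"
		if c.Init {
			field = "initContainers"
		}
		ops = append(ops, patchOp{Op: "replace", Path: fmt.Sprintf("/spec/%s/%d/image", field, c.Index), Value: image})
		done[c.Name] = true
	}
	if len(ops) == 0 {
		return nil
	}
	names := make([]string, 0, len(done))
	for name := range done {
		names = append(names, name)
	}
	sort.Strings(names) // stable annotation value
	prefix := "/metadata/annotations/"
	return append(ops,
		patchOp{Op: "add", Path: prefix + escapePointer("harbor-reef/patched-containers"), Value: strings.Join(names, ",")},
		patchOp{Op: "add", Path: prefix + escapePointer("harbor-reef/patched"), Value: timestamp},
	)
}

func main() {
	ops := buildPatch(
		[]failingContainer{{Name: "my-container", Index: 0}},
		map[string]string{"my-container": "docker.io/library/nginx:latest"},
		nil,
		time.Now().UTC().Format(time.RFC3339),
	)
	out, _ := json.MarshalIndent(ops, "", "  ")
	fmt.Println(string(out))
}
```

Because already-patched containers are skipped before any op is emitted, a container whose upstream image also fails to pull produces an empty patch on the next pass, which is how the loop described in the note is avoided.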
Prometheus Metrics
The operator exposes the following metrics on the controller-runtime metrics endpoint (default port 8080):
| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| pods_upstream_patched_total | Counter | patched_kube_namespace, patched_pod_name, patched_container_name, patched_image | Total number of pod containers patched to use original… |
Notability: 1.0/10. Low stars, new repo; not notable.