Repo · Cloudflare (Workers AI) · published May 5, 2025

cloudflare/udpgrm

C

Description: UDP Graceful Restart Marshal

Language: C

License: Apache-2.0

Stars: 150

Forks: 7

Open issues: 3

Created: 2025-05-05T09:07:31Z

Pushed: 2026-04-23T20:53:46Z

Default branch: main

Fork: no

Archived: no

README:

UDP Graceful Restart Marshal
=============================

It's difficult to support zero-downtime, graceful restarts in modern UDP applications.

Historically, UDP was designed for simple single-packet request/response protocols like DNS or NTP, where graceful restarts were not a problem. Modern UDP services like QUIC, Masque, WireGuard, SIP, or games hold flow state that shouldn't be lost on restart. Passing state between application instances is usually hard to do safely.

One solution is to borrow semantics from TCP servers: when an application restarts, new flows are sent to the new instance, while old flows keep going to the old one and gradually drain. After a timeout or when all flows end, the old instance exits. There are two ways to achieve this.

The first is the established-over-unconnected technique. It has two major issues: it is racy for some protocols (when the handshake uses more than one packet), and it has a performance cost (kernel hash table conflicts are likely at scale, as the hash bucket is based only on the local two-tuple).
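As a rough illustration of the established-over-unconnected technique, the sketch below keeps one unconnected UDP socket for new flows and creates a connected socket per peer. The helper names `make_listener` and `accept_flow` are hypothetical, and using SO_REUSEADDR to share the local address is an assumption about a typical Linux setup, not something this README prescribes.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Unconnected UDP socket that receives the first packet of each new flow."""
    sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sd.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sd.bind((host, port))
    return sd


def accept_flow(listener):
    """Read one packet, then create a connected socket dedicated to that peer."""
    data, peer = listener.recvfrom(65535)
    cd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    cd.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    cd.bind(listener.getsockname())  # same local address and port as the listener
    # Race window: until connect() completes, packets of this flow (or of other
    # flows) can be delivered to the wrong socket.
    cd.connect(peer)
    return cd, data
```

Once `connect()` has returned, later packets from that peer are expected to reach the connected socket, because the kernel prefers a full four-tuple match. The window between the first packet and `connect()` is where multi-packet handshakes can race.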

Another solution is to use the Linux REUSEPORT API. A REUSEPORT socket group can contain sockets from both old and new application instances. A correctly configured REUSEPORT eBPF program can, based on some flow tracking logic, direct packets to the appropriate UDP socket and maintain flow stickiness. This is what *udpgrm* does.

What is a reuseport group
=========================

Sockets with the SO_REUSEPORT flag can share a local address and port, like 192.0.2.0:443.

┌───────────────────────────────────────────┐
│ reuseport group 192.0.2.0:443             │
│ ┌───────────┐ ┌───────────┐ ┌───────────┐ │
│ │ socket #1 │ │ socket #2 │ │ socket #3 │ │
│ └───────────┘ └───────────┘ └───────────┘ │
└───────────────────────────────────────────┘

For sockets to create a reuseport group, they need to share:

  • ip:port pair
  • BINDTODEVICE, SO_REUSEPORT and ipv6_only settings
  • network namespace
  • owner id
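A minimal sketch of forming a reuseport group on Linux, assuming the sockets are created by the same user in the same network namespace. The helper name `make_reuseport_group` and the loopback address are illustrative choices, not part of udpgrm.

```python
import socket


def make_reuseport_group(n, host="127.0.0.1", port=0):
    """Bind n UDP sockets to the same ip:port pair using SO_REUSEPORT."""
    socks = []
    for _ in range(n):
        sd = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # The flag must be set on every member before bind().
        sd.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        sd.bind((host, port))
        # Reuse the port the kernel picked for the first socket.
        port = sd.getsockname()[1]
        socks.append(sd)
    return socks


if __name__ == "__main__":
    group = make_reuseport_group(3)
    print([s.getsockname() for s in group])
```

Without an attached program, the kernel spreads incoming packets across the members by hashing the flow tuple.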

Udpgrm
======

Udpgrm is a lightweight software daemon that sets up a REUSEPORT group eBPF program and cgroup hooks on the getsockopt, setsockopt and sendmsg syscalls. It has two main goals:

  • steer new flows to sockets belonging to a "new application" instance
  • preserve flow affinity, to avoid disturbing the old flows

An eBPF program can be installed on a REUSEPORT group to implement custom load balancing logic with SO_ATTACH_REUSEPORT_EBPF. It can direct packets to specific sockets within the group. udpgrm builds on this and loads its own custom REUSEPORT eBPF program.

Udpgrm concepts
===============

Before we can explain the API we need to discuss some udpgrm concepts.

Generations
-----------

Within REUSEPORT, udpgrm groups sockets into sets called "generations", identified by an unsigned integer. Each generation represents one instance of the application. If a generation contains multiple UDP sockets, new flows are balanced across them like in a standard REUSEPORT setup. A socket may also have no assigned generation number.

┌───────────────────────────────────────────────┐
│ reuseport group 192.0.2.0:443                 │
│ ┌───────────────────────────────────────────┐ │
│ │ socket generation 0                       │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │
│ │ │ socket #1 │ │ socket #2 │ │ socket #3 │ │ │
│ │ └───────────┘ └───────────┘ └───────────┘ │ │
│ └───────────────────────────────────────────┘ │
│ ┌───────────────────────────────────────────┐ │
│ │ socket generation 1                       │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │
│ │ │ socket #4 │ │ socket #5 │ │ socket #6 │ │ │
│ │ └───────────┘ └───────────┘ └───────────┘ │ │
│ └───────────────────────────────────────────┘ │
│ sockets with unassigned generation            │
│ ┌───────────┐ ┌───────────┐                   │
│ │ socket #0 │ │ socket #8 │                   │
│ └───────────┘ └───────────┘                   │
└───────────────────────────────────────────────┘

Udpgrm maintains a pointer to the generation that is accepting new flows, called the "working generation". It is supposed to point to the sockets belonging to the newest application instance.

┌───────────────────────────────────────────────┐
│ reuseport group 192.0.2.0:443                 │
│                                               │
│         ...                                   │
│                                               │
│ Working generation ────┐                      │
│                        │                      │
│         ┌──────────────▼────────────────┐     │
│         │ socket generation 1           │     │
│         │  ┌───────────┐  ┌──────────┐  │     │
│         │  │ socket #4 │  │   ...    │  │     │
│         │  └───────────┘  └──────────┘  │     │
│         └───────────────────────────────┘     │
│                                               │
│         ...                                   │
└───────────────────────────────────────────────┘

The application assigns a socket generation and a working generation with setsockopt syscalls.
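To make generations and the working generation concrete, here is a toy userspace model. It is only a sketch: the real steering happens inside udpgrm's eBPF program, and the class `ReuseportGroupModel`, its methods and the CRC-based hash are hypothetical stand-ins rather than udpgrm's API.

```python
import zlib


class ReuseportGroupModel:
    """Toy model of a reuseport group split into generations."""

    def __init__(self):
        self.generation_of = {}   # socket id -> generation number, or None
        self.working_gen = None   # generation currently accepting new flows

    def add_socket(self, sock_id, generation=None):
        self.generation_of[sock_id] = generation

    def set_working_generation(self, generation):
        self.working_gen = generation

    def pick_for_new_flow(self, flow):
        """Balance a new flow across the sockets of the working generation."""
        members = sorted(
            s for s, g in self.generation_of.items()
            if g is not None and g == self.working_gen
        )
        if not members:
            return None
        h = zlib.crc32(repr(flow).encode())
        return members[h % len(members)]


if __name__ == "__main__":
    group = ReuseportGroupModel()
    for sock_id in (1, 2, 3):
        group.add_socket(sock_id, generation=0)
    for sock_id in (4, 5, 6):
        group.add_socket(sock_id, generation=1)
    group.set_working_generation(1)
    print(group.pick_for_new_flow(("198.51.100.7", 40000)))
```

Moving the working generation from 0 to 1 models a restart: new flows start landing on sockets #4 to #6, while sockets without a generation never receive new flows in this model.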

Dissectors
----------

The REUSEPORT group can change over time: sockets can come and go as the application is being restarted. To keep the stickiness of existing flows, udpgrm must preserve the flow-to-socket mapping.

Udpgrm supports three flow state management models:

  • Udpgrm can maintain a flow table. Indexed by a flow hash, it contains a target socket identifier. The size of the flow table is fixed - there is a limit to the number of concurrent flows supported by this mode.
  • A cookie-based model, where the target socket identifier - cookie - is encoded in each ingress UDP packet. For example in QUIC this identifier can be stored as part of the connection ID. The dissection logic can be expressed as cBPF code. This model does not require a flow table in udpgrm, but is harder to integrate - it requires protocol support.
  • A no-op null mode, with no state tracking at all. Useful for traditional UDP services like DNS.

These modes are called "dissectors" and are named DISSECTOR_FLOW, DISSECTOR_CBPF and DISSECTOR_NOOP, respectively.
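The first two models can be sketched in a few lines. Everything below is an illustrative assumption: the table size, the collision handling, the two-byte cookie at offset 1 and the helper names are invented for the example and do not describe udpgrm's actual data layout or any real protocol.

```python
import struct
import zlib

FLOW_TABLE_SIZE = 8  # deliberately tiny, to make the capacity limit visible


class FlowTable:
    """Fixed-size table indexed by flow hash, storing a target socket id."""

    def __init__(self, size=FLOW_TABLE_SIZE):
        self.slots = [None] * size  # each slot: (flow, socket id) or None

    def lookup_or_insert(self, flow, new_sock_id):
        idx = zlib.crc32(repr(flow).encode()) % len(self.slots)
        entry = self.slots[idx]
        if entry is None:
            self.slots[idx] = (flow, new_sock_id)  # new flow: remember its socket
            return new_sock_id
        if entry[0] == flow:
            return entry[1]                        # known flow: stay sticky
        return None                                # slot taken by another flow


COOKIE_OFFSET = 1  # hypothetical packet layout: 1 flag byte, then a 2-byte cookie


def encode_packet(cookie, payload):
    return b"\x40" + struct.pack("!H", cookie) + payload


def dissect_cookie(packet):
    """Extract the socket cookie from a packet, or None if it is too short."""
    if len(packet) < COOKIE_OFFSET + 2:
        return None
    (cookie,) = struct.unpack_from("!H", packet, COOKIE_OFFSET)
    return cookie
```

The flow table needs no protocol cooperation but has a bounded capacity, whereas the cookie model is stateless on the udpgrm side and relies on every ingress packet carrying the identifier.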

Udpgrm API reference
====================

Probing for udpgrm cgroup hooks
-------------------------------

Before the application does anything useful, it should check that the udpgrm daemon is working properly. Three conditions must be met: the cgroup hooks must be installed, and the pid and network namespaces must match.

The first condition - cgroup hooks - can be verified by calling getsockopt(UDP_GRM_WORKING_GEN) on a UDP socket:

sd = socket.socket(AF_INET, SOCK_DGRAM, 0)
sd.getsockopt(IPPROTO_UDP,…

Excerpt shown — open the source for the full document.
