How we connected over 250,000 IoT devices to the cloud

Build • Christian Hein, Inheaden • 20/06/23 • 7 min read

Inheaden recently helped one of their customers connect IoT devices to the cloud using Kubernetes Kosmos. Christian Hein, CIO at Inheaden, tells us how they pulled that off, the challenges they had to overcome, and the detours they had to take to get to where they needed to be. But we’ll let Chris fill you in on the details.

A customer of ours in the Internet of Things (IoT) realm produces IoT devices that send data such as GPS position, battery status, sensor data, etc. The communication with IoT devices flows via UDP, and a proprietary microservice then picks up these UDP packets and decodes them.

The requirement was quite simple: the backend should be able to send a packet to a single IoT device that turns a function on or off. While that sounds straightforward, it wasn’t immediately possible because of the specific way the infrastructure was set up. We needed the real IP of the IoT device to control the function over a direct connection, but for various reasons, which we’ll get into below, the IP was not preserved in transit.

We eventually found a solution using Scaleway’s Kubernetes Kosmos and Envoy. But let’s back up a bit to see where we started and how we got there.

Connecting IoT devices with a mobile connection and limited bandwidth

The mobile connection of the IoT devices is established with LPWAN radio technology. This radio standard, which was designed for machine-to-machine (M2M) communication, is energy efficient, can penetrate buildings, and transmits data reliably. The downside of this technology is that we have limited bandwidth available.

Because of this, protocols with a lot of overhead, like TCP, are not the best way to send data from a device to the backend. As HTTP/1 and HTTP/2 are based on TCP, using HTTPS for the connection is not desirable either.

So how do these devices transmit data securely?

With LPWAN, a private Access Point Name (private APN) from the mobile network operator (MNO) is used to transmit the data from the IoT devices to the company network over a legacy VPN connection. This legacy VPN connection terminates on a virtual machine (VM). Every device gets a private IP from a private network range, for example, 10.200.0.0/16.
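As a small illustration of this addressing scheme, the following Python sketch checks whether an address belongs to the example range. The function name is hypothetical, and 10.200.0.0/16 is only the example range given above; a real deployment may use a different one.

```python
import ipaddress

# Example private APN range from the text (an assumption for this sketch).
DEVICE_NETWORK = ipaddress.ip_network("10.200.0.0/16")


def is_device_address(ip: str) -> bool:
    """Return True if the given IPv4 address lies inside the private APN range."""
    return ipaddress.ip_address(ip) in DEVICE_NETWORK
```

A backend could use a check like this to tell a genuine device address apart from the address of an intermediate hop.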

Switching cloud providers

At our client’s previous provider, the infrastructure was set up so the legacy VPN tunnel was directly connected to the corresponding servers for a production and preview environment.

After switching to Scaleway a few years back, we set it up so that the UDP packets from the IoT devices arrived at the production server via a WireGuard VPN tunnel. On the production and staging VMs, we had a static internal IP to which the IoT devices sent their data.

The incoming data on the production and staging servers was then proxied to the Kubernetes cluster.

Our challenge: to preserve the IP of the IoT device

The challenge that came with the old infrastructure was that the client IP of the IoT device (e.g., 10.200.21.22) was not preserved. We couldn’t see the real IP of the device because there were several NAT layers in between.

Even when we tried to route the packets all the way through, we still had a NAT layer at the last stage, where the UDP packets entered the Kubernetes cluster via the nodePort. Here, the source IP was changed to the internal IP of the Kubernetes node (see the Kubernetes networking documentation for more details).
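To see why this matters, consider what a UDP receiver actually learns about the sender. In the minimal Python sketch below, the only address available is the one `recvfrom` reports, which is the address of the last hop that rewrote the packet and not necessarily that of the device. The function names and the loopback demo are illustrative assumptions, not the proprietary decoder.

```python
import socket


def receive_one(sock: socket.socket, bufsize: int = 2048):
    """Receive one datagram and return (payload, (source_ip, source_port))."""
    payload, source = sock.recvfrom(bufsize)
    return payload, source


def demo():
    """Send one datagram over loopback and report what the receiver sees."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # ephemeral port, stand-in for the decoder
    receiver.settimeout(2.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(b"ping", receiver.getsockname())
        return receive_one(receiver)
    finally:
        sender.close()
        receiver.close()
```

Behind a NAT layer, `source` would hold the translated address (here, the node's internal IP) instead of something like 10.200.21.22.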

For nearly two years, it wasn’t necessary to see the IoT device IP, as the UDP packets arrived at the decoding connector and carried the IMEI and IMSI of the device, which the backend could then use to assign the data to the right device.

That is, until our customer wanted the backend to be able to send a packet to the IoT device to turn a function on or off. For that, we needed the real IP of the IoT device. And that wasn’t going to work with the NAT layers in between. We were able to reply to packets from a device, but nothing more.

How would we solve this problem?

Proxy Protocol to the rescue?

After some research, we discovered that the PROXY protocol (defined by HAProxy) might help us preserve the client IP. A lot of PROXY protocol implementations deal with normal HTTP connections. But, as mentioned above, HTTP doesn’t work for us. We needed an implementation that works with UDP.

And there, the possible solutions began to melt away again — we could only find a few possible approaches that might fit our needs.

The udppp/mmproxy approach

mmproxy is a tool that was developed by Cloudflare to preserve the client IPs in a UDP environment. In addition to mmproxy, we also used a small tool called udppp. udppp runs on a local port to which the original UDP packets are sent.

udppp adds the Proxy Protocol header to the packet and sends it to an IP you define on the command line. mmproxy then picks up this packet, removes the Proxy Protocol header, and creates a new packet using the IP_TRANSPARENT socket option, which lets it use the original client IP as the source address. Cloudflare’s blog post on mmproxy has more about IP transparent mode.
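The header handling described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the version 2 binary header of the PROXY protocol with UDP over IPv4; whether udppp and mmproxy use exactly this variant is an assumption here, and the function names are hypothetical. It does not cover the IP_TRANSPARENT step.

```python
import socket
import struct

# 12-byte signature that starts every PROXY protocol v2 header.
SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"
VERSION_COMMAND = 0x21   # high nibble: version 2, low nibble: PROXY command
FAMILY_UDP4 = 0x12       # high nibble: AF_INET, low nibble: DGRAM
ADDR_FORMAT = "!4s4sHH"  # source addr, destination addr, source port, destination port
ADDR_SIZE = struct.calcsize(ADDR_FORMAT)  # 12 bytes for IPv4


def add_header(payload: bytes, src: tuple, dst: tuple) -> bytes:
    """Prepend a v2 header carrying the original (ip, port) of source and destination."""
    addresses = struct.pack(
        ADDR_FORMAT,
        socket.inet_aton(src[0]), socket.inet_aton(dst[0]), src[1], dst[1],
    )
    fixed = SIGNATURE + struct.pack("!BBH", VERSION_COMMAND, FAMILY_UDP4, len(addresses))
    return fixed + addresses + payload


def strip_header(packet: bytes):
    """Return (src, dst, payload) from a packet that starts with a v2 UDP/IPv4 header."""
    if packet[:12] != SIGNATURE:
        raise ValueError("missing PROXY protocol v2 signature")
    ver_cmd, family, length = struct.unpack("!BBH", packet[12:16])
    if ver_cmd != VERSION_COMMAND or family != FAMILY_UDP4 or length < ADDR_SIZE:
        raise ValueError("only PROXY over UDP/IPv4 is handled in this sketch")
    src_ip, dst_ip, src_port, dst_port = struct.unpack(
        ADDR_FORMAT, packet[16:16 + ADDR_SIZE]
    )
    payload = packet[16 + length:]  # skip addresses and any trailing extensions
    return (
        (socket.inet_ntoa(src_ip), src_port),
        (socket.inet_ntoa(dst_ip), dst_port),
        payload,
    )
```

In this picture, the sending side plays the role of `add_header` and the receiving side the role of `strip_header`: the device address travels inside the datagram, so it survives any NAT applied to the outer packet.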

With the help of Andy Smith’s blog post about preserving client IPs in Kubernetes, we implemented a sidecar container on the decoding microservice. After some tests, we saw that this approach was working: the client IP was preserved.

While the approach was a success, some questions were still not answered:

How are packets flowing back to the IoT device without passing a NAT layer?

How scalable is this solution?

To solve the first issue, we tried out several iptables hacks to send the packets back, but ultimately, this approach failed — the return route wasn’t possible.

So what now?

The WireGuard Pod approach

As mentioned before, we used WireGuard to connect the old-fashioned servers to a gateway server. So we thought, “Why don’t we try to put the connection in the cluster?”

With this idea in mind, we created a WireGuard Pod, which established a secure VPN connection to our gateway server. Some iptables hacks and a few errors later, we found out that this approach was not working either, because the WireGuard network is not directly known to the Kubernetes node. Cluster-internal routing isn’t possible in this case. Policy-based routing wasn’t working either. And neither was the return route.

It was hard for us to realize that these two approaches just weren’t working at all. So I decided to take…
