DevOps coding game: container, deployment, monitoring & CI/CD
Aurélien Maury - CTO WeScale • 28/04/22 • 6 min read
A year ago, WeScale had the idea to create a coding game oriented towards DevOps and Infra-as-Code: a treasure hunt that could amuse experts and push beginners to improve their skills by solving technical puzzles.
Abhra Shambhala
The Abhra Shambhala project started in March 2021, driven by a passion for sharing technical knowledge in a fun way that challenges participants as well as us. At the time of writing, 235 people have taken part in our treasure hunt and 22 have reached the finish line.
Let's take a look at the project in more detail, without spoiling the game.
Challenges and objectives of building the coding game
The primary goal of the game is to familiarize players with git, Docker, Helm manifests, and automated deployment pipelines.
To build a successful coding game platform, we had two requirements. For (obvious) security reasons, players must not be able to modify the build from pull requests, and we needed to be able to deploy and destroy the platform quickly. Continuous deployment of our application components was also a must-have, so that we could ship fixes quickly when problems arose. Spoiler alert: there are always problems when building a technical treasure hunt game.
We also wanted to have fun testing a complete stack on a concrete project, and to bring players into the story.
Tools
The toolbox used to deploy and maintain this architecture is composed of:
Ansible, as the central orchestrator. Every operation starts with a playbook, and variables are stored as YAML files in group_vars
Terraform, to pilot server resources, the initialization of Rancher, and cluster deployment on Kapsule. Terraform operations are supervised by a playbook that handles prerequisite tasks, collects the outputs, and generates YAML files in group_vars so that the output of one operation is available to the following playbooks
Make to spare you from typing ansible-playbook all the time
Direnv, for the virtualenv with Ansible and to load environment variables for Ansible and Terraform configurations
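The handoff between Terraform and the following playbooks can be pictured with a short sketch. It is not our actual playbook code: it only assumes that the outputs arrive in the format produced by `terraform output -json`, and the function name, the `tf_` prefix, and the sample values are hypothetical. Values are written with `json.dumps`, since JSON values are also valid YAML in practice, which avoids any YAML library.

```python
import json


def outputs_to_group_vars(terraform_output_json: str, prefix: str = "tf_") -> str:
    """Render the text of `terraform output -json` as a flat YAML mapping.

    The function name and the default prefix are illustrative assumptions.
    """
    outputs = json.loads(terraform_output_json)
    lines = ["---"]
    for name in sorted(outputs):
        entry = outputs[name]
        # Skip sensitive outputs so secrets never land in a plain-text vars file.
        if entry.get("sensitive", False):
            continue
        # json.dumps yields a value that YAML parsers also accept.
        lines.append(f"{prefix}{name}: {json.dumps(entry['value'])}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical outputs, shaped like `terraform output -json`.
    sample = json.dumps({
        "cluster_id": {"sensitive": False, "type": "string", "value": "example-id"},
        "kubeconfig": {"sensitive": True, "type": "string", "value": "redacted"},
        "node_ips": {
            "sensitive": False,
            "type": ["list", "string"],
            "value": ["10.0.0.1", "10.0.0.2"],
        },
    })
    print(outputs_to_group_vars(sample), end="")
```

The resulting text could be saved as a file under group_vars, where any later playbook would pick the variables up without knowing that Terraform produced them.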
Final architecture
The final form of our architecture here revolves around two primary resources: a compute instance and a Kapsule cluster.
Architecture airplane view
Compute Instance: game master
The cornerstone of the platform is a simple Instance running on Debian 11 with:
The DNS domain we manage for our applications
For more flexibility in deployments, we delegated a subdomain to a bind9 daemon that will become our reference DNS authority. The DNS records of application deployments that we expose are managed here, with updates pushed by an Ansible playbook.
A user-friendly web interface to manage our Kapsule cluster
We installed Rancher on this server rather than on Kapsule to allow an easier switch between Kapsule clusters without reinstalling Rancher.
Rancher is deployed by Helm charts, and we maintain a local cluster with a single K3S instance to serve as Rancher’s execution platform.
To expose the Rancher service, we installed Nginx to act as a reverse proxy to a local port and carry the TLS certificates. K3S APIs are not exposed externally and are only accessed locally via Ansible.
The deployment of cluster tools
Once Rancher is deployed, any cluster imported into its management scope can receive Rancher tooling, an observability stack detailed in the next section.
Kapsule: game board
Now that the base has been deployed, we can start a Kapsule Kubernetes cluster through an Ansible playbook that pilots Terraform to create the cluster, retrieves the useful outputs, and launches a second action to import the cluster into Rancher.
Rancher deploys its probes and graphic interface to inspect the cluster when imported. This gives us the tools to visualize workloads’ logs from each pod and start terminals on each for troubleshooting.
A final Terraform piloting playbook is used to deploy:
Grafana, the well-known dashboard manager, along with dashboards adapted to the components listed below
Logging Operator, an automation of Fluentbit and Fluentd by Banzai Cloud, which makes centralizing logs much easier
Loki, to store logs for Grafana
The Nginx-ingress-controller and cert-manager which will be used to expose our APIs
Phew! Once all this has been deployed, we have a sound and well-equipped working base to host the application part. We will not detail the content of the application here, so as not to spoil the coding game for future participants!
We'll just tell you that there is a continuous deployment component built on Drone (which does the job brilliantly).
Automation
Automation and reproducibility are central to our work. Once the code base is mature, an environment can be set up with two commands, each of which runs a series of playbooks.
make core
Creation of the Compute instance for Rancher, through Terraform
System Setup
Subdomain delegation, making the instance the DNS authority for the game services' domain
Creation of public certificates by DNS challenge with Let's Encrypt
Installation of K3S, Rancher, and exposure through the Nginx reverse-proxy
Initial Rancher setup, via Terraform
make kapsule
Creation of the Kapsule cluster via Terraform
Import of the cluster into Rancher's management scope
Installation of Helm observability charts and Rancher, via Terraform
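To make the "two commands" idea concrete, here is a minimal sketch of a target-to-playbooks mapping. It is an illustration only: the real entry point is Make, and the playbook file names, the inventory path, and the `plan` and `run` helpers are all hypothetical. The only thing taken as given is the `ansible-playbook -i <inventory> <playbook>` command form.

```python
import subprocess
from typing import Dict, List

# Hypothetical playbook names, loosely following the steps listed above.
TARGETS: Dict[str, List[str]] = {
    "core": [
        "core_instance.yml",
        "core_system.yml",
        "core_dns.yml",
        "core_certificates.yml",
        "core_rancher.yml",
        "core_rancher_setup.yml",
    ],
    "kapsule": [
        "kapsule_cluster.yml",
        "kapsule_import.yml",
        "kapsule_observability.yml",
    ],
}


def plan(target: str, inventory: str = "inventory.ini") -> List[List[str]]:
    """Return the ansible-playbook command lines for one target, in order."""
    if target not in TARGETS:
        raise ValueError(f"unknown target: {target}")
    return [
        ["ansible-playbook", "-i", inventory, playbook]
        for playbook in TARGETS[target]
    ]


def run(target: str, dry_run: bool = True) -> None:
    """Print each command, and execute it unless dry_run is set."""
    for command in plan(target):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sequence at the first failing playbook.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run("core")
    run("kapsule")
```

Keeping the order in one place means a failed step stops the sequence, and the same two entry points rebuild the whole environment from scratch.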
Thoughts on the tools after the first experience
Scaleway’s Kubernetes Kapsule
Installing one’s own Kubernetes cluster is a hard path to follow. Therefore, a managed K8s orchestrator is the obvious choice. Kubernetes Kapsule is a great choice with:
Available versions that closely follow the K8s roadmap
Transparent integration with Scaleway Load Balancer
Easy-to-handle attached storage
Ansible Terraform
The Ansible-Terraform duo for Infra-as-Code management is a real success, even if the encapsulation of Terraform by playbooks may seem counterintuitive.
In this context, where the scope is clearly defined and involves many heterogeneous tasks, Ansible as an entry point makes it more accessible.
Ansible is an excellent glue for all that, and the Ansible Terraform module fits right in.
Fleet
The GitOps component from RancherLabs, Fleet, still seems young to us.
The Custom Resource Definition abstraction it proposes for managing continuous deployment flows and targets is somewhat complex to grasp. Redeployments…