Getting started with Scaleway's managed Kubernetes service - Kapsule

· by Adrian Todorov · Read in about 16 min · (3338 words) ·

Introduction

Scaleway is a French cloud provider that mostly specialises in (custom-designed) bare metal ARM servers and standard VPSes, and has recently started adding services like x86 bare metal servers, Load Balancers, a new and improved object storage, managed databases, a container registry, managed firewalls, and, hotly anticipated, a managed Kubernetes service, Kapsule. There’s plenty of competition in the managed Kubernetes space, but Scaleway have a few potential advantages, most notably:

  • the possibility to have bare metal and ARM-based node pools (not yet available in the public beta, but ARM is kind of Scaleway’s specialty, so it’d be surprising if they didn’t offer it)
  • decent integrated ecosystem - Load Balancers, Container Registry, block and object storage, and plenty of others soon
  • pricing, which is on par with Digital Ocean’s (even if the smallest possible node type is pretty big at 4vCPU/16GB RAM and 39 euros/month): the control plane is free, and associated services aren’t expensive - load balancers are 9 euros/month, and container registry storage is 0.025 euros/GB/month, with traffic free within the same region and 0.03 euros/GB/month outside of it
  • cluster upgrades, which aren’t offered by everyone (for a long time neither Digital Ocean nor OVH offered them, even though Kapsule is younger than either of them)
  • cluster autoscaling (based on the cluster autoscaler Kubernetes project), which, again, isn’t offered by everyone
  • EU-based (datacenters in Paris and Amsterdam)

So, let’s give a Kapsule cluster a spin and see if it lives up to my expectations.

Kapsule Overview

Currently Kapsule is in beta, so some features aren’t ready yet (like cluster autoscaling or upgrades from the web UI), and for now only the Paris region is supported.

Kapsule clusters can use calico, weave, flannel or cilium for the overlay network, and upon creation Scaleway can optionally deploy traefik (1.x, for now) or nginx ingress controllers, and the Kubernetes dashboard (which of course you can do later on your own, via Helm or otherwise, but it’s a nice touch). You also get an automatic wildcard DNS record under <cluster id>.nodes.k8s.fr-par.scw.cloud, which can come in handy for testing.

Deploying a cluster

For now, cluster deployment is possible via the API, the web console or terraform (which sadly wasn’t available when I started writing this post).

Let’s deploy a small cluster via the API. First you’ll need your organization id and a token secret key, both available from the credentials page of the Scaleway console; export the latter as SCALEWAY_TOKEN for easier reuse. Then create a JSON file (data.json in the commands below) along these lines, replacing organization_id with yours. It is the bare minimum needed to create a Kubernetes 1.14.8 cluster (an older version on purpose, to test upgrading to 1.15.x later on) with 2 GP1-S nodes in the default node pool, calico and the traefik ingress:

{
  "organization_id": "xxx",
  "name": "k8s-test",
  "version": "1.14.8",
  "cni": "calico",
  "dashboard": true,
  "ingress": "traefik",
  "default_pool_commercial_type": "gp1_s",
  "default_pool_autoscaling": false,
  "default_pool_size": 2
}

Then POST it to the Scaleway API in the right region (fr-par only for now):

curl -XPOST  -H "X-Auth-Token: ${SCALEWAY_TOKEN}" -d @data.json https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters

And we get a response straight away with the cluster ID and the DNS wildcard:

{
    "autoscaler_config": {
        "balance_similar_node_groups": false,
        "estimator": "binpacking",
        "expander": "random",
        "expendable_pods_priority_cutoff": -10,
        "ignore_daemonsets_utilization": false,
        "scale_down_delay_after_add": "10m",
        "scale_down_disable": false
    },
    "cluster_ip": "",
    "cluster_port": 0,
    "cluster_url": "https://f50f0126-b994-47a9-9949-b68a3ed1335b.api.k8s.fr-par.scw.cloud:6443",
    "cni": "calico",
    "created_at": "2019-09-15T15:32:01.278854200Z",
    "current_core_count": 0,
    "current_mem_count": 0,
    "current_node_count": 0,
    "description": "",
    "dns_wildcard": "*.f50f0126-b994-47a9-9949-b68a3ed1335b.nodes.k8s.fr-par.scw.cloud",
    "id": "f50f0126-b994-47a9-9949-b68a3ed1335b",
    "name": "k8s-bal",
    "organization_id": "xxx",
    "region": "fr-par",
    "status": "creating",
    "sub_status": "no_details",
    "tags": [],
    "updated_at": "2019-09-15T15:32:01.278854200Z",
    "version": "1.14.8"
}

Note down the cluster ID and store it in a variable (like CLUSTER_ID) for future use, so that you can check the status and get the kubeconfig file:

curl -XGET  -H "X-Auth-Token: ${SCALEWAY_TOKEN}"  "https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters/${CLUSTER_ID}"
curl -XGET  -H "X-Auth-Token: ${SCALEWAY_TOKEN}"  "https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters/${CLUSTER_ID}/kubeconfig?dl=1" > /tmp/mycluster.kubeconfig
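
Rather than re-running the status call by hand, the two requests above can be wrapped in a small polling loop. This is a minimal sketch and not part of Scaleway’s tooling: json_field, fetch_cluster and wait_for_ready are hypothetical helper names, and the sed-based extraction assumes a simple string field like the ones in the response shown earlier (jq would be the more robust choice if it is installed).

```shell
#!/bin/sh
# Hypothetical helpers: poll the cluster endpoint until "status" is "ready".

# Print the first string value of JSON key $1 read from stdin.
# The leading quote in the pattern keeps "status" from matching "sub_status".
json_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Fetch the cluster description (same call as above; needs network access).
fetch_cluster() {
  curl -s -H "X-Auth-Token: ${SCALEWAY_TOKEN}" \
    "https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters/${CLUSTER_ID}"
}

# $1: command printing the cluster JSON, $2: max attempts, $3: delay in seconds
wait_for_ready() {
  attempts=${2:-60}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$("$1" | json_field status)
    echo "cluster status: ${status:-unknown}" >&2
    if [ "$status" = "ready" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

With SCALEWAY_TOKEN and CLUSTER_ID exported, running wait_for_ready fetch_cluster 60 10 checks every 10 seconds for up to 10 minutes, and returns a non-zero exit code if the cluster never becomes ready.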

Once the cluster is ready, we can check connectivity with kubectl:

export KUBECONFIG=/tmp/mycluster.kubeconfig
kubectl cluster-info
Kubernetes master is running at https://f50f0126-b994-47a9-9949-b68a3ed1335b.api.k8s.fr-par.scw.cloud:6443
CoreDNS is running at https://f50f0126-b994-47a9-9949-b68a3ed1335b.api.k8s.fr-par.scw.cloud:6443/api/v1/namespaces/kube-system/services/coredns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Testing the traefik ingress controller

Let’s create a simple Ingress for the Traefik dashboard deployed for us by Scaleway, to test both it and the Traefik Ingress Controller.

First, generate a file containing the basic auth credentials with htpasswd (from the apache2-utils package on Debian/Ubuntu), using the -B option to force bcrypt instead of the outdated and insecure MD5 used by default, and create a Kubernetes secret from it in the kube-system namespace:

htpasswd -c -B ./auth admin
New password:
Re-type new password:
Adding password for user admin

kubectl create secret generic traefik-dash-basic-auth --from-file auth --namespace=kube-system

Then, create an Ingress object for the Traefik dashboard, swapping XXX in the host field with your cluster id:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: traefik-web-ui
  namespace: kube-system
  annotations:
    traefik.ingress.kubernetes.io/auth-type: basic
    traefik.ingress.kubernetes.io/auth-secret: traefik-dash-basic-auth
spec:
  rules:
  - host: dashboard.XXX.nodes.k8s.fr-par.scw.cloud
    http:
      paths:
      - path:
        backend:
          serviceName: ingress-traefik
          servicePort: admin

and apply it:

kubectl apply -n=kube-system -f traefik-ingress.yml

After that, going to dashboard.XXX.nodes.k8s.fr-par.scw.cloud should get you to the traefik dashboard, protected by the username and password you created with htpasswd.
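
The XXX placeholder can also be substituted at apply time instead of editing the file by hand. A small sketch, assuming CLUSTER_ID is already exported and the manifest is saved as traefik-ingress.yml; render_ingress is a hypothetical helper name:

```shell
#!/bin/sh
# Replace the ".XXX." placeholder in a manifest read from stdin
# with the cluster id given as $1.
render_ingress() {
  sed "s/\.XXX\./.$1./g"
}

# Usage (needs kubectl access to the cluster):
#   render_ingress "${CLUSTER_ID}" < traefik-ingress.yml | kubectl apply -n kube-system -f -
```

Only the literal .XXX. between dots is replaced, so the rest of the manifest passes through unchanged.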

So our cluster is up and running, and so is the Traefik ingress.

Kapsule with Scaleway’s Container Registry

Scaleway provide a managed container registry service, called simply “Container Registry”. It’s rather barebones - no security scanning or anything, just a basic Docker Registry that is separated in namespaces which have regionally unique names, can’t be recreated and can either be publicly accessible or private. If they are private, accessing them requires Scaleway API tokens, which aren’t scoped and thus give full read/write access to the Scaleway API, which isn’t great. You only pay for storage (0.025/GB/month) and traffic outside of the same region (a Kapsule cluster in fr-par pulling from a Container Registry in fr-par will cost nothing).

To use it, you only need to create a namespace (more details on that below), docker login, docker build / docker tag your image, and docker push it, like this:

docker login rg.fr-par.scw.cloud -u anyuser -p ${SCALEWAY_TOKEN}
docker build . -t rg.fr-par.scw.cloud/sofixa/golang-example-web-server-k8s:latest
docker push rg.fr-par.scw.cloud/sofixa/golang-example-web-server-k8s

To be able to use the container image created from inside Kapsule, if your registry is private, you need to configure the registry authentication on your cluster (link to the official docs), by first creating the secret in your namespace with the Scaleway token:

kubectl -n=<your-namespace> create secret docker-registry regcred --docker-server=rg.fr-par.scw.cloud --docker-username=anyuser --docker-password=${SCALEWAY_TOKEN} --docker-email=<your-email>

And then using imagePullSecrets in your template spec on Deployments/Daemon Sets/Stateful Sets/Pods:

spec:
  selector:
      matchLabels:
        name: golang-test # Label selector that determines which Pods belong to the DaemonSet
  template:
    metadata:
      labels:
        name: golang-test # Pod template's label selector
    spec:
      containers:
      - name: golang-test
        image: rg.fr-par.scw.cloud/xxx/golang-example-web-server-k8s:1.0
      imagePullSecrets: 
        - name: regcred

Kapsule with Scaleway’s Load balancer

Scaleway’s Load Balancer service is an active-passive Load balancer that supports multiple frontends (listeners) and multiple backends (targets), which can be of different types, with pre-defined healthchecks for backends like PostgreSQL, MySQL, Redis, LDAP or plain old TCP or HTTP. It can do SSL offloading (via automatically provisioned Let’s Encrypt certificates) or SSL passthrough (by configuring a frontend and backend with TCP/443).

To create a Scaleway Load Balancer from inside Kapsule, you need to create a Service of Type LoadBalancer, like this:

apiVersion: v1
kind: Service
metadata:
  name: golang-test
spec:
  selector:
    name: golang-test
  ports:
    - port: 80
      targetPort: 80
  type: LoadBalancer

Which creates a LB with one frontend on port 80, one backend with all Kapsule nodes containing your service (targeting it on port 80), with round-robin load balancing (the default). The Scaleway Cloud Controller doesn’t seem to be fully documented yet, so here is a list of the available annotations (kudos to ben from Scaleway’s Community Slack for sharing them):

service.beta.kubernetes.io/scw-loadbalancer-forward-port-algorithm #annotation to choose the load balancing algorithm

service.beta.kubernetes.io/scw-loadbalancer-sticky-sessions #annotation to enable cookie-based session persistence

service.beta.kubernetes.io/scw-loadbalancer-sticky-sessions-cookie-name #annotation for the cookie name for sticky sessions

service.beta.kubernetes.io/scw-loadbalancer-health-check-type #health check used

service.beta.kubernetes.io/scw-loadbalancer-health-check-delay #time between two consecutive health checks

service.beta.kubernetes.io/scw-loadbalancer-health-check-timeout #additional check timeout, after the connection has been already established

service.beta.kubernetes.io/scw-loadbalancer-health-check-max-retries #number of consecutive unsuccessful health checks, after which the server will be considered dead

service.beta.kubernetes.io/scw-loadbalancer-health-check-http-uri #URI that is used by the "http" health check

service.beta.kubernetes.io/scw-loadbalancer-health-check-http-method #method used by the "http" health check

service.beta.kubernetes.io/scw-loadbalancer-health-check-http-code #HTTP code that the "http" health check will be matching against

service.beta.kubernetes.io/scw-loadbalancer-health-check-mysql-user #MySQL user used to check the MySQL connection when using the "mysql" health check

service.beta.kubernetes.io/scw-loadbalancer-health-check-pgsql-user #PgSQL user used to check the PgSQL connection when using the "pgsql" health check

service.beta.kubernetes.io/scw-loadbalancer-send-proxy-v2 #annotation that enables PROXY protocol version 2 (must be supported by backend servers)

service.beta.kubernetes.io/scw-loadbalancer-timeout-server # maximum server connection inactivity time

service.beta.kubernetes.io/scw-loadbalancer-timeout-connect #maximum initial server connection establishment time

service.beta.kubernetes.io/scw-loadbalancer-timeout-tunnel #maximum tunnel inactivity time

service.beta.kubernetes.io/scw-loadbalancer-on-marked-down-action #annotation that modifies what occurs when a backend server is marked down

Details about them and their possible values are available on the Load Balancer API docs.
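
These annotations normally live in the Service manifest, but kubectl annotate can also set them on an existing Service. The sketch below only builds the key=value argument: scw_lb_annotation is a hypothetical helper, the value is just an example, and whether the cloud controller applies a change made this way to an already provisioned Load Balancer is an assumption that is untested here.

```shell
#!/bin/sh
# Build "<full annotation key>=<value>" from a short name such as
# health-check-delay (one of the suffixes in the list above).
scw_lb_annotation() {
  printf 'service.beta.kubernetes.io/scw-loadbalancer-%s=%s\n' "$1" "$2"
}

# Usage (needs kubectl access to the cluster):
#   kubectl annotate service golang-test --overwrite \
#     "$(scw_lb_annotation health-check-delay 10000)"
```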

Cluster upgrades

One of the most interesting features of Kapsule is the possibility to do rolling cluster upgrades. Let’s test it!

We’ll create a random service behind a Load Balancer to test if our service stays up during the upgrade, as it should. To see what’s going on, the service will be based on a DaemonSet (a pod running on each node) with a simple Golang HTTP server which prints the node hostname. Along the way we’ll test Scaleway’s Load Balancer and Container Registry services.

Prerequisites

First, create your container registry with a similar JSON file, editing the appropriate lines:

{
  "name": "xxx",
  "description": "My awesome container registry",
  "organization_id": "xxx",
  "is_public": false
}

and POST it to Scaleway’s API endpoint for Container Registry namespaces:

https://api.scaleway.com/registry/v1/regions/{region}/namespaces

curl -XPOST  -H "X-Auth-Token: ${SCALEWAY_TOKEN}" -d @data.json https://api.scaleway.com/registry/v1/regions/fr-par/namespaces

Second, clone the example code in my GitHub repository, build its Docker container, and push it to your registry:

git clone git@github.com:sofixa/golang-example-web-server-k8s.git
cd golang-example-web-server-k8s/
docker build . -t rg.fr-par.scw.cloud/xxx/golang-example-web-server-k8s:1.0
docker push rg.fr-par.scw.cloud/xxx/golang-example-web-server-k8s

Once the image is successfully uploaded to the registry, let’s create a DaemonSet from it and a Load Balancer in front with the manifest files (based on the examples from the Load Balancer and the Container Registry sections) in the same repository (the DaemonSet manifest will try to use a regcred secret to authenticate to your registry, so if your registry is public, you should remove the imagePullSecrets part from it):

kubectl create namespace test-golang
namespace/test-golang created

kubectl apply -n=test-golang -f daemon-set.yml
daemonset.apps/golang-test created

kubectl apply -n=test-golang -f load-balancer.yml
service/golang-test created

To get the Load Balancer’s external IP, get the corresponding service and copy the EXTERNAL-IP:

kubectl get svc -n=test-golang
NAME          TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
golang-test   LoadBalancer   10.38.179.245   51.159.25.121   80:31120/TCP   5d23h
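
If you’d rather not copy the address by hand, the EXTERNAL-IP column can be pulled out of that table. A minimal sketch, assuming the default column layout shown above; external_ip is a hypothetical helper name:

```shell
#!/bin/sh
# Print the EXTERNAL-IP column (4th field) for service $1 from
# `kubectl get svc` output read from stdin, skipping the header row.
# While the Load Balancer is still being provisioned this prints <pending>.
external_ip() {
  awk -v name="$1" 'NR > 1 && $1 == name { print $4 }'
}

# Usage (needs kubectl access to the cluster):
#   LB_IP=$(kubectl get svc -n test-golang | external_ip golang-test)
#   curl -s "http://${LB_IP}"
```

kubectl can also print the address directly with -o jsonpath='{.status.loadBalancer.ingress[0].ip}' on the service, which avoids depending on the table layout.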

Once the load balancer is ready, curl-ing its public IP should get you a 200 response code with a similar response body:

curl -v http://51.159.25.121   
* Rebuilt URL to: http://51.159.25.121/
*   Trying 51.159.25.121...
* TCP_NODELAY set
* Connected to 51.159.25.121 (51.159.25.121) port 80 (#0)
> GET / HTTP/1.1
> Host: 51.159.25.121
> User-Agent: curl/7.58.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Mon, 11 Nov 2019 20:11:57 GMT
< Content-Length: 65
< Content-Type: text/plain; charset=utf-8
<
Hello world from scw-k8s-test-default-16b716e81c7a4!
* Connection #0 to host 51.159.25.121 left intact

Notice the response, which contains the node name, composed of the following:

  • cluster name k8s-test
  • node pool name default
  • node id 16b716e81c7a4

The test

Now, let’s set up vegeta, an HTTP load testing tool, to test if our cluster continues to respond during a cluster upgrade. Follow the installation instructions, and do a quick test on your load balancer’s IP:

echo "GET http://51.159.25.121" | vegeta attack -rate=1/s
wResult	Attack
              SeqCode	TimestampLatencBytesOutBytesInError
                                                           Body
Timeb���6P<bAAHello world from scw-k8s-test-default-3cb12e55883e4!
d���6*<H.�AAHello world from scw-k8s-test-default-3cb12e55883e4!

Everything seems to be working (ignore the binary output, vegeta output isn’t meant to be parsed directly by humans), so you can quit/Ctrl+C the process. You can now launch a continuous “attack”, in vegeta parlance, storing the results in a JSON file:

echo "GET http://51.159.25.121" | vegeta attack -rate=1/s | vegeta encode > results.json

cat-ing the file should result in a similar output, which contains all sorts of useful information for a normal load test (the main purpose of vegeta), but for our test case, only the response code and body (which is base64 encoded) matter:

cat results.json | jq .
{
  "attack": "",
  "seq": 0,
  "code": 200,
  "timestamp": "2019-11-11T21:24:15.375938851+01:00",
  "latency": 21093314,
  "bytes_out": 0,
  "bytes_in": 65,
  "error": "",
  "body": "SGVsbG8gd29ybGQgZnJvbSBzY3ctazhzLWVsb3F1ZW50LWxlaG1hbm4tZGVmYXVsdC0zY2IxMmU1NTg4M2U0IQo="
}

A base64decode / base64 -d on that body will get the same result as a bare curl:

head -n 1 results.json | jq .body | base64 -i -d
Hello world from scw-k8s-test-default-3cb12e55883e4!
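
To see which node answered each request over the whole run, the same decoding can be applied to every line of the results file. A rough sketch, assuming vegeta encode wrote one JSON object per line and that base64 -d is available (GNU coreutils); decode_bodies is a hypothetical helper name:

```shell
#!/bin/sh
# Decode the base64 "body" of every vegeta JSON result read from stdin.
# Results without a body ("body": null, e.g. timeouts) do not match the
# pattern and are skipped.
decode_bodies() {
  sed -n 's/.*"body"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
    while IFS= read -r b64; do
      printf '%s' "$b64" | base64 -d
    done
}

# Usage: count how many responses each node served
#   decode_bodies < results.json | sort | uniq -c
```

Since the example server ends each response with a newline, the decoded output has one line per successful request.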

Once the vegeta load test is running, time to launch an upgrade to the Kapsule cluster.

The upgrade

Kubernetes cluster upgrades are done in two main stages:

  • the control plane and all its components
  • the worker node pools, a node at a time

During the control plane upgrade, it remains accessible (worst case scenario you might get an EOF or two). For a worker node to be updated, it needs to be drained of all its pods. Scaleway do this for you automatically, but for that to happen there must be enough resources on the other nodes for all the evicted pods to be rescheduled.

Usually you can only upgrade one minor version at a time (e.g. 1.14.x to 1.15.x, but not to 1.16.x directly), and this is the case with Scaleway. In any case, doing the upgrade over the API is cooler, so that’s what we’ll do. All that’s required is a small JSON file with the wanted version (remember, only patches or a single minor version upwards are supported) and a boolean indicating whether we want to upgrade the nodes as well (duh).

{
  "version": "1.15.5",
  "upgrade_pools": true
}

All that’s left to do is to POST that to the /upgrade endpoint:

curl -XPOST -d @data.json  -H "X-Auth-Token: ${SCALEWAY_TOKEN}"  "https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters/${CLUSTER_ID}/upgrade"

Which would get a similar result:

{
  "region": "fr-par",
  "id": "ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b",
  "organization_id": "37a7df83-e2f2-43aa-a181-170a52aec2ac",
  "created_at": "2019-11-11T19:46:54.261230Z",
  "updated_at": "2019-11-11T20:49:16.331469260Z",
  "name": "k8s-test",
  "description": "",
  "cluster_ip": "",
  "cluster_port": 0,
  "current_node_count": 2,
  "status": "updating",
  "sub_status": "deploy_controlplane",
  "version": "1.15.5",
  "cni": "calico",
  "tags": [],
  "current_core_count": 8,
  "current_mem_count": 34359738368,
  "cluster_url": "https://ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b.api.k8s.fr-par.scw.cloud:6443",
  "dns_wildcard": "*.ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b.nodes.k8s.fr-par.scw.cloud",
  "autoscaler_config": {
    "scale_down_disable": false,
    "scale_down_delay_after_add": "10m",
    "estimator": "binpacking",
    "expander": "random",
    "ignore_daemonsets_utilization": false,
    "balance_similar_node_groups": false,
    "expendable_pods_priority_cutoff": -10
  }
}
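
The “patches or a single minor version upwards” rule mentioned earlier is easy to check locally before sending the request. This is only a client-side sketch of that rule with a hypothetical helper name (is_supported_upgrade); it is not how Scaleway validates the request, and it assumes versions in the major.minor.patch form used in this post.

```shell
#!/bin/sh
# Return 0 if moving from version $1 to version $2 stays on the same minor
# release or goes up by exactly one minor (e.g. 1.14.8 -> 1.15.5).
# Patch numbers are not compared, so a patch downgrade is not detected.
is_supported_upgrade() {
  cur_major=${1%%.*}
  cur_rest=${1#*.}
  cur_minor=${cur_rest%%.*}
  tgt_major=${2%%.*}
  tgt_rest=${2#*.}
  tgt_minor=${tgt_rest%%.*}
  [ "$cur_major" = "$tgt_major" ] || return 1
  step=$((tgt_minor - cur_minor))
  [ "$step" -eq 0 ] || [ "$step" -eq 1 ]
}

# Usage:
#   is_supported_upgrade 1.14.8 1.15.5 && echo "ok to request the upgrade"
```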

To check the status of the upgrade, you can GET the status of the cluster, the node pools and the nodes via the Scaleway API (on the cluster, pools and nodes endpoints), or use kubectl get nodes:

curl -XGET  -H "X-Auth-Token: ${SCALEWAY_TOKEN}"  "https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters/${CLUSTER_ID}"
{
  "region": "fr-par",
  "id": "ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b",
  "organization_id": "37a7df83-e2f2-43aa-a181-170a52aec2ac",
  "created_at": "2019-11-11T19:46:54.261230Z",
  "updated_at": "2019-11-11T21:04:47.430331Z",
  "name": "k8s-test",
  "description": "",
  "cluster_ip": "",
  "cluster_port": 0,
  "current_node_count": 2,
  "status": "ready",
  "sub_status": "deploy_controlplane",
  "version": "1.15.5",
  "cni": "calico",
  "tags": [],
  "current_core_count": 8,
  "current_mem_count": 34359738368,
  "cluster_url": "https://ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b.api.k8s.fr-par.scw.cloud:6443",
  "dns_wildcard": "*.ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b.nodes.k8s.fr-par.scw.cloud",
  "autoscaler_config": {
    "scale_down_disable": false,
    "scale_down_delay_after_add": "10m",
    "estimator": "binpacking",
    "expander": "random",
    "ignore_daemonsets_utilization": false,
    "balance_similar_node_groups": false,
    "expendable_pods_priority_cutoff": -10
  }
}
curl -XGET  -H "X-Auth-Token: ${SCALEWAY_TOKEN}"  "https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters/${CLUSTER_ID}/pools"
{
  "total_count": 1,
  "pools": [
    {
      "region": "fr-par",
      "id": "feb2c164-8805-4130-b5d3-57889dc35652",
      "cluster_id": "ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b",
      "created_at": "2019-11-11T19:46:54.266926Z",
      "updated_at": "2019-11-11T21:02:09.443695Z",
      "name": "default",
      "current_node_count": 2,
      "status": "updating",
      "version": "1.16.2",
      "commercial_type": "gp1_xs",
      "autoscaling": false,
      "size": 2,
      "min_size": 2,
      "max_size": 2,
      "current_core_count": 8,
      "current_mem_count": 34359738368,
      "container_runtime": "docker",
      "autohealing": false
    }
  ]
}
curl -XGET  -H "X-Auth-Token: ${SCALEWAY_TOKEN}"  "https://api.scaleway.com/k8s/v1beta2/regions/fr-par/clusters/${CLUSTER_ID}/nodes"
{
  "total_count": 2,
  "nodes": [
    {
      "region": "fr-par",
      "id": "16b716e8-1c7a-4486-9861-067717cd44ea",
      "cluster_id": "ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b",
      "created_at": "2019-11-11T19:47:52.240391Z",
      "updated_at": "2019-11-11T21:06:40.605505Z",
      "pool_id": "feb2c164-8805-4130-b5d3-57889dc35652",
      "status": "ready",
      "npd_status": {
        "DiskPressure": "False",
        "KernelDeadlock": "False",
        "MemoryPressure": "False",
        "NetworkUnavailable": "False",
        "PIDPressure": "False",
        "Ready": "True"
      },
      "name": "scw-k8s-test-default-16b716e81c7a4",
      "public_ip_v4": "51.158.69.231",
      "public_ip_v6": null
    },
    {
      "region": "fr-par",
      "id": "3cb12e55-883e-404e-9617-9716a1ab22aa",
      "cluster_id": "ae336b4a-7875-4150-8b1c-ed1e4c9f3b2b",
      "created_at": "2019-11-11T19:47:54.410118Z",
      "updated_at": "2019-11-11T21:06:40.732254Z",
      "pool_id": "feb2c164-8805-4130-b5d3-57889dc35652",
      "status": "notready",
      "npd_status": {
        "DiskPressure": "Unknown",
        "KernelDeadlock": "False",
        "MemoryPressure": "Unknown",
        "NetworkUnavailable": "False",
        "PIDPressure": "Unknown",
        "Ready": "Unknown"
      },
      "name": "scw-k8s-test-default-3cb12e55883e4",
      "public_ip_v4": "51.15.217.30",
      "public_ip_v6": null
    }
  ]
}
kubectl get nodes         
NAME                                 STATUS                        ROLES    AGE   VERSION
scw-k8s-test-default-16b716e81c7a4   Ready                         <none>   77m   v1.15.5
scw-k8s-test-default-3cb12e55883e4   NotReady,SchedulingDisabled   <none>   76m   v1.14.8
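
On a larger pool, a per-version count is easier to read than the full node list. A small sketch, assuming the default column layout where VERSION is the last field; node_versions is a hypothetical helper name:

```shell
#!/bin/sh
# Count nodes per kubelet version from `kubectl get nodes --no-headers`
# output read from stdin (the version is the last column).
node_versions() {
  awk 'NF { count[$NF]++ } END { for (v in count) print v, count[v] }' | sort
}

# Usage (needs kubectl access to the cluster):
#   kubectl get nodes --no-headers | node_versions
```

The node pool upgrade is finished once only the target version is left in that output.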

Upgrades are pretty quick (<5 minutes) if your cluster is empty, as was mine, but as usual, your mileage might vary. However, they aren’t fully non-disruptive - checking the negative results of the vegeta run, we see there were a few timeouts:

less results.json | jq '. | select (.code != 200)'
{
  "attack": "",
  "seq": 41,
  "code": 0,
  "timestamp": "2019-11-11T22:02:41.187717035+01:00",
  "latency": 30000794729,
  "bytes_out": 0,
  "bytes_in": 0,
  "error": "Get http://51.159.25.121: net/http: request canceled (Client.Timeout exceeded while awaiting headers)",
  "body": null
}
{
  "attack": "",
  "seq": 42,
  "code": 0,
  "timestamp": "2019-11-11T22:02:42.187457957+01:00",
  "latency": 30000935369,
  "bytes_out": 0,
  "bytes_in": 0,
  "error": "Get http://51.159.25.121: net/http: request canceled (Client.Timeout exceeded while awaiting headers)",
  "body": null
}
{
  "attack": "",
  "seq": 43,
  "code": 0,
  "timestamp": "2019-11-11T22:02:43.187827619+01:00",
  "latency": 30000555385,
  "bytes_out": 0,
  "bytes_in": 0,
  "error": "Get http://51.159.25.121: net/http: request canceled (Client.Timeout exceeded while awaiting headers)",
  "body": null
}
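
To put a number on it, the results file can be summarised in one pass. A minimal sketch, assuming results.json is the raw output of vegeta encode (one compact JSON object per line, not the pretty-printed jq output); summarize_results is a hypothetical helper name:

```shell
#!/bin/sh
# Count total, successful (HTTP 200) and failed requests from vegeta
# JSON results read from stdin, one JSON object per line.
summarize_results() {
  awk '
    NF { total++ }
    /"code":[ \t]*200[,}]/ { ok++ }
    END { printf "total=%d ok=%d failed=%d\n", total, ok, total - ok }
  '
}

# Usage:
#   summarize_results < results.json
```

At a rate of one request per second, the failed count gives a rough idea of how many seconds the service was unreachable.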

It looks like the Load balancer’s health check isn’t frequent enough to detect that the backend is down during upgrades. The Load Balancer API docs show that it is possible to tweak the delay between checks (not available via the web UI though). As mentioned before, there are a few (not yet publicly documented) annotations we can use on our service object for that (the available choices and detailed explanations are available on the Load Balancer API page):

service.beta.kubernetes.io/scw-loadbalancer-health-check-delay # time between two consecutive health checks, in milliseconds
service.beta.kubernetes.io/scw-loadbalancer-health-check-timeout # additional check timeout, after the connection has been already established
service.beta.kubernetes.io/scw-loadbalancer-health-check-max-retries # number of consecutive unsuccessful health checks, after which the server will be considered dead

This is what our Service object could look like (example values, the precise ones will vary depending on your environment):

apiVersion: v1
kind: Service
metadata:
  name: golang-test
  annotations:
    service.beta.kubernetes.io/scw-loadbalancer-health-check-delay: "10000" # a check every 10s (annotation values must be strings, hence the quotes)
    service.beta.kubernetes.io/scw-loadbalancer-health-check-timeout: "3000" # a 3s timeout, which can be dangerously low
    service.beta.kubernetes.io/scw-loadbalancer-health-check-max-retries: "2" # 2 retries
spec:
  selector:
    name: golang-test
  ports:
    - port: 80
      targetPort: 80
  type: LoadBalancer

Summary

Pros:

  • quick to provision and good provisioning tooling (API, terraform)
  • good amount of features/integrations (cluster upgrades, autohealing, optional installation of dashboard, ingress for Kapsule, automatic certs via Let’s Encrypt and various healthchecks for Load Balancer)
  • Container Registry is good enough and inexpensive (0.025 eur/GB/month for storage and 0.03 eur/GB/month traffic outside of the same region)
  • Kubernetes versions come out pretty quickly (~1 week for 1.16)
  • ecosystem is good - Scaleway have managed Databases, Object and Block storage, Load Balancers (which will soon be multi-cloud) and a few potentially very interesting services in beta/preview like Serverless Functions, VPCs, IoT Hub, AI Inference, Domains (which includes promising features like “Dynamic routing capabilities: balance traffic depending on resource health and more to come”), striking, for me, a fine balance between more basic platforms (Linode, Vultr, Digital Ocean to an extent) and full featured clouds (AWS, GCP, Azure) while still retaining pricing closer to the former

Cons:

  • only relatively expensive instance types are available (starting at 40 eur/month), for now, making Kapsule less affordable compared to the main competitors - hopefully that will change soon, and we might even get bare metal nodes one day
  • Container Registry auth requires a full Scaleway token, which gives full read/write API access - read-only tokens would be a cool addition
  • documentation is somewhat lacking in some areas
  • there’s no way to monitor/get statistics from services like Load Balancer or Container Registry; for Kapsule, Heapster and metrics-server are preinstalled, so there’s access to Kubernetes-level statistics, but nothing else
  • there’s no way to create and manage Scaleway resources (other than Load Balancer and block storage) from inside Kapsule, like with GCP’s Cloud Connector (Note: you can probably use AppsCode Kubeform for that, since it’s just a Kubernetes CRD wrapper around terraform providers, and Scaleway’s terraform provider is pretty decent)
  • cluster upgrades can result in small (a few seconds) downtimes with the default configuration, some fine tuning is required

So, in conclusion, Kapsule and the Scaleway stack are very interesting and evolving quickly, but come with some rough edges (well, Kapsule is still in beta, so completely normal). Once those are polished, Scaleway Kapsule and the rest of their ecosystem will make for a very compelling development / small-to-mid scale cloud environment, with just the right amount of managed services and features for a bargain price, and EU data sovereignty, which can be a requirement/plus in some cases.