Using Traefik on Nomad
Introduction
Traefik is a great load balancer that takes dynamic configuration from a variety of providers, notably (in this case) Consul Catalog, which Nomad jobs can register into. This provides a fast and easy way to get automatic virtual hosts and load balancing (ingress) for all of our Nomad jobs. There’s already a decent basic tutorial on HashiCorp Learn about doing just that, so I’ll focus on more advanced patterns and use cases that I’ve had to deal with.
A full Nomad job file can be found here.
Base
Traefik’s configuration comes in two parts: static (on which ports to listen, tracing, logs, etc.) and dynamic, which comes from a third-party source (like Consul’s service catalogue) and is used to dynamically create routers (aka virtual hosts, which can be HTTP, TCP or UDP) and services (aka backends).
The static configuration can be provided via CLI flags on start-up, or via auto-reloaded YAML / TOML configuration files (which, in the case of Traefik running on Nomad, can come from Consul or Vault via the template stanza), and contains the basics: entrypoints (on which ports to listen), access logs, metrics, etc. and middlewares (attached to the routers, pieces of middleware are a means of tweaking the requests before they are sent to your service).
Personally I prefer to have as much as possible in the Nomad file for better version history and rollbackability (if the configuration of a middleware comes from a YAML file stored in Consul, its updates are independent of Nomad, and if you roll back your Nomad deployment to version X, it won’t roll back the value in Consul as well), and specifically in the case of Traefik, I think that CLI flags are a bit leaner and sufficiently readable for the basics.
A small example: a minimal Traefik configuration with the dashboard and API enabled, healthcheck, Prometheus-compatible metrics, and access logs:
# to make use of the file, we need to tell Traefik to watch the folder where it is
args = ["--providers.file.directory=local/traefik/"]
template {
data = <<EOH
ping: {}
accessLog: {}
api:
  dashboard: true
metrics:
  prometheus:
    entryPoint: metrics
EOH
destination = "local/traefik/base-config.yaml"
}
OR
args = [
"--ping=true",
"--accesslog=true",
"--api=true",
"--metrics=true",
"--metrics.prometheus=true",
"--metrics.prometheus.entryPoint=metrics",
]
I’ll provide most examples from here on in the CLI format, but converting them to TOML or YAML shouldn’t be very complicated in most cases with the help of the Traefik docs, which show everything in the different versions.
The bare minimum to get Traefik running under Nomad looks like this:
task "traefik" {
driver = "docker"
config {
image = "traefik:2.4"
args = [
"--entryPoints.http.address=:80",
"--accesslog=true",
"--api=true",
"--metrics=true",
"--metrics.prometheus=true",
"--metrics.prometheus.entryPoint=http",
"--ping=true",
"--ping.entryPoint=http",
]
ports = ["http"]
}
}
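Note that the "http" port label refers to a port declared at the group level; a minimal network block for it (matching the static-port examples further down) would be:
network {
port "http" {
static = 80
}
}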
Static configuration from Consul via templates
Traefik has a file configuration provider (with automatic and hot reloads, hence the noop change_mode), and with the following argument, --providers.file.directory=local/traefik/, we can use Nomad templates to generate extra configuration coming from Consul:
template {
data = "{{ key \"traefik/auth\" }} "
destination = "local/traefik/auth.yaml"
change_mode = "noop"
}
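As an illustration, the traefik/auth key in Consul could hold a dynamic-configuration fragment such as this hypothetical basicAuth middleware (the name and users are placeholders):
http:
  middlewares:
    my-basic-auth:
      basicAuth:
        users:
          - "admin:$apr1$xxxxxxxxx"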
As I already mentioned though, I prefer to have most of Traefik’s configuration in the Nomad job file directly for rollbackability.
TLS configuration from Vault via templates
Using the same argument --providers.file.directory=local/traefik/ and the following templates, we can keep TLS certificates in Vault, restarting Traefik when they change (since Traefik doesn’t support hot reload of certificates).
template {
data = <<EOH
tls:
  certificates:
    - certFile: "local/secret/example.com.crt"
      keyFile: "local/secret/example.com.key"
EOH
destination = "local/traefik/cert.yaml"
change_mode = "noop"
}
template {
data = "{{ with secret \"secret/data/example.com\" }}{{.Data.data.key}}{{end}}"
destination = "local/secret/example.com.key"
change_mode = "restart"
splay = "1m"
}
template {
data = "{{ with secret \"secret/data/example.com\" }}{{.Data.data.crt}}{{end}}"
destination = "local/secret/example.com.crt"
change_mode = "restart"
splay = "1m"
}
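For these templates to be able to read the secret, the task also needs a vault stanza referencing a policy that grants access to it (the policy name here is a placeholder):
vault {
policies = ["traefik-tls"]
}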
Consul Catalog
To be able to create dynamic routers based on the services we have in Consul, Traefik itself needs some configuration (full doc here), and then tags on those services telling it the specifics:
"--providers.consulcatalog=true",
"--providers.consulcatalog.exposedByDefault=false",
"--providers.consulcatalog.endpoint.address=http://172.17.0.1:8500",
"--providers.consulcatalog.prefix=traefik",
"--providers.consulcatalog.defaultrule=Host(`{{ .Name }}.example.com`)",
OR
[providers.consulCatalog]
prefix = "traefik"
exposedByDefault = false
defaultRule = "Host(`{{ .Name }}.{{ index .Labels \"customLabel\"}}`)"
constraints = "Tag(`a.tag.name`)"
[providers.consulCatalog.endpoint]
address = "http://172.17.0.1:8500"
Basically, we enable the Consul Catalog provider and point it to a Consul agent, set the prefix (tags on Consul services starting with that prefix will be looked at for extra configuration, so for prefix = traefik, tags like traefik.http.routers.api.service=api@internal will be used), and define whether or not services are exposed by default (IMHO they shouldn’t be) as well as the default routing rule (Host(`{{ .Name }}.example.com`) means service-name.example.com will be used by default, over-ridable per-service via tags).
Optionally, we can also add constraints, which allow complex matching on tags to determine whether a router should be created, mostly useful if exposedByDefault is true.
A note on the Consul address: since Traefik runs inside Docker, pointing it to the default localhost:8500 won’t work unless we run it with network_mode = "host" (sharing the host’s networking, more on that below). In my case, I’m pointing it to the host’s docker0 IP, 172.17.0.1, which is accessible from all Docker containers and is the same across all my hosts, and making Consul listen on docker0 on top of the usual localhost via the following configuration on the Consul agent:
addresses {
http = "127.0.0.1 {{ GetInterfaceIP \"docker0\" }}"
}
There’s also a bunch of more advanced options around Consul auth, API read consistency requirements and refreshInterval.
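For illustration, these could look something like the following (a sketch; the token is a placeholder and the values are just examples):
"--providers.consulcatalog.refreshInterval=30s",
"--providers.consulcatalog.requireConsistent=true",
"--providers.consulcatalog.endpoint.token=<consul-acl-token>",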
Here’s how to make use of the provider in the Nomad job file via the service stanza:
service {
name = "my-service"
port = "https"
tags = [
# HTTPS-only service which will be called when the URL is example.com or example.org/something
# with my-middleware attached
"traefik.enable=true",
"traefik.http.routers.my-service.rule=Host(`example.com`) || (Host(`example.org`) && Path(`/something`))",
"traefik.http.routers.my-service.tls=true",
"traefik.http.routers.my-service.middlewares=my-middleware",
]
}
If exposedByDefault is true and you have all the configuration you need by default (e.g. an http->https redirect and other middlewares you need at the entrypoint level), you don’t even need per-service configuration unless you want to do something specific.
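For reference, an entrypoint-level HTTP to HTTPS redirect (so individual services don’t need their own redirect middleware) looks like this:
"--entryPoints.http.address=:80",
"--entryPoints.https.address=:443",
"--entrypoints.http.http.redirections.entryPoint.to=https",
"--entrypoints.http.http.redirections.entryPoint.scheme=https",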
Networking
There are two main ways you can configure Nomad with regards to Traefik networking: host networking or static ports. Host networking, which is the same as Docker’s --network="host", adds the task to the host’s network namespace and shares its network stack, and is generally not recommended.
Static ports are host-level ports that are declared statically (as opposed to regular Nomad ports, which are ephemeral and random on the host level, and rather impractical for incoming HTTP / HTTPS traffic) and forwarded to the task, like so:
group "traefik" {
network {
port "http" {
static = 80
}
port "https" {
static = 443
}
}
task "traefik" {
driver = "docker"
config {
image = "traefik:2.4"
args = [
"--entryPoints.http.address=:80",
"--entryPoints.https.address=:443",
]
ports = ["http", "https"]
...
This declares the ports 80 and 443 and attaches them to the Traefik task, which uses them as entrypoints named http and https.
If you have multiple networks, you can make use of Nomad’s host_network configuration, where you declare the available networks with aliases on the Nomad client and target them with the host_network option on ports. (In reality that’s what Nomad does behind the scenes, using the first network it finds for a hidden default host_network and attaching ports to it.) Do note, however, that if you declare host_networks, ports which don’t specify a host_network will use the default value of default, and if no such network exists, allocations will fail (silently for system jobs).
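On the Nomad client side, the aliased networks are declared roughly like this (a sketch; the interface names are assumptions about your particular setup):
client {
host_network "private" {
interface = "eth1"
}
host_network "public" {
interface = "eth0"
}
}
The job then targets these networks per port: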
group "traefik" {
network {
port "http-priv" {
static = 80
host_network = "private"
}
port "https-priv" {
static = 443
host_network = "private"
}
port "http-pub" {
static = 80
host_network = "public"
}
port "https-pub" {
static = 443
host_network = "public"
}
}
task "traefik" {
driver = "docker"
config {
image = "traefik:2.4"
args = [
"--entryPoints.http.address=:80",
"--entryPoints.https.address=:443",
]
ports = ["http-priv", "https-priv", "http-pub", "https-pub"]
...
This makes the http and https Traefik entrypoints available on both networks, and can of course be adapted to have specific entrypoints only on specific networks.
If for some reason the above doesn’t work for you (for example, if you use VRRP with a virtual IP floating between multiple machines, host_network might not work because Nomad fingerprints the networks on start-up, and since the floating IP won’t be available on all machines simultaneously, Nomad won’t bind to it), you can resort to host-mode networking (again, it’s generally discouraged), where the task shares the host’s networking, like so:
group "traefik" {
network {
port "http" {
static = 80
}
port "https" {
static = 443
}
}
task "traefik" {
driver = "docker"
config {
image = "traefik:2.4"
args = [
"--entryPoints.http.address=:80",
"--entryPoints.https.address=:443",
]
ports = ["http", "https"]
network_mode = "host"
...
With this, Traefik will be available on ports 80 and 443 on all network interfaces on the host.
Security
Traefik’s API, dashboard and metrics endpoints should be protected for security reasons, which can be done by putting them on a separate entrypoint which is firewalled off and/or adding middlewares with auth. The API and dashboard are represented by a special service, api@internal, and the Prometheus-compatible /metrics can be put on a dedicated service as well (via the manualRouting = true option, which will create a prometheus@internal service).
Here’s a basic example of that, making use of tags on the Consul service of Traefik to dynamically configure the routes and middlewares:
task "traefik" {
driver = "docker"
config {
image = "traefik:2.4"
args = [
# defining 3 entrypoints, on ports 80, 443 and 8080 (for admin stuff)
# and putting the healthcheck and metrics (with manual routing) on the admin endpoint
"--entryPoints.http.address=:80",
"--entryPoints.https.address=:443",
"--entryPoints.admin.address=:8080",
"--entrypoints.http.http.redirections.entryPoint.to=https",
"--accesslog=true",
"--api=true",
"--metrics=true",
"--metrics.prometheus=true",
"--metrics.prometheus.entryPoint=admin",
"--metrics.prometheus.manualrouting=true",
"--ping=true",
"--ping.entryPoint=admin",
"--providers.consulcatalog=true",
"--providers.consulcatalog.endpoint.address=http://172.17.0.1:8500",
"--providers.consulcatalog.prefix=traefik",
]
ports = ["http", "https", "admin"]
}
service {
name = "traefik"
port = "https"
tags = [
# using Consul service tags prefixed by traefik (as defined in `--providers.consulcatalog.prefix`)
# to configure api&dashboard routers
# with a headers check (spoofable) and a basic auth middleware attached inline
"traefik.enable=true",
"traefik.http.routers.api.rule=Host(`traefik.example.com`) && HeadersRegexp(`X-Real-Ip`, `^(10.1.1.1)$`)",
"traefik.http.routers.api.service=api@internal",
"traefik.http.routers.api.middlewares=basic-auth",
"traefik.http.middlewares.basic-auth.basicauth.users=admin:xxx",
]
# healthcheck using the appropriate port
check {
name = "alive"
type = "http"
port = "admin"
path = "/ping"
interval = "5s"
timeout = "2s"
}
}
}
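With manualrouting enabled, the metrics aren’t reachable until a router is attached to the prometheus@internal service; a sketch of doing that via the same Consul tags (the hostname is an assumption, and you’d likely want the same auth middleware on it):
"traefik.http.routers.metrics.rule=Host(`traefik.example.com`) && Path(`/metrics`)",
"traefik.http.routers.metrics.service=prometheus@internal",
"traefik.http.routers.metrics.middlewares=basic-auth",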
Minimising downtime during updates
Traditionally, an ingress/load balancer/reverse proxy such as Traefik would run on all client nodes and route appropriately. In such a scenario, you need some way to distribute the incoming traffic between the client nodes (such as round-robin DNS, L4 load balancing on the router/provider, BGP/Anycast, etc.), and to know their health and stop sending requests their way if they’re unavailable. Potential scenarios to handle gracefully include node draining and configuration updates.
To achieve that on Nomad with minimal downtime, we use system jobs, staggered updates that don’t restart all instances simultaneously via the update stanza, and a combination of the kill_timeout task parameter and Traefik’s lifeCycle.requestAcceptGraceTimeout and lifeCycle.graceTimeOut to allow requests to finish gracefully:
job "traefik" {
region = "global"
datacenters = ["dc1"]
type = "system"
# only one Traefik instance will be restarted at a time, with 1 minute delay between each such action
# and automatic rollback to the previous version if the new one doesn't pass the health check
update {
max_parallel = 1
stagger = "1m"
auto_revert = true
}
group "traefik" {
task "traefik" {
driver = "docker"
# Nomad will wait for 30s after sending the kill signal to the task before forcefully shutting it down
# by default it's 10s (not enough to properly drain connections)
# and the maximum is limited by the max_kill_timeout setting on the Nomad client (default 30s)
kill_timeout = "30s"
config {
image = "traefik:2.4"
# Traefik will continue serving new requests for 15s while failing its healthcheck,
# giving time to downstream load balancers to take it out of their pool
# and will *then* wait for 10s for existing requests to finish before shutting down,
# before Nomad forcefully kills it 30s after initiating
args = [
"--entryPoints.http.address=:80",
"--entryPoints.http.transport.lifeCycle.requestAcceptGraceTimeout=15s",
"--entryPoints.http.transport.lifeCycle.graceTimeOut=10s",
"--entryPoints.https.address=:443",
"--entryPoints.https.transport.lifeCycle.requestAcceptGraceTimeout=15s",
"--entryPoints.https.transport.lifeCycle.graceTimeOut=10s",
...
]
...
}
}
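If 30 seconds is not enough to drain your connections, the ceiling can be raised on the Nomad client side (a sketch of the relevant client configuration):
client {
max_kill_timeout = "60s"
}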
Sidecars
A sidecar is a task that runs alongside the main task, doing something auxiliary like collecting logs, metrics or traces. I run two sidecars with my Traefik instances, one for logs and another for traces, due to the heavy load and the specific configuration they require.
Promtail for logs
Promtail is a logging agent from Grafana Labs that can get logs from a variety of sources and then send them to Loki, a highly-scalable yet lightweight log aggregation system inspired by Prometheus. It’s drastically simpler and lighter to set up than ElasticSearch (sometimes a default choice for centralised log management), and does a very good job, so that’s what I use (there’s a blog post in the works on that too).
Here’s an example task block for a Promtail sidecar that collects Traefik’s logs, parses them with a regex, and adds some labels (index fields for searching), with the associated lifecycle policy (sidecar, poststart), healthcheck and resource limits:
task "promtail" {
driver = "docker"
config {
image = "grafana/promtail:2.2.0"
args = [
"-config.file",
"local/config.yaml",
"-print-config-stderr",
]
# the only port required is for the healthcheck
ports = ["promtail_healthcheck"]
}
# the template with Promtail's YAML configuration file, configuring the files to scrape,
# the Loki server to send the logs to (based on a registered Consul service, but it could be a fixed URL),
# the regex to parse the Common Log Format (https://en.wikipedia.org/wiki/Common_Log_Format) used for access logs,
# and the labels ( HTTP method and status code)
template {
data = <<EOH
server:
  http_listen_port: 3000
  grpc_listen_port: 0
positions:
  filename: /alloc/positions.yaml
client:
  url: http://{{ range service "loki" }}{{ .Address }}:{{ .Port }}{{ end }}/loki/api/v1/push
scrape_configs:
  - job_name: local
    static_configs:
      - targets:
          - localhost
        labels:
          job: traefik
          __path__: "/alloc/logs/traefik.std*.0"
    pipeline_stages:
      - regex:
          expression: '^(?P<remote_addr>[\w\.]+) - (?P<remote_user>[^ ]*) \[(?P<time_local>.*)\] "(?P<method>[^ ]*) (?P<request>[^ ]*) (?P<protocol>[^ ]*)" (?P<status>[\d]+) (?P<body_bytes_sent>[\d]+) "(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"?'
      - labels:
          method:
          status:
EOH
destination = "local/config.yaml"
}
resources {
cpu = 50
memory = 256
}
# poststart, and sidecar=true, so Promtail will start *after* Traefik (since it has nothing to do before Traefik is up and running),
# and run for as long as it does
lifecycle {
hook = "poststart"
sidecar = true
}
# a service for a health check to determine the state of Promtail
service {
check {
type = "http"
port = "promtail_healthcheck"
path = "/ready"
interval = "10s"
timeout = "2s"
}
}
}
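The promtail_healthcheck port label used above needs to be declared in the group’s network block; a minimal sketch mapping it to the http_listen_port from the configuration:
network {
port "promtail_healthcheck" {
to = 3000
}
}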
OpenTelemetry collector for traces
OpenTelemetry is a collection of tools, APIs, and SDKs. You use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) for analysis in order to understand your software’s performance and behaviour.
It’s the future standard for telemetry, adopted by just about everyone in the industry (Datadog, AWS, Splunk, Google, Elastic, Honeycomb, Lightstep come to mind), which allows easily switching backends while keeping the same instrumentation and collector. At the time of writing, only the tracing part is stable; logs and metrics are works in progress in various stages (Prometheus / OpenMetrics-compatible metrics are in alpha and should be ready by the end of November 2021, while the logs spec is at the design stage), so personally I’d only use it for tracing while waiting for the rest to be ready.
The OpenTelemetry (OTEL) collector is an agent you can run alongside tasks (as a sidecar) or on each Nomad client as a system job to collect traces (and, one day, metrics and logs).
Specifically in regards to Traefik, we can use the latter’s Jaeger tracing compatibility combined with the Jaeger thrift compact receiver in OTEL. Due to the potentially heavy traffic, I run a collector per Traefik, as a sidecar task, just in case.
Here’s an example task block with the OpenTelemetry collector sidecar, which sends traces to a Jaeger server (it could be anything else supported by OTEL), with extra tags set via the JAEGER_TAGS env variable:
task "opentelemetry-agent" {
driver = "docker"
config {
image = "otel/opentelemetry-collector-contrib:0.22.0"
args = [
"--config=local/otel/config.yaml",
]
# pass the healthcheck and Jaeger Thrift compact (UDP) ports
ports = ["otel_health", "jaeger_thrift_compact"]
# extra JAEGER_TAGS
env {
JAEGER_TAGS = "mytag=value,mytag2=test"
}
}
# it's a good idea to limit the amount of resources available to the collector
resources {
cpu = 128 # Mhz
memory = 256 # MB
}
# prestart, and sidecar=true, so the OTEL collector will start *before* Traefik, and run for as long as it does
lifecycle {
hook = "prestart"
sidecar = true
}
# a service for a health check to determine the state of the OTEL collector
service {
check {
name = "health"
type = "http"
port = "otel_health"
path = "/"
interval = "5s"
timeout = "2s"
}
}
# the template with OTEL's YAML configuration file, defining the Jaeger Thrift compact (UDP) receiver,
# Jaeger Thrift (HTTP) exporter to a centrally running Jaeger server registered with Consul
# and some boilerplate (healthcheck, batching, retries)
template {
data = <<EOH
receivers:
  jaeger:
    protocols:
      thrift_compact:
exporters:
  jaeger_thrift:
    url: http://{{ range service "jaeger-api-thrift" }}{{ .Address }}:{{ .Port }}{{ end }}/api/traces
    timeout: 2s
  logging:
    loglevel: debug
processors:
  batch:
  queued_retry:
extensions:
  health_check:
service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [jaeger]
      processors: [batch]
      exporters: [jaeger_thrift]
EOH
destination = "local/otel/config.yaml"
}
}
And to make use of it, Traefik needs the following extra arguments:
"--tracing.jaeger=true",
"--tracing.jaeger.localAgentHostPort=${NOMAD_ADDR_jaeger_thrift_compact}",
These send traces in the Jaeger Thrift compact protocol to the IP and port (via the NOMAD_ADDR environment variable) of the jaeger_thrift_compact port.
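As with the Promtail sidecar, the otel_health and jaeger_thrift_compact port labels need to be declared in the group’s network block; a minimal sketch, assuming the collector’s default ports (13133 for the health_check extension, 6831/udp for Jaeger Thrift compact):
network {
port "otel_health" {
to = 13133
}
port "jaeger_thrift_compact" {
to = 6831
}
}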