Using Traefik on Nomad
Introduction
Traefik is a great load balancer that takes dynamic configuration from a variety of providers, notably (in this case) Consul Catalog, which Nomad jobs can register into. This provides a fast and easy way to get automatic virtual hosts and load balancing (ingress) for all of our Nomad jobs. There’s already a decent basic tutorial on HashiCorp Learn about doing just that, so I’ll focus on more advanced patterns and use cases that I’ve had to deal with.
A full Nomad job file can be found here.
Base
Traefik’s configuration comes in two parts: static (on which ports to listen, tracing, logs, etc.) and dynamic, which comes from a third-party source (like Consul’s service catalogue) and is used to dynamically create routers (aka virtual hosts, which can be HTTP, TCP or UDP) and services (aka backends).
The static configuration can be provided via CLI flags on start-up, or via auto-reloaded YAML / TOML configuration files (which, in the case of Traefik running on Nomad, can come from Consul or Vault via the template stanza), and contains the basics: entrypoints (on which ports to listen), access logs, metrics, etc. and middlewares (attached to the routers, pieces of middleware are a means of tweaking the requests before they are sent to your service).
Personally I prefer to have as much as possible in the Nomad file for better version history and rollbackability (if the configuration of a middleware comes from a YAML file stored in Consul, its updates are independent of Nomad, and if you roll back your Nomad deployment to version X, it won’t roll back the value in Consul as well), and specifically in the case of Traefik, I think that CLI flags are a bit leaner and sufficiently readable for the basics.
A small example: a minimal Traefik configuration with the dashboard and API enabled, healthcheck, Prometheus-compatible metrics, and access logs:
# to make use of the file, we need to tell Traefik to watch the folder where it is
args = ["--providers.file.directory=local/traefik/"]
template {
data = <<EOH
ping: {}
accessLog: {}
api:
  dashboard: true
metrics:
  prometheus:
    entryPoint: metrics
EOH
destination = "local/traefik/base-config.yaml"
}
OR
args = [
"--ping=true",
"--accesslog=true",
"--api=true",
"--metrics=true",
"--metrics.prometheus=true",
"--metrics.prometheus.entryPoint=metrics",
]
I’ll provide most examples from here on in the CLI format, but converting them to TOML or YAML shouldn’t be very complicated in most cases with the help of the Traefik docs, which show everything in the different versions.
The bare minimum to get Traefik running under Nomad looks like this:
task "traefik" {
driver = "docker"
config {
image = "traefik:2.4"
args = [
"--entryPoints.http.address=:80",
"--accesslog=true",
"--api=true",
"--metrics=true",
"--metrics.prometheus=true",
"--metrics.prometheus.entryPoint=http",
"--ping=true",
"--ping.entryPoint=http",
]
ports = ["http"]
}
}
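Note that the "http" port label refers to a port declared at the group level; a minimal network block for it (matching the static-port examples further down) would be:
network {
port "http" {
static = 80
}
}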
Static configuration from Consul via templates
Traefik has a file configuration provider (with automatic and hot reloads, hence the noop change_mode), and with the following argument, --providers.file.directory=local/traefik/, we can use Nomad templates to generate extra configuration coming from Consul:
template {
data = "{{ key \"traefik/auth\" }} "
destination = "local/traefik/auth.yaml"
change_mode = "noop"
}
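As an illustration, the traefik/auth key in Consul could hold a dynamic-configuration fragment such as this hypothetical basicAuth middleware (the name and users are placeholders):
http:
  middlewares:
    my-basic-auth:
      basicAuth:
        users:
          - "admin:$apr1$xxxxxxxxx"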
As I already mentioned though, I prefer to have most of Traefik’s configuration in the Nomad job file directly for rollbackability.
TLS configuration from Vault via templates
Using the same argument --providers.file.directory=local/traefik/ and the following templates, we can keep TLS certificates in Vault, restarting Traefik when they change (since Traefik doesn’t support hot reload of certificates).
template {
data = <<EOH
tls:
  certificates:
    - certFile: "local/secret/example.com.crt"
      keyFile: "local/secret/example.com.key"
EOH
destination = "local/traefik/cert.yaml"
change_mode = "noop"
}
template {
data = "{{ with secret \"secret/data/example.com\" }}{{.Data.data.key}}{{end}}"
destination = "local/secret/example.com.key"
change_mode = "restart"
splay = "1m"
}
template {
data = "{{ with secret \"secret/data/example.com\" }}{{.Data.data.crt}}{{end}}"
destination = "local/secret/example.com.crt"
change_mode = "restart"
splay = "1m"
}
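For these templates to be able to read the secret, the task also needs a vault stanza referencing a policy that grants access to it (the policy name here is a placeholder):
vault {
policies = ["traefik-tls"]
}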
Consul Catalog
To be able to create dynamic routers based on the services we have in Consul, Traefik itself needs some configuration (full doc here), and then tags on those services telling it the specifics:
"--providers.consulcatalog=true",
"--providers.consulcatalog.exposedByDefault=false",
"--providers.consulcatalog.endpoint.address=http://172.17.0.1:8500",
"--providers.consulcatalog.prefix=traefik",
"--providers.consulcatalog.defaultrule=Host(`{{ .Name }}.example.com`)",
OR
[providers.consulCatalog]
prefix = "traefik"
exposedByDefault = false
defaultRule = "Host(`{{ .Name }}.{{ index .Labels \"customLabel\"}}`)"
constraints = "Tag(`a.tag.name`)"
[providers.consulCatalog.endpoint]
address = "http://172.17.0.1:8500"
Basically, we enable the Consul Catalog provider and point it to a Consul agent, set the prefix (tags on Consul services starting with that prefix will be looked at for extra configuration, so for prefix = traefik, tags like traefik.http.routers.api.service=api@internal will be used), and define whether or not services are exposed by default (IMHO they shouldn’t be) as well as the default routing rule (Host(`{{ .Name }}.example.com`) means service-name.example.com will be used by default, over-ridable per-service via tags).
Optionally, we can also add constraints, which allow complex matching on tags to determine whether a router should be created, mostly useful if exposedByDefault is true.
A note on the Consul address: since Traefik runs inside Docker, pointing it to the default localhost:8500 won’t work unless we run it with network_mode = "host" (sharing the host’s networking, more on that below). In my case, I’m pointing it to the host’s docker0 IP, 172.17.0.1, which is accessible from all Docker containers and is the same across all my hosts, and making Consul listen on docker0 on top of the usual localhost via the following configuration on the Consul agent:
addresses {
http = "127.0.0.1 {{ GetInterfaceIP \"docker0\" }}"
}
There’s also a bunch of more advanced options around Consul auth, API read consistency requirements and refreshInterval.
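For illustration, these could look something like the following (a sketch; the token is a placeholder and the values are just examples):
"--providers.consulcatalog.refreshInterval=30s",
"--providers.consulcatalog.requireConsistent=true",
"--providers.consulcatalog.endpoint.token=<consul-acl-token>",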
Here’s how to make use of the provider in the Nomad job file via the service stanza:
service {
name = "my-service"
port = "https"
tags = [
# HTTPS-only service which will be called when the URL is example.com or example.org/something
# with my-middleware attached
"traefik.enable=true",
"traefik.http.routers.my-service.rule=Host(`example.com`) || (Host(`example.org`) && Path(`/something`))",
"traefik.http.routers.my-service.tls=true",
"traefik.http.routers.my-service.middlewares=my-middleware",
]
}
If exposedByDefault is true and you have all the configuration you need by default (e.g. an http->https redirect and other middlewares you need at the entrypoint level), you don’t even need per-service configuration unless you want to do something specific.
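For reference, an entrypoint-level HTTP to HTTPS redirect (so individual services don’t need their own redirect middleware) looks like this:
"--entryPoints.http.address=:80",
"--entryPoints.https.address=:443",
"--entrypoints.http.http.redirections.entryPoint.to=https",
"--entrypoints.http.http.redirections.entryPoint.scheme=https",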
Networking
There are two main ways you can configure Nomad with regards to Traefik networking: host networking or static ports. Host networking, which is the same as Docker’s --network="host", adds the task to the host’s network namespace and shares its network stack, and is generally not recommended.
Static ports are host-level ports that are declared statically (as opposed to regular Nomad ports, which are ephemeral and random on the host level, and rather impractical for incoming HTTP / HTTPS traffic) and forwarded to the task, like so:
group "traefik" {
network {
port "http" {
static = 80
}
port "https" {
static = 443
}
}
task "traefik" {
driver = "docker"
config {
image = "traefik:2.4"
args = [
"--entryPoints.http.address=:80",
"--entryPoints.https.address=:443",
]
ports = ["http", "https"]
...
This declares the ports 80 and 443 and attaches them to the Traefik task, which uses them as entrypoints named http and https.
If you have multiple networks, you can make use of Nomad’s host_network configuration, where you declare the available networks with aliases on the Nomad client and target them with the host_network option on ports. (In reality that’s what Nomad does behind the scenes, using the first network it finds for a hidden default host_network and attaching ports to it.) Do note, however, that if you declare host_networks, ports which don’t specify a host_network will use the default value of default, and if no such network exists, allocations will fail (silently for system jobs).
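On the Nomad client side, the aliased networks are declared roughly like this (a sketch; the interface names are assumptions about your particular setup):
client {
host_network "private" {
interface = "eth1"
}
host_network "public" {
interface = "eth0"
}
}
The job then targets these networks per port: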
group "traefik" {
network {
port "http-priv" {
static = 80
host_network = "private"
}
port "https-priv" {
static = 443
host_network = "private"
}
port "http-pub" {
static = 80
host_network = "public"
}
port "https-pub" {
static = 443
host_network = "public"
}
}
task "traefik" {
driver = "docker"
config {
image = "traefik:2.4"
args = [
"--entryPoints.http.address=:80",
"--entryPoints.https.address=:443",
]
ports = ["http-priv", "https-priv", "http-pub", "https-pub"]
...
This makes the http and https Traefik entrypoints available on both networks, and can of course be adapted to have specific entrypoints only on specific networks.
If for some reason the above doesn’t work for you (for example, if you use VRRP with a virtual IP floating between multiple machines, host_network might not work because Nomad fingerprints the networks on start-up, and since the floating IP won’t be available on all machines simultaneously, Nomad won’t bind to it), you can resort to host-mode networking (again, it’s generally discouraged), where the task shares the host’s networking, like so:
group "traefik" {
network {
port "http" {
static = 80
}
port "https" {
static = 443
}
}
task "traefik" {
driver = "docker"
config {
image = "traefik:2.4"
args = [
"--entryPoints.http.address=:80",
"--entryPoints.https.address=:443",
]
ports = ["http", "https"]
network_mode = "host"
...
With this, Traefik will be available on ports 80 and 443 on all network interfaces on the host.
Security
Traefik’s API, dashboard and metrics endpoints should be protected for security reasons, which can be done by putting them on a separate entrypoint which is firewalled off and/or adding middlewares with auth. The API and dashboard are represented by a special service, api@internal, and the Prometheus-compatible /metrics can be put on a dedicated service as well (via the manualRouting = true option, which will create a prometheus@internal service).
Here’s a basic example of that, making use of tags on the Consul service of Traefik to dynamically configure the routes and middlewares:
task "traefik" {
driver = "docker"
config {
image = "traefik:2.4"
args = [
# defining 3 entrypoints, on ports 80, 443 and 8080 (for admin stuff)
# and putting the healthcheck and metrics (with manual routing) on the admin endpoint
"--entryPoints.http.address=:80",
"--entryPoints.https.address=:443",
"--entryPoints.admin.address=:8080",
"--entrypoints.http.http.redirections.entryPoint.to=https",
"--accesslog=true",
"--api=true",
"--metrics=true",
"--metrics.prometheus=true",
"--metrics.prometheus.entryPoint=admin",
"--metrics.prometheus.manualrouting=true",
"--ping=true",
"--ping.entryPoint=admin",
"--providers.consulcatalog=true",
"--providers.consulcatalog.endpoint.address=http://172.17.0.1:8500",
"--providers.consulcatalog.prefix=traefik",
]
ports = ["http", "https", "admin"]
}
service {
name = "traefik"
port = "https"
tags = [
# using Consul service tags prefixed by traefik (as defined in `--providers.consulcatalog.prefix`)
# to configure api&dashboard routers
# with a headers check (spoofable) and a basic auth middleware attached inline
"traefik.enable=true",
"traefik.http.routers.api.rule=Host(`traefik.example.com`) && HeadersRegexp(`X-Real-Ip`, `^(10.1.1.1)$`)",
"traefik.http.routers.api.service=api@internal",
"traefik.http.routers.api.middlewares=basic-auth",
"traefik.http.middlewares.basic-auth.basicauth.users=admin:xxx",
]
# healthcheck using the appropriate port
check {
name = "alive"
type = "http"
port = "admin"
path = "/ping"
interval = "5s"
timeout = "2s"
}
}
}
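With manualrouting enabled, the metrics aren’t reachable until a router is attached to the prometheus@internal service; a sketch of doing that via the same Consul tags (the hostname is an assumption, and you’d likely want the same auth middleware on it):
"traefik.http.routers.metrics.rule=Host(`traefik.example.com`) && Path(`/metrics`)",
"traefik.http.routers.metrics.service=prometheus@internal",
"traefik.http.routers.metrics.middlewares=basic-auth",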
Minimising downtime during updates
Traditionally, an ingress/load balancer/reverse proxy such as Traefik would run on all client nodes and route appropriately. In such a scenario, you need some way to distribute the incoming traffic between the client nodes (such as round-robin DNS, L4 load balancing on the router/provider, BGP/Anycast, etc.), and to know their health and stop sending requests their way if they’re unavailable. Potential scenarios to handle gracefully include node draining and configuration updates.
To achieve that on Nomad with minimal downtime, we use system jobs, staggered updates that don’t restart all instances simultaneously via the update stanza, and a combination of the kill_timeout task parameter and Traefik’s lifeCycle.requestAcceptGraceTimeout and lifeCycle.graceTimeOut to allow requests to finish gracefully:
job "traefik" {
region = "global"
datacenters = ["dc1"]
type = "system"
# only one Traefik instance will be restarted at a time, with 1 minute delay between each such action
# and automatic rollback to the previous version if the new one doesn't pass the health check
update {
max_parallel = 1
stagger = "1m"
auto_revert = true
}
group "traefik" {
task "traefik" {
driver = "docker"
# Nomad will wait for 30s after sending the kill signal to the task before forcefully shutting it down
# by default it's 10s (not enough to properly drain connections)
# and the maximum is limited by the max_kill_timeout setting on the Nomad client (default 30s)
kill_timeout = "30s"
config {
image = "traefik:2.4"
# Traefik will continue serving new requests for 15s while failing its healthcheck,
# giving time to downstream load balancers to take it out of their pool
# and will *then* wait for 10s for existing requests to finish before shutting down,
# before Nomad forcefully kills it 30s after initiating
args = [
"--entryPoints.http.address=:80",
"--entryPoints.http.transport.lifeCycle.requestAcceptGraceTimeout=15s",
"--entryPoints.http.transport.lifeCycle.graceTimeOut=10s",
"--entryPoints.https.address=:443",
"--entryPoints.https.transport.lifeCycle.requestAcceptGraceTimeout=15s",
"--entryPoints.https.transport.lifeCycle.graceTimeOut=10s",
...
]
...
}
}
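If 30 seconds is not enough to drain your connections, the ceiling can be raised on the Nomad client side (a sketch of the relevant client configuration):
client {
max_kill_timeout = "60s"
}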
Sidecars
A sidecar is a task that runs alongside the main task, doing something auxiliary like collecting logs, metrics or traces. I run two sidecars with my Traefik instances, one for logs and another for traces, due to the heavy load and the specific configuration they require.
Promtail for logs
Promtail is a logging agent from Grafana Labs that can get logs from a variety of sources and then send them to Loki, a highly-scalable yet lightweight log aggregation system inspired by Prometheus. It’s drastically simpler and lighter to set up than ElasticSearch (sometimes a default choice for centralised log management), and does a very good job, so that’s what I use (there’s a blog post in the works on that too).
Here’s an example task block for a Promtail sidecar that collects Traefik’s logs, parses them with a regex, and adds some labels (index fields for searching), with the associated lifecycle policy (sidecar, poststart), healthcheck and resource limits:
task "promtail" {
driver = "docker"
config {
image = "grafana/promtail:2.2.0"
args = [
"-config.file",
"local/config.yaml",
"-print-config-stderr",
]
# the only port required is for the healthcheck
ports = ["promtail_healthcheck"]
}
# the template with Promtail's YAML configuration file, configuring the files to scrape,
# the Loki server to send the logs to (based on a registered Consul service, but it could be a fixed URL),
# the regex to parse the Common Log Format (https://en.wikipedia.org/wiki/Common_Log_Format) used for access logs,
# and the labels ( HTTP method and status code)
template {
data = <<EOH
server:
  http_listen_port: 3000
  grpc_listen_port: 0
positions:
  filename: /alloc/positions.yaml
client:
  url: http://{{ range service "loki" }}{{ .Address }}:{{ .Port }}{{ end }}/loki/api/v1/push
scrape_configs:
  - job_name: local
    static_configs:
      - targets:
          - localhost
        labels:
          job: traefik
          __path__: "/alloc/logs/traefik.std*.0"
    pipeline_stages:
      - regex:
          expression: '^(?P<remote_addr>[\w\.]+) - (?P<remote_user>[^ ]*) \[(?P<time_local>.*)\] "(?P<method>[^ ]*) (?P<request>[^ ]*) (?P<protocol>[^ ]*)" (?P<status>[\d]+) (?P<body_bytes_sent>[\d]+) "(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"?'
      - labels:
          method:
          status:
EOH
destination = "local/config.yaml"
}
resources {
cpu = 50
memory = 256
}
# poststart, and sidecar=true, so Promtail will start *after* Traefik (since it has nothing to do before Traefik is up and running),
# and run for as long as it does
lifecycle {
hook = "poststart"
sidecar = true
}
# a service for a health check to determine the state of Promtail
service {
check {
type = "http"
port = "promtail_healthcheck"
path = "/ready"
interval = "10s"
timeout = "2s"
}
}
}
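The promtail_healthcheck port label used above needs to be declared in the group’s network block; a minimal sketch mapping it to the http_listen_port from the configuration:
network {
port "promtail_healthcheck" {
to = 3000
}
}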
OpenTelemetry collector for traces
OpenTelemetry is a collection of tools, APIs, and SDKs. You use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) for analysis in order to understand your software’s performance and behaviour.
It’s the future standard for telemetry, adopted by just about everyone in the industry (Datadog, AWS, Splunk, Google, Elastic, Honeycomb, Lightstep come to mind), which allows easily switching backends while keeping the same instrumentation and collector. At the time of writing, only the tracing part is stable; logs and metrics are works in progress in various stages (Prometheus / OpenMetrics-compatible metrics are in alpha and should be ready by the end of November 2021, while the logs spec is at the design stage), so personally I’d only use it for tracing while waiting for the rest to be ready.
The OpenTelemetry (OTEL) collector is an agent you can run alongside tasks (as a sidecar) or on each Nomad client as a system job to collect traces (and, one day, metrics and logs).
Specifically in regards to Traefik, we can use the latter’s Jaeger tracing compatibility combined with the Jaeger thrift compact receiver in OTEL. Due to the potentially heavy traffic, I run a collector per Traefik, as a sidecar task, just in case.
Here’s an example task block with the OpenTelemetry collector sidecar, which sends traces to a Jaeger server (it could be anything else supported by OTEL), with extra tags set via the JAEGER_TAGS env variable:
task "opentelemetry-agent" {
driver = "docker"
config {
image = "otel/opentelemetry-collector-contrib:0.22.0"
args = [
"--config=local/otel/config.yaml",
]
# pass the healthcheck and Jaeger Thrift compact (UDP) ports
ports = ["otel_health", "jaeger_thrift_compact"]
# extra JAEGER_TAGS
env {
JAEGER_TAGS = "mytag=value,mytag2=test"
}
}
# it's a good idea to limit the amount of resources available to the collector
resources {
cpu = 128 # Mhz
memory = 256 # MB
}
# prestart, and sidecar=true, so the OTEL collector will start *before* Traefik, and run for as long as it does
lifecycle {
hook = "prestart"
sidecar = true
}
# a service for a health check to determine the state of the OTEL collector
service {
check {
name = "health"
type = "http"
port = "otel_health"
path = "/"
interval = "5s"
timeout = "2s"
}
}
# the template with OTEL's YAML configuration file, defining the Jaeger Thrift compact (UDP) receiver,
# Jaeger Thrift (HTTP) exporter to a centrally running Jaeger server registered with Consul
# and some boilerplate (healthcheck, batching, retries)
template {
data = <<EOH
receivers:
  jaeger:
    protocols:
      thrift_compact:
exporters:
  jaeger_thrift:
    url: http://{{ range service "jaeger-api-thrift" }}{{ .Address }}:{{ .Port }}{{ end }}/api/traces
    timeout: 2s
  logging:
    loglevel: debug
processors:
  batch:
  queued_retry:
extensions:
  health_check:
service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [jaeger]
      processors: [batch]
      exporters: [jaeger_thrift]
EOH
destination = "local/otel/config.yaml"
}
}
And to make use of it, Traefik needs the following extra arguments:
"--tracing.jaeger=true",
"--tracing.jaeger.localAgentHostPort=${NOMAD_ADDR_jaeger_thrift_compact}",
These send traces in the Jaeger Thrift compact protocol to the IP and port (via the NOMAD_ADDR environment variable) of the jaeger_thrift_compact port.
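As with the Promtail sidecar, the otel_health and jaeger_thrift_compact port labels need to be declared in the group’s network block; a minimal sketch, assuming the collector’s default ports (13133 for the health_check extension, 6831/udp for Jaeger Thrift compact):
network {
port "otel_health" {
to = 13133
}
port "jaeger_thrift_compact" {
to = 6831
}
}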