Pre-introduction

Recently I stumbled upon, and then stumbled upon again, David Anderson’s interesting post about “new Kubernetes”, based on a discussion he had with Vallery Lancey about what they would do differently if they were rewriting Kubernetes from scratch. Interestingly, a decent part of the proposals for a “new Kubernetes” are design choices Hashicorp already made for Nomad, which is a pretty underrated orchestrator, and drastically simpler ( one of the main goals of said “new Kubernetes”).

Some people are aware that Docker Swarm kinda exists but is abandonware/on life support, and isn’t really recommended anymore, but it still comes up in discussions due to how easy it is to use. For most, that leaves Kubernetes as the only “serious” option, but it is a very complex piece of software, with a lot of moving parts, which isn’t actually required or needed in most cases.

This inspired me to write a series on Nomad, what it is, why it’s great, where it’s lacking and how to use it.

Introduction - what is Nomad and why it’s great

Hashicorp’s Nomad is a simple to run and maintain, yet very flexible task scheduler/orchestrator. It relies on plugins for execution, autoscaling and other features, and can run pretty much anything via its task drivers - Docker, containerd, LXC, rkt, podman, Java, fork/exec, QEMU, firecracker, FreeBSD jails.

It comes in the form of a single binary, run in one of two modes (servers, in groups of 3 or 5, which make scheduling decisions and host the APIs and configuration, and an unlimited number of workers which actually run whatever it is you want to run), and can be automatically clustered via Consul. The configuration ( both for jobs and of Nomad itself) is in HCL (I’ll get into more detail about how great that is a bit later) or JSON (mainly for when jobs are submitted by machines/scripts/tooling rather than humans). Multiple clusters can be connected via multi-region federation for sharing ACLs and for API forwarding ( you can submit a job or request logs from any server in any region and it will be forwarded to the appropriate server). Deployments can be complex out of the box ( rolling, canary, blue/green), and everything is version controlled and can be rolled back.

Like most HashiCorp tools, it’s “open core”, meaning that the majority of features are available in an open source version, and some more advanced/enterprise-y ones ( in Nomad’s case, multi-region/cluster deployments - deploying something simultaneously to multiple separate clusters, policy as code with Sentinel and similar ) require upgrading to Nomad Enterprise.

Primitives

  • job is a declarative specification which contains groups of tasks, each task being a container/binary/anything run by a task driver
  • system jobs (run on all client nodes, equivalent to Kubernetes DaemonSets, for monitoring/logging agents/load balancers)
  • periodic jobs (equivalent to cronjobs - see the sketch after this list)
  • service, which registers as a Consul service and is thus discoverable ( via API or DNS)
  • deployment, each version of a job; they’re tracked and can be rolled back to
  • allocation, each instance of a task ( group ) on a node
  • namespace, a logical unit to organise jobs in and ACLs around
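
For instance, a minimal sketch of a periodic batch job ( the job name, schedule and container are made up for illustration ):

job "nightly-cleanup" {
  datacenters = ["dc1"]
  type        = "batch"

  periodic {
    cron             = "0 3 * * *" # every day at 03:00
    prohibit_overlap = true        # don't start a new run while the previous one is still going
  }

  group "cleanup" {
    task "prune" {
      driver = "docker"

      config {
        image   = "alpine:3.13"
        command = "sh"
        args    = ["-c", "echo 'pruning old data...'"]
      }
    }
  }
}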

Jobs

Example of a very basic job that runs a Docker container (jaegertracing/all-in-one:1.21), with limits of 1000 MHz of CPU and 1024 MB of RAM, and registers the service with Consul:

job "jaeger" {
        type = "service"
        group "api" {
            task "jaeger" {
                driver = "docker"
                config { 
                  image = "jaegertracing/all-in-one:1.21"
                }
                resources {
                  cpu = 1000
                  memory = 1024
                }
                service {
                  name = "jaeger-query"
                }
            }
        }            
}

Note that this is a very basic job: there are no healthchecks, no persistent storage, no extra configuration, no update strategy, no autoscaling, and no exposed ports.
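
For illustration, a hedged sketch of what exposing the UI port and adding a basic Consul health check could look like ( the port label, datacenter and check values are assumptions, not part of the original example ):

job "jaeger" {
  datacenters = ["dc1"] # assumption: a datacenter named dc1

  type = "service"

  group "api" {
    network {
      port "ui" {
        to = 16686 # container port; the host port is picked dynamically
      }
    }

    task "jaeger" {
      driver = "docker"

      config {
        image = "jaegertracing/all-in-one:1.21"
        ports = ["ui"]
      }

      resources {
        cpu    = 1000
        memory = 1024
      }

      service {
        name = "jaeger-query"
        port = "ui"

        check {
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}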

Deployment history and rollback

Nomad tracks each job’s full definition and deployment history, and allows you to easily roll back and compare versions, via the UI, CLI or API, e.g.:

# List the versions of the job named "opentelemetry-collector"
$ nomad job history opentelemetry-collector
Version     = 1
Stable      = false
Submit Date = 2021-01-08T21:30:30+01:00

Version     = 0
Stable      = true
Submit Date = 2021-01-08T21:29:48+01:00

# Check the difference between versions
$ nomad job history -p opentelemetry-collector
Version     = 1
Stable      = false
Submit Date = 2021-01-08T21:30:30+01:00
Diff        =
+/- Job: "opentelemetry-collector"
+/- Task Group: "opentelemetry-collector"
  +/- Task: "opentelemetry-collector"
    +/- Config {
          args[0]:  "--config=local/otel/config.yaml"
      +/- image:    "otel/opentelemetry-collector-contrib:0.15.0" => "otel/opentelemetry-collector-contrib:0.16.0"
          ports[0]: "health"
          ports[1]: "jaeger_thrift_compact"
        }

Version     = 0
Stable      = true
Submit Date = 2021-01-08T21:29:48+01:00

# Revert job "opentelemetry-collector" to version 0
$ nomad job revert opentelemetry-collector 0

State tracking and job planning

Nomad keeps the desired state and its history, and with nomad job plan, similar to terraform plan, allows us to preview what will change upon applying a new job file. There’s also a feature to verify nothing has changed between the plan and run (equivalent to terraform apply with a plan file) with the -check-index flag:

$ nomad job plan otel.nomad
+/- Job: "otel"
+/- Task Group: "opentelemetry" (1 create/destroy update)
  +/- Task: "opentelemetry-collector" (forces create/destroy update)
    +/- Config {
          args[0]:  "--config=local/otel/config.yaml"
      +/- image:    "otel/opentelemetry-collector-contrib:0.15.0" => "otel/opentelemetry-collector-contrib:0.20.0"
          ports[0]: "health"
          ports[1]: "jaeger_thrift_compact"
        }
Scheduler dry-run:
- All tasks successfully allocated.

Job Modify Index: 413
To submit the job with version verification run:

nomad job run -check-index 413 otel.nomad

When running the job with the check-index flag, the job will only be run if the
job modify index given matches the server-side version. If the index has
changed, another user has modified the job and the plan's results are
potentially invalid.

Overall, it’s a very useful feature, especially when collaborating, locally or via CI/CD.

Checking the status and logs

To check the status of a job, there are a few commands under nomad job and nomad alloc

$ nomad job status otel
ID            = otel
Name          = otel
Submit Date   = 2021-02-27T20:41:29+01:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
otel      0       0         1        0       0         0

Latest Deployment
ID          = ea533b6f
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
otel      1        1       1        0          2021-02-27T20:51:45+01:00

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created  Modified
89031cfd  d3cbeb7e  otel      0        run      running  20s ago  4s ago

# logs are at the allocation level ( similar to Kubernetes, where they're at the container level), so we get them with the alloc id
$ nomad alloc logs 89031cfd
[...]

Lifecycle and sidecars

Nomad allows defining the lifecycle of tasks within a task group via the lifecycle stanza. We can have prestart tasks ( for initialisation ), poststart tasks ( companions, e.g. for proxying - the ambassador and adapter patterns in Kubernetes terms ) or poststop tasks for clean up, and via the sidecar boolean we define whether a task should keep running as long as the main task(s), e.g.:

  task "init" {
    lifecycle {
      hook = "prestart"
      sidecar = false
    }
    driver = "docker"
    config {
      image = "alpine/httpie"
      command = "http"
      args = [
        "POST",
        "https://some-internal-service-for-provisioning-stuff.local/v1/new",
        "job_id='${NOMAD_JOB_ID}!'"
      ]
    }
  }

  task "fluentd" {
    lifecycle {
      hook = "poststart" # should start after the main task
      sidecar = true # should run as long as the main task does, and be restarted if it fails
    }
    driver = "docker"
    config {
      image = "fluentd/fluentd"
    }
    ...
  }

  task "main-app" {
    ...
  }

  task "cleanup" {
    lifecycle {
      hook = "poststop"
    }
    driver = "docker"
    config {
      image = "alpine"
      command = "rm -rf"
      args = [
        "/var/lib/volume-with-super-secret-data"
      
    }
  }

ACL/RBAC

ACLs ( access-control lists ), or RBAC ( role-based access control ) as it’s known in Kubernetes, allow defining who can do what, so that not everyone with network access gets full administrator privileges to run/stop whatever they want. Nomad’s ACL system is pretty similar to Consul’s and Vault’s, and uses JSON ( mostly for non-humans ) or HCL to define policies with rules, which describe what actions are allowed on which objects.

# a basic policy which combines the predefined "read" policy ( read-only access to list and read
# job, volume and scaling details ) with extra capabilities for job submission, dispatch and log access within the default namespace
namespace "default" {
  policy = "read"
  capabilities = ["submit-job","dispatch-job","read-logs"]
}

Policy management and the assignment of policies to tokens are done via the CLI ( or API ), unlike Kubernetes where that happens in YAML:

# create/update the policy within Nomad
$ nomad acl policy apply -description "Application Developer policy" my-policy my-policy.hcl
$ nomad acl token create -name="Test token" -policy=my-policy -type=client
Accessor ID  = 4e3c1ac7-52d0-6c68-94a2-5e75f17e657e
Secret ID    = 0be3c623-cc90-3645-c29d-5f0629084f68
Name         = Test token
Type         = client
Global       = false
Policies     = [my-policy]
Create Time  = 2021-02-10 18:41:53.851133 +0000 UTC
Create Index = 15
Modify Index = 15

Just for comparison, the (in my opinion weirdly verbose to write due to YAML ) syntax for the equivalent in Kubernetes:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: test
rules:
- apiGroups: [""] 
  resources: ["pods", "services"] # a Nomad job contains both the pod equivalents ( task groups ) and services
  verbs: ["get", "watch", "list", "logs", "create", "update", "patch"]

And with this token, passed either as an env variable ( NOMAD_TOKEN) or flag (-token) to the CLI, or as an HTTP header ( X-Nomad-Token) to the API, we can interact with the cluster within the bounds of the policy.

ACL policies and tokens are optionally shared with federated clusters for simplified management. We can also have ephemeral tokens via Vault’s Nomad secret backend, which generates single-use/short-lived tokens with specific policies, but there’s no implicit or explicit per-job identity equivalent to Kubernetes' Service Accounts - one has to go through Vault for that ( assign a Vault policy to the job).

Simplicity

Nomad is easy to install, maintain, update and scale, even with “advanced” features such as linking multiple clusters across datacenters/regions.

Running locally

Running Nomad locally for development/testing is just a matter of downloading the binary and running nomad agent -dev ( significantly easier than microk8s or minikube or kind), and the same goes for Consul and Vault ( which you might need in order to replicate the production Nomad environment locally).

Upgrading

Upgrading Nomad servers is just a matter of replacing the binary and restarting the service. There are detailed upgrade guides which list the main changes and potential breaking ones/things to take care of, but breaking changes are relatively rare ( and will become even rarer now that it’s 1.0+). Clients need to be drained before upgrading ( for which there’s also a detailed guide ), and the behaviour of jobs during that operation can be tweaked via the migrate stanza.
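
As a rough illustration, a minimal migrate stanza controlling how allocations are moved off a draining node ( the values are arbitrary examples, not recommendations ):

group "api" {
  migrate {
    max_parallel     = 1        # move one allocation at a time
    health_check     = "checks" # wait for Consul health checks to pass on the new node
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }
}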

Monitoring

Nomad collects extensive metrics on itself and everything running within it, which can be sent to compatible agents ( StatsD, Datadog ) or scraped by a Prometheus/OpenMetrics-compatible scraper, and they even have a guide on setting up Prometheus to monitor and alert.
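
For example, a minimal sketch of the telemetry block in the Nomad agent configuration to expose Prometheus-compatible metrics ( the collection interval is an arbitrary choice ):

telemetry {
  collection_interval        = "5s"
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}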

Cluster/multi-cluster joining

Forming/joining a cluster can be done manually via nomad server join, via the server_join configuration block ( which can use static IPs/DNS names, or dynamic cloud auto-join based on cloud provider tags or similar), or via Consul. Federation is done by joining a server in another cluster via its WAN IP/DNS name and port:

nomad server join 172.31.26.138:4648 
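
And a hedged sketch of what the server_join block could look like in a server's agent configuration, mixing a static address with cloud auto-join ( the addresses and tags are placeholders ):

server {
  enabled          = true
  bootstrap_expect = 3

  server_join {
    retry_join = [
      "10.0.0.10:4648",                                  # static IP of another server
      "provider=aws tag_key=nomad_server tag_value=true" # cloud auto-join via instance tags
    ]
    retry_max      = 3
    retry_interval = "15s"
  }
}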

Integrations

Nomad has an extensive API (which includes cool recent additions like the event stream, allowing the creation of integrations and tools that act on what happens in your Nomad cluster), and it integrates well with a bunch of other tools, some of them from Hashicorp, nicely complementing Nomad to rival the features of the more feature-rich Kubernetes and its ecosystem.

Things like service discovery and K/V storage (Services and ConfigMaps respectively) and secret storage (Secrets), and even features of the larger Kubernetes ecosystem like service mesh, are delegated to other existing, well-used and battle-tested parts of the HashiStack (Consul and Vault). This makes sense - it follows the Unix philosophy of “do one thing and do it well”, and makes things easier for Hashicorp and its users: Consul and Vault are already heavily used, stable and very popular in their respective niches, and the decoupling and separation of concerns allows us to use them outside of Nomad (e.g. you can use Vault, hosted on Nomad or not, for secret storage/dynamic secrets/auth for apps running in Nomad or anywhere else; maybe you even already have a running Vault cluster, and you can just connect your Nomad cluster to it and have the same storage for all your secrets, regardless of where they are used from).

The big downside is that if you just want to run a Nomad cluster, you kinda need two other tools to install, maintain, stay up to date on, etc. but seeing that all three are similar ( single binary, Raft consensus algorithm, great documentation including very detailed upgrade guides ), it’s not that complex to achieve.

There are also integrations, via Consul, with third-party tools such as Traefik and Fabio for automatic reverse proxying/load balancing.

Integrated Templating

From Consul or Vault

One way you can make use of Vault and Consul in Nomad job files is the template stanza (configuration block), which allows the creation of files or environment variables from templates, and is based on Hashicorp’s venerable consul-template.

# creating a YAML configuration file from Consul K/V and services, Nomad metadata and Vault secrets:
template {
  data = <<EOH
  ---
    bind_port:   {{ env "NOMAD_PORT_db" }}  # the port "db"
    scratch_dir: {{ env "NOMAD_TASK_DIR" }} # the task folder
    node_id:     {{ env "node.unique.id" }} # the unique ID of the Nomad node
    service_key: {{ key "service/my-key" }} # populated by service/my-key from Consul K/V store
    loki_addr:   {{ range service "loki" }}{{ .Address }}:{{ .Port }}{{ end }} # populated by the IP and port of the service named "loki" in Consul's Service catalogue
    some_secret: {{ with secret "secret/data/my-secret" }}{{ .Data.data.value }}{{ end }} # populated by the secret/my-secret secret in Vault
  EOH
  destination = "local/file.yml"
}
# doing the same, but instead of writing to a YAML file, exporting the values as env variables:
template {
 ...
  destination = "secrets/file.env" # secrets is a special, per-task folder that you can use to store secrets and isn't browseable via API/CLI/UI. More on that below
  env         = true
}

HCLv2

HCLv2 greatly improves upon HCLv1 with more dynamism and better type management. One can use variables, for loops, include files, etc. Some examples with common use cases:

Using an “env” variable to determine whether we’re in local dev or production and setting locals accordingly; and using a list variable for additional arguments, merged with the default arguments ( common to all cases ) and the local/production-specific ones.

Note: local variables are local to the file, while input variables ( variable blocks ) are more akin to function parameters ( they can be passed via file or env variable, have defaults, etc.)

variable "env" {
  default = "local"
}
variable "args" {
    type = list(string)
    default = []
}
locals {
  datacenter = var.env == "local" ? "dc1" : "eu-west-1"
  count = var.env == "local" ? 1 : 5
  default_args = ["--something"]
  local_args = ["--verbose"]
  prod_args = ["--production-mode 1", "--secure"]
  args = var.env == "local" ? concat(local.default_args, local.local_args, var.args) : concat(local.default_args, local.prod_args, var.args)
}

job "test" {
  ...
  datacenters = [local.datacenter] # Nomad expects a list here, hence the []
  group "test"{
    count = local.count
    task "test" {
      driver = "docker"
      config {
        image = "registry.test.com:0.1"
        args = [for a in var.args: a] # Nomad expects a list here, hence the []
      }
  }
}

For comparison, neither of those is possible in Kubernetes-land without third-party tooling - Helm, jsonnet, Tanka, ytt, etc. for basic templating/logic, which come with a lot of overhead on top of YAML, and things like Vault Agent for sidecar secret injection or the Secrets Store CSI driver, which allows reading secrets as filesystem objects via a CSI driver. Of course, Kubernetes ConfigMaps and Secrets exist for the latter issue, but they have several limitations compared to the Nomad way of doing things:

  • Kubernetes Secrets aren’t really secret, they’re base64 encoded strings stored in etcd ( so you’re relying on it being encrypted for “security”)
  • Kubernetes ConfigMaps and Secrets are fully static, you can’t throw in any dynamism ( an if, a for loop, the IP/port of a service, etc.)
  • both are disconnected from deployments - if you update a ConfigMap used by a Pod, the latter needs to be restarted for the changes to be recognized ( compared to Nomad’s change_mode, which allows you to control how and whether the task is restarted/reloaded upon configuration change - see the sketch after this list)
  • you can’t mix and match, like having a single file with secrets, a service address, K/V config, etc. ( unless you do something manually with an initContainer and bash)
  • you can’t have dynamic secrets ( like AWS or database credentials with a limited time to live) without external tooling taking care of renews and restarts
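
As an illustration of the change_mode point above, a minimal sketch ( the Consul key and the assumption that the app reloads its configuration on SIGHUP are made up ):

template {
  data = <<EOH
max_connections = {{ key "config/my-app/max_connections" }}
EOH
  destination   = "local/app.conf"
  change_mode   = "signal" # other options: "restart" (the default) and "noop"
  change_signal = "SIGHUP" # assumption: the app reloads its configuration on SIGHUP
}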

Deployment primitives

Nomad natively supports a few deployment methods - rolling updates ( instances are updated X at a time ), canary ( a small subset of the running instances is updated, monitored for anomalies/bugs/errors/etc., and if everything is fine over some period of time, the rest are updated as well) and blue/green ( two equally-sized environments, blue and green, are running, but only one serves traffic; the other one gets the updated version in full, tests are run, and once all is good, traffic is switched to it ) - and it can do automated rollbacks.

All of that, including timeouts, counts, etc., is configured via the update stanza:

# generic configuration
count = 3 # 3 allocations 
update {
  max_parallel = 1 # number of instances to upgrade in parallel
  health_check = "checks" # what determines the state of the allocation - it could be health *checks*, *state* (if all tasks are running) or *manual* for human/monitoring/etc. via the API
}

# canary
count = 3 # 3 allocations 
update {
  max_parallel = 1
  canary = 1 # 1 canary allocation
}

# blue/green
count = 3 # 3 allocations 
update {
  max_parallel = 3 
  canary = 3 # canary=max_parallel=count => Green env
}

Canaries or blue/green deployments can be promoted via the CLI, web UI or API ( e.g. automatically based on metrics/logs/traces data). Hashicorp have a detailed-ish guide with examples on their HashiCorp Learn platform.
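
For completeness, a hedged sketch of a canary update block with automated rollback enabled ( the timing values are arbitrary ):

update {
  max_parallel     = 1
  canary           = 1
  min_healthy_time = "30s"
  healthy_deadline = "5m"
  auto_revert      = true  # roll back to the last stable version if the deployment fails
  auto_promote     = false # promotion stays manual, via CLI/UI/API
}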

Networking

Networking on Nomad is, at its base, very simple, at least when it comes to containers. By default each task gets an IP (as provided by Docker/etc.) in bridge mode, in a network namespace shared with the other tasks in its group ( to allow sidecar proxies ). Ports can be exposed by using host networking or dynamic or static port forwarding, and accessed directly or via sidecar proxies, or via a CNI plugin if advanced features are needed.
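
A hedged sketch of the group-level network stanza, with one dynamically mapped port and one static port ( the port labels and numbers are arbitrary ):

group "api" {
  network {
    mode = "bridge"

    port "http" {
      to = 8080 # container port; Nomad picks a free host port dynamically
    }

    port "metrics" {
      static = 9090 # fixed host port
      to     = 9090
    }
  }
}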

The more or less standard/recommended behaviour is inspired by Google’s Borg - any scheduled task group gets a random port from the host if it requires one, and service discovery is responsible for linking and discovering services across the dynamic ports. In Nomad’s case, that’s done via Consul ( either DNS, API or, within Nomad, templating). It’s also possible to have the containers be IPv6-only, and advertise that to Consul.

More advanced options include scheduling ports only on specific networks and only exposing tasks via Consul Connect/Service Mesh, which can be used across Kubernetes, Nomad and regular virtual or physical machines.

Storage

By default, allocations are ephemeral, with no persistence. Each one gets a shared alloc folder, which all tasks within the allocation can read and write, with 3 standard folders inside - data for ephemeral_disk-declared data ( more on that later), logs which contains all tasks' logs, and tmp, which is used as scratch space by task drivers. Besides that, each task gets its own folder ( on the same level as alloc ), with local ( to be used at will, e.g. for configuration files), secrets ( for secrets, and therefore unavailable for browsing via the UI, API or nomad alloc fs command, unlike the others) and tmp subfolders. Tasks can’t access other tasks' folders, so cross-task things ( e.g. if you’re using a sidecar log shipper for custom logs) should use the alloc folder.

There’s an ephemeral_disk stanza, which allows for somewhat persistent storage:

job "example" {
  group "example" {
    ephemeral_disk {
      migrate = true
      size    = 500 # in MB
      sticky  = true
    }
  }
}

This will create a 500MB “disk” ( it isn’t an enforced limit, it’s more of a capacity planning hint used for scheduling decisions). With migrate = true, the data should be migrated to another node if the current one gets drained, and with sticky = true Nomad will try to place updated allocations on the same node and reuse the ephemeral_disk data ( corresponding to the alloc/data folder at the allocation level and local at the task level ). This is all best-effort, with zero guarantees, and of course, if the node fails, the data will be lost.

Actual persistence is achieved in two main ways:

  • Host volumes ( folders on the hosts mounted inside the allocations ), made redundant via tooling external to the Nomad cluster ( NFS, GlusterFS, Ceph, Portworx, etc.) - see the sketch after this list
  • CSI volumes ( more on that below )
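
As a hedged sketch of the first option ( the volume name, paths and image are placeholders ), a host volume is declared in the client configuration and then requested and mounted in the job:

# Nomad client configuration
client {
  host_volume "postgres-data" {
    path      = "/opt/nomad/volumes/postgres"
    read_only = false
  }
}

# job file
group "db" {
  volume "postgres-data" {
    type      = "host"
    source    = "postgres-data"
    read_only = false
  }

  task "postgres" {
    driver = "docker"

    config {
      image = "postgres:13-alpine"
    }

    volume_mount {
      volume      = "postgres-data"
      destination = "/var/lib/postgresql/data"
    }
  }
}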

Plugins

A lot of the key components of Nomad are delegated to plugins ( some included and distributed with Nomad, some external, and all with an open spec, so anyone can contribute custom plugins), such as autoscaling, task drivers, advanced networking (CNI) and storage (CSI), to enable greater flexibility.

Task drivers

As mentioned before, Nomad can schedule and run all sorts of different things ( Docker, containerd, LXC, rkt, podman, Java, fork/exec, QEMU, firecracker, FreeBSD jails, Windows IIS). A few of them are maintained by the community, and there’s an official guide on writing custom ones, which is infinitely more flexible than Kubernetes, which is fixed on containers via Docker/containerd ( even if projects such as KubeVirt exist).

CSI and CNI

In a bid to make Nomad more compatible with cloud-native software, it implements the CSI and CNI specs for storage and networking respectively, meaning that you can use plugins following those specs ( popularised by Kubernetes and hosted by the CNCF) with Nomad. In theory that means Nomad can tap into a vast ecosystem of existing tooling and plugins, and one can ( more or less ) easily move from Kubernetes to Nomad and keep the same third-party helpers/tooling/infrastructure - external storage systems ( AWS EBS/EFS, NetApp, Ceph, etc.), network overlays ( Weave, Flannel, etc.). In practice there are some limitations, most notably the fact that the CSI spec isn’t fully implemented yet. Nonetheless, it’s a great direction and the HashiCorp team is working on improving it.

Autoscaling

Nomad supports autoscaling of various types, managed by external autoscalers. Hashicorp provide the Nomad Autoscaler project, which supports horizontal application scaling ( add more instances of a task group) and cluster scaling (add more Nomad workers) in the open source version, and dynamic application sizing ( adjust a task’s resources based on its actual real-life usage) in Nomad Enterprise, but anyone can implement a custom autoscaler via Nomad’s API.

Nomad Autoscaler supports plugins for the following components:

  • APM ( what to query for application metrics to use for decisions ) - Prometheus, Datadog or a limited integrated Nomad one
  • strategy ( based on the current and desired state, decide what to do ) - target value ( what the value of a metric should be) and the Enterprise-only dynamic application sizing average strategy, max strategy, percentile strategy
  • target ( what to scale ) - a Nomad task group for horizontal application scaling and dynamic application sizing, and AWS Auto Scaling Groups or Azure Virtual Machine Scale Sets for cluster autoscaling ( a GCP Managed Instance Group plugin is in the works ).

And as per usual, anyone can write custom plugins for specific use cases.
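
A hedged sketch of a horizontal application scaling policy using the Prometheus APM plugin and the target-value strategy ( the query and target are made up for illustration ):

group "web" {
  count = 3

  scaling {
    enabled = true
    min     = 2
    max     = 10

    policy {
      cooldown            = "1m"
      evaluation_interval = "30s"

      check "avg_cpu" {
        source = "prometheus"
        # assumption: average CPU usage of the group's allocations, as exposed by Nomad's metrics
        query  = "avg(nomad_client_allocs_cpu_total_percent{exported_job=\"web\"})"

        strategy "target-value" {
          target = 70
        }
      }
    }
  }
}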

Advantages

Easy, lightweight and low maintenance

Nomad is a single binary, with relatively few moving components ( optionally, but more often than not ) required - most notably Consul, Vault and potentially some plugins. Upgrading, adding extra nodes, etc. is easy. Making the jump from test to production is also straightforward.

Roughly speaking, you can go from zero to pre-production within a day or two, and maintenance is minimal outside of updates, which rarely introduce breaking changes.

Scale

On top of all that, Nomad can also scale well. Hashicorp did a few experiments with that, first with 1 million containers back in 2016, and in December 2020, for the 1.0 release, they ran 2 million containers across 6,100 AWS Spot instances, with just 3 Nomad servers (AWS i3.16xlarges, but still) orchestrating all of that.

HCL

No YAML

As I mentioned, Nomad eschews the wildly popular YAML and uses Hashicorp’s own HCL ( HashiCorp Configuration Language ), which was created because neither JSON nor YAML was good enough for configuration purposes, and that’s still as true as it was a few years ago. JSON is overly verbose and not really fit for configuration ( e.g. no comments - I know that JSON5 exists, but it isn’t widely supported, and the official implementations are few and mostly archived ), and YAML is just terrible beyond 10-20 lines or 3-4 levels of nesting. It’s so bad, there’s an entire site dedicated to the subject - https://noyaml.com/ - which I won’t quote in its entirety, but let’s just say that using whitespace for logic is a bad idea and leads to weird, undebuggable errors when copy/pasting and templating via external tools ( such as Helm or Jinja - anyone seen Ansible’s “error at line X, but it probably isn’t there” message? ).

Furthermore, both are limited in that templating/logic is impossible/hacky/dependent on third-party tools ( like Helm, jsonnet/tanka, kustomize or ytt), while HCL can do ( in some cases only since fairly recently, but nonetheless ) basic logic such as if/else, for loops, dynamic blocks, variable substitution, data types ( maps, lists, objects ) with strict typing, file inclusion and plenty of other goodies. You can have basic logic in your configuration if you need it, you can comment it, you can copy/paste at will without fear of breaking anything, and you can split your configuration into multiple files ( e.g. sharing a sidecar logging/tracing/etc. agent). It’s actually pretty great! There are even (sadly third-party) tools to fmt it ( Go, and some other Hashicorp tools like Terraform, have a built-in fmt subcommand that formats your code according to a basic set of rules, making it more readable and consistent ).

Self-contained jobs

A very nice thing about Nomad job files is that they contain everything related to the job - all the groups and tasks, sidecars, resulting services, configuration, etc. HCL remains legible even at length ( especially within an IDE or text editor able to highlight matching curly brackets {} ), but of course, if one wants, there’s a file function to import file contents, so some parts can be split into different files if required/preferred.

CSI, CNI

Hashicorp opting for widely used standards where appropriate and practical is a big advantage, allowing Nomad users to profit from developments done in the larger ecosystem. There are caveats, however, and some improvements are needed in Nomad ( e.g. a more complete implementation of the CSI spec ); nonetheless, it’s a great direction and the bases are already there.

Extensibility

The fact that a lot of the main functionalities ( task drivers, device drivers, autoscaling, storage and networking ) are offloaded to plugins is absolutely great. It allows you to move at your own pace when something gets deprecated ( e.g. if you were using rkt and hadn’t had the time to move to something else yet, the fact that you can still run the rkt task driver, even if it’s deprecated and no longer a part of Nomad, is probably reassuring) or add your custom use cases ( e.g. the Firecracker and IIS task drivers were added by community users that needed them). And at the same time, the basic ones are baked in and distributed with the Nomad binary, so you don’t need to install anything to do the “standard” things ( Docker, raw fork/exec, etc.). If you want to make Nomad autoscale on your Proxmox or Nutanix AHV cluster, you can develop it yourself and aren’t stuck waiting for Hashicorp to do it.

Disadvantages

Ecosystem

The Kubernetes ecosystem is massive. There are entire companies, tools and whole niches being built around it ( ArgoCD, Rook, Istio, etc. etc. etc.). In some cases tools exist only because Kubernetes is itself so complex - Helm, Kustomize, a bunch of web UIs and IDEs ( Octant, Kubevious, Lens, etc.), specialised tooling to get an overview of the state and security of your Kubernetes cluster ( Sonobuoy, kube-hunter, kube-bench, armosec, pixie). Furthermore, there are literally hundreds of operators that allow abstracting the running of complex software within Kubernetes.

There isn’t a lot of software that even comes close, and Nomad doesn’t either. There’s a decent number of tools it works with, and CNI and CSI support helps it tap into the wider Kubernetes/cloud native ecosystem, but it still isn’t as popular. In practice this means that for running some things, you’re on your own - e.g. if you want to use PostgreSQL, Cassandra, Prometheus, CockroachDB or anything of the like, you can’t just use an existing operator with a few CRDs; you have to write the full Nomad configuration for it to work.

Managed service

At the time of writing, there is no Nomad managed service akin to Google Kubernetes Engine, Amazon Elastic Kubernetes Service, Digital Ocean Kubernetes, etc.; if you want to use Nomad, you have to run it yourself ( maintenance is minimal, but still, in some cases offloading the maintenance to a cloud provider is worth the cost even for easy-to-maintain software). Hashicorp have their Hashicorp Cloud Platform, which uses Nomad underneath for their managed Vault and Consul offerings; and Armon Dadgar, co-founder of Hashicorp, said at Hashiconf US 2020 that they are planning on offering a HCP Nomad. Until that arrives ( or AWS decide to make a Nomad managed service), you’re on your own.

Enterprise

Hashicorp are a for-profit company with an open core software distribution model. That presents some risks, most notably that some day they might decide to take Nomad ( or Terraform or Consul for that matter) in a direction you don’t like, or, at the extreme, shift features from the open source Nomad version to the Enterprise one - like when InfluxData removed clustering from InfluxDB OSS and moved it to InfluxDB Enterprise. I personally consider that highly unlikely ( because doing so would destroy most of the goodwill they have within the community, make conversion to Enterprise versions harder, etc. etc.), and they genuinely seem like nice people, but it’s still a possibility. In any case, there’s an inherent conflict of interest - Hashicorp wouldn’t allow features competing with their Enterprise offering to be added to Nomad OSS, because that would lower its value proposition.

Policy management

One of the main things I’d consider “missing” in Nomad OSS is policy management ( as in Open Policy Agent, Gatekeeper and company - roughly equivalent to Kubernetes admission controllers): it’s done only via Sentinel, which is IMHO a fine policy-as-code DSL, but is only available in Nomad Enterprise. One can work around this with conftest’s HCLv2 parser in CI/CD, coupled with ACLs forcing users to only submit jobs via “GitOps”, but it’s certainly not as powerful (e.g. you can’t easily use runtime data from Nomad).

vs “New Kubernetes”

As I started with my inspiration, “new Kubernetes”, it’s fitting to end by comparing Nomad to it.

"[…] my guide star is Go. If Kubernetes is C++, what would the Go of orchestration systems look like? Aggressively simple, opinionated, grown slowly and warily, and you can learn it in under a week and get on with what you were actually trying to accomplish"

~ David Anderson

Nomad mostly fits that bill - it’s simple and opinionated, and covers a good part of the proposed features out of the box.

And it allows, with some work, under some conditions and with some limitations:

  • highly distributed clusters - more specifically, Nomad’s cluster federation is decently featured and stable ( but deploying jobs across multiple regions simultaneously is only available in Nomad Enterprise), and can probably be done over WAN via Consul Mesh/Terminating/Ingress gateways ( also useful for inter-service traffic and failover). Nevertheless, Nomad isn’t designed for clients to stay disconnected from the servers for a long time, and it will reschedule allocations on “lost” nodes after a certain ( tunable ) time has elapsed, so for “edge” scenarios a cluster per location is more appropriate - as Cloudflare do

However, Nomad doesn’t have the concept of control loops, which drastically simplifies both it and allocation debugging, but also limits how far you can integrate with it from within to control its behaviour.

Conclusion

Overall, Nomad is a pretty great, simple, opinionated and flexible orchestrator. It has some advantages in features over Kubernetes ( stable cluster federation, task drivers, deployment versioning, integrated deployment logic and templating, no YAML), but lacks some other features (CRDs), polish ( full CSI spec support ) and ecosystem depth.

Some scenarios where Nomad would be more appropriate than Kubernetes:

  • on-prem deployments ( even with RKE/VMware Tanzu/kubeadm etc., running Nomad on bare metal/virtual machines is drastically easier )
  • multi-DC/region presence ( due to cluster federation )
  • not only container needs (QEMU, Java, raw exec, FreeBSD jails, IIS, etc.)
  • small(-er) sized teams
  • limited amount of things to deploy ( e.g. using Kubernetes for Django + Redis + PostgreSQL is certainly overkill; Nomad’s overhead is acceptable for such a stack)
  • following the KISS ( keep it simple, stupid ) principle

Some scenarios where it might be less appropriate:

  • it might be useful to tap into the wider Kubernetes ecosystem, e.g. operators - if you want to run PostgreSQL, Redis, Cassandra, ElasticSearch, Kafka with limited human resources, it might be easier to do so via Kubernetes Operators ( whether or not such operational complexity, even abstracted, is worth it with a limited team, is an entirely different discussion)
  • if you’re, by choice or not, limited to a single public cloud provider and can outsource the management of Kubernetes (or even use a higher-level product, like AWS' Fargate or GCP’s Cloud Run/App Engine) - with GKE, and especially Autopilot, there’s little left for you to manage ( even then, there’s still some complexity that might be unnecessary - e.g. dealing with AWS EKS to run Wordpress might be overkill)

Coming up next

I intend to write a few more articles on Nomad, mostly to share my recent experiences with specific things around it, like log management ( how to run Loki on top of Nomad and ship logs to it) and tracing ( Jaeger, integration with Traefik/Consul, etc.).
