Orchestrating Edge AI workloads on a Jetson Orin Nano with Nomad
Introduction
Recently I’ve been tinkering with my Home Assistant setup, more specifically adding voice control. I wanted to enhance it with a local LLM (large language model) to be able to do more advanced natural language processing (I’m in a multilingual household, and Home Assistant’s voice control is pretty static - you have to predefine phrases and actions, you can’t just improvise). I could have used a cloud service, but not only is that no fun, it also has privacy implications. So I wanted to run it all locally, preferably on a small, power-efficient device that can handle the workload.
Enter the Jetson Orin Nano from NVIDIA.
Jetson Orin Nano
The Jetson Orin Nano is a powerful, compact AI development board from NVIDIA. It has a 6-core ARM CPU, an Ampere GPU with 1024 CUDA cores and 32 Tensor cores, and 8GB of RAM at 102 GB/s (relatively slow, but usable). It’s a great platform for running small AI workloads locally, and is very power efficient. Recently NVIDIA released a software update (confusingly called Jetson Orin Nano Super) that brought a significant performance improvement (mostly in terms of TOPS and memory bandwidth).
It runs “Jetpack”, an NVIDIA-customised version of Ubuntu (with all the NVIDIA drivers and libraries), and there is a pretty good community, ecosystem and documentation around it. In particular, Jetson Containers is a very good collection of ready-to-use containers (with correct versions of CUDA and all necessary configs, etc.) for popular AI / ML / robotics-related software.
As the underlying OS is based on Ubuntu, the software catalogue is vast, and it includes HashiCorp Nomad, which lets the Jetson plug into a larger homelab setup and be orchestrated from a common control plane.
In my case, I’d like to run Ollama, a local LLM hosting server, to integrate with Home Assistant, and maybe some other AI workloads. Those need to run on the Jetson, but their consumers (such as Open WebUI, a frontend for Ollama, or Home Assistant itself) don’t - they can run elsewhere in my lab to avoid wasting precious memory that could be used as VRAM for the large language models. For instance, Open WebUI takes a good 800+ MB of RAM, and the Jetson has only 8GB, so it would be a waste to run it there.
Nomad
Nomad is a powerful, flexible, and easy-to-use orchestrator that can run on a wide range of platforms, including edge devices like the Jetson Orin Nano. It supports a variety of workloads, including containers and raw executables, and crucially for edge scenarios, supports various distributed cluster topologies and tolerates unreliable networks.
Nomad supports linux/arm64, so everything should work out of the box. Furthermore, Nomad has an official NVIDIA device plugin, so there should be direct support to fingerprint (discover and make targetable via attributes) the GPU and attach it to jobs.
This doesn’t sound too complex, right? While building it, I was pleasantly surprised with how simple the whole process was. Let’s dive in.
Jetson setup
NVIDIA have a detailed getting started tutorial on how to set up the Jetson Orin Nano, so I won’t go into too much detail here. The only tricky part was that the unit I got was running a very old firmware version, so I had to jump through multiple hoops to update it to a recent one that allows running the latest Jetpack 6.x (the version that delivers the drastically improved performance that led NVIDIA to call it “Jetson Orin Nano Super”). Other than that, it was basically a matter of flashing a micro SD card, inserting it into the Jetson, and booting it up - rinse and repeat until you reach the desired version.
An NVMe drive is strongly recommended to store the heavy stuff (container images, models, etc.) - microSD cards are more expensive per GB at the higher capacities, aren’t made for such heavy random write I/O, and probably won’t survive that long. The Jetson Orin Nano has two available M.2 slots (a third one is used for the WiFi card): a PCIe 3.0 x4 in the 2280 format, and a PCIe 3.0 x2 in the 2230 format.
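If you add one, format it and mount it somewhere permanent - the rest of this post assumes it’s mounted at /ssd. A minimal sketch, assuming the drive shows up as /dev/nvme0n1 (the device name is an assumption, check with lsblk first):
# identify the drive first - the device name below is an assumption
lsblk
# create a filesystem and mount it at /ssd
sudo mkfs.ext4 /dev/nvme0n1
sudo mkdir -p /ssd
sudo mount /dev/nvme0n1 /ssd
# persist the mount across reboots
echo "/dev/nvme0n1 /ssd ext4 defaults 0 2" | sudo tee -a /etc/fstab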
To allow for remote headless control, enable SSH from the Ubuntu GUI setup, and configure SSH keys.
Performance tuning
To get the most out of the Jetson, you need to disable the desktop GUI (which just wastes resources) and set the power mode to “Max Performance”, which arrived with Jetpack 6.x. By default there’s a 15W power cap, which can be increased to 25W or “unlimited” - the setting I went for. This is done with the nvpmodel command, which is part of the Jetpack installation:
# set the power mode to unlimited
sudo /usr/sbin/nvpmodel -m 0
# disable desktop GUI to free up resources
sudo systemctl set-default multi-user.target
# reboot for the changes to take effect and everything to be cleaned
sudo reboot
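After the reboot, you can double-check which power mode is active by querying nvpmodel:
# print the currently active power mode
sudo /usr/sbin/nvpmodel -q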
Installing Nomad
If you’re new to Nomad, consider taking a look at the Introduction to Nomad.
Nomad has releases for arm64 (the CPU architecture of the Jetson family) available via an APT repository, which works for Jetpack.
Connect to the Jetson via SSH, and run the following commands to install Nomad:
sudo apt-get update && sudo apt-get install wget gpg coreutils
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt-get update && sudo apt-get install nomad
# Install CNI plugins which are used to configure networking for Nomad allocations
export ARCH_CNI=$( [ $(uname -m) = aarch64 ] && echo arm64 || echo amd64)
export CNI_PLUGIN_VERSION=v1.7.1
curl -L -o cni-plugins.tgz "https://github.com/containernetworking/plugins/releases/download/${CNI_PLUGIN_VERSION}/cni-plugins-linux-${ARCH_CNI}-${CNI_PLUGIN_VERSION}".tgz && \
  sudo mkdir -p /opt/cni/bin && \
  sudo tar -C /opt/cni/bin -xzf cni-plugins.tgz
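Nomad’s bridge networking also expects bridged container traffic to be routable through iptables. If these sysctls aren’t already set on your image, something along these lines should cover it (the file name is my own choice):
# allow bridged container traffic to be processed by iptables (used by Nomad's bridge networking)
echo "net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1" | sudo tee /etc/sysctl.d/99-nomad-bridge.conf
sudo sysctl --system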
We also need a configuration file for the Nomad agent, to tell it which addresses to listen on, which folder to use for the Nomad and allocation data (on the NVMe, to avoid overtaxing the SD card), and so on. In my case I’m connecting to an existing Nomad cluster, but it could also be a standalone single-machine cluster with only the Jetson. The configuration file is in HCL format, and should be placed in /etc/nomad.d/.
name = "jetson-nano-01"
region = "global"
datacenter = "home"
# the address the Nomad client will be available on, important to adjust especially on public networks where
# it shouldn't be 0.0.0.0
bind_addr = "0.0.0.0"
# the folder where Nomad stores its data, which we want on the NVME drive
data_dir = "/ssd/nomad"
log_level = "INFO"
enable_syslog = true
client {
enabled = true
# the address of the Nomad servers/control plane;
# if there are none and this will be a single node cluster, add a server block
# with enabled = true and bootstrap_expect = 1 in a separate configuration file
# in the same folder (e.g. /etc/nomad/server.hcl), and use 172.0.0.1 for the server address
servers = ["nomad-servers.home"]
# Node Pool, used to group nodes with similar capabilities (in this case, Jetsons)
node_pool = "jetsons"
}
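With the configuration in place, start the agent via the systemd unit that ships with the APT package, and check that the Jetson registers as a client:
# start Nomad now and on every boot
sudo systemctl enable --now nomad
# verify the node shows up and is ready (run from any machine that can reach the cluster)
nomad node status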
Setting up the NVIDIA Jetson GPU/APU for use with Nomad
Nomad has a concept called device plugins, which enable Nomad to fingerprint (discover) devices, collect information about them, and make them available for scheduling and targeting. They have an open spec, so anyone can write their own, and there are community plugins for USB devices and Xilinx FPGAs, as well as an official NVIDIA one. It discovers NVIDIA GPUs, collects their info (type, wattage, memory use, etc.) and allows you to deploy workloads specifically targeting them, like so:
job "gpu-test" {
datacenters = ["dc1"]
type = "batch"
group "smi" {
task "smi" {
driver = "docker"
config {
image = "nvidia/cuda:12.8.1-base-ubuntu22.04"
command = "nvidia-smi"
}
resources {
device "nvidia/gpu" {
count = 1
# Add an affinity for a particular model
affinity {
attribute = "${device.model}"
value = "Tesla H100"
weight = 50
}
}
}
}
}
}
I thought I’d need to use it, so I compiled my own build (currently there are no arm64 releases) and tried running it… only to get a bunch of errors from NVML, the underlying NVIDIA library for GPU management and monitoring. After some research, I discovered that, contrary to my expectations, Jetsons aren’t supported by NVML at all. It goes as far as detecting that there’s an NVIDIA device of type “Orin”, and that’s it - none of the other methods, such as getting the name, wattage or memory, work.
Therefore the NVIDIA device plugin, which relies entirely on NVML, is out. But is it actually needed?
Turns out, no, not really. CUDA workloads detect the integrated GPU on their own and use it directly. The only thing needed for containers is the NVIDIA Container Toolkit (which installs a special nvidia runtime), and that comes preinstalled with Jetpack. The last piece of the puzzle is a way to target the correct client - the one with the GPU - but that’s easily solved by the fact that I put the Jetson in its own node pool and use that to target it.
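In my case the jetsons pool appeared once the client registered with it; a quick way to sanity-check that from any machine with cluster access:
# list node pools known to the cluster - "jetsons" should be in the output
nomad node pool list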
Note: Nomad can run applications with raw_exec, where binaries/scripts are executed directly on the host OS, with no isolation. This could be marginally faster than with containers, but comes at the expense of increased difficulty when managing dependencies such as CUDA and various other libraries’ versions. There is already some complexity due to the dependency on the host NVIDIA driver and kernel, which is enough of a potential minefield, so I decided to stick to Docker containers for now.
Speaking of Docker containers, this brings me back to the jetson-containers repository by Dustin Franklin. It contains ready to use (with all necessary libraries and dependencies) containers for popular robotics, IoT, AI, ML, etc. frameworks/tools/platforms, as well as wrapper tools to start them with the appropriate settings. That’s a great starting point for running workloads on the Jetson, and what I’ll use to make my life easier. The containers are available on Docker Hub, and can be pulled and run like any other container.
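For example, the Ollama image used later in this post can be pulled and smoke-tested up front (the r36.4.0 tag matched my Jetpack 6.x / L4T install - pick the one matching yours):
# pull the Jetson-specific Ollama image from Docker Hub
sudo docker pull dustynv/ollama:r36.4.0
# quick sanity check that the binary runs inside the container with the nvidia runtime
sudo docker run --rm --runtime nvidia dustynv/ollama:r36.4.0 ollama --version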
Deploying Ollama and Open WebUI on Nomad
With all this set up, we can now deploy Ollama and Open WebUI on Nomad. The process is pretty straightforward, and is similar to deploying any other container on Nomad. The only difference is that Docker needs to use the nvidia runtime - NVIDIA’s Jetson guides recommend setting it as the default for the Docker daemon (in /etc/docker/daemon.json), but it can also be specified per job in Nomad with the runtime option in the job spec.
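For reference, the daemon-wide approach looks roughly like this in /etc/docker/daemon.json (on Jetpack the nvidia runtime entry is usually already present, so only default-runtime typically needs adding), followed by a Docker restart:
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    },
    "default-runtime": "nvidia"
}
# apply the change
sudo systemctl restart docker
I went with the per-job option instead - the Ollama job spec below sets runtime = "nvidia" explicitly, so it doesn’t depend on the daemon default.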
job "ollama" {
# targeting the "jetsons" node pool which contains the Jetson Orin Nano
datacenters = ["*"]
node_pool = "jetsons"
namespace = "llm" # namespace for all LLM-related jobs, needs to be created first
type = "service"
group "ollama" {
# persistent volume to store Ollama's models, data, cache, etc.
volume "ollama" {
type = "host"
source = "ollama"
access_mode = "single-node-single-writer"
attachment_mode = "file-system"
}
# service and network to make Ollama's API discoverable to other jobs, like Open WebUI
# and Home Assistant
service {
name = "ollama"
port = "ollama"
provider = "nomad"
}
network {
port "ollama" {
static = 11434
}
}
# Ollama Docker container (from jetson-containers), using the NVIDIA runtime
task "ollama" {
driver = "docker"
config {
image = "dustynv/ollama:r36.4.0"
command = "ollama"
args = ["serve"]
runtime = "nvidia"
shm_size = 8192
ports = ["ollama"]
}
# env vars to configure Ollama
env {
OLLAMA_HOST = "0.0.0.0"
OLLAMA_MODELS = "/ollama/models"
OLLAMA_LOGS = "/ollama/logs"
}
volume_mount {
volume = "ollama"
destination = "/ollama"
}
resources {
cpu = 9000
memory = 7168 # leave a GB for the OS, Nomad and other processes
}
}
}
}
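As noted in the job spec, the llm namespace has to exist before the job can be submitted, which is a one-liner:
# create the namespace used by the LLM-related jobs
nomad namespace apply -description "LLM workloads" llm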
Next, a Nomad job spec for Open WebUI:
job "openwebui" {
datacenters = ["*"]
namespace = "llm"
node_pool = "default" # default node pool, so that it doesn't run on the Jetson
type = "service"
group "openwebui" {
volume "openwebui" {
type = "host"
source = "openwebui"
access_mode = "single-node-single-writer"
attachment_mode = "file-system"
}
service {
name = "open-webui"
port = "http"
provider = "nomad"
}
network {
port "http" {
static = 8080
to = 8080
}
}
task "open-webui" {
driver = "docker"
config {
image = "ghcr.io/open-webui/open-webui:0.6"
ports = ["http"]
}
resources {
cpu = 600
memory = 1024
}
# dynamic template to inject the Ollama server address into the OpenWebUI container,
# with automatic restart on changes
template {
data = <<EOH
# any extra environment variables needed by OpenWebUI can be added here
# such as ENABLE_WEB_SEARCH=true and WEB_SEARCH_ENGINE=duckduckgo
OLLAMA_BASE_URL=http://{{range nomadService "ollama"}}{{ .Address}}:{{ .Port }}{{end}}
EOH
destination = "secrets/file.env"
env = true
change_mode = "restart"
}
volume_mount {
volume = "openwebui"
destination = "/app/backend/data"
}
}
}
For both jobs, I used Nomad volumes to create persistent storage for the data (models, cache, configuration). Volumes can be managed by CSI plugins, but I used the newly released built-in dynamic host volumes, which are much simpler to set up and use. They are created on the host, and mounted into the container. Here is an example volume definition that will create a volume called ollama in the Nomad data_dir (/ssd/nomad/host_volumes) folder on the host:
name = "ollama"
namespace = "llm"
type = "host"
plugin_id = "mkdir"
node_pool = "jetsons"
capacity_min = "10G"
capacity_max = "100G"
capability {
access_mode = "single-node-single-writer"
attachment_mode = "file-system"
}
Deploying all of those is a matter of running nomad volume create and nomad job run with the respective spec files. The Ollama job will be scheduled on the Jetson Orin Nano, with its server available on port 11434, and Open WebUI will run on another node in the cluster, listening on port 8080.
nomad volume create jobs/llm/ollama.volume.hcl
nomad volume create jobs/llm/openwebui.volume.hcl
nomad job run jobs/llm/ollama.hcl
nomad job run jobs/llm/openwebui.hcl
After a few minutes, the Ollama server should be up and running, and you can access it via Open WebUI at http://open-webui-node-IP:8080 (the node where the Open WebUI allocation was placed). Clients such as Home Assistant can connect to Ollama at http://jetson-IP:11434.
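A quick end-to-end check is to pull a model and request a completion over Ollama’s HTTP API (the model name here is just an example):
# pull a small model onto the Jetson (can take a while on the first run)
curl http://jetson-IP:11434/api/pull -d '{"name": "llama3.2:3b"}'
# request a completion to confirm the GPU-backed server responds
curl http://jetson-IP:11434/api/generate -d '{"model": "llama3.2:3b", "prompt": "Hello!", "stream": false}'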
Conclusion
The NVIDIA Jetson Orin Nano is a decently powerful (for limited home use) AI development board, and with Nomad it’s very easy to orchestrate AI workloads on it and integrate it into a larger homelab setup.