Using HashiCorp Sentinel to validate Terraform configuration/plan

Let me preface this by saying that I really like HashiCorp, their products and their open core business model. Their Enterprise stack is great, with awesome features (Vault selective intercluster replication, Terraform team collaboration tooling, Sentinel Policy as Code, etc.). But for personal use, or for a small business case that doesn’t justify all the Terraform Enterprise features and the associated budget, using the freely available Sentinel Simulator in a somewhat hacky way could be a viable option (it certainly beats having to write a custom tool to achieve the same thing).

Use case

I have some Terraform configuration files and, more importantly, plan files that I’d like to validate before applying: that the Terraform version is appropriate, that resources to be added aren’t too big/expensive, that there’s always a tag with some key, etc. I could write a custom tool to achieve that, or I could take advantage of Sentinel, HashiCorp’s Policy as Code framework/tool, which is sadly closed source and available only in the Enterprise versions of HashiCorp’s stack. However, HashiCorp makes Sentinel Simulator freely available to download, and it “enables you to develop and test Sentinel policies locally on your own machine or in a CI environment”, which sounds great for what I need.

Getting started with Sentinel Simulator

Start by downloading Sentinel Simulator’s latest version for your OS and architecture, setting the appropriate permissions (chmod +x sentinel on *nix) and executing the binary, which should produce output similar to this:

./sentinel
Usage: sentinel [--version] [--help] <command> [<args>]

Available commands are:
    apply      Execute a policy and output the result
    doc        Show documentation for an import from a doc file
    fmt        Format Sentinel policy to a canonical format
    test       Test policies
    version    Prints the Sentinel Simulator version

Once that’s out of the way, you can start writing a policy - which is just a file with rules that evaluate conditions and return true/false (pass/fail).

An example policy is the following:

# file: basic.sentinel
hour = 4
main = rule { hour >= 0 and hour < 12 }

With it, we declare a static variable hour set to 4, and in the main rule (every policy needs a main rule) we check that hour is at least 0 and less than 12. Since it is, we get a true/pass with exit code 0 when running sentinel apply:

sentinel apply basic.sentinel
Pass
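
Since a passing policy exits with code 0, the result is easy to consume from a script. A minimal sketch, assuming only that any non-zero exit status means the policy did not pass; the check_policy helper and the SENTINEL_BIN variable are hypothetical names, not part of Sentinel:

```shell
# Path to the simulator binary; override SENTINEL_BIN if it lives elsewhere.
SENTINEL_BIN="${SENTINEL_BIN:-./sentinel}"

# check_policy FILE: run one policy and summarise the result on stdout.
# Sentinel's own output (including any execution trace) is sent to stderr.
check_policy() {
    if "$SENTINEL_BIN" apply "$1" >&2; then
        echo "PASS $1"
    else
        echo "FAIL $1"
        return 1
    fi
}

# Example:
#   check_policy basic.sentinel && echo "safe to continue"
```

Keeping the summary on stdout and the trace on stderr makes it easy to collect one line per policy in a CI log while still seeing why a policy failed.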

Rules can be very complex (Sentinel’s rule getting started guide), and most interestingly, can use dynamic data sources with imports, which are, roughly speaking, plugins that communicate with an external data source like time, terraform plan output, etc.

An example use of the time import and more complex rules (which do the same thing, checking if the hour is between 0 and 12), with a print:

# file: time.sentinel
import "time"

hour = time.now.hour
minute = time.now.minute
print("current time is", hour, minute)

past_midnight = rule { hour >= 0 } 
before_noon   = rule { hour < 12 }

main = rule {
    past_midnight and 
    before_noon
}

Here we’re importing the “time” plugin, and using the time.now.hour attribute (which gives us the current hour) to compare against 0 and 12, allowing us to do time-based rules (e.g. enforce “read-only Friday”).

In my case, since it’s 15:25, it will fail:

date
Thu May 23 15:25:29 UTC 2019
./sentinel apply time.sentinel
Fail

Execution trace. The information below will show the values of all
the rules evaluated and their intermediate boolean expressions. Note that
some boolean expressions may be missing if short-circuit logic was taken.

Print messages:

current time is 15 25

FALSE - time.sentinel:10:1 - Rule "main"
  TRUE - time.sentinel:11:5 - past_midnight
    TRUE - time.sentinel:7:24 - hour >= 0
  FALSE - time.sentinel:12:5 - before_noon
    FALSE - time.sentinel:8:24 - hour < 12

FALSE - time.sentinel:8:1 - Rule "before_noon"

TRUE - time.sentinel:7:1 - Rule "past_midnight"
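
Because time.now follows the clock, a policy like this is awkward to verify by hand. The test command listed in the usage output can, as far as the simulator docs describe it, replace an import with static mock data. The sketch below writes two test cases for time.sentinel. The test/<policy>/<case>.json layout and the "mock" and "test" keys are assumptions about the simulator's JSON test format that should be checked against the docs for your version; write_time_case is a hypothetical helper:

```shell
# write_time_case NAME HOUR EXPECTED
# Creates test/time/NAME.json, mocking time.now.hour as HOUR and asserting
# that the main rule evaluates to EXPECTED (true or false).
# ASSUMPTION: test cases for time.sentinel live in test/time/ and use the
# "mock" / "test" keys shown here.
write_time_case() {
    mkdir -p test/time
    cat > "test/time/$1.json" <<EOF
{
  "mock": { "time": { "now": { "hour": $2, "minute": 0 } } },
  "test": { "main": $3 }
}
EOF
}

# Example:
#   write_time_case morning 9 true
#   write_time_case afternoon 15 false
#   ./sentinel test
```

With both cases in place, the policy can be exercised for a morning and an afternoon hour regardless of when the test actually runs.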

You can do much more than simple comparisons (contains, is, not contains, is not, is not undefined, etc.; the full spec of the Sentinel language is here).

Advanced reading

The good stuff

Sentinel, when used as part of Terraform Enterprise, has some Terraform-related plugins like tfplan, which allow validations based on planned values. Sentinel Simulator, however, doesn’t have those closed source plugins, so the only way to use it in a similar fashion is via its JSON parser plugin, since Terraform configuration and plans can be rendered as JSON (note: terraform show -json for plan and state files exists only from Terraform 0.12 onwards; for prior versions you’d need a tool like tfjson).

If you are already on terraform 0.12, skip to the relevant section.

Using tfjson

First you need to install a (functional) version of tfjson, like tapirs/tfjson via go get:

go get github.com/tapirs/tfjson
tfjson
usage: tfjson terraform.tfplan

As an example, I manage my DigitalOcean Kubernetes cluster via Terraform. I’ll increase the node_count to 3 and run terraform plan -out terraform.plan:

terraform plan -out terraform.plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

digitalocean_kubernetes_cluster.my_digital_ocean_cluster: Refreshing state... (ID: f8ee8505-af76-437e-864f-3751371374d8)

------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  ~ digitalocean_kubernetes_cluster.my_digital_ocean_cluster
      node_pool.0.node_count: "2" => "3"


Plan: 0 to add, 1 to change, 0 to destroy.

------------------------------------------------------------------------

This plan was saved to: terraform.plan

To perform exactly these actions, run the following command to apply:
    terraform apply "terraform.plan"

Running tfjson on the terraform.plan file produces a JSON document listing the resources that will be added/modified, along with their new values:

tfjson terraform.plan
{
    "destroy": false,
    "digitalocean_kubernetes_cluster.my_digital_ocean_cluster": {
        "destroy": false,
        "destroy_tainted": false,
        "node_pool.0.node_count": "3"
    }
}

Sentinel can read JSON, either inline via the json plugin or via the global values configuration, which can be supplied in a configuration file (sentinel apply -config config.json) or as a command-line flag (sentinel apply -global key=value).
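
The configuration-file route can be sketched as a small helper that turns JSON read from stdin into a config document. It assumes the simulator's JSON configuration accepts globals under a top-level "global" key, which is worth confirming against the docs for your simulator version; plan_to_config is a hypothetical helper name:

```shell
# plan_to_config: read JSON on stdin and print a Sentinel configuration
# document exposing it to policies as the global variable "input".
# ASSUMPTION: globals go under a top-level "global" key in the config file.
plan_to_config() {
    printf '{"global": {"input": %s}}\n' "$(cat)"
}

# Example:
#   tfjson terraform.plan | plan_to_config > config.json
#   sentinel apply -config config.json tf.policy
```

Compared with the -global flag, a file avoids shell quoting of the JSON and sidesteps the operating system's limits on command-line argument length, which a large plan may hit.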

Assuming tfjson’s output, here is an example policy that checks whether destroy is true (meaning that applying the plan would destroy resources; normally you’d want to check the opposite, but this is just a test):

# file: tf.policy

print("input.destroy is",input.destroy)
main = rule {input.destroy is true}

Now we just need to pass the output of tfjson as the global variable input to Sentinel:

sentinel apply -global input="`tfjson terraform.plan`" tf.policy
Fail

Execution trace. The information below will show the values of all
the rules evaluated and their intermediate boolean expressions. Note that
some boolean expressions may be missing if short-circuit logic was taken.

Print messages:

input.destroy is false

FALSE - tf.policy:2:1 - Rule "main"

As we can see, the JSON output of tfjson was successfully read, parsed and evaluated.

Sadly, this only works for the simplest of checks (destroy is false) because of the way tfjson’s output is structured and the way Sentinel parses JSON (you can’t iterate over root-level keys). More complex checks require a hacky workaround: wrapping tfjson’s output in an outer object before passing it to Sentinel:

sentinel apply -global input="{\"data\": `tfjson terraform.plan`}" tf.policy

With the slightly modified policy:

# file: tf.policy

print("input.data.destroy is", input.data.destroy)

main = rule {input.data.destroy is true}

Now a more complex policy with multiple advanced checks:

# file: tf.policy

print("input.data.destroy is", input.data.destroy)

no_destroys = rule {
	input.data.destroy is false
}

dont_touch_my_kube = func(data) {
	for data as i, j {
		// skipping the destroy value since there is a dedicated rule for it
		if i == "destroy" {
			continue
		} else {
			print(i)
			if i contains "digitalocean_kubernetes_cluster" {
				print("Refusing modification of digitalocean_kubernetes_cluster resources")
				return (false)
			}
		}
	}
	return(true)
}

main = rule { no_destroys and dont_touch_my_kube(input.data) }

Result:

sentinel apply -global input="{\"data\": `tfjson terraform.plan`}" tf.policy 
Fail

Execution trace. The information below will show the values of all
the rules evaluated and their intermediate boolean expressions. Note that
some boolean expressions may be missing if short-circuit logic was taken.

Print messages:

input.data.destroy is false
digitalocean_kubernetes_cluster.my_digital_ocean_cluster
Refusing modification of digitalocean_kubernetes_cluster resources

FALSE - tf.policy:23:1 - Rule "main"
  TRUE - tf.policy:23:15 - no_destroys
    TRUE - tf.policy:4:2 - input.data.destroy is false
  FALSE - tf.policy:23:31 - dont_touch_my_kube(input.data)

TRUE - tf.policy:3:1 - Rule "no_destroys"

Since there are no destroys, the first rule passes, but the second check (written as a function so that it can walk the JSON object) fails because the plan modifies a digitalocean_kubernetes_cluster resource, which isn’t allowed.

Terraform show -json (starting from terraform 0.12)

Since Terraform 0.12 (which includes some very cool new features; if you haven’t checked it out, here’s the announcement blog post), terraform show has a -json option that works on plan files, so you can inspect them as JSON, which is perfect for checks via Sentinel or other tools. Let’s see what that looks like with Sentinel:

terraform -v
Terraform v0.12.0
+ provider.digitalocean v1.3.0
terraform plan -out terraform.plan
terraform show -json terraform.plan 
...

This should result in a pretty sizeable JSON, including:

terraform version
providers’ configuration
all current resources
planned changes
output changes
yet unknown values
variables

Overall, it’s pretty cool and you can get wild with policies. Off the top of my head:

providers’ configuration doesn’t have static values, only variables / Vault data source
terraform version is good
there are no variables with X or Y (like passwords or generic admin username)
no resources are used at the root level (only modules)
and the plain old (value X isn’t bigger than Y), like AWS/GCP/Azure instance types, Lambda memory settings, security groups with SSH open from 0.0.0.0/0, etc.

A few examples for inspiration:

# file: tf.policy
allowed_node_sizes = [
	"s-1vcpu-2gb",
]

allowed_regions = [
	"ams3",
]

allowed_tf_versions = [
	"0.12.0",
]

node_pool_max_size = 3

tf_version = rule {
	input.terraform_version in allowed_tf_versions
}

no_destroys = func() {
	for input.resource_changes as change {
		if change.change.actions contains "destroy" {
			print("No destroys allowed")
			return (false)
		}
	}
	return (true)
}

max_nodepool_size = func() {
	for input.resource_changes as change {
		if change.address contains "digitalocean_kubernetes_cluster" {
			for change.change.after.node_pool as pool {
				if pool.node_count > node_pool_max_size {
					error("Node pool", pool.name, "shouldn't exceed the max node count", node_pool_max_size, "it's currently", pool.node_count)
				}
			}
		}
	}
	return (true)
}

region_check = func() {
	for input.configuration.root_module.resources as resource {
		if resource.expressions.region.constant_value not in allowed_regions {
			return (false)
		}
	}
	return (true)
}

node_size = func() {
	for input.resource_changes as change {
		if change.address contains "digitalocean_kubernetes_cluster" {
			for change.change.after.node_pool as pool {
				if pool.size not in allowed_node_sizes {
					return (false)
				}
			}
		}
	}
	return (true)
}

main = rule {
	tf_version and no_destroys() and region_check() and max_nodepool_size() and node_size()
}

This example policy checks the following:

node types
number of nodes
no destroys
terraform version hasn’t been updated
no resources are created outside of the allowed region (AMS3)

To run it, pass terraform show’s output to sentinel apply:

sentinel apply -global input="`terraform show -json terraform.plan`" tf.policy
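
Put together, the check can gate the apply step. The sketch below assumes Terraform 0.12+ and treats any non-zero exit status from sentinel as "do not apply"; the gated_apply function and the TERRAFORM_BIN / SENTINEL_BIN variables are hypothetical names chosen for illustration:

```shell
# Binaries are overridable so the script can point at a specific install.
TERRAFORM_BIN="${TERRAFORM_BIN:-terraform}"
SENTINEL_BIN="${SENTINEL_BIN:-./sentinel}"

# gated_apply PLAN POLICY: apply the saved plan only if the policy passes.
gated_apply() {
    plan_file="$1"
    policy_file="$2"

    # Render the saved plan as JSON; give up if terraform cannot read it.
    plan_json="$("$TERRAFORM_BIN" show -json "$plan_file")" || return 2

    # Sentinel's output goes to stderr so stdout only carries terraform's.
    if "$SENTINEL_BIN" apply -global input="$plan_json" "$policy_file" >&2; then
        "$TERRAFORM_BIN" apply "$plan_file"
    else
        echo "policy check failed, not applying $plan_file" >&2
        return 1
    fi
}

# Example:
#   terraform plan -out terraform.plan
#   gated_apply terraform.plan tf.policy
```

Applying the exact plan file that was checked matters here: re-planning at apply time could produce changes the policy never saw.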

It took me roughly 15 minutes to write and test basic functionality, and I have a decent chunk of stuff covered (for my limited use case).

TL;DR

Sentinel is an awesome Policy as Code tool from HashiCorp that can be run locally via Sentinel Simulator and used to validate any sort of JSON, like the JSON rendering of a Terraform plan.

Coming up next

While searching for a Policy/Validation as Code tool, I asked around in the DevOps’ish Telegram group and someone suggested conftest, which seems more flexible, advanced and complicated (with a Go-like syntax compared to Sentinel’s HCL-like DSL), not to mention free, open source and usable directly without workarounds. I’d like to give it a try and compare the two tools.