
Recently I've been learning Terraform. I feel a bit late to the Infrastructure as Code party.

I'm going into this very green on the approach but with a lot of infrastructure, programming, and cloud architecture experience.

One of the things that sparked a disagreement between some coworkers was the structure of the HCL we write and how we, as a team, standardize the way we form our code.

Not an authoritative source

Before I get started, I want to highlight that this blog post is documenting my learning process. There are many great sources from folks with more experience, many of which I'll link to throughout, that I'd recommend reviewing before me.

The disagreement

In any organization that has a decent sized infrastructure engineering team working on various projects, standards are extremely important. Not just from a compliance and audit perspective, but also for ensuring that builds are repeatable and easily understood by other engineers.

In a group conversation, we were reviewing two different schools of thought on how we were approaching Terraform in our environment. The goal of the discussion was to come to an agreement on how we structure our Terraform code. We stumbled upon what seems to be a "holy war" of sorts with Terraform.

In one case, we had a repository built in the following structure:

$ tree project1
project1
├── dev
│   ├── backend.tf
│   ├── main.tf
│   ├── outputs.tf
│   ├── project1.dev.auto.tfvars
│   ├── providers.tf
│   └── variables.tf
└── prod
    ├── backend.tf
    ├── main.tf
    ├── outputs.tf
    ├── project1.prod.auto.tfvars
    ├── providers.tf
    └── variables.tf

Where each environment has a folder completely independent of one another. There also isn't any use of locals, which will be relevant later.

The second case was the following structure:

$ tree project2
project2
├── env
│   ├── dev
│   │   └── project2.dev.auto.tfvars
│   └── prod
│       └── project2.prod.auto.tfvars
└── solution
    ├── backend.tf
    ├── locals.tf
    ├── main.tf
    ├── outputs.tf
    ├── providers.tf
    └── variables.tf

Where there is no separate code for each environment. Instead, the root module is shared by all environments; configuration is handled through an environment variable, and settings for called modules are derived within locals.

Now, I've simplified both of these examples to show the disagreement more specifically. The real world application was more complex. What I want you to take away is the following:

  • project1 has completely independent code for each environment of the application and exclusively uses variables to set every aspect of any called external modules
  • project2 has all code for the application the same between each environment. Each environment differs only by the variables input. Locals are used to determine what features change between environments.

As I looked at this problem, I decided to break it down into two main points: Variables and Locals and Environments.

Variables and Locals

In project1, the main.tf may look something like this:

module "mymodule" {
    source = "app.terraform.io/example/module"
    version = "1.0.0"

    name = var.mything_name
    resource_group = var.mything_rg
    description = var.mything_description
    sku = var.mything_sku
}

Where each module has its parameters set to 1:1 variables. The tfvars file would then have explicit values for every one of these variables.
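As a sketch of what that looks like in practice (the variable names mirror the hypothetical module call above), project1's dev folder would declare and then explicitly fill every single input:

```hcl
# variables.tf — one input declared per module parameter
variable "mything_name" {
  type        = string
  description = "Name of the mything resource"
}

variable "mything_sku" {
  type        = string
  description = "SKU for the mything resource"
}

# project1.dev.auto.tfvars would then spell out every value, e.g.:
# mything_name = "app-mything-dev-001"
# mything_sku  = "ds4_v5"
```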

In project2, the main.tf may look something like this:

module "mymodule" {
    source = "app.terraform.io/example/module"
    version = "1.0.0"

    name = local.mything_name
    resource_group = local.resource_group
    description = local.mything_description
    sku = local.mything_sku
}

Which, on its own, doesn't tie any variables to these locals. You need to look at the locals.tf file:

locals {
    resource_group = lower("rg-mything-${var.environment}-001")

    mything_name = lower("app-mything-${var.environment}-001")
    mything_description = lower("App resource for mything ${var.environment}")
    mything_sku = var.environment == "prod" ? "ds16_v5" : "ds4_v5"
}

In this way, the only variable set by the tfvars is the environment from which everything else is derived.
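For contrast, a sketch of project2's input surface: the tfvars sets only the environment, and a validation block (my own addition, not from the original example) can guard the allowed values:

```hcl
# variables.tf — the single input everything else is derived from
variable "environment" {
  type        = string
  description = "Deployment environment"

  validation {
    condition     = contains(["dev", "prod"], var.environment)
    error_message = "environment must be \"dev\" or \"prod\"."
  }
}

# project2.dev.auto.tfvars would then contain only:
# environment = "dev"
```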

Now, this gets close to where our second point is, which is how to divide environments.

Environments

The other point that these setups differed is on how to treat separate environments.

On state files

I'm intentionally not talking about state files here. In this setup, Terraform Cloud's workspaces are used per environment in all scenarios described. As such, state files are already separated by application and environment.

In project1, each environment has a folder. Within each environment's folder, we have a complete version of the Terraform code for that environment. There are no common references or interdependencies between the environments.

Whereas project2 takes a different approach. There is only one set of files defining the application, and deployments to different environments are handled by an environment variable and local expressions that change the deployment.


So, with those two points defined, which is right?

For the rest of this article, I'm going to attempt to answer that.

Terraform style guide

Before we try to determine the best way, we should review the standard structure for Terraform modules.

File structure

The Hashicorp Terraform Style Guide has a lot of information about what Hashicorp recommends. There is some great stuff here, but I want to focus on the File names section for now.

  • backend.tf
  • main.tf
  • outputs.tf
  • providers.tf
  • variables.tf
  • locals.tf

These files are the most common names you'll see in a Terraform module. When Terraform runs, it will just mush all of these together into one long config, so while the file names are technically irrelevant, the goal here is to make the code readable by the engineers working on it.

Terraform is composed of modules. By convention, the directory you run Terraform from is referred to as the "root module," which can reference other modules, either local or remote.

The style guide also has an example of a repository structure.

.
├── modules
│   ├── function
│   │   ├── main.tf      # contains aws_iam_role, aws_lambda_function
│   │   ├── outputs.tf
│   │   └── variables.tf
│   ├── queue
│   │   ├── main.tf      # contains aws_sqs_queue
│   │   ├── outputs.tf
│   │   └── variables.tf
│   └── vpc
│       ├── main.tf      # contains aws_vpc, aws_subnet
│       ├── outputs.tf
│       └── variables.tf
├── main.tf
├── outputs.tf
└── variables.tf

But let's not get too ahead of ourselves! We want to answer what's best practice about this structure, but we're still missing some components.

Variables and Locals

Obviously one of the points we have to answer is how and when to use variables and locals. The style guide has an opinion on that too.

Variables are for inputs. Think of them like parameters to a function. A module may define a set of inputs, be they strings, arrays, objects, integers, or boolean values. These inputs will change how that module is processed. It may change the name of resources, change how many resources there are, or whether a resource is created at all.

While variables make your modules more flexible, overusing variables can make code difficult to understand. When deciding whether to expose a variable for a resource setting, consider whether that parameter will change between deployments.

Terraform style guide

Now the part I want to highlight here is something Hashicorp takes an opinionated stance on: "overusing variables can make code difficult to understand." The implication is that heavy use of variables makes it difficult, by looking at main.tf on its own, to get an idea of everything being done. You would need to refer to the tfvars file to know what the values are.
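To make that trade-off concrete, here's a sketch using the hypothetical module from earlier: a value that genuinely changes between deployments stays a variable, while a value that is identical everywhere is hardcoded so it's visible in main.tf:

```hcl
module "mymodule" {
  source  = "app.terraform.io/example/module"
  version = "1.0.0"

  name = var.mything_name # varies between deployments, so it stays an input
  sku  = "ds4_v5"         # identical everywhere: hardcoding keeps the value readable in place
}
```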

Another type of value you can refer to is a local. What does the style guide say about them?

Local values let you reference an expression or value multiple times. Use local values sparingly, as overuse can make your code harder to understand.

Terraform style guide

Another opinionated statement, but it echoes the same message from variables. "Use local values sparingly, as overuse can make your code harder to understand."

We'll analyze this more later, but for now let's move on to environments.

Multiple Environments

The style guide also has a section on multiple environments.

In this section, the style guide shows two examples. The first covers the case where you're using Hashicorp's cloud platform for Terraform.

.
├── compute
│   ├── main.tf
│   ├── outputs.tf
│   └── variables.tf
├── database
│   ├── main.tf
│   ├── outputs.tf
│   └── variables.tf
└── networking
    ├── main.tf
    ├── outputs.tf
    └── variables.tf

In this structure, you have multiple root modules. However, you do not separate into folders by environment. Instead, Hashicorp recommends having one set of code, like above, and deploy separate environments using workspaces.
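For what it's worth, a single configuration deployed through workspaces typically branches on `terraform.workspace`. A minimal sketch (the derived values are my own illustration):

```hcl
locals {
  # terraform.workspace resolves to the active workspace name, e.g. "dev" or "prod"
  environment = terraform.workspace

  # Derive environment-specific settings from the workspace name
  instance_count = local.environment == "prod" ? 3 : 1
}
```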

However, workspaces are a concept backed by Terraform Cloud. If you do not have Hashicorp's cloud service, they recommend the following:

.
├── modules
│   ├── compute
│   │   └── main.tf
│   ├── database
│   │   └── main.tf
│   └── network
│       └── main.tf
├── dev
│   ├── backend.tf
│   ├── main.tf
│   └── variables.tf
├── prod
│   ├── backend.tf
│   ├── main.tf
│   └── variables.tf
└── staging
    ├── backend.tf
    ├── main.tf
    └── variables.tf

Now, this is interesting because two concepts are introduced here. First, each environment has its own folder that is its own root module; second, there is a "modules" folder. These are local modules that are called by each of the environments.

The goal of this structure is that you separate the state for each environment, while the local modules provide a common place for configuration shared across all environments.
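A sketch of what one of those environment folders might contain (the module inputs and outputs here are my own invention): each root module simply wires the shared local modules together.

```hcl
# dev/main.tf — compose the environment from the shared local modules
module "network" {
  source = "../modules/network"

  environment = "dev"
}

module "database" {
  source = "../modules/database"

  environment = "dev"
  subnet_id   = module.network.subnet_id # assumes the network module exposes this output
}
```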

These local modules are something expanded on more in the style guide's suggestions for repository structure.

Repository

The style guide's section on repository structure shows how they suggest reflecting the above folder structures in a git repository.

.
├── modules
│   ├── function
│   │   ├── main.tf      # contains aws_iam_role, aws_lambda_function
│   │   ├── outputs.tf
│   │   └── variables.tf
│   ├── queue
│   │   ├── main.tf      # contains aws_sqs_queue
│   │   ├── outputs.tf
│   │   └── variables.tf
│   └── vpc
│       ├── main.tf      # contains aws_vpc, aws_subnet
│       ├── outputs.tf
│       └── variables.tf
├── main.tf
├── outputs.tf
└── variables.tf

This reinforces the concept of separating your services into local modules, apart from your environment code.

Overall, the Hashicorp style guide gives us a lot to think about, but what do people do in the real world?

Community Standards

As with any platform, the creator/publisher/primary developer can only make suggestions on how their tool is used. Users are creative folks and come up with all kinds of ways to use and abuse your tool.

Terraform is no different, whether it's folks in very small environments with problems unique to them, or engineers in large enterprises with their own unique problems.

A fun twist with Terraform is that we don't have just one common toolset. People are out there using:

  • Straight Terraform CLI
  • The fork OpenTofu
  • HCP
  • Terraform Cloud
  • Terragrunt
  • Terramate

Or whatever else I'm sure you can find on the Internet. All of these different environments lead to a lot of fragmentation when it comes to file structure, coding standards, and repository use. What works for one toolset doesn't necessarily work for another.

Let's look at some discussions on the topics we're covering here.

Variables and Locals

Almost universally, there is a common understanding of variables and locals.

In many Reddit threads, the comparison is drawn to programming.

Variables are like parameters on a function, whereas locals are local variables within that function. It's really that simple.

Environments

Now here we have a lot less consensus.

You can review many threads in various communities which discuss this problem.

In this thread, the top comment cautions against using Terraform workspaces with a common configuration. Their reasoning is that when the platform differs between environments, supporting variables that act as "feature flags" gets complex fast. Some people also warn about limiting the blast radius of changes: if you change the common code, you affect the higher environments as well.

In other threads you have some more "out there" ideas, like using git branches for environments (please don't).

Direct disagreements exactly like what we're discussing have happened time and time again.

The concern with having completely independent main.tf files for each environment is that a growing set of differences can accumulate between environments. It also makes it far less straightforward to promote a component through each environment.

Ultimately, the community doesn't have a consistent answer.

What about some of the "big players" like companies and their own guides?

Google has an article with their suggested standard for Terraform root modules. Their structure is very similar to Hashicorp's recommendation for when you're not using Terraform Cloud or HCP. They have a repository structure with folders for environments and a common local modules folder.

-- SERVICE-DIRECTORY/
   -- OWNERS
   -- modules/
      -- <service-name>/
         -- main.tf
         -- variables.tf
         -- outputs.tf
         -- provider.tf
         -- README
      -- ...other…
   -- environments/
      -- dev/
         -- backend.tf
         -- main.tf

      -- qa/
         -- backend.tf
         -- main.tf

      -- prod/
         -- backend.tf
         -- main.tf

My conclusions

So what are my takeaways from this research? How should I structure my Terraform root modules? Should I make everything a variable? Should I make everything a local? Should I separate environments in folders or not?

Let's start with Variables and Locals

Variables and Locals

When it comes to variables and locals, I'm inclined to minimize the use of variables and lean heavily on locals. However, to address the "readability" concern that was brought up, I've settled on a compromise in the form of these two guidelines:

  • Use a variable only when the value will change between environments
  • Use a local only when the value will be referenced more than once
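Applied to the earlier example, those two guidelines might shake out like this (a sketch; all names are illustrative):

```hcl
# Changes between environments, so it's a variable
variable "environment" {
  type = string
}

locals {
  # Referenced more than once below, so it earns a local
  name_suffix = "mything-${var.environment}-001"
}

module "mymodule" {
  source  = "app.terraform.io/example/module"
  version = "1.0.0"

  name           = "app-${local.name_suffix}"
  resource_group = "rg-${local.name_suffix}"

  # Used exactly once, so the expression stays inline rather than becoming a local
  sku = var.environment == "prod" ? "ds16_v5" : "ds4_v5"
}
```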

Environments

This is a lot more difficult to answer. Ultimately, there are a few considerations.

  1. Limit state, so as not to have too many components in one state file
  2. Support multiple environments
  3. Keep those environments as close to in sync as possible
  4. Still allow for differences between environments
  5. Support workspaces without necessarily depending on them

What I have arrived at is something that I hope covers each of these cases.

.
├── environments
│   ├── beta
│   │   ├── backend.tf
│   │   ├── main.tf
│   │   └── terraform.tfvars
│   ├── dev
│   │   ├── backend.tf
│   │   ├── main.tf
│   │   └── terraform.tfvars
│   ├── prod
│   │   ├── backend.tf
│   │   ├── main.tf
│   │   └── terraform.tfvars
│   └── test
│       ├── backend.tf
│       ├── main.tf
│       └── terraform.tfvars
└── modules
    ├── app
    │   ├── locals.tf
    │   ├── main.tf
    │   ├── outputs.tf
    │   └── variables.tf
    ├── database
    │   ├── locals.tf
    │   ├── main.tf
    │   ├── outputs.tf
    │   └── variables.tf
    └── network
        ├── locals.tf
        ├── main.tf
        ├── outputs.tf
        └── variables.tf

The idea here is that the HCL for each environment is very straightforward, simply calling each of the service modules that make up the solution.
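As a sketch (the inputs are illustrative), environments/prod/main.tf might read like this, with the one difference from dev sitting in plain sight:

```hcl
# environments/prod/main.tf — read this one file to see what prod deploys
module "network" {
  source = "../../modules/network"

  environment = "prod"
}

module "app" {
  source = "../../modules/app"

  environment = "prod"
  sku         = "ds16_v5" # dev passes "ds4_v5"; the difference is visible right here
}
```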

State is limited to each environment and can be tied to a Terraform Cloud workspace within the backend.tf.
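A sketch of what that backend.tf could look like using the cloud block (available in Terraform 1.1 and later; the organization and workspace names are placeholders):

```hcl
# environments/dev/backend.tf
terraform {
  cloud {
    organization = "example-org"

    workspaces {
      name = "myapp-dev"
    }
  }
}
```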

While differences between environments can exist, the simplicity of the main.tf should make those differences very apparent.

Final thoughts

Obviously the most important thing is that the structure reflects how your organization works. In environments where you have many engineers from different domains working on components of an overall solution and rapidly iterating on them, this approach may not work as well. Automation can also influence the approach you take, as this structure may not lend itself to a copy+paste building-block approach.

I think the right approach is, if your organization is just starting with Infrastructure as Code, to simply put together some "dummy" applications that reflect common patterns and see where the pain points are.
