Terraform infrastructure: Reproducible & reliable scripts

Terraform helps eliminate manual processes and human error and speeds up deployment. It’s no surprise that it has gained significant popularity among DevOps and cloud infrastructure teams. However, as with any automation tool, Terraform is not foolproof. If used incorrectly, Terraform scripts can become unreliable and difficult to reproduce. This is where e2e testing in Terraform comes in. By testing Terraform scripts in a way that simulates a real-world environment, teams can ensure that their deployments are reliable and reproducible.

The reliable execution of e2e testing demands a robust testing architecture with well-crafted scenarios that account for potential discrepancies. Technologies such as Golang with the Terratest library and GitHub Actions help to establish a reliable Continuous Integration (CI) process for e2e testing in Terraform.

In this blog post, we’ll explore what Terraform is, how it can be used, and how e2e-testing in Terraform can help teams make their Terraform scripts more reliable and reproducible.

What is Terraform?

Terraform is a powerful open-source tool that automates infrastructure management by defining infrastructure as code. It enables teams to create and manage infrastructure in a repeatable and consistent manner, eliminating the need for manual configuration.

By using Terraform, teams can specify the desired state of their infrastructure using code, and then Terraform applies that code to provision, modify, and delete resources in the cloud. This allows teams to manage their infrastructure more efficiently and at scale, reducing errors and saving time.

Terraform’s support for multiple cloud providers, as well as for other platforms such as Kubernetes and GitHub, is a significant advantage, as it allows teams to manage infrastructure across various environments using a single tool. With Terraform, teams can create and manage resources on Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), and many others. Terraform provides a consistent interface for infrastructure management, meaning teams don’t have to learn different tools and processes for each cloud provider.

Another key benefit of Terraform is that it enables teams to version their infrastructure in Terraform code. By using version control tools like Git, teams can track changes to their infrastructure code over time. This makes it easy to roll back to a previous version if necessary, or identify when and why changes were made, or discover unauthorized changes. This versioning capability significantly improves traditional infrastructure management, which often involves manual processes and ad-hoc changes.

Terraform is a valuable tool for teams that need to manage infrastructure at scale across multiple cloud providers. It simplifies infrastructure management, improves consistency and reliability, and enables teams to version their infrastructure code.

How can you leverage Terraform?

You can utilize Terraform in various ways depending on the specific use case and the organization’s needs. Here are some of the most common ways Terraform is used:

Infrastructure provisioning: Terraform allows teams to provision infrastructure on cloud providers such as AWS, Azure, GCP, and others using a declarative language. This means that teams can define the desired state of their infrastructure using code, and Terraform will create and configure the resources necessary to achieve that state.
Infrastructure automation: Terraform can be employed to automate the configuration of infrastructure, eliminating the need for manual intervention. For example, teams can use Terraform to create and configure virtual machines, Kubernetes clusters, databases, and load balancers automatically.
Infrastructure management: Terraform enables teams to manage their infrastructure as code, which means that infrastructure changes can be versioned, tested, and deployed in a consistent and repeatable way. This improves visibility and control over infrastructure and makes it easier to manage infrastructure at scale.
Continuous integration and delivery (CI/CD): Terraform can be integrated with CI/CD pipelines to automate the deployment of infrastructure changes. This allows teams to test and deploy infrastructure changes quickly and consistently, reducing the risk of errors and downtime. Additionally, Terraform can prepare an execution plan (terraform plan) that a reviewer must approve before any change is applied.
Infrastructure as a service (IaaS): Terraform can be used to provide infrastructure as a service to other teams or departments within an organization. This allows teams to create and manage the infrastructure patterns that others can easily consume, making collaborating and sharing resources easier. This way of sharing infrastructure patterns ensures that teams consistently deploy infrastructure in line with security and governance policies.

Terraform can be operated in various scenarios, from small-scale infrastructure provisioning to large-scale infrastructure management and automation. Its flexibility and support for multiple cloud providers and other internet services make it a popular choice for teams looking to manage infrastructure consistently and repeatably. It’s worth noting that Terraform locks the state while a change is in progress (when the storage backend supports locking), which prevents simultaneous changes that would render the infrastructure inconsistent. For this to work across a team, the infrastructure state file is kept in remote storage.

Why is it essential to make Terraform scripts reproducible and reliable?

Reproducible and reliable Terraform scripts ensure that infrastructure changes are consistent and predictable and reduce the risk of errors and downtime. Here are some of the key reasons why reproducibility and reliability are critical in Terraform:

Consistency: By making Terraform scripts reproducible and reliable, teams can ensure that infrastructure changes are consistent across environments. This means that changes made in a development environment can be replicated in a production environment without the risk of unexpected behavior or errors.
Predictability: When Terraform scripts are reproducible and reliable, teams can predict how infrastructure changes will affect their applications and services. This means that changes can be tested and validated before they are deployed, reducing the risk of downtime and errors.
Automation: Terraform is often used to automate infrastructure management, meaning changes can be made automatically without human intervention. By making Terraform scripts reproducible and reliable, teams can ensure that these automated changes are consistent and predictable, reducing the risk of errors and downtime.
Rollbacks: Reproducible and reliable Terraform scripts make it easier to roll back infrastructure changes if necessary. If an error occurs during an update, teams can quickly revert to a previous version of the infrastructure code, reducing the risk of downtime and data loss.
Collaboration: When Terraform scripts are reproducible and reliable, they are easier to collaborate on. Multiple teams can work on the same codebase, and changes can be reviewed and tested before merging into the main branch. This improves the quality of the infrastructure code and reduces the risk of errors and conflicts.

One way to make your Terraform scripts reproducible and reliable is by using end-to-end (e2e) testing.
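
One simple reproducibility check that such a test can include is to run a plan right after a successful apply: if the scripts are reproducible, Terraform should report nothing left to change. The sketch below is a minimal illustration in Go, assuming the terraform binary is on the PATH; the helper names planResult and hasDrift are made up for this example. It relies on the -detailed-exitcode flag of terraform plan, which returns 0 when there are no changes, 1 on error, and 2 when changes are pending.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// planResult interprets the exit code of `terraform plan -detailed-exitcode`:
// 0 = no changes, 1 = error, 2 = changes pending.
func planResult(exitCode int) (changesPending bool, err error) {
	switch exitCode {
	case 0:
		return false, nil
	case 2:
		return true, nil
	default:
		return false, fmt.Errorf("terraform plan failed with exit code %d", exitCode)
	}
}

// hasDrift (hypothetical helper) runs the plan in dir and reports whether
// Terraform still wants to change something after an apply.
func hasDrift(dir string) (bool, error) {
	cmd := exec.Command("terraform", "plan", "-detailed-exitcode", "-input=false")
	cmd.Dir = dir
	err := cmd.Run()
	if err == nil {
		return planResult(0)
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return planResult(exitErr.ExitCode())
	}
	return false, err // e.g. the terraform binary could not be started
}

func main() {
	// Demonstrates only the interpretation step; hasDrift needs a real
	// Terraform working directory.
	for _, code := range []int{0, 2, 1} {
		pending, err := planResult(code)
		fmt.Printf("exit code %d: changes pending=%v, err=%v\n", code, pending, err)
	}
}
```

A test could call hasDrift after applying and fail when it returns true, which would point to a resource that Terraform recreates or modifies on every run.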

What is end-to-end testing?

End-to-end testing (e2e testing) is a software testing technique that involves testing an entire application from start to finish, ensuring that all the components of the application work together as expected. This type of testing is designed to simulate a real user scenario to ensure that the application works as intended in the production environment.

End-to-end testing involves testing all the application layers, including the user interface, application logic, and database. It tests the entire flow of the application, from the user input to the database and back to the user output, to ensure that all the components are integrated and working as expected.

It is typically performed using automated testing tools that simulate real user scenarios, such as filling out a form, submitting data, and verifying the results. The tests can be run on different browsers and operating systems to ensure the application works as expected on all platforms.

End-to-end testing is a crucial part of the software development lifecycle, as it helps to identify issues and defects that other types of testing, such as unit testing or integration testing, might miss.

E2e testing in Terraform: What are the benefits?

Overall, end-to-end testing in Terraform is an essential practice for ensuring that infrastructure changes are reliable and perform as expected, which is crucial for maintaining the overall stability and availability of the system.

There are several benefits of end-to-end (e2e) testing in Terraform, including:

Validation of the entire infrastructure stack, from the creation of resources to their configuration and deletion. This ensures that the Terraform code works as expected and helps identify any issues or defects before deployment to the production environment.
Increased confidence in infrastructure changes, because the tests show that the changes don’t cause any unexpected issues or break the existing functionality. This helps to reduce the risk of downtime or service interruptions caused by faulty changes.
Time and cost reduction by catching issues early in the development process when they are easier and cheaper to fix. This helps to reduce the risk of costly rework and delays caused by faulty changes.

Benefits of e2e Testing for Terraform Scripts	With e2e Testing	Without e2e Testing
Ensure infrastructure configuration correctness	✓	✗
Validate integration with other components	✓	✗
Catch configuration errors early	✓	✗
Streamline troubleshooting and debugging	✓	✗
Reduce risks of unexpected failures	✓	✗
Increase confidence in the deployment process	✓	✗
Faster development and deployment cycle	✗	✓ (not counting bugs and the time needed to fix them)
Lower initial time and resource investment	✗	✓

Without e2e testing, the time and resources needed to set up, maintain, and execute tests can be reduced. However, it’s essential to consider the trade-offs, as skipping e2e testing might lead to grave issues that could have been detected and resolved during the testing phase. Although a faster development and deployment cycle and a lower initial investment of time and resources are real advantages, they leave your company exposed to critical bugs, losses, and the time and money spent on fixing them.

How to implement e2e testing in Terraform?

End-to-end (e2e) testing of Terraform infrastructure code can be implemented using a combination of Terraform, testing frameworks, and CI/CD pipelines. Here are the steps involved in implementing e2e testing in Terraform:

Define e2e test scenarios: Define the test scenarios you want to run to validate your Terraform infrastructure code. These scenarios should test the entire infrastructure stack from start to finish, including the creation, updating, and deletion of resources.
Use testing frameworks: Use a testing framework like Terratest or Kitchen-Terraform to define and run the e2e test scenarios. These frameworks provide functions and helpers that enable you to create and manage test infrastructure and run the tests against your Terraform code.
Set up a test environment: Set up a separate test environment that mirrors your production environment. This environment should include all the necessary resources and dependencies for running the e2e tests.
Run e2e tests: Run the e2e tests. The tests should create and configure the resources defined in the Terraform code according to the test scenario.
Monitor and analyze test results: Monitor the test results to ensure that the tests pass consistently. Analyze the test results to identify any issues or failures and make necessary changes to the Terraform code.

Implementing e2e testing in Terraform helps to validate the entire infrastructure stack, from the creation of resources to their configuration and deletion. This ensures that the infrastructure code works as expected and helps identify any issues or defects before deployment to the production environment.

E2e tests can be integrated into CI/CD and run automatically as part of the pipeline. This ensures that any changes to the infrastructure code are tested before being deployed to the production environment.

Why is destroying the e2e testing infrastructure in Terraform important?

The purpose of e2e testing is to ensure that the infrastructure deployment process is working correctly and that the desired infrastructure is being deployed as expected. However, test infrastructure that is left running accumulates and increases costs. Therefore, it is vital to destroy the infrastructure after you’re done. This can be done by running the terraform destroy command to remove all the resources created during the testing process.

Destroying the infrastructure after e2e testing is an important step within the whole process. This is true for a few reasons:

Testing clean-up: Destroying the resources once they are no longer needed verifies that deletion is a smooth process and does not leave residual resources, such as disks or virtual networks, that may be created as a side effect of creating a more abstract infrastructure.
Cost Savings: Infrastructure resources such as cloud instances, databases, and storage can be expensive, especially when left running for an extended period. Destroying the infrastructure after testing helps to save costs and avoid unnecessary charges.
Consistency: Destroying the infrastructure after testing ensures that each test run starts from a clean state. This helps to ensure consistency between test runs and reduces the risk of false positives or false negatives in test results.
Security: Destroying the infrastructure after testing reduces the risk of security breaches or data leaks that arises when infrastructure is left running and exposed to potential vulnerabilities. But keep in mind that production data shouldn’t be used for test purposes in the first place.
Resource Clean-up: Destroying the infrastructure after testing helps to clean up any resources that were created during the testing process. This ensures that the testing environment is clean and free from any unnecessary resources that could impact future tests or deployments.

Example: E2E testing in Terraform of a public exposition of an endpoint

In this example, we test that a virtual machine exposes an endpoint publicly. Our assessment covers the creation of the virtual machine itself, the establishment of a public IP address, the binding of the IP to the virtual machine, and the serving of HTTP requests.

Additionally, we check that the site hosted on the virtual machine responds as expected. We explore the functionality of passing variables to the module, as well as the seamless creation and teardown of the infrastructure.

We present the code in one piece rather than breaking it down, so that it is easier to copy if you wish. We will:

Create a virtual machine
Create a public IP, which will be bound to the virtual machine
Serve HTTP requests on the virtual machine
Test if we get the expected outcome
Test if the passing of variables to the module works
Ensure smooth creation and removal of the infrastructure without encountering any issues

Let’s have a closer look at the Terraform code:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.50.0"
    }
  }
}

provider "azurerm" {
  features {}
}

variable "prefix" {
  default = "virtuslab"
}

# Port on which the VM serves HTTP; the e2e test overrides it.
variable "port" {
  default = 8080
}

locals {
  # Startup script passed to the VM as custom data.
  init_script = <<EOF
#!/bin/bash
echo "Hello, World!" > index.html
nohup busybox httpd -f -p ${var.port} &
EOF
}

resource "azurerm_resource_group" "rg_vm" {
  name     = "${var.prefix}-resources"
  location = "West Europe"
}

resource "azurerm_virtual_network" "vn_main" {
  name                = "${var.prefix}-network"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.rg_vm.location
  resource_group_name = azurerm_resource_group.rg_vm.name
}

resource "azurerm_subnet" "vns_internal" {
  name                 = "internal"
  resource_group_name  = azurerm_resource_group.rg_vm.name
  virtual_network_name = azurerm_virtual_network.vn_main.name
  address_prefixes     = ["10.0.2.0/24"]
}

resource "azurerm_public_ip" "pip_vm" {
  name                = "myPublicIP"
  location            = azurerm_resource_group.rg_vm.location
  resource_group_name = azurerm_resource_group.rg_vm.name
  allocation_method   = "Dynamic"
}

resource "azurerm_network_interface" "interface_vm" {
  name                = "${var.prefix}-nic"
  location            = azurerm_resource_group.rg_vm.location
  resource_group_name = azurerm_resource_group.rg_vm.name

  ip_configuration {
    name                          = "testconfiguration1"
    subnet_id                     = azurerm_subnet.vns_internal.id
    private_ip_address_allocation = "Dynamic"
    public_ip_address_id          = azurerm_public_ip.pip_vm.id
  }
}

resource "azurerm_linux_virtual_machine" "linux_vm" {
  name                            = "example-machine"
  resource_group_name             = azurerm_resource_group.rg_vm.name
  location                        = azurerm_resource_group.rg_vm.location
  size                            = "Standard_DS1_v2"
  disable_password_authentication = false
  admin_username                  = "adminuser"
  admin_password                  = "Password1234!" # For test purposes only!
  network_interface_ids           = [azurerm_network_interface.interface_vm.id]
  custom_data                     = base64encode(local.init_script)

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "UbuntuServer"
    sku       = "16.04-LTS"
    version   = "latest"
  }
}

output "public_ip" {
  value = azurerm_linux_virtual_machine.linux_vm.public_ip_address
}

# Read back by the e2e test to build the URL.
output "port" {
  value = var.port
}

Now let’s test it in Golang using Terratest:

package test

import (
	"fmt"
	"testing"
	"time"

	http_helper "github.com/gruntwork-io/terratest/modules/http-helper"
	"github.com/gruntwork-io/terratest/modules/terraform"
)

func TestTerraformHelloWorldVm(t *testing.T) {
	t.Parallel()

	// Variables passed to the Terraform module via -var
	vars := map[string]interface{}{
		"port":   9090,
		"prefix": "test",
	}
	terraformOptions := &terraform.Options{
		Vars: vars,
	}

	// Clean up after the test, even if an assertion fails
	defer terraform.Destroy(t, terraformOptions)

	// Apply infrastructure
	terraform.InitAndApply(t, terraformOptions)

	// Gather data about infrastructure
	publicIp := terraform.Output(t, terraformOptions, "public_ip")
	port := terraform.Output(t, terraformOptions, "port")

	// Check that the endpoint answers with HTTP 200 and the expected body,
	// retrying 10 times with 5 seconds between attempts
	url := fmt.Sprintf("http://%s:%s", publicIp, port)
	http_helper.HttpGetWithRetry(t, url, nil, 200, "Hello, World!", 10, 5*time.Second)
}

And finally, let’s run the test and look at the output:

$ go test -v -run TestTerraformHelloWorldVm -timeout 30m
=== RUN   TestTerraformHelloWorldVm
Running command terraform with args [init]
Initializing provider plugins...
[...]
Terraform has been successfully initialized!
[...]
Running command terraform with args [apply -input=false -auto-approve -var port=9090 -var prefix=test -lock=false]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # azurerm_linux_virtual_machine.linux_vm will be created
  + resource "azurerm_linux_virtual_machine" "linux_vm" {
      + admin_password = (sensitive value)
      + admin_username = "adminuser"
[...]
Apply complete! Resources: 6 added, 0 changed, 0 destroyed.
Outputs:
port = "9090"
public_ip = "108.143.19.96"
[...]
terraform [output -no-color -json public_ip]
Running command terraform with args [output -no-color -json public_ip]
"108.143.19.96"
terraform [output -no-color -json port]
Running command terraform with args [output -no-color -json port]
"9090"
HTTP GET to URL http://108.143.19.96:9090
Making an HTTP GET call to URL http://108.143.19.96:9090
[...]
Running command terraform with args [destroy -auto-approve -input=false -var port=9090 -var prefix=test -lock=false]
[...]
Destroy complete! Resources: 6 destroyed.
--- PASS: TestTerraformHelloWorldVm (264.28s)
PASS
ok  	github.com/VirtusLab/e2e-example    	264.294s

Throughout the evaluation, we had no issues, suggesting a smooth and efficient operation. We have:

Created a virtual machine
Created a public IP
Bound the IP to the virtual machine
Served HTTP requests on the virtual machine
Received the expected response from the site hosted on the virtual machine
Determined that the passing of variables to the module works
Created and tore down the infrastructure without any issues
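
One refinement to consider when several test runs share a subscription: the module derives the resource group name from the prefix variable, so two runs that use the same prefix at the same time would compete for the same name. A common remedy is to generate a unique prefix per run. Terratest ships a helper for this in its random module; the sketch below shows the same idea with the standard library only, and uniquePrefix is a hypothetical name.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// uniquePrefix appends a short random hex suffix to base so that
// concurrent test runs do not share resource names.
func uniquePrefix(base string) (string, error) {
	buf := make([]byte, 3) // 3 random bytes give 6 hex characters
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base + "-" + hex.EncodeToString(buf), nil
}

func main() {
	prefix, err := uniquePrefix("e2e")
	if err != nil {
		panic(err)
	}
	// This value would be passed as the "prefix" variable of the module.
	fmt.Println(prefix)
}
```

Six hex characters keep the names short while making a collision between two runs unlikely; a longer suffix can be used if many runs execute in parallel.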

Conclusion

It is essential to make Terraform scripts reproducible and reliable to ensure that infrastructure changes are deployed consistently and predictably. By implementing e2e testing in Terraform, you can validate the entire infrastructure stack, from the creation of resources to their configuration and deletion, and ensure that changes don’t cause any unexpected issues or break the existing functionality.

Combined with version-controlled infrastructure code, e2e testing facilitates collaboration between teams, saves time, reduces costs, and ultimately helps to maintain the overall stability and availability of the system. By combining Terraform’s infrastructure-as-code approach with e2e testing, organizations can achieve greater consistency and reliability in their infrastructure deployments, which is essential for supporting the demands of modern, cloud-based architectures.

Curated by Sebastian Synowiec
