E2e infrastructure testing in Terraform: How to make scripts reproducible and reliable

Maciek Gołaszewski

Cloud Engineer

13 minutes read

Terraform helps to eliminate manual processes and human error and to speed up deployments. It’s no surprise that it has gained significant popularity among DevOps and cloud infrastructure teams. However, as with any automation tool, Terraform is not foolproof. If used incorrectly, Terraform scripts can become unreliable and difficult to reproduce. This is where e2e testing in Terraform comes in. By testing Terraform scripts in a way that simulates a real-world environment, teams can ensure that their deployments are reliable and reproducible.

Reliable e2e testing demands a robust testing architecture with well-crafted scenarios that account for potential discrepancies. Technologies such as Golang with the Terratest library and GitHub Actions help to establish a reliable Continuous Integration (CI) process for e2e testing in Terraform.

In this blog post, we’ll explore what Terraform is, how it can be used, and how e2e-testing in Terraform can help teams make their Terraform scripts more reliable and reproducible.

What is Terraform?

Terraform is a powerful open-source tool that automates infrastructure management by defining infrastructure as code. It enables teams to create and manage infrastructure in a repeatable and consistent manner, eliminating the need for manual configuration.

By using Terraform, teams can specify the desired state of their infrastructure using code, and then Terraform applies that code to provision, modify, and delete resources in the cloud. This allows teams to manage their infrastructure more efficiently and at scale, reducing errors and saving time.

Terraform’s support for multiple cloud providers, as well as for other platforms such as Kubernetes and GitHub, is a significant advantage, as it allows teams to manage infrastructure across various environments using a single tool. With Terraform, teams can create and manage resources on Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), and many others. Terraform provides a consistent interface for infrastructure management, meaning teams don’t have to learn different tools and processes for each cloud provider.

Another key benefit of Terraform is that it enables teams to version their infrastructure as Terraform code. By using version control tools like Git, teams can track changes to their infrastructure code over time. This makes it easy to roll back to a previous version if necessary, identify when and why changes were made, and discover unauthorized changes. This versioning capability is a significant improvement over traditional infrastructure management, which often involves manual processes and ad-hoc changes.

Terraform is a valuable tool for teams that need to manage infrastructure at scale across multiple cloud providers. It simplifies infrastructure management, improves consistency and reliability, and enables teams to version their infrastructure code.

How can you leverage Terraform?

You can utilize Terraform in various ways depending on the specific use case and the organization’s needs. Here are some of the most common ways Terraform is used:

  1. Infrastructure provisioning: Terraform allows teams to provision infrastructure on cloud providers such as AWS, Azure, GCP, and others using a declarative language. This means that teams can define the desired state of their infrastructure using code, and Terraform will create and configure the resources necessary to achieve that state.
     
  2. Infrastructure automation: Terraform can be employed to automate the configuration of infrastructure, eliminating the need for manual intervention. For example, teams can use Terraform to create and configure virtual machines, Kubernetes clusters, databases, and load balancers automatically.
     
  3. Infrastructure management: Terraform enables teams to manage their infrastructure as code, which means that infrastructure changes can be versioned, tested, and deployed in a consistent and repeatable way. This improves visibility and control over infrastructure and makes it easier to manage infrastructure at scale.
     
  4. Continuous integration and delivery (CI/CD): Terraform can be integrated with CI/CD pipelines to automate the deployment of infrastructure changes. This allows teams to test and deploy infrastructure changes quickly and consistently, reducing the risk of errors and downtime. Additionally, Terraform can prepare a plan of the change, which a reviewer can approve before anything is applied.
     
  5. Infrastructure as a service (IaaS): Terraform can be used to provide infrastructure as a service to other teams or departments within an organization. This allows teams to create and manage the infrastructure patterns that others can easily consume, making collaborating and sharing resources easier. This way of sharing infrastructure patterns ensures that teams consistently deploy infrastructure in line with security and governance policies.

Terraform can be operated in various scenarios, from small-scale infrastructure provisioning to large-scale infrastructure management and automation. Its flexibility and support for multiple cloud providers and other internet services make it a popular choice for teams looking to manage infrastructure consistently and repeatably. It’s worth noting that Terraform can lock the state to prevent several changes from being applied at the same time, which would otherwise render the infrastructure inconsistent. For this to work across a team, the infrastructure state file is typically kept in remote storage that supports locking.

Why is it essential to make Terraform scripts reproducible and reliable?

Reproducible and reliable Terraform scripts ensure that infrastructure changes are consistent and predictable and reduce the risk of errors and downtime. Here are some of the key reasons why reproducibility and reliability are critical in Terraform:

  1. Consistency: By making Terraform scripts reproducible and reliable, teams can ensure that infrastructure changes are consistent across environments. This means that changes made in a development environment can be replicated in a production environment without the risk of unexpected behavior or errors.
     
  2. Predictability: When Terraform scripts are reproducible and reliable, teams can predict how infrastructure changes will affect their applications and services. This means that changes can be tested and validated before they are deployed, reducing the risk of downtime and errors.
     
  3. Automation: Terraform is often used to automate infrastructure management, meaning changes can be made automatically without human intervention. By making Terraform scripts reproducible and reliable, teams can ensure that these automated changes are consistent and predictable, reducing the risk of errors and downtime.
     
  4. Rollbacks: Reproducible and reliable Terraform scripts make it easier to roll back infrastructure changes if necessary. If an error occurs during an update, teams can quickly revert to a previous version of the infrastructure code, reducing the risk of downtime and data loss.
     
  5. Collaboration: When Terraform scripts are reproducible and reliable, they are easier to collaborate on. Multiple teams can work on the same codebase, and changes can be reviewed and tested before merging into the main branch. This improves the quality of the infrastructure code and reduces the risk of errors and conflicts.

One way to make your Terraform scripts reproducible and reliable is by using end-to-end (e2e) testing.

What is end-to-end testing?

End-to-end testing (e2e testing) is a software testing technique that involves testing an entire application from start to finish, ensuring that all the components of the application work together as expected. This type of testing is designed to simulate a real user scenario to ensure that the application works as intended in the production environment.

End-to-end testing involves testing all the application layers, including the user interface, application logic, and database. It tests the entire flow of the application, from the user input to the database and back to the user output, to ensure that all the components are integrated and working as expected.

It is typically performed using automated testing tools that simulate real user scenarios, such as filling out a form, submitting data, and verifying the results. The tests can be run on different browsers and operating systems to ensure the application works as expected on all platforms.

End-to-end testing is a crucial part of the software development lifecycle, as it helps to identify issues and defects that other types of testing, such as unit testing or integration testing, might miss.

E2e testing in Terraform: What are the benefits?

Overall, end-to-end testing in Terraform is an essential practice for ensuring that infrastructure changes are reliable and perform as expected, which is crucial for maintaining the overall stability and availability of the system.

There are several benefits of end-to-end (e2e) testing in Terraform, including:

  1. Validation of the entire infrastructure stack, from the creation of resources to their configuration and deletion. This ensures that the Terraform code works as expected and helps identify any issues or defects before deployment to the production environment.
     
  2. Increased confidence in infrastructure changes, gained by ensuring that the changes don’t cause any unexpected issues or break the existing functionality. This helps to reduce the risk of downtime or service interruptions caused by faulty changes.
     
  3. Time and cost reduction by catching issues early in the development process when they are easier and cheaper to fix. This helps to reduce the risk of costly rework and delays caused by faulty changes.

Benefits of e2e testing for Terraform scripts

With e2e testing:

  • Ensure infrastructure configuration correctness
  • Validate integration with other components
  • Catch configuration errors early
  • Streamline troubleshooting and debugging
  • Reduce risks of unexpected failures
  • Increase confidence in the deployment process

Without e2e testing:

  • Faster development and deployment cycle (not counting bugs and fixes for them)
  • Lower initial time and resource investment

Without e2e testing, you save the time and resources needed to set up, maintain, and execute tests. However, it’s essential to consider the trade-offs, as skipping e2e testing might lead to serious issues that could have been detected and resolved during the testing phase. Although the faster development and deployment cycle and the lower time and resource investment are real, they leave your company exposed to critical bugs and losses, and to the time and money spent on fixing them.

How to implement e2e testing in Terraform?

End-to-end (e2e) testing of Terraform infrastructure code can be implemented using a combination of Terraform, testing frameworks, and CI/CD pipelines. Here are the steps involved in implementing e2e testing in Terraform:

  1. Define e2e test scenarios: Define the test scenarios you want to run to validate your Terraform infrastructure code. These scenarios should test the entire infrastructure stack from start to finish, including the creation, updating, and deletion of resources.
     
  2. Use testing frameworks: Use a testing framework like Terratest or Kitchen-Terraform to define and run the e2e test scenarios. These frameworks provide functions and helpers that enable you to create and manage test infrastructure and run the tests against your Terraform code.
     
  3. Set up a test environment: Set up a separate test environment that mirrors your production environment. This environment should include all the necessary resources and dependencies for running the e2e tests.
     
  4. Run e2e tests: Run the e2e tests. The tests should create and configure the resources defined in the Terraform code according to the test scenario.
     
  5. Monitor and analyze test results: Monitor the test results to ensure that the tests pass consistently. Analyze the test results to identify any issues or failures and make necessary changes to the Terraform code.
     

Implementing e2e testing in Terraform helps to validate the entire infrastructure stack, from the creation of resources to their configuration and deletion. This ensures that the infrastructure code works as expected and helps identify any issues or defects before deployment to the production environment.

E2e tests can be integrated into CI/CD and run automatically as part of the pipeline. This ensures that any changes to the infrastructure code are tested before being deployed to the production environment.

Why is destroying the e2e testing infrastructure in Terraform important?

The purpose of e2e testing is to ensure that the infrastructure deployment process is working correctly and that the desired infrastructure is being deployed as expected. However, every test run creates real resources, and leaving them in place lets resources and data accumulate and increases costs. Therefore, it is vital to destroy the infrastructure after you’re done. This can be done by running the terraform destroy command to remove all the resources created during the testing process.

Destroying the infrastructure after e2e testing is an important step within the whole process. This is true for a few reasons:

  • Testing clean-up: Destroying resources once they are no longer needed verifies that deletion itself is a smooth process and does not leave residual resources, such as disks or virtual networks, that may have been created as a side effect of creating a more abstract infrastructure.
  • Cost Savings: Infrastructure resources such as cloud instances, databases, and storage can be expensive, especially when left running for an extended period. Destroying the infrastructure after testing helps to save costs and avoid unnecessary charges.
     
  • Consistency: Destroying the infrastructure after testing ensures that each test run starts from a clean state. This helps to ensure consistency between test runs and reduces the risk of false positives or false negatives in test results.
     
  • Security: Destroying the infrastructure after testing reduces the risk of security breaches or data leaks that could occur if the infrastructure were left running and exposed to potential vulnerabilities. But keep in mind that production data shouldn’t be used for test purposes in the first place.
     
  • Resource Clean-up: Destroying the infrastructure after testing helps to clean up any resources that were created during the testing process. This ensures that the testing environment is clean and free from any unnecessary resources that could impact future tests or deployments.
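
Starting from a clean state is easier when every test run uses its own resource names, so that a leftover from a failed run or a parallel run cannot collide with the current one. Below is a minimal sketch of generating such a name with the Go standard library. The uniquePrefix function and the naming scheme are assumptions made for illustration; Terratest’s random module offers a comparable helper.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// uniquePrefix returns base followed by a short random suffix.
// The naming scheme is an assumption for illustration only.
func uniquePrefix(base string) (string, error) {
	buf := make([]byte, 3) // 3 random bytes become 6 hex characters
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return fmt.Sprintf("%s-%s", base, hex.EncodeToString(buf)), nil
}

func main() {
	prefix, err := uniquePrefix("e2e")
	if err != nil {
		panic(err)
	}
	// The value could be passed to a module as its name prefix variable.
	fmt.Println(prefix)
}
```

A value like this could be passed as the prefix variable of the module used in the example later in this article, so that each run gets its own resource group.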

Example: E2E testing in Terraform of a publicly exposed endpoint

In this example, we test the public exposure of an endpoint by a virtual machine. Our assessment covers the creation of the virtual machine itself, the establishment of a public IP address, the binding of the IP to the virtual machine, and the serving of HTTP requests.

Additionally, we check the response of a site hosted on the virtual machine, ensuring that it meets our expectations. We explore the functionality of passing variables to the module, as well as the seamless creation and teardown of the infrastructure.

We won’t break the code into pieces, so that it is easier for you to copy if you wish. We will:

  • Create a virtual machine
  • Create a public IP, which will be bound to the virtual machine
  • Serve HTTP requests from the virtual machine
  • Test if we get the expected outcome
  • Test if the passing of variables to the module works
  • Ensure smooth creation and removal of the infrastructure without encountering any issues

Let’s have a closer look at the Terraform code:

hcl

terraform {
 required_providers {
   azurerm = {
     source  = "hashicorp/azurerm"
     version = "=3.50.0"
   }
 }
}

provider "azurerm" {
 features {}
}

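locals {
 init_script = <<EOF
#!/bin/bash
echo "Hello, World!" > index.html
nohup busybox httpd -f -p ${var.port} &
EOF
}

variable "prefix" {
 default = "virtuslab"
}

# Port the web server listens on; the test overrides it with -var port=...
variable "port" {
 default = 8080
}

# Echo the port back so that the test can read it next to the public IP.
output "port" {
 value = var.port
}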

resource "azurerm_resource_group" "rg_vm" {
 name     = "${var.prefix}-resources"
 location = "West Europe"
}

resource "azurerm_virtual_network" "vn_main" {
 name                = "${var.prefix}-network"
 address_space       = ["10.0.0.0/16"]
 location            = azurerm_resource_group.rg_vm.location
 resource_group_name = azurerm_resource_group.rg_vm.name
}

resource "azurerm_subnet" "vns_internal" {
 name                 = "internal"
 resource_group_name  = azurerm_resource_group.rg_vm.name
 virtual_network_name = azurerm_virtual_network.vn_main.name
 address_prefixes     = ["10.0.2.0/24"]
}

resource "azurerm_public_ip" "pip_vm" {
 name                = "myPublicIP"
 location            = azurerm_resource_group.rg_vm.location
 resource_group_name = azurerm_resource_group.rg_vm.name
 allocation_method   = "Dynamic"
}

resource "azurerm_network_interface" "interface_vm" {
 name                = "${var.prefix}-nic"
 location            = azurerm_resource_group.rg_vm.location
 resource_group_name = azurerm_resource_group.rg_vm.name

 ip_configuration {
   name                          = "testconfiguration1"
   subnet_id                     = azurerm_subnet.vns_internal.id
   private_ip_address_allocation = "Dynamic"
   public_ip_address_id          = azurerm_public_ip.pip_vm.id
 }
}

resource "azurerm_linux_virtual_machine" "linux_vm" {
 name                = "example-machine"
 resource_group_name = azurerm_resource_group.rg_vm.name
 location            = azurerm_resource_group.rg_vm.location
 size                = "Standard_DS1_v2"
 disable_password_authentication = false
 admin_username      = "adminuser"
 admin_password      = "Password1234!" # For test purposes only; never hardcode real credentials.
 network_interface_ids = [ azurerm_network_interface.interface_vm.id ]
 custom_data = base64encode(local.init_script)

 os_disk {
   caching              = "ReadWrite"
   storage_account_type = "Standard_LRS"
 }
 source_image_reference {
   publisher = "Canonical"
   offer     = "UbuntuServer"
   sku       = "16.04-LTS"
   version   = "latest"
 }
}

output "public_ip" {
 value = azurerm_linux_virtual_machine.linux_vm.public_ip_address
}

Now let’s test it in Golang using Terratest:

go

package test

import (
   "fmt"
   "testing"
   "time"

   http_helper "github.com/gruntwork-io/terratest/modules/http-helper"

   "github.com/gruntwork-io/terratest/modules/terraform"
)

func TestTerraformHelloWorldVm(t *testing.T) {
   t.Parallel()

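   // Set up vars. Both values differ from the module defaults,
   // so the test also verifies that variables are passed through.
   vars := map[string]interface{}{
      "port":   9090,
      "prefix": "test",
   }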
   terraformOptions := &terraform.Options{
      Vars: vars,
   }

   // Cleanup after tests.
   defer terraform.Destroy(t, terraformOptions)

   // Apply infrastructure
   terraform.InitAndApply(t, terraformOptions)

   // Gather data about infrastructure
   publicIp := terraform.Output(t, terraformOptions, "public_ip")
   port := terraform.Output(t, terraformOptions, "port")

   // Check if endpoint is working
   url := fmt.Sprintf("http://%s:%s", publicIp, port)
   http_helper.HttpGetWithRetry(t, url, nil, 200, "Hello, World!", 10, 5*time.Second)
}

And finally, let’s run the test and look at the output:

bash

$ go test -v -run TestTerraformHelloWorldVm -timeout 30m
=== RUN   TestTerraformHelloWorldVm
Running command terraform with args [init]
Initializing provider plugins...
[...]
Terraform has been successfully initialized!
[...]
Running command terraform with args [apply -input=false -auto-approve -var port=9090 -var prefix=test -lock=false]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # azurerm_linux_virtual_machine.linux_vm will be created
  + resource "azurerm_linux_virtual_machine" "linux_vm" {
  	+ admin_password              	= (sensitive value)
  	+ admin_username              	= "adminuser"
[...]
Apply complete! Resources: 6 added, 0 changed, 0 destroyed.
Outputs:
port = "9090"
public_ip = "108.143.19.96"
[...]
terraform [output -no-color -json public_ip]
Running command terraform with args [output -no-color -json public_ip]
"108.143.19.96"
terraform [output -no-color -json port]
Running command terraform with args [output -no-color -json port]
"9090"
HTTP GET to URL http://108.143.19.96:9090
Making an HTTP GET call to URL http://108.143.19.96:9090
[...]
Running command terraform with args [destroy -auto-approve -input=false -var port=9090 -var prefix=test -lock=false]
[...]
Destroy complete! Resources: 6 destroyed.
--- PASS: TestTerraformHelloWorldVm (264.28s)
PASS
ok  	github.com/VirtusLab/e2e-example    	264.294s

Throughout the evaluation, we had no issues, suggesting a smooth and efficient operation. We have:

  • Created a virtual machine
  • Created a public IP
  • Bound the IP to the virtual machine
  • Served HTTP requests from the virtual machine
  • Received the expected response from the site hosted on the virtual machine
  • Determined that the passing of variables to the module works
  • Created and tore down the infrastructure without any issues
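
In the log above, the variables set in the test show up as -var arguments of the terraform command. As a rough illustration of that mapping, and not of how Terratest actually builds its command line, the hypothetical varArgs function below turns a map of simple values into such arguments.

```go
package main

import (
	"fmt"
	"sort"
)

// varArgs turns a map of variables into "-var key=value" argument pairs.
// Keys are sorted so that the resulting command line is deterministic.
// Only simple values (strings, numbers, booleans) are handled here.
func varArgs(vars map[string]interface{}) []string {
	keys := make([]string, 0, len(vars))
	for k := range vars {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	args := make([]string, 0, 2*len(keys))
	for _, k := range keys {
		args = append(args, "-var", fmt.Sprintf("%s=%v", k, vars[k]))
	}
	return args
}

func main() {
	vars := map[string]interface{}{"port": 9090, "prefix": "test"}
	// Prints: [-var port=9090 -var prefix=test]
	fmt.Println(varArgs(vars))
}
```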

Conclusion

It is essential to make Terraform scripts reproducible and reliable to ensure that infrastructure changes are deployed consistently and predictably. By implementing e2e testing in Terraform, you can validate the entire infrastructure stack, from the creation of resources to their configuration and deletion, and ensure that changes don’t cause any unexpected issues or break the existing functionality. 

Combined with version-controlled infrastructure code, e2e testing also facilitates collaboration between teams, saves time, reduces costs, and ultimately helps to maintain the overall stability and availability of the system. By combining Terraform’s infrastructure-as-code approach with e2e testing, organizations can achieve greater consistency and reliability in their infrastructure deployments, which is essential for supporting the demands of modern, cloud-based architectures.

Curated by

Sebastian Synowiec
