



A guide to cloud infrastructure testing. Check out test cases, helpful tips and the difference between end-to-end and conformance tests.
The rapid growth of cloud computing outpaced standardisation, leaving the DevOps approach somewhat in the dark. The "you build it, you run it" approach became the de facto standard in the industry. Unfortunately, it didn't scale well and required specialised skills, and people with that expertise were hard to find and expensive to hire.
According to multiple State of the Cloud surveys, 91% of businesses use the public cloud, 76% of companies are already multi-cloud, and the trend is only upward. Meanwhile, the amount of information we can retain is hitting its limits: emerging technologies, tools, frameworks and new methodologies keep increasing the cognitive load. All of this means a lot of code is being actively produced and maintained, which is why the right infrastructure testing strategy is more crucial today than ever. We believe we are doing everything we can to deliver the best possible standard. But are we?
In this article, we’ll introduce you to infrastructure testing, provide test cases, and offer advice on how to set up your tests to achieve sustainability and resilience in the cloud.
To assure quality, we write tests alongside all of our application code. So why is writing tests for infrastructure not a standard practice? Thinking about edge cases for infrastructure requires not only vast knowledge of how to use it properly, but also an understanding of the underlying problems of the chosen solutions. As the Cloud Team at VirtusLab, we would like to share a few long-term observations and our approach to infrastructure testing.
This should help pave the way, give you some insights to upgrade your infrastructure testing strategy, and show how your organisation can benefit from it.
These are our observations on end-to-end infrastructure testing:
Infrastructure is codified in configuration files, which come in many different formats, such as YAML, JSON, etc. All of them have ways of checking spelling and indentation.
The infrastructure testing process can be as simple as generating the actual files from templates and evaluating the correctness of the outcome in terms of values and formatting, or as complicated as dry-running the configuration files together with their dependencies.
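As a minimal sketch of the simpler end of that spectrum (the template path, variables and expected values below are purely illustrative), a test can render a file from a template and verify both its formatting and the values it carries:

```go
package infratest

import (
	"bytes"
	"testing"
	"text/template"

	"gopkg.in/yaml.v3"
)

// TestRenderedManifest renders a (hypothetical) YAML template and verifies
// that the result is valid YAML and contains the values we passed in.
func TestRenderedManifest(t *testing.T) {
	// Hypothetical template path and input values, for illustration only.
	tmpl := template.Must(template.ParseFiles("manifests/deployment.yaml.tmpl"))

	var rendered bytes.Buffer
	if err := tmpl.Execute(&rendered, map[string]string{"Replicas": "3", "Image": "nginx:1.25"}); err != nil {
		t.Fatalf("rendering failed: %v", err)
	}

	// Formatting check: the output must be parseable YAML.
	var manifest map[string]interface{}
	if err := yaml.Unmarshal(rendered.Bytes(), &manifest); err != nil {
		t.Fatalf("rendered file is not valid YAML: %v", err)
	}

	// Value check: the template must not silently drop or override our input.
	if !bytes.Contains(rendered.Bytes(), []byte("replicas: 3")) {
		t.Errorf("expected rendered manifest to contain 'replicas: 3'")
	}
}
```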
We can choose from a variety of linters and code snippets that take care of different aspects of code quality:
Everything looks good on paper, but things usually start to fail when dependencies on other components come into play. The following two types of tests need to actually create the infrastructure: end-to-end tests and conformance tests. What's the difference between them?
Let’s take a look.
In end-to-end infrastructure testing, we create real components directly via the cloud providers’ API. They are ephemeral in the context of a particular test. We then check if the actual configuration aligns with the desired one and if everything results in the expected behaviour.
The most common test cases consist of checking whether:
On the other hand, applications have specific requirements to conform to in terms of security, availability and connectivity. These are things that can't be tested thoroughly with E2E tests, because to be sure we would have to recreate, 1:1, the environment in which the application runs. In these cases, conformance testing can be used. These tests run inside the runtime environment, which in the context of this article is Kubernetes. Some things look different from inside the cluster; for example, the access policies for resources running inside it differ from those for users and other managed identities. Moreover, with conformance testing we can mock failures of dependent resources and check how effective our disaster recovery tactics are.
Let’s take a quick look at the differences between end-to-end and conformance tests below:
| | End-to-end infrastructure testing | Conformance testing |
|---|---|---|
| Location | Infrastructure is created in the cloud, and tests are executed outside the Kubernetes cluster. | Infrastructure is created in the cloud, and tests are run in the Kubernetes cluster, creating Kubernetes-specific resources. |
| Scope | Check standalone resources, their configurations and connectivity, outside the Kubernetes cluster scope. Create, update and delete actions on resources. | Cluster-specific tests. Check whether workloads have the applied policies, access rights, connectivity to external resources (even in separate workspaces, such as other cloud environments), role assignments, DNS resolution, security, and stability in case of disruptions. |
| Use cases | Run to test whether the infrastructure is created correctly. | Run after creation or update of the infrastructure to check whether it was created/updated correctly, or periodically to check its health or its behaviour in case of disruptions. |
| State after testing | Infrastructure is destroyed at the end of the test. | Infrastructure can remain undisrupted or disrupted. For periodic checks, only non-disruptive tests should be run, and only the additional resources created specifically for testing purposes are deleted. |
E2E infrastructure testing can be divided into given-when-then parts, like any other test. The only difference is what comprises these parts. The illustration below gives an overview.
Our team at VirtusLab has created a custom E2E framework, written in Go, to encapsulate the test logic for Terraform modules. It uses two modes:
In each of these modes, resource templates are rendered from the actual module directories, and we can pass the necessary configuration variables to test specific use cases. All the rendered Terraform files are then applied to a real cloud environment by running the init, plan and apply steps. By driving everything from code, using libraries like Terratest and wrapping them in additional logic, we get a custom library in which all the necessary variables, including the runtime values from Terraform outputs, are available in code and ready to use. We can easily use them to call actions on the resources and compare the desired configuration with the actual remote state in the cloud. An additional advantage is that we can make sure the providers and the cloud API don't inject any default values that would break our desired outcome.
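Our framework itself is internal, but a minimal sketch of that flow built directly on Terratest could look like the snippet below; the module path, variable names and output name are assumptions for illustration only:

```go
package e2e

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestStorageAccountModule(t *testing.T) {
	// Variables for this specific test case (path, names and values are illustrative).
	opts := &terraform.Options{
		TerraformDir: "../modules/storage-account",
		Vars: map[string]interface{}{
			"name":     "e2eteststorage",
			"location": "westeurope",
		},
	}

	// The infrastructure is ephemeral: destroy it at the end of the test.
	defer terraform.Destroy(t, opts)

	// Run `terraform init` and `terraform apply` against the real cloud environment.
	terraform.InitAndApply(t, opts)

	// Runtime values from Terraform outputs are available straight away in code
	// and can be fed into calls against the cloud API to compare the desired
	// configuration with the actual remote state.
	accountID := terraform.Output(t, opts, "storage_account_id")
	assert.NotEmpty(t, accountID)
	assert.Contains(t, accountID, "/resourceGroups/")
}
```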
Infrastructure testing scenarios differ for each resource. We can check network connectivity over different protocols, the success of CRUD operations, the existence of secondary resources and configurations, and so on. Because creating and updating infrastructure and changing its configuration all need time to propagate, resource and status polling is a must.
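For the polling part, Terratest's retry helper is one convenient option. The sketch below, with a plain DNS lookup standing in for an arbitrary readiness check, shows the pattern:

```go
package e2e

import (
	"fmt"
	"net"
	"testing"
	"time"

	"github.com/gruntwork-io/terratest/modules/retry"
)

// waitForDNSRecord polls until a freshly created record resolves, because DNS
// (like most cloud configuration) needs time to propagate after creation.
func waitForDNSRecord(t *testing.T, hostname string) {
	retry.DoWithRetry(t, fmt.Sprintf("resolve %s", hostname), 30, 10*time.Second, func() (string, error) {
		addrs, err := net.LookupHost(hostname)
		if err != nil {
			return "", err // not resolvable yet, retry after the configured interval
		}
		return fmt.Sprintf("resolved to %v", addrs), nil
	})
}
```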
The simplest example can be creating a Key Vault.
Let’s use the given-when-then diagram from above for better understanding.
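Mapping that given-when-then structure onto code, a sketch of such a test might look like this; the module path, variables and output name are illustrative assumptions:

```go
package e2e

import (
	"strings"
	"testing"

	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestKeyVaultModule(t *testing.T) {
	// Given: the Key Vault module rendered with the variables for this test case.
	opts := &terraform.Options{
		TerraformDir: "../modules/key-vault", // illustrative path
		Vars: map[string]interface{}{
			"name":     "e2e-test-kv",
			"location": "westeurope",
		},
	}
	defer terraform.Destroy(t, opts) // keep the infrastructure ephemeral

	// When: the module is applied to the real cloud environment.
	terraform.InitAndApply(t, opts)

	// Then: the Key Vault exists and its configuration matches the desired one.
	vaultURI := terraform.Output(t, opts, "key_vault_uri")
	assert.True(t, strings.HasSuffix(strings.TrimSuffix(vaultURI, "/"), ".vault.azure.net"),
		"expected a Key Vault URI, got %s", vaultURI)
}
```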
In addition to the end-to-end tests, conformance tests are run to check the infrastructure's overall health and to troubleshoot problematic features at runtime. They usually run in Kubernetes, in isolation, in dedicated namespaces with test resources. Such tests must also be non-disruptive to the existing infrastructure.
The most popular tool for such tests is VMware Tanzu's Sonobuoy.
As Sonobuoy is designed to run in-cluster tests, we extend its use case to running a variety of different test types, such as in-cluster connectivity, authorisation and lifecycle management tests. This gives us a nice baseline for future customisation.
In the case of extended conformance tests, the infrastructure already exists. So we only need to either try to create a working connection to a resource and trigger some logic, or create a secondary resource that does it for us. As with the E2E tests, this can be nicely divided into phases:
To make this more concrete, below is an example of a test case that checks the ability to pull images from Azure Container Registry into an Azure Kubernetes Service cluster. We can divide this simple test into setup, assessment and teardown steps:
```go
// The test is built on the Kubernetes e2e-framework (sigs.k8s.io/e2e-framework).
// createNSForTest, deleteNSForTest, buildAzureResourceNameWithoutHyphens, podName,
// contextNamespaceNameKey and testEnvironment are helpers defined elsewhere in our test suite.
func TestJobPullFromACR(t *testing.T) {
    f := features.New("Pull from ACR").
        // Setup (given): create a dedicated namespace and schedule a pod that uses an image hosted in ACR.
        Setup(func(ctx context.Context, t *testing.T, cfg *envconf.Config) context.Context {
            var err error
            ctx, err = createNSForTest(ctx, cfg, t)
            require.NoError(t, err)
            namespace := fmt.Sprint(ctx.Value(contextNamespaceNameKey))

            t.Logf("Creating pod with testing image")
            pod := buildPullACRImagePod(namespace)
            assert.NoError(t, cfg.Client().Resources(namespace).Create(ctx, pod))
            t.Logf("Pod %s/%s scheduled", namespace, podName)
            return ctx
        }).
        // Assess (when/then): the pod reaching the Running phase proves the image pull from ACR succeeded.
        Assess("Pull image from ACR", func(ctx context.Context, t *testing.T, cfg *envconf.Config) context.Context {
            namespace := fmt.Sprint(ctx.Value(contextNamespaceNameKey))
            pod := &corev1.Pod{ObjectMeta: metav1.ObjectMeta{Name: podName, Namespace: namespace}}
            err := wait.For(conditions.New(cfg.Client().Resources()).PodRunning(pod),
                wait.WithImmediate(), wait.WithTimeout(time.Minute), wait.WithInterval(time.Second))
            if assert.NoError(t, err) {
                t.Logf("Pod reached phase 'Running'")
            } else {
                t.Logf("Pod didn't reach phase 'Running': %s", err.Error())
            }
            return ctx
        }).
        // Teardown: delete the test namespace and everything created inside it.
        Teardown(func(ctx context.Context, t *testing.T, config *envconf.Config) context.Context {
            assert.NoError(t, deleteNSForTest(ctx, config, t))
            return ctx
        }).Feature()

    testEnvironment.Test(t, f)
}

// buildPullACRImagePod returns a pod whose only container uses an image stored
// in the Azure Container Registry under test.
func buildPullACRImagePod(namespaceName string) *corev1.Pod {
    acrName := buildAzureResourceNameWithoutHyphens("eun", "containerregistry")
    return &corev1.Pod{
        ObjectMeta: metav1.ObjectMeta{
            Name:      podName,
            Namespace: namespaceName,
        },
        Spec: corev1.PodSpec{
            Containers: []corev1.Container{
                {
                    Name:  "pull-from-acr-container",
                    Image: fmt.Sprintf("%s.azurecr.io/conformance-testing:latest", acrName),
                },
            },
        },
    }
}
```
Having gone through all these test types, we are ready to put together a solution that makes our infrastructure as resilient as possible. After every change to the code base, the CI pipeline should run all the test types sequentially, starting with linting and static code analysis and moving on to the E2E and conformance tests. When we are ready to deploy, the CD pipeline should run the non-disruptive conformance tests to ensure that the environment is functioning properly after the update.
Testing cloud infrastructure comes with many hardships, but it definitely pays off. Following the steps presented in this article should get you started. There are many design details specific to each infrastructure that need to be treated separately, but they all fall into one of the main categories you need to remember.
For end-to-end tests, check:
For conformance tests, check:
Give it a try, and enjoy a more stable and resilient infrastructure!