How to evaluate cloud-native technology and build trust
This article presents cloud-native solutions that successfully passed evaluation and how you can evaluate cloud-native technology yourself.
Evaluation of cloud-native technology is a worthwhile effort. This article presents a selection of cloud-native solutions that successfully passed evaluation and worked efficiently in production. It also describes how you can evaluate technology and tools yourself. By evaluating solutions for your business, you can be confident that your selected technology will meet your needs and work reliably.
The evaluation procedure displayed here represents the unique goals and production methodology of VirtusLab. Therefore, you’ll need to tailor the evaluation method using your own criteria.
Please remember that using specific technologies to tackle problems is a part of a larger strategy and does not solve all issues. At VirtusLab, we combine well-informed technology choices with rigour in execution and clear communication.
Establishing a “good enough” grasp of cloud-native technologies and tools is incredibly challenging these days.
Firstly, the tech stack used in modern application development and operations is highly complex and depends on multiple solution providers. Secondly, there are now more solutions than ever, and the pace of cloud-native technical invention keeps increasing.
A thorough evaluation method is vital since cloud-native technologies provide a launching pad for innovation and competitive advantages. Missing these innovations and competitive benefits might put your leading position at stake.
What can we do about it? We can develop the capacity to continuously evaluate cloud-native technology and tools. Ongoing evaluation is necessary as, month after month, developers release new products into the cloud-native ecosystem.
Evaluation starts when a new solution is released; then it repeats at regular intervals throughout the solution’s life-cycle. Evaluation only halts when a solution is adopted within the tech stack used for live projects or when evaluation reveals that it is too limited for practical use in a production setting.
In short, ongoing tech evaluation helps you discern whether a solution is:
Suited to your needs and ready for use today
Promising but not yet in a ‘ready to use’ condition
Too limited, so neither suitable nor ready to use.
Whenever evaluation accurately assigns one of these three categories, risk diminishes. The risk in question is that your business burns time and money integrating a new solution whose capabilities were overestimated, or which is not ready to work reliably in an operational setting.
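The three outcomes above can be captured in a small decision function. The following is a minimal Python sketch, not the procedure VirtusLab actually uses: the `EvaluationResult` fields and the order of the rules in `classify` are assumptions made for illustration, and you would replace them with your own criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    READY = "suited to your needs and ready for use today"
    PROMISING = "promising but not yet ready to use"
    UNSUITABLE = "too limited, neither suitable nor ready to use"


@dataclass(frozen=True)
class EvaluationResult:
    """Outcome of one evaluation round (all fields are hypothetical criteria)."""
    meets_requirements: bool    # does it cover the use cases you care about?
    production_ready: bool      # did it run reliably in your trial?
    blocking_limitations: int   # limitations for which no workaround was found


def classify(result: EvaluationResult) -> Verdict:
    """Map an evaluation result onto one of the three categories."""
    if result.blocking_limitations > 0 or not result.meets_requirements:
        return Verdict.UNSUITABLE
    if result.production_ready:
        return Verdict.READY
    return Verdict.PROMISING
```

Because evaluation repeats at regular intervals, a solution classified as `PROMISING` today can simply be re-run through the same function after its next release.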
This section shows a tech stack that has proven itself in business use after moving from evaluation to production. This tech stack comes from VirtusLab, so the selections reflect the company’s deep involvement in cloud-native engineering. In addition, you can find notes in the sections below that explain each technology’s use, the situation it works best in, and the benefits it provides.
Public cloud offerings each have their own strengths and weaknesses, so there’s no obvious recommendation of one provider to suit everyone. Consequently, many organisations adopt a multi-cloud strategy and choose the best provider according to each project’s needs. Before opting for a multi-cloud strategy, let’s see what the largest providers have to offer. After all, using a single provider may prove sufficient while also keeping things simple, in one respect at least.
Microsoft Azure – This cloud has matured a lot in the past few years. It’s become more stable and it also provides a lot of managed services, monitoring and security tooling. Microsoft’s offering and engagement model is suitable for large enterprises.
Google Cloud Platform (GCP) – Bleeding-edge technology. Google is innovating in the Kubernetes ecosystem, releasing new cloud-native technologies quickly and often. It also provides solid edge infrastructure support with Anthos.
Amazon Web Services (AWS) – The main player in the market, with a mature cloud, good automation capabilities (CloudFormation) and a leading serverless offering (AWS Lambda). AWS works well for rapid prototyping and startups.
Public cloud evaluation needs to take into consideration:
Regulatory compliance requirements and security policies
The current technology domain, whether this is data engineering, eventing or a simple microservices architecture
The size of the organisation and its current skills in this area
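One way to make these considerations comparable is a weighted scoring matrix. The sketch below is only an illustration: the criterion names, the 0 to 5 scale and the placeholder candidates `provider_a` and `provider_b` are assumptions, and the numbers are invented inputs rather than an assessment of any real provider.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted average of per-criterion scores (0-5 scale assumed)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weight for name, weight in weights.items()) / total_weight


def rank(candidates: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Order candidate names from best to worst weighted score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


# Hypothetical weights: compliance matters most for this imaginary organisation.
weights = {"compliance": 3, "domain_fit": 2, "team_skills": 1}
candidates = {
    "provider_a": {"compliance": 5, "domain_fit": 3, "team_skills": 2},
    "provider_b": {"compliance": 3, "domain_fit": 4, "team_skills": 5},
}
ranking = rank(candidates, weights)
```

The weights encode your priorities, so two organisations scoring the same providers can legitimately arrive at different rankings.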
When building scalable cloud-native infrastructure, automation technology helps make resource management in a cloud environment much easier, especially when used with GitOps to create a robust end-to-end solution. Furthermore, a high level of automation can be achieved by putting a strong emphasis on a single source of truth for automation.
Here are the main approaches to automation and infrastructure provisioning, along with examples of technology solutions that have passed our evaluation.
Infrastructure as Code (IaC): Terraform, CloudFormation
Terraform is the leading solution here. It provides a declarative method for building infrastructure, which makes it easy to modularise and test codebases.
CloudFormation is IaC built into AWS which means it is always up to date with AWS upstream changes, new APIs and so on. Other differentiators are built-in state management and support for nested stacks.
Infrastructure as Software (IaS): Pulumi – Pulumi demonstrates a relatively new practice when it comes to the automation of infrastructure. Its defining characteristics are:
IaS is the most natural approach to tackling infrastructure complexity for anyone who knows how to write software.
If we can model the system as a graph of resources and use APIs to manipulate those resources, then we can use programming languages to build this system.
IaS is one level above IaC in terms of expressibility.
Everyone who uses IaC has to start programming at some point. In practice, IaC work consists of a mixture of scripting languages, build tools and IaC domain-specific languages (such as Terraform’s HCL). A more scalable approach would be to just use a general-purpose programming language from the start – in other words, the approach used in IaS.
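To illustrate the idea of modelling a system as a graph of resources, here is a minimal sketch in plain Python. It deliberately does not use Pulumi or any cloud SDK: the resource names and the `environment_graph` helper are hypothetical, and the only library call is the standard `graphlib.TopologicalSorter` (Python 3.9 or later), which yields an order in which every resource comes after the resources it depends on.

```python
from graphlib import TopologicalSorter


def environment_graph(env: str, app_count: int) -> dict[str, set[str]]:
    """Build a resource graph for one environment.

    Keys are resource names, values are the resources they depend on.
    The loop is the point: a general-purpose language can generate any
    number of similar resources without copy-pasting configuration.
    """
    graph: dict[str, set[str]] = {
        f"{env}-network": set(),
        f"{env}-cluster": {f"{env}-network"},
    }
    for index in range(app_count):
        graph[f"{env}-app-{index}"] = {f"{env}-cluster"}
    return graph


def provisioning_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which dependencies are always created first."""
    return list(TopologicalSorter(graph).static_order())


order = provisioning_order(environment_graph("staging", app_count=2))
```

A real IaS tool adds state tracking and API calls on top of this, but the underlying model (resources as nodes, dependencies as edges) is the same.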
These are a few of the issues to consider when evaluating automation technology:
Consistency of cloud infrastructure between the intended state, defined via Infrastructure as Code, and the actual state of the system.
Handling of edge cases and backward-incompatible changes, especially when operating production-grade infrastructure with live traffic.
Interoperability with other technology in this area.
Whether the technology keeps up with upstream changes and APIs released by cloud providers.
Make sure the evaluation process includes these factors, as a minimum. If the evaluation is missing or faulty, the consequences of choosing an approach and technology that’s not a good fit include:
Development and staging environments that are not fully compatible with the production environment (the environment inconsistency problem)
Friction in application delivery, which slows everything down
No support for the automation capabilities of existing technology choices.
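The environment inconsistency problem mentioned above can be made visible with a simple parity check. The following sketch assumes each environment is described by a flat dictionary of settings; the setting names, the `ABSENT` marker and the `allowed` set of settings that may legitimately differ are all illustrative assumptions.

```python
ABSENT = "<absent>"  # marker for a setting that one environment lacks


def parity_report(
    environments: dict[str, dict[str, object]],
    baseline: str = "production",
    allowed: frozenset[str] = frozenset(),
) -> dict[str, dict[str, tuple[object, object]]]:
    """Compare every environment with the baseline.

    Returns {environment: {setting: (baseline_value, environment_value)}}
    for settings that differ and are not listed in `allowed`.
    """
    reference = environments[baseline]
    report: dict[str, dict[str, tuple[object, object]]] = {}
    for name, settings in environments.items():
        if name == baseline:
            continue
        differences = {}
        for key in sorted(reference.keys() | settings.keys()):
            if key in allowed:
                continue
            expected = reference.get(key, ABSENT)
            found = settings.get(key, ABSENT)
            if expected != found:
                differences[key] = (expected, found)
        if differences:
            report[name] = differences
    return report


environments = {
    "production": {"k8s_version": "1.29", "ingress": "nginx", "replicas": 6},
    "staging": {"k8s_version": "1.29", "ingress": "nginx", "replicas": 2},
    "development": {"k8s_version": "1.27", "replicas": 1},
}
report = parity_report(environments, allowed=frozenset({"replicas"}))
```

In this invented example only `development` is reported, because it runs a different Kubernetes version and has no ingress setting; the replica counts are ignored on purpose.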
Container orchestration is a key enabler of agility and short lead time to deployment for large software development projects. The best-known orchestrator, Kubernetes, provides a single entry point for managing both infrastructure and applications. Because it stays true to open standards, Kubernetes adds flexibility between different clouds, interoperability with other technologies, and a lower entry barrier for developers, as they only need to learn one technology.
While Kubernetes is the same everywhere, extensions and addons tend to be unique to each public cloud provider, which limits interoperability and may create a challenge in multi-cloud environments.
Kubernetes-based container orchestration solutions that have passed our evaluation are:
Azure Kubernetes Service – Microsoft’s flavour of Kubernetes is cost-effective and has a lot of addons integrated with the wider MS Azure tech ecosystem.
Elastic Kubernetes Service – Kubernetes from AWS, the most popular and market-leading container orchestrator. Takes a hands-off approach giving flexibility and responsibility to the customer.
Google Kubernetes Engine – Google’s Kubernetes is a clear leader in terms of developer experience and number of features supported. Lags behind AWS and Azure in terms of adoption and usage.
Google Anthos – support in running Kubernetes on premises and on edge infrastructure.
Self-hosted Kubernetes – less popular, adds a lot of control and flexibility but also complexity and maintenance.
These are a few of the issues to consider when evaluating container orchestration solutions:
Integration with built-in and external cloud identity access management systems
Support for seamless and in-place upgrades of k8s version
Release and deprecation cycle
Enhanced security and reliability of Kubernetes itself
Make sure the evaluation process includes these factors, as a minimum. If the evaluation is missing or faulty, the consequences of choosing an approach and technology that’s not a good fit include:
Kubernetes is the core component of any cloud infrastructure these days, so choosing it implies other technology choices. Many projects start from here.
A more hands-off offering leaves more work to you: integrating with the monitoring stack and glueing technologies together both require extra effort and knowledge.
The features that each cloud provider offers can be very different so look carefully at what’s really needed.
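The release and deprecation cycle listed above can be checked mechanically. The sketch below flags clusters that have fallen outside a support window measured in minor versions. The upstream Kubernetes project maintains roughly the three most recent minor releases, but managed services publish their own windows, so the default of `window=3` is an assumption to replace with your provider’s policy; the function names are hypothetical.

```python
def parse_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.28.5' or '1.30.0'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def minors_behind(cluster_version: str, latest_version: str) -> int:
    """Number of minor releases the cluster lags behind the latest one."""
    cluster_major, cluster_minor = parse_minor(cluster_version)
    latest_major, latest_minor = parse_minor(latest_version)
    if cluster_major != latest_major:
        raise ValueError("major versions differ; compare these manually")
    return max(0, latest_minor - cluster_minor)


def within_support_window(
    cluster_version: str, latest_version: str, window: int = 3
) -> bool:
    """True if the cluster runs one of the `window` most recent minors."""
    return minors_behind(cluster_version, latest_version) < window
```

For example, with `1.30` as the latest release and the default window, a cluster on `1.28` is still inside the window while one on `1.27` is not.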
These days modern software is highly distributed and complex to monitor. Humans can no longer reason about the full system status. Every system we operate in production is proactively monitored by automation. The amount of observability data might be overwhelming so the system must be scalable and provide meaningful insights, at the same time avoiding false positives.
Here are the monitoring and observability solutions that have passed our evaluation:
Dashboards: Grafana – the leading solution in this category. It’s hard to find something better and more customizable.
Metrics: Prometheus, Thanos – the leading solution in this category. Thanos adds more advanced capabilities such as a global query view, high availability, data backup with history and cheap data access as its core features in a single binary.
Splunk – a managed monitoring solution, often used for security (SOC, SIEM). It can handle large volumes of data and lets you create dashboards and alerts in a single place.
These are a few of the issues to consider when evaluating monitoring and observability technology:
Support for real time monitoring and handling large volumes of data
Single pane of glass for observability data, so you can monitor multiple systems from one place
Scraping metrics from different sources with no need for custom implementation
Integration with external support, alerting and on-call duty systems
Pull-based vs push-based monitoring
Self hosted monitoring vs SaaS
Make sure the evaluation process includes these factors, as a minimum. If the evaluation is missing or faulty, the consequences of choosing an approach and technology that’s not a good fit include:
Egress data costs money, especially if sending large amounts of data between cloud regions
Pull-based and push-based monitoring have different trade-offs
A performance bottleneck or incomplete monitoring data leads to undetected incidents
Fragmentation of data: observability data sits in many different places, which makes it hard to correlate events and reason about the “big picture”
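The difference between pull-based and push-based monitoring comes down to who initiates the transfer. The sketch below shows both directions with plain Python objects instead of a network; `Service`, `PullCollector` and `PushCollector` are hypothetical names, not part of any monitoring product. Prometheus follows the pull model, in which the collector scrapes its targets.

```python
class Service:
    """A monitored application that counts the requests it handles."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.requests = 0

    def handle_request(self) -> None:
        self.requests += 1

    def metrics(self) -> dict[str, int]:
        """Current metric values, as a scrape endpoint would expose them."""
        return {"requests_total": self.requests}


class PullCollector:
    """Pull model: the collector knows its targets and decides when to scrape."""

    def __init__(self, targets: list[Service]) -> None:
        self.targets = targets
        self.samples: list[tuple[str, dict[str, int]]] = []

    def scrape(self) -> None:
        for target in self.targets:
            self.samples.append((target.name, target.metrics()))


class PushCollector:
    """Push model: the collector only receives; services decide when to send."""

    def __init__(self) -> None:
        self.samples: list[tuple[str, dict[str, int]]] = []

    def receive(self, source: str, metrics: dict[str, int]) -> None:
        self.samples.append((source, metrics))


def push_metrics(service: Service, collector: PushCollector) -> None:
    """Called from the service side, for example on a timer or at exit."""
    collector.receive(service.name, service.metrics())
```

With pull, a target that stops answering is itself a signal, but the collector must be able to reach every target. With push, short-lived jobs can report before they exit, but the collector cannot tell a silent service from a dead one without extra logic.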
Technology leaders need to deliver software quickly and reliably to win in the market.
The results of evaluating deployment solutions will be of great interest to business and technology leaders alike, as there is a surprisingly strong correlation between organisational and technological performance. Compared to low-performing organisations, high-performing organisations have:
46 times more frequent code deployments
440 times faster lead time from commit to deploy
170 times faster mean time to recover from downtime
5 times lower change failure rate (1/5 as likely for a change to fail)
These figures come from a study that shows organisational market performance and technical performance correlate very closely.
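To see where your own organisation stands on these four measures, you can compute them from deployment records. The following is a minimal sketch: the `Deployment` record shape is an assumption, and the study behind the figures above may define and aggregate the measures differently (here lead time uses the median and recovery time uses the mean).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional, Sequence


@dataclass(frozen=True)
class Deployment:
    """One production deployment (a hypothetical record shape)."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set only for failed deployments


def deployment_frequency(deploys: Sequence[Deployment], period_days: int) -> float:
    """Average number of deployments per day over the observed period."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    return len(deploys) / period_days


def median_lead_time(deploys: Sequence[Deployment]) -> timedelta:
    """Median time from commit to deploy."""
    if not deploys:
        raise ValueError("no deployments recorded")
    seconds = [(d.deployed_at - d.committed_at).total_seconds() for d in deploys]
    return timedelta(seconds=median(seconds))


def change_failure_rate(deploys: Sequence[Deployment]) -> float:
    """Share of deployments that caused a failure in production."""
    if not deploys:
        raise ValueError("no deployments recorded")
    return sum(d.failed for d in deploys) / len(deploys)


def mean_time_to_recover(deploys: Sequence[Deployment]) -> timedelta:
    """Mean time from a failed deployment to restored service."""
    outages = [
        (d.restored_at - d.deployed_at).total_seconds()
        for d in deploys
        if d.failed and d.restored_at is not None
    ]
    return timedelta(seconds=mean(outages)) if outages else timedelta(0)
```

Tracking these values before and after introducing a new deployment technology gives the evaluation a measurable outcome rather than an impression.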
Bringing the focus back to the solutions, here are the deployment technologies that have passed our evaluation and gone on to prove themselves in active use:
Helm – de facto standard when it comes to k8s deployment
Simple templating engine
Advanced lifecycle hooks
The standard way to package k8s manifests and k8s apps
Helm charts can be published and stored in either a Git repository or a container registry
GitOps: ArgoCD – new and modern way of working with deployment and continuous deployment
Supports a high level of automation
Declarative configuration in Git – a single source of truth for the codebase and the system state
Supports various plugins and extensions, for example secret management with SOPS
Web UI which shows the entire system state, making it easy to see all k8s objects and their state
Provides CLI and k8s API
Automatically syncs Git repo with your cluster
Advanced notification features
GitHub Actions – an emerging trend
Everything is close to the source code, one platform for everything
Community-driven plugin ecosystem
Supports automation bots for checking code quality and security
Easy external integration with other systems
Azure DevOps – a full CI/CD ecosystem
Azure native approach, works well with Microsoft Azure cloud
Supports self-hosted runners
Built-in secret management
Jenkins – old but still great, good plugin ecosystem, we run it in k8s
GitLab CI – full CI/CD ecosystem, a lot of integrations and plugins
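The GitOps idea of automatically syncing a Git repository with a cluster can be reduced to a reconciliation step: compare the declared state with the live state, then apply whatever actions close the gap. The sketch below models both states as plain dictionaries. It is not the Argo CD API, and the function names and the decision to delete objects that are missing from Git are assumptions of this example (GitOps tools often make such pruning optional).

```python
import copy


def plan(desired: dict[str, dict], live: dict[str, dict]) -> list[tuple[str, str]]:
    """List the (action, object_name) pairs needed to make live match desired."""
    actions = []
    for name in sorted(desired.keys() | live.keys()):
        if name not in live:
            actions.append(("create", name))
        elif name not in desired:
            actions.append(("delete", name))
        elif desired[name] != live[name]:
            actions.append(("update", name))
    return actions


def sync(desired: dict[str, dict], live: dict[str, dict]) -> dict[str, dict]:
    """Return a new live state with every planned action applied."""
    result = copy.deepcopy(live)
    for action, name in plan(desired, live):
        if action == "delete":
            del result[name]
        else:  # create and update both copy the declared object
            result[name] = copy.deepcopy(desired[name])
    return result


# Hypothetical states: 'desired' stands for what is declared in Git.
desired = {
    "web": {"image": "web:2", "replicas": 3},
    "worker": {"image": "worker:1", "replicas": 1},
}
live = {
    "web": {"image": "web:1", "replicas": 3},
    "legacy": {"image": "legacy:9", "replicas": 1},
}
actions = plan(desired, live)
synced = sync(desired, live)
```

After a sync, running `plan` again yields no actions. A GitOps controller repeats this comparison continuously, which is why manual changes to the cluster get reverted to whatever Git declares.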
These are a few of the issues to consider when evaluating deployment technology:
Deployment software should enable us to act quickly
We should automate our own work so that software deployment is repeatable and predictable
Depending on a company’s expertise, different approaches may be more suitable:
Traditional CI/CD approach – more predictable
GitOps (ArgoCD, Flux, Kubernetes Operators) – more towards autopilot mode
It should provide seamless integration with external artefact storage systems
Make sure the evaluation process includes these factors, as a minimum. If the evaluation is missing or faulty, the consequences of choosing an approach and technology that’s not a good fit include:
Cloud-native storage must be highly available and scalable, using a software architecture that can grow with your business. It must also support predictable performance and SLAs, be highly consistent (reads should return the data that was most recently written), and have no delays in operation. Finally, deployment of new storage options must be easy and fast.
These are a few of the issues to consider when evaluating datastores and eventing technology:
Understand the different data types:
Documents (XML, YAML, JSON)
Logs
Time series (metrics)
Media / streaming
Files / Blobs
Understand different storage capabilities according to workload
Queue, NoSQL, SQL, KeyValue, Object
Consistency: Eventual vs Strong
Replication, encryption, snapshot, cloning
Interfaces to container runtime and orchestration – it should work with Kubernetes
Infrastructure automation for storage
Role-based access control, granular access, protection of data in the cloud and monitoring of storage policy compliance
Make sure the evaluation process includes these factors, as a minimum. If the evaluation is missing or faulty, the consequences of choosing an approach and technology that’s not a good fit include:
Unpredictable latency and performance (QoS, IOPS) and missing resource quotas – especially in a multi-tenant environment
Applications might not survive restarts and outages
Difficulty moving data and apps between public clouds.
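The “eventual vs strong” consistency question listed above has a well-known rule of thumb for replicated datastores that use quorums: with N replicas, W acknowledgements required per write and R replicas consulted per read, every read overlaps the latest write when R + W > N. The sketch below encodes only that inequality; the function name is hypothetical, and real systems add further conditions (failure handling, clocks, conflict resolution) that it ignores.

```python
def reads_see_latest_write(replicas: int, write_acks: int, read_replicas: int) -> bool:
    """Quorum overlap check: True when R + W > N.

    replicas      -- N, total number of copies of each item
    write_acks    -- W, replicas that must confirm a write
    read_replicas -- R, replicas consulted for a read
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not 1 <= write_acks <= replicas:
        raise ValueError("write_acks must be between 1 and replicas")
    if not 1 <= read_replicas <= replicas:
        raise ValueError("read_replicas must be between 1 and replicas")
    return read_replicas + write_acks > replicas
```

With three replicas, writing to two and reading from two satisfies the rule, whereas writing to one and reading from one does not: a read may then hit a replica that has not yet received the write, which is the eventual consistency case.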
The process of cloud-native technology evaluation always fits within a larger initiative. For example, when a company is well-connected with the technology ecosystem, it sees the current trends and how many projects use the tech under consideration. Additionally, being involved in the cloud-native community and partnering with other mature organisations allows companies to stay on top of technology development.
Review the summarised proof of concept (PoC) results with an internal group of solutions architects and expert engineers, a cloud centre of excellence, or both. Always get feedback from multiple sources so that you learn the different intentions of the people who will use the solution.
The technologies that succeed in evaluation move into a trial period. They are accompanied by internal design documents based on the knowledge gathered during the evaluation. The approach we follow relies on a standard way of Documenting Design Decisions using RFCs and ADRs.
Prepare a document according to these specifications:
Explain in one paragraph the problem space, context or the decision needed.
Why is this solution being implemented? What use cases does it support? What is the expected outcome?
Detailed design section
Explain the design for somebody without deep expertise in technology
Get into specifics and edge-cases and include examples
Explain the trade-offs and the different possible solutions, including their pros and cons.
Integrate the technology into the developer organisation. First, evaluate current project conditions as these can determine when introducing a new technology is most appropriate. Following this, provide guidance about how to manage the technology enablement process, which looks as follows:
Cloud Native Assessment. Work on a detailed proposal, follow up with some clarification questions, then decide on the initial statement of work.
Support model and pricing. Depending on the size of the organisation, it is worth considering engaging directly with the vendor.
Design and implement reference architecture. Pave the road and establish reusable patterns for other teams within the organisation.
Distribute support documentation. An overview of the usage patterns, examples and how-to guides.
Share knowledge and train team members. Create guardrails that keep the organisation on a safe path.
Measure business benefits and publish case studies.
Now, you have a strategy that enables your company to choose specific, well-suited technologies you can depend upon to perform. Amongst others, the primary gains from this approach are:
A long-term sustainable technology strategy which supports innovation
A business that stays on top of the cloud-native technology curve
An up-to-the-minute yet reliable technological environment that attracts the best people to come and work with you.
Still find navigating the cloud-native landscape a challenge? Let’s talk.