Evaluation of cloud-native technology is a worthwhile effort. This article presents a selection of cloud-native solutions that successfully passed evaluation and worked efficiently in production. It also describes how you can evaluate technology and tools yourself. By evaluating solutions for your business, you can be confident that your selected technology will meet your needs and work reliably.
The evaluation procedure displayed here represents the unique goals and production methodology of VirtusLab. Therefore, you’ll need to tailor the evaluation method using your own criteria.
Please remember that using specific technologies to tackle problems is a part of a larger strategy and does not solve all issues. At VirtusLab, we combine well-informed technology choices with rigor in execution and clear communication.
Evaluating new cloud-native technology is a vital part of building modern systems
Establishing a “good enough” grasp of cloud-native technologies and tools is incredibly challenging these days.
Firstly, the tech stack used in modern application development and operations is highly complex and depends on multiple solution providers. Secondly, there are now more solutions than ever and the pace of cloud-native technical invention is increasingly fast.
A thorough evaluation method is vital since cloud-native technologies provide a launching pad for innovation and competitive advantages. Missing these innovations and competitive benefits might put your leading position at stake.
What can we do about it? We can develop the capacity to continuously evaluate cloud-native technology and tools. Ongoing evaluation is necessary as, month after month, developers release new products into the cloud-native ecosystem.
Evaluation starts when a new solution is released; then it repeats at regular intervals throughout the solution’s life cycle. Evaluation only halts when a solution is adopted within the tech stack used for live projects or when evaluation reveals that it is too limited for practical use in a production setting.
In short, ongoing tech evaluation helps you discern whether a solution is:
- Suited to your needs and ready for use today
- Promising but not yet in a ‘ready to use’ condition
- Too limited, and therefore neither suitable nor ready to use.
Accurately assigning one of these three categories reduces risk: specifically, the risk that your business burns time and money integrating new solutions whose capabilities were overestimated or that are not ready to work reliably in an operational setting.
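The three categories above can be sketched as a small decision function. This is a minimal illustration, not VirtusLab's actual tooling: the `Finding` fields and the names `Verdict`, `classify`, and `needs_reevaluation` are hypothetical, and a real evaluation would weigh far more evidence than three booleans.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    ADOPT = "suited to your needs and ready for use today"
    WATCH = "promising but not yet ready to use"
    REJECT = "too limited, neither suitable nor ready to use"


@dataclass
class Finding:
    """Outcome of one evaluation round (hypothetical fields)."""
    meets_requirements: bool  # covers the key functional requirements
    production_ready: bool    # reliable enough for an operational setting
    shows_promise: bool       # worth another look at the next interval


def classify(finding: Finding) -> Verdict:
    """Map one evaluation round onto the three categories."""
    if finding.meets_requirements and finding.production_ready:
        return Verdict.ADOPT
    if finding.shows_promise:
        return Verdict.WATCH
    return Verdict.REJECT


def needs_reevaluation(verdict: Verdict) -> bool:
    # Evaluation halts on adoption or rejection; only "watch" items repeat.
    return verdict is Verdict.WATCH
```

Only the middle category keeps a solution in the recurring evaluation cycle, which matches the idea that evaluation repeats at intervals until the solution is either adopted or ruled out.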
Proven cloud-native technologies
This section shows a technology stack proven in business, since each item moved from evaluation to production. The stack comes from VirtusLab, so the selections reflect the company’s deep involvement in cloud-native engineering. In addition, the notes in the sections below explain each technology’s use, the situation it works best in, and the benefits it provides.
Cloud-native technology: Public cloud
The general benefits of today’s public cloud are already familiar to most of us:
- A flexible pay-as-you-go pricing model
- Compliance with standards, certifications, data protection requirements, etc.
- Reliability and performance
- Always-available support.
The major providers in comparison
Despite these general benefits, each public cloud offering has its own strengths and weaknesses. As a result, there’s no obvious recommendation of one provider to suit everyone. Consequently, many organizations adopt a multi-cloud strategy so they can choose the best provider for each project’s needs. Before opting for a multi-cloud strategy, let’s see what the largest providers have to offer. After all, using a single provider may prove sufficient while also keeping things simple, in one respect at least.
- Microsoft Azure – This cloud has matured a lot in the past few years. It’s become more stable and it also provides a lot of managed services, monitoring, and security tooling. Microsoft’s offering and engagement model is suitable for large enterprises.
- Google Cloud Platform (GCP) – Bleeding-edge technology. Google innovates in the Kubernetes ecosystem, releasing new cloud-native technologies quickly and often. It also provides solid edge infrastructure support with Anthos.
- Amazon Web Services (AWS) – The main player in the market, with a mature cloud, good automation capabilities (CloudFormation), and a leading serverless offering (AWS Lambda). AWS works well for rapid prototyping and startups.
Public cloud evaluation needs to take into consideration:
- Regulatory compliance requirements and security policies
- The current technology domain, whether this is data engineering, eventing, or a simple microservices architecture
- The organization’s size and its current skills in this area
- High availability and networking requirements.
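One way to make these considerations comparable across providers is a weighted scoring matrix. The sketch below is only an illustration: the weights, the 0 to 5 scale, and the function names are assumptions, and each organization would choose its own.

```python
# Hypothetical weights for the four considerations above; they sum to 1.0.
WEIGHTS = {
    "compliance_and_security": 0.35,
    "technology_domain_fit": 0.25,
    "organization_size_and_skills": 0.20,
    "availability_and_networking": 0.20,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (0 to 5) into one weighted total."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def rank(candidates: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (candidate, total) pairs, best first."""
    totals = {name: weighted_score(s) for name, s in candidates.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Refusing to score a candidate with a missing criterion is a deliberate choice here: a gap in the evaluation should surface as an error rather than as a silently lower total.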
Possible issues with public cloud
If the evaluation is missing or faulty, the consequences of choosing the wrong provider may include:
- A longer time to migrate to the new cloud service
- Vendor lock-in
- Technology and services that are not matched closely to current business goals.
Cloud-native technology: Automation and infrastructure provisioning
When building scalable cloud-native infrastructure, automation technology helps make resource management in a cloud environment much easier, especially when used with GitOps to create a robust end-to-end solution. Furthermore, a high level of automation can be achieved by putting a strong emphasis on a single source of truth for automation.
Main approaches in automation and infrastructure provisioning
Here are the main approaches to automation and infrastructure provisioning, along with examples of technology solutions that have passed our evaluation.
- Infrastructure as Code (IaC): Terraform, CloudFormation
  - Terraform is the leading solution here. It provides a declarative method for building infrastructure, which makes it easy to modularise and test codebases.
  - CloudFormation is IaC built into AWS, which means it is always up to date with AWS upstream changes, new APIs, and so on. Other differentiators are built-in state management and support for nested stacks.
- Infrastructure as Software (IaS): Pulumi – Pulumi demonstrates a relatively new practice when it comes to the automation of infrastructure. Its defining characteristics are:
  - IaS is the most natural approach to tackling infrastructure complexity for anyone who knows how to write software.
  - If we can model the system as a graph of resources and use APIs to manipulate those resources, then we can use programming languages to build this system.
  - IaS is one level above IaC in terms of expressiveness.
Everyone who uses IaC has to start programming at some point. In practice, IaC work consists of a mixture of scripting languages, build tools, and IaC domain-specific languages (such as Terraform’s HCL). A more scalable approach would be to use a general-purpose programming language from the start – in other words, the approach used in IaS.
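To illustrate the "graph of resources" idea, here is a minimal sketch in plain Python. It is not Pulumi's API: the `Resource` class and the `build_network` and `creation_order` helpers are hypothetical, and only the standard library is used. It shows how ordinary loops and functions can describe infrastructure, and how a dependency graph yields a safe creation order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Resource:
    name: str
    kind: str
    depends_on: list[str] = field(default_factory=list)


def build_network(env: str, subnet_count: int) -> list[Resource]:
    """Describe one environment; a loop replaces copy-pasted blocks."""
    vpc = Resource(f"{env}-vpc", "network")
    subnets = [
        Resource(f"{env}-subnet-{i}", "subnet", depends_on=[vpc.name])
        for i in range(subnet_count)
    ]
    cluster = Resource(
        f"{env}-cluster", "kubernetes", depends_on=[s.name for s in subnets]
    )
    return [vpc, *subnets, cluster]


def creation_order(resources: list[Resource]) -> list[str]:
    """Order resources so that every dependency is created first."""
    graph = {r.name: set(r.depends_on) for r in resources}
    return list(TopologicalSorter(graph).static_order())
```

Calling `creation_order(build_network("dev", 2))` places the network first and the cluster last. Because the description is ordinary code, the same function can be reused for staging and production with different arguments, and it can be unit tested like any other software.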
Issues with automation and infrastructure provisioning
These are a few of the issues to consider when evaluating automation technology:
- Managing consistency between the intended state of cloud infrastructure, defined via Infrastructure as Code, and the actual state of the system
- Handling edge cases and backward-incompatible changes, especially when operating production-grade infrastructure with live traffic
- Interoperability with other technology in this area
- Whether the technology keeps up with upstream changes and APIs released by cloud providers.
It’s essential to incorporate these factors into the evaluation process, at the very least. Inadequate or flawed evaluations lead to choosing unsuitable approaches and technologies, resulting in serious consequences:
- Development and staging environments that are not fully compatible with the production environment (the environment inconsistency problem)
- Friction in application delivery that slows everything down
- A lack of support for the automation capabilities of existing technology choices.
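The first issue in this section, consistency between intended and actual state, is often called drift. The following is a minimal sketch of attribute-level drift detection, assuming both states are available as plain dictionaries keyed by resource name. The function name and data shapes are hypothetical; real IaC tools obtain the actual state from provider APIs.

```python
def detect_drift(
    intended: dict[str, dict], actual: dict[str, dict]
) -> dict[str, dict[str, tuple]]:
    """Return {resource: {attribute: (intended_value, actual_value)}}.

    A resource missing from `actual` is reported with None as the
    actual value of each of its intended attributes.
    """
    drift: dict[str, dict[str, tuple]] = {}
    for name, want in intended.items():
        have = actual.get(name, {})
        diff = {
            key: (value, have.get(key))
            for key, value in want.items()
            if have.get(key) != value
        }
        if diff:
            drift[name] = diff
    return drift
```

An empty result means the live system matches the definition. When evaluating a provisioning tool, it is worth checking how it reports and resolves each kind of mismatch this sketch would flag.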
Cloud-native technology: Container orchestration
Container orchestration is a key enabler of agility and short lead time to deployment for large software development projects. The best-known orchestrator, Kubernetes, provides a single entry point for managing both infrastructure and applications. Staying true to open standards in technology, Kubernetes adds flexibility between different clouds, interoperability with other technologies, and lower entry barriers for developers, as they only need to learn one technology.
Kubernetes-based container orchestration solutions
While Kubernetes is the same everywhere, extensions and add-ons tend to be unique to each public cloud provider, which limits interoperability and may create a challenge in multi-cloud environments.
Kubernetes-based container orchestration solutions that have passed our evaluation are:
- Azure Kubernetes Service – Microsoft’s flavor of Kubernetes is cost-effective and has a lot of add-ons integrated with the wider MS Azure tech ecosystem.
- Elastic Kubernetes Service – Kubernetes from AWS, the most popular and market-leading container orchestrator. It takes a hands-off approach, giving flexibility and responsibility to the customer.
- Google Kubernetes Engine – Google’s Kubernetes is a clear leader in terms of developer experience and number of features supported, but it lags behind AWS and Azure in terms of adoption and usage.
- Google Anthos – supports running Kubernetes on-premises and on edge infrastructure.
- Self-hosted Kubernetes – less popular, adds a lot of control and flexibility but also complexity and maintenance.
Possible issues with Kubernetes-based container orchestration
These are a few of the issues to consider when evaluating container orchestration solutions:
- Integration with built-in and external cloud identity access management systems
- Support for seamless and in-place upgrades of k8s version
- Release and deprecation cycle
- Enhanced security and reliability of Kubernetes itself
These factors should be integral to the evaluation process, at the very least. Insufficient or flawed evaluations may result in selecting an inappropriate approach and technology, causing negative outcomes:
- Kubernetes is the core component of most cloud infrastructure these days and it shapes other technology choices. Many projects start from here, so a poor choice affects everything built on top.
- A more hands-off provider means extra work: integrating with the monitoring stack and gluing technologies together takes additional effort and knowledge.
- The features that each cloud provider offers can be very different, so look carefully at what’s really needed.
Cloud-native technology: Monitoring and observability
Modern software is highly distributed and complex to monitor. Humans can no longer reason about the full system status. Every system we operate in production is proactively monitored by automation. The amount of observability data can be overwhelming, so the monitoring system must be scalable and provide meaningful insights while avoiding false positives.
Monitoring and observability solutions
Here are the monitoring and observability solutions that have passed our evaluation:
- Dashboards: Grafana – the leading solution in this category. It’s hard to find something better and more customizable.
- Metrics: Prometheus, Thanos – Prometheus is the leading solution in this category. Thanos adds more advanced capabilities such as a global query view, high availability, data backup with history, and cheap data access as its core features in a single binary.
- Splunk – managed monitoring solution, often used for security SOC, SIEM, can handle large volumes of data, can create dashboards, and alerts in a single place.
Possible issues with monitoring and observability
These are a few of the issues to consider when evaluating monitoring and observability technology:
- Support for real-time monitoring and handling of large volumes of data
- A single pane of glass for observability data, to monitor multiple systems from one place
- Scraping metrics from different sources with no need for custom implementation
- Integration with external support, alerting, and on-call duty systems
- Pull-based vs push-based monitoring
- Self-hosted monitoring vs SaaS
Make sure that these factors are considered in the evaluation process as a minimum prerequisite. Incomplete or faulty evaluations can result in the adoption of an approach and technology that isn’t well-suited, leading to unfavorable consequences:
- Egress data costs money, especially when sending large amounts of data between cloud regions; pull-based and push-based monitoring have different trade-offs here
- Performance bottlenecks or incomplete monitoring data lead to undetected incidents
- Fragmentation: observability data sits in many different places, making it hard to correlate events and reason about the “big picture”
- Blind spots and insufficient insight.
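The pull-based versus push-based distinction mentioned above comes down to who initiates the transfer of metrics. Here is a minimal in-memory sketch with hypothetical `Service` and `Monitor` classes; it does not represent the API of Prometheus or any other product.

```python
class Service:
    """A monitored application that counts handled requests."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.requests_total = 0

    def handle_request(self) -> None:
        self.requests_total += 1

    def metrics(self) -> dict[str, float]:
        # Pull model: expose current values and let the monitor read them.
        return {"requests_total": float(self.requests_total)}


class Monitor:
    """Collects (source, metric, value) samples."""

    def __init__(self) -> None:
        self.samples: list[tuple[str, str, float]] = []

    def scrape(self, targets: list[Service]) -> None:
        """Pull: the monitor initiates collection from every known target."""
        for target in targets:
            for metric, value in target.metrics().items():
                self.samples.append((target.name, metric, value))

    def receive(self, source: str, metric: str, value: float) -> None:
        """Push: the source initiates and the monitor only accepts."""
        self.samples.append((source, metric, value))
```

With pull, the monitor needs a way to discover its targets and controls the collection schedule. With push, short-lived jobs can report before they exit, but the monitor has less control over incoming load. Which side initiates also affects where network traffic, and therefore egress cost, originates.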
Cloud-native technology: Deployment
Technology leaders need to deliver software quickly and reliably to win in the market.
Deployment solution evaluation results will be of great interest to business and technology leaders alike, as there is a surprisingly strong correlation between organizational and technological performance. Compared to low-performing organizations, high-performing organizations have:
- 46 times more frequent code deployments
- 440 times faster lead time from commit to deploy
- 170 times faster mean time to recover from downtime
- 5 times lower change failure rate (1/5 as likely for a change to fail)
These figures come from a study that shows organizational market performance and technical performance correlate very closely.
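The four measures listed above can be tracked for your own deployment pipeline. Below is a minimal sketch, assuming each deployment is recorded with its commit time, deploy time, and failure and recovery information. The `Deployment` record and the `delivery_metrics` function are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failed change is recovered


def _hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def delivery_metrics(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Compute the four delivery measures over one reporting period."""
    if not deployments:
        raise ValueError("need at least one deployment")
    failures = [d for d in deployments if d.failed]
    recoveries = [
        _hours(d.deployed_at, d.restored_at)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        "deployments_per_day": len(deployments) / period_days,
        "mean_lead_time_hours": mean(
            _hours(d.committed_at, d.deployed_at) for d in deployments
        ),
        "change_failure_rate": len(failures) / len(deployments),
        # 0.0 here means "no recorded recoveries", not "instant recovery".
        "mean_time_to_recover_hours": mean(recoveries) if recoveries else 0.0,
    }
```

Comparing these numbers before and after introducing a deployment technology gives the evaluation a measurable baseline rather than an impression.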
Deployment solutions
Bringing the focus back to the solutions, here are the deployment technologies that have passed our evaluation and gone on to prove themselves in active use:
- Helm – de facto standard when it comes to k8s deployment
  - Simple templating engine
  - Advanced lifecycle hooks
  - The way to package k8s manifests and k8s apps
  - Helm charts can be published and stored either in a Git repository or a container registry
- GitOps: ArgoCD – new and modern way of working with deployment and continuous deployment
  - Supports a high level of automation
  - Declarative configuration in Git – a single source of truth for the codebase and system state
  - Supports various plugins and extensions, for example secret management with SOPS
  - Web UI that shows the entire system state, making it easy to see all k8s objects and their state
  - Provides a CLI and k8s API
  - Automatically syncs the Git repo with your cluster
  - Advanced notification features
- GitHub Actions – an emerging trend
  - Everything is close to the source code, with one platform for everything
  - Community-driven plugin ecosystem
  - Supports automation bots for checking code quality and security
  - Easy external integration with other systems
- Azure DevOps – a full CI/CD ecosystem
  - Azure-native approach, works well with Microsoft Azure cloud
  - Supports self-hosted runners
  - Built-in secret management
- Jenkins – old but still great, good plugin ecosystem, we run it in k8s
- GitLab CI – full CI/CD ecosystem, a lot of integrations and plugins
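The GitOps idea of automatically syncing a Git repository with a cluster can be sketched as a reconciliation step. This is a simplified illustration, not ArgoCD's implementation: `plan_sync` and `sync` are hypothetical names, and both states are assumed to be maps from object name to a manifest version or hash. The `prune` flag is likewise an assumption of this sketch; it controls whether objects that are no longer in Git get deleted.

```python
def plan_sync(
    desired: dict[str, str], live: dict[str, str], prune: bool = False
) -> list[tuple[str, str]]:
    """Return (action, object_name) steps that move `live` toward `desired`."""
    steps: list[tuple[str, str]] = []
    for name in sorted(desired):
        if name not in live:
            steps.append(("create", name))
        elif live[name] != desired[name]:
            steps.append(("update", name))
    if prune:
        # Objects present in the cluster but absent from Git.
        steps.extend(("delete", name) for name in sorted(live.keys() - desired.keys()))
    return steps


def sync(
    desired: dict[str, str], live: dict[str, str], prune: bool = False
) -> dict[str, str]:
    """Apply the plan to a copy of `live` and return the new cluster state."""
    result = dict(live)
    for action, name in plan_sync(desired, live, prune):
        if action == "delete":
            del result[name]
        else:
            result[name] = desired[name]
    return result
```

Running `plan_sync` again after `sync` yields an empty plan, which is the property that makes Git a single source of truth: repeated reconciliation converges on the declared state instead of accumulating changes.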
Possible issues with deployment
These are a few of the issues to consider when evaluating deployment technology:
- Deployment software should enable us to act quickly
- Deployment should be automated so that it is repeatable and predictable
- Depending on a company’s expertise, different approaches may be more suitable:
  - Traditional CI/CD approach – more predictable
  - GitOps (ArgoCD, Flux, Kubernetes Operators) – more toward autopilot mode
- It should provide seamless integration with external artifact storage systems
Incorporate these factors into the evaluation process as a fundamental requirement. If the evaluation is deficient or flawed, the potential outcomes of choosing an inappropriate approach and technology are significant:
- Long lead time to deploy
- Resistance to change, problematic rollback
- Not able to deploy applications to multiple clouds
- Potential security threats.
Cloud-native technology: Datastores and eventing
Cloud-native storage must be highly available and scalable, using a software architecture that can grow with your business. It must also support predictable performance/SLA, be highly consistent (reads and writes should return the correct data), and have no delays in operation. Finally, the deployment of new storage options must be easy and fast.
Datastores and eventing solutions
Here are the datastores and eventing solutions that have passed our evaluation:
- Kafka – the leading solution in the field of eventing/streaming, scalable and flexible
- CockroachDB – a k8s native, distributed relational data store
- MongoDB – a distributed document datastore, that can be run in k8s
- Prometheus – a time series datastore for metrics.
Possible issues with datastores and eventing
These are a few of the issues to consider when evaluating datastores and eventing technology:
- Understand the different data types:
  - Documents (XML, YAML, JSON)
  - Logs
  - Time series (metrics)
  - Media/streaming
  - Files / Blobs
- Understand different storage capabilities according to workload:
  - Queue, NoSQL, SQL, KeyValue, Object
  - Consistency: Eventual vs Strong
  - Replication, encryption, snapshot, cloning
- Interfaces to container runtime and orchestration – it should work with Kubernetes
- Infrastructure automation for storage
- Role-based access control, granular access, protecting data in the cloud, monitoring storage policy compliance
Ensure that these factors are part of the evaluation process at a minimum. If the evaluation is incomplete or flawed, the repercussions of selecting a misaligned approach and technology are significant:
- Poor latency and throughput (QoS, IOPS) and resource quota conflicts – especially in a multi-tenant environment
- Applications might not survive restarts and outages
- Difficulty moving data and apps between public clouds.
How to evaluate new cloud-native technology: The guide
The process of cloud-native technology evaluation always fits within a larger initiative. For example, when a company is well-connected with the technology ecosystem, it sees the current trends and how many projects use the tech under consideration. Additionally, being involved in the cloud-native community and partnering with other mature organizations allows companies to stay on top of technology development.
This is how it works:
1. Internal proof of concept
Find out if using the solution makes sense or if the problems can be solved using technology that’s already in regular use in the company or industry.
Factors to take into account:
- What are the benefits? Understand how such technology can contribute to business goals and technology strategy.
- How is the solution built? Including software architecture and internal dependencies.
- Does it support the functionalities we need? Verify it against the key functional requirements.
- Does it fit with the technology stack currently used? This includes licensing, integrations, and service management aspects.
2. Verification
Review the summarised PoC results with an internal group of solutions architects and expert engineers and/or the cloud center of excellence. Always get feedback from multiple sources to learn the different intentions of the people who will use the solution.
The technologies that succeed in evaluation move into a trial period. They are accompanied by internal design documents based on the knowledge gathered during the evaluation. The approach we follow relies on a standard way of Documenting Design Decisions using RFCs and ADRs.
Prepare a document according to these specifications:
- Explain in one paragraph the problem space, context or the decision needed.
- Why is this solution being implemented? What use cases does it support? What is the expected outcome?
- Detailed design section
- Explain the design for somebody without deep expertise in technology
- Get into specifics and edge cases and include examples
- Explain trade-offs, and different possible solutions including pros and cons.
3. Deploy cloud-native technology
Integrate the technology into the developer organization. First, evaluate current project conditions as these can determine when introducing a new technology is most appropriate. Following this, provide guidance about how to manage the technology enablement process, which looks as follows:
- Cloud Native Assessment. Work on a detailed proposal, follow up with some clarification questions, and then decide on the initial statement of work.
- Support model and pricing. Depending on the size of the organization, it is worth considering engaging directly with the vendor.
- Design and implement reference architecture. Pave the road and establish reusable patterns for other teams within the organization.
- Distribute support documentation. An overview of the usage patterns, examples, and how-to guides.
- Share knowledge and train team members. Create guardrails that keep the organization on a safe path.
- Measure business benefits and publish case studies.
Now you know how to evaluate new technology in cloud-native
From now on, you have a strategy that enables your company to choose specific, well-suited technologies you can depend upon to perform. Amongst others, the primary gains from this approach are:
- A long-term, sustainable technology strategy that supports innovation
- A business that stays on top of the cloud-native technology curve
- An up-to-date yet reliable technological environment that attracts the best people.