Designing and implementing a multi-cloud architecture

From the time the first computer was created, enterprises have been dealing with the vendor lock-in problem. Regardless of the technology, there is a constant push/pull between the temptation to use vendor-specific features and the discipline of sticking to industry-wide standards, which make it possible to switch vendors if licensing costs with a particular vendor get out of control.

A simple example of this is ANSI SQL and Oracle. If an enterprise uses Oracle for its database applications and sticks to the ANSI SQL standard, migrating those applications from Oracle to another database vendor is much simpler. Historically, however, Oracle (like other vendors) has offered features and functionality that go beyond the ANSI SQL standard. Developers are then tempted (or forced) to use these extensions, and migrating off Oracle becomes complicated: the code that depends on them must be refactored, or that functionality will be lost during the migration.
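To make the migration cost concrete, a team might scan its SQL for vendor extensions before a move. The Python sketch below is a hypothetical, non-exhaustive checker. The names `ORACLE_EXTENSIONS` and `find_oracle_extensions` are invented for illustration. The constructs it flags (`NVL`, `DECODE`, `ROWNUM`, `CONNECT BY`, the `(+)` outer-join operator, `SYSDATE`) are Oracle-specific, and each is paired with a commonly used standard-SQL alternative. It uses a plain regex scan and does not parse SQL.

```python
import re

# Hypothetical, non-exhaustive list of Oracle-specific SQL constructs,
# each paired with a commonly used ANSI SQL alternative.
ORACLE_EXTENSIONS = [
    ("NVL", r"\bNVL\s*\(", "COALESCE(...)"),
    ("DECODE", r"\bDECODE\s*\(", "CASE WHEN ... END"),
    ("ROWNUM", r"\bROWNUM\b", "FETCH FIRST n ROWS ONLY"),
    ("CONNECT BY", r"\bCONNECT\s+BY\b", "recursive WITH (recursive CTE)"),
    ("(+) outer join", r"\(\s*\+\s*\)", "LEFT/RIGHT OUTER JOIN ... ON"),
    ("SYSDATE", r"\bSYSDATE\b", "CURRENT_DATE / CURRENT_TIMESTAMP"),
]


def find_oracle_extensions(sql: str) -> list[tuple[str, str]]:
    """Return (construct, ANSI alternative) pairs found in a SQL string.

    This is a plain regex scan, not a SQL parser, so it can report false
    positives (for example, a matching word inside a string literal).
    """
    findings = []
    for name, pattern, ansi_alternative in ORACLE_EXTENSIONS:
        if re.search(pattern, sql, flags=re.IGNORECASE):
            findings.append((name, ansi_alternative))
    return findings


if __name__ == "__main__":
    query = "SELECT NVL(bonus, 0) FROM employees WHERE ROWNUM <= 10"
    for construct, alternative in find_oracle_extensions(query):
        print(f"{construct}: consider {alternative}")
```

Each finding marks code that must be refactored before the application can run on another database.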

This tension between standards-based common functionality and vendor-native features persists in the cloud, where it is amplified.

Every cloud provider with meaningful market share offers basic compute (Windows and Linux), database, and object storage services, and these basic services are relatively simple to migrate from one cloud service provider (CSP) to another.

CSPs also offer differentiated services and features that are not as easy to migrate to another provider and may require some level of re-platforming and code refactoring. AWS CloudFormation and Azure Resource Manager (ARM) templates are one example: converting a template written for one into the other is not a trivial effort.

The purpose of this article is neither to persuade you to adopt a multi-cloud approach in your organization nor to dissuade you from it. Instead, it goes over some of the benefits and pitfalls of such an approach, to help you decide which approach is best for you and to avoid some of those pitfalls along the way.

Let’s start by defining two common, closely related terms that come up in this context.

Hybrid cloud vs multi-cloud

There are plenty of definitions of these two concepts. For the purposes of this article, we’ll define them as follows.

Hybrid cloud – A hybrid cloud is a computing environment that combines an on-premises data center (also called a private cloud) with a public cloud, allowing data and applications to be shared between them.

Multi-cloud – A multi-cloud is a computing environment that is deployed across multiple public clouds. If an enterprise uses more than one cloud service provider for its cloud hosting, storage, and full application stack needs, it is said to be using a multi-cloud approach.

In this article, we’ll focus more on multi-cloud deployments and less on hybrid cloud deployments, which, as you can imagine, add yet another level of complexity.

There are many strategies for implementing a multi-cloud environment. Probably the most common multi-cloud design is an accidental, or “victim of circumstances,” design. A common way for enterprises to end up with multi-cloud deployments is through mergers and acquisitions. A company’s leadership decides it makes business sense to acquire or merge with a competitor. This decision is usually made for business reasons, and merging the companies’ technology stacks is an afterthought. If the merging companies use different CSPs, the result is a multi-cloud environment post-merger. At that point, the IT staff may decide to homogenize the cloud environment or keep it multi-cloud. If they stay multi-cloud, there is still a lot of work to ensure that both environments function smoothly. Often, the most likely outcome is to consolidate some of the applications and keep others separate but integrated with each other.

Regardless of how you get there, there are different KPIs that you will want to improve and optimize as you continue to evolve your multi-cloud strategy.

Multi-cloud KPIs and metrics

Different companies will prioritize different KPIs and metrics, and these priorities will probably vary across applications. For some mission-critical applications, availability may be more important than cost. For others, keeping costs to a minimum may matter more. Some of these metrics pull against each other, so you may not be able to fully optimize all of them at once. For example, making an application highly fault-tolerant costs more than running an application that can afford some downtime. Let’s go over some of these metrics.
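The availability-versus-cost trade-off can be put into rough numbers. The Python sketch below rests on several assumptions that the article does not make: redundant deployments fail independently, failover is instantaneous, and cost grows linearly with the number of full deployments. Under those assumptions, the application is down only when every deployment is down, so availability is 1 minus the product of the individual unavailabilities. The function names are illustrative.

```python
# Sketch: availability vs. cost when an application is duplicated.
# Assumptions: deployments fail independently, failover is instantaneous,
# and each full deployment costs the same amount.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def composite_availability(availabilities: list[float]) -> float:
    """Availability when the app is up as long as at least one deployment is up."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a  # probability that every deployment is down
    return 1.0 - all_down


def expected_downtime_minutes(availability: float) -> float:
    """Expected downtime per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for deployments in ([0.999], [0.999, 0.999]):
        a = composite_availability(deployments)
        print(
            f"{len(deployments)} deployment(s): availability {a:.6f}, "
            f"~{expected_downtime_minutes(a):.1f} min/yr, "
            f"relative cost x{len(deployments)}"
        )
```

Under these idealized assumptions, a second 99.9% deployment cuts expected downtime from roughly 526 minutes a year to under a minute while roughly doubling cost. Real deployments share failure modes and need data replication, so actual gains are smaller.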

Cost Reduction

Certain services can be considered “commodity” services, and tools are starting to emerge that let enterprises migrate these workloads from one CSP to another with low friction. Some examples of these commodity services are:

  • AWS EC2/Azure VM/Google Compute Engine
  • Containers, Kubernetes, Docker
  • AWS S3/Azure Blob Storage/Google Cloud Storage
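These commodity services expose broadly similar primitives, such as putting, getting, and listing objects. This lets application code target a thin, provider-neutral interface, so switching providers becomes a matter of swapping an adapter and copying the data. The Python sketch below illustrates the idea. `ObjectStore`, `InMemoryStore`, and `migrate` are hypothetical names. The in-memory class stands in for real adapters, each of which would wrap a CSP's own SDK.

```python
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """Hypothetical provider-neutral interface over an object storage service."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def list_keys(self) -> list[str]: ...


class InMemoryStore(ObjectStore):
    """Stand-in for a real adapter (e.g. one wrapping an S3, Azure Blob, or GCS client)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list_keys(self) -> list[str]:
        return sorted(self._objects)


def migrate(source: ObjectStore, target: ObjectStore) -> int:
    """Copy every object from source to target and return how many were copied."""
    copied = 0
    for key in source.list_keys():
        target.put(key, source.get(key))
        copied += 1
    return copied


if __name__ == "__main__":
    csp_a = InMemoryStore("csp-a")
    csp_b = InMemoryStore("csp-b")
    csp_a.put("reports/2023.csv", b"region,revenue\n")
    csp_a.put("images/logo.png", b"\x89PNG")
    print(migrate(csp_a, csp_b), "objects copied")
```

A real migration would also have to handle pagination, very large objects, metadata, access policies, and egress charges. Those are the places where the “commodity” abstraction starts to leak.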

Some of the services starting to emerge to lower migration friction are:

Business Continuity and Disaster Recovery

Cost reduction may be achieved by migrating some functionality from one CSP to another. In that case, once the application has been migrated, the original deployment will likely be decommissioned; otherwise, there will not be any cost savings. Another reason to use a multi-cloud deployment is to maintain business continuity and support disaster recovery. In this kind of deployment, the infrastructure supporting the application runs in both environments at the same time. If one of the CSPs suffers an outage, the other CSP can take the load, and the business can continue as usual.
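In practice, the failover described above is often handled by a DNS-based or global traffic-management layer that sits outside both CSPs. The Python sketch below illustrates only the routing decision itself: each deployment is checked in priority order, and traffic goes to the first healthy one. `Deployment` and `choose_endpoint` are hypothetical names, and so are the example URLs. The health check is modeled as a plain callable rather than any specific provider's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Deployment:
    """One copy of the application running in a given CSP (hypothetical model)."""
    provider: str
    endpoint: str
    is_healthy: Callable[[], bool]  # e.g. a wrapper around an HTTP health-check call


def choose_endpoint(deployments: list[Deployment]) -> str:
    """Return the endpoint of the first healthy deployment, in priority order."""
    for deployment in deployments:
        if deployment.is_healthy():
            return deployment.endpoint
    raise RuntimeError("no healthy deployment available in any provider")


if __name__ == "__main__":
    primary = Deployment("csp-a", "https://app.csp-a.example.com", lambda: False)  # simulated outage
    secondary = Deployment("csp-b", "https://app.csp-b.example.com", lambda: True)
    print(choose_endpoint([primary, secondary]))
```

The sketch leaves out the harder part of an active-active design: keeping application data consistent across both providers so the secondary can actually serve traffic.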

The services needed to support this kind of deployment differ from the migration services described in the previous section. Again, this kind of deployment is just beginning to emerge, and it adds a level of complexity to the environment. An example of this kind of architecture is a multi-cloud Kubernetes (K8s) deployment. Here are some of the third-party vendors that are beginning to provide cross-cloud K8s solutions:

Best-of-Breed Adoption

Certain proprietary native services from some of the CSPs are gaining significant market traction. In some cases, it may make sense to accept lock-in to a vendor’s service, even though migrating off that service in the future may prove cost-prohibitive. Possible examples:

Environmental homogeneity

All things being equal, it makes sense for the various cloud environments to be as homogeneous as possible. Homogeneity lowers training, deployment, and development costs, among others. CSPs, however, have little incentive to keep their environments consistent with other cloud providers. Sometimes an environment is different because it is better: it has more features and functions, it is faster, or it offers services that don’t exist in other clouds. Other times, it is different simply because it was built differently, and there is no compelling evidence that it is better than similar offerings.

There are, however, a variety of third-party vendors that have seen this need in the market and provide a consistent layer across cloud providers. Three of the best examples of such vendors are:

  • Terraform – A cross-cloud Infrastructure as Code (IaC) tool
  • Snowflake – A cross-cloud, highly scalable database provider
  • Databricks – A cross-cloud, Spark-based data processing provider with sophisticated machine learning offerings

Keep in mind that none of these solutions is open source, so using any of them shifts the vendor lock-in to another vendor. It does, however, reduce the dependency on any single cloud provider.
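Even with a single cross-cloud tool such as Terraform, the resource definitions themselves remain provider-specific. What the tool homogenizes is the language and the workflow. Some teams add a thin mapping layer on top to hide those differences. The Python sketch below shows the idea for a single “small VM” profile. The `SMALL_VM` table and the `render_vm` helper are hypothetical. The resource types and sizing attributes come from the public Terraform providers (`aws_instance.instance_type`, `azurerm_linux_virtual_machine.size`, `google_compute_instance.machine_type`). The chosen sizes are only roughly comparable (about 2 vCPU / 4 GiB), so check them against current provider documentation.

```python
# Hypothetical lookup: one provider-neutral "small VM" (about 2 vCPU / 4 GiB)
# mapped to roughly comparable Terraform resource types and sizes per CSP.
SMALL_VM = {
    "aws": {"resource": "aws_instance", "size_attr": "instance_type", "size": "t3.medium"},
    "azure": {"resource": "azurerm_linux_virtual_machine", "size_attr": "size", "size": "Standard_B2s"},
    "gcp": {"resource": "google_compute_instance", "size_attr": "machine_type", "size": "e2-medium"},
}


def render_vm(provider: str, name: str) -> str:
    """Render a partial Terraform (HCL) resource block for the small VM profile.

    Only the sizing attribute is emitted. Required arguments such as the
    image, networking, and zone or location differ per provider and are omitted.
    """
    spec = SMALL_VM[provider]
    return (
        f'resource "{spec["resource"]}" "{name}" {{\n'
        f'  {spec["size_attr"]} = "{spec["size"]}"\n'
        f"}}"
    )


if __name__ == "__main__":
    for provider in SMALL_VM:
        print(render_vm(provider, "web"))
```

Each entry in such a table is itself a small dependency on the abstraction layer. This mirrors the point above: lock-in is shifted, not eliminated.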

The three major CSPs (Microsoft, AWS, and Google) also provide services that let other environments act as an extension of their cloud. A description of each follows:

Azure Arc

Azure Arc is a set of technologies that brings Azure security and cloud-native services to hybrid and multi-cloud environments. It enables you to secure and govern infrastructure and apps anywhere, build cloud-native apps faster with familiar tools and run them on any Kubernetes platform, and modernize your data estate with Azure data and machine-learning services.

Azure Stack

Azure Stack enables you to build, deploy, and run applications consistently across a heterogeneous IT ecosystem, with flexibility for diverse workloads. It extends Azure services and capabilities to non-Azure environments such as:

  • On-prem datacenter
  • Edge locations
  • Remote offices

Google Anthos

Google Anthos unifies the management of infrastructure and applications across on-premises, edge, and multiple public cloud environments, with a Google Cloud-backed control plane for consistent operation at scale.

  • Build, deploy, and optimize apps on GKE and VMs anywhere—simply, flexibly, and securely
  • Consistent development and operations experience for hybrid and multi-cloud environments

AWS Outposts

AWS Outposts is a family of fully managed solutions delivering AWS infrastructure and services to virtually any on-premises or edge location for a truly consistent hybrid experience. AWS Outposts solutions allow you to extend and run native AWS services on-premises and are available in a variety of form factors.

AWS Outposts enables some AWS services to run locally and connect to a broad range of services available in the local AWS Region. It allows users to run applications and workloads on-premises using familiar AWS services, tools, and APIs. AWS Outposts supports workloads and devices requiring low latency access to on-premises systems, local data processing, data residency, and application migration with local system interdependencies.

Use cases for cloud service provider extension services

As impressive as it sounds to be able to run Azure services in AWS, or AWS services on-premises, this capability is not a silver bullet for homogenizing your systems. There may be some use cases where it makes sense to deploy Anthos, Outposts, or Azure Stack. However, using any of these services as a long-term, enterprise-wide solution is not cost-effective and does not eliminate the vendor lock-in problem.

In practice, these services are more often used to extend a cloud into on-premises systems than into other cloud providers.

Conclusion

Multi-cloud architectures offer benefits for today’s enterprises. More importantly, they can help enterprises deliver business value through cost optimization, best-of-breed service enablement, and business continuity. However, they also come with potential pitfalls and drawbacks, such as increased training costs, higher latency, and added complexity.

Developing and implementing a multi-cloud strategy that meets a particular enterprise’s requirements takes careful analysis, planning, and expertise. Before embarking on your multi-cloud journey, make sure you engage people with expertise and experience in multi-cloud environments.