Nutanix Xi Clusters in AWS – The True Hybrid
Nutanix brings a first-of-its-kind hybrid cloud offering that delivers true hybridity and true elasticity. Let us take a deeper look:
True Hybridity
- Many customers have existing AWS accounts. True hybridity calls for using the existing AWS accounts, VPCs, VPNs, and Direct Connects while bringing the private and public clouds together. With Xi Clusters, current AWS customers can launch Nutanix Enterprise Cloud OS within their existing environments, without creating a new AWS account, new VPCs, or new WAN networking.
- True hybridity also means bringing cloud-native services together with the classic apps and containers running on Nutanix Enterprise Cloud OS, without going through inefficient network gateways or VPC peering. With Xi Clusters, not only can the classic apps be on the same subnets as the cloud-native services and apps, but they also get native network performance with minimal overhead. This also simplifies migrating apps from Xi Clusters to native AWS EC2 and vice versa, without IP address changes or any network reconfiguration.
- One of the key aspects of hybrid is being able to manage both the private and public sides of the infrastructure through the same console, without adding management overhead on the public cloud side. Xi Clusters simply brings up AOS nodes on AWS bare metal and manages them from an existing Prism Central, imposing no networking management VMs or networking gateway VMs.
True Elasticity
- Cloud infrastructure should allow quick burstability. AWS provides an elastic bare metal service in EC2. Xi Clusters allows customers to spin up clusters on demand and in minutes. The cloud infrastructure is available from AWS at an hourly granularity. As the capacity requirement of a cluster increases or decreases, nodes can be added or removed on demand.
- Cloud infrastructure should accommodate the sporadic nature of business without the need to recreate or migrate the assets each time. Xi Clusters allows a running cluster, along with its VMs, to be hibernated into AWS S3 for any period of time, another industry first. During hibernation, no compute costs are incurred. Whenever the workloads are required to run again, the Xi Cluster can be resumed and all the workloads are brought back to life. This provides an elastic infrastructure for seasonal but stateful workloads.
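The effect of hourly billing combined with hibernation can be illustrated with a little arithmetic. The sketch below is only a toy model: the `Period` type, the `compute_cost` helper, and the hourly rate are hypothetical values chosen for illustration, not Nutanix or AWS pricing, and it counts compute cost only (any S3 storage cost during hibernation is ignored).

```python
from dataclasses import dataclass


@dataclass
class Period:
    """A stretch of time during which the cluster size is constant."""
    hours: int
    nodes: int  # 0 nodes models a hibernated cluster (no compute billed)


def compute_cost(periods, hourly_rate_per_node):
    """Sum node-hours across all periods and price them at an hourly rate."""
    return sum(p.hours * p.nodes * hourly_rate_per_node for p in periods)


# Hypothetical rate, purely for illustration.
HOURLY_RATE = 5.0

# A 30-day month (720 hours) with a fixed 4-node cluster that never stops.
always_on = [Period(hours=720, nodes=4)]

# The same month for a seasonal workload: normal load, a short burst
# to 6 nodes, then hibernation for the remaining hours.
seasonal = [
    Period(hours=200, nodes=4),
    Period(hours=40, nodes=6),
    Period(hours=480, nodes=0),
]

always_on_cost = compute_cost(always_on, HOURLY_RATE)
seasonal_cost = compute_cost(seasonal, HOURLY_RATE)
print(f"always on: {always_on_cost:.2f}, seasonal: {seasonal_cost:.2f}")
```

Under these assumed numbers, the hibernated hours contribute nothing to the compute bill, while the burst to six nodes is paid for only during the hours it lasts.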
Xi Clusters Design Choices
Account Management
We had a choice between creating a new AWS account for the customer or using an existing one to manage Xi Clusters. Using a new AWS account would have given a clean working space and made it somewhat easier to build the product. However, from a customer’s perspective that would have been less than optimal, because customers could not use their existing accounts and credits with AWS. Hence, we decided not to burden the customer with new AWS accounts; their current accounts can be used. The customer is billed directly by AWS for the infrastructure spend and pays Nutanix only for the software, for the duration the Xi Clusters are in use.
Networking Design
We had a choice between deploying the Nutanix VMs on an overlay network (using VXLAN) on top of the AWS subnets, or deploying the Nutanix VMs directly on the AWS subnets. Deploying an overlay network provides for easier integration with the underlying cloud networking because nothing needs to change in the way the hypervisor does IP address management.
However, choosing an overlay presents many challenges:
- Running an overlay requires management VMs (at least a controller and a couple of Network Edge gateways). That overhead runs against our simple and efficient mantra.
- Encapsulating traffic adds non-trivial CPU overhead, and achieving bandwidths higher than 10 Gbit/s becomes hard.
- When IP addresses on the overlay talk to IP addresses in native AWS EC2, the traffic goes through the Network Edge gateways. That creates a performance bottleneck and, unless the gateways are scaled out (causing additional overhead), may lead to downtime during upgrades.
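To make the encapsulation cost a little more concrete, here is a minimal sketch of the per-packet header bytes a VXLAN overlay adds. It assumes the standard VXLAN framing over IPv4 with no outer VLAN tag (outer Ethernet 14 bytes, outer IPv4 20 bytes, UDP 8 bytes, VXLAN header 8 bytes). The function name and the sample frame sizes are illustrative, and the sketch covers only bytes on the wire, not the CPU cost of encapsulating and decapsulating that the list above refers to.

```python
# Header sizes in bytes for VXLAN over IPv4, without an outer VLAN tag.
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8

VXLAN_OVERHEAD = OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER


def wire_efficiency(inner_frame_bytes):
    """Fraction of bytes on the wire that belong to the original frame."""
    return inner_frame_bytes / (inner_frame_bytes + VXLAN_OVERHEAD)


# Illustrative inner Ethernet frame sizes: a minimum-size frame and a
# full frame carrying a 1500-byte IP packet (1500 + 14-byte header).
for size in (64, 1514):
    print(f"{size}-byte frame: {wire_efficiency(size):.1%} of wire bytes are payload")
```

The fixed header cost weighs most heavily on small packets, which is one reason an overlay is harder to justify for chatty, latency-sensitive traffic.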
Hence, we decided to explore a more native integration with AWS EC2 networking. This new native networking model has the following features:
- No overlay is needed, hence there are no VMs acting as network controllers or network edge gateways. Zero management VMs are required, which saves expensive cloud resources and reduces management complexity.
- The VMs running on Nutanix AHV are assigned IP addresses that are provided by native AWS networking and recognized by AWS switching fabric.
- When VMs talk to each other within the Nutanix Xi Clusters or to native EC2 VMs, they do not have to go through any gateways but rather are directly switched by AWS. This allows user VMs to talk natively to cloud services without going through any translation of packets from overlay to underlay. This results in high performance and low latency networking.
- To achieve the above, AHV has been modified to add deep integration for AWS networking.
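One consequence of this model is that user VMs draw their addresses from the same AWS subnet ranges as native EC2 instances. The sketch below illustrates that idea only; it is not how AHV integrates with AWS, which is not described here. The helper names, the subnet CIDR, and the VM names are hypothetical. The one AWS-specific fact it relies on is that AWS reserves the first four addresses and the last address of every subnet.

```python
import ipaddress

# AWS reserves these addresses in every subnet.
AWS_RESERVED_HEAD = 4  # network address, VPC router, DNS, reserved for future use
AWS_RESERVED_TAIL = 1  # network broadcast address


def usable_addresses(cidr):
    """Return the addresses in an AWS subnet that can be given to workloads."""
    addresses = list(ipaddress.ip_network(cidr))
    return addresses[AWS_RESERVED_HEAD:len(addresses) - AWS_RESERVED_TAIL]


def assign(cidr, in_use, vm_names):
    """Give each VM the next free address in the subnet (toy allocator)."""
    taken = {ipaddress.ip_address(a) for a in in_use}
    free = [a for a in usable_addresses(cidr) if a not in taken]
    if len(vm_names) > len(free):
        raise ValueError("not enough free addresses in subnet")
    return {name: str(addr) for name, addr in zip(vm_names, free)}


# Hypothetical subnet shared by native EC2 instances and user VMs.
subnet = "10.0.1.0/28"
native_ec2 = ["10.0.1.4", "10.0.1.5"]  # already used by EC2 instances
print(assign(subnet, native_ec2, ["vm-a", "vm-b"]))
```

Because both kinds of workload sit in one flat address range, no translation between overlay and underlay addresses is involved when they talk to each other.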
Xi Clusters Architecture
Xi Clusters are designed to look virtually the same as on-premises Nutanix clusters. These clusters run the complete Nutanix AOS and AHV stack with no change in CLI, UI or APIs. This allows existing IT processes and third-party integrations that work on-premises to continue to work with Xi Clusters in AWS.
With Xi Clusters, the complete Nutanix HCI stack runs directly on AWS EC2 bare metal instances. Each bare metal instance runs the AHV hypervisor and, just like any on-premises deployment, a Controller Virtual Machine (CVM) with direct access to the NVMe instance storage hardware. The Nutanix AOS software provides high-performance, low-latency and highly available storage using these local NVMe disks. Xi Clusters running in AWS can be managed by an existing on-premises Prism Central or by a Prism Central deployed on Xi Clusters in AWS.