Azure Confidential Computing: Confidential VMs

⚠️ 19 July 2022: Azure Confidential VMs are now generally available. This blog post has been revised.

In my previous blog post, I wrote about the state of confidential computing on Microsoft Azure. More specifically, I went over a few key IaaS services and zoomed in on the “Trusted Launch” capability for Azure virtual machines, since it is a key part of what powers confidential virtual machines. Since that post, I’ve felt that I perhaps spent a disproportionate amount of time writing about “Trusted Launch”.

With this blog post, I want to highlight some interesting things about confidential virtual machines, mainly the ones powered by AMD SEV-SNP-enabled processors.

Getting up to speed

If this is your first time reading about confidential computing, don’t worry. In this section, I will attempt to distil a summary…

When running sensitive workloads, you may need to determine how much trust you’re able to place in a cloud provider operator or any other operator that your organisation is working with. If you want to guarantee that your administrators, data center operators or other (il)legal third parties cannot access the data (which is being processed inside of your VM) without your consent, it might be worth looking into Azure’s confidential compute offerings.

💡 The mechanism used for achieving these types of confidentiality guarantees is called a hardware-based trusted execution environment (TEE). By opting to execute computations inside of a TEE, we can prevent unauthorized access or modification of applications and data while they are in use.

Microsoft Azure already protects data that is in transit (TLS/HTTPS) and at rest (client- and server-side encryption). Creating a mechanism to protect data while it is being processed is certainly worth pursuing, if you ask me.

Azure Confidential VMs can help you get to a place where you can achieve significant confidentiality guarantees, similar to a private data center, but with the scale and flexibility of the Azure platform. This can make Azure Confidential VMs’ value proposition compelling to those who may be on the fence about moving sensitive workloads into an Azure data center.

A tale of two CPU vendors

When picking a confidential computing VM SKU, you can choose between two CPU flavours: either a CPU that supports Intel Software Guard Extensions (Intel SGX), or one that supports AMD Secure Encrypted Virtualization - Secure Nested Paging (AMD SEV-SNP). Picking a SKU ultimately boils down to how large a trusted computing base you’re willing to take on. The more code we end up running inside the trusted execution environment, the larger the trusted computing base becomes!

📖 “The trusted computing base (TCB) refers to all of a system’s hardware, firmware, and software components that provide a secure environment. The components inside the TCB are considered “critical”. If one component inside the TCB is compromised, the entire system’s security may be jeopardized. A lower TCB means higher security.” - Azure docs - Reducing the attack surface

Both vendors offer a set of CPU features that enable confidential computing. However, even though both technologies let you create TEEs, they are used for very different scenarios.

Intel SGX

Intel SGX is a feature of a specific set of Intel processors that lets you create enclaves. These enclaves keep data encrypted in memory while the CPU is processing it. Nothing and no one is able to view the data from the outside.

Think of an enclave as a secured lockbox: you put encrypted code and data inside the lockbox, and from the outside you can’t see anything. The code inside the lockbox will happily do its thing and optionally output a result to the outside world. More specifically, enclaves get an isolated portion of processor and memory space. Code and data inside the enclave are inaccessible from the outside and cannot be tampered with, because the CPU blocks any access. The data cannot leave the box in unencrypted form.

An overview of a virtual machine with Intel SGX capabilities. The application running inside the operating system is partitioned into trusted and untrusted parts. The untrusted code requests the creation of an enclave when it receives a request. The untrusted code performs the attestation process and calls into the enclave via a proxy function. The enclave then executes a trusted piece of code and returns a result to the untrusted part of the application, which continues running as normal. Only the enclave portion of the application and the hardware are within the trust boundary. The operating system, Azure hypervisor, CPU BIOS and device drivers are outside the trust boundary.

It’s important to highlight that with Intel SGX you cannot trust the OS, even though you are running a confidential computing VM SKU. With Intel SGX, you will typically have an untrusted application making API calls into the SGX enclave, where trusted code performs computations on arbitrary data. The enclave code runs directly on the CPU, which removes the guest operating system, host OS and hypervisor from the trust boundary. This, in turn, keeps the trusted computing base rather small and less susceptible to attacks.

💡 Pick Intel SGX if you ever end up thinking: “For compliance reasons, I can only trust my app code and the Intel chip it runs on. I do not trust the operating system, nor can I trust any of the operators managing the machine.”

I personally think that if you want to take advantage of an Intel SGX confidential virtual machine, you will need to be comfortable diving into your application code. How deep you need to dive depends on the route you want to take:

  • Containerize your application.
    • You essentially wrap your application with an SGX runtime and thus create an SGX-capable confidential container.
  • Refactor parts of your application code to take advantage of Intel SGX.

AMD SEV-SNP

AMD Secure Encrypted Virtualization (SEV) is a technology designed in 2016 to isolate virtual machines from a hypervisor. Even though hypervisors are typically seen as trustworthy components in a security model, there are still customers that can benefit from having VM isolation at the hardware level.

An overview of a virtual machine with AMD SEV-SNP capabilities. The operating system has a portion of encrypted memory. The operating system and the hardware are within the trust boundary, as are all applications running within the operating system. The Azure hypervisor, CPU BIOS and device drivers are outside the trust boundary.

AMD approached this problem with main memory encryption. Each virtual machine gets a unique key that is used to automatically encrypt its data in memory. Should someone inspect the memory of a running guest machine, they’d only see encrypted bytes.

💡 Fun fact: the AMD hardware is designed in such a way that even identical pieces of plain text at different memory locations will be encrypted differently. Neat!

SEV Encrypted State (SEV-ES) was introduced shortly thereafter; it encrypts the guest VM’s CPU register state. This ensures that the register state is encrypted on each hypervisor/guest transition and cannot be read or modified by the hypervisor. Perhaps unsurprisingly, this protects the workload against unauthorized reading and modification attacks (exfiltration, control-flow and rollback attacks).

That brings us to 2020, when AMD released “Secure Nested Paging” (SNP). SNP builds on SEV and SEV-ES and adds integrity protection of VM memory, amongst other things. What sort of hypervisor-based attacks does this protect against?

  • Guest data corruption: an attacker writes arbitrary garbage bytes into guest memory; the garbage does not even have to be valid ciphertext.
  • Guest replay attack: replacing parts of guest memory with older ciphertext that was captured at some earlier point in time.
  • Guest memory aliasing attack: mapping two guest pages to the same dynamic RAM (DRAM) page.
  • Memory re-mapping attack: silently mapping a guest page to a different DRAM page.

From what I was able to gather, many of these integrity guarantees are made possible by a system-wide data structure called the “Reverse Map Table” (RMP). Its purpose is to track the owner of each page of memory, which can be the hypervisor, a specific VM or the AMD secure processor. Access to memory is controlled so that only the owner of a specific page can write to it.
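To make the ownership idea a bit more concrete, here is a toy Bash model of an RMP check. It is a deliberate simplification under my own assumptions: the function names and page addresses are hypothetical, real RMP entries hold more state than this, and the sketch simply follows the “only the owner may write” rule described above. Besides the owner, each entry records the guest address a page is assigned to, which is roughly how ownership checks and re-mapping/aliasing detection fit together.

```bash
#!/usr/bin/env bash
# Toy model of an RMP lookup (hypothetical names; a simplification, not AMD's data layout).
# Each system page records an owner and, for guest-owned pages, the guest address it is assigned to.
declare -A RMP_OWNER RMP_GPA

# rmp_assign <system-page> <owner> <guest-address>
rmp_assign() {
  RMP_OWNER[$1]=$2
  RMP_GPA[$1]=$3
}

# rmp_check_write <system-page> <requester> [guest-address]
# Pages without an entry default to the hypervisor in this sketch.
rmp_check_write() {
  local page=$1 requester=$2 gpa=${3:-}
  local owner=${RMP_OWNER[$page]:-hypervisor}
  if [[ "$owner" != "$requester" ]]; then
    echo "fault: page owned by $owner"       # e.g. the hypervisor writing into guest memory
    return 1
  fi
  if [[ "$owner" != hypervisor && "${RMP_GPA[$page]}" != "$gpa" ]]; then
    echo "fault: unexpected guest address"   # e.g. a page that was re-mapped or aliased
    return 1
  fi
  echo "ok"
}

# Example: system page 0x1000 belongs to vm1 and is mapped at guest address 0x2000.
rmp_assign 0x1000 vm1 0x2000
rmp_check_write 0x1000 vm1 0x2000            # prints "ok"
rmp_check_write 0x1000 hypervisor || true    # prints "fault: page owned by vm1"
```

The point of the model is that the hardware, not the hypervisor, makes the final call: even a hypervisor that controls the page tables cannot write into a page it does not own, or quietly swap which physical page backs a guest address.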

Communication with resources outside the confidential virtual machine, for instance when you send a file to a file server, goes through shared memory and is unencrypted! I think this makes sense: if the data were sent out encrypted with the VM-specific key, the receiver would not be able to interpret it. Inbound and outbound data must be placed in this shared, unencrypted memory, so you must continue to use encryption protocols (such as TLS) and security best practices.

💡 If you’d like to learn more about the internals of AMD SEV-SNP, I highly recommend reading AMD’s “AMD SEV-SNP: Strengthening VM Isolation with Integrity Protection and More” whitepaper. It is quite an interesting read.

AMD SEV-SNP’s implementation of a trusted execution environment offers much more flexibility than Intel SGX. However, because the entire operating system is inside the TEE, the trusted computing base is substantially larger, which means a larger attack surface. You should continue to follow best practices for keeping systems up to date: applying security patches to your OS, keeping application dependencies up to date, etc.

💡 Pick AMD SEV-SNP if you ever end up thinking: “For compliance reasons, Microsoft cannot touch anything that may be running inside of my virtual machine, but I do trust the operating system and VM administrators that have access to the machine.”

Full disk encryption

Confidential virtual machines use a small, encrypted virtual machine guest state (VMGS) disk of several megabytes. The VMGS disk encapsulates the security state of components such as the vTPM and the UEFI bootloader.

Due to the way that SEV-SNP protects data in use, AMD highly recommends pairing the technology with a full disk encryption solution. Microsoft Azure already provides encryption at rest out of the box; however, this is not quite the same as full disk encryption (FDE).

In a confidential VM setting, full disk encryption encrypts all disk partitions, including the boot and root partitions. This does not mean that standard disk encryption with BitLocker or dm-crypt is no longer an option: these options are still available and can work alongside FDE.

💡 You can also bring your own key if you want to, which is an added plus when dealing with certain compliance regulations. You can combine this with Azure Key Vault or its sibling backed by dedicated hardware security modules, Azure Key Vault Managed HSM.

The Azure CVM team worked with the Windows and Ubuntu OS teams to make the OS-level changes needed to support FDE. Should you choose to enable FDE, the initial deployment of the CVM will take a few minutes longer, since the OS disk must be encrypted before the first boot.

The key used by the full disk encryption process is bound to the CVM and protected by its virtual Trusted Platform Module (vTPM). The key itself is “cryptographically sealed to a trusted measurement of the platform”. This means that the disk encryption key can only be recovered, and the VM can only boot, when all boot components (bootloader, etc.) are in their expected, valid state. If the system has been tampered with in any way, or the VM has a different vTPM, the disk and its contents will not be accessible.
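The idea of sealing a key to a measurement can be illustrated with a toy Bash sketch. It borrows the TPM-style “extend” operation (the new register value is the SHA-256 of the old value concatenated with the digest of the next boot component) and only releases a key when the boot chain reproduces the value recorded at sealing time. Everything here is an assumption for illustration: the component names are made up, a single register stands in for the vTPM’s PCRs, and a string comparison stands in for the vTPM’s real policy enforcement.

```bash
#!/usr/bin/env bash
# Toy illustration of sealing a secret to boot measurements (not the real vTPM implementation).

sha256_hex() { sha256sum | cut -d' ' -f1; }

# Convert a hex string into raw bytes, e.g. "0a1b" -> bytes 0x0a 0x1b.
hex_to_bin() { printf "$(sed 's/../\\x&/g' <<< "$1")"; }

# TPM-style extend: PCR_new = SHA-256(PCR_old || SHA-256(component))
extend() {
  local pcr=$1 component=$2 digest
  digest=$(printf '%s' "$component" | sha256_hex)
  { hex_to_bin "$pcr"; hex_to_bin "$digest"; } | sha256_hex
}

# Measure a boot chain, starting from an all-zero 32-byte register.
measure_boot() {
  local pcr component
  pcr=$(printf '0%.0s' {1..64})
  for component in "$@"; do
    pcr=$(extend "$pcr" "$component")
  done
  echo "$pcr"
}

# "Seal": remember the measurement of the known-good boot chain (hypothetical component names).
SEALED_PCR=$(measure_boot "uefi-firmware" "bootloader" "kernel")

# "Unseal": only release the disk key if the current boot chain reproduces that measurement.
unseal() {
  if [[ "$(measure_boot "$@")" == "$SEALED_PCR" ]]; then
    echo "disk key released"
  else
    echo "unseal refused"
  fi
}

unseal "uefi-firmware" "bootloader" "kernel"            # disk key released
unseal "uefi-firmware" "tampered-bootloader" "kernel"   # unseal refused
```

Because each extend hashes over the previous value, changing, adding or even reordering a single boot component produces a completely different final measurement, so the key stays locked.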

⚠️ Upon enabling Azure Full Disk Encryption, encrypted OS disks will incur higher costs! This is because encrypted OS disks use more space, and compression isn’t possible.

Confidential VM and Azure Support

A fun fact regarding support for confidential VMs: Azure support engineers can’t create a usable memory dump of your VM, even when such an action is required. They can technically create a memory dump, but its contents will be encrypted.

However, the vast majority of VM support tickets do not require Azure engineers to take memory dumps of a guest VM. I nevertheless thought it would be good to highlight this, since it shows that a confidential VM with AMD SEV-SNP can deliver on its promises.

Azure CVM quirks

There are a few particularities to confidential VMs, from both technical and financial perspectives, that we must take into consideration when we architect them into our solutions. At the time of writing, several technical features are still not available for confidential VM SKUs.

Unsupported features in DCasv5, DCadsv5, ECasv5 and ECadsv5 SKUs:

  • Azure Batch
  • Azure Backup (and Restore)
    • Coming soon
  • Azure Dedicated Host
  • Azure Site Recovery
  • Azure Virtual Machine Scale Sets for encrypted OS disks
    • Coming soon
  • Capturing an image of a VM
  • Live Migration
  • Memory Preserving Updates
  • Accelerated Networking
  • Nested Virtualization
  • Azure Compute Gallery
    • Coming soon
  • Shared disks
  • Ultra disks
  • User-attestable platform reports
    • Coming soon

I checked two sources to compile this overview: the Azure Virtual Machines SKU documentation and the Azure confidential VM overview documentation.

Deployment of a CVM

Deploying a typical AMD SEV-SNP VM with full disk encryption using a platform-managed key is incredibly similar to deploying other types of VMs, which is a very good thing, since that is its value proposition, after all.

There are only a few things that stand out here:

  • Picking the correct VM SKU that supports AMD SEV-SNP.
    • I have limited the available SKUs in the Bicep template.
    • Keep in mind that you can pick SKUs that go up to 96 cores!
  • Enabling Trusted Launch in the VM’s ‘securityProfile’ object.
    • We must enable the ‘uefiSettings’ (Secure Boot and vTPM); to do this, we are also setting the ‘securityType’ property of the VM’s ‘securityProfile’ to ‘ConfidentialVM’.
    • We would not be able to enable the vTPM and Secure Boot (which make up Trusted Launch) if we didn’t configure these values.
  • Setting the ‘securityEncryptionType’ property in the OS disk’s ‘managedDisk.securityProfile’ to ‘DiskWithVMGuestState’ to enable full disk encryption (FDE).
  • Picking a generation 2 OS image.

targetScope = 'resourceGroup'

@description('Required. Name of the Virtual Machine.')
param vmName string

@description('Required. Location of the Virtual Machine.')
@allowed([
  'WestUS'
  'NorthEurope'
  'WestEurope'
])
param location string

@description('Required. Admin username of the Virtual Machine.')
param adminUsername string

@description('Required. Password or ssh key for the Virtual Machine.')
@secure()
param adminPasswordOrKey string

@description('Optional. Size of the VM.')
@allowed([
  'Standard_DC2as_v5'
  'Standard_DC2ads_v5'
  'Standard_EC2as_v5'
  'Standard_EC2ads_v5'
  // goes up to 96 core variants
])
param vmSize string = 'Standard_DC2as_v5'

@description('Optional. OS Image for the Virtual Machine')
@allowed([
  'Windows Server 2022 Gen 2'
  'Windows Server 2019 Gen 2'
  'Ubuntu 20.04 LTS Gen 2'
])
param osImageName string = 'Windows Server 2022 Gen 2'

@description('Optional. OS disk type of the Virtual Machine.')
@allowed([
  'Premium_LRS'
  'Standard_LRS'
  'StandardSSD_LRS'
])
param osDiskType string = 'Premium_LRS'

@description('Optional. Type of authentication to use on the Virtual Machine.')
@allowed([
  'password'
  'sshPublicKey'
])
param authenticationType string = 'password'

@description('Optional. Enable boot diagnostics setting of the Virtual Machine.')
@allowed([
  true
  false
])
param bootDiagnostics bool = false

@description('Optional. Specifies the EncryptionType of the managed disk. It is set to DiskWithVMGuestState for encryption of the managed disk along with VMGuestState blob, and VMGuestStateOnly for encryption of just the VMGuestState blob. NOTE: It can be set for only Confidential VMs.')
@allowed([
  'VMGuestStateOnly' // virtual machine guest state (VMGS) disk
  'DiskWithVMGuestState' // Full disk encryption
])
param securityType string = 'DiskWithVMGuestState'

var imageList = {
  'Windows Server 2022 Gen 2': {
    publisher: 'microsoftwindowsserver'
    offer: 'windowsserver'
    sku: '2022-datacenter-smalldisk-g2'
    version: 'latest'
  }
  'Windows Server 2019 Gen 2': {
    publisher: 'microsoftwindowsserver'
    offer: 'windowsserver'
    sku: '2019-datacenter-smalldisk-g2'
    version: 'latest'
  }
  'Ubuntu 20.04 LTS Gen 2': {
    publisher: 'Canonical'
    offer: '0001-com-ubuntu-confidential-vm-focal'
    sku: '20_04-lts-cvm'
    version: 'latest'
  }
}

var virtualNetworkName = '${vmName}-vnet'
var subnetName = '${vmName}-vnet-sn'
var subnetResourceId = resourceId('Microsoft.Network/virtualNetworks/subnets', virtualNetworkName, subnetName)
var addressPrefix = '10.0.0.0/16'
var subnetPrefix = '10.0.0.0/24'

var isWindows = contains(osImageName, 'Windows')

resource publicIPAddress 'Microsoft.Network/publicIPAddresses@2019-02-01' = {
  name: '${vmName}-ip'
  location: location
  sku: {
    name: 'Basic'
  }
  properties: {
    publicIPAllocationMethod: 'Dynamic'
  }
}

resource networkSecurityGroup 'Microsoft.Network/networkSecurityGroups@2019-02-01' = {
  name: '${vmName}-nsg'
  location: location
  properties: {
    securityRules: [
      {
        name: (isWindows ? 'RDP' : 'SSH')
        properties: {
          priority: 100
          protocol: 'Tcp'
          access: 'Allow'
          direction: 'Inbound'
          sourceAddressPrefix: '*'
          sourcePortRange: '*'
          destinationAddressPrefix: '*'
          destinationPortRange: (isWindows ? '3389' : '22')
        }
      }
    ]
  }
}

resource virtualNetwork 'Microsoft.Network/virtualNetworks@2019-09-01' = {
  name: virtualNetworkName
  location: location
  properties: {
    addressSpace: {
      addressPrefixes: [
        addressPrefix
      ]
    }
    subnets: [
      {
        name: subnetName
        properties: {
          addressPrefix: subnetPrefix
          networkSecurityGroup: {
            id: networkSecurityGroup.id
          }
        }
      }
    ]
  }
}

resource networkInterface 'Microsoft.Network/networkInterfaces@2019-07-01' = {
  name: '${vmName}-nic'
  location: location
  properties: {
    ipConfigurations: [
      {
        name: 'ipconfig1'
        properties: {
          privateIPAllocationMethod: 'Dynamic'
          subnet: {
            id: subnetResourceId
          }
          publicIPAddress: {
            id: publicIPAddress.id
          }
        }
      }
    ]
    networkSecurityGroup: {
      id: networkSecurityGroup.id
    }
  }
  dependsOn:[
    virtualNetwork
  ]
}

resource confidentialVm 'Microsoft.Compute/virtualMachines@2021-11-01' = {
  name: vmName
  location: location
  properties: {
    diagnosticsProfile: {
      bootDiagnostics: {
        enabled: bootDiagnostics
      }
    }
    hardwareProfile: {
      #disable-next-line BCP036
      vmSize: vmSize
    }
    storageProfile: {
      osDisk: {
        createOption: 'FromImage'
        managedDisk: {
          storageAccountType: osDiskType
          securityProfile: {
            securityEncryptionType: securityType
          }
        }
      }
      imageReference: imageList[osImageName]
    }
    networkProfile: {
      networkInterfaces: [
        {
          id: networkInterface.id
        }
      ]
    }
    osProfile: {
      computerName: vmName
      adminUsername: adminUsername
      adminPassword: adminPasswordOrKey
      linuxConfiguration: ((authenticationType == 'password') ? null : {
        disablePasswordAuthentication: true
        ssh: {
          publicKeys: [
            {
              keyData: adminPasswordOrKey
              path: '/home/${adminUsername}/.ssh/authorized_keys'
            }
          ]
        }
      })
      windowsConfiguration: (!isWindows ? null : {
        enableAutomaticUpdates: true
        provisionVMAgent: true
      })
    }
    securityProfile: {
      uefiSettings: {
        secureBootEnabled: true
        vTpmEnabled: true
      }
      securityType: 'ConfidentialVM'
    }
  }
}

Once the FDE process has been completed, you should have a running confidential VM. As we discussed earlier, enabling FDE can slow down the process of creating a VM by a few minutes.

As you can tell from the screenshot below, the features we’ve requested have been enabled.

A screenshot of the Azure Portal. The virtual machine plan is set to ‘2022-datacenter-smalldisk-g2’ and the virtual machine generation is set to ‘V2’. The virtual machine’s security type is set to ‘confidential’, and the Secure Boot and vTPM features are set to ‘enabled’. The virtual machine OS disk has no encryption at the host level, and Azure Disk Encryption is not enabled.

If you open the disk’s blade, you should see that the OS disk has “Server-side encryption with platform-managed keys and Confidential VM with platform-managed keys” enabled.

A screenshot of the disk’s blade, OS disk has “Server-side encryption with platform-managed keys and Confidential VM with platform-managed keys” support enabled.

We can also take a look at the OS disk encryption by inspecting the managed disk resource itself. As expected, this tells us three things:

  • “Confidential compute encryption” has been enabled.
  • “Confidential compute encryption type” has been set to “confidential disk encryption with a platform-managed key”.
  • The “Encryption type”, used for encryption at rest, is set to use a platform-managed key which is also the default.

A screenshot of the disk’s blade, OS disk encryption. “Confidential compute encryption” has been enabled. “Confidential compute encryption type” has been set to “confidential disk encryption with a platform-managed key”. The “Encryption type”, used for encryption at rest, is set to use a platform-managed key, which is also the default.
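If you prefer the command line over the portal, the same information lives in the managed disk’s ‘securityProfile.securityType’ property. The small Bash helper below translates those values into wording similar to what the portal shows. The helper name is hypothetical, and the value list is my assumption based on the disk security type values in the Azure Compute REST API; double-check it against the API version you deploy with.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a disk's securityProfile.securityType value to a readable description.
# Value names assumed from the Azure Compute REST API (DiskSecurityTypes).
describe_disk_security_type() {
  case "$1" in
    ConfidentialVM_DiskEncryptedWithPlatformKey)
      echo "confidential disk encryption with a platform-managed key" ;;
    ConfidentialVM_DiskEncryptedWithCustomerKey)
      echo "confidential disk encryption with a customer-managed key" ;;
    ConfidentialVM_VMGuestStateOnlyEncryptedWithPlatformKey)
      echo "only the VM guest state (VMGS) is encrypted" ;;
    TrustedLaunch)
      echo "Trusted Launch, no confidential disk encryption" ;;
    *)
      echo "unknown or not set: '$1'" ;;
  esac
}

# Usage (requires the Azure CLI and an existing disk; <rg> and <disk> are placeholders):
#   describe_disk_security_type "$(az disk show --resource-group <rg> --name <disk> \
#     --query securityProfile.securityType --output tsv)"
```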

On the operating system side of things, we can check whether the VM exposes the AMD SEV-SNP features. On Windows, we can query this through the msinfo32 application: simply look for the ‘Virtual Machine Isolation properties’ entry.

A screenshot of ‘msinfo32’, a.k.a. System Information. The “Virtual Machine Isolation properties” key has a value set to ‘AMD SEV-SNP’. Virtual machine isolation is set to ‘enabled’.

On Linux, you can use cpuid to query a specific leaf (0x4000000C) for similar information.

sudo apt update
sudo apt install cpuid
cpuid -l 0x4000000C -1 | awk '$4 ~ /^ebx=.*2$/ { print "AMD SEV-SNP is enabled"}'

# AMD SEV-SNP is enabled
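The awk filter above only checks whether the EBX value ends in the hex digit 2. As far as I can tell from the Linux kernel’s Hyper-V definitions, bits 3:0 of EBX in leaf 0x4000000C encode the isolation type (0 = none, 1 = VBS, 2 = SNP, 3 = TDX), so “ends in 2” is the same as “isolation type is SNP”. A slightly more explicit Bash sketch decodes that nibble with shell arithmetic; treat the mapping as an assumption rather than official documentation, and the function name as my own.

```bash
#!/usr/bin/env bash
# Decode the isolation type from the EBX value of Hyper-V CPUID leaf 0x4000000C.
# Assumption: bits 3:0 hold the isolation type, mirroring the Linux kernel's Hyper-V definitions.
isolation_type() {
  local ebx=$(( $1 ))              # accepts hex input such as 0x00000ba2
  case $(( ebx & 0xF )) in          # keep only bits 3:0
    0) echo "none" ;;
    1) echo "VBS" ;;
    2) echo "AMD SEV-SNP" ;;
    3) echo "Intel TDX" ;;
    *) echo "unknown" ;;
  esac
}

# Usage on a CVM, reusing the field layout assumed by the awk command above:
#   ebx=$(cpuid -l 0x4000000C -1 | awk '$4 ~ /^ebx=/ { sub(/^ebx=/, "", $4); print $4 }')
#   isolation_type "$ebx"
```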

Even though we can check these values and gain some degree of confidence that nothing has been tampered with, Microsoft warns against relying on CPUID or msinfo32 to check the CPU features, as this does not provide the same guarantees as the attestation process.

📖 “Attestation is the process by which one party, the verifier, assesses the trustworthiness of a potentially untrusted peer, the attester. The goal of attestation is to allow the verifier to gain confidence in the trustworthiness of the attester by obtaining an authentic, accurate, and timely report about the software and data state of the attester.” - Confidential Computing Consortium’s “A Technical Analysis of Confidential Computing v1.2”

According to the CVM FAQ, confidential virtual machines will eventually let users perform independent attestation of their virtual machines; user-attestable platform reports are still listed as coming soon. Until then, you will need to place your trust in the way Azure handles CVM attestation with Azure Attestation.
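The three properties in that definition (authentic, accurate, timely) map neatly onto the checks a verifier performs. The toy Bash sketch below is purely my own illustration and not how Azure Attestation or AMD’s report format works: a keyed SHA-256 hash stands in for the hardware signature (real SEV-SNP reports are signed by the AMD secure processor), and a report is just “nonce|measurement”. All names and values are hypothetical.

```bash
#!/usr/bin/env bash
# Toy attestation check. Illustration only: a keyed SHA-256 hash is NOT a secure signature scheme.
PLATFORM_KEY="hypothetical-platform-secret"   # stand-in for the hardware-held signing key

# The "attester" produces a signature over its report.
sign_report() { printf '%s|%s' "$PLATFORM_KEY" "$1" | sha256sum | cut -d' ' -f1; }

# verify_report <report> <signature> <expected-nonce> <expected-measurement>
# In this sketch a report is "nonce|measurement".
verify_report() {
  local report=$1 signature=$2 expected_nonce=$3 expected_measurement=$4
  local nonce measurement
  IFS='|' read -r nonce measurement <<< "$report"
  # Authentic: the report was produced by the trusted platform and not modified.
  [[ "$(sign_report "$report")" == "$signature" ]] || { echo "rejected: not authentic"; return 1; }
  # Timely: the report echoes the fresh nonce the verifier just sent, so it is not a replay.
  [[ "$nonce" == "$expected_nonce" ]] || { echo "rejected: stale report"; return 1; }
  # Accurate: the measured state matches a known-good reference value.
  [[ "$measurement" == "$expected_measurement" ]] || { echo "rejected: unexpected measurement"; return 1; }
  echo "trusted"
}

nonce="n-$RANDOM"                        # the verifier picks a fresh nonce per request
report="$nonce|known-good-boot-chain"
verify_report "$report" "$(sign_report "$report")" "$nonce" "known-good-boot-chain"   # prints "trusted"
```

The order of the checks is a design choice: there is no point in inspecting the nonce or the measurement of a report whose origin cannot be trusted in the first place.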

Conclusion

I did not have to delve this deep into the inner workings of AMD SEV-SNP and how it fits into the confidential virtual machine space, but I am glad that I did. It was a fun topic to dive into, because it gave me the opportunity to look up and learn about very processor-specific features.

I should mention again that the value proposition of this offering stems from the fact that you do not actually need to learn anything about AMD SEV-SNP. Instead, the confidential VM offering with AMD SEV-SNP has been designed as a one-button approach to bubble-wrapping your virtual machine with confidential compute goodness. Even though it is still missing a few crucial features (Azure Backup, VMSS and user-attestable reporting), I’m optimistic that these will become available soon.

I should also mention that Eden Cohen, who works for Microsoft, recently delivered a great webinar that goes over many of the things I wrote about and paints a very clear picture of how Azure confidential virtual machines are evolving. I really recommend checking out this webinar if you’d like some additional information.

By the way, I think confidential computing is a complex topic. I might have had an inaccuracy or two, maybe even three, creep into this post. If you’ve spotted one, or several, please let me know. I will be forever grateful. 😅