Azure Confidential Computing: CoCo - Confidential Containers

At the end of my April 2022 blog post, “Azure Confidential Computing: IaaS”, I wrote about the different options you have at your disposal in Azure when it comes to running applications in a Confidential Compute setting. I wrapped up that post with a contemplative statement:

📖 “I am very tempted to take a closer look and test out how easy it is to lift-and-shift existing container images and run them inside of an enclave. Perhaps I should write about that next. 🤔”

Fast forward almost two years, and it feels like the world of confidential computing has seen significant advancements, especially since early 2022. Speaking subjectively for a moment: there seems to be an uptick in discussions around this topic. In 2023 alone, there were some great sessions about Confidential Computing at conferences like FOSDEM, KubeCon, and Open Source Summit. To me, this indicated growing interest and innovation in this space. So it felt like the right time to delve deeper into the various options available and, after some consideration, I settled on exploring the Confidential Containers (CoCo) project.

After investing quite a bit of time digging into Confidential Computing, I finally feel that I am in a position to understand what it takes to run this technology. It has taken me some time and effort to get to this point, partially because many of the technologies I was learning about were still in flux. However, now it feels like we’ve reached a point where these technologies are becoming more mainstream. For me, grasping the underlying workings of these technologies was crucial to determining their suitability for real-world projects.

It’s worth noting that confidential computing isn’t untested; Microsoft, for instance, uses it extensively as part of its “Secure Future Initiative” (SFI). This initiative aims to bolster cybersecurity protection by leveraging various pillars, including AI-based cyber defenses and advancements in software engineering. If you’ve been following the news as of late, this shouldn’t really come as a surprise.

📖 “As part of this initiative, we also will migrate to a new and fully automated consumer and enterprise key management system with an architecture designed to ensure that keys remain inaccessible even when underlying processes may be compromised. This will build upon our confidential computing architecture and the use of hardware security modules (HSMs) that store and protect keys in hardware and that encrypts data at rest, in transit, and during computation.” - Secure Future Initiative

New to Confidential Computing?

Confidential computing solutions center around “attestable”, “hardware-based Trusted Execution Environments” (TEEs). These TEEs provide isolated environments with enhanced security, preventing unauthorized access or modification of applications and data while in use. Typically, a TEE provides at least the following three properties:

  • Data confidentiality: Data within the TEE remains inaccessible to unauthorized entities.
  • Data integrity: Unauthorized entities cannot tamper with data within the TEE.
  • Code integrity: Unauthorized entities cannot modify the code running within the TEE.

Confidential Computing primarily comes in two flavors: process-based and virtual machine-based isolation.

  • Process-based isolation: often associated with the term “enclave,” encloses the isolation boundary around each container process. While this offers precision, adapting applications to run within this boundary may require code adjustments.
  • Virtual machine-based isolation: encompasses the boundary around a virtual machine, allowing for more flexibility when migrating existing workloads into confidential computing environments. The downside of this is that you will have many more individual components that you will need to place your trust in.

💡 If you’re entirely new to Azure Confidential Computing or need a refresher, check out my previous articles on this topic.

Onboarding an application into Confidential Computing

When I have to recommend a specific technology to solve a particular problem, various factors come into play. I’ve spent considerable time as a developer and have often collaborated closely with seasoned operations personnel. Because of this, I feel I’ve gained valuable insights into what matters to people in both of these categories. When I transitioned into my role as an Azure consultant, which predominantly involves Azure/Cloud Solutions Architecture, I learned the importance of balancing the interests of both business and technical stakeholders. In large enterprises, finding pragmatic solutions often outweighs the pursuit of the “prettiest” one. While I appreciate elegant solutions when they are available, my priority lies in finding options that work effectively for everyone involved.

Let’s consider a hypothetical scenario involving a company, Foo, which relies heavily on a particular revenue-generating application. This application might be a SaaS solution sold directly to customers or part of a software suite powering specific business processes. Suddenly, Foo faces several challenges:

  • New regulatory requirements from government bodies necessitate stringent compliance measures to avoid potential fines.
  • Suddenly, the need to enhance security arises, likely driven by the same regulatory obligations. This leads Foo to explore Confidential Computing as a promising solution.
  • Foo should be able to verify that the application runs in the intended state.
  • The objective is to seamlessly transition the application into a containerized environment with those enhanced security measures, ideally with minimal or no adjustments to the app.
  • To make matters worse, time is of the essence, requiring a swift implementation to meet deadlines.
    • This also means ensuring that individuals who may be less familiar with confidential computing are not left behind.

To make an informed decision, it’s essential to determine the size of your trusted computing base (TCB) relative to regulatory requirements. The TCB encompasses all hardware, firmware, and software components ensuring a secure environment. A smaller TCB translates to higher security, reducing exposure to vulnerabilities, malware, attacks, and malicious entities.

🔥 The components inside the TCB are considered “critical”. If one component inside the TCB is compromised, the entire system’s security may be jeopardized.
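
To make the “one compromised component” rule a little more tangible, here is a minimal Python sketch. The component names and the two example TCBs are purely illustrative assumptions on my end, not an authoritative inventory of what any particular Azure offering trusts.

```python
# Toy model: a TCB is simply the set of components you have to trust.
# Both sets below are illustrative assumptions, not an official inventory.
TCB_CONFIDENTIAL_VM = frozenset(
    {"hardware", "guest_os", "shared_libraries", "application"}
)
TCB_ENCLAVE = frozenset({"hardware", "application"})


def is_trustworthy(tcb, compromised):
    """A system stays trustworthy only if no TCB component is compromised."""
    return tcb.isdisjoint(compromised)


def exposure(tcb, compromised):
    """Return the compromised components that actually sit inside this TCB."""
    return sorted(tcb & set(compromised))
```

With these made-up sets, a compromised guest OS breaks the VM-sized TCB but leaves the enclave-sized one intact, which is exactly why the size of the TCB matters.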

Here’s an overview of where you would place your trust, depending on the Confidential Computing solution:

Component | Without CC | CC w/ Intel SGX | CC w/ CVMs (SEV-SNP/TDX)
Infrastructure Owner
Hardware
BIOS, Device Drivers
Hypervisor
Host OS
VM Guest admins
Guest OS
Shared libraries
Application

Partner solutions

Since I had some free time, I decided to explore the “Partner solutions” listed on Microsoft docs. These solutions, primarily used for running existing containerized applications, typically support containers using a Linux base image.

However, I found it cumbersome to quickly try them out; simple things like accessing documentation or binaries often required jumping through several hoops. Sometimes, I’d have to book a call with one of the partner’s sales representatives, while other times, I had to sign up for a GitLab account and wait for my account to be verified. While these processes are beneficial for generating leads and funding the development of these projects, they resulted in a less-than-pleasant experience for me as a potential user.

🤔 After giving this some additional thought, I am not entirely convinced that running an existing application, like the one from my scenario, in a containerized SGX-powered wrapper is a wise idea. Let me explain.

The solutions I’ve encountered typically employ something akin to a library OS, which is an operating system implemented as a library. This setup allows your application to make Linux system calls as usual. However, behind the scenes, the library OS intercepts those calls and executes its own implementation of the syscall. Running the syscall inside an SGX enclave is not always possible or does not always behave the same as against a ‘normal’ Linux kernel, which is why they have their own specific implementation that works inside of SGX. For instance, working with networking syscalls within SGX is quite tricky and often requires unconventional methods. So, unless you are prepared to invest the effort to port and extensively test your application under various constraints, it appears uncertain whether an existing (legacy) app will behave as expected.
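
The interception idea can be sketched in a few lines of Python. This is a toy model under my own assumptions: the class, the handler names, and the in-memory “filesystem” are all hypothetical, and it is not how any real library OS is implemented. It only shows why a call that has no in-enclave implementation can surprise an existing application.

```python
class UnsupportedSyscall(Exception):
    """Raised when the toy library OS has no in-enclave implementation."""


class ToyLibraryOS:
    """Routes 'syscalls' to its own handlers instead of a host kernel."""

    def __init__(self):
        # In-memory stand-in for an enclave-protected filesystem.
        self._files = {}
        # Only the calls listed here have a library-OS implementation.
        self._handlers = {"write": self._write, "read": self._read}

    def syscall(self, name, *args):
        handler = self._handlers.get(name)
        if handler is None:
            # A regular kernel would serve this call; the library OS cannot.
            raise UnsupportedSyscall(name)
        return handler(*args)

    def _write(self, path, data):
        self._files[path] = self._files.get(path, b"") + data
        return len(data)

    def _read(self, path):
        return self._files.get(path, b"")
```

In this sketch, `write` and `read` behave as the application expects, while a networking call such as `socket` raises `UnsupportedSyscall`. That mirrors the kind of gap you would have to test for when porting a legacy app.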

Unfortunately, none of these solutions seemed to fit my fictional, yet-not-so-entirely-uncommon scenario. Since I had limited time to experiment, I concluded after a few days of testing that I should explore an entirely different solution… Though the solution would have to offer similar protections to what SGX offers, while taking away the complexities of SGX.

This would also imply that unless our application demanded the most conservative type of trusted computing base (for example, if we were building a key management service or similar), we would need to accept a TCB that is considerably larger than what SGX would have provided.

AKS with CVMs: almost good enough

In this scenario, it seemed that the only viable path forward would be to use Azure Confidential VMs and allocate them to a separate Kubernetes worker node pool. Currently, Azure supports both AMD SEV-SNP and Intel TDX based confidential VM SKUs.

⚠️ Picking a SKU that only supports Intel SGX, such as the DCsv3 and DCdsv3 series, does not give you Confidential VM capabilities.

Choosing this approach would give us the following, fairly standard AKS setup:

Image of an AKS setup, with many details omitted for brevity. Inside the cluster, there is a control plane and two nodes. Each node has four pods, with any number of containers in them.

Though if we think about this critically for a moment:

  • We get runtime encryption for our nodes, but this only provides limited security benefits as it only prevents direct access to a node’s memory.
  • The Kubernetes control plane nodes are not part of our trusted computing base, since Microsoft manages them.
    • Those nodes have the ability to control other nodes through the kubelet that gets installed, and the DaemonSets that are running on the CVM nodes.

Let’s consider the trust model when running a CVM for a moment: you are placing trust in everything that is running inside of the CVM’s guest OS. Many of these items aren’t necessarily considered to be trusted by default. In the context of a Kubernetes cluster, all of these components put together make for a rather large TCB. I began wondering whether there is some way that we can do better.

💡 If you’re fine with having the guest OS in your TCB, another option is to completely self-host your Kubernetes setup, with all nodes running as CVMs. In the past, this was achievable using AKS Engine, but it has since been deprecated.

The recommended approach now is to manually create the Kubernetes cluster using tools like kubeadm, and then integrate the Kubernetes Cluster API Provider for Azure. This provider streamlines the cluster lifecycle management across various infrastructure providers, so you’re left with a more flexible and customizable cluster as a result.

Introducing CoCo: Confidential Containers

CoCo is a Cloud Native Computing Foundation project designed to enable what its maintainers call “cloud-native confidential computing”. It works with a variety of confidential computing hardware platforms and technologies to protect data in use.

📖 “The goal of the CoCo project is to standardize confidential computing at the pod level and simplify its consumption in Kubernetes. This enables Kubernetes users to deploy confidential container workloads using familiar workflows and tools without extensive knowledge of the underlying confidential computing technologies.” - confidentialcontainers.org

The project can leverage multiple hardware trusted execution environments, such as Intel SGX, Intel TDX, AMD SEV/SNP, and IBM Z Secure Execution. It is being backed and developed by multiple software and hardware companies: Alibaba Cloud, AMD, ARM, IBM, Intel, Microsoft, Red Hat, Rivos and others. The development of CoCo itself takes a use-case-based approach instead of a feature-based one.

💡 CoCo is, at the time of writing, a CNCF sandbox maturity level project. This means that it is an experimental project, which may not yet be widely tested in production.

How CoCo works

At its core, CoCo simplifies the deployment of confidential container workloads in Kubernetes while ensuring cryptographic verifiability. It comes with additional bells and whistles to make the entire thing attestable using cryptographic proofs. This way CoCo guarantees that software runs without tampering and can even prevent the workloads from starting up, when it detects any discrepancies.

CoCo provides a Kubernetes operator for deployment and configuration, abstracting away hardware-specific complexities. This operator establishes a set of runtime classes enabling pod deployment within an enclave across various platforms. CoCo typically operates with signed and/or encrypted container images, which are securely pulled, verified, and decrypted inside the enclave. Secrets, such as image decryption keys, are conditionally provisioned to the enclave by a trusted Key Broker Service, validating the hardware evidence of the TEE before releasing sensitive information.
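
The “conditionally provisioned” part is easiest to picture with a small Python sketch. Everything here is a simplification I made up for illustration: `measure` stands in for a real TEE launch measurement, and `ToyKeyBroker` is a hypothetical class, not the actual Key Broker Service API. A real service verifies signed hardware evidence rather than comparing a bare digest.

```python
import hashlib
import hmac


def measure(workload: bytes) -> str:
    """Stand-in for a TEE launch measurement: a SHA-256 digest of the workload."""
    return hashlib.sha256(workload).hexdigest()


class ToyKeyBroker:
    """Releases a secret only when the evidence matches a known-good measurement."""

    def __init__(self, reference_measurement: str, secret: bytes):
        self._reference = reference_measurement
        self._secret = secret

    def release_key(self, evidence_measurement: str):
        # Constant-time comparison of the reported and expected measurements.
        if hmac.compare_digest(self._reference, evidence_measurement):
            return self._secret
        # Evidence does not match: the secret never leaves the broker.
        return None
```

In this sketch, a workload whose measurement differs from the reference value simply never receives the image decryption key, so it cannot start with the protected content.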

Diagram of the CoCo architecture. The infrastructure can be hosted on-premises or in any public cloud, as long as it has confidential compute capable hardware and a hypervisor. On top of this hardware and hypervisor, we can run the VM-based or process-based variant of CoCo. Both variants come with the Enclave Software Stack, which is also referred to as the Confidential Container Stack in much of the documentation. The Confidential Computing services block contains a number of services that are required for creating a holistic CC platform, which the customer can then use to build their solution. These are: Image registry, Image build service, Attestation service, Key Broker System, Key Management System and potential future services that have yet to be determined.

The cool thing is that the Kubernetes control plane is outside the TCB, which makes CoCo suitable for managed environments like AKS. This really aligns with what I’m looking for, as it is often difficult enough to wrap your head around the multitude of different confidential computing concepts.

CoCo operation models

CoCo enables the use of confidential container-enabled software in two specific ways:

  • VM-based TEEs: memory encryption occurs along a utility VM (UVM) boundary atop a VMM. This is achievable by using technologies such as AMD SEV/SNP, Intel TDX, and IBM Secure Execution.
    • This approach utilizes the Kata Containers runtime and kata-agent, with the agent forming part of the TCB by running inside the TEE.
  • Process-based TEEs: the process is split into trusted and untrusted components. The trusted component operates in encrypted memory, managing confidential computing, while the untrusted component interfaces with the OS and manages I/O from encrypted memory to the system.
    • Intel SGX is the technology which powers process-based TEEs.
    • This model operates without a utility VM and instead runs on the untrusted guest.
    • According to the design documentation, rune serves as the container runtime for creating and executing enclaves, while runelet acts as the containerized process and communicates with a “LibOS” like Occlum, Gramine, or WebAssembly Micro Runtime (WAMR).

If we work CoCo into our “where-do-we-place-our-trust” table, I think it will look a little like this. This is something of an assumption on my end, which I’ve based on the Microsoft documentation:

Component | Process-based TEE | VM-based TEE
Infrastructure Operator
Hardware
BIOS, Device Drivers
Azure Hypervisor
Host OS
VM Guest admins
K8S Control Plane
Guest OS
Guest Shared libraries
Guest Kubernetes components
Guest Application
Utility VM (firmware, kernel, ..) | N/A
Confidential Containers Stack
CoCo Pod w/ Containers

Although internal implementations for the two approaches differ, and may have additional moving parts, they share a set of common goals and attributes:

  • Deploy unmodified workloads.
  • Integrate natively with the Kubernetes control plane.
  • Provide an unmodified Kubernetes user and developer experience.
  • Remove cloud and infrastructure providers from the guest application Trusted Computing Base (TCB).

VM-based CoCo: moving parts

For the VM-based CoCo approach, several components play crucial roles.

  • Kata Containers:
    • Kata Containers is an open-source community focused on building a secure container runtime with lightweight virtual machines (VMs). These VMs offer enhanced workload isolation using hardware virtualization technology.
    • They enable the execution of lightweight Utility VMs to run pods and containers, utilizing QEMU or Cloud Hypervisor behind the scenes.
    • The CoCo Kubernetes (K8s) operator handles Kata binaries and configuration.
  • Cloud API Adaptor: extends the Kata container runtime, allowing the creation of peer-pods using cloud provider APIs.
  • Attestation Agent:
    • Facilitates remote attestation and secure key release by acting as a Key Broker Client (KBC) with the Key Broker Service (KBS).
    • Provides an abstraction layer over different TEE types to unify platform evidence gathering operations for other CoCo components.
  • Confidential Data Hub:
    • A modular component offering a RESTful API endpoint for other components like the kata-agent and application container.
    • Supports access to Key Management Service, Key Broker Service, and Key Vault Service.
    • Facilitates user-insensitive unsealing of K8s sealed secrets and serves as a trusted storage service for encrypted data.

Diagram showing the architecture of VM-based CoCo components, including Kata Containers, Cloud API Adaptor, Attestation Agent, and Confidential Data Hub.

I think it’s very cool to see Kata Containers being utilized this way. I briefly mentioned it back in 2019 in my “Kubernetes in a Microsoft World” blog series. I haven’t kept track of it since, but it’s great to see the progress that has been made.

💡 While some components overlap with the process-based approach, there are a few additional components that come into play. If you’d like to learn more, take a look at its design document.

VM-based CoCo

CoCo offers some flexibility in deployment options, whether that is on-premises (bare metal, VMware, etc.) or on public clouds (AWS, Azure, GCP, etc.).

💡 It can even be run without confidential compute hardware, which makes it suitable for dev environments. This is made possible by a fork of containerd that enables you to create a pod with encrypted image support, without the requirement of confidential hardware. Perhaps not ideal, but such an approach allows developers to become familiar with the fundamental concepts of CoCo and the high-level details of its implementation.

Confidential Nested VMs

Running Kata Containers on Azure is made possible by the Pod Sandboxing feature that is available for Azure Kubernetes Service (AKS).

Pod Sandboxing provides an isolation boundary between container applications and the shared kernel and compute resources of the container host. Through the use of nested virtualization, this mechanism helps secure and protect container workloads from untrusted or potentially malicious code by isolating them within their own sandboxed environment. It ensures that containerized applications operate securely, without interfering with other workloads on the same host, and mitigates risks associated with shared resources such as CPU, memory, and networking.

Diagram showing a Kubernetes Control Plane and an agent node. The agent node is a confidential VM that hosts a nested confidential child VM, which is tied to the pod’s lifecycle and runs the CoCo software stack.

Needless to say, this model utilizes very specific Azure Confidential Computing VM sizes. More specifically, it makes use of the new AMD SEV-SNP Confidential Child VM SKUs, which also support nested virtualization. These VM SKUs can only be deployed to an AKS node pool and come in two flavours:

Be careful though, because these VM SKUs are currently in preview and some features are not yet available.

🔥 Previews are provided “as is” and “as available,” and they’re excluded from the service-level agreements and limited warranty. AKS previews are partially covered by customer support on a best-effort basis. As such, these features aren’t meant for production use.

Peer-Pods / Cloud API Adaptor

As a plus, CoCo can run in (virtual) environments that don’t support nested virtualization. This is possible with the help of an extension for the Kata runtime called the “Cloud API Adaptor”, which starts Kata Containers VMs without the need for bare-metal worker nodes or nested virtualization support. This means that CoCo allows users to move workloads onto infrastructure owned by third parties, be it cloud providers or others, as long as they provide confidential compute hardware.

This model involves spinning up a separate Confidential VM in the cloud and designating it as a pod, which offers the benefit of added isolation without nested virtualization. While it may incur a slightly higher Azure bill, the performance gains and scalability can make this an attractive option.

Diagram showing a Kubernetes Control Plane and an agent node. The agent node is a confidential VM that receives the instruction to spawn a new pod. Instead of creating a nested virtual machine, the Kata runtime tells the remote hypervisor, through the Cloud API Adaptor, to provision a new confidential virtual machine in the public cloud environment. This new confidential VM is tied to the pod’s lifecycle and runs the CoCo software stack.

Security policy for Confidential Containers

Policy decisions within Confidential Containers are enforced by the Kata agent, which runs inside the TEE boundary, using the Open Policy Agent. Running within the Utility VM (UVM) in the hardware-based TEE, the Kata agent is a crucial part of the Trusted Computing Base. It provides ttrpc APIs for managing CVM-based Kubernetes pods, keeping the pod implementation transparent to the rest of your Kubernetes stack.

Communication between the kata-agent and the containerd-shim-kata-v2 is a critical control channel, one that crosses the TCB boundary. In order to maintain security, the agent protects itself from untrusted API calls. This protection is implemented with a security policy specified by the confidential pod owners. Each CVM-based pod is annotated with a policy document in the Rego policy language and the Open Policy Agent enforces these policies within the UVM environment, which ensures that only authorized actions occur within the enclave.

Container creation can be rejected by the kata-agent’s policy enforcement logic whenever a command line, storage mount, execution security context, or even an environment variable violates the given rules.
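
Real policies are written in Rego and evaluated by the Open Policy Agent inside the UVM. Purely to illustrate the kind of checks described above, here is a toy stand-in written in Python. The `ContainerRequest` and `ToyPolicy` types and their field names are hypothetical and do not reflect the agent’s actual ttrpc request format.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerRequest:
    """Hypothetical, heavily simplified view of a container creation request."""
    command: tuple
    mounts: tuple = ()
    env: dict = field(default_factory=dict)


@dataclass
class ToyPolicy:
    """Allow-lists that a request must satisfy before a container is created."""
    allowed_commands: set
    allowed_mounts: set
    allowed_env: dict

    def violations(self, request):
        found = []
        if tuple(request.command) not in self.allowed_commands:
            found.append("command")
        if not set(request.mounts) <= self.allowed_mounts:
            found.append("mount")
        for key, value in request.env.items():
            # Unknown variables and unexpected values are both violations.
            if key not in self.allowed_env or self.allowed_env[key] != value:
                found.append(f"env:{key}")
        return found

    def allows(self, request):
        return not self.violations(request)
```

A request that swaps the command line for a shell, or flips an environment variable to a debug value, would be rejected in this sketch before anything runs. That is the behaviour the policy enforcement logic aims for.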

Container Image Snapshotter

In the baseline Confidential Containers stack, container images are typically fetched from within the Utility VM (UVM). However, this approach poses challenges for resource utilization and confidentiality. Storing container image layers in the UVM’s memory-backed local filesystem can strain UVM memory limits and increase the size of the Trusted Computing Base (TCB). We don’t want that.

📖 From Microsoft’s blog post: “The container image snapshotter feature goes hand in hand with security policy. The genpolicy tool downloads the container image layers for each of the containers specified by the input Kubernetes pod manifest and calculates the dm-verity root hash value for each of the layers. This way, each mapped container image layer becomes part of the TCB measurement.”

To tackle those issues, the tardev-snapshotter was introduced to the Confidential Containers stack. This tool retrieves, ideally encrypted and signed, container image layers for pods on the container host, located outside the TCB. As a result, these container image layers can now be shared among pods, which increases resource efficiency and reduces the TCB’s software size.

Each container layer is exposed as a read-only virtio block device to the respective UVM(s). These devices are protected using the dm-verity technology of the Linux kernel. For each container image layer, the expected root hash of the dm-verity hash tree is included in the policy document, and this policy is enforced at runtime by the Kata agent. The agent uses a tarfs kernel module to mount the dm-verity target block devices as tarfs filesystem mounts, which in turn provide the container filesystem.

Attestation

Attestation stands as one of the pivotal aspects of CoCo and really deserves an entire blog post dedicated solely to all of its intricacies. Fortunately, Pradipta Banerjee and Samuel Ortiz have done an excellent job explaining this concept in their blog post. I highly recommend diving into their detailed explanation for a deeper understanding!

💡 Attestation in the context of CoCo refers to the process of verifying the integrity and trustworthiness of the environment where a confidential container workload is running. This is achieved through cryptographic proofs and validation mechanisms to ensure that the software executes without unauthorized tampering or interference.

AKS Specifics

On Azure, all of these advancements are driven by Azure Linux.

📖 Microsoft elaborated on how Azure Linux is powering Confidential Containers on AKS in this interesting blog post.

According to the Azure documentation, the resource allocations currently used are as follows:

  • CPU: The shim assigns one vCPU to the base OS inside the pod. If no resource limits are specified, the workloads don’t have separate CPU shares assigned and the vCPU is shared with the workload. If CPU limits are specified, CPU shares are explicitly allocated for workloads.
  • Memory: The Kata-CC handler uses 2 GB memory for the UVM OS and X MB additional memory where X is the resource limits if specified in the YAML manifest (resulting in a 2-GB VM when no limit is given, without implicit memory for containers). The Kata handler uses 256 MB base memory for the UVM OS and X MB additional memory when resource limits are specified in the YAML manifest. If limits are unspecified, an implicit limit of 1,792 MB is added resulting in a 2 GB VM and 1,792 MB implicit memory for containers.

In this release, specifying resource requests in the pod manifests isn’t supported. The Kata container ignores resource requests from the pod YAML manifest, and as a result, containerd doesn’t pass the requests to the shim. Use resource limits instead of resource requests to allocate memory or CPU resources for workloads or containers.
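
The memory rules above boil down to a bit of arithmetic. Here is a small Python sketch of how I read them. The handler labels `"kata-cc"` and `"kata"` and the function name are my own shorthand, and I am assuming that 2 GB means 2,048 MB.

```python
KATA_CC_BASE_MB = 2048         # "2 GB memory for the UVM OS" (Kata-CC handler)
KATA_BASE_MB = 256             # "256 MB base memory for the UVM OS" (Kata handler)
KATA_IMPLICIT_LIMIT_MB = 1792  # implicit limit when none is specified (Kata handler)


def uvm_memory_mb(handler: str, limit_mb=None) -> int:
    """Return the expected utility VM size in MB for a given memory limit."""
    if handler == "kata-cc":
        # No implicit memory for containers when the limit is missing.
        return KATA_CC_BASE_MB + (limit_mb or 0)
    if handler == "kata":
        if limit_mb is None:
            limit_mb = KATA_IMPLICIT_LIMIT_MB
        return KATA_BASE_MB + limit_mb
    raise ValueError(f"unknown handler: {handler}")
```

Both handlers end up with a 2 GB VM when no limit is given, but for different reasons: Kata-CC reserves all of it for the UVM OS, while Kata leaves 1,792 MB of it to the containers.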

Amongst other things, the Azure Linux team plans to expand TEE support to Intel TDX and confidential GPUs, and to invest in ways to reduce any performance gap between runc pods and confidential pods.

Microsoft’s Demo

Microsoft offers a comprehensive demo that guides you through the process of onboarding an existing container image into Confidential Containers. While I won’t duplicate the demo here, I’ll provide an overview of what it covers. Currently, it skips over the encryption of the container image and uses two publicly available images that are then signed.

In essence, the demo includes the following steps:

  • Install the confcom extension for az CLI.
  • Register the KataCcIsolationPreview feature flag.
  • Create an AKS cluster.
  • Create a new node pool specifically with Azure Linux as the OS and the Confidential Child VM SKU.
  • Enable workload identity.
    • Includes creating a user-assigned managed identity.
  • Create an Azure Key Vault Premium.
  • Grant Key Vault Crypto Officer and Key Vault Crypto User to the Service Principal Name (SPN) responsible for the deployment and the user-assigned managed identity.
  • Install Kafka pre-req resources.
  • Generate the security policy for the Kafka consumer.
  • Generate an RSA asymmetric key pair in Key Vault.
    • Will be used for Secure Key Release.
  • Deploy the Kafka consumer and producer.

The most crucial component here is the confcom extension for the az CLI. This tool is essential for creating your security policy. The output it generates is essentially a large policy file, written in Rego (the Open Policy Agent policy language), which then gets inserted into your YAML file’s annotations under the io.katacontainers.config.agent.policy key. It is based on genpolicy. According to its maintainers, “the policy auto-generated by genpolicy is typically used for implementing confidential containers, where the Kata Shim and the Kata Agent have different trust properties.”
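
To show where the policy ends up, here is a small Python sketch that attaches a policy string to a pod manifest held as a plain dictionary. The helper name is hypothetical, and I am assuming the policy text is stored base64-encoded in the annotation. The real tooling edits your YAML file for you, so treat this as an illustration of the end result rather than of how the extension works.

```python
import base64
import copy

POLICY_ANNOTATION = "io.katacontainers.config.agent.policy"


def annotate_with_policy(manifest: dict, rego_policy: str) -> dict:
    """Return a copy of the pod manifest carrying the policy as an annotation."""
    annotated = copy.deepcopy(manifest)
    metadata = annotated.setdefault("metadata", {})
    annotations = metadata.setdefault("annotations", {})
    # Assumption: the policy text is stored base64-encoded in the annotation.
    encoded = base64.b64encode(rego_policy.encode("utf-8")).decode("ascii")
    annotations[POLICY_ANNOTATION] = encoded
    return annotated
```

Because the policy travels with the pod manifest, anyone who changes the pod spec without regenerating the policy ends up with a mismatch that the agent can reject.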

katapolicygen

The az confcom katapolicygen command is crucial for generating the correct security policy for the pod, particularly when working with CoCo. This tool allows customers to generate a policy document based on their standard Kubernetes pod manifest and attach that document to the manifest as an annotation. The resulting policy document describes all the expected calls to the agent’s ttrpc API for creating and managing the respective pod.

💡 Is this not working for you on macOS? Try spinning up an az cli container, that should do the trick.

When the katapolicygen command is run, it downloads every container image manifest, config, and layer, and calculates a dm-verity root hash for each layer. This hash ensures that if anyone tampers with one of those image layers, the kernel will detect it when the layers are mounted as storage devices in the UVM.

Putting the Snapshotter to work

The container image snapshotter feature complements the security policy generation. The genpolicy tool downloads the container image layers for each container specified in the input Kubernetes pod manifest and calculates the dm-verity root hash value for each layer. This ensures that each mapped container image layer becomes part of the Trusted Computing Base measurement.

To get more insight into what katapolicygen is doing behind the scenes, you can set the following variable:

export RUST_LOG=info

az confcom katapolicygen -y consumer.yaml --print-policy
# [2024-03-18T16:10:40Z INFO  genpolicy::registry] ============================================
# [2024-03-18T16:10:40Z INFO  genpolicy::registry] Pulling manifest and config for "mcr.microsoft.com/aci/skr:2.7"
# [2024-03-18T16:10:41Z INFO  genpolicy::registry] Pulling layer "sha256:96526aa774ef0126ad0fe9e9a95764c5fc37f409ab9e97021e7b4775d82bf6fa"
# [2024-03-18T16:10:41Z INFO  genpolicy::registry] Decompressing layer
# [2024-03-18T16:10:42Z INFO  genpolicy::registry] Adding tarfs index to layer
# [2024-03-18T16:10:42Z INFO  genpolicy::registry] Calculating dm-verity root hash
# [2024-03-18T16:10:42Z INFO  genpolicy::registry] dm-verity root hash: ad8468ff2a4197e09f0177d9b0852fa31a8164920dada2da7e1fab449dcfd9f1
# [2024-03-18T16:10:42Z INFO  genpolicy::registry] Pulling layer "sha256:77b59fbbbbac3c4fcce2da23767cfccbee566a86ee91eace5fc42c9e15f488c1"
# [2024-03-18T16:10:43Z INFO  genpolicy::registry] Decompressing layer
# [2024-03-18T16:10:44Z INFO  genpolicy::registry] Adding tarfs index to layer
# Skipping symlink with long link name (ca-cert-NetLock_Arany_=Class_Gold=_Főtanúsítvány.pem, 56 bytes, ca-cert-NetLock_Arany_=Class_Gold=_Ftanstvny.pem, 48 bytes): etc/ssl/certs/988a38cb.0
# Skipping symlink with long link name (/usr/share/ca-certificates/mozilla/NetLock_Arany_=Class_Gold=_Főtanúsítvány.crt, 83 bytes, /usr/share/ca-certificates/mozilla/NetLock_Arany_=Class_Gold=_Ftanstvny.crt, 75 bytes): etc/ssl/certs/ca-cert-NetLock_Arany_=Class_Gold=_Főtanúsítvány.pem
# [2024-03-18T16:10:44Z INFO  genpolicy::registry] Calculating dm-verity root hash
# [2024-03-18T16:10:44Z INFO  genpolicy::registry] dm-verity root hash: 470de2937d9ecb0ac67492f6b4566e618f02699b812e68af1127e0b0a376fcc8
# [2024-03-18T16:10:44Z INFO  genpolicy::registry] Pulling layer "sha256:5e49d2f6bca1960f7aa30e4bf3690324acbdecab1685ba647550a9b2404fb95c"
# [2024-03-18T16:10:45Z INFO  genpolicy::registry] Decompressing layer
# [2024-03-18T16:10:46Z INFO  genpolicy::registry] Adding tarfs index to layer
# [2024-03-18T16:10:46Z INFO  genpolicy::registry] Calculating dm-verity root hash
# [2024-03-18T16:10:46Z INFO  genpolicy::registry] dm-verity root hash: d2c2a5da674ff266e7be0e293dd7fdbd2d1e347d963f4a3fa4228d6c8f27e08d
# [2024-03-18T16:10:46Z INFO  genpolicy::registry] Pulling layer "sha256:938aa74c23fd51453fd513d11e69971d82b9f6d4798707cac62525009634e442"
# [2024-03-18T16:10:46Z INFO  genpolicy::registry] Decompressing layer
# [2024-03-18T16:10:46Z INFO  genpolicy::registry] Adding tarfs index to layer
# [2024-03-18T16:10:46Z INFO  genpolicy::registry] Calculating dm-verity root hash
# [2024-03-18T16:10:46Z INFO  genpolicy::registry] dm-verity root hash: b80aed9d438f979b1605f3435409f8e1d7ebe18796eada78707e429310a3ed68
# [2024-03-18T16:10:46Z INFO  genpolicy::registry] Pulling layer "sha256:4feb8f688feded5d67ae91c49cde5653bd40b046ce4aa93f1832af9a4f2cd93b"
# [2024-03-18T16:10:47Z INFO  genpolicy::registry] Decompressing layer
# [2024-03-18T16:10:47Z INFO  genpolicy::registry] Adding tarfs index to layer
# [2024-03-18T16:10:47Z INFO  genpolicy::registry] Calculating dm-verity root hash
# [2024-03-18T16:10:47Z INFO  genpolicy::registry] dm-verity root hash: ea0c434828d8018a9eb48d32de2d808172aced5936d4f7b9f1448870876ca9d8
# [2024-03-18T16:10:47Z INFO  genpolicy::registry] Pulling layer "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1"
# [2024-03-18T16:10:47Z INFO  genpolicy::registry] Decompressing layer
# [2024-03-18T16:10:47Z INFO  genpolicy::registry] Adding tarfs index to layer
# [2024-03-18T16:10:47Z INFO  genpolicy::registry] Calculating dm-verity root hash
# [2024-03-18T16:10:47Z INFO  genpolicy::registry] dm-verity root hash: 67450082ab56da1aecc5eae2f18d980cd9e7306e79334a1a826a91cfd90114a8
# [2024-03-18T16:10:47Z INFO  genpolicy::registry] ============================================
# [2024-03-18T16:10:47Z INFO  genpolicy::registry] Pulling manifest and config for "mcr.microsoft.com/acc/samples/kafka/consumer:1.0"
# [2024-03-18T16:10:47Z INFO  genpolicy::registry] Pulling layer "sha256:21a156041ee4ff1bec22cbba2c7bd43978dd30536cd54a5a03b5f2be90d83dc2"
# [2024-03-18T16:10:48Z INFO  genpolicy::registry] Decompressing layer
# [2024-03-18T16:10:48Z INFO  genpolicy::registry] Adding tarfs index to layer
# [2024-03-18T16:10:49Z INFO  genpolicy::registry] Calculating dm-verity root hash
# [2024-03-18T16:10:49Z INFO  genpolicy::registry] dm-verity root hash: 74492a982cb86f16c0989b0d66cc420aa908c3fa8e23effcf2c0b4ace15f4fda
# [2024-03-18T16:10:49Z INFO  genpolicy::registry] Pulling layer "sha256:5c6e3aa84dc0c19f4bc4e7b614abdf3ba0f1957b40a9e98b0385ef4964e5f346"
# [2024-03-18T16:10:49Z INFO  genpolicy::registry] Decompressing layer
# [2024-03-18T16:10:49Z INFO  genpolicy::registry] Adding tarfs index to layer
# [2024-03-18T16:10:49Z INFO  genpolicy::registry] Calculating dm-verity root hash
# [2024-03-18T16:10:49Z INFO  genpolicy::registry] dm-verity root hash: 538551b312caa78e93e8657f55ea45756665bd3c39ec420174327645d80d77fa
# [2024-03-18T16:10:49Z INFO  genpolicy::registry] Pulling layer "sha256:9e3a3a4368e4b2991adc9501f420f103db232c857b050c084e3e02da07c42dfd"
# [2024-03-18T16:10:50Z INFO  genpolicy::registry] Decompressing layer
# [2024-03-18T16:10:50Z INFO  genpolicy::registry] Adding tarfs index to layer
# [2024-03-18T16:10:50Z INFO  genpolicy::registry] Calculating dm-verity root hash
# [2024-03-18T16:10:50Z INFO  genpolicy::registry] dm-verity root hash: 3673e91db0aa540e01fab0ad590dbbc6dafb660ea27224742d7d9540afcb0fa0
# [2024-03-18T16:10:50Z INFO  genpolicy::registry] Pulling layer "sha256:a73d643d7f5c3ec02c4f0bd96264f26362fcb8d570ec3c5422c54581351d3203"
# [2024-03-18T16:10:50Z INFO  genpolicy::registry] Decompressing layer
# [2024-03-18T16:10:50Z INFO  genpolicy::registry] Adding tarfs index to layer
# [2024-03-18T16:10:50Z INFO  genpolicy::registry] Calculating dm-verity root hash
# [2024-03-18T16:10:50Z INFO  genpolicy::registry] dm-verity root hash: 82617c6193aae87cc0f6c94dc058c05d1363afc0b5814db7304b39abe4124489
# [2024-03-18T16:10:50Z INFO  genpolicy::registry] Pulling layer "sha256:3e834e9993cef6624832ec2f6259092cfd009e3cf3f5a407834a47c2fec828d5"
# [2024-03-18T16:10:51Z INFO  genpolicy::registry] Decompressing layer
# [2024-03-18T16:10:52Z INFO  genpolicy::registry] Adding tarfs index to layer
# [2024-03-18T16:10:52Z INFO  genpolicy::registry] Calculating dm-verity root hash
# [2024-03-18T16:10:52Z INFO  genpolicy::registry] dm-verity root hash: 922ee7161ab2710c7899dd954addebd473e3ccb6773f9690f80a8e4f0b6dc4c2
# [2024-03-18T16:10:52Z INFO  genpolicy::registry] ============================================
# [2024-03-18T16:10:52Z INFO  genpolicy::registry] Pulling manifest and config for "mcr.microsoft.com/oss/kubernetes/pause:3.6"
# [2024-03-18T16:10:52Z INFO  genpolicy::registry] Pulling layer "sha256:5720cd9c19ca69b58202945924c37c9bd7b287ce1a88882098fd59a4292e7cd9"
# [2024-03-18T16:10:52Z INFO  genpolicy::registry] Decompressing layer
# [2024-03-18T16:10:52Z INFO  genpolicy::registry] Adding tarfs index to layer
# [2024-03-18T16:10:52Z INFO  genpolicy::registry] Calculating dm-verity root hash
# [2024-03-18T16:10:52Z INFO  genpolicy::registry] dm-verity root hash: 817250f1a3e336da76f5bd3fa784e1b26d959b9c131876815ba2604048b70c18

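With RUST_LOG=info set, the command logs each step katapolicygen takes: pulling the manifest and config for every image, pulling and decompressing each layer, adding a tarfs index, and calculating the dm-verity root hash. It gives us some good insights into the security policy generation and its integration with the container image snapshotter feature.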
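
If you want to keep track of which root hash belongs to which layer, the log is regular enough to parse. Below is a small sketch in Python that does so. The function name `parse_genpolicy_log` and the regular expressions are my own, derived only from the log lines shown above, so treat them as assumptions that may need adjusting if the log format changes.

```python
import re
from collections import defaultdict

# Patterns based on the RUST_LOG=info output shown above (assumed stable).
IMAGE_RE = re.compile(r'Pulling manifest and config for "([^"]+)"')
LAYER_RE = re.compile(r'Pulling layer "(sha256:[0-9a-f]+)"')
HASH_RE = re.compile(r'dm-verity root hash: ([0-9a-f]+)')


def parse_genpolicy_log(text):
    """Return {image: [(layer_digest, root_hash), ...]} in log order."""
    result = defaultdict(list)
    image = None
    layer = None
    for line in text.splitlines():
        if m := IMAGE_RE.search(line):
            image, layer = m.group(1), None
        elif m := LAYER_RE.search(line):
            layer = m.group(1)
        elif (m := HASH_RE.search(line)) and image and layer:
            result[image].append((layer, m.group(1)))
            layer = None
    return dict(result)


# Sample input: the pause image lines copied from the log above.
sample = """\
[2024-03-18T16:10:52Z INFO  genpolicy::registry] Pulling manifest and config for "mcr.microsoft.com/oss/kubernetes/pause:3.6"
[2024-03-18T16:10:52Z INFO  genpolicy::registry] Pulling layer "sha256:5720cd9c19ca69b58202945924c37c9bd7b287ce1a88882098fd59a4292e7cd9"
[2024-03-18T16:10:52Z INFO  genpolicy::registry] Decompressing layer
[2024-03-18T16:10:52Z INFO  genpolicy::registry] Adding tarfs index to layer
[2024-03-18T16:10:52Z INFO  genpolicy::registry] Calculating dm-verity root hash
[2024-03-18T16:10:52Z INFO  genpolicy::registry] dm-verity root hash: 817250f1a3e336da76f5bd3fa784e1b26d959b9c131876815ba2604048b70c18
"""

for image, layers in parse_genpolicy_log(sample).items():
    print(image)
    for digest, root_hash in layers:
        print("  ", digest, "->", root_hash)
```

The patterns use `search` rather than `match`, so the parser also works on lines that carry a leading `# `, as in the listing above.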

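Once katapolicygen is done, our YAML file will be updated to include the entire Rego policy, base64-encoded, in the io.katacontainers.config.agent.policy annotation. Here’s an example of how it might look, with the policy shown decoded for readability: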

---
apiVersion: v1
kind: Pod
metadata:
  name: kafka-golang-consumer
  namespace: kafka
  labels:
    azure.workload.identity/use: "true"
    app.kubernetes.io/name: kafka-golang-consumer
  annotations:
    io.katacontainers.config.agent.policy: package agent_policy

import future.keywords.in
import future.keywords.every

import input

# Default values, returned by OPA when rules cannot be evaluated to true.
default CopyFileRequest := false
default CreateContainerRequest := false
default CreateSandboxRequest := false
default DestroySandboxRequest := true
default ExecProcessRequest := false
default GetOOMEventRequest := true
default GuestDetailsRequest := true
default OnlineCPUMemRequest := true
default PullImageRequest := true
default ReadStreamRequest := false
default RemoveContainerRequest := true
default RemoveStaleVirtiofsShareMountsRequest := true
default SignalProcessRequest := true
default StartContainerRequest := true
default StatsContainerRequest := true
default TtyWinResizeRequest := true
default UpdateEphemeralMountsRequest := false
default UpdateInterfaceRequest := true
default UpdateRoutesRequest := true
default WaitProcessRequest := true
default WriteStreamRequest := false

# AllowRequestsFailingPolicy := true configures the Agent to *allow any
# requests causing a policy failure*. This is an unsecure configuration
# but is useful for allowing unsecure pods to start, then connect to
# them and inspect OPA logs for the root cause of a failure.
default AllowRequestsFailingPolicy := false

CreateContainerRequest {
    i_oci := input.OCI
    i_storages := input.storages

    print("CreateContainerRequest: i_oci.Hooks =", i_oci.Hooks)
    is_null(i_oci.Hooks)

    some p_container in policy_data.containers
    print("======== CreateContainerRequest: trying next policy container")

    p_oci := p_container.OCI
    p_storages := p_container.storages

    print("CreateContainerRequest: p Version =", p_oci.Version, "i Version =", i_oci.Version)
    p_oci.Version == i_oci.Version

    print("CreateContainerRequest: p Readonly =", p_oci.Root.Readonly, "i Readonly =", i_oci.Root.Readonly)
    p_oci.Root.Readonly == i_oci.Root.Readonly

    allow_anno(p_oci, i_oci)
    allow_by_anno(p_oci, i_oci, p_storages, i_storages)
    allow_linux(p_oci, i_oci)

    print("CreateContainerRequest: true")
}

# Reject unexpected annotations.
allow_anno(p_oci, i_oci) {
    print("allow_anno 1: start")

    not i_oci.Annotations

    print("allow_anno 1: true")
}
allow_anno(p_oci, i_oci) {
    print("allow_anno 2: p Annotations =", p_oci.Annotations)
    print("allow_anno 2: i Annotations =", i_oci.Annotations)

    i_keys := object.keys(i_oci.Annotations)
    print("allow_anno 2: i keys =", i_keys)

    every i_key in i_keys {
        allow_anno_key(i_key, p_oci)
    }

    print("allow_anno 2: true")
}

allow_anno_key(i_key, p_oci) {
    print("allow_anno_key 1: i key =", i_key)

    startswith(i_key, "io.kubernetes.cri.")

    print("allow_anno_key 1: true")
}
allow_anno_key(i_key, p_oci) {
    print("allow_anno_key 2: i key =", i_key)

    some p_key, _ in p_oci.Annotations
    p_key == i_key

    print("allow_anno_key 2: true")
}

# Get the value of the "io.kubernetes.cri.sandbox-name" annotation and
# correlate it with other annotations and process fields.
allow_by_anno(p_oci, i_oci, p_storages, i_storages) {
    print("allow_by_anno 1: start")

    s_name := "io.kubernetes.cri.sandbox-name"

    not p_oci.Annotations[s_name]

    i_s_name := i_oci.Annotations[s_name]
    print("allow_by_anno 1: i_s_name =", i_s_name)

    allow_by_sandbox_name(p_oci, i_oci, p_storages, i_storages, i_s_name)

    print("allow_by_anno 1: true")
}
allow_by_anno(p_oci, i_oci, p_storages, i_storages) {
    print("allow_by_anno 2: start")

    s_name := "io.kubernetes.cri.sandbox-name"

    p_s_name := p_oci.Annotations[s_name]
    i_s_name := i_oci.Annotations[s_name]
    print("allow_by_anno 2: i_s_name =", i_s_name, "p_s_name =", p_s_name)

    allow_sandbox_name(p_s_name, i_s_name)
    allow_by_sandbox_name(p_oci, i_oci, p_storages, i_storages, i_s_name)

    print("allow_by_anno 2: true")
}

allow_by_sandbox_name(p_oci, i_oci, p_storages, i_storages, s_name) {
    print("allow_by_sandbox_name: start")

    s_namespace := "io.kubernetes.cri.sandbox-namespace"

    p_namespace := p_oci.Annotations[s_namespace]
    i_namespace := i_oci.Annotations[s_namespace]
    print("allow_by_sandbox_name: p_namespace =", p_namespace, "i_namespace =", i_namespace)
    p_namespace == i_namespace

    allow_by_container_types(p_oci, i_oci, s_name, p_namespace)
    allow_by_bundle_or_sandbox_id(p_oci, i_oci, p_storages, i_storages)
    allow_process(p_oci, i_oci, s_name)

    print("allow_by_sandbox_name: true")
}

allow_sandbox_name(p_s_name, i_s_name) {
    print("allow_sandbox_name 1: start")

    p_s_name == i_s_name

    print("allow_sandbox_name 1: true")
}
allow_sandbox_name(p_s_name, i_s_name) {
    print("allow_sandbox_name 2: start")

    # TODO: should generated names be handled differently?
    contains(p_s_name, "$(generated-name)")

    print("allow_sandbox_name 2: true")
}

# Check that the "io.kubernetes.cri.container-type" and
# "io.katacontainers.pkg.oci.container_type" annotations designate the
# expected type - either a "sandbox" or a "container". Then, validate
# other annotations based on the actual "sandbox" or "container" value
# from the input container.
allow_by_container_types(p_oci, i_oci, s_name, s_namespace) {
    print("allow_by_container_types: checking io.kubernetes.cri.container-type")

    c_type := "io.kubernetes.cri.container-type"
    
    p_cri_type := p_oci.Annotations[c_type]
    i_cri_type := i_oci.Annotations[c_type]
    print("allow_by_container_types: p_cri_type =", p_cri_type, "i_cri_type =", i_cri_type)
    p_cri_type == i_cri_type

    allow_by_container_type(i_cri_type, p_oci, i_oci, s_name, s_namespace)

    print("allow_by_container_types: true")
}

allow_by_container_type(i_cri_type, p_oci, i_oci, s_name, s_namespace) {
    print("allow_by_container_type 1: i_cri_type =", i_cri_type)
    i_cri_type == "sandbox"

    i_kata_type := i_oci.Annotations["io.katacontainers.pkg.oci.container_type"]
    print("allow_by_container_type 1: i_kata_type =", i_kata_type)
    i_kata_type == "pod_sandbox"

    allow_sandbox_container_name(p_oci, i_oci)
    allow_sandbox_net_namespace(p_oci, i_oci)
    allow_sandbox_log_directory(p_oci, i_oci, s_name, s_namespace)

    print("allow_by_container_type 1: true")
}

allow_by_container_type(i_cri_type, p_oci, i_oci, s_name, s_namespace) {
    print("allow_by_container_type 2: i_cri_type =", i_cri_type)
    i_cri_type == "container"

    i_kata_type := i_oci.Annotations["io.katacontainers.pkg.oci.container_type"]
    print("allow_by_container_type 2: i_kata_type =", i_kata_type)
    i_kata_type == "pod_container"

    allow_container_name(p_oci, i_oci)
    allow_net_namespace(p_oci, i_oci)
    allow_log_directory(p_oci, i_oci)

    print("allow_by_container_type 2: true")
}

# "io.kubernetes.cri.container-name" annotation
allow_sandbox_container_name(p_oci, i_oci) {
    print("allow_sandbox_container_name: start")

    container_annotation_missing(p_oci, i_oci, "io.kubernetes.cri.container-name")

    print("allow_sandbox_container_name: true")
}

allow_container_name(p_oci, i_oci) {
    print("allow_container_name: start")

    allow_container_annotation(p_oci, i_oci, "io.kubernetes.cri.container-name")

    print("allow_container_name: true")
}

container_annotation_missing(p_oci, i_oci, key) {
    print("container_annotation_missing:", key)

    not p_oci.Annotations[key]
    not i_oci.Annotations[key]

    print("container_annotation_missing: true")
}

allow_container_annotation(p_oci, i_oci, key) {
    print("allow_container_annotation: key =", key)

    p_value := p_oci.Annotations[key]
    i_value := i_oci.Annotations[key]
    print("allow_container_annotation: p_value =", p_value, "i_value =", i_value)

    p_value == i_value

    print("allow_container_annotation: true")
}

# "nerdctl/network-namespace" annotation
allow_sandbox_net_namespace(p_oci, i_oci) {
    print("allow_sandbox_net_namespace: start")

    key := "nerdctl/network-namespace"

    p_namespace := p_oci.Annotations[key]
    i_namespace := i_oci.Annotations[key]
    print("allow_sandbox_net_namespace: p_namespace =", p_namespace, "i_namespace =", i_namespace)

    regex.match(p_namespace, i_namespace)

    print("allow_sandbox_net_namespace: true")
}

allow_net_namespace(p_oci, i_oci) {
    print("allow_net_namespace: start")

    key := "nerdctl/network-namespace"

    not p_oci.Annotations[key]
    not i_oci.Annotations[key]

    print("allow_net_namespace: true")
}

# "io.kubernetes.cri.sandbox-log-directory" annotation
allow_sandbox_log_directory(p_oci, i_oci, s_name, s_namespace) {
    print("allow_sandbox_log_directory: start")

    key := "io.kubernetes.cri.sandbox-log-directory"

    p_dir := p_oci.Annotations[key]
    regex1 := replace(p_dir, "$(sandbox-name)", s_name)
    regex2 := replace(regex1, "$(sandbox-namespace)", s_namespace)
    print("allow_sandbox_log_directory: regex2 =", regex2)

    i_dir := i_oci.Annotations[key]
    print("allow_sandbox_log_directory: i_dir =", i_dir)

    regex.match(regex2, i_dir)

    print("allow_sandbox_log_directory: true")
}

allow_log_directory(p_oci, i_oci) {
    print("allow_log_directory: start")

    key := "io.kubernetes.cri.sandbox-log-directory"

    not p_oci.Annotations[key]
    not i_oci.Annotations[key]

    print("allow_log_directory: true")
}

allow_linux(p_oci, i_oci) {
    p_namespaces := p_oci.Linux.Namespaces
    print("allow_linux: p namespaces =", p_namespaces)

    i_namespaces := i_oci.Linux.Namespaces
    print("allow_linux: i namespaces =", i_namespaces)

    p_namespaces == i_namespaces

    allow_masked_paths(p_oci, i_oci)
    allow_readonly_paths(p_oci, i_oci)

    print("allow_linux: true")
}

allow_masked_paths(p_oci, i_oci) {
    p_paths := p_oci.Linux.MaskedPaths
    print("allow_masked_paths 1: p_paths =", p_paths)

    i_paths := i_oci.Linux.MaskedPaths
    print("allow_masked_paths 1: i_paths =", i_paths)

    allow_masked_paths_array(p_paths, i_paths)

    print("allow_masked_paths 1: true")
}
allow_masked_paths(p_oci, i_oci) {
    print("allow_masked_paths 2: start")

    not p_oci.Linux.MaskedPaths
    not i_oci.Linux.MaskedPaths

    print("allow_masked_paths 2: true")
}

# All the policy masked paths must be masked in the input data too.
# Input is allowed to have more masked paths than the policy.
allow_masked_paths_array(p_array, i_array) {
    every p_elem in p_array {
        allow_masked_path(p_elem, i_array)
    }
}

allow_masked_path(p_elem, i_array) {
    print("allow_masked_path: p_elem =", p_elem)

    some i_elem in i_array
    p_elem == i_elem

    print("allow_masked_path: true")
}

allow_readonly_paths(p_oci, i_oci) {
    p_paths := p_oci.Linux.ReadonlyPaths
    print("allow_readonly_paths 1: p_paths =", p_paths)

    i_paths := i_oci.Linux.ReadonlyPaths
    print("allow_readonly_paths 1: i_paths =", i_paths)

    allow_readonly_paths_array(p_paths, i_paths, i_oci.Linux.MaskedPaths)

    print("allow_readonly_paths 1: true")
}
allow_readonly_paths(p_oci, i_oci) {
    print("allow_readonly_paths 2: start")

    not p_oci.Linux.ReadonlyPaths
    not i_oci.Linux.ReadonlyPaths

    print("allow_readonly_paths 2: true")
}

# All the policy readonly paths must be either:
# - Present in the input readonly paths, or
# - Present in the input masked paths.
# Input is allowed to have more readonly paths than the policy.
allow_readonly_paths_array(p_array, i_array, masked_paths) {
    every p_elem in p_array {
        allow_readonly_path(p_elem, i_array, masked_paths)
    }
}

allow_readonly_path(p_elem, i_array, masked_paths) {
    print("allow_readonly_path 1: p_elem =", p_elem)

    some i_elem in i_array
    p_elem == i_elem

    print("allow_readonly_path 1: true")
}
allow_readonly_path(p_elem, i_array, masked_paths) {
    print("allow_readonly_path 2: p_elem =", p_elem)

    some i_masked in masked_paths
    p_elem == i_masked

    print("allow_readonly_path 2: true")
}

# Check the consistency of the input "io.katacontainers.pkg.oci.bundle_path"
# and "io.kubernetes.cri.sandbox-id" values with other fields.
allow_by_bundle_or_sandbox_id(p_oci, i_oci, p_storages, i_storages) {
    print("allow_by_bundle_or_sandbox_id: start")

    bundle_path := i_oci.Annotations["io.katacontainers.pkg.oci.bundle_path"]
    bundle_id := replace(bundle_path, "/run/containerd/io.containerd.runtime.v2.task/k8s.io/", "")

    key := "io.kubernetes.cri.sandbox-id"

    p_regex := p_oci.Annotations[key]
    sandbox_id := i_oci.Annotations[key]

    print("allow_by_bundle_or_sandbox_id: sandbox_id =", sandbox_id, "regex =", p_regex)
    regex.match(p_regex, sandbox_id)

    allow_root_path(p_oci, i_oci, bundle_id)

    every i_mount in input.OCI.Mounts {
        allow_mount(p_oci, i_mount, bundle_id, sandbox_id)
    }

    allow_storages(p_storages, i_storages, bundle_id, sandbox_id)

    print("allow_by_bundle_or_sandbox_id: true")
}

allow_process(p_oci, i_oci, s_name) {
    p_process := p_oci.Process
    i_process := i_oci.Process

    print("allow_process: i terminal =", i_process.Terminal, "p terminal =", p_process.Terminal)
    p_process.Terminal == i_process.Terminal

    print("allow_process: i cwd =", i_process.Cwd, "p cwd =", p_process.Cwd)
    p_process.Cwd == i_process.Cwd

    print("allow_process: i noNewPrivileges =", i_process.NoNewPrivileges, "p noNewPrivileges =", p_process.NoNewPrivileges)
    p_process.NoNewPrivileges == i_process.NoNewPrivileges

    allow_caps(p_process.Capabilities, i_process.Capabilities)
    allow_user(p_process, i_process)
    allow_args(p_process, i_process, s_name)
    allow_env(p_process, i_process, s_name)

    print("allow_process: true")
}

allow_user(p_process, i_process) {
    p_user := p_process.User
    i_user := i_process.User

    # TODO: track down the reason for mcr.microsoft.com/oss/bitnami/redis:6.0.8 being
    #       executed with uid = 0 despite having "User": "1001" in its container image
    #       config.
    #print("allow_user: input uid =", i_user.UID, "policy uid =", p_user.UID)
    #p_user.UID == i_user.UID

    # TODO: track down the reason for registry.k8s.io/pause:3.9 being
    #       executed with gid = 0 despite having "65535:65535" in its container image
    #       config.
    #print("allow_user: input gid =", i_user.GID, "policy gid =", p_user.GID)
    #p_user.GID == i_user.GID

    # TODO: compare the additionalGids field too after computing its value
    # based on /etc/passwd and /etc/group from the container image.
}

allow_args(p_process, i_process, s_name) {
    print("allow_args 1: no args")

    not p_process.Args
    not i_process.Args

    print("allow_args 1: true")
}
allow_args(p_process, i_process, s_name) {
    print("allow_args 2: policy args =", p_process.Args)
    print("allow_args 2: input args =", i_process.Args)

    count(p_process.Args) == count(i_process.Args)

    every i, i_arg in i_process.Args {
        allow_arg(i, i_arg, p_process, s_name)
    }

    print("allow_args 2: true")
}
allow_arg(i, i_arg, p_process, s_name) {
    p_arg := p_process.Args[i]
    print("allow_arg 1: i =", i, "i_arg =", i_arg, "p_arg =", p_arg)

    p_arg2 := replace(p_arg, "$$", "$")
    p_arg2 == i_arg

    print("allow_arg 1: true")
}
allow_arg(i, i_arg, p_process, s_name) {
    p_arg := p_process.Args[i]
    print("allow_arg 2: i =", i, "i_arg =", i_arg, "p_arg =", p_arg)

    # TODO: can $(node-name) be handled better?
    contains(p_arg, "$(node-name)")

    print("allow_arg 2: true")
}
allow_arg(i, i_arg, p_process, s_name) {
    p_arg := p_process.Args[i]
    print("allow_arg 3: i =", i, "i_arg =", i_arg, "p_arg =", p_arg)

    p_arg2 := replace(p_arg, "$$", "$")
    p_arg3 := replace(p_arg2, "$(sandbox-name)", s_name)
    print("allow_arg 3: p_arg3 =", p_arg3)
    p_arg3 == i_arg

    print("allow_arg 3: true")
}

# OCI process.Env field
allow_env(p_process, i_process, s_name) {
    print("allow_env: p env =", p_process.Env)
    print("allow_env: i env =", i_process.Env)

    every i_var in i_process.Env {
        allow_var(p_process, i_process, i_var, s_name)
    }

    print("allow_env: true")
}

# Allow input env variables that are present in the policy data too.
allow_var(p_process, i_process, i_var, s_name) {
    print("allow_var 1: i_var =", i_var)

    some p_var in p_process.Env
    p_var == i_var

    print("allow_var 1: true")
}

# Match input with one of the policy variables, after substituting $(sandbox-name).
allow_var(p_process, i_process, i_var, s_name) {
    print("allow_var 2: i_var =", i_var)

    some p_var in p_process.Env
    p_var2 := replace(p_var, "$(sandbox-name)", s_name)
    print("allow_var 2: p_var2 =", p_var2)

    p_var2 == i_var

    print("allow_var 2: true")
}

# Allow input env variables that match with a request_defaults regex.
allow_var(p_process, i_process, i_var, s_name) {
    print("allow_var 3: start")

    some p_regex1 in policy_data.request_defaults.CreateContainerRequest.allow_env_regex
    print("allow_var 3: p_regex1 =", p_regex1)

    p_regex2 := replace(p_regex1, "$(ipv4_a)", policy_data.common.ipv4_a)
    print("allow_var 3: p_regex2 =", p_regex2)

    p_regex3 := replace(p_regex2, "$(ip_p)", policy_data.common.ip_p)
    print("allow_var 3: p_regex3 =", p_regex3)

    p_regex4 := replace(p_regex3, "$(svc_name)", policy_data.common.svc_name)
    print("allow_var 3: p_regex4 =", p_regex4)

    p_regex5 := replace(p_regex4, "$(dns_label)", policy_data.common.dns_label)
    print("allow_var 3: p_regex5 =", p_regex5)

    print("allow_var 3: i_var =", i_var)
    regex.match(p_regex5, i_var)

    print("allow_var 3: true")
}

# Allow fieldRef "fieldPath: status.podIP" values.
allow_var(p_process, i_process, i_var, s_name) {
    print("allow_var 4: i_var =", i_var)

    name_value := split(i_var, "=")
    count(name_value) == 2
    is_ip(name_value[1])

    some p_var in p_process.Env
    allow_pod_ip_var(name_value[0], p_var)

    print("allow_var 4: true")
}

# Allow common fieldRef variables.
allow_var(p_process, i_process, i_var, s_name) {
    print("allow_var 5: i_var =", i_var)

    name_value := split(i_var, "=")
    count(name_value) == 2

    some p_var in p_process.Env
    p_name_value := split(p_var, "=")
    count(p_name_value) == 2

    p_name_value[0] == name_value[0]

    # TODO: should these be handled in a different way?
    always_allowed := ["$(host-name)", "$(node-name)", "$(pod-uid)"]
    some allowed in always_allowed
    contains(p_name_value[1], allowed)

    print("allow_var 5: true")
}

# Allow fieldRef "fieldPath: status.hostIP" values.
allow_var(p_process, i_process, i_var, s_name) {
    print("allow_var 6: i_var =", i_var)

    name_value := split(i_var, "=")
    count(name_value) == 2
    is_ip(name_value[1])

    some p_var in p_process.Env
    allow_host_ip_var(name_value[0], p_var)

    print("allow_var 6: true")
}

# Allow resourceFieldRef values (e.g., "limits.cpu").
allow_var(p_process, i_process, i_var, s_name) {
    print("allow_var 7: i_var =", i_var)

    name_value := split(i_var, "=")
    count(name_value) == 2

    some p_var in p_process.Env
    p_name_value := split(p_var, "=")
    count(p_name_value) == 2

    p_name_value[0] == name_value[0]

    # TODO: should these be handled in a different way?
    always_allowed := ["$(resource-field)", "$(todo-annotation)"]
    some allowed in always_allowed
    contains(p_name_value[1], allowed)

    print("allow_var 7: true")
}

allow_pod_ip_var(var_name, p_var) {
    print("allow_pod_ip_var: var_name =", var_name, "p_var =", p_var)

    p_name_value := split(p_var, "=")
    count(p_name_value) == 2

    p_name_value[0] == var_name
    p_name_value[1] == "$(pod-ip)"

    print("allow_pod_ip_var: true")
}

allow_host_ip_var(var_name, p_var) {
    print("allow_host_ip_var: var_name =", var_name, "p_var =", p_var)

    p_name_value := split(p_var, "=")
    count(p_name_value) == 2

    p_name_value[0] == var_name
    p_name_value[1] == "$(host-ip)"

    print("allow_host_ip_var: true")
}

is_ip(value) {
    bytes = split(value, ".")
    count(bytes) == 4

    is_ip_first_byte(bytes[0])
    is_ip_other_byte(bytes[1])
    is_ip_other_byte(bytes[2])
    is_ip_other_byte(bytes[3])
}
is_ip_first_byte(component) {
    number = to_number(component)
    number >= 1
    number <= 255
}
is_ip_other_byte(component) {
    number = to_number(component)
    number >= 0
    number <= 255
}

# OCI root.Path
allow_root_path(p_oci, i_oci, bundle_id) {
    p_path1 := p_oci.Root.Path
    print("allow_root_path: p_path1 =", p_path1)

    p_path2 := replace(p_path1, "$(cpath)", policy_data.common.cpath)
    print("allow_root_path: p_path2 =", p_path2)

    p_path3 := replace(p_path2, "$(bundle-id)", bundle_id)
    print("allow_root_path: p_path3 =", p_path3)

    p_path3 == i_oci.Root.Path

    print("allow_root_path: true")
}

# device mounts
allow_mount(p_oci, i_mount, bundle_id, sandbox_id) {
    print("allow_mount: start")

    some p_mount in p_oci.Mounts
    check_mount(p_mount, i_mount, bundle_id, sandbox_id)

    # TODO: are there any other required policy checks for mounts - e.g.,
    #       multiple mounts with same source or destination?

    print("allow_mount: true")
}

check_mount(p_mount, i_mount, bundle_id, sandbox_id) {
    print("check_mount 1: p_mount =", p_mount)
    print("check_mount 1: i_mount =", i_mount)

    p_mount == i_mount

    print("check_mount 1: true")
}
check_mount(p_mount, i_mount, bundle_id, sandbox_id) {
    print("check_mount 2: i destination =", i_mount.destination, "p destination =", p_mount.destination)
    p_mount.destination == i_mount.destination

    print("check_mount 2: i type =", i_mount.type_, "p type =", p_mount.type_)
    p_mount.type_ == i_mount.type_

    print("check_mount 2: i options =", i_mount.options)
    print("check_mount 2: p options =", p_mount.options)
    p_mount.options == i_mount.options

    mount_source_allows(p_mount, i_mount, bundle_id, sandbox_id)

    print("check_mount 2: true")
}

mount_source_allows(p_mount, i_mount, bundle_id, sandbox_id) {
    print("mount_source_allows 1: i_mount.source =", i_mount.source)

    regex1 := p_mount.source
    print("mount_source_allows 1: regex1 =", regex1)

    regex2 := replace(regex1, "$(sfprefix)", policy_data.common.sfprefix)
    print("mount_source_allows 1: regex2 =", regex2)

    regex3 := replace(regex2, "$(cpath)", policy_data.common.cpath)
    print("mount_source_allows 1: regex3 =", regex3)

    regex4 := replace(regex3, "$(bundle-id)", bundle_id)
    print("mount_source_allows 1: regex4 =", regex4)

    regex.match(regex4, i_mount.source)

    print("mount_source_allows 1: true")
}
mount_source_allows(p_mount, i_mount, bundle_id, sandbox_id) {
    print("mount_source_allows 2: i_mount.source=", i_mount.source)

    regex1 := p_mount.source
    print("mount_source_allows 2: regex1 =", regex1)

    regex2 := replace(regex1, "$(sfprefix)", policy_data.common.sfprefix)
    print("mount_source_allows 2: regex2 =", regex2)

    regex3 := replace(regex2, "$(cpath)", policy_data.common.cpath)
    print("mount_source_allows 2: regex3 =", regex3)

    regex4 := replace(regex3, "$(sandbox-id)", sandbox_id)
    print("mount_source_allows 2: regex4 =", regex4)

    regex.match(regex4, i_mount.source)

    print("mount_source_allows 2: true")
}

######################################################################
# Create container Storages

allow_storages(p_storages, i_storages, bundle_id, sandbox_id) {
    p_count := count(p_storages)
    i_count := count(i_storages)
    print("allow_storages: p_count =", p_count, "i_count =", i_count)

    p_count == i_count

    # Get the container image layer IDs and verity root hashes, from the "overlayfs" storage.
    some overlay_storage in p_storages
    overlay_storage.driver == "overlayfs"
    print("allow_storages: overlay_storage =", overlay_storage)
    count(overlay_storage.options) == 2

    layer_ids := split(overlay_storage.options[0], ":")
    print("allow_storages: layer_ids =", layer_ids)

    root_hashes := split(overlay_storage.options[1], ":")
    print("allow_storages: root_hashes =", root_hashes)

    every i_storage in i_storages {
        allow_storage(p_storages, i_storage, bundle_id, sandbox_id, layer_ids, root_hashes)
    }

    print("allow_storages: true")
}

allow_storage(p_storages, i_storage, bundle_id, sandbox_id, layer_ids, root_hashes) {
    some p_storage in p_storages

    print("allow_storage: p_storage =", p_storage)
    print("allow_storage: i_storage =", i_storage)

    p_storage.driver           == i_storage.driver
    p_storage.driver_options   == i_storage.driver_options
    p_storage.fs_group         == i_storage.fs_group

    allow_storage_options(p_storage, i_storage, layer_ids, root_hashes)
    allow_mount_point(p_storage, i_storage, bundle_id, sandbox_id, layer_ids)

    # TODO: validate the source field too.

    print("allow_storage: true")
}

allow_storage_options(p_storage, i_storage, layer_ids, root_hashes) {
    print("allow_storage_options 1: start")

    p_storage.driver != "blk"
    p_storage.driver != "overlayfs"
    p_storage.options == i_storage.options

    print("allow_storage_options 1: true")
}
allow_storage_options(p_storage, i_storage, layer_ids, root_hashes) {
    print("allow_storage_options 2: start")

    p_storage.driver == "overlayfs"
    count(p_storage.options) == 2

    policy_ids := split(p_storage.options[0], ":")
    print("allow_storage_options 2: policy_ids =", policy_ids)
    policy_ids == layer_ids

    policy_hashes := split(p_storage.options[1], ":")
    print("allow_storage_options 2: policy_hashes =", policy_hashes)

    p_count := count(policy_ids)
    print("allow_storage_options 2: p_count =", p_count)
    p_count >= 1
    p_count == count(policy_hashes)

    i_count := count(i_storage.options)
    print("allow_storage_options 2: i_count =", i_count)
    i_count == p_count + 3

    print("allow_storage_options 2: i_storage.options[0] =", i_storage.options[0])
    i_storage.options[0] == "io.katacontainers.fs-opt.layer-src-prefix=/var/lib/containerd/io.containerd.snapshotter.v1.tardev/layers"

    print("allow_storage_options 2: i_storage.options[i_count - 2] =", i_storage.options[i_count - 2])
    i_storage.options[i_count - 2] == "io.katacontainers.fs-opt.overlay-rw"

    lowerdir := concat("=", ["lowerdir", p_storage.options[0]])
    print("allow_storage_options 2: lowerdir =", lowerdir)

    print("allow_storage_options 2: i_storage.options[i_count - 1] =", i_storage.options[i_count - 1])
    i_storage.options[i_count - 1] == lowerdir

    every i, policy_id in policy_ids {
        allow_overlay_layer(policy_id, policy_hashes[i], i_storage.options[i + 1])
    }

    print("allow_storage_options 2: true")
}
allow_storage_options(p_storage, i_storage, layer_ids, root_hashes) {
    print("allow_storage_options 3: start")

    p_storage.driver == "blk"
    count(p_storage.options) == 1

    startswith(p_storage.options[0], "$(hash")
    hash_suffix := trim_left(p_storage.options[0], "$(hash")

    endswith(hash_suffix, ")")
    hash_index := trim_right(hash_suffix, ")")
    i := to_number(hash_index)
    print("allow_storage_options 3: i =", i)

    hash_option := concat("=", ["io.katacontainers.fs-opt.root-hash", root_hashes[i]])
    print("allow_storage_options 3: hash_option =", hash_option)

    count(i_storage.options) == 4
    i_storage.options[0] == "ro"
    i_storage.options[1] == "io.katacontainers.fs-opt.block_device=file"
    i_storage.options[2] == "io.katacontainers.fs-opt.is-layer"
    i_storage.options[3] == hash_option

    print("allow_storage_options 3: true")
}
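
# Illustrative sketch only, not emitted by the policy generator: how the
# "$(hash<N>)" placeholder handled above is reduced to a layer index.
# trim_left and trim_right treat their second argument as a set of characters,
# not as a literal prefix or suffix; that is safe here because digits are not
# in the set. The value "$(hash12)" is a made-up input.
example_hash_placeholder_index {
    hash_suffix := trim_left("$(hash12)", "$(hash")
    hash_suffix == "12)"

    to_number(trim_right(hash_suffix, ")")) == 12
}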

allow_overlay_layer(policy_id, policy_hash, i_option) {
    print("allow_overlay_layer: policy_id =", policy_id, "policy_hash =", policy_hash)
    print("allow_overlay_layer: i_option =", i_option)

    startswith(i_option, "io.katacontainers.fs-opt.layer=")
    i_value := replace(i_option, "io.katacontainers.fs-opt.layer=", "")
    i_value_decoded := base64.decode(i_value)
    print("allow_overlay_layer: i_value_decoded =", i_value_decoded)

    policy_suffix := concat("=", ["tar,ro,io.katacontainers.fs-opt.block_device=file,io.katacontainers.fs-opt.is-layer,io.katacontainers.fs-opt.root-hash", policy_hash])
    p_value := concat(",", [policy_id, policy_suffix])
    print("allow_overlay_layer: p_value =", p_value)

    p_value == i_value_decoded

    print("allow_overlay_layer: true")
}
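
# Illustrative sketch only, not emitted by the policy generator: the shape of
# the base64-encoded layer option that allow_overlay_layer accepts.
# "layer-id" and "root-hash" are hypothetical stand-ins for a real layer ID and
# dm-verity root hash.
example_allow_overlay_layer {
    layer_id := "layer-id"
    root_hash := "root-hash"

    # The decoded option must be "<layer id>,tar,ro,...,root-hash=<root hash>".
    decoded := sprintf("%s,tar,ro,io.katacontainers.fs-opt.block_device=file,io.katacontainers.fs-opt.is-layer,io.katacontainers.fs-opt.root-hash=%s", [layer_id, root_hash])
    i_option := concat("", ["io.katacontainers.fs-opt.layer=", base64.encode(decoded)])

    allow_overlay_layer(layer_id, root_hash, i_option)
}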

allow_mount_point(p_storage, i_storage, bundle_id, sandbox_id, layer_ids) {
    print("allow_mount_point 1: i_storage.mount_point =", i_storage.mount_point)
    p_storage.fstype == "tar"

    startswith(p_storage.mount_point, "$(layer")
    mount_suffix := trim_left(p_storage.mount_point, "$(layer")

    endswith(mount_suffix, ")")
    layer_index := trim_right(mount_suffix, ")")
    i := to_number(layer_index)
    print("allow_mount_point 1: i =", i)

    layer_id := layer_ids[i]
    print("allow_mount_point 1: layer_id =", layer_id)

    p_mount := concat("/", ["/run/kata-containers/sandbox/layers", layer_id])
    print("allow_mount_point 1: p_mount =", p_mount)

    p_mount == i_storage.mount_point

    print("allow_mount_point 1: true")
}
allow_mount_point(p_storage, i_storage, bundle_id, sandbox_id, layer_ids) {
    print("allow_mount_point 2: i_storage.mount_point =", i_storage.mount_point)
    p_storage.fstype == "fuse3.kata-overlay"

    mount1 := replace(p_storage.mount_point, "$(cpath)", policy_data.common.cpath)
    mount2 := replace(mount1, "$(bundle-id)", bundle_id)
    print("allow_mount_point 2: mount2 =", mount2)

    mount2 == i_storage.mount_point

    print("allow_mount_point 2: true")
}
allow_mount_point(p_storage, i_storage, bundle_id, sandbox_id, layer_ids) {
    print("allow_mount_point 3: i_storage.mount_point =", i_storage.mount_point)
    p_storage.fstype == "local"

    mount1 := p_storage.mount_point
    print("allow_mount_point 3: mount1 =", mount1)

    mount2 := replace(mount1, "$(cpath)", policy_data.common.cpath)
    print("allow_mount_point 3: mount2 =", mount2)

    mount3 := replace(mount2, "$(sandbox-id)", sandbox_id)
    print("allow_mount_point 3: mount3 =", mount3)

    regex.match(mount3, i_storage.mount_point)

    print("allow_mount_point 3: true")
}
allow_mount_point(p_storage, i_storage, bundle_id, sandbox_id, layer_ids) {
    print("allow_mount_point 4: i_storage.mount_point =", i_storage.mount_point)
    p_storage.fstype == "bind"

    mount1 := p_storage.mount_point
    print("allow_mount_point 4: mount1 =", mount1)

    mount2 := replace(mount1, "$(cpath)", policy_data.common.cpath)
    print("allow_mount_point 4: mount2 =", mount2)

    mount3 := replace(mount2, "$(bundle-id)", bundle_id)
    print("allow_mount_point 4: mount3 =", mount3)

    regex.match(mount3, i_storage.mount_point)

    print("allow_mount_point 4: true")
}
allow_mount_point(p_storage, i_storage, bundle_id, sandbox_id, layer_ids) {
    print("allow_mount_point 5: i_storage.mount_point =", i_storage.mount_point)
    p_storage.fstype == "tmpfs"

    mount1 := p_storage.mount_point
    print("allow_mount_point 5: mount1 =", mount1)

    regex.match(mount1, i_storage.mount_point)

    print("allow_mount_point 5: true")
}

# process.Capabilities
allow_caps(p_caps, i_caps) {
    print("allow_caps: policy Ambient =", p_caps.Ambient)
    print("allow_caps: input Ambient =", i_caps.Ambient)
    match_caps(p_caps.Ambient, i_caps.Ambient)

    print("allow_caps: policy Bounding =", p_caps.Bounding)
    print("allow_caps: input Bounding =", i_caps.Bounding)
    match_caps(p_caps.Bounding, i_caps.Bounding)

    print("allow_caps: policy Effective =", p_caps.Effective)
    print("allow_caps: input Effective =", i_caps.Effective)
    match_caps(p_caps.Effective, i_caps.Effective)

    print("allow_caps: policy Inheritable =", p_caps.Inheritable)
    print("allow_caps: input Inheritable =", i_caps.Inheritable)
    match_caps(p_caps.Inheritable, i_caps.Inheritable)

    print("allow_caps: policy Permitted =", p_caps.Permitted)
    print("allow_caps: input Permitted =", i_caps.Permitted)
    match_caps(p_caps.Permitted, i_caps.Permitted)
}

match_caps(p_caps, i_caps) {
    print("match_caps 1: start")

    p_caps == i_caps

    print("match_caps 1: true")
}
match_caps(p_caps, i_caps) {
    print("match_caps 2: start")

    count(p_caps) == 1
    p_caps[0] == "$(default_caps)"

    print("match_caps 2: default_caps =", policy_data.common.default_caps)
    policy_data.common.default_caps == i_caps

    print("match_caps 2: true")
}
match_caps(p_caps, i_caps) {
    print("match_caps 3: start")

    count(p_caps) == 1
    p_caps[0] == "$(privileged_caps)"

    print("match_caps 3: privileged_caps =", policy_data.common.privileged_caps)
    policy_data.common.privileged_caps == i_caps

    print("match_caps 3: true")
}
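
# Illustrative sketch only, not emitted by the policy generator: how match_caps
# treats placeholders and literal lists. The "$(default_caps)" placeholder is
# accepted when the input equals policy_data.common.default_caps (branch 2);
# a literal list is accepted only when it is identical to the input (branch 1).
# The capability names in the last two checks are arbitrary sample values.
example_match_caps {
    match_caps(["$(default_caps)"], policy_data.common.default_caps)
    match_caps(["CAP_CHOWN"], ["CAP_CHOWN"])
    not match_caps(["CAP_CHOWN"], ["CAP_CHOWN", "CAP_MKNOD"])
}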

######################################################################
# Succeeds only when the path contains no "../" sequence and does not end in "/..".
check_directory_traversal(i_path) {
    contains(i_path, "../") == false
    endswith(i_path, "/..") == false
}
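
# Illustrative sketch only, not emitted by the policy generator: paths that
# check_directory_traversal accepts and rejects. All three paths are made-up
# sample inputs.
example_check_directory_traversal {
    check_directory_traversal("/run/kata-containers/shared/containers/file")
    not check_directory_traversal("/run/kata-containers/../etc/passwd")
    not check_directory_traversal("/run/kata-containers/..")
}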

check_symlink_source {
    # TODO: delete this rule once the symlink_src field gets implemented
    # by all/most Guest VMs.
    not input.symlink_src
}
check_symlink_source {
    i_src := input.symlink_src
    print("check_symlink_source: i_src =", i_src)

    startswith(i_src, "/") == false
    check_directory_traversal(i_src)
}

allow_sandbox_storages(i_storages) {
    print("allow_sandbox_storages: i_storages =", i_storages)

    p_storages := policy_data.sandbox.storages
    every i_storage in i_storages {
        allow_sandbox_storage(p_storages, i_storage)
    }

    print("allow_sandbox_storages: true")
}

allow_sandbox_storage(p_storages, i_storage) {
    print("allow_sandbox_storage: i_storage =", i_storage)

    some p_storage in p_storages
    print("allow_sandbox_storage: p_storage =", p_storage)
    i_storage == p_storage

    print("allow_sandbox_storage: true")
}

CopyFileRequest {
    print("CopyFileRequest: input.path =", input.path)

    check_symlink_source
    check_directory_traversal(input.path)

    some regex1 in policy_data.request_defaults.CopyFileRequest
    regex2 := replace(regex1, "$(sfprefix)", policy_data.common.sfprefix)
    regex3 := replace(regex2, "$(cpath)", policy_data.common.cpath)
    regex4 := replace(regex3, "$(bundle-id)", "[a-z0-9]{64}")
    print("CopyFileRequest: regex4 =", regex4)

    regex.match(regex4, input.path)

    print("CopyFileRequest: true")
}
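
# Illustrative sketch only, not emitted by the policy generator: the placeholder
# expansion that CopyFileRequest performs, applied to a hypothetical pattern
# "$(sfprefix)resolv.conf$". It assumes the cpath and sfprefix values defined
# under policy_data.common further down; the sample path is made up.
example_copy_file_regex_expansion {
    regex1 := "$(sfprefix)resolv.conf$"
    regex2 := replace(regex1, "$(sfprefix)", policy_data.common.sfprefix)
    regex3 := replace(regex2, "$(cpath)", policy_data.common.cpath)
    regex4 := replace(regex3, "$(bundle-id)", "[a-z0-9]{64}")

    regex4 == "^/run/kata-containers/shared/containers/[a-z0-9]{64}-[a-z0-9]{16}-resolv.conf$"

    # A 64-character bundle ID built from four 16-character chunks.
    chunk := "0123456789abcdef"
    bundle_id := concat("", [chunk, chunk, chunk, chunk])
    sample_path := concat("", ["/run/kata-containers/shared/containers/", bundle_id, "-", chunk, "-resolv.conf"])

    regex.match(regex4, sample_path)
}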

CreateSandboxRequest {
    print("CreateSandboxRequest: input.guest_hook_path =", input.guest_hook_path)
    count(input.guest_hook_path) == 0

    print("CreateSandboxRequest: input.kernel_modules =", input.kernel_modules)
    count(input.kernel_modules) == 0

    allow_sandbox_storages(input.storages)
}

ExecProcessRequest {
    print("ExecProcessRequest 1: input =", input)

    i_command = concat(" ", input.process.Args)
    print("ExecProcessRequest 1: i_command =", i_command)

    some p_command in policy_data.request_defaults.ExecProcessRequest.commands
    p_command == i_command

    print("ExecProcessRequest 1: true")
}
ExecProcessRequest {
    print("ExecProcessRequest 2: input =", input)

    # TODO: match input container ID with its corresponding container.exec_commands.
    i_command = concat(" ", input.process.Args)
    print("ExecProcessRequest 2: i_command =", i_command)

    some container in policy_data.containers
    some p_command in container.exec_commands
    print("ExecProcessRequest 2: p_command =", p_command)

    # TODO: should other input data fields be validated as well?
    p_command == i_command

    print("ExecProcessRequest 2: true")
}
ExecProcessRequest {
    print("ExecProcessRequest 3: input =", input)

    i_command = concat(" ", input.process.Args)
    print("ExecProcessRequest 3: i_command =", i_command)

    some p_regex in policy_data.request_defaults.ExecProcessRequest.regex
    print("ExecProcessRequest 3: p_regex =", p_regex)

    regex.match(p_regex, i_command)

    print("ExecProcessRequest 3: true")
}

ReadStreamRequest {
    policy_data.request_defaults.ReadStreamRequest == true
}

UpdateEphemeralMountsRequest {
    policy_data.request_defaults.UpdateEphemeralMountsRequest == true
}

WriteStreamRequest {
    policy_data.request_defaults.WriteStreamRequest == true
}

policy_data := {
  "containers": [
    {
      "OCI": {
        "Version": "1.1.0-rc.1",
        "Process": {
          "Terminal": false,
          "User": {
            "UID": 65535,
            "GID": 65535,
            "AdditionalGids": [],
            "Username": ""
          },
          "Args": [
            "/pause"
          ],
          "Env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
          ],
          "Cwd": "/",
          "Capabilities": {
            "Ambient": [],
            "Bounding": [
              "$(default_caps)"
            ],
            "Effective": [
              "$(default_caps)"
            ],
            "Inheritable": [],
            "Permitted": [
              "$(default_caps)"
            ]
          },
          "NoNewPrivileges": true
        },
        "Root": {
          "Path": "$(cpath)/$(bundle-id)",
          "Readonly": true
        },
        "Mounts": [
          {
            "destination": "/proc",
            "source": "proc",
            "type_": "proc",
            "options": [
              "nosuid",
              "noexec",
              "nodev"
            ]
          },
          {
            "destination": "/dev",
            "source": "tmpfs",
            "type_": "tmpfs",
            "options": [
              "nosuid",
              "strictatime",
              "mode=755",
              "size=65536k"
            ]
          },
          {
            "destination": "/dev/pts",
            "source": "devpts",
            "type_": "devpts",
            "options": [
              "nosuid",
              "noexec",
              "newinstance",
              "ptmxmode=0666",
              "mode=0620",
              "gid=5"
            ]
          },
          {
            "destination": "/dev/shm",
            "source": "/run/kata-containers/sandbox/shm",
            "type_": "bind",
            "options": [
              "rbind"
            ]
          },
          {
            "destination": "/dev/mqueue",
            "source": "mqueue",
            "type_": "mqueue",
            "options": [
              "nosuid",
              "noexec",
              "nodev"
            ]
          },
          {
            "destination": "/sys",
            "source": "sysfs",
            "type_": "sysfs",
            "options": [
              "nosuid",
              "noexec",
              "nodev",
              "ro"
            ]
          },
          {
            "destination": "/etc/resolv.conf",
            "source": "$(sfprefix)resolv.conf$",
            "type_": "bind",
            "options": [
              "rbind",
              "ro",
              "nosuid",
              "nodev",
              "noexec"
            ]
          }
        ],
        "Annotations": {
          "io.katacontainers.pkg.oci.bundle_path": "/run/containerd/io.containerd.runtime.v2.task/k8s.io/$(bundle-id)",
          "io.katacontainers.pkg.oci.container_type": "pod_sandbox",
          "io.kubernetes.cri.container-type": "sandbox",
          "io.kubernetes.cri.sandbox-id": "^[a-z0-9]{64}$",
          "io.kubernetes.cri.sandbox-log-directory": "^/var/log/pods/$(sandbox-namespace)_$(sandbox-name)_[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$",
          "io.kubernetes.cri.sandbox-name": "kafka-golang-consumer",
          "io.kubernetes.cri.sandbox-namespace": "kafka",
          "nerdctl/network-namespace": "^/var/run/netns/cni-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$"
        },
        "Linux": {
          "Namespaces": [
            {
              "Type": "ipc",
              "Path": ""
            },
            {
              "Type": "uts",
              "Path": ""
            },
            {
              "Type": "mount",
              "Path": ""
            }
          ],
          "MaskedPaths": [
            "/proc/acpi",
            "/proc/asound",
            "/proc/kcore",
            "/proc/keys",
            "/proc/latency_stats",
            "/proc/timer_list",
            "/proc/timer_stats",
            "/proc/sched_debug",
            "/sys/firmware",
            "/proc/scsi"
          ],
          "ReadonlyPaths": [
            "/proc/bus",
            "/proc/fs",
            "/proc/irq",
            "/proc/sys",
            "/proc/sysrq-trigger"
          ]
        }
      },
      "storages": [
        {
          "driver": "blk",
          "driver_options": [],
          "source": "",
          "fstype": "tar",
          "options": [
            "$(hash0)"
          ],
          "mount_point": "$(layer0)",
          "fs_group": null
        },
        {
          "driver": "overlayfs",
          "driver_options": [],
          "source": "",
          "fstype": "fuse3.kata-overlay",
          "options": [
            "5a5aad80055ff20012a50dc25f8df7a29924474324d65f7d5306ee8ee27ff71d",
            "817250f1a3e336da76f5bd3fa784e1b26d959b9c131876815ba2604048b70c18"
          ],
          "mount_point": "$(cpath)/$(bundle-id)",
          "fs_group": null
        }
      ],
      "exec_commands": []
    },
    {
      "OCI": {
        "Version": "1.1.0-rc.1",
        "Process": {
          "Terminal": false,
          "User": {
            "UID": 0,
            "GID": 0,
            "AdditionalGids": [],
            "Username": ""
          },
          "Args": [
            "/bin/skr"
          ],
          "Env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "BUILD_DIR=/go/src/github.com/microsoft/confidential-sidecar-containers",
            "HOSTNAME=$(host-name)",
            "SkrSideCarArgs=ewogICAgImNlcnRjYWNoZSI6IHsKCQkiZW5kcG9pbnRfdHlwZSI6ICJMb2NhbFRISU0iLAoJCSJlbmRwb2ludCI6ICIxNjkuMjU0LjE2OS4yNTQvbWV0YWRhdGEvVEhJTS9hbWQvY2VydGlmaWNhdGlvbiIKCX0gIAp9"
          ],
          "Cwd": "/",
          "Capabilities": {
            "Ambient": [],
            "Bounding": [
              "$(default_caps)"
            ],
            "Effective": [
              "$(default_caps)"
            ],
            "Inheritable": [],
            "Permitted": [
              "$(default_caps)"
            ]
          },
          "NoNewPrivileges": false
        },
        "Root": {
          "Path": "$(cpath)/$(bundle-id)",
          "Readonly": false
        },
        "Mounts": [
          {
            "destination": "/proc",
            "source": "proc",
            "type_": "proc",
            "options": [
              "nosuid",
              "noexec",
              "nodev"
            ]
          },
          {
            "destination": "/dev",
            "source": "tmpfs",
            "type_": "tmpfs",
            "options": [
              "nosuid",
              "strictatime",
              "mode=755",
              "size=65536k"
            ]
          },
          {
            "destination": "/dev/pts",
            "source": "devpts",
            "type_": "devpts",
            "options": [
              "nosuid",
              "noexec",
              "newinstance",
              "ptmxmode=0666",
              "mode=0620",
              "gid=5"
            ]
          },
          {
            "destination": "/dev/shm",
            "source": "/run/kata-containers/sandbox/shm",
            "type_": "bind",
            "options": [
              "rbind"
            ]
          },
          {
            "destination": "/dev/mqueue",
            "source": "mqueue",
            "type_": "mqueue",
            "options": [
              "nosuid",
              "noexec",
              "nodev"
            ]
          },
          {
            "destination": "/sys",
            "source": "sysfs",
            "type_": "sysfs",
            "options": [
              "nosuid",
              "noexec",
              "nodev",
              "ro"
            ]
          },
          {
            "destination": "/sys/fs/cgroup",
            "source": "cgroup",
            "type_": "cgroup",
            "options": [
              "nosuid",
              "noexec",
              "nodev",
              "relatime",
              "ro"
            ]
          },
          {
            "destination": "/etc/hosts",
            "source": "$(sfprefix)hosts$",
            "type_": "bind",
            "options": [
              "rbind",
              "rprivate",
              "rw"
            ]
          },
          {
            "destination": "/dev/termination-log",
            "source": "$(sfprefix)termination-log$",
            "type_": "bind",
            "options": [
              "rbind",
              "rprivate",
              "rw"
            ]
          },
          {
            "destination": "/etc/hostname",
            "source": "$(sfprefix)hostname$",
            "type_": "bind",
            "options": [
              "rbind",
              "rprivate",
              "rw"
            ]
          },
          {
            "destination": "/etc/resolv.conf",
            "source": "$(sfprefix)resolv.conf$",
            "type_": "bind",
            "options": [
              "rbind",
              "rprivate",
              "rw"
            ]
          },
          {
            "destination": "/var/run/secrets/kubernetes.io/serviceaccount",
            "source": "$(sfprefix)serviceaccount$",
            "type_": "bind",
            "options": [
              "rbind",
              "rprivate",
              "ro"
            ]
          },
          {
            "destination": "/var/run/secrets/azure/tokens",
            "source": "$(sfprefix)tokens$",
            "type_": "bind",
            "options": [
              "rbind",
              "rprivate",
              "ro"
            ]
          },
          {
            "destination": "/opt/confidential-containers/share/kata-containers/reference-info-base64",
            "source": "$(sfprefix)reference-info-base64$",
            "type_": "bind",
            "options": [
              "rbind",
              "rprivate",
              "rw"
            ]
          }
        ],
        "Annotations": {
          "io.katacontainers.pkg.oci.bundle_path": "/run/containerd/io.containerd.runtime.v2.task/k8s.io/$(bundle-id)",
          "io.katacontainers.pkg.oci.container_type": "pod_container",
          "io.kubernetes.cri.container-name": "skr",
          "io.kubernetes.cri.container-type": "container",
          "io.kubernetes.cri.image-name": "mcr.microsoft.com/aci/skr:2.7",
          "io.kubernetes.cri.sandbox-id": "^[a-z0-9]{64}$",
          "io.kubernetes.cri.sandbox-name": "kafka-golang-consumer",
          "io.kubernetes.cri.sandbox-namespace": "kafka"
        },
        "Linux": {
          "Namespaces": [
            {
              "Type": "ipc",
              "Path": ""
            },
            {
              "Type": "uts",
              "Path": ""
            },
            {
              "Type": "mount",
              "Path": ""
            }
          ],
          "MaskedPaths": [
            "/proc/acpi",
            "/proc/kcore",
            "/proc/keys",
            "/proc/latency_stats",
            "/proc/timer_list",
            "/proc/timer_stats",
            "/proc/sched_debug",
            "/proc/scsi",
            "/sys/firmware"
          ],
          "ReadonlyPaths": [
            "/proc/asound",
            "/proc/bus",
            "/proc/fs",
            "/proc/irq",
            "/proc/sys",
            "/proc/sysrq-trigger"
          ]
        }
      },
      "storages": [
        {
          "driver": "blk",
          "driver_options": [],
          "source": "",
          "fstype": "tar",
          "options": [
            "$(hash0)"
          ],
          "mount_point": "$(layer0)",
          "fs_group": null
        },
        {
          "driver": "blk",
          "driver_options": [],
          "source": "",
          "fstype": "tar",
          "options": [
            "$(hash1)"
          ],
          "mount_point": "$(layer1)",
          "fs_group": null
        },
        {
          "driver": "blk",
          "driver_options": [],
          "source": "",
          "fstype": "tar",
          "options": [
            "$(hash2)"
          ],
          "mount_point": "$(layer2)",
          "fs_group": null
        },
        {
          "driver": "blk",
          "driver_options": [],
          "source": "",
          "fstype": "tar",
          "options": [
            "$(hash3)"
          ],
          "mount_point": "$(layer3)",
          "fs_group": null
        },
        {
          "driver": "blk",
          "driver_options": [],
          "source": "",
          "fstype": "tar",
          "options": [
            "$(hash4)"
          ],
          "mount_point": "$(layer4)",
          "fs_group": null
        },
        {
          "driver": "blk",
          "driver_options": [],
          "source": "",
          "fstype": "tar",
          "options": [
            "$(hash5)"
          ],
          "mount_point": "$(layer5)",
          "fs_group": null
        },
        {
          "driver": "overlayfs",
          "driver_options": [],
          "source": "",
          "fstype": "fuse3.kata-overlay",
          "options": [
            "d004eccdd1a26067107adad941c974b5d06f8ff5f2d80ec93a9f1933308a79e8:26ff9c6ac969c7b781f155c2d02cc9ee9fd2cf755b65a3c1612ebc6de44cf6cb:718c74cffee41f23c836c0543d57caf08b0bf19d479a8968f50a0a7e88f642a7:a46e8c407dfbaaeeedf5a55088c40fea80c299556c249ef18f81d63ad1cae518:60e870824a928fd25b7c66cdcb70a1314c829d7ad6b0bc4c747b95315ee6b8ff:d5567d3931056de85269d2ef44fe46bf168feb53c523a57180956b1af01c69cb",
            "67450082ab56da1aecc5eae2f18d980cd9e7306e79334a1a826a91cfd90114a8:ea0c434828d8018a9eb48d32de2d808172aced5936d4f7b9f1448870876ca9d8:b80aed9d438f979b1605f3435409f8e1d7ebe18796eada78707e429310a3ed68:d2c2a5da674ff266e7be0e293dd7fdbd2d1e347d963f4a3fa4228d6c8f27e08d:470de2937d9ecb0ac67492f6b4566e618f02699b812e68af1127e0b0a376fcc8:ad8468ff2a4197e09f0177d9b0852fa31a8164920dada2da7e1fab449dcfd9f1"
          ],
          "mount_point": "$(cpath)/$(bundle-id)",
          "fs_group": null
        }
      ],
      "exec_commands": []
    },
    {
      "OCI": {
        "Version": "1.1.0-rc.1",
        "Process": {
          "Terminal": false,
          "User": {
            "UID": 0,
            "GID": 0,
            "AdditionalGids": [],
            "Username": ""
          },
          "Args": [
            "/consume"
          ],
          "Env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "HOSTNAME=$(host-name)",
            "SkrClientKID=kafka-encryption-demo",
            "SkrClientMAAEndpoint=sharedweu.weu.attest.azure.net",
            "SkrClientAKVEndpoint=tvlskrkv.vault.azure.net",
            "TOPIC=kafka-demo-topic"
          ],
          "Cwd": "/",
          "Capabilities": {
            "Ambient": [],
            "Bounding": [
              "$(default_caps)"
            ],
            "Effective": [
              "$(default_caps)"
            ],
            "Inheritable": [],
            "Permitted": [
              "$(default_caps)"
            ]
          },
          "NoNewPrivileges": false
        },
        "Root": {
          "Path": "$(cpath)/$(bundle-id)",
          "Readonly": false
        },
        "Mounts": [
          {
            "destination": "/proc",
            "source": "proc",
            "type_": "proc",
            "options": [
              "nosuid",
              "noexec",
              "nodev"
            ]
          },
          {
            "destination": "/dev",
            "source": "tmpfs",
            "type_": "tmpfs",
            "options": [
              "nosuid",
              "strictatime",
              "mode=755",
              "size=65536k"
            ]
          },
          {
            "destination": "/dev/pts",
            "source": "devpts",
            "type_": "devpts",
            "options": [
              "nosuid",
              "noexec",
              "newinstance",
              "ptmxmode=0666",
              "mode=0620",
              "gid=5"
            ]
          },
          {
            "destination": "/dev/shm",
            "source": "/run/kata-containers/sandbox/shm",
            "type_": "bind",
            "options": [
              "rbind"
            ]
          },
          {
            "destination": "/dev/mqueue",
            "source": "mqueue",
            "type_": "mqueue",
            "options": [
              "nosuid",
              "noexec",
              "nodev"
            ]
          },
          {
            "destination": "/sys",
            "source": "sysfs",
            "type_": "sysfs",
            "options": [
              "nosuid",
              "noexec",
              "nodev",
              "ro"
            ]
          },
          {
            "destination": "/sys/fs/cgroup",
            "source": "cgroup",
            "type_": "cgroup",
            "options": [
              "nosuid",
              "noexec",
              "nodev",
              "relatime",
              "ro"
            ]
          },
          {
            "destination": "/etc/hosts",
            "source": "$(sfprefix)hosts$",
            "type_": "bind",
            "options": [
              "rbind",
              "rprivate",
              "rw"
            ]
          },
          {
            "destination": "/dev/termination-log",
            "source": "$(sfprefix)termination-log$",
            "type_": "bind",
            "options": [
              "rbind",
              "rprivate",
              "rw"
            ]
          },
          {
            "destination": "/etc/hostname",
            "source": "$(sfprefix)hostname$",
            "type_": "bind",
            "options": [
              "rbind",
              "rprivate",
              "rw"
            ]
          },
          {
            "destination": "/etc/resolv.conf",
            "source": "$(sfprefix)resolv.conf$",
            "type_": "bind",
            "options": [
              "rbind",
              "rprivate",
              "rw"
            ]
          },
          {
            "destination": "/var/run/secrets/kubernetes.io/serviceaccount",
            "source": "$(sfprefix)serviceaccount$",
            "type_": "bind",
            "options": [
              "rbind",
              "rprivate",
              "ro"
            ]
          },
          {
            "destination": "/var/run/secrets/azure/tokens",
            "source": "$(sfprefix)tokens$",
            "type_": "bind",
            "options": [
              "rbind",
              "rprivate",
              "ro"
            ]
          }
        ],
        "Annotations": {
          "io.katacontainers.pkg.oci.bundle_path": "/run/containerd/io.containerd.runtime.v2.task/k8s.io/$(bundle-id)",
          "io.katacontainers.pkg.oci.container_type": "pod_container",
          "io.kubernetes.cri.container-name": "kafka-golang-consumer",
          "io.kubernetes.cri.container-type": "container",
          "io.kubernetes.cri.image-name": "mcr.microsoft.com/acc/samples/kafka/consumer:1.0",
          "io.kubernetes.cri.sandbox-id": "^[a-z0-9]{64}$",
          "io.kubernetes.cri.sandbox-name": "kafka-golang-consumer",
          "io.kubernetes.cri.sandbox-namespace": "kafka"
        },
        "Linux": {
          "Namespaces": [
            {
              "Type": "ipc",
              "Path": ""
            },
            {
              "Type": "uts",
              "Path": ""
            },
            {
              "Type": "mount",
              "Path": ""
            }
          ],
          "MaskedPaths": [
            "/proc/acpi",
            "/proc/kcore",
            "/proc/keys",
            "/proc/latency_stats",
            "/proc/timer_list",
            "/proc/timer_stats",
            "/proc/sched_debug",
            "/proc/scsi",
            "/sys/firmware"
          ],
          "ReadonlyPaths": [
            "/proc/asound",
            "/proc/bus",
            "/proc/fs",
            "/proc/irq",
            "/proc/sys",
            "/proc/sysrq-trigger"
          ]
        }
      },
      "storages": [
        {
          "driver": "blk",
          "driver_options": [],
          "source": "",
          "fstype": "tar",
          "options": [
            "$(hash0)"
          ],
          "mount_point": "$(layer0)",
          "fs_group": null
        },
        {
          "driver": "blk",
          "driver_options": [],
          "source": "",
          "fstype": "tar",
          "options": [
            "$(hash1)"
          ],
          "mount_point": "$(layer1)",
          "fs_group": null
        },
        {
          "driver": "blk",
          "driver_options": [],
          "source": "",
          "fstype": "tar",
          "options": [
            "$(hash2)"
          ],
          "mount_point": "$(layer2)",
          "fs_group": null
        },
        {
          "driver": "blk",
          "driver_options": [],
          "source": "",
          "fstype": "tar",
          "options": [
            "$(hash3)"
          ],
          "mount_point": "$(layer3)",
          "fs_group": null
        },
        {
          "driver": "blk",
          "driver_options": [],
          "source": "",
          "fstype": "tar",
          "options": [
            "$(hash4)"
          ],
          "mount_point": "$(layer4)",
          "fs_group": null
        },
        {
          "driver": "overlayfs",
          "driver_options": [],
          "source": "",
          "fstype": "fuse3.kata-overlay",
          "options": [
            "b5b09b45053cdd993415dfade5793eb8822a5bd6298ae6a04ef2c630da7d4e53:c99b94f5c7978e7798d8a2946055aeb7ce249692f5fa6938b0e2172858852034:ff030cb4cf8a46927f11bf48d8be123b7c379571607859c19bc6a61cbb747487:a75a1b847d7546cdab6f35b1b9832f9cc586b92f6045dc3d10356ec7e29079b8:8eedd2e42eed8f5de226f8ba0ca8ff8ab9f58ce2b36e28c619f319a82ec30298",
            "922ee7161ab2710c7899dd954addebd473e3ccb6773f9690f80a8e4f0b6dc4c2:82617c6193aae87cc0f6c94dc058c05d1363afc0b5814db7304b39abe4124489:3673e91db0aa540e01fab0ad590dbbc6dafb660ea27224742d7d9540afcb0fa0:538551b312caa78e93e8657f55ea45756665bd3c39ec420174327645d80d77fa:74492a982cb86f16c0989b0d66cc420aa908c3fa8e23effcf2c0b4ace15f4fda"
          ],
          "mount_point": "$(cpath)/$(bundle-id)",
          "fs_group": null
        }
      ],
      "exec_commands": []
    }
  ],
  "common": {
    "cpath": "/run/kata-containers/shared/containers",
    "sfprefix": "^$(cpath)/$(bundle-id)-[a-z0-9]{16}-",
    "ipv4_a": "((25[0-5]|(2[0-4]|1\\d|[1-9]|)\\d)\\.?\\b){4}",
    "ip_p": "[0-9]{1,5}",
    "svc_name": "[A-Z0-9_\\.\\-]+",
    "dns_label": "[a-zA-Z0-9_\\.\\-]+",
    "default_caps": [
      "CAP_CHOWN",
      "CAP_DAC_OVERRIDE",
      "CAP_FSETID",
      "CAP_FOWNER",
      "CAP_MKNOD",
      "CAP_NET_RAW",
      "CAP_SETGID",
      "CAP_SETUID",
      "CAP_SETFCAP",
      "CAP_SETPCAP",
      "CAP_NET_BIND_SERVICE",
      "CAP_SYS_CHROOT",
      "CAP_KILL",
      "CAP_AUDIT_WRITE"
    ],
    "privileged_caps": [
      "CAP_CHOWN",
      "CAP_DAC_OVERRIDE",
      "CAP_DAC_READ_SEARCH",
      "CAP_FOWNER",
      "CAP_FSETID",
      "CAP_KILL",
      "CAP_SETGID",
      "CAP_SETUID",
      "CAP_SETPCAP",
      "CAP_LINUX_IMMUTABLE",
      "CAP_NET_BIND_SERVICE",
      "CAP_NET_BROADCAST",
      "CAP_NET_ADMIN",
      "CAP_NET_RAW",
      "CAP_IPC_LOCK",
      "CAP_IPC_OWNER",
      "CAP_SYS_MODULE",
      "CAP_SYS_RAWIO",
      "CAP_SYS_CHROOT",
      "CAP_SYS_PTRACE",
      "CAP_SYS_PACCT",
      "CAP_SYS_ADMIN",
      "CAP_SYS_BOOT",
      "CAP_SYS_NICE",
      "CAP_SYS_RESOURCE",
      "CAP_SYS_TIME",
      "CAP_SYS_TTY_CONFIG",
      "CAP_MKNOD",
      "CAP_LEASE",
      "CAP_AUDIT_WRITE",
      "CAP_AUDIT_CONTROL",
      "CAP_SETFCAP",
      "CAP_MAC_OVERRIDE",
      "CAP_MAC_ADMIN",
      "CAP_SYSLOG",
      "CAP_WAKE_ALARM",
      "CAP_BLOCK_SUSPEND",
      "CAP_AUDIT_READ",
      "CAP_PERFMON",
      "CAP_BPF",
      "CAP_CHECKPOINT_RESTORE"
    ]
  },
  "sandbox": {
    "storages": [
      {
        "driver": "ephemeral",
        "driver_options": [],
        "source": "shm",
        "fstype": "tmpfs",
        "options": [
          "noexec",
          "nosuid",
          "nodev",
          "mode=1777",
          "size=67108864"
        ],
        "mount_point": "/run/kata-containers/sandbox/shm",
        "fs_group": null
      }
    ]
  },
  "request_defaults": {
    "CreateContainerRequest": {
      "allow_env_regex": [
        "^HOSTNAME=$(dns_label)$",
        "^$(svc_name)_PORT_$(ip_p)_TCP=tcp://$(ipv4_a):$(ip_p)$",
        "^$(svc_name)_PORT_$(ip_p)_TCP_PROTO=tcp$",
        "^$(svc_name)_PORT_$(ip_p)_TCP_PORT=$(ip_p)$",
        "^$(svc_name)_PORT_$(ip_p)_TCP_ADDR=$(ipv4_a)$",
        "^$(svc_name)_SERVICE_HOST=$(ipv4_a)$",
        "^$(svc_name)_SERVICE_PORT=$(ip_p)$",
        "^$(svc_name)_SERVICE_PORT_$(dns_label)=$(ip_p)$",
        "^$(svc_name)_PORT=tcp://$(ipv4_a):$(ip_p)$",
        "^AZURE_CLIENT_ID=[A-Fa-f0-9-]*$",
        "^AZURE_TENANT_ID=[A-Fa-f0-9-]*$",
        "^AZURE_FEDERATED_TOKEN_FILE=/var/run/secrets/azure/tokens/azure-identity-token$",
        "^AZURE_AUTHORITY_HOST=https://login\\.microsoftonline\\.com/$"
      ]
    },
    "CopyFileRequest": [
      "$(sfprefix)"
    ],
    "ExecProcessRequest": {
      "commands": [],
      "regex": []
    },
    "ReadStreamRequest": true,
    "UpdateEphemeralMountsRequest": false,
    "WriteStreamRequest": false
  }
}
spec:
  serviceAccountName: workload-identity-sa
  runtimeClassName: kata-cc-isolation
  containers:
    - image: "mcr.microsoft.com/aci/skr:2.7"
      imagePullPolicy: Always
      name: skr
      env:
        - name: SkrSideCarArgs
          value: ewogICAgImNlcnRjYWNoZSI6IHsKCQkiZW5kcG9pbnRfdHlwZSI6ICJMb2NhbFRISU0iLAoJCSJlbmRwb2ludCI6ICIxNjkuMjU0LjE2OS4yNTQvbWV0YWRhdGEvVEhJTS9hbWQvY2VydGlmaWNhdGlvbiIKCX0gIAp9
      command:
        - /bin/skr
      volumeMounts:
        - mountPath: /opt/confidential-containers/share/kata-containers/reference-info-base64
          name: endor-loc
    - image: "mcr.microsoft.com/acc/samples/kafka/consumer:1.0"
      imagePullPolicy: Always
      name: kafka-golang-consumer
      env:
        - name: SkrClientKID
          value: kafka-encryption-demo
        - name: SkrClientMAAEndpoint
          value: sharedweu.weu.attest.azure.net
        - name: SkrClientAKVEndpoint
          value: tvlskrkv.vault.azure.net
        - name: TOPIC
          value: kafka-demo-topic
      command:
        - /consume
      ports:
        - containerPort: 3333
          name: kafka-consumer
      resources:
        limits:
          memory: 1Gi
          cpu: 200m
  volumes:
    - name: endor-loc
      hostPath:
        path: /opt/confidential-containers/share/kata-containers/reference-info-base64
---
apiVersion: v1
kind: Service
metadata:
  name: consumer
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: kafka-golang-consumer
  ports:
    - protocol: TCP
      port: 80
      targetPort: kafka-consumer

About those annotations

A few things you may have spotted in the YAML file:

  • The runtimeClass
    • The value kata-cc-isolation seems to be an AKS-specific implementation.
  • The hostPath mount
    • /opt/confidential-containers/share/kata-containers/reference-info-base64
    • This path holds the COSE Sign1 document containing the measurement of the utility VM (UVM) used to launch the container, encoded in base64. You can inspect this by decoding the file and using a tool that supports COSE_Sign1.
  • The io.katacontainers.config.agent.policy annotation

If you base64 decode the value inside the io.katacontainers.config.agent.policy annotation using base64 -d, you will see the Open Policy Agent policy passed to the kata-agent in the UVM.

package agent_policy

import future.keywords.in
import future.keywords.every

import input

# Default values, returned by OPA when rules cannot be evaluated to true.
default CopyFileRequest := false
default CreateContainerRequest := false
default CreateSandboxRequest := false
default DestroySandboxRequest := true
...
policy_data := {
  "containers": [
    ...
    {
      "OCI": {
        "Version": "1.1.0-rc.1",
        "Process": {
          "Terminal": false,
          "User": {
            "UID": 0,
            "GID": 0,
            "AdditionalGids": [],
            "Username": ""
          },
          "Args": [
            "/consume"
          ],
          "Env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
            "HOSTNAME=$(host-name)",
            "SkrClientKID=kafka-encryption-demo",
            "SkrClientMAAEndpoint=sharedweu.weu.attest.azure.net",
            "SkrClientAKVEndpoint=tvlskrkv.vault.azure.net",
            "TOPIC=kafka-demo-topic"
          ],
          "Cwd": "/",
          "Capabilities": {
            "Ambient": [],
            "Bounding": [
              "$(default_caps)"
            ],
            "Effective": [
              "$(default_caps)"
            ],
            "Inheritable": [],
            "Permitted": [
              "$(default_caps)"
            ]
          },
          "NoNewPrivileges": false
        },
...
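If you would rather script the decoding step than run base64 -d by hand, here is a minimal Python sketch. It assumes you have already copied the annotation value into a string. The helper names decode_policy and default_rules are my own for illustration and are not part of any Kata or Azure tooling. It decodes the policy and lists the "default" rules, which makes it easy to see at a glance which agent requests are denied unless a rule allows them.

```python
import base64
import re

# Matches lines such as: default CopyFileRequest := false
DEFAULT_RULE = re.compile(r"^default\s+(\w+)\s*:=\s*(true|false)\s*$", re.MULTILINE)


def decode_policy(annotation_value: str) -> str:
    """Return the Rego text stored in the base64-encoded annotation value."""
    return base64.b64decode(annotation_value).decode("utf-8")


def default_rules(rego_text: str) -> dict:
    """Map each `default <Rule> := <bool>` line to its boolean value."""
    return {name: value == "true" for name, value in DEFAULT_RULE.findall(rego_text)}


if __name__ == "__main__":
    # Stand-in for the real annotation value (hypothetical, shortened policy).
    sample = (
        "package agent_policy\n"
        "default CopyFileRequest := false\n"
        "default DestroySandboxRequest := true\n"
    )
    encoded = base64.b64encode(sample.encode("utf-8")).decode("ascii")
    for rule, allowed in default_rules(decode_policy(encoded)).items():
        print(f"{rule}: {'allowed' if allowed else 'denied'} by default")
```

This only reads the top-level defaults; it does not evaluate the policy, which is the job of the OPA engine inside the kata-agent.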

Performing Secure Key Release w/ Key Vault

The policy value is base64 decoded and hashed with SHA-256, and the resulting digest is used by the setup-key.sh script, which creates a new RSA key backed by an HSM. The key is marked as exportable, and the script also generates the Secure Key Release (SKR) policy for it.

bash setup-key.sh "kafka-encryption-demo" tvlskrkv.vault.azure.net
# Key vault name is tvlskrkv
# AKV endpoint OK
# ......Generated key release policy kafka-encryption-demo-release-policy.json
# {
#   "attributes": {
#     "created": "2024-03-18T16:14:34+00:00",
#     "enabled": true,
#     "expires": null,
#     "exportable": true,
#     "hsmPlatform": "2",
#     "notBefore": null,
#     "recoverableDays": 7,
#     "recoveryLevel": "CustomizedRecoverable+Purgeable",
#     "updated": "2024-03-18T16:14:34+00:00"
#   },
#   "key": {
#     "crv": null,
#     "d": null,
#     "dp": null,
#     "dq": null,
#     "e": "AQAB",
#     "k": null,
#     "keyOps": [
#       "wrapKey",
#       "unwrapKey",
#       "encrypt",
#       "decrypt"
#     ],
#     "kid": "https://tvlskrkv.vault.azure.net/keys/kafka-encryption-demo/6b74f1730b054a518bc8ad26ae8c4631",
#     "kty": "RSA-HSM",
#     "n": "ixTg4gQqI71DplulO3caOx114yFzyAl1g2AzpcJEL1XVWvIJCkavpsKqHtajzRCHWSmx3A5a6QWLbNVXOQAhQEvdNNWb2p6qpIVb7QchOLC3aSlCTvrLQWyKoaeoaNSwXRqIr0h1q1YuC470kNHoLATa+yOVQPkLaeV/VHBaq0M86eaCe4o+qTfMbHbQ/6u/peQOeDEFJ/KEA/fusHoZwSkYskxxu/DfQFd2tGkdf9NieB6SwhL4puVBdAxWzk4Z0IlmUexfsfc4chVKMDNcfQqTOGfcRxXdlerLIYdCPb6nZGeOwxqxBCAkMp+fQikZ5irI96+49L9zsHybNnfgl7nMUFCrkBYy8hcK7RNgBC25VRCfV6Mr+/W2Kz9ZozIHU8JNFOyBwj8bQ5PLMI0NqW2xfWEuWRlmHYwxe+BczJeqe9mr21Gw3Z4s31UXWHfj+LRNEDqrv8RnO1009HyURlz6HK/J0wbYiBgI0FIyi4eyvwD9EXDX6U9NJ25EmF+T",
#     "p": null,
#     "q": null,
#     "qi": null,
#     "t": null,
#     "x": null,
#     "y": null
#   },
#   "managed": null,
#   "releasePolicy": {
#     "contentType": "application/json; charset=utf-8",
#     "encodedPolicy": "{\"version\":\"1.0.0\",\"anyOf\":[{\"authority\":\"https://sharedweu.weu.attest.azure.net\",\"allOf\":[{\"claim\":\"x-ms-attestation-type\",\"equals\":\"sevsnpvm\"},{\"claim\":\"x-ms-sevsnpvm-hostdata\",\"equals\":\"ee9f39dc52c3c8dfa1a8fe485b66a109c56e31804ddca1aad37e8b57be25bd16\"},{\"claim\":\"x-ms-compliance-status\",\"equals\":\"azure-signed-katacc-uvm\"},{\"claim\":\"x-ms-sevsnpvm-is-debuggable\",\"equals\":\"false\"}]}]}",
#     "immutable": false
#   },
#   "tags": null
# }
# ......Created RSA key in tvlskrkv.vault.azure.net
# ......Downloaded the public key to kafka-encryption-demo-pub.pem
# ......Generated key info file kafka-encryption-demo-info.json
# ......Key setup successful!

Releasing the private key from Azure Key Vault

To release the private key from Azure Key Vault to our Confidential Container workload, we must provide evidence that was signed by a specific instance of the Microsoft Azure Attestation service. Additionally, a series of claims in that evidence must contain specific values. Here’s an example of the JSON release policy that expresses these requirements:

{
    "anyOf": [
        {
            "authority": "https://sharedweu.weu.attest.azure.net",
            "allOf": [
                {
                    "claim": "x-ms-attestation-type",
                    "equals": "sevsnpvm"
                },
                {
                    "claim": "x-ms-sevsnpvm-hostdata",
                    "equals": "272a252154e79edebb18c0c7162adc6baaab731b45ac53253dbfa81e2af78623"
                },
                {
                    "claim": "x-ms-compliance-status",
                    "equals": "azure-signed-katacc-uvm"
                },
                {
                    "claim": "x-ms-sevsnpvm-is-debuggable",
                    "equals": "false"
                }
            ]
        }
    ],
    "version": "1.0.0"
}

Here is a quick summary of what these values mean:

  • authority: The Azure Attestation service endpoint.
  • claim: Different assertions required for the attestation process.
    • x-ms-attestation-type: Must be “sevsnpvm”.
    • x-ms-sevsnpvm-hostdata: Arbitrary data defined by the host at VM launch time; here it holds the SHA-256 digest of the policy. If the pod spec changes, the digest changes with it, so the key must be recreated.
    • x-ms-compliance-status: Must be “azure-signed-katacc-uvm”.
    • x-ms-sevsnpvm-is-debuggable: Must be “false”.
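
To make the anyOf / allOf structure a little more concrete, here is a small Python sketch of how such a policy could be checked against a set of claims. This is purely my own illustration and not how Azure Key Vault implements it: the function release_allowed, the helper _as_text, and the idea of comparing values as text are all assumptions. The policy values are copied from the example above.

```python
def _as_text(value) -> str:
    """Render a claim value as text; Python booleans become 'true'/'false'."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def release_allowed(policy: dict, authority: str, claims: dict) -> bool:
    """Return True if one `anyOf` entry matches the authority and all its claims."""
    for option in policy.get("anyOf", []):
        if option.get("authority") != authority:
            continue  # evidence was signed by a different attestation instance
        required = option.get("allOf", [])
        if all(
            rule["claim"] in claims and _as_text(claims[rule["claim"]]) == rule["equals"]
            for rule in required
        ):
            return True
    return False


POLICY = {
    "version": "1.0.0",
    "anyOf": [
        {
            "authority": "https://sharedweu.weu.attest.azure.net",
            "allOf": [
                {"claim": "x-ms-attestation-type", "equals": "sevsnpvm"},
                {
                    "claim": "x-ms-sevsnpvm-hostdata",
                    "equals": "272a252154e79edebb18c0c7162adc6baaab731b45ac53253dbfa81e2af78623",
                },
                {"claim": "x-ms-compliance-status", "equals": "azure-signed-katacc-uvm"},
                {"claim": "x-ms-sevsnpvm-is-debuggable", "equals": "false"},
            ],
        }
    ],
}

if __name__ == "__main__":
    claims = {
        "x-ms-attestation-type": "sevsnpvm",
        "x-ms-sevsnpvm-hostdata": "272a252154e79edebb18c0c7162adc6baaab731b45ac53253dbfa81e2af78623",
        "x-ms-compliance-status": "azure-signed-katacc-uvm",
        "x-ms-sevsnpvm-is-debuggable": False,
    }
    print(release_allowed(POLICY, "https://sharedweu.weu.attest.azure.net", claims))
```

The point of the sketch is the shape of the check: every claim in an allOf group has to match, and a single differing value (for example a hostdata digest from an older pod spec) is enough to refuse the release.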

To get the value for x-ms-sevsnpvm-hostdata, Microsoft instructs you to run the following command:

export WORKLOAD_MEASUREMENT=$(az confcom katapolicygen -y consumer.yaml --print-policy | base64 -d | sha256sum | cut -d' ' -f1)
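
The base64 -d | sha256sum part of that pipeline is easy to reproduce if you want to double-check the measurement yourself. Below is a minimal Python sketch of the same computation. It assumes you already have the base64 policy string (for example from the --print-policy output); the function name workload_measurement is hypothetical.

```python
import base64
import hashlib


def workload_measurement(policy_b64: str) -> str:
    """SHA-256 hex digest of the decoded policy, like `base64 -d | sha256sum`."""
    policy_bytes = base64.b64decode(policy_b64)
    return hashlib.sha256(policy_bytes).hexdigest()


if __name__ == "__main__":
    # Stand-in policy text (hypothetical); the real input is the generated policy.
    original = base64.b64encode(b"package agent_policy\n").decode("ascii")
    changed = base64.b64encode(b"package agent_policy\n# one extra line\n").decode("ascii")
    print(workload_measurement(original))
    print(workload_measurement(changed))
```

Running it prints two different digests, which shows why any change to the generated policy results in a different x-ms-sevsnpvm-hostdata value.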

Remember, generating a new key incurs a small cost, so it’s essential to be mindful of this when managing keys.

🔥 This is especially the case with HSM-protected keys, as certain “advanced key types” are charged at a higher rate, starting at $5 per key per month. Only actively used HSM-protected keys (those used in the prior 30-day period) are charged, and each version of an HSM-protected key is counted as a separate key.

In this example, you would need to generate a new key version whenever you update the x-ms-sevsnpvm-hostdata value. Thus, it’s crucial to be cautious to avoid accumulating a high Azure Consumption bill.
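
To get a rough feel for how key versions add up, here is a back-of-the-envelope Python sketch. It assumes a flat $5 per key version per month (the "starting at" price mentioned above, so treat it as a lower bound) and that a version counts as active if it was used in the last 30 days. The function name and the example dates are made up for illustration; always check the current Azure pricing page for real numbers.

```python
from datetime import date, timedelta


def estimated_monthly_cost(last_used, today, price_per_key_usd=5.0):
    """Estimate the monthly charge for HSM-protected key versions.

    last_used: one date per key version, the day that version was last used.
    Only versions used within the prior 30 days are counted as billable.
    """
    window_start = today - timedelta(days=30)
    active_versions = [d for d in last_used if d >= window_start]
    return len(active_versions) * price_per_key_usd


if __name__ == "__main__":
    today = date(2024, 3, 18)
    # Three versions of the same key: two used recently, one idle since January.
    versions = [date(2024, 3, 18), date(2024, 3, 1), date(2024, 1, 5)]
    print(f"${estimated_monthly_cost(versions, today):.2f} per month")
```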

Once we’ve completed all of these steps, we should be left with this result:

Screenshot of a web browser displaying the message: ‘Welcome to Confidential Containers on AKS! Encrypted Kafka Message: Msg 3: Azure Confidential Computing’

Conclusion

After diving into the details of the Confidential Containers project, I believe that their solution strikes a balance between robust security measures and accessibility. I feel this will especially be the case for those who are looking to leverage this technology for their applications without creating significant barriers for their teams. I hope the project progresses from its initial CNCF “sandbox” to the “incubation” phase; this will show everyone that it is successfully being used in production by a small number of users and that it is being worked on by a healthy number of contributors.

I think that the development and refinement of CoCo to its current state is no small feat, to say the least. The journey to get to this point is a pretty big achievement, especially in the realm of secure containerization. Way to go to all of those on the CoCo project! 😎