Azure Confidential Inferencing with Oblivious HTTP

Some time ago, while gathering reference material for a session on Confidential Containers, I ran into a Microsoft demo on GitHub and Microsoft’s Azure AI Confidential Inferencing: Technical Deep-Dive blog post. It’s a very insightful post from September 2024 that goes into detail about how Microsoft has been tackling confidential inferencing in Azure’s AI portfolio. Both mentioned something called Oblivious HTTP (OHTTP), which I had never heard of and which seemed worth looking into.

To be honest, I just bookmarked it and moved on. I was short on time, and AI’s sudden explosion in popularity pulled me toward a different topic. I’m probably not the only one who had to drop everything to see what was going on there…

But every time I’d scroll past that bookmark, the same question would nag at me: “If TLS already protects the payload in transit, what does OHTTP actually add? Can’t I just encrypt my payload and be done with it? What’s so special about it that it warrants its own separate Microsoft demo and blog post?”

So I investigated! And fortunately, stepping away for a while didn’t mean I’d missed my window, as it seemed like the topic had only grown more relevant. (Sweet. 😎) Along the way I found this super insightful session by Antoine Delignat-Lavaud, a researcher at Microsoft, from the Confidential Computing Summit in November 2025. It covers the design decisions behind confidential inferencing on Azure, and I’ll be drawing on it at several points throughout this post. Well worth watching alongside this.

This post deliberately starts with a couple of building blocks, first Hybrid Public Key Encryption (HPKE) and then the OHTTP flow, before moving on to the confidential AI inferencing angle.

What TLS doesn’t hide

We start with what we already know about HTTPS. When you connect to a service over TLS, the request body is encrypted on the wire. A passive observer sitting somewhere between you and the service should not be able to read your JSON payload, the headers inside the encrypted tunnel, or the response.

That is exactly what we want and what we’ve done for quite some time now. But the server still sees your network address. And on the way to the destination, intermediaries can observe quite a bit of metadata even without ever decrypting the body:

  • DNS lookups for the destination hostname
  • Source and destination IP addresses
  • TLS Server Name Indication (SNI) revealing which site you’re contacting
  • Timing patterns, request sizes, and request volume
  • Connection lifetimes and reconnect behavior
Shows how TLS encryption hides the payload while still exposing metadata such as IP address, SNI, timing, and request size to intermediaries.

For many systems, that is perfectly acceptable. Sometimes that metadata is even necessary for abuse prevention, billing, or troubleshooting. For other systems, especially ones that process sensitive inputs, that same metadata starts to reveal behavioral patterns that go well beyond what the content alone discloses: which tenant is calling which endpoint, how often, and when.

For a regulated enterprise sending sensitive prompts like clinical notes, legal drafts, or internal incident reports, that is not just inconvenient metadata. It is data you have an obligation to protect. Any intermediary that terminates TLS and logs requests after decryption can see both who you are and what you’re sending. Whoever holds that pair holds the relationship.

VPNs operate in a different space; a VPN moves your apparent network origin to the VPN provider, but the VPN provider then sees both who you are and where you’re going. The destination still terminates a single connection from a single source. The “who” and the “what” are still observable by someone with full visibility into the request. OHTTP takes a different approach: it deliberately splits that knowledge across two parties so that neither one of them holds both halves.

Compares a VPN, where one provider sees both client identity and destination, with OHTTP, where that knowledge is split between separate parties.

To be clear: TLS is not broken, and OHTTP is not a replacement for it. OHTTP addresses a different problem: reducing the ability of one party to observe both the client identity and the plaintext request content. It provides metadata privacy, not anonymity against a global adversary. If the relay and gateway collude, or if a query is unique enough to be re-identified from its content alone, OHTTP’s protections are limited. It is a layer in a defense-in-depth stack, not a silver bullet.

“RFC 9458: Oblivious HTTP” frames this as request correlation. If an origin server sees every request from a client directly, it can link those requests over time. Even if application identifiers are removed, IP addresses and timing patterns may still be enough to correlate activity. For an AI service handling sensitive prompts, that is a meaningful risk. The correlation risk gets awkward quickly in AI deployments that still need layer 7 frontends, routing, abuse controls, and billing at the edge. Those systems may have a legitimate reason to touch the outer request path, even when you would prefer them not to see the prompt itself.

So the question becomes: can we split knowledge across different parties? One party sees who sent something, but not what they sent. Another party sees what was sent, but not who sent it.

This concept is often illustrated with a sealed envelope and a courier. You write a message and place it in an envelope that only the final recipient can open. You hand that sealed envelope to a courier. The courier knows where they picked up the envelope, but cannot read the message. The recipient can open the envelope, but if the courier strips away the sender information before delivery, the recipient does not know who originally handed it over.

Shows how the relay learns the sender identity while the gateway learns the inner request content without seeing who sent it.

That split is what OHTTP is trying to provide. The client encrypts the HTTP request so only the gateway can open it. The relay forwards the encrypted request, but cannot decrypt it. The gateway decrypts and processes the request, but only sees the relay as the network peer. It is not magic anonymity dust. It is an architecture that intentionally separates identity metadata from request content, but it does not make plaintext private from the gateway or target once the request is opened.

Inside HPKE

Before we look at OHTTP itself, we need to take a small detour through “RFC 9180: Hybrid Public Key Encryption”, also known as HPKE. It is the cryptographic building block OHTTP uses to encrypt encapsulated HTTP messages.

Before we dive into the terminology, it helps to keep one “simple” picture in mind. The client uses a Diffie-Hellman operation between a fresh ephemeral private key and the gateway’s public key. The gateway does the same in reverse, using its private key and the client’s ephemeral public key. Both sides land on the same shared secret without it ever crossing the wire. HPKE then feeds that shared secret into a key derivation step and derives the symmetric keys that actually protect the request.

Shows the HPKE flow from a client ephemeral key pair through Diffie-Hellman, shared secret derivation, key scheduling, and an encrypted request.

You’ll probably notice that the name already gives away the pattern! Hybrid Public Key Encryption combines asymmetric cryptography with symmetric encryption:

  • Asymmetric (aka public-key) cryptography is used to establish shared secret material with the recipient.
  • Symmetric encryption is used to encrypt the actual payload efficiently.

Public-key operations are computationally expensive and not a great fit for encrypting large payloads directly. This is the same reason you do not use RSA-4096 to encrypt something like a storage blob end-to-end: it is slow, it has size limits, and it works better as a way to establish or wrap key material than as a bulk-encryption tool. Symmetric encryption is much faster, but you still need a safe way to agree on a key first. HPKE gives you a standardized way to do just that.

If you’ve worked with HTTPS, TLS, or other hybrid cryptographic schemes, you know that “hybrid crypto” is nothing new. Earlier schemes often looked like this:

  1. The sender generates a random symmetric key and uses it to encrypt the payload.
  2. The sender encrypts that symmetric key with the recipient’s public key and sends both to the receiver.
  3. The receiver decrypts the symmetric key with their private key, then uses it to decrypt the payload.

RSA key transport is the usual example.

But HPKE takes a different route. It does not start with a pre-chosen symmetric key that then needs to be wrapped. Instead, the key encapsulation mechanism (KEM) derives its shared secret material from a Diffie-Hellman exchange between a fresh sender key pair and the recipient’s public key. A key derivation function (KDF) then derives the keying material for the Authenticated Encryption with Associated Data (AEAD) step from that shared secret. In HPKE, the only public value that needs to travel with the ciphertext is the sender’s ephemeral public key, exposed in HPKE as enc.

The three HPKE ingredients

RFC 9180 describes HPKE as a composition of three primitives:

  1. KEM, or Key Encapsulation Mechanism: establishes shared secret material using the recipient’s public key.
  2. KDF, or Key Derivation Function: derives encryption keys from that shared secret material.
  3. AEAD, or Authenticated Encryption with Associated Data: encrypts and authenticates the payload.

A ciphersuite is a triple (KEM, KDF, AEAD) containing a choice of algorithm for each primitive.

The ciphersuite you will often see in OHTTP examples combines these primitives, each identified by its registry value (kem_id, kdf_id, aead_id), as follows:

  • kem_id 0x0020: DHKEM(X25519, HKDF-SHA256) for the key encapsulation mechanism (KEM).
    • Diffie-Hellman-based key encapsulation using Curve25519 for the elliptic curve operations and HKDF-SHA256 for the key derivation.
  • kdf_id 0x0001: HKDF-SHA256 for the key derivation function (KDF).
  • aead_id 0x0001: AES-128-GCM for the authenticated encryption with associated data (AEAD).
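The identifiers matter because HPKE packs them into a suite_id byte string and mixes that string into every key derivation. Below is a minimal sketch, assuming only the RFC 9180 registry values listed above. It shows the two suite_id variants: one used inside the KEM and one used by the full key schedule. The helper names are my own and do not come from any library.

```python
# Registry values from RFC 9180 (only the ones used in this post).
KEM_DHKEM_X25519_HKDF_SHA256 = 0x0020
KDF_HKDF_SHA256 = 0x0001
AEAD_AES_128_GCM = 0x0001


def i2osp(value: int, length: int) -> bytes:
    """Integer-to-octet-string primitive: fixed-length, big-endian."""
    return value.to_bytes(length, "big")


def kem_suite_id(kem_id: int) -> bytes:
    """suite_id used inside the KEM: "KEM" || I2OSP(kem_id, 2)."""
    return b"KEM" + i2osp(kem_id, 2)


def hpke_suite_id(kem_id: int, kdf_id: int, aead_id: int) -> bytes:
    """suite_id used by the key schedule: "HPKE" || kem_id || kdf_id || aead_id."""
    return b"HPKE" + i2osp(kem_id, 2) + i2osp(kdf_id, 2) + i2osp(aead_id, 2)


suite = hpke_suite_id(KEM_DHKEM_X25519_HKDF_SHA256, KDF_HKDF_SHA256, AEAD_AES_128_GCM)
print(suite.hex())  # 48504b45002000010001
```

Because the suite identity is baked into every derivation, the same shared secret run through a different ciphersuite yields unrelated keys.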

💡 Note

A non-exhaustive list of algorithm identifiers suitable for different HPKE configurations can be found here.

By the way, that’s not a typo. HMAC-based Extract-and-Expand Key Derivation Function (HKDF) is used both as part of the key encapsulation mechanism (KEM) and as the standalone key derivation function (KDF). In the Diffie-Hellman-based KEM (DHKEM), the raw DH output is fed through HKDF extract-and-expand to produce the KEM shared secret. The HPKE key schedule (the fixed sequence of derivation steps defined in RFC 9180 Section 5) then applies HKDF a second time to derive three outputs:

  • AEAD key: used to encrypt and decrypt the payload.
  • Base nonce: a nonce (“number used once”) is a value that must never repeat under the same key. The base nonce is the starting value from which both sides derive per-message nonces: each seal or open XORs it with a sequence number that increments after every operation. Reusing a nonce with the same key breaks confidentiality and often authentication too.
  • Exporter secret: used to derive additional secrets from the same context through HPKE’s Export interface. OHTTP relies on it to derive the keys that protect the response.
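The key schedule is small enough to sketch with nothing but Python’s standard library. Below is a minimal Base-mode version for the HKDF-SHA256 / AES-128-GCM suite above. It assumes a 32-byte shared_secret has already come out of the KEM. The function names are mine, and the sketch stops before the AEAD step, since AES-GCM is not in the standard library. Treat it as something to read alongside RFC 9180, not as production code.

```python
import hashlib
import hmac

# Sizes for the suite above: AES-128-GCM key (Nk) and nonce (Nn), SHA-256 output (Nh).
NK, NN, NH = 16, 12, 32
MODE_BASE = b"\x00"
# "HPKE" || kem_id 0x0020 || kdf_id 0x0001 || aead_id 0x0001
SUITE_ID = b"HPKE" + bytes.fromhex("002000010001")


def extract(salt: bytes, ikm: bytes) -> bytes:
    # HKDF-Extract (RFC 5869). HMAC pads short keys with zeros, so an empty
    # salt behaves the same as the HashLen zero bytes RFC 5869 specifies.
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def expand(prk: bytes, info: bytes, length: int) -> bytes:
    # HKDF-Expand (RFC 5869): T(i) = HMAC(prk, T(i-1) || info || i), truncated.
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]


def labeled_extract(salt: bytes, label: bytes, ikm: bytes) -> bytes:
    return extract(salt, b"HPKE-v1" + SUITE_ID + label + ikm)


def labeled_expand(prk: bytes, label: bytes, info: bytes, length: int) -> bytes:
    labeled_info = length.to_bytes(2, "big") + b"HPKE-v1" + SUITE_ID + label + info
    return expand(prk, labeled_info, length)


def key_schedule_base(shared_secret: bytes, info: bytes) -> tuple[bytes, bytes, bytes]:
    """RFC 9180 KeySchedule for mode_base (default empty psk and psk_id)."""
    psk_id_hash = labeled_extract(b"", b"psk_id_hash", b"")
    info_hash = labeled_extract(b"", b"info_hash", info)
    context = MODE_BASE + psk_id_hash + info_hash
    secret = labeled_extract(shared_secret, b"secret", b"")
    key = labeled_expand(secret, b"key", context, NK)
    base_nonce = labeled_expand(secret, b"base_nonce", context, NN)
    exporter_secret = labeled_expand(secret, b"exp", context, NH)
    return key, base_nonce, exporter_secret


def compute_nonce(base_nonce: bytes, seq: int) -> bytes:
    """Per-message nonce: base_nonce XOR the big-endian sequence number."""
    seq_bytes = seq.to_bytes(len(base_nonce), "big")
    return bytes(a ^ b for a, b in zip(base_nonce, seq_bytes))
```

The schedule is deterministic. Sender and receiver end up with identical keys because they feed the same shared_secret and info into it. Only the sequence number, and with it the nonce, changes from one message to the next.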

Just a quick word on “encapsulation” versus “encryption”: they are not the same thing in HPKE. Encapsulation is specifically the KEM step: it produces the shared_secret and the enc value (the serialized ephemeral public key) that travels with the ciphertext. Encryption is what the AEAD step does with the key material derived from that shared secret. The two steps are sequential: the KEM runs first, and its output feeds the key schedule and then the AEAD.

The KDF and AEAD steps are fairly straightforward from here: you feed in the shared secret, and they give you the keys and encryption context you need to seal and open messages. The KEM step is where it gets a bit more interesting, because that’s the part that lets the client and gateway independently arrive at the same shared secret without ever exchanging it directly.

How two parties agree on a secret

But before we walk through the encapsulation process, I think it’s worth taking a closer look at what the Diffie-Hellman step is actually doing.

💡 Note

To be honest, I don’t deal with DH nearly enough to even remember it correctly half the time, so I wanted to make sure I wrote it down in a way that made sense to me.

The core idea is simple: two parties can each arrive at the same shared secret using only their own private key and the other party’s public key, without ever sending that secret over the wire.

The simplified, Alice-and-Bob version goes a little something like this:

  1. Alice and Bob each generate a key pair: a private key they keep to themselves, and a public key they share freely.
  2. Alice performs an elliptic-curve scalar multiplication between her private key and Bob’s public key.
  3. Bob performs the same operation with his private key and Alice’s public key.
  4. Both arrive at the same result, a shared secret, without it ever crossing the wire.
Illustrates how Alice and Bob independently derive the same shared secret from their private and public keys without sending it over the wire.

An observer watching the exchange sees both of these public keys, but that is not enough to derive the shared secret. To reproduce the same value, you need at least one of the two private keys. If neither private key leaks, the shared secret never appears on the wire and cannot be reconstructed from public data alone.
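To watch the Alice-and-Bob exchange actually land on the same value, here is a minimal pure-Python X25519. It follows the Montgomery ladder from RFC 7748, Section 5. It is a readable sketch for experimentation only: it is not constant-time. Real code should use a vetted implementation such as the ones in OpenSSL or libsodium. The helper names generate_keypair and _clamp are my own.

```python
import os

P = 2**255 - 19
A24 = 121665                          # (486662 - 2) / 4, the Curve25519 ladder constant
BASE_POINT = (9).to_bytes(32, "little")


def _clamp(scalar: bytes) -> int:
    k = bytearray(scalar)
    k[0] &= 248                       # clear the three low bits
    k[31] = (k[31] & 127) | 64        # clear the top bit, set the second-highest
    return int.from_bytes(k, "little")


def x25519(scalar: bytes, u: bytes) -> bytes:
    """RFC 7748 X25519: scalar multiplication on the u-coordinate `u`."""
    k = _clamp(scalar)
    x1 = int.from_bytes(u[:31] + bytes([u[31] & 127]), "little") % P
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        if swap ^ bit:                # conditional swap driven by the key bits
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da, cb = (x3 - z3) * a % P, (x3 + z3) * b % P
        x3, z3 = (da + cb) ** 2 % P, x1 * (da - cb) ** 2 % P
        x2, z2 = aa * bb % P, e * (aa + A24 * e) % P
    if swap:
        x2, z2 = x3, z3
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def generate_keypair() -> tuple[bytes, bytes]:
    sk = os.urandom(32)
    return sk, x25519(sk, BASE_POINT)


alice_sk, alice_pk = generate_keypair()
bob_sk, bob_pk = generate_keypair()
# Each side combines its own private key with the other side's public key.
assert x25519(alice_sk, bob_pk) == x25519(bob_sk, alice_pk)
```

Only alice_pk and bob_pk would ever be sent. The shared value is computed locally on both ends and never crosses the wire.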

💡 Note

If you prefer an analogy, think of the recipient’s public key as the mailbox anyone can address, and the private key as the thing that can actually open what arrives. Seeing the mailbox and the sealed envelope does not help you open it.

In Hybrid Public Key Encryption (HPKE) terms, “Bob” is the recipient (the OHTTP gateway in our case) holding a long-term static key pair. “Alice” is the client, but instead of reusing a long-term key, the client generates a fresh ephemeral key pair for every request. (That’s right, there is indeed a little more overhead.) That fresh ephemeral key pair is key to how this all works: it gives each request its own HPKE context and avoids key reuse.

What it does not mean is TLS-style forward secrecy for OHTTP across the lifetime of a gateway key configuration. If an attacker records ciphertext and later steals the gateway’s private key, the enc value sent with each request is enough for them to reconstruct the same HPKE context and decrypt those recorded requests. RFC 9458 calls this out explicitly. The protection boundary here is per-request key separation plus gateway key rotation, not full forward secrecy for the lifetime of one gateway key.

💡 Note

Forward secrecy means that if an attacker records encrypted traffic today and steals a long-term private key later, those old recordings should still stay unreadable. In modern TLS, that property comes from ephemeral Diffie-Hellman (ECDHE) on both sides of the handshake: both client and server generate fresh key pairs per session, so no single long-term key can unlock old sessions. In OHTTP’s HPKE setup, only the client contributes a fresh ephemeral key per request; the gateway holds a static private key for the duration of its key configuration. That asymmetry is why this is sometimes called static Diffie-Hellman on the receiver side: the client’s contribution is ephemeral, the gateway’s is not. Compromising the gateway’s private key exposes every request encrypted under that key configuration. Regular key rotation limits that window, but it does not change the underlying property.

Walking through Encap

Encap is the KEM operation that makes the shared secret happen. The sender generates a fresh ephemeral key pair, runs the DH operation against the recipient’s static public key, and emits enc, the serialized ephemeral public key that travels alongside the ciphertext so the receiver can mirror the same DH operation. Decap is the receiver’s mirror: it takes enc, combines it with the receiver’s static private key, and arrives at the same shared_secret without ever having seen the sender’s ephemeral private key.

The walkthrough below is specifically for HPKE’s Base mode. HPKE supports four modes, but OHTTP request encapsulation uses the simplest one.

The four modes are as follows:

| Mode | Sender static key | Pre-shared key (PSK) | What it adds |
|---|---|---|---|
| Base | No | No | Encryption to the recipient only |
| PSK | No | Yes | Sender proves possession of a shared secret |
| Auth | Yes | No | Sender authentication with a sender key pair |
| AuthPSK | Yes | Yes | Sender authentication with both |

OHTTP request encapsulation uses Base mode, so there is no sender static key or pre-shared key in this flow. In the DHKEM variant used here, the enc value is the sender’s ephemeral public key in serialized form.
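The mode also determines which extra inputs the key schedule will accept. RFC 9180 encodes the four modes as one-byte identifiers and validates the PSK inputs up front. The sketch below is a direct Python port of that VerifyPSKInputs check (the constant and function names are mine). It shows that Base mode simply runs with an empty PSK and PSK ID.

```python
# HPKE mode identifiers (RFC 9180, Section 5).
MODE_BASE = 0x00
MODE_PSK = 0x01
MODE_AUTH = 0x02
MODE_AUTH_PSK = 0x03

DEFAULT_PSK = b""
DEFAULT_PSK_ID = b""


def verify_psk_inputs(mode: int, psk: bytes = DEFAULT_PSK, psk_id: bytes = DEFAULT_PSK_ID) -> None:
    """PSK modes need both a PSK and a PSK ID; the other modes need neither."""
    got_psk = psk != DEFAULT_PSK
    got_psk_id = psk_id != DEFAULT_PSK_ID
    if got_psk != got_psk_id:
        raise ValueError("Inconsistent PSK inputs")
    if got_psk and mode in (MODE_BASE, MODE_AUTH):
        raise ValueError("PSK input provided when not needed")
    if not got_psk and mode in (MODE_PSK, MODE_AUTH_PSK):
        raise ValueError("Missing required PSK input")


# OHTTP request encapsulation: Base mode with the default (empty) PSK inputs.
verify_psk_inputs(MODE_BASE)
```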

The RFC 9180 pseudocode below shows exactly what each step does:

📖 Quote

"Encap(pkR): Randomized algorithm to generate an ephemeral, fixed-length symmetric key (the KEM shared secret) and a fixed-length encapsulation of that key that can be decapsulated by the holder of the private key corresponding to pkR. This function can raise an EncapError on encapsulation failure."

RFC 9180, Section 4

def LabeledExtract(salt, label, ikm):
  # Prepends "HPKE-v1" and the suite identifier to the input key material
  # before hashing, so keys derived in different HPKE contexts can't collide.
  labeled_ikm = concat("HPKE-v1", suite_id, label, ikm)
  return Extract(salt, labeled_ikm)

def LabeledExpand(prk, label, info, L):
  # Same domain-separation idea on the expand side; L is the desired output length.
  labeled_info = concat(I2OSP(L, 2), "HPKE-v1", suite_id,
                        label, info)
  return Expand(prk, labeled_info, L)

def ExtractAndExpand(dh, kem_context):
  # Turns the raw DH output into the KEM shared_secret via a two-step HKDF pass.
  # kem_context binds both public keys so neither side can be swapped after the fact.
  eae_prk = LabeledExtract("", "eae_prk", dh)
  shared_secret = LabeledExpand(eae_prk, "shared_secret",
                                kem_context, Nsecret)
  return shared_secret

def Encap(pkR):
  skE, pkE = GenerateKeyPair()         # fresh ephemeral key pair, never reused
  dh = DH(skE, pkR)                    # DH(sender ephemeral private, recipient public)
  enc = SerializePublicKey(pkE)        # this is what travels with the ciphertext

  pkRm = SerializePublicKey(pkR)
  kem_context = concat(enc, pkRm)      # binds both public keys into the derivation

  shared_secret = ExtractAndExpand(dh, kem_context)
  return shared_secret, enc

def Decap(enc, skR):
  pkE = DeserializePublicKey(enc)      # recover sender's ephemeral public key
  dh = DH(skR, pkE)                    # DH(recipient private, sender ephemeral public)

  pkRm = SerializePublicKey(pk(skR))
  kem_context = concat(enc, pkRm)      # same context as Encap - must match exactly

  shared_secret = ExtractAndExpand(dh, kem_context)
  return shared_secret

The naming convention is:

  • sk = secret (private) key
  • pk = public key
  • R = recipient
  • E = ephemeral

| Symbol | Meaning | Who possesses it |
|---|---|---|
| skR | Recipient’s private key | Only the recipient (e.g., OHTTP gateway) |
| pkR | Recipient’s public key | Distributed by the recipient to parties that need to encrypt to it |
| skE | Sender’s ephemeral private key | Only the sender, for a single encapsulation |
| pkE | Sender’s ephemeral public key | Sent to the recipient as part of enc |

LabeledExtract and LabeledExpand are HKDF wrappers that prepend a domain-separation string ("HPKE-v1" plus the suite identifier) and a step-specific label to every derivation input. Without that binding, two HPKE configurations sharing the same underlying hash could, in principle, derive colliding keys. Stamping every operation with the suite identity and label ensures that keys derived in one ciphersuite context cannot be reused in another.
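The pseudocode becomes runnable once X25519 and HKDF are filled in. Below is a minimal sketch of DHKEM(X25519, HKDF-SHA256) Encap and Decap using only the standard library. The X25519 function is a readable, non-constant-time port of RFC 7748, and the function names are my own. It is meant for tracing the steps, not for protecting real traffic.

```python
import hashlib
import hmac
import os

P, A24 = 2**255 - 19, 121665
BASE_POINT = (9).to_bytes(32, "little")
KEM_SUITE_ID = b"KEM" + (0x0020).to_bytes(2, "big")  # DHKEM(X25519, HKDF-SHA256)
N_SECRET = 32


def x25519(scalar: bytes, u: bytes) -> bytes:
    """RFC 7748 Montgomery ladder; readable, NOT constant-time."""
    k = bytearray(scalar)
    k[0] &= 248
    k[31] = (k[31] & 127) | 64
    k_int = int.from_bytes(k, "little")
    x1 = int.from_bytes(u[:31] + bytes([u[31] & 127]), "little") % P
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (k_int >> t) & 1
        if swap ^ bit:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da, cb = (x3 - z3) * a % P, (x3 + z3) * b % P
        x3, z3 = (da + cb) ** 2 % P, x1 * (da - cb) ** 2 % P
        x2, z2 = aa * bb % P, e * (aa + A24 * e) % P
    if swap:
        x2, z2 = x3, z3
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def extract(salt: bytes, ikm: bytes) -> bytes:
    return hmac.new(salt, ikm, hashlib.sha256).digest()  # HKDF-Extract


def expand(prk: bytes, info: bytes, length: int) -> bytes:
    out, block, counter = b"", b"", 1                      # HKDF-Expand
    while len(out) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        out, counter = out + block, counter + 1
    return out[:length]


def extract_and_expand(dh_out: bytes, kem_context: bytes) -> bytes:
    # LabeledExtract / LabeledExpand with the KEM's suite_id, as in the pseudocode above.
    eae_prk = extract(b"", b"HPKE-v1" + KEM_SUITE_ID + b"eae_prk" + dh_out)
    info = N_SECRET.to_bytes(2, "big") + b"HPKE-v1" + KEM_SUITE_ID + b"shared_secret" + kem_context
    return expand(eae_prk, info, N_SECRET)


def dh(sk: bytes, pk: bytes) -> bytes:
    out = x25519(sk, pk)
    if out == bytes(32):  # an all-zero X25519 output must be rejected
        raise ValueError("invalid public key")
    return out


def encap(pk_r: bytes) -> tuple[bytes, bytes]:
    sk_e = os.urandom(32)                  # fresh ephemeral private key
    enc = x25519(sk_e, BASE_POINT)         # serialized ephemeral public key
    return extract_and_expand(dh(sk_e, pk_r), enc + pk_r), enc


def decap(enc: bytes, sk_r: bytes) -> bytes:
    pk_rm = x25519(sk_r, BASE_POINT)       # SerializePublicKey(pk(skR))
    return extract_and_expand(dh(sk_r, enc), enc + pk_rm)


# Long-lived gateway key pair, then one request's encapsulation.
sk_r = os.urandom(32)
pk_r = x25519(sk_r, BASE_POINT)
client_secret, enc = encap(pk_r)
assert decap(enc, sk_r) == client_secret
```

Note that decap needs nothing but enc and the gateway’s private key. This is the forward-secrecy caveat from earlier, expressed in code: anyone who records enc and later obtains sk_r can run decap as well.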

Sender (client)

  1. In DHKEM(X25519, HKDF-SHA256), the sender generates a fresh X25519 ephemeral key pair. In most HPKE APIs this happens inside Encap() or SetupBaseS(), so you usually do not call key generation separately.
  2. Inside Encap(), the sender runs DH between the ephemeral private key and the recipient’s static public key, then derives the KEM shared_secret; enc is the serialized ephemeral public key that travels with the ciphertext.
  3. SetupBaseS() then runs the HPKE key schedule with that shared_secret and an application-supplied info string (which the receiver must also use), yielding the AEAD key, base nonce, and exporter secret.
  4. The sender uses that context to seal the payload.
  5. The sender emits enc alongside the ciphertext.

Receiver (gateway)

  1. The receiver parses enc as the sender’s ephemeral public key. In most HPKE APIs this happens inside SetupBaseR(), so you usually do not deserialize it separately unless the library exposes that step.
  2. Inside Decap(), the receiver runs DH between its static private key and that ephemeral public key, then derives the same KEM shared_secret.
  3. SetupBaseR() then runs the HPKE key schedule with that shared_secret and the same info input, yielding the same AEAD key, base nonce, and exporter secret.
  4. The receiver uses that context to open the ciphertext.
  5. Without the receiver’s private key, the recorded enc value and ciphertext are just public inputs with no way to finish the computation.

Here is that same sender/receiver picture in expanded form:

Shows how the sender and receiver each apply Diffie-Hellman with their own private key and the other party's public key to reach the same shared secret.

That flowchart is the dependency view: which public and private values feed into the same shared secret and then into the HPKE key schedule. The sequence diagram below shows the same exchange in time order.

Shows the sequence of encapsulation, key scheduling, sealing, sending, decapsulation, and opening in order.

Neither side ever sees the other’s private key. Each runs the DH operation with its own private half and the other side’s public half, and both land on the same shared secret independently.

X25519 is the Diffie-Hellman function defined over Curve25519. Scalar multiplication on the curve is fast in software, it targets a 128-bit security level with no known practical attacks, and it ships in OpenSSL, BoringSSL, libsodium, and most modern TLS stacks. Nothing exotic to bolt on.

Why HPKE fits OHTTP

HPKE is a building block that lets a client encrypt a request to the gateway using only the gateway’s static public key, without first opening a direct session with it. The client fetches the gateway’s key configuration, picks a supported ciphersuite, and seals the request locally. The relay can then carry the encrypted blob to the gateway without being able to read it.

HPKE alone only gives you an encrypted blob. OHTTP also standardizes how HTTP requests and responses are represented, carried, encrypted, forwarded, and returned across all three parties. Without that standardization, every deployment would invent its own framing and interoperability would be someone else’s problem.
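Part of that standardization is a small cleartext header in front of the HPKE output. In RFC 9458, the client prefixes the request with the key ID and the three algorithm IDs (hdr). It then binds that header into the HPKE info string and sends hdr || enc || ct with the media type message/ohttp-req. The sketch below, a hedged illustration with function names of my own, only builds and splits those bytes. The actual SetupBaseS/Seal step is left to whatever HPKE library you use and appears only as a comment.

```python
import struct

REQUEST_LABEL = b"message/bhttp request"
HDR_LEN = 7  # key_id (1 byte) + kem_id, kdf_id, aead_id (2 bytes each)


def request_header(key_id: int, kem_id: int, kdf_id: int, aead_id: int) -> bytes:
    """hdr: tells the gateway which key configuration and suite to use."""
    return struct.pack("!BHHH", key_id, kem_id, kdf_id, aead_id)


def request_info(hdr: bytes) -> bytes:
    """HPKE info: "message/bhttp request", a zero byte, then hdr."""
    return REQUEST_LABEL + b"\x00" + hdr


# Client (RFC pseudocode, not a real API): enc, ctx = SetupBaseS(pkR, request_info(hdr))
#                                          ct = ctx.Seal(b"", bhttp_request)
#                                          body = hdr + enc + ct


def split_encapsulated_request(enc_request: bytes, n_enc: int = 32):
    """Gateway side: enc_request = hdr || enc || ct (n_enc is 32 for X25519)."""
    ids = struct.unpack_from("!BHHH", enc_request, 0)
    enc = enc_request[HDR_LEN:HDR_LEN + n_enc]
    ct = enc_request[HDR_LEN + n_enc:]
    return ids, enc, ct
```

Because hdr is part of info, a relay that tampers with the key ID or algorithm IDs causes the gateway’s decryption to fail rather than silently switch suites.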

Shows a client fetching gateway public keys, sealing a request locally with HPKE, forwarding it through a relay, and decrypting it at the gateway.

In a confidential inferencing deployment, trusting that public key becomes its own problem. You also need a way to know which attested workload is allowed to hold the private half. We’ll come back to that when we get to key distribution and Secure Key Release.

That gives us enough context to look at the three-party OHTTP model: who holds which key, who can see what, and where the trust boundary sits.

Client, relay, gateway, target

OHTTP is specified in RFC 9458, which describes it as “a protocol for forwarding encrypted HTTP messages.” The RFC was published in January 2024 and builds on HPKE and Binary HTTP. It defines the protocol, the message formats, and the trust model for how the client, relay, gateway, and target resource interact.

The core idea is that the client encrypts an HTTP request so that only the gateway can read it, while the relay forwards that encrypted request without being able to see its contents. If you’re thinking, “wait, is that relay just some sort of proxy?”, then you’re already on the right track.

Shows the client sending an encapsulated request to the relay, the relay forwarding it to the gateway, and the gateway sending the decrypted request onward.

The four roles are easiest to compare side by side:

| Component | What it does | What it sees |
|---|---|---|
| Client | Builds the inner HTTP request, serializes it, encrypts it with the gateway’s OHTTP key configuration, and sends the encrypted message to a relay. | It knows the plaintext request and the relay endpoint; its network identity stays hidden from the gateway. |
| Relay | Receives the encrypted request from the client and forwards it to the gateway. | It knows the client’s network identity (IP address, TLS connection metadata, request timing) and where it is forwarding to, but the inner HTTP request is sealed inside an HPKE-encrypted blob. The relay cannot see the method, path, headers inside the encapsulation, query parameters, or body. |
| Gateway | Receives the encrypted request from the relay, decrypts it, processes the inner HTTP request, and returns an encrypted response. | It sees the decrypted inner request. From a network perspective, it sees the relay as its peer rather than the original client, and it does not see the client’s network identity. |
| Target Resource | Serves the HTTP request the client actually cares about; RFC 9458 treats it as a distinct resource. | It receives a normal HTTP request from the gateway, may be colocated with it or sit upstream, and might be completely unaware that OHTTP was involved. |

At this point you might reasonably begin to wonder: “hey, doesn’t this defeat the purpose of being able to remain anonymous?” After all, the relay still sees who I am. The nuance here is that the relay doesn’t see what you’re saying, and the gateway (which sees what you’re saying) doesn’t see who you are. That property is called unlinkability: no single party holds both halves of the relationship. It is not the same as “nobody anywhere knows you exist.”

The trust boundaries

RFC 9458 is pretty clear on how the trust model is supposed to take shape:

📖 Quote

“To achieve the stated privacy goals, the Oblivious Relay Resource cannot be operated by the same entity as the Oblivious Gateway Resource.”

It helps to keep at least two boundaries separate in your head. RFC 9458 is strict about one of them: the relay and gateway cannot be operated by the same entity if you want the stated privacy properties. It is more flexible about the other: the gateway and target may be colocated, or they may be separate. The strict boundary exists because the relay and the gateway hold complementary views:

  1. The relay sees client-side network metadata: source IP, TLS connection details, timing, and message size.
  2. The gateway sees plaintext request content and the target-facing application semantics. If one operator can combine those two views, the separation collapses.

When the same operator runs both the relay and the gateway, a single entity can see both the client identity and the inner request, even though the protocol still functions as a valid OHTTP deployment:

Shows the request flow when a single entity operates both the relay and gateway, which weakens the privacy boundary between client identity and request content.

It’s by no means a bad approach, but it just does not give you the same privacy properties as the split-operator model, which we will see in a second. The relay and gateway together can link requests to clients, even if the inner request content is still encrypted from the relay. That may be an acceptable risk for some deployments, but it is not the unlinkability property RFC 9458 describes.

💡 Note

Microsoft’s Azure AI Confidential Inferencing: Technical Deep-Dive blog post describes this as the default model: Azure’s load balancers, frontends, and firewalls act as the relay layer, forwarding encrypted OHTTP blobs to the OHTTP Gateway inside the confidential TEE.

In his Confidential Computing Summit 2025 talk, Antoine confirms this too: they repurpose OHTTP’s request encryption mechanism as an application-level encrypted transport through Microsoft’s own untrusted infrastructure layers, not as a three-party unlinkability mechanism. In that default configuration, the relay and gateway are both operated by Microsoft, which does not give you the same unlinkability guarantees.

However, Microsoft’s deep dive does mention an optional addition: “the client application may optionally use an OHTTP proxy outside of Azure to provide stronger unlinkability between clients and inference requests.”

The gateway and the Target Resource, though, are a slightly different story. RFC 9458 explicitly allows them to be colocated, noting that colocating the gateway and target simplifies the deployment without affecting client privacy. You may still care about logging, transport security, or how much you trust the target. Those are real concerns, but they are not the unlinkability boundary. The hard boundary is relay versus gateway.

Shows the relay and gateway as separate entities so no single operator sees both the client identity and the plaintext request content.

Keep in mind that what we’re after is unlinkability: no single party holds both the client’s network identity and the request content. Not absolute anonymity. A relay that plays by the rules should forward the ciphertext, strip or avoid unknown identifying headers, and never copy the client’s source IP into Forwarded, Via, or similar metadata for the gateway. However, a malicious or careless relay can still shrink the anonymity set through logging, timing, header injection, differential treatment, or metadata leaks. OHTTP keeps one party from seeing both halves by default. It does not make every intermediary a benevolent one.
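In practice, a relay’s header handling is where “playing by the rules” becomes concrete. Below is a hedged sketch of a hypothetical relay policy. It assumes the relay only needs to forward content-type (which must be RFC 9458’s message/ohttp-req) and content-length. A real deployment may need a few more headers, for example for the relay’s own authentication to the gateway. The function and allowlist names are my own.

```python
# Hypothetical relay policy: an allowlist of headers forwarded to the gateway.
FORWARDED_REQUEST_HEADERS = {"content-type", "content-length"}
OHTTP_REQUEST_MEDIA_TYPE = "message/ohttp-req"


def relay_outbound_headers(incoming: dict[str, str]) -> dict[str, str]:
    """Build the headers a relay sends to the gateway for one encapsulated request."""
    normalized = {name.lower(): value for name, value in incoming.items()}
    if normalized.get("content-type") != OHTTP_REQUEST_MEDIA_TYPE:
        raise ValueError("not an OHTTP encapsulated request")
    # Anything not explicitly allowed (Cookie, Forwarded, Via, X-Forwarded-For,
    # tracing IDs, headers nobody anticipated) is dropped rather than passed on.
    return {name: value for name, value in normalized.items() if name in FORWARDED_REQUEST_HEADERS}
```

An allowlist is the safer default here. With a denylist, a header nobody anticipated is quietly forwarded. With an allowlist, it is quietly dropped.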

The same caveat applies to content privacy. OHTTP hides the request from the relay and from middleboxes or network appliances sitting between the TLS hops, but the gateway decrypts by design. Once the gateway opens the request, plaintext exists there. OHTTP is about unlinkability, not “nobody can ever read the data.”

There is a third caveat that lives at the application layer. OHTTP hides the sender’s network identity from the gateway, but it says nothing about what the application puts inside the encrypted request. If the request body contains a user ID, an auth token, a name, or any other personally identifiable information, the gateway can correlate that request back to a specific person, regardless of what OHTTP does at the transport layer. As Tinfoil puts it: “It’s analogous to wearing a hood and mask to protect your identity and then pinning your ID to your forehead.” OHTTP protects who sent the request. It does not protect what was sent from the gateway that processes it.

In practice, deployments lean on contracts, transparency, and audits to make that trust split credible. Cloudflare, for example, went a step further and formally analyzed OHTTP with the Tamarin prover. They published the models and proofs, which is useful if you want to read the privacy story in a more formal way than a blog post can provide. Cloudflare also offers Privacy Gateway to select Enterprise customers, so you could set up the three-entity separation by using them as your OHTTP relay provider.

That’s the trust boundary in abstract terms. The next step is to make it concrete by working out exactly what gets fetched, wrapped, forwarded, decrypted, and returned during an OHTTP exchange.

The flow of an OHTTP request

At a high level, the request is encrypted by the client, handed to the relay as an opaque blob, decrypted only at the gateway, and then returned along the same path as an encrypted response. To see where each privacy property comes from, it helps to walk one complete request from key discovery through response decryption.

A complete OHTTP exchange starts before the relay ever sees a POST. The client first needs an authenticated gateway key configuration in application/ohttp-keys form. RFC 9458 defines that format, but not a single mandatory discovery mechanism. RFC 9540 (“Discovery of Oblivious Services via Service Binding Records”) standardized one discovery path using SVCB / HTTPS DNS records and the well-known gateway URI /.well-known/ohttp-gateway, while out-of-band configuration is still possible.

For example, a client that already knows the target host can fetch the key configuration like this:

GET /.well-known/ohttp-gateway HTTP/1.1
Host: ogw-inference.apps.thomasvanlaere.com
Accept: application/ohttp-keys

If the gateway supports that discovery path, it responds with a body of type application/ohttp-keys. That body is a binary serialization of one or more OHTTP key configurations, not a JSON document or some other human-readable structure. RFC 9458 Appendix A shows a concrete example as hex, but that hex string is only a readable rendering for the RFC; on the wire, the body is binary.

That example only covers gateway discovery and key fetching. Relay discovery is still out of scope for RFC 9540, so clients typically rely on a preconfigured relay or some separate trust and configuration path.

💡 Note

OHTTP does not encrypt an arbitrary string and call it a day. It encapsulates an HTTP request using Binary HTTP, also known as bHTTP. Binary HTTP is a binary representation of HTTP semantics: request control data, headers, content, and trailers packed into a single self-contained object that HPKE can encrypt.

In this flow, bHTTP is the step that turns the inner request into one deterministic binary message before HPKE sealing. That gives the client and gateway a stable object to serialize, encrypt, decrypt, and recover again, regardless of what HTTP version is used on the outer client-to-relay or relay-to-gateway hops.
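To make bHTTP less abstract, here is a minimal Python sketch of the known-length request encoding from RFC 9292, the RFC that defines Binary HTTP. It only covers what the inner request in this post needs: no padding, no indeterminate-length framing, ASCII-only values, and the target host placed in the authority field of the control data. Treat it as an illustration of the layout, not a conformant library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a QUIC variable-length integer (RFC 9000, Section 16), which bHTTP uses for lengths."""
    if value < 0:
        raise ValueError("varints are unsigned")
    if value < 1 << 6:
        return value.to_bytes(1, "big")
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")
    if value < 1 << 30:
        return (value | 0x8000_0000).to_bytes(4, "big")
    if value < 1 << 62:
        return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")
    raise ValueError("value does not fit in a QUIC varint")


def length_prefixed(data: bytes) -> bytes:
    """Prefix data with its length as a varint."""
    return encode_varint(len(data)) + data


def encode_field_section(fields: list[tuple[str, str]]) -> bytes:
    """Known-length field section: total length, then name/value pairs.

    bHTTP requires lowercase field names; this sketch assumes ASCII values.
    """
    lines = b"".join(
        length_prefixed(name.lower().encode("ascii")) + length_prefixed(value.encode("ascii"))
        for name, value in fields
    )
    return length_prefixed(lines)


def encode_known_length_request(method, scheme, authority, path, headers, content=b""):
    """Encode a known-length bHTTP request (framing indicator 0) without padding."""
    control_data = b"".join(
        length_prefixed(part.encode("ascii")) for part in (method, scheme, authority, path)
    )
    return (
        encode_varint(0)             # framing indicator: known-length request
        + control_data               # method, scheme, authority, path
        + encode_field_section(headers)
        + length_prefixed(content)
        + encode_field_section([])   # empty trailer section
    )


inner = encode_known_length_request(
    "POST",
    "https",
    "inference.apps.thomasvanlaere.com",
    "/v1/responses",
    [("Content-Type", "application/json"), ("Authorization", "Bearer <token>")],
    b'{"model": "local-confidential-model"}',
)
```

The result is a single byte string that carries the method, target, headers, and body together, which is exactly the kind of self-contained object HPKE can seal in one go.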

The full request-response flow looks like this:

Shows the client fetching key material, building and encrypting the request, forwarding it through the relay and gateway, and returning the encrypted response.

Let’s use an inferencing request as the inner HTTP message. Pay attention to the host: it is set to inference.apps.thomasvanlaere.com, a value the relay should never see:

POST /v1/responses HTTP/1.1
Host: inference.apps.thomasvanlaere.com
Content-Type: application/json
Accept: application/json
Authorization: Bearer <token>

{
  "model": "local-confidential-model",
  "input": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_file",
          "filename": "root-cause-addendum.txt",
          "file_data": "data:text/plain;base64,TXkgZnJpZW5kIGxldCBtZSByaWdodC1jbGljay1kZXBsb3kgdG8gcHJvZC4="
        },
        {
          "type": "input_text",
          "text": "Summarize this incident report and identify follow-up actions."
        }
      ]
    }
  ]
}

The content type is application/json here, but it could be anything: multipart form data, protobuf, XML, whatever the target resource expects. The point is that this is the inner HTTP request that gets serialized into bHTTP and then encrypted with HPKE. Let’s assume we performed the full HPKE encapsulation of that inner HTTP request, producing an encrypted blob that only the gateway can open.
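RFC 9458 (Section 4.3) spells out what that encapsulation looks like: a 7-byte header made of the key_id and the KEM, KDF, and AEAD identifiers, an HPKE info string built from that header, and then the HPKE encapsulated key followed by the ciphertext. The Python sketch below assembles those pieces. HPKE itself is not in the standard library, so hpke_seal is a hypothetical callable standing in for whatever HPKE implementation you use; a real client would also keep the HPKE context around so it can decrypt the response later.

```python
import struct
from typing import Callable

# Hypothetical stand-in for an HPKE base-mode sender: given the gateway's public
# key, the info string, and the plaintext, it returns (enc, ciphertext), sealing
# with empty associated data.
HpkeSeal = Callable[[bytes, bytes, bytes], tuple[bytes, bytes]]


def request_header(key_id: int, kem_id: int, kdf_id: int, aead_id: int) -> bytes:
    """hdr = key_id (1 byte) || kem_id (2) || kdf_id (2) || aead_id (2), network byte order."""
    return struct.pack("!BHHH", key_id, kem_id, kdf_id, aead_id)


def request_info(hdr: bytes) -> bytes:
    """info = "message/bhttp request" || 0x00 || hdr."""
    return b"message/bhttp request" + b"\x00" + hdr


def encapsulate_request(hpke_seal: HpkeSeal, gateway_public_key: bytes, key_id: int,
                        kem_id: int, kdf_id: int, aead_id: int, bhttp_request: bytes) -> bytes:
    """Build the message/ohttp-req body: hdr || enc || ciphertext."""
    hdr = request_header(key_id, kem_id, kdf_id, aead_id)
    enc, ciphertext = hpke_seal(gateway_public_key, request_info(hdr), bhttp_request)
    return hdr + enc + ciphertext
```

With X25519, HKDF-SHA256, and AES-128-GCM (identifiers 0x0020, 0x0001, and 0x0001), enc is 32 bytes, so once the gateway has read the header it knows exactly where the ciphertext starts.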

The outer request going to the relay has Host: privacy-relay.cloudflare.com and Content-Type: message/ohttp-req:

POST /ogw-inference HTTP/1.1
Host: privacy-relay.cloudflare.com
Content-Type: message/ohttp-req

<encrypted binary HTTP of that inference payload from above>

In the example above, /ogw-inference is the relay resource path — a gateway name slug registered with Cloudflare Privacy Gateway following its /<GATEWAY_SERVER_NAME> convention, which maps to the gateway at ogw-inference.apps.thomasvanlaere.com. It is configured to forward to one specific gateway resource. RFC 9458 assumes a fixed one-to-one mapping between a relay resource and a gateway resource. If you are rolling your own relay rather than using a managed service like Cloudflare Privacy Gateway, or need to front multiple gateways, the practical question is simply which relay resource URI maps to which gateway resource URI. Those URIs can vary by hostname, by path, or by a mix of both.

Pattern | Relay resource URI | Gateway resource URI
--- | --- | ---
Different hostnames | https://assistant.relay.example.com/ohttp-gateway | https://ogw-assistant.apps.thomasvanlaere.com
Different hostnames | https://search.relay.example.com/ohttp-gateway | https://ogw-search.apps.thomasvanlaere.com
Different paths | https://relay.example.com/ohttp-gateway/assistant | https://ogw-assistant.apps.thomasvanlaere.com
Different paths | https://relay.example.com/ohttp-gateway/search | https://ogw-search.apps.thomasvanlaere.com
Mixed hostname and path | https://ai.relay.example.com/ohttp-gateway/assistant | https://ogw-assistant.apps.thomasvanlaere.com
Mixed hostname and path | https://ai.relay.example.com/ohttp-gateway/search | https://ogw-search.apps.thomasvanlaere.com
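If you roll your own relay, that mapping is ultimately a static lookup table. Here is a minimal Python sketch, assuming the relay is configured with a fixed dictionary; the RELAY_ROUTES name and the function are hypothetical, and the entries mirror rows from the table above. The important design choice is that the gateway destination comes only from relay configuration, never from anything the client sends.

```python
from urllib.parse import urlsplit

# Hypothetical relay configuration: every relay resource maps to exactly one
# gateway resource.
RELAY_ROUTES = {
    "https://assistant.relay.example.com/ohttp-gateway": "https://ogw-assistant.apps.thomasvanlaere.com",
    "https://relay.example.com/ohttp-gateway/search": "https://ogw-search.apps.thomasvanlaere.com",
    "https://ai.relay.example.com/ohttp-gateway/assistant": "https://ogw-assistant.apps.thomasvanlaere.com",
}


def gateway_for(relay_resource: str) -> str:
    """Return the single gateway resource configured for a relay resource."""
    parts = urlsplit(relay_resource)
    # Normalize scheme and host case; the path is compared exactly and any query is ignored.
    key = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path}"
    if key not in RELAY_ROUTES:
        # Never fall back to a destination derived from the client's request.
        raise LookupError(f"no gateway configured for {relay_resource}")
    return RELAY_ROUTES[key]
```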

💡 Note

The exact hostname structure here is illustrative — the ogw-*.apps.thomasvanlaere.com naming pattern reflects my own deployment conventions and how I’d expect to set this up, but yours will likely differ. The point is the pattern: each OHTTP gateway gets its own hostname, paired one-to-one with a target application.

These are routing endpoints between the relay and the gateway. They are not the same thing as the inner target request shown above, which still goes to Host: inference.apps.thomasvanlaere.com with the path /v1/responses.

💡 Note

You are not limited to a fixed path like /ohttp-gateway.

/.well-known/ohttp-gateway is the standardized discovery location for gateway configuration in RFC 9540, not a universal requirement for the relay-facing or gateway-facing request path. RFC 9458 Appendix A uses different concrete resource paths entirely: https://proxy.example.org/request.example.net/proxy for the relay resource and https://example.com/oblivious/request for the gateway resource.

Now that the full exchange is on the table, we can zoom in on the pieces that show up on the wire.

What shows up on the wire

OHTTP defines media types for the key configuration and encapsulated messages:

  • application/ohttp-keys for the gateway key configuration
  • message/ohttp-req for an encrypted OHTTP request
  • message/ohttp-res for an encrypted OHTTP response

The client first needs a gateway key configuration. That configuration contains one or more public keys and the supported HPKE identifiers. A simplified key configuration shape looks like this:

KeyConfig {
  key_id                         // 8 bits
  kem_id                         // 16 bits
  public_key                     // Npk bytes, fixed by the KEM (32 bytes for X25519)
  symmetric_algorithms_length    // 16 bits, length of the list below in bytes
  symmetric_algorithms[]         // one or more SymmetricAlgorithm entries
}

SymmetricAlgorithm {
  kdf_id                         // 16 bits
  aead_id                        // 16 bits
}

The actual wire format is binary, not JSON. Examples sometimes show JSON-like structures to make the values readable, but RFC 9458 defines the concrete encoding.
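To make the binary layout concrete, here is a minimal Python sketch that serializes and parses a single key configuration, plus the application/ohttp-keys framing, in which RFC 9458 prefixes each configuration with a 2-byte length. It assumes only the KEMs listed in KEM_PUBLIC_KEY_SIZES (public key lengths taken from RFC 9180); a production parser should skip configurations it does not support rather than fail outright.

```python
import struct
from dataclasses import dataclass

# Npk (public key length in bytes) for a few HPKE KEMs, per RFC 9180.
KEM_PUBLIC_KEY_SIZES = {
    0x0010: 65,  # DHKEM(P-256, HKDF-SHA256)
    0x0011: 97,  # DHKEM(P-384, HKDF-SHA384)
    0x0020: 32,  # DHKEM(X25519, HKDF-SHA256)
}


@dataclass
class KeyConfig:
    key_id: int
    kem_id: int
    public_key: bytes
    symmetric_algorithms: list[tuple[int, int]]  # (kdf_id, aead_id) pairs

    def serialize(self) -> bytes:
        algs = b"".join(struct.pack("!HH", kdf, aead) for kdf, aead in self.symmetric_algorithms)
        return (struct.pack("!BH", self.key_id, self.kem_id) + self.public_key
                + struct.pack("!H", len(algs)) + algs)


def parse_key_config(data: bytes) -> KeyConfig:
    """Parse exactly one key configuration; raises on anything malformed."""
    key_id, kem_id = struct.unpack_from("!BH", data, 0)
    npk = KEM_PUBLIC_KEY_SIZES.get(kem_id)
    if npk is None:
        raise ValueError(f"unsupported KEM 0x{kem_id:04x}")
    offset = 3
    public_key = data[offset:offset + npk]
    if len(public_key) != npk:
        raise ValueError("truncated public key")
    offset += npk
    (algs_len,) = struct.unpack_from("!H", data, offset)
    offset += 2
    if algs_len < 4 or algs_len % 4 != 0 or offset + algs_len != len(data):
        raise ValueError("malformed symmetric algorithm list")
    algs = [struct.unpack_from("!HH", data, offset + i) for i in range(0, algs_len, 4)]
    return KeyConfig(key_id, kem_id, public_key, algs)


def parse_ohttp_keys(body: bytes) -> list[KeyConfig]:
    """Split an application/ohttp-keys body into its length-prefixed configurations."""
    configs, offset = [], 0
    while offset < len(body):
        (length,) = struct.unpack_from("!H", body, offset)
        offset += 2
        chunk = body[offset:offset + length]
        if len(chunk) != length:
            raise ValueError("truncated key configuration")
        configs.append(parse_key_config(chunk))
        offset += length
    return configs


# Example: X25519 with HKDF-SHA256 and either AES-128-GCM (0x0001) or ChaCha20Poly1305 (0x0003).
example = KeyConfig(1, 0x0020, bytes(range(32)), [(0x0001, 0x0001), (0x0001, 0x0003)])
```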

Key rotation and key_id

Each entry in the configuration has a key_id. When the client encapsulates a request, the resulting OHTTP message carries the key_id it encrypted to, so the gateway knows which private key to use for decryption. This is what makes rotation tractable: the gateway can serve a configuration containing the current key and the previous key for an overlap window, accept both during that window, and then retire the old one once clients have refreshed their cached configuration.

The key_id field is only 8 bits wide, and that is deliberate. A server that assigned different key IDs to different user populations could use those IDs to discriminate between clients before any request is even sent. The 8-bit field caps that at 256 distinguishable buckets. Antoine calls this out explicitly when discussing key management for confidential inferencing: widening the identifier would let a malicious or careless gateway embed per-user tracking directly in the key discovery step.
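On the gateway side, that rotation logic boils down to a lookup keyed by the first byte of the encapsulated request. The sketch below is hypothetical: GatewayKey, KeyRing, and the accept_until deadline are names I made up to illustrate an overlap window, and private_key is only a placeholder for a real HPKE private key. Because key_id is a single byte, the ring also refuses to reuse an ID that is still active.

```python
import struct
from dataclasses import dataclass


@dataclass
class GatewayKey:
    key_id: int          # 0..255, matches the key_id in the published configuration
    private_key: bytes   # placeholder for a real HPKE private key
    accept_until: float  # end of the window in which requests to this key are accepted


class KeyRing:
    def __init__(self) -> None:
        self._keys: dict[int, GatewayKey] = {}

    def add(self, key: GatewayKey) -> None:
        if not 0 <= key.key_id <= 255:
            raise ValueError("key_id must fit in 8 bits")
        if key.key_id in self._keys:
            raise ValueError("key_id is still in use by another key")
        self._keys[key.key_id] = key

    def retire_expired(self, now: float) -> None:
        """Drop keys whose acceptance window has closed."""
        self._keys = {kid: k for kid, k in self._keys.items() if k.accept_until > now}

    def select(self, encapsulated_request: bytes, now: float) -> GatewayKey:
        """Pick the private key named by the key_id in the request header."""
        if len(encapsulated_request) < 7:
            raise ValueError("request shorter than the OHTTP header")
        key_id, _kem_id, _kdf_id, _aead_id = struct.unpack_from("!BHHH", encapsulated_request)
        key = self._keys.get(key_id)
        if key is None or key.accept_until <= now:
            raise LookupError(f"unknown or retired key_id {key_id}")
        return key
```

During the overlap window both the previous and the current key sit in the ring; once the previous key's window closes, retire_expired frees its ID for a future key.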

⚠️ Warning

Do not treat the OHTTP key configuration as a long-lived forever key. In a real service, key rotation and cache lifetimes are very important. Clients need to respect key configuration freshness, and gateways need a rotation strategy that does not break in-flight requests.

In practice, a client must:

  • Fetch the configuration over HTTPS, for example from the /.well-known/ohttp-gateway location.
  • Respect cache lifetimes (the configuration response carries cache headers).
  • On a decryption failure, refetch the configuration before retrying, in case the gateway rotated unexpectedly.
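Those three rules fit in a small client-side cache. The Python sketch below is hypothetical: fetch_config stands in for an HTTPS GET with Accept: application/ohttp-keys that returns the body and the Cache-Control value, and KeyConfigRejected stands in for however your client surfaces the gateway rejecting the key (RFC 9458 defines a problem type for signaling key configuration problems). It only honors max-age and retries exactly once, so a misbehaving gateway cannot trap the client in a refetch loop.

```python
import re
import time
from typing import Callable


class KeyConfigRejected(Exception):
    """Hypothetical error: the gateway could not decrypt with the key we used."""


class KeyConfigCache:
    def __init__(self, fetch_config: Callable[[], tuple[bytes, str]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_config  # returns (application/ohttp-keys body, Cache-Control value)
        self._clock = clock
        self._body = None
        self._expires_at = 0.0

    def get(self, force_refresh: bool = False) -> bytes:
        now = self._clock()
        if force_refresh or self._body is None or now >= self._expires_at:
            body, cache_control = self._fetch()
            match = re.search(r"\bmax-age=(\d+)", cache_control)
            max_age = int(match.group(1)) if match else 0  # no max-age: do not reuse
            self._body, self._expires_at = body, now + max_age
        return self._body


def send_with_refresh(cache: KeyConfigCache, send: Callable[[bytes], str]) -> str:
    """Encrypt and send with the cached config; on rejection, refetch and retry once."""
    try:
        return send(cache.get())
    except KeyConfigRejected:
        return send(cache.get(force_refresh=True))
```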

On the operator side, rotation gets more interesting when it is coupled to a workload deployment. Antoine describes this in his Confidential Computing Summit talk: when you roll out a new gateway version, you cannot simply swap the key at the same time without dropping in-flight requests. The pattern he describes is:

  1. Build and deploy the new gateway workload alongside the old one.
  2. Create a new OHTTP key whose Secure Key Release policy authorizes both the old and the new deployment to receive it.
  3. Let the previous key’s validity window expire so clients naturally migrate to the new key.
  4. Gradually scale up the new deployment and scale down the old one; any node holding the new key can serve any request encrypted to it.
  5. Once the old deployment is fully drained, create a final key whose policy only authorizes the new deployment, and retire the transition key.

That overlap key is what makes the rollout zero-downtime: both generations of workload can decrypt requests during the transition window, and the policy boundary ensures neither side ever gets access to a key it should not hold.
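The rollout can be checked as plain set logic. The sketch below is a hypothetical model, not Azure's Secure Key Release API: each key's release policy is reduced to the set of deployments allowed to receive it, and each phase lists which deployments serve traffic and which keys clients may still be encrypting to. Zero downtime then means every serving deployment can obtain every key a client might use.

```python
# Hypothetical release policies: which deployments may obtain each key.
RELEASE_POLICIES = {
    "key-old": {"gateway-v1"},
    "key-transition": {"gateway-v1", "gateway-v2"},
    "key-final": {"gateway-v2"},
}

# (phase, deployments serving traffic, keys clients may still encrypt to)
ROLLOUT = [
    ("steady state on v1", {"gateway-v1"}, {"key-old"}),
    ("v2 deployed, transition key published", {"gateway-v1"}, {"key-old", "key-transition"}),
    ("old key expired, v2 scaling up", {"gateway-v1", "gateway-v2"}, {"key-transition"}),
    ("v1 drained", {"gateway-v2"}, {"key-transition"}),
    ("final key published", {"gateway-v2"}, {"key-transition", "key-final"}),
    ("transition key retired", {"gateway-v2"}, {"key-final"}),
]


def can_release(key: str, deployment: str) -> bool:
    """The key service releases a key only to deployments its policy names."""
    return deployment in RELEASE_POLICIES[key]


def zero_downtime(serving: set[str], active_keys: set[str]) -> bool:
    """Every serving deployment must be able to decrypt every request clients may send."""
    return all(can_release(key, dep) for key in active_keys for dep in serving)


for phase, serving, active_keys in ROLLOUT:
    print(f"{phase}: {'ok' if zero_downtime(serving, active_keys) else 'requests may fail'}")
```

Skipping the transition key fails the check: with both deployments serving and clients still on key-old, gateway-v2 cannot obtain key-old, so any request routed to it would fail to decrypt.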

What the relay sees at the HTTP layer

At the HTTP layer, when the relay terminates the TLS connection from the client and parses the request, it can see:

  • The client source IP (from the TCP connection)
  • The HTTP method and path it was sent to (typically a single /ohttp-gateway endpoint)
  • Standard transport headers (Host, User-Agent, Content-Type: message/ohttp-req, Content-Length)
  • The size of the encrypted body and the time it took to receive
  • Anything the client explicitly chose to put on the outer request

A compliant relay should treat that source identity as relay-local information. It should not forward the client’s source IP, copy it into Forwarded, or add fresh identifying headers that let the gateway reconstruct who the client was.
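One conservative way to honor that on a self-hosted relay is an allowlist rather than a denylist. This Python sketch is hypothetical (ALLOWED_HEADERS and build_gateway_headers are my own names): it forwards only the media type and length, which drops Forwarded, X-Forwarded-For, cookies, and User-Agent by construction, and it refuses to forward anything that is not an OHTTP request.

```python
# Hypothetical allowlist: forward only what the gateway needs to read the body.
ALLOWED_HEADERS = {"content-type", "content-length"}


def build_gateway_headers(incoming: dict[str, str]) -> dict[str, str]:
    """Return the headers a relay forwards to the gateway, dropping everything else."""
    lowered = {name.lower(): value for name, value in incoming.items()}
    media_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if media_type != "message/ohttp-req":
        raise ValueError("relay only forwards message/ohttp-req bodies")
    # Rebuild from the allowlist so identifying headers never make it through.
    return {name: lowered[name] for name in sorted(ALLOWED_HEADERS) if name in lowered}
```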

What the relay cannot see is everything inside the encapsulated bHTTP payload:

  • The inner method, path, and query string
  • The inner host header
  • The inner request body (prompt, JSON payload, multipart upload, whatever)
  • Application-specific headers carried inside the encapsulation: Authorization, Cookie, session tokens, X-CSRF-TOKEN, app-specific auth headers

That last point is worth spelling out, because at the end of the day you still need to be able to access your application. Anything you put on the inner request, which the client serializes into bHTTP before HPKE-sealing, is invisible to the relay. So if your inner request carries an authentication cookie or a CSRF token for the gateway-facing application, those values never reach the relay. They are sealed inside the encapsulation alongside the body. The relay can’t read them, can’t log them, and can’t strip or modify them without breaking the AEAD authentication tag. That property is crucial because it allows the gateway to authenticate the inner request without the relay ever seeing the credentials.

If you were tailing relay logs, the useful part is not what appears. It is what does not appear:

relay: received request from 203.0.113.24
relay: content-type=message/ohttp-req
relay: body=9f 7b 21 1c 4a ... 0e 51 2d 93
relay: forwarding encrypted payload to gateway ogw-inference.apps.thomasvanlaere.com

The relay log has the client’s IP, a content type, and an opaque body. Nothing about the inner request leaks out. A relay that plays by the rules keeps the client identity on its side of the boundary. The gateway sees the other side of that trade:

gateway: received request from relay privacy-relay.cloudflare.com
gateway: decrypted bHTTP request
gateway: method=POST path=/v1/responses
gateway: inner-header Authorization=Bearer <redacted>
gateway: forwarding request to inference.apps.thomasvanlaere.com

That is the trade. The relay has client metadata without plaintext. The gateway has plaintext (including inner auth headers) without the direct client network identity.

💡 Note

RFC 9458 Section 4 describes how the request and response are encrypted. The response encryption keys are derived from the same HPKE exchange associated with the request, so the client can decrypt the response without adding a separate key exchange round trip.
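The response side of that derivation (RFC 9458, Section 4.4) is easy to sketch with nothing but hmac and hashlib. This Python version assumes HKDF-SHA256 with AES-128-GCM (Nk = 16, Nn = 12) and takes exported_secret as an input, because exporting it requires an HPKE context (the secret is exported with the label "message/bhttp response" and a length of max(Nn, Nk)) that the standard library cannot provide. The final AEAD seal is left out for the same reason; the encapsulated response is response_nonce followed by that ciphertext.

```python
import hashlib
import hmac
import os


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract with SHA-256 (RFC 5869)."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand with SHA-256 (RFC 5869)."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def response_key_and_nonce(exported_secret: bytes, enc: bytes, response_nonce: bytes,
                           nk: int = 16, nn: int = 12) -> tuple[bytes, bytes]:
    """Derive the response AEAD key and nonce from the request's HPKE exchange."""
    salt = enc + response_nonce              # enc comes from the encapsulated request
    prk = hkdf_extract(salt, exported_secret)
    return hkdf_expand(prk, b"key", nk), hkdf_expand(prk, b"nonce", nn)


# Gateway side: a fresh random nonce of max(Nn, Nk) bytes for every response.
response_nonce = os.urandom(max(12, 16))
```

Both sides run the same derivation: the gateway picks response_nonce and sends it in the clear at the start of the response, and the client, which already holds enc and its own HPKE context, recomputes the key and nonce from it.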

What about TLS?

You still use TLS; that does not change. The client talks to the relay over HTTPS, and the relay talks to the gateway over HTTPS. OHTTP sits inside those HTTP exchanges as an encrypted message format, so you get different layers doing different jobs:

  • TLS protects each hop on the wire.
  • HPKE protects the encapsulated request and response from the relay (and from anyone else on the path).
  • The relay and gateway split identity metadata from plaintext content.

One subtle but important consequence of those two separate TLS hops: the client does not have a direct TLS session with the gateway. The relay terminates one TLS connection and opens another.

So if you are thinking ahead to attested TLS, the client cannot use it inline to authenticate the gateway on the OHTTP request path itself. Any client-side assurance about the gateway has to come from how the client obtained and validated the gateway’s OHTTP key configuration before sending the request.

A confidential computing layer can add a fourth boundary: protecting plaintext while it is being processed. We will come back to that when we get to the confidential AI inferencing section.

Where OHTTP meets confidential AI

Up until this point, this post has mostly been about protocol mechanics: how OHTTP keeps the request hidden from the relay while keeping the client’s network identity hidden from the gateway.

After reading Microsoft’s blog on confidential inferencing, I got stuck on a different question: not how the request stays protected in transit, but “why the client should trust the gateway public key at all”.

My assumption going in was that attested TLS would be the logical next step for confidential compute workloads: the client opens a direct TLS connection to the TEE, the certificate carries attestation evidence, and you get transport security and workload verification in one session. But because the relay terminates the TLS connection, the client never has a direct session with the gateway, so that model does not apply.

As Microsoft points out, large-scale AI services rarely give you that direct connection in practice. They make use of public edge services, layer 7 routing, load balancing, abuse controls, authentication, and APIs to facilitate billing, all in front of the workload that actually runs the model. Once those pieces exist, “just use TLS end to end” is no longer possible.

OHTTP still keeps the prompt sealed while those outer services do their job, but the trust anchor moves. Instead of asking the client to attest the transport channel end to end, you encrypt the prompt to a public key that only an attested TEE is allowed to use. So instead of attested TLS, you end up with attested OHTTP.

💡 Note

Direct attested TLS to the inference TEE is the neat answer in theory, but Microsoft explicitly says it becomes impractical once you need smart layer 7 load balancing and elastic frontends with TLS termination at the load balancer. That does not make attested TLS wrong.

It just means it is not the mechanism the client can rely on for the OHTTP request path. Their answer, and I think the right mental model for this deployment shape, is application-level encryption for the prompt rather than pretending the client still has a direct transport channel to the TEE.

Confidential computing is still important because OHTTP only decides who can read the request on the way in. It does not say whether the workload that finally opens the request is the one the client intended, or what evidence says the current gateway key can only be used by that attested workload. That is where TEEs, attestation, and Secure Key Release enter the OHTTP story.

Why trust the gateway public key at all?

Fetching the key configuration over HTTPS tells you who served it. (By “who” I mean the domain owner whose TLS certificate was validated by a CA.) It doesn’t tell you which workload is allowed to hold the matching private key, or what stops an arbitrary frontend from serving a key of its own. Azure’s confidential inferencing deployment answers that second question like this:

  1. The client fetches the current OHTTP key configuration, including the HPKE public key, key_id, and supported ciphersuite.
  2. The client verifies evidence that this key came from the intended confidential key-management path rather than from an arbitrary frontend.
  3. The client verifies the policy binding, receipt, or equivalent evidence that says which attested gateway workloads are allowed to obtain the matching private key.
  4. When a gateway instance needs to decrypt a request, it attests its current state and requests the private key through Secure Key Release.
  5. Only if that policy matches does the key service release the wrapped HPKE private key, and only then can the gateway decapsulate the OHTTP request.
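To make the split between steps 2 and 3 (client side) and steps 4 and 5 (key service side) concrete, here is a deliberately simplified Python sketch. Everything in it is hypothetical: the receipt fields, the policy shape, and the claim names are illustrative, not Azure's actual receipt or Secure Key Release formats, and verifying the receipt's signature or transparency proof is assumed to have happened already.

```python
import hashlib
import hmac


def key_config_digest(key_config: bytes) -> str:
    """SHA-256 of the exact key configuration bytes the client fetched."""
    return hashlib.sha256(key_config).hexdigest()


def client_accepts(key_config: bytes, receipt: dict[str, str], trusted_policies: set[str]) -> bool:
    """Steps 2-3: the receipt must bind this exact key configuration to a policy the client trusts.

    Assumes the receipt's signature or transparency proof was already verified.
    """
    return (
        hmac.compare_digest(receipt["key_config_sha256"], key_config_digest(key_config))
        and receipt["release_policy_id"] in trusted_policies
    )


def policy_allows_release(policy: dict[str, set[str]], claims: dict[str, str]) -> bool:
    """Steps 4-5: release only if every claim the policy names has an allowed value."""
    return all(claims.get(name) in allowed for name, allowed in policy.items())
```

The point of the sketch is the binding: the client trusts a hash of the exact key configuration, and the key service only releases the private half to workloads whose attested claims satisfy the policy named in that binding.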

Neither attestation nor OHTTP does the other’s job. Attestation controls which workload gets the private key. OHTTP controls who sees the prompt on the way to that workload.

Worth noting before looking at the diagram: there are two distinct secure key releases happening here, not one. The first is the OHTTP private key; without it, the gateway cannot decapsulate the request at all. The second is the model key, or whatever credential the GPU-backed inference path needs to load protected artifacts. The release happens downstream, under its own separate policy. Same mechanism, but different gates.

The diagram below shows how these pieces fit together:

Azure Confidential Inferencing architecture

💡 Note

The diagram also hides part of the operational trust story. You can host this kind of inferencing stack on containerized infrastructure such as Kubernetes, but that introduces additional trust in cluster administrators, the control plane, ingress, and surrounding cloud services.

Confidential Containers can reduce some of that surface, but they are not a requirement for OHTTP itself. The core point here is that attestation and Secure Key Release constrain which workload inside the deployment is allowed to obtain the keys.

In 2024, I wrote about this subject as well, so feel free to check out that post.

Attested TLS or mutual TLS (mTLS) can still secure internal hops within the deployment. That is not what the client relies on here though. The client encrypts to a public key whose release is constrained by attestation. That is the trust anchor.

OHTTP in the wild

If you are implementing any of this, be sure to read “RFC 9458: Oblivious HTTP” itself. It is incredibly well written and should really be treated as the authoritative source. This post is a guided walkthrough and not quite a replacement for that spec.

Earlier I said OHTTP already shows up in a few high-profile privacy-sensitive systems. These are the public examples I meant, which are also very interesting to study:

  • Apple describes Private Cloud Compute as sending Apple Intelligence requests through a third-party OHTTP relay before they reach PCC infrastructure. Apple’s write-up is especially useful because it ties the relay story to node attestation, public key selection, and blinded credentials in the same architecture.
    • By the way, this is separate from Apple’s Safari-focused iCloud Private Relay service, which uses a MASQUE-based relay stack rather than OHTTP and should not be conflated with PCC’s OHTTP path.
  • Mozilla explains its use of OHTTP in Firefox as a way to separate who made a request from the request content for privacy-sensitive services such as address-bar suggestions and oblivious DNS over HTTPS.

Public deployment details outside those examples are uneven, but there is a very common pattern: OHTTP shows up where a service wants per-request privacy for high-value requests without turning the whole system into a long-lived anonymity network. So if you want to apply it yourself, use those triggers as a guide: high-value requests, privacy-sensitive content, and a desire to separate the request from the requester.

Practical but not free

OHTTP is not a silver bullet; as with any security technology, it works best as one part of a layered approach. As part of a confidential AI architecture, though, OHTTP can be a very practical layer. It helps answer a question that is easy to skip when you focus only on model serving and attestation: who gets to see both the caller and the prompt? It’s really an architectural choice. I suppose you could technically have some parts of the app use OHTTP and others not, but if you want that split, OHTTP is the standardized way to get it without building a custom encryption and routing layer.

It is important to realize that it does come with a real operational price. A credible end-to-end confidential computing stack has a lot of moving parts that can be pretty daunting to manage: relay boundaries, attestation policy, Secure Key Release, key rotation, workload identity, observability that does not leak data, and people who can keep all of that healthy during an outage. By pointing out that burden I do not mean to invalidate the design, but it absolutely affects whether the design is practical for a given team. Just a little something to take into consideration.

A lot of time went into building this technology, and it is really cool to see it get adopted. I hope this post helps you understand how it works and how to use it if you need it. If you have questions, spotted a mistake, or want to chat about OHTTP, confidential computing, or anything else, feel free to reach out.

References

I’ll just leave you with a list of references for further reading. Some of these are RFCs, some are blog posts, and some are code repositories. They are all worth checking out if you want to go deeper.

Blogs, articles, and talks

Code Reference implementations