Prove to anyone what code is running on your server

Willem Schots
18 Jun, 2026

If you’re a developer, you’ve probably got a few deployments under your belt.

After deploying something, have you ever doubted your own tools so much that you had to SSH into a system to check the “impossible” things you were seeing and make sure you weren’t going insane?

I definitely had to. Fun times.

Today we’re going to lean into this doubt and see how it’s possible to combine Confidential Computing and read-only OS images to prove exactly what code is running on a system. No SSH-ing required.

We won’t just do this for our own sanity, but also for others. It will allow us to potentially prove to anyone what code is running on the system.

(Which just happens to be relevant for the design I described in my previous post :))

Primer on Confidential Computing and TEEs

Confidential Computing is about keeping your data and code protected while it’s in use. That means protecting it not just at rest, but also while it’s actively being processed and executed, often on hardware operated by someone else.

The central component that enables this protection is the Trusted Execution Environment (TEE). Conceptually you can think of such an environment as a hardware-isolated “space” in which you can run code.

The code running in this space is referred to as a workload. Some TEEs support workloads as small as a process, but the dominant trend has been for workloads to take the shape of entire VMs, which is what we’ll assume in this post.

The exact details depend on the specific TEE, but broadly speaking isolation is achieved by:

  • Building on top of existing hardware virtualization tech.
  • Hardware-enforced limits on what a hypervisor can do with guest VMs.
  • Encrypting instructions and data before they’re stored in memory.

The CPU is the security boundary of this isolation. If you could peek into a running CPU you would see the plaintext data and instructions being executed. This is not just theoretical: while Spectre-class attacks don’t literally read data from the CPU, they can infer it via microarchitectural side channels.

Depending on your use case or threat model, these issues need to be taken into consideration.

Diagram: how the hypervisor and TEEs relate. All of them run as plaintext data and instructions on the CPU.

Even if this isolation were perfect, it wouldn’t be enough. The hypervisor is responsible for setting up the initial workload, so it could modify the workload before it’s loaded.

For example, a malicious hypervisor could modify a workload so that all data is submitted to an endpoint under its control. No amount of isolation will help if the workload itself is malicious.

In other words, the isolation doesn’t ensure the provenance of the workload.

This is where attestation comes in. A TEE can measure the code inside of it, which results in a hash called the measurement. The exact details depend on the TEE, but all of them can provide a measurement of their initial code.

This measurement is signed by the TEE in the form of an attestation report. The keys used to sign this report trace back to the TEE vendor. The assumption is that the hardware operator can’t fake them.

Anyone depending on the workload can now verify its provenance by checking that:

  • The attestation report’s signature verifies against a key that chains back to the expected TEE vendor.
  • The measurement inside the report corresponds to the expected hash value of a specific workload.
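To make these two checks concrete, here is a toy model in Python. It is only a sketch: an HMAC with a shared secret stands in for the vendor-rooted signature (real TEEs use asymmetric keys and certificate chains), and the names AttestationReport, tee_attest and verify_report are made up for this illustration.

```python
import hashlib
import hmac
from dataclasses import dataclass

# ASSUMPTION: a shared secret stands in for the vendor-rooted signing key.
# Real TEEs sign reports with asymmetric keys that chain to vendor certificates.
VENDOR_KEY = b"stand-in-for-vendor-rooted-key"


@dataclass(frozen=True)
class AttestationReport:
    measurement: bytes  # hash of the initial workload
    signature: bytes    # produced by the (simulated) TEE


def measure(workload: bytes) -> bytes:
    """Hash the initial workload, as the TEE would at launch."""
    return hashlib.sha384(workload).digest()


def tee_attest(workload: bytes, key: bytes = VENDOR_KEY) -> AttestationReport:
    """Simulate the TEE measuring its contents and signing the result."""
    m = measure(workload)
    return AttestationReport(m, hmac.new(key, m, hashlib.sha384).digest())


def verify_report(report: AttestationReport, expected_workload: bytes,
                  key: bytes = VENDOR_KEY) -> bool:
    """Check both conditions: genuine signature and expected measurement."""
    expected_sig = hmac.new(key, report.measurement, hashlib.sha384).digest()
    signature_ok = hmac.compare_digest(report.signature, expected_sig)
    measurement_ok = hmac.compare_digest(report.measurement,
                                         measure(expected_workload))
    return signature_ok and measurement_ok


honest = tee_attest(b"workload v1")
tampered = tee_attest(b"workload v1 + exfiltration")
print(verify_report(honest, b"workload v1"))    # True
print(verify_report(tampered, b"workload v1"))  # False
```

Note that the tampered report carries a perfectly valid signature. It is only rejected because its measurement doesn’t match the workload the verifier expects, which is why both checks are needed.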

That’s the idea. In reality, especially when running in the cloud, you might still need to trust the cloud provider.

For example, the way SEV-SNP (a TEE for VM-type workloads) boots on Google Cloud Confidential VMs only allows for the TEE to measure the firmware. Measurement of the OS is done by a separate component, the Virtual Trusted Platform Module (vTPM), which is provided by Google’s hypervisor.

The hypervisor and vTPM aren’t part of the TEE, so they require trusting Google not to tamper with the vTPM measurements.

Whether this is an issue depends on your threat model. If you want to cut out the cloud provider you will likely need to go bare-metal.

Measured boot and read-only OS images

To reiterate: we just discussed how a TEE can produce a report that, once verified, proves the TEE began its life with specific code inside it.

For VM-style TEEs, this code includes an operating system image, and there is some method by which you can measure the boot process as it transitions between stages. These measurements are extended into specific append-only registers to create a chain of evidence. This process is referred to as “measured boot”.
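The “extend” operation is easier to see in code. The sketch below follows the way a TPM extends a SHA-256 register: the new value is the hash of the old value concatenated with the digest of the thing being measured. The stage names and the measure_boot helper are made up for illustration, and a real boot spreads its measurements over several registers.

```python
import hashlib

REGISTER_SIZE = 32  # bytes in a SHA-256 register


def extend(register: bytes, data: bytes) -> bytes:
    """Return the new register value: H(old_value || H(data))."""
    return hashlib.sha256(register + hashlib.sha256(data).digest()).digest()


def measure_boot(stages: list[bytes]) -> bytes:
    """Fold every boot stage into one register, starting from all zeroes."""
    register = bytes(REGISTER_SIZE)
    for stage in stages:
        register = extend(register, stage)
    return register


boot = [b"firmware", b"kernel", b"initrd", b"cmdline"]
good = measure_boot(boot)

# Same stages in the same order: same final value.
print(good == measure_boot(boot))  # True
# Swapping a single stage changes the final value.
print(good == measure_boot([b"firmware", b"evil kernel", b"initrd", b"cmdline"]))  # False
# So does changing the order of the stages.
print(good == measure_boot([b"kernel", b"firmware", b"initrd", b"cmdline"]))  # False
```

A register can only be extended, never set directly. That is what makes it append-only: to reach a given final value you have to replay exactly the same measurements in exactly the same order.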

Some TEEs can do measured boot directly, but in practice a Trusted Platform Module (TPM) is often used. The measured boot ecosystem is more or less built around the TPM, as that’s where it originated.

With measured boot, the TEE and/or TPM can provide attestation reports containing measurements of the firmware, initrd, kernel and kernel command line.

As discussed earlier, trusting a vTPM may require trusting the (cloud) provider.

These reports tell you exactly which kernel ran and how it was configured, but they stop after the kernel command line. Everything that happens next isn’t measured: not the root filesystem mounted by the kernel, nor any user binaries it executes.

You can show what kernel booted, but someone could still swap out the files on-disk.

Protecting the root filesystem with dm-verity closes this gap.

dm-verity operates on a read-only filesystem and verifies its integrity against a Merkle tree of block hashes. Verification happens on demand: each block is hashed as it is read and compared to the Merkle tree, so every read operation is protected.

The Merkle tree of hashes resolves to a single root hash.
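Here is a minimal sketch of how a tree of block hashes collapses into one root hash. It is deliberately simplified: a binary tree without salt, which is not the actual dm-verity on-disk format. The merkle_root function and BLOCK_SIZE value are assumptions chosen for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this example


def merkle_root(image: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Hash every block, then hash pairs of hashes until one root remains."""
    blocks = [image[i:i + block_size]
              for i in range(0, len(image), block_size)] or [b""]
    level = [hashlib.sha256(block).digest() for block in blocks]
    while len(level) > 1:
        paired = [hashlib.sha256(level[i] + level[i + 1]).digest()
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd count: carry the last hash up unchanged
            paired.append(level[-1])
        level = paired
    return level[0]


image = bytearray(BLOCK_SIZE * 4)
root = merkle_root(bytes(image))

image[BLOCK_SIZE * 2 + 7] ^= 0x01  # flip one bit in the third block
print(merkle_root(bytes(image)) == root)  # False
```

A single flipped bit anywhere in the image changes the root. The tree structure is what makes on-demand verification cheap: to check one block you only need the hashes along its path to the root, not the whole image.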

This root hash is precomputed at build time, included in the kernel command line and enforced by the kernel. If a block doesn’t match, the read fails instead of returning tampered data, so a root filesystem that differs from the build-time root hash can’t be used without being detected.

The kernel command line is a measurement included in the attestation reports, so we can externally identify the root filesystem that was used to boot the system.
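As a sketch of that last step, here is how a verifier might pull the root hash out of a command line and compare it to the expected value. This assumes the hash is passed as a roothash= parameter (the name systemd-veritysetup-generator reads); your setup may use a different parameter. It also assumes the command line string itself has already been checked against the measurement in the attestation report. The function names are made up for illustration.

```python
import hashlib
from typing import Optional


def root_hash_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the value of the first roothash= parameter, if any.

    Simplification: splits on whitespace and ignores quoting.
    """
    for param in cmdline.split():
        key, sep, value = param.partition("=")
        if key == "roothash" and sep and value:
            return value.lower()
    return None


def booted_expected_rootfs(measured_cmdline: str, expected_root_hash: str) -> bool:
    """True if the measured command line pins the expected root filesystem."""
    found = root_hash_from_cmdline(measured_cmdline)
    return found is not None and found == expected_root_hash.lower()


# Stand-in value; a real root hash comes out of the image build.
expected = hashlib.sha256(b"stand-in for a real dm-verity root hash").hexdigest()

print(booted_expected_rootfs(f"console=ttyS0 ro roothash={expected}", expected))  # True
print(booted_expected_rootfs("console=ttyS0 ro", expected))                       # False
```

A command line without a root hash is rejected too. Without it, nothing ties the measured boot to a specific root filesystem.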

Reproducible builds

Now that we have a way to identify the root filesystem, we still have to show that building our source code results in a specific root filesystem hash.

Welcome to software supply chain security and provenance!

Unfortunately, a verifier usually can’t check this directly at runtime. Even in the best case, that would require the verifier to have access to the source code and to rebuild the root filesystem for each hash it wants to verify.

Source code and hash are usually linked via a deferred claim. The verifier accepts a signed statement from the build system that contains such a claim and verifies that the signature matches a known key.

A signed statement like this usually also includes metadata related to the build, making it possible for a verifier to, for example, only allow builds from the main branch.

To make the claim independently verifiable, the build needs to be reproducible. This way anyone with access to the source code can verify that it results in the claimed dm-verity root hash.
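To get a feel for what “reproducible” means, here is a toy build step. The pack function is made up for this example and only concatenates files, but it shows two classic sources of non-determinism (file ordering and embedded timestamps) and what happens to the hash once they are pinned down.

```python
import hashlib
import time


def pack(files: dict[str, bytes], *, reproducible: bool) -> bytes:
    """Toy 'build': concatenate files into one image blob."""
    # Non-reproducible builds depend on iteration order and the clock.
    names = sorted(files) if reproducible else list(files)
    stamp = 0 if reproducible else time.time_ns()
    parts = [stamp.to_bytes(16, "big")]
    for name in names:
        parts.append(name.encode() + b"\0" + files[name] + b"\0")
    return b"".join(parts)


def digest(image: bytes) -> str:
    return hashlib.sha256(image).hexdigest()


# The same source, as two different machines might enumerate it.
builder = {"bin/app": b"\x7fELF", "etc/app.conf": b"port=8080"}
verifier = {"etc/app.conf": b"port=8080", "bin/app": b"\x7fELF"}

print(digest(pack(builder, reproducible=True))
      == digest(pack(verifier, reproducible=True)))   # True
print(digest(pack(builder, reproducible=False))
      == digest(pack(verifier, reproducible=False)))  # False
```

Only in the reproducible case can a third party rebuild from source and arrive at the hash that was claimed.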

Systems like NixOS and languages like Go and Rust really shine here: they make it relatively straightforward to set up reproducible builds.

There’s still the risk of malicious builds themselves. If an attacker can tamper with or trick the build infrastructure, they could create a valid build statement without anyone noticing.

To prevent such malicious build statements from staying hidden, it’s common practice to publish all build statements to an append-only transparency log, such as Rekor from the Sigstore project. A verifier can then require a signed proof from the transparency log that a particular build statement has been included.

This mechanism won’t prevent build-time attacks, but it will make them detectable.
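An inclusion proof is typically a short list of sibling hashes that lets you recompute the log’s Merkle root from a single entry. The sketch below uses the leaf and node prefixes from Certificate Transparency (RFC 6962), but it is simplified: it only handles perfectly balanced trees and leaves out the log’s signature over the root. All function names are made up for this example.

```python
import hashlib


def leaf_hash(entry: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + entry).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(entries: list[bytes]) -> list[list[bytes]]:
    """All tree levels, leaves first. len(entries) must be a power of two."""
    level = [leaf_hash(e) for e in entries]
    levels = [level]
    while len(level) > 1:
        level = [node_hash(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def inclusion_proof(levels: list[list[bytes]], index: int) -> list[bytes]:
    """Sibling hashes on the path from the entry at `index` up to the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof


def verify_inclusion(entry: bytes, index: int, proof: list[bytes],
                     root: bytes) -> bool:
    """Recompute the root from the entry and its proof, then compare."""
    h = leaf_hash(entry)
    for sibling in proof:
        h = node_hash(sibling, h) if index & 1 else node_hash(h, sibling)
        index //= 2
    return index == 0 and h == root


log = [b"build-1", b"build-2", b"build-3", b"build-4"]
levels = build_levels(log)
root = levels[-1][0]
proof = inclusion_proof(levels, 2)

print(verify_inclusion(b"build-3", 2, proof, root))        # True
print(verify_inclusion(b"build-3-evil", 2, proof, root))   # False
```

The verifier only needs the entry, the proof and a root it trusts. It never has to download the whole log.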

Verification steps

Now let’s say we want to verify a system that uses all the above. It runs inside a TEE, has a TPM and has its builds published to a transparency log.

How would a verifier actually verify this system?

The system produces evidence that consists of:

  • (build-time) BuildStatement: Contains a reference to the source code and expected boot measurements (including the dm-verity root hash).
  • (build-time) LogProof: Transparency log proof-of-inclusion, references BuildStatement.
  • (runtime) TPMReport: Contains boot measurements (including the dm-verity root hash).
  • (runtime) TEEReport: The TEE Attestation report containing initial TEE contents, references the TPMReport.

Various certificates (public keys) are required to verify the authenticity of evidence: BuildCert, TransparencyLogCert, TEEVendorCert and TPMVendorCert.

Diagram: how the different pieces of evidence and certificates are connected.

The verification process looks as follows:

  • Establish the provenance of the build:
    1. BuildStatement is signed by BuildCert.
    2. LogProof is signed by TransparencyLogCert.
    3. LogProof references the BuildStatement that was provided.
    4. BuildStatement metadata matches our expectation (for example, builds from the main branch only).
    5. We now know we have a genuine and traceable build.
  • Verify the VM is running on confidential computing hardware:
    1. TEEReport is signed by TEEVendorCert.
    2. TPMReport is signed by TPMVendorCert.
    3. TPMReport matches reference in TEEReport.
    4. TPMReport boot measurements are self-consistent.
    5. We now know we have a genuine TEE+TPM.
  • Verify the VM booted our build by checking that the boot measurements in TPMReport match the expected boot measurement in BuildStatement.
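The checks above can be sketched as one function. This is a rough model under several assumptions: the field names are invented, boot measurements are collapsed into a single register with an event log, and signature verification is passed in as a hypothetical signed_by(cert_name, evidence) callback so the example stays self-contained.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BuildStatement:
    source_ref: str
    branch: str
    expected_pcr: bytes  # expected boot measurement (covers the root hash)

    def digest(self) -> bytes:
        header = f"{self.source_ref}|{self.branch}|".encode()
        return hashlib.sha256(header + self.expected_pcr).digest()


@dataclass(frozen=True)
class LogProof:
    statement_digest: bytes  # which BuildStatement the log included


@dataclass(frozen=True)
class TPMReport:
    event_log: tuple[bytes, ...]  # digest of each measured boot stage
    pcr: bytes                    # final register value

    def digest(self) -> bytes:
        return hashlib.sha256(b"".join(self.event_log) + self.pcr).digest()


@dataclass(frozen=True)
class TEEReport:
    tpm_report_digest: bytes  # binds the TPMReport to this TEE


def replay(event_log: tuple[bytes, ...]) -> bytes:
    """Recompute the register value from the event log."""
    pcr = bytes(32)
    for stage_digest in event_log:
        pcr = hashlib.sha256(pcr + stage_digest).digest()
    return pcr


def verify(build: BuildStatement, log: LogProof, tpm: TPMReport,
           tee: TEEReport, signed_by: Callable[[str, object], bool]) -> list[str]:
    """Return the names of all failed checks (empty list means verified)."""
    checks = [
        ("BuildStatement signed by BuildCert", signed_by("BuildCert", build)),
        ("LogProof signed by TransparencyLogCert", signed_by("TransparencyLogCert", log)),
        ("LogProof references BuildStatement", log.statement_digest == build.digest()),
        ("BuildStatement metadata acceptable", build.branch == "main"),
        ("TEEReport signed by TEEVendorCert", signed_by("TEEVendorCert", tee)),
        ("TPMReport signed by TPMVendorCert", signed_by("TPMVendorCert", tpm)),
        ("TPMReport matches reference in TEEReport", tee.tpm_report_digest == tpm.digest()),
        ("TPMReport measurements self-consistent", replay(tpm.event_log) == tpm.pcr),
        ("Boot measurements match BuildStatement", tpm.pcr == build.expected_pcr),
    ]
    return [name for name, ok in checks if not ok]


def boot(stages: list[bytes]) -> tuple[TPMReport, TEEReport]:
    """Simulate a VM booting the given stages and producing its reports."""
    events = tuple(hashlib.sha256(s).digest() for s in stages)
    tpm = TPMReport(events, replay(events))
    return tpm, TEEReport(tpm.digest())


accept_all = lambda cert, evidence: True  # stand-in for real signature checks

tpm, tee = boot([b"firmware", b"kernel", b"initrd", b"cmdline"])
build = BuildStatement("example-commit", "main", tpm.pcr)
log = LogProof(build.digest())
print(verify(build, log, tpm, tee, accept_all))  # []

other_tpm, other_tee = boot([b"firmware", b"other kernel", b"initrd", b"cmdline"])
print(verify(build, log, other_tpm, other_tee, accept_all))
# ['Boot measurements match BuildStatement']
```

Returning the list of failed checks instead of a single boolean makes it easier to see which link in the chain of evidence broke.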

If these checks pass, we know exactly what build the VM booted with, and since our OS is effectively read-only, we can assume it’s in an expected state.

Now, vulnerabilities in a particular build might invalidate that assumption. But this then becomes a matter of locking down the OS image. It also gives us a structured way of excluding known-bad builds at the verifier level.

It’s also possible to keep extending registers in the TPM beyond the boot measurements, but I didn’t want to make this post more complicated than it already was :)

Conclusion

I hope this post has given you an idea of how Confidential Computing can be used to, with certain assumptions, prove what code is running on a remote system.

Most importantly I hope it has given you a somewhat useful mental model. A lot of the existing material either dives off the deep end, or is so marketing-oriented that it’s useless from a technical perspective.

I’m currently developing a system that uses the approach in this post at its base layer. The system will be extended to run workloads and ensure that they follow specific compliance/tech related policies. See my previous post for the general idea.

If you’re working for an organization in Digital Health and interested in trying this, or just want to compare notes on attestation, I’d love to hear from you.
