Operational Runbooks

Step-by-step procedures for deploying, verifying, and rolling out KMS and node images

3 runbooks · 6 decision gates · 3 profiles · 2 platforms

Phala KMS Runbook

Deploy and verify mero-kms-phala on Phala Cloud CVMs. Covers release verification, digest-pinned deployment, merod attestation configuration, and runtime health checks.

Prerequisites
  • GitHub release access to calimero-network/mero-tee
  • Phala Cloud account with CVM provisioning permissions
  • CLI tools: jq, curl, sha256sum, docker
  • Local clone of the mero-tee repository with verify scripts
  • Target KMS version tag (e.g. v0.3.0)
  • Target profile: debug, debug-read-only, or locked-read-only
Step 1 — Verify Release Assets

Before deploying anything, verify the integrity and completeness of the release.

1.1

Run the verification script

# From the mero-tee repo root
./scripts/release/verify-release-assets.sh v0.3.0

This script downloads the release artifacts, checks SHA-256 checksums, verifies cosign signatures, validates the SBOM, and confirms that the attestation policy JSON is well-formed.
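
When the script reports a failure, it can help to repeat the checksum part by hand to narrow down which artifact is affected. The following is a minimal sketch, not the repository's implementation: it assumes the release assets were downloaded into a single directory alongside checksums.txt, and that checksums.txt uses the standard sha256sum line format. The function name verify_checksums is hypothetical. Signature and SBOM checks are not covered.

```shell
#!/bin/sh
# Hypothetical helper (not part of the mero-tee scripts): re-check the
# SHA-256 hashes of downloaded release assets against checksums.txt.
# Assumes checksums.txt lives in the same directory as the assets.
verify_checksums() (
  dir=$1
  cd "$dir" || exit 1
  # --check reads "HASH  FILENAME" lines; --quiet prints only failures.
  sha256sum --check --quiet checksums.txt
)
```

Example: `verify_checksums ./release-assets && echo "checksums OK"`. The function runs in a subshell, so the caller's working directory is unchanged.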

1.2

Confirm expected artifacts

The release must include:

  • kms-phala-attestation-policy.{profile}.json — attestation policy for the target profile
  • checksums.txt — SHA-256 hashes of all artifacts
  • checksums.txt.sig — cosign signature over checksums
  • sbom.spdx.json — software bill of materials
  • Docker image digest in the release notes
1.3

Record the image digest

Extract the digest-pinned image reference from the release notes. You will use this in the next step. Format: ghcr.io/calimero-network/mero-kms-phala@sha256:abc123...
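
A quick format check on the recorded reference catches truncated digests and tag-only references before they reach a Compose file. This is a sketch under the assumption that the reference follows the `name@sha256:<64 hex characters>` form shown above; the function name is_digest_pinned is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: accept only image references pinned by a full
# SHA-256 digest (name@sha256:<64 lowercase hex chars>).
# Tag-only references such as name:v0.3.0 are rejected.
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '^[^@[:space:]]+@sha256:[0-9a-f]{64}$'
}
```

Example: `is_digest_pinned "$IMAGE_REF" || echo "not digest-pinned: $IMAGE_REF"`. The check validates the format only; it does not confirm that the digest matches the one in the release notes.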

Step 2 — Deploy Digest-Pinned Image

Deploy using the exact digest from the verified release. Never use mutable tags in production.

2.1

Docker Compose configuration

# docker-compose.yml for Phala Cloud CVM
version: "3.8"
services:
  kms:
    image: ghcr.io/calimero-network/mero-kms-phala@sha256:<DIGEST>
    environment:
      - MERO_KMS_VERSION=v0.3.0
      - MERO_KMS_PROFILE=locked-read-only
      - DSTACK_SOCKET_PATH=/var/run/dstack.sock
      - ENFORCE_MEASUREMENT_POLICY=true
    volumes:
      - /var/run/dstack.sock:/var/run/dstack.sock
    ports:
      - "8080:8080"
2.2

Required environment variables

MERO_KMS_VERSION
Must match the release tag. Used to fetch the correct attestation policy from GitHub.
MERO_KMS_PROFILE
Must match the profile of the nodes this KMS instance serves. Determines which policy variant is loaded.
DSTACK_SOCKET_PATH
Path to the dstack Unix domain socket. Must be volume-mounted from the CVM host.
ENFORCE_MEASUREMENT_POLICY
Must be true in production. When false, attestation quotes are not validated.
MERO_KMS_POLICY_SHA256
Optional but recommended. Pin the expected SHA-256 of the policy file to prevent supply-chain attacks.
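
One way to obtain a value for MERO_KMS_POLICY_SHA256 is to hash the policy file from the release you have already verified. The sketch below assumes the variable expects the lowercase hex SHA-256 of the policy file's bytes; that encoding is an assumption, so confirm it against the KMS documentation. The function name policy_sha256 is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the bare SHA-256 hex digest of a policy file.
# Assumption: MERO_KMS_POLICY_SHA256 takes the lowercase hex digest of the
# file contents, with no filename or prefix.
policy_sha256() {
  # sha256sum prints "HASH  FILENAME"; keep only the hash field.
  sha256sum "$1" | cut -d' ' -f1
}
```

Example: `policy_sha256 kms-phala-attestation-policy.locked-read-only.json`, then copy the output into the Compose file's environment section.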
2.3

Deploy to Phala Cloud

Upload the Compose file to the Phala Cloud CVM dashboard or deploy via CLI. Wait for the container to reach Running state.

Step 3 — Configure merod Attestation

After KMS is running, configure merod nodes to use this KMS for attestation and key release.

3.1

Generate merod KMS configuration

# Generate config pointing merod at the new KMS endpoint
./scripts/generate-merod-kms-config.sh \
  --kms-url https://<kms-host>:8080 \
  --profile locked-read-only \
  --output merod-kms.json
3.2

Apply configuration to merod

# Apply the KMS config to a running merod node
./scripts/apply-merod-kms-config.sh \
  --config merod-kms.json \
  --node <node-id>
3.3

Profile compatibility

The KMS enforces profile-scoped policy. A KMS instance configured with MERO_KMS_PROFILE=locked-read-only will only release keys to nodes whose RTMR3 measurement matches the locked-read-only profile. This ensures cohort separation: debug nodes cannot obtain keys from a production KMS.

Step 4 — Runtime Checks

Verify the deployed KMS is healthy and producing valid attestation quotes.

4.1

Health check

curl -s https://<kms-host>:8080/health | jq .
# Expected: {"status":"ok","version":"v0.3.0","profile":"locked-read-only"}
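
In automation, comparing the response fields is more reliable than reading the output by eye. A minimal sketch, assuming the response body has exactly the status, version, and profile fields shown in the expected output above; the function name health_matches is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a /health JSON body reports
# status "ok" together with the expected version and profile.
# $1 = JSON body, $2 = expected version, $3 = expected profile
health_matches() {
  printf '%s' "$1" | jq -e --arg v "$2" --arg p "$3" \
    '.status == "ok" and .version == $v and .profile == $p' >/dev/null
}
```

Example: `health_matches "$(curl -s https://<kms-host>:8080/health)" v0.3.0 locked-read-only`. With `-e`, jq exits non-zero when the expression evaluates to false, so the function can be used directly in an `if`.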
4.2

Attestation endpoint

curl -s -X POST https://<kms-host>:8080/attest | jq .
# Returns a TDX quote from the KMS CVM for mutual verification

Verify the returned quote against expected measurements. The quote should contain MRTD and RTMR values matching the published policy for this version.

4.3

Quote verification

# Verify the attestation quote using Intel Trust Authority
./scripts/verify-kms-quote.sh --url https://<kms-host>:8080

The script calls POST /attest, extracts the TDX quote, and submits it to Intel Trust Authority (ITA) for appraisal. A successful result confirms the KMS is running genuine TEE code.

Common Mistakes
Wrong profile

Setting MERO_KMS_PROFILE to a profile that doesn’t match the node cohort. Nodes will fail key release with a policy mismatch error. Always confirm the profile matches between KMS and the nodes it serves.

Stale policy

Deploying a new KMS binary but using an old MERO_KMS_VERSION tag. The fetched policy won’t include measurements for the new binary. Always bump the version tag alongside the image.

Missing environment variables

Omitting MERO_KMS_VERSION or MERO_KMS_PROFILE causes the KMS to fail at startup. These are required variables with no defaults.
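
A preflight check run before deployment can catch missing variables and mistyped profile names earlier than a failed container start. This is a sketch, not part of the KMS itself: the function names missing_env and is_known_profile are hypothetical, and the profile list is taken from the three profiles named in this runbook.

```shell
#!/bin/sh
# Hypothetical preflight: print the name of every listed variable that is
# unset or empty, and exit non-zero if any are missing.
missing_env() (
  rc=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
      rc=1
    fi
  done
  exit "$rc"
)

# Accept only the three profiles named in this runbook.
is_known_profile() {
  case $1 in
    debug|debug-read-only|locked-read-only) return 0 ;;
    *) return 1 ;;
  esac
}
```

Example: `missing_env MERO_KMS_VERSION MERO_KMS_PROFILE && is_known_profile "$MERO_KMS_PROFILE"`. Variable names passed to missing_env must be plain identifiers, since they are expanded with eval.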

Mutable image tag

Using :latest or :v0.3.0 instead of a digest-pinned reference. Tags can be overwritten; digests are immutable. Always use @sha256:....

dstack socket not mounted

Forgetting to volume-mount /var/run/dstack.sock into the container. Without it, key derivation and quote generation fail with a connection error.

GCP Node Image Runbook

Deploy and verify Calimero node images on GCP TDX Confidential VMs. Covers release verification, image pinning, runtime measurement validation, and MDMA comparison.

What Is Released

Each mero-tee-vX.Y.Z release includes the following artifacts:

published-mrtds.json
Expected MRTD and RTMR values for each image profile. Used to verify runtime measurements of deployed VMs.
checksums.txt
SHA-256 hashes of all release artifacts including the MRTD file, provenance records, and SBOM.
checksums.txt.sig
Cosign signature over the checksums file. Verifies artifact integrity and publisher identity.
sbom.spdx.json
Software Bill of Materials listing all dependencies baked into the image.
provenance.intoto.jsonl
SLSA provenance attestation recording the build inputs and environment.
GCP image family/version
The image family name and version number published to the GCP project. Referenced in instance templates.
Step 1 — Verify Release Assets
1.1

Run the verification script

./scripts/release/verify-release-assets.sh mero-tee-v0.3.0

The script validates checksums, signatures, and SBOM structure, and confirms that published-mrtds.json is well-formed and contains entries for all three profiles.

1.2

Inspect the MRTD file

jq . published-mrtds.json
# Expect: per-profile MRTD + RTMR0-3 values

Each profile entry contains the expected mrtd, rtmr0, rtmr1, rtmr2, and rtmr3 hex values. These are the measurements your deployed VMs must match.

Step 2 — Deploy Pinned Image
2.1

GCP instance with image family/version pin

gcloud compute instances create merod-prod-01 \
  --zone=us-central1-a \
  --machine-type=c3-standard-4 \
  --maintenance-policy=TERMINATE \
  --confidential-compute-type=TDX \
  --image-family=mero-tee-locked-read-only \
  --image-project=calimero-prod \
  --metadata=MERO_TEE_VERSION=v0.3.0

For maximum reproducibility, pin to a specific image version instead of family:

--image=mero-tee-locked-read-only-v0-3-0 \
--image-project=calimero-prod
2.2

Baked merod binary

The GCP image includes the merod binary baked at build time by the Packer + Ansible pipeline. The binary version is fixed and cannot be changed without rebuilding the image, so every instance created from a given image version runs the same binary, and any change to it produces a new image with new published measurements.

2.3

MERO_TEE_VERSION metadata

The MERO_TEE_VERSION GCP instance metadata key is optional but recommended. It allows operators and automation to identify which release an instance was created from without inspecting the image itself. It has no effect on measurements or attestation.

Step 3 — Verify Runtime Measurements
3.1

Extract runtime measurements

# SSH into the instance and extract TDX measurements
gcloud compute ssh merod-prod-01 -- \
  "cat /sys/kernel/security/tdx/measurements"
3.2

Compare against published-mrtds.json

# Compare runtime values with expected values
./scripts/verify-runtime-mrtds.sh \
  --instance merod-prod-01 \
  --expected published-mrtds.json \
  --profile locked-read-only

The script compares MRTD and all four RTMR registers. Any mismatch indicates the VM is not running the expected code.
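
The core of such a comparison is a per-register equality check on normalised hex strings. The sketch below illustrates that idea only; it is not the implementation of verify-runtime-mrtds.sh. It assumes measurements are plain hex strings that may differ in letter case or carry a 0x prefix. The function names normalize_hex and measurement_matches are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: drop an optional 0x prefix and lowercase the hex
# digits so that equivalent values compare equal as strings.
normalize_hex() {
  printf '%s' "$1" | sed 's/^0[xX]//' | tr 'A-F' 'a-f'
}

# Hypothetical helper: compare one register's expected and observed values.
# $1 = register name (e.g. rtmr3), $2 = expected, $3 = observed
measurement_matches() {
  if [ "$(normalize_hex "$2")" = "$(normalize_hex "$3")" ]; then
    return 0
  fi
  printf 'MISMATCH %s: expected %s, got %s\n' "$1" "$2" "$3" >&2
  return 1
}
```

A caller would invoke measurement_matches once for each of mrtd, rtmr0, rtmr1, rtmr2, and rtmr3, and treat any non-zero result as a failed verification.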

3.3

MDMA vs release probe comparison

The MDMA (Measurement Data Management Agent) collects measurements from running instances. Compare MDMA-reported values against the release probe values stored in published-mrtds.json. Discrepancies indicate either a different image or unexpected runtime modification.

# Fetch MDMA measurements and compare
./scripts/compare-mdma-vs-release.sh \
  --instance merod-prod-01 \
  --release mero-tee-v0.3.0
3.4

Interaction with core tee-mode

The calimero/core merod binary reads the --tee-mode flag at startup. When set, merod performs a KMS challenge-response to obtain its storage encryption key. The node image profile must match the KMS profile for the key release to succeed. The core tee-mode is configured in the calimero/core repository and is independent of the image build.

Common Mistakes
Wrong image family

Deploying with --image-family=mero-tee-debug when you intended mero-tee-locked-read-only. The RTMR3 measurement will differ, and the production KMS will reject the node.

Profile mismatch

The node image profile and the KMS MERO_KMS_PROFILE must agree. A debug node connecting to a locked-read-only KMS will fail key release with a policy mismatch.

Stale MRTDs

Verifying against a published-mrtds.json from an older release. Always download the MRTD file corresponding to the exact image version you deployed.

Image family resolution drift

Image families resolve to the latest version. If a new image is published between planning and deployment, the resolved image may differ. Use explicit --image= for reproducible deployments.
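
To see which concrete image a family points at right now, and to record that name for use with --image=, the family can be resolved explicitly. A minimal sketch using the `gcloud compute images describe-from-family` command; the wrapper name resolve_image_family is hypothetical, and the family and project names in the example are the ones used earlier in this runbook.

```shell
#!/bin/sh
# Sketch: print the name of the image that a family currently resolves to.
# $1 = image family, $2 = project hosting the image
resolve_image_family() {
  gcloud compute images describe-from-family "$1" \
    --project="$2" \
    --format='value(name)'
}
```

Example: `resolve_image_family mero-tee-locked-read-only calimero-prod`. Recording the output at planning time and passing it to --image= at deployment time removes the window in which the family could move.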

Forgetting confidential-compute-type

Omitting --confidential-compute-type=TDX creates a standard VM without TDX. The node will not produce valid attestation quotes and will fail KMS challenges.

Blue-Green KMS Rollout

Safe, gated rollout of a new KMS version paired with updated node images. Six decision gates check correctness at each stage, and each gate has its own rollback branch.

Scope & Inputs

This runbook applies when rolling out a new KMS version that may also require new node images with updated measurements.

Required Inputs

New KMS tag
The mero-tee release tag for the new KMS version (e.g. v0.4.0).
New node-image tag
The mero-tee release tag for the new node image, if changed. May be the same as the KMS tag for coupled releases.
Target profile
The profile being rolled out: debug, debug-read-only, or locked-read-only.
Decision Tree (6 Gates)

Each gate is a go/no-go checkpoint. Failing a gate triggers the associated rollback procedure.

D1

Verify KMS Release Assets

Run verify-release-assets.sh on the new KMS tag. Confirm checksums, signatures, SBOM, and policy files are present and valid.

✓ Pass → proceed to D2
✗ Fail → abort rollout, report to release team
D2

Verify Node Image Release Assets

Run verify-release-assets.sh on the node-image tag. Confirm published-mrtds.json, checksums, and provenance are valid.

✓ Pass → proceed to D3
✗ Fail → abort rollout, report to release team
D3

Deploy Staging KMS

Deploy the new KMS image to the staging environment using the digest-pinned reference. Configure with MERO_KMS_VERSION set to the new tag and the target profile.

✓ Pass (healthy + correct version) → proceed to D4
✗ Fail → tear down staging KMS, investigate logs
D4

Probe Staging KMS

Run the staging probe workflow (kms-phala-staging-probe.yaml). Verify attestation quotes, extract measurements, and compare against the expected policy. Confirm a staging node can complete the full challenge → get-key flow.

✓ Pass (probe green) → proceed to D5
✗ Fail → tear down staging KMS, analyze probe output, check measurement candidates
D5

Promote Policy

Open a PR promoting the staging probe’s measurement candidates to the release policy files. Require two approvals from the security team. Merge only after review.

✓ Pass (PR merged) → proceed to D6
✗ Fail (review rejected) → investigate measurement discrepancies, re-probe if needed
D6

Deploy Production

Deploy the new KMS to production with the merged policy. Perform a production probe: health check, attestation verification, and a canary key-release from a production node. Monitor for 30 minutes.

✓ Pass (production stable) → rollout complete
✗ Fail → execute production rollback (below)
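
The go/no-go structure above can be expressed as a small driver that runs gates in order and stops at the first failure. This is an illustrative sketch only: the function name run_gates is hypothetical, each gate is assumed to be a shell function or command that exits non-zero on failure, and the rollback itself is left to the operator.

```shell
#!/bin/sh
# Hypothetical gate runner: each argument names a function or command that
# implements one gate. Gates run in order; the first failure stops the run
# and prints the failing gate's name on stdout so the caller can select the
# matching rollback procedure.
run_gates() {
  for gate in "$@"; do
    printf 'gate %s: running\n' "$gate" >&2
    if ! "$gate"; then
      printf 'gate %s: FAILED, stop and roll back\n' "$gate" >&2
      printf '%s\n' "$gate"
      return 1
    fi
  done
  return 0
}
```

Example: `failed=$(run_gates d1 d2 d3 d4 d5 d6) || echo "roll back from $failed"`, where d1 through d6 are functions you define to wrap the per-gate steps. D5 is a human review gate, so its function would typically only check that the policy PR has been merged.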
Per-Gate Runbook Steps

D1 & D2 — Asset Verification

1

Download release artifacts

Use gh release download to fetch all assets for both the KMS and node-image tags into separate directories.

2

Verify checksums and signatures

Run verify-release-assets.sh in each directory. The script exits non-zero on any failure.

3

Spot-check policy contents

Open the attestation policy JSON and confirm the target profile entry exists with non-empty MRTD and RTMR values.

D3 — Staging Deploy

4

Deploy staging Compose

Apply the Docker Compose file from the Phala KMS runbook, pointed at the staging CVM. Use the new digest-pinned image.

5

Confirm health

Poll GET /health until it returns the expected version and profile. Timeout after 120 seconds.
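
The polling step can be sketched as a generic retry loop with a timeout. This is a minimal illustration, not a repository script: the function name poll_until is hypothetical, the timeout counts only the time spent sleeping between attempts, and the delay must be a positive whole number of seconds.

```shell
#!/bin/sh
# Hypothetical helper: retry a command until it succeeds or the timeout
# is reached. Elapsed time is approximated by summing the sleep intervals.
# $1 = timeout in seconds, $2 = delay between attempts (>= 1), rest = command
poll_until() {
  timeout_s=$1
  delay_s=$2
  shift 2
  waited=0
  while ! "$@"; do
    if [ "$waited" -ge "$timeout_s" ]; then
      return 1
    fi
    sleep "$delay_s"
    waited=$((waited + delay_s))
  done
  return 0
}
```

Example: `poll_until 120 5 curl -fsS -o /dev/null https://<kms-host>:8080/health`. This confirms only that the endpoint answers successfully; checking the version and profile fields in the response body is a separate step.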

D4 — Staging Probe

6

Trigger probe workflow

Run kms-phala-staging-probe.yaml via GitHub Actions or manually. The probe deploys a test node, performs a full challenge → get-key cycle, and extracts measurement candidates.

7

Validate probe output

Confirm the probe outputs match the published MRTDs. Any discrepancy blocks progression.

D5 — Policy Promotion

8

Create policy PR

The probe can auto-generate a PR updating kms-phala-attestation-policy.{profile}.json. If manual, copy measurement candidates into the policy file and open the PR.

9

Review and merge

Two security-team approvals required. Verify the diff contains only expected measurement changes.

D6 — Production Deploy

10

Deploy to production

Update the production Compose file with the new digest and version, then perform a rolling restart of the KMS container.

11

Canary key-release

Verify at least one production node can complete a challenge → get-key cycle with the new KMS.

12

Monitor (30 min soak)

Watch health endpoint, error rates, and key-release latency. No elevated errors for 30 minutes confirms stable rollout.

Rollback Procedures

Each gate has a defined rollback path. Rollbacks are safe because gates D1 through D5 never touch the production KMS, which keeps running the old version and policy until D6.

D1/D2 Rollback: No deployment has occurred. Simply abort the rollout and investigate the release artifacts.

D3 Rollback: Tear down the staging KMS container. No production impact.

D4 Rollback: Tear down the staging KMS. Discard measurement candidates. Re-run the probe after fixing the issue.

D5 Rollback: Close the policy PR without merging. The existing production policy remains unchanged.

D6 Rollback: Revert the production Compose file to the previous digest-pinned image. Restart the KMS container. Revert the policy file PR if it was the cause. Verify production nodes can still obtain keys.

Guardrails
  • Never skip verification: D1 and D2 must pass before any deployment. Skipping them risks deploying tampered artifacts.
  • Always probe staging first: D4 catches measurement mismatches, configuration errors, and policy incompatibilities in a safe environment.
  • Policy promotion requires review: D5 is a human gate. Automated promotion is blocked; the PR requires manual security review.
  • Production is last: D6 only executes after all other gates pass. Production KMS stays on the old version until the final step.
  • Rollback before retry: If a gate fails after partial deployment, complete the rollback before re-attempting.
  • Version coupling: If the KMS and node image are released together, both must pass verification (D1 + D2) before either is deployed.