Storage Layer

calimero-store + calimero-store-rocksdb + calimero-dag + calimero-storage

  • 11 column families
  • 26+ group key prefixes
  • 4 crates
  • AES-GCM optional encryption

Purpose

The storage layer provides a column-family key-value abstraction over RocksDB. The core Database trait exposes has, get, put, delete, iter, and apply(Transaction). Keys are typed via generic_array, giving compile-time guarantees on key size and column assignment. An optional AES-GCM encryption layer transparently encrypts values at rest.

pub trait Database {
    fn has(&self, col: Column, key: &[u8]) -> Result<bool>;
    fn get(&self, col: Column, key: &[u8]) -> Result<Option<Slice>>;
    fn put(&self, col: Column, key: &[u8], value: &[u8]) -> Result<()>;
    fn delete(&self, col: Column, key: &[u8]) -> Result<()>;
    fn iter(&self, col: Column) -> Result<DBIterator>;
    fn apply(&self, tx: Transaction) -> Result<()>;
}

Column Architecture

All persistent data is partitioned into 11 column families. Each maps to a dedicated RocksDB column family with independent compaction and bloom filters. The Group column is the most complex, containing 26+ logical key prefixes for governance state including namespace identity, namespace governance ops, group hierarchy, and group encryption keys.

Column Families (10 general-purpose, plus the Group column described below)

  • Meta: schema version, node id
  • Config: node configuration
  • Identity: keypairs, public keys
  • State: shared context state KV
  • PrivateState: per-node private KV
  • Delta: causal deltas (DAG)
  • Blobs: binary content store
  • Application: WASM binaries, metadata
  • Alias: human-readable aliases
  • Generic: uncategorized data

Group Column: Prefix-Partitioned Keys

Each prefix isolates a logical namespace within the single Group column family. Diagram legend: Membership, Resources, OpLog / Hashing, Lifecycle.

  • group::info: group metadata
  • group::member: members
  • group::role: role assignments
  • group::cap: capabilities
  • group::alias: member aliases
  • group::context: linked contexts
  • group::invite: invitations
  • group::signing_key: keys
  • group::parent_ref: hierarchy
  • group::child_index: tree
  • group::oplog: operation log
  • group::oplog_head: DAG heads
  • group::state_hash: root hash
  • group::nonce: monotonic
  • group::settings: config
  • group::upgrade: app upgrades
  • group::epoch: governance epoch
  • group::sync_state: sync
  • group::pending_op: queued
  • group::applied_op: applied

Informal group::… labels in the diagram map to the typed keys and single-byte prefixes in Storage Schema (Group column):

  • group::info → GroupMeta (0x20)
  • group::member → GroupMember (0x21)
  • group::role → role data on GroupMember values (0x21)
  • group::context → GroupContextIndex (0x22) / reverse index ContextGroupRef (0x23)
  • group::upgrade → GroupUpgradeKey (0x24)
  • group::signing_key → GroupSigningKey (0x25)
  • group::cap → GroupMemberCapability (0x26)
  • group::settings (defaults) → GroupDefaultCaps (0x29)
  • migration markers → GroupContextLastMigration (0x2B)
  • group::nonce → GroupLocalGovNonce (0x2C)
  • group::alias (member) → GroupMemberAlias (0x2D); group/context names → GroupAlias (0x2E), GroupContextAlias (0x2F)
  • group::oplog → GroupOpLog (0x30)
  • group::oplog_head → GroupOpHead (0x31)
  • member–context links → GroupMemberContext (0x32), GroupContextMemberCap (0x33)

Some cells (e.g. group::invite, group::state_hash) are illustrative and do not match a single prefix. Use the schema tables as the source of truth.
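To illustrate how a single-byte prefix partitions one column family, here is a minimal sketch. It assumes a sorted in-memory map as a stand-in for the Group column; `group_key` and `scan_prefix` are hypothetical helpers, and only three of the prefixes listed above are modelled.

```rust
use std::collections::BTreeMap;

/// A subset of the single-byte prefixes quoted in the mapping above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
pub enum GroupPrefix {
    GroupMeta = 0x20,
    GroupMember = 0x21,
    GroupContextIndex = 0x22,
}

/// Hypothetical helper: prepend the prefix byte to the key body.
pub fn group_key(prefix: GroupPrefix, body: &[u8]) -> Vec<u8> {
    let mut key = Vec::with_capacity(1 + body.len());
    key.push(prefix as u8);
    key.extend_from_slice(body);
    key
}

/// Hypothetical helper: iterate every entry stored under one prefix.
/// Sorted keys keep all entries of a prefix contiguous, so the scan
/// starts at the prefix byte and stops at the first foreign key.
pub fn scan_prefix<'a>(
    column: &'a BTreeMap<Vec<u8>, Vec<u8>>,
    prefix: GroupPrefix,
) -> impl Iterator<Item = (&'a Vec<u8>, &'a Vec<u8>)> + 'a {
    let start = vec![prefix as u8];
    column
        .range(start..)
        .take_while(move |(k, _)| k.first() == Some(&(prefix as u8)))
}
```

Because every key under a prefix shares its first byte, a range scan never touches entries belonging to another logical namespace.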

Key Model

All keys are statically typed via Key<T>, a newtype over GenericArray<u8, T::Size>. The AsKeyParts / FromKeyParts traits define how a key is decomposed into column + byte components, giving compile-time column assignment.

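use generic_array::{ArrayLength, GenericArray};

// Fixed-size key bytes; the length comes from the key type's associated Size.
pub struct Key<T: AsKeyParts>(GenericArray<u8, T::Size>);

// Describes how a typed key maps to a column and to its raw bytes.
pub trait AsKeyParts: Sized {
    type Size: ArrayLength<u8>;
    const COLUMN: Column;
    fn as_key(&self) -> Key<Self>;
}

// Reconstructs the typed key from its raw bytes.
pub trait FromKeyParts: AsKeyParts {
    fn from_key(key: Key<Self>) -> Self;
}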
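The following is a minimal sketch of the same idea that compiles with the standard library only. It assumes const generics in place of generic_array, and the 64-byte `DeltaKey` layout (32-byte context id followed by 32-byte delta id) is a hypothetical choice for illustration, not the crate's actual encoding.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Column {
    Identity,
    Delta,
}

/// Fixed-size key bytes; N plays the role of the associated Size type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Key<const N: usize>(pub [u8; N]);

pub trait AsKeyParts<const N: usize> {
    const COLUMN: Column;
    fn as_key(&self) -> Key<N>;
}

pub trait FromKeyParts<const N: usize>: AsKeyParts<N> + Sized {
    fn from_key(key: Key<N>) -> Self;
}

/// Hypothetical layout: context id (32 bytes) then delta id (32 bytes).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DeltaKey {
    pub context_id: [u8; 32],
    pub delta_id: [u8; 32],
}

impl AsKeyParts<64> for DeltaKey {
    const COLUMN: Column = Column::Delta;

    fn as_key(&self) -> Key<64> {
        let mut bytes = [0u8; 64];
        bytes[..32].copy_from_slice(&self.context_id);
        bytes[32..].copy_from_slice(&self.delta_id);
        Key(bytes)
    }
}

impl FromKeyParts<64> for DeltaKey {
    fn from_key(key: Key<64>) -> Self {
        let mut context_id = [0u8; 32];
        let mut delta_id = [0u8; 32];
        context_id.copy_from_slice(&key.0[..32]);
        delta_id.copy_from_slice(&key.0[32..]);
        DeltaKey { context_id, delta_id }
    }
}
```

A key of the wrong length for a given type does not type-check, and the column is fixed by the associated constant rather than passed at each call site.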

Key Types by Column

  • Meta: SchemaVersionKey, NodeIdKey (fixed singleton keys)
  • Config: NodeConfigKey (serialized node configuration)
  • Identity: ContextIdentityKey(ContextId) (maps context → keypair)
  • State: ContextStateKey(ContextId, [u8]) (shared KV scoped to context)
  • PrivateState: PrivateStateKey(ContextId, [u8]) (private KV, never synced)
  • Delta: DeltaKey(ContextId, DeltaId) (stores CausalDelta payload)
  • Blobs: BlobKey(BlobId) (content-addressed binary data)
  • Application: ApplicationKey(ApplicationId), ApplicationBlobKey (WASM binary + metadata)
  • Alias: AliasKey(scope, name) (human-readable name → id mapping)
  • Group: GroupInfoKey, GroupMemberKey, GroupRoleKey, GroupCapKey, GroupOpLogKey, etc. (prefix-partitioned, see above)

calimero-dag

A generic, in-memory causal DAG used for both context state deltas and governance operation logs. Provides topological ordering, pending queues for out-of-order delivery, and missing-parent detection for catch-up.

crates/dag

CausalDelta<T>

pub struct CausalDelta<T> {
    id: DeltaId,
    parents: Vec<DeltaId>,
    payload: T,
    timestamp: HLC,
    expected_root_hash: Hash,
}

Each delta records its causal parents, forming a partial order. The expected_root_hash enables fast consistency checks — a peer can verify it arrived at the same state after applying a delta.

DagStore<T>

pub struct DagStore<T> {
    deltas: HashMap<DeltaId, CausalDelta<T>>,
    applied: HashSet<DeltaId>,
    pending: Vec<CausalDelta<T>>,
    heads: HashSet<DeltaId>,
}

Tracks all known deltas, which of them have been applied, which are pending (missing parents), and the current DAG head set. The head set is maintained automatically: when a delta is applied it becomes a head and its parents stop being heads.
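The bookkeeping described above can be sketched as follows. This is a simplified model, not the crate's implementation: `DeltaId` is a plain `u64`, the timestamp and root hash fields are omitted, the payload is never executed, and `add_delta`, `heads`, and `pending_len` are hypothetical method names.

```rust
use std::collections::{HashMap, HashSet};

pub type DeltaId = u64; // stand-in for the real id type

#[derive(Clone, Debug)]
pub struct CausalDelta<T> {
    pub id: DeltaId,
    pub parents: Vec<DeltaId>,
    pub payload: T,
}

pub struct DagStore<T> {
    deltas: HashMap<DeltaId, CausalDelta<T>>,
    applied: HashSet<DeltaId>,
    pending: Vec<CausalDelta<T>>,
    heads: HashSet<DeltaId>,
}

impl<T> DagStore<T> {
    pub fn new() -> Self {
        Self {
            deltas: HashMap::new(),
            applied: HashSet::new(),
            pending: Vec::new(),
            heads: HashSet::new(),
        }
    }

    /// Queue the delta, then apply everything whose parents are all applied.
    /// Returns the ids applied by this call, in application order.
    pub fn add_delta(&mut self, delta: CausalDelta<T>) -> Vec<DeltaId> {
        let known = self.applied.contains(&delta.id)
            || self.pending.iter().any(|d| d.id == delta.id);
        if known {
            return Vec::new(); // duplicates are ignored
        }
        self.pending.push(delta);

        let mut applied_now = Vec::new();
        loop {
            let ready = self
                .pending
                .iter()
                .position(|d| d.parents.iter().all(|p| self.applied.contains(p)));
            let idx = match ready {
                Some(i) => i,
                None => break,
            };
            let d = self.pending.remove(idx);
            // The new delta supersedes its parents in the head set.
            for p in &d.parents {
                self.heads.remove(p);
            }
            self.heads.insert(d.id);
            self.applied.insert(d.id);
            applied_now.push(d.id);
            self.deltas.insert(d.id, d);
        }
        applied_now
    }

    pub fn get(&self, id: DeltaId) -> Option<&CausalDelta<T>> {
        self.deltas.get(&id)
    }

    pub fn heads(&self) -> Vec<DeltaId> {
        let mut h: Vec<DeltaId> = self.heads.iter().copied().collect();
        h.sort();
        h
    }

    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }
}
```

Adding delta 3 before its parent 2 leaves it pending; once 2 arrives, both are applied in order and the head set collapses to the newest delta.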

DeltaApplier<T> trait

pub trait DeltaApplier<T> {
    fn apply_delta(&mut self, delta: &CausalDelta<T>) -> Result<()>;
    fn restore_applied_delta(&mut self, delta: &CausalDelta<T>) -> Result<()>;
}

Key Operations

1. Topological Ordering
   Before applying, all pending deltas are sorted in topological order (parents before children). This ensures deterministic replay regardless of arrival order.

2. Pending Queue
   If a delta's parents haven't been seen yet, it enters the pending queue. When missing parents arrive, queued deltas are automatically drained and applied.

3. restore_applied_delta
   Used during node restart to rebuild the in-memory DAG from persisted deltas without re-executing the payload (state is already in storage).

4. get_missing_parents
   Returns the set of delta IDs referenced as parents but not yet received. Used by the sync protocol to request specific deltas from peers.
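Two of these operations can be sketched on a bare parent map. This is an illustration under assumptions: ids are `u64`, the batch is a map from delta id to parent ids, and Kahn's algorithm with ascending-id tie-breaking is one way to obtain a deterministic order; the source does not say which algorithm the crate uses.

```rust
use std::collections::{BTreeMap, BTreeSet};

pub type DeltaId = u64; // stand-in for the real id type

/// Parent ids that are referenced but neither applied nor in the batch.
pub fn get_missing_parents(
    deltas: &BTreeMap<DeltaId, Vec<DeltaId>>,
    applied: &BTreeSet<DeltaId>,
) -> BTreeSet<DeltaId> {
    deltas
        .values()
        .flatten()
        .copied()
        .filter(|p| !applied.contains(p) && !deltas.contains_key(p))
        .collect()
}

/// Kahn's algorithm over the batch: parents always precede children.
/// Ties are broken by ascending id, so the result does not depend on
/// arrival order. Parents outside the batch are treated as satisfied.
pub fn topological_order(deltas: &BTreeMap<DeltaId, Vec<DeltaId>>) -> Vec<DeltaId> {
    let mut indegree: BTreeMap<DeltaId, usize> = deltas
        .iter()
        .map(|(id, parents)| {
            let n = parents.iter().filter(|p| deltas.contains_key(*p)).count();
            (*id, n)
        })
        .collect();
    let mut ready: BTreeSet<DeltaId> = indegree
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(id, _)| *id)
        .collect();

    let mut order = Vec::new();
    while let Some(id) = ready.pop_first() {
        order.push(id);
        for (child, parents) in deltas {
            let hits = parents.iter().filter(|p| **p == id).count();
            if hits > 0 {
                let n = indegree.get_mut(child).expect("child is in the batch");
                *n -= hits;
                if *n == 0 {
                    ready.insert(*child);
                }
            }
        }
    }
    order
}
```

A caller would typically request the ids returned by `get_missing_parents` from peers first, and order the batch only once that set is empty.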

calimero-storage

Provides CRDT collections used by the WASM runtime for conflict-free replicated state. Each collection implements the Mergeable trait for automatic conflict resolution during sync.

crates/storage

UnorderedMap

Observed-remove map. Concurrent puts to the same key are resolved by LWW using HybridTimestamp. Deletions are tracked as tombstones until causally stable.

UnorderedSet

Observed-remove set. Add/remove conflicts resolved in favor of add (add-wins semantics). Internally backed by an UnorderedMap with unit values.

LwwRegister

Last-writer-wins register. Stores a single value with a HybridTimestamp. On merge, the value with the highest timestamp wins.

Core Traits

pub struct HybridTimestamp(Timestamp); // from uhlc: NTP64 wall-clock + 128-bit node ID tiebreaker

pub trait Mergeable {
    fn merge(&mut self, other: &Self) -> Result<(), MergeError>;
}

The Mergeable trait is the fundamental building block — any type that implements it can be used as a CRDT value in the storage layer. The runtime's host functions call merge when applying remote deltas.
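As an illustration of the trait, here is a minimal last-writer-wins register. The `Stamp` type is a simplified stand-in for HybridTimestamp (a time value with a node id as tiebreaker), and the struct layout is an assumption rather than the crate's definition.

```rust
/// Simplified stand-in for HybridTimestamp. The derived ordering compares
/// `time` first and falls back to `node` as the tiebreaker.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Stamp {
    pub time: u64,
    pub node: u128,
}

#[derive(Debug, PartialEq, Eq)]
pub struct MergeError;

pub trait Mergeable {
    fn merge(&mut self, other: &Self) -> Result<(), MergeError>;
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct LwwRegister<T> {
    value: T,
    stamp: Stamp,
}

impl<T: Clone> LwwRegister<T> {
    pub fn new(value: T, stamp: Stamp) -> Self {
        Self { value, stamp }
    }

    /// Local write: accepted only if it is newer than the current value.
    pub fn set(&mut self, value: T, stamp: Stamp) {
        if stamp > self.stamp {
            self.value = value;
            self.stamp = stamp;
        }
    }

    pub fn get(&self) -> &T {
        &self.value
    }
}

impl<T: Clone> Mergeable for LwwRegister<T> {
    /// Keep whichever side carries the higher stamp.
    fn merge(&mut self, other: &Self) -> Result<(), MergeError> {
        if other.stamp > self.stamp {
            self.value = other.value.clone();
            self.stamp = other.stamp;
        }
        Ok(())
    }
}
```

Because the stamp ordering is total, two replicas that merge each other's state end up with the same value regardless of which side merges first.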

RocksDB Implementation

The calimero-store-rocksdb crate provides the concrete Database implementation backed by RocksDB.

crates/store/rocksdb

Column Family Mapping

Each Column enum variant maps 1:1 to a RocksDB column family. CFs are created at DB open time. Each has independent compaction, bloom filters (10 bits/key), and block cache partitions.

WriteBatch Transactions

The Transaction type accumulates puts and deletes, then is atomically committed via WriteBatch. Guarantees all-or-nothing semantics for multi-key operations like delta application.
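The all-or-nothing behaviour can be modelled in memory as below. This sketch does not use RocksDB: `MemStore` is a hypothetical stand-in that stages the batch on a copy and swaps it in only on success, and the `max_value_len` check is an invented validation rule whose only purpose is to make a batch fail.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Column {
    State,
    Delta,
}

pub enum Op {
    Put { col: Column, key: Vec<u8>, value: Vec<u8> },
    Delete { col: Column, key: Vec<u8> },
}

/// Accumulates operations; nothing is visible until `apply` succeeds.
#[derive(Default)]
pub struct Transaction {
    ops: Vec<Op>,
}

impl Transaction {
    pub fn put(&mut self, col: Column, key: &[u8], value: &[u8]) {
        self.ops.push(Op::Put { col, key: key.to_vec(), value: value.to_vec() });
    }

    pub fn delete(&mut self, col: Column, key: &[u8]) {
        self.ops.push(Op::Delete { col, key: key.to_vec() });
    }
}

/// Hypothetical in-memory store keyed by (column, key).
#[derive(Default)]
pub struct MemStore {
    data: BTreeMap<(Column, Vec<u8>), Vec<u8>>,
}

impl MemStore {
    pub fn get(&self, col: Column, key: &[u8]) -> Option<&[u8]> {
        self.data.get(&(col, key.to_vec())).map(Vec::as_slice)
    }

    /// Apply every operation or none of them.
    pub fn apply(&mut self, tx: Transaction, max_value_len: usize) -> Result<(), String> {
        let mut staged = self.data.clone();
        for op in tx.ops {
            match op {
                Op::Put { col, key, value } => {
                    if value.len() > max_value_len {
                        return Err(format!("value too large: {} bytes", value.len()));
                    }
                    staged.insert((col, key), value);
                }
                Op::Delete { col, key } => {
                    staged.remove(&(col, key));
                }
            }
        }
        self.data = staged; // commit point
        Ok(())
    }
}
```

If any operation in the batch is rejected, earlier operations in the same batch (including deletes) leave no trace in the store.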

Snapshot Iteration

Iterators are backed by RocksDB snapshots for consistent point-in-time reads. Prefix iteration uses set_iterate_range for efficient scans within a column family.

Pinned Gets

Uses get_pinned_cf for zero-copy reads where possible. The returned Slice borrows directly from the block cache, avoiding allocation for large values.

CRDT Collections

The calimero-storage crate provides application-level CRDT collections built on top of the storage layer. These are what SDK applications use for state management.

Available Collections

UnorderedMap

Key-value store with LWW (Last-Write-Wins) semantics per entry. Entries can be inserted, updated, and removed independently across nodes.

Vector

Ordered list, primarily append-oriented: items are pushed to the end, while positional inserts and removes use index-based CRDT logic.

Counter

Generic over ALLOW_DECREMENT: bool. Default Counter<false> (alias GCounter) is grow-only: each node increments its own slot and value() returns the sum across all nodes. Counter<true> (alias PNCounter) layers a second per-node map to also support decrement. Both variants are commutative and idempotent.
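A minimal sketch of the two variants follows. It keeps the `ALLOW_DECREMENT` parameter and the `GCounter` / `PNCounter` aliases from the description above; the node id type (`u64`), the field names, and the per-slot maximum merge are assumptions based on the standard G-Counter / PN-Counter construction, not the crate's code.

```rust
use std::collections::BTreeMap;

/// Per-node slots: `positive` counts increments, `negative` counts
/// decrements (the latter stays empty when ALLOW_DECREMENT is false).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Counter<const ALLOW_DECREMENT: bool = false> {
    positive: BTreeMap<u64, u64>,
    negative: BTreeMap<u64, u64>,
}

pub type GCounter = Counter<false>;
pub type PNCounter = Counter<true>;

/// Take the per-slot maximum, which is commutative and idempotent.
fn merge_max(into: &mut BTreeMap<u64, u64>, from: &BTreeMap<u64, u64>) {
    for (node, count) in from {
        let slot = into.entry(*node).or_insert(0);
        *slot = (*slot).max(*count);
    }
}

impl<const D: bool> Counter<D> {
    pub fn new() -> Self {
        Self { positive: BTreeMap::new(), negative: BTreeMap::new() }
    }

    /// Each node only ever bumps its own slot.
    pub fn increment(&mut self, node: u64) {
        *self.positive.entry(node).or_insert(0) += 1;
    }

    /// Sum of all increments minus the sum of all decrements.
    pub fn value(&self) -> i64 {
        let p: u64 = self.positive.values().sum();
        let n: u64 = self.negative.values().sum();
        p as i64 - n as i64
    }

    pub fn merge(&mut self, other: &Self) {
        merge_max(&mut self.positive, &other.positive);
        merge_max(&mut self.negative, &other.negative);
    }
}

/// `decrement` exists only on the PN variant, so calling it on a
/// grow-only counter is a compile-time error.
impl Counter<true> {
    pub fn decrement(&mut self, node: u64) {
        *self.negative.entry(node).or_insert(0) += 1;
    }
}
```

Merging the same remote state twice changes nothing, because each slot already holds the maximum it has seen.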

Storage Primitives

Beyond the public CRDT collections above (UnorderedMap, Vector, UnorderedSet, LwwRegister, Counter), the storage layer provides wrappers that constrain who can write what. There are three flavours of constraint:

  • Signature-based (UserStorage, SharedStorage, AuthoredMap, AuthoredVector) — the context manager signs each mutating action with the executor's private key (sign_authorized_actions in crates/context/src/handlers/execute/mod.rs) after WASM returns its outcome. Peers verify the signature at merge time in Interface::apply_action in crates/storage/src/interface.rs using the runtime's ed25519_verify host function and reject the action with InvalidSignature on mismatch.
  • Structural (FrozenStorage) — no per-identity signature; immutability after first write is enforced by content-addressing. Once the SHA-256 hash → value mapping is published, attempts to overwrite the same hash are rejected.
  • None (the public collections) — anyone in the context can write, update, or remove any entry. Listed below for contrast.

Comparison

Primitive | Storage stamp | Keyspace | Writes new | Mutates existing | Reads
UnorderedMap<K,V> (public baseline, no auth) | (not shown) | K → V | anyone | anyone | everyone
UserStorage<T> | User { owner = executor_id } | PublicKey → T, disjoint per-user slots | only into your own slot | only your own slot | everyone (get_for_user)
FrozenStorage<T> (structural, no signature check) | Frozen | Hash(value) → T, content-addressed | anyone | nobody (immutable) | everyone
SharedStorage<T> | Shared { writers, frozen } | one slot with T (T often a nested map) | signer ∈ writers | signer ∈ writers; writers rotatable if !frozen | everyone
AuthoredMap<K,V> | User { owner } per-entry | shared keyspace, K → V | anyone (becomes owner) | only owner | everyone
AuthoredVector<V> | User { owner } per-entry | shared sequence, index → V | anyone (push, becomes owner) | only owner (update / tombstone) | everyone

Picking one

what guarantee do you need?
├── none, public                      UnorderedMap / Vector / UnorderedSet / LwwRegister / Counter
├── immutable, content-addressed      FrozenStorage<T>
└── identity-bound writes ↓
    ├── one slot per user, disjoint           UserStorage<T>
    ├── one shared slot, named writer set     SharedStorage<T>
    ├── shared keyspace, per-entry author     AuthoredMap<K,V>
    └── shared sequence, per-entry author     AuthoredVector<V>

UserStorage<T>

Per-user slot keyed by PublicKey. The executor can only write into their own slot (env::executor_id()); reads are unrestricted. Internally an UnorderedMap<PublicKey, T>.

SharedStorage<T>

A single value writable by any signer in a mutable writers set. Any current writer can rotate the set unless it is frozen. The context manager signs each write after WASM returns its outcome; peers verify the signature against the stored writer set at merge time. See ADR 0001 for the rotation-during-concurrent-write contract.

FrozenStorage<T>

Content-addressable immutable storage. insert returns a SHA-256 hash; reads are by hash. Same value always yields the same hash, and entries cannot be updated once written. Internally an UnorderedMap<Hash, FrozenValue<T>> with first-write-wins semantics.
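The first-write-wins, content-addressed behaviour can be sketched as follows. Note the loud assumption: the standard library's `DefaultHasher` (64-bit, not cryptographic) stands in for SHA-256 so the example needs no external crate, and the plain `HashMap` stands in for the underlying UnorderedMap.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for a SHA-256 digest; NOT cryptographically secure.
pub type ContentHash = u64;

pub struct FrozenStorage<T> {
    entries: HashMap<ContentHash, T>,
}

impl<T: Hash> FrozenStorage<T> {
    pub fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Store the value under the hash of its content and return that hash.
    /// If the hash is already present the existing entry is kept.
    pub fn insert(&mut self, value: T) -> ContentHash {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let hash = hasher.finish();
        self.entries.entry(hash).or_insert(value); // first write wins
        hash
    }

    pub fn get(&self, hash: ContentHash) -> Option<&T> {
        self.entries.get(&hash)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

The type deliberately exposes no update or remove method, so immutability is a property of the API shape rather than of a runtime permission check.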

AuthoredMap<K, V>

Shared keyspace map with per-entry ownership. Any member can insert a new key; only the inserter can update or remove their own entries. Each entry carries a StorageType::User { owner } stamp set from the executor's public key at insert time.
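The local ownership rule can be sketched as below. This models only the in-process check: signatures and merge-time verification are left out, `PublicKey` is a 32-byte array stand-in, and the `AuthoredError` variants are hypothetical names.

```rust
use std::collections::BTreeMap;

pub type PublicKey = [u8; 32]; // stand-in for the real key type

#[derive(Debug, PartialEq, Eq)]
pub enum AuthoredError {
    NotOwner,
    KeyExists,
    Missing,
}

/// Each entry remembers who inserted it.
struct Entry<V> {
    owner: PublicKey,
    value: V,
}

pub struct AuthoredMap<K, V> {
    entries: BTreeMap<K, Entry<V>>,
}

impl<K: Ord, V> AuthoredMap<K, V> {
    pub fn new() -> Self {
        Self { entries: BTreeMap::new() }
    }

    /// Anyone may insert a new key; the executor becomes its owner.
    pub fn insert(&mut self, executor: PublicKey, key: K, value: V) -> Result<(), AuthoredError> {
        if self.entries.contains_key(&key) {
            return Err(AuthoredError::KeyExists);
        }
        self.entries.insert(key, Entry { owner: executor, value });
        Ok(())
    }

    /// Only the owner may replace the value.
    pub fn update(&mut self, executor: PublicKey, key: &K, value: V) -> Result<(), AuthoredError> {
        let entry = self.entries.get_mut(key).ok_or(AuthoredError::Missing)?;
        if entry.owner != executor {
            return Err(AuthoredError::NotOwner);
        }
        entry.value = value;
        Ok(())
    }

    /// Only the owner may remove the entry.
    pub fn remove(&mut self, executor: PublicKey, key: &K) -> Result<V, AuthoredError> {
        match self.entries.get(key) {
            None => return Err(AuthoredError::Missing),
            Some(e) if e.owner != executor => return Err(AuthoredError::NotOwner),
            Some(_) => {}
        }
        Ok(self.entries.remove(key).expect("presence checked above").value)
    }

    /// Reads are unrestricted.
    pub fn get(&self, key: &K) -> Option<&V> {
        self.entries.get(key).map(|e| &e.value)
    }
}
```

A second user who tries to insert, update, or remove an existing key gets an error, while reads stay open to everyone.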

AuthoredVector<V>

Ordered shared vector with per-entry ownership. Any member can push; only the pusher can update or tombstone their entry. There is no physical remove — shifting indices would complicate concurrent-push merge semantics. Use tombstone(idx) to retract a slot.

Merge-time enforcement (signature-based primitives)

Local update/remove calls short-circuit non-owner attempts so bugs surface in-process. The load-bearing check happens at merge time in Interface::apply_action:

  • Signature — the runtime's ed25519_verify host function is called against the entry's stored owner (or, for SharedStorage, against the current writer set). Mismatch returns InvalidSignature.
  • Replay nonce: the incoming nonce must be strictly greater than the stored value; equal nonces are rejected as NonceReplay (i.e. the comparison is incoming > stored, not incoming ≥ stored). The nonce is the action's SignatureData.nonce, set from env::time_now() (a wall-clock nanosecond timestamp from SystemTime::now()) at action build time. Caveat: wall-clock time is not strictly monotonic across NTP slews, leap-second corrections, VM-host clock changes, or process restarts. Two consecutive writes on the same node can collide or step backward under those conditions, in which case the second write is rejected as NonceReplay. Conversely, a node with a far-future clock can write a nonce that effectively locks the entity until later honest writes catch up. Per the inline comment in interface.rs, the nonce check is itself a transitional v2 mechanism that the project plans to retire after a soak period in favour of DAG-causal verification.
  • Per-entity scope — the stored nonce lives on the individual storage entity (the row keyed by entity id), not on the owner globally. For AuthoredMap each map key is its own entity; for UserStorage each per-user slot; for SharedStorage the single value.
  • Entity binding — a signed action targeting one entity cannot be replayed against a different entity even when the same key signed both: the entity id is part of the bytes hashed by Action::payload_for_signing(), so the signature is bound to that specific entity.
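The replay-nonce and per-entity-scope rules above can be sketched in isolation. Signature verification is not modelled; `NonceTable` and `check_and_record` are hypothetical names, and the entity id is a 32-byte array stand-in.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum ApplyError {
    NonceReplay,
}

pub type EntityId = [u8; 32]; // stand-in for the real entity id

/// Last accepted nonce, tracked per entity rather than per signer.
#[derive(Default)]
pub struct NonceTable {
    last: HashMap<EntityId, u64>,
}

impl NonceTable {
    /// Accept the action only if `incoming` is strictly greater than the
    /// nonce stored for this entity, then record it.
    pub fn check_and_record(&mut self, entity: EntityId, incoming: u64) -> Result<(), ApplyError> {
        let stored = self.last.get(&entity).copied();
        if let Some(stored) = stored {
            if incoming <= stored {
                return Err(ApplyError::NonceReplay);
            }
        }
        self.last.insert(entity, incoming);
        Ok(())
    }
}
```

The sketch also reproduces the caveat from the list: once an entity has accepted a far-future nonce, every honest write with a smaller timestamp is rejected until the clock catches up.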

Key distinctions

UserStorage vs AuthoredMap

Both stamp entries with StorageType::User { owner }. The difference is keying:

  • UserStorage<T>: key is the public key, slots are disjoint per user.
  • AuthoredMap<K,V>: key is application-defined, owner is recorded in the entry's metadata. Two users can compete to insert the same key; first writer wins, subsequent updates are owner-only.

SharedStorage vs AuthoredMap

Both allow mutation after creation. The difference is the granularity of the writer set:

  • SharedStorage<T>: one collection-level writer set governs one logical T. Several named members can co-author the same value.
  • AuthoredMap<K,V>: per-entry writer set (currently size-1 = single author). The writer varies per key.

Why composition doesn't replace these primitives

AuthoredMap is not UnorderedMap<K, SharedStorage<V>>

You can nest SharedStorage inside an UnorderedMap for per-key multi-writer values, but the outer map is still public — outer keys can be inserted, overwritten, or removed by anyone. AuthoredMap puts the ownership stamp on the entry itself within a shared keyspace, so K cannot be replaced by non-owners.

SharedStorage is not UnorderedMap<K, UserStorage<V>>

You can nest UserStorage inside an UnorderedMap for per-key single-author values, but again the outer map is public (key-level tampering remains possible). And there is no way to model "several named people co-own this value" without a writer set — exactly what SharedStorage provides.

Merge Semantics

All collections implement the Mergeable trait. When state deltas arrive from peers, the storage layer calls merge() on each affected entry. The merge is:

  • Commutative — merge(A, B) = merge(B, A)
  • Associative — merge(merge(A, B), C) = merge(A, merge(B, C))
  • Idempotent — merge(A, A) = A

These properties guarantee eventual consistency regardless of message ordering or duplication.