Storage Layer
calimero-store + calimero-store-rocksdb + calimero-dag + calimero-storage
Purpose
The storage layer provides a column-family key-value abstraction over RocksDB. The core Database trait exposes has, get, put, delete, iter, and apply(Transaction). Keys are typed via generic_array, giving compile-time guarantees on key size and column assignment. An optional AES-GCM encryption layer transparently encrypts values at rest.
pub trait Database {
    fn has(&self, col: Column, key: &[u8]) -> Result<bool>;
    fn get(&self, col: Column, key: &[u8]) -> Result<Option<Slice>>;
    fn put(&self, col: Column, key: &[u8], value: &[u8]) -> Result<()>;
    fn delete(&self, col: Column, key: &[u8]) -> Result<()>;
    fn iter(&self, col: Column) -> Result<DBIterator>;
    fn apply(&self, tx: Transaction) -> Result<()>;
}
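To make the shape of this interface concrete, here is a minimal in-memory sketch. It is not the calimero-store implementation: `MemoryDb`, `Op`, the two-variant `Column`, and the infallible return types are all assumptions made so the example compiles with the standard library only.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

/// Hypothetical subset of the column set; the real enum has 11 variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Column {
    Meta,
    Group,
}

/// One staged operation inside a transaction (hypothetical type).
pub enum Op {
    Put { col: Column, key: Vec<u8>, value: Vec<u8> },
    Delete { col: Column, key: Vec<u8> },
}

/// Accumulates operations that `apply` commits together.
#[derive(Default)]
pub struct Transaction {
    ops: Vec<Op>,
}

impl Transaction {
    pub fn put(&mut self, col: Column, key: &[u8], value: &[u8]) {
        self.ops.push(Op::Put { col, key: key.to_vec(), value: value.to_vec() });
    }

    pub fn delete(&mut self, col: Column, key: &[u8]) {
        self.ops.push(Op::Delete { col, key: key.to_vec() });
    }
}

/// In-memory stand-in: one ordered map keyed by (column, key bytes).
#[derive(Default)]
pub struct MemoryDb {
    data: RwLock<BTreeMap<(Column, Vec<u8>), Vec<u8>>>,
}

impl MemoryDb {
    pub fn has(&self, col: Column, key: &[u8]) -> bool {
        self.data.read().unwrap().contains_key(&(col, key.to_vec()))
    }

    pub fn get(&self, col: Column, key: &[u8]) -> Option<Vec<u8>> {
        self.data.read().unwrap().get(&(col, key.to_vec())).cloned()
    }

    pub fn put(&self, col: Column, key: &[u8], value: &[u8]) {
        self.data.write().unwrap().insert((col, key.to_vec()), value.to_vec());
    }

    pub fn delete(&self, col: Column, key: &[u8]) {
        self.data.write().unwrap().remove(&(col, key.to_vec()));
    }

    /// Applies every staged operation under a single write lock, so readers
    /// never observe a partially applied transaction.
    pub fn apply(&self, tx: Transaction) {
        let mut data = self.data.write().unwrap();
        for op in tx.ops {
            match op {
                Op::Put { col, key, value } => {
                    data.insert((col, key), value);
                }
                Op::Delete { col, key } => {
                    data.remove(&(col, key));
                }
            }
        }
    }
}
```

The single write lock in `apply` stands in for the atomic batch commit that the RocksDB backend provides; the sketch returns owned `Vec<u8>` values rather than a borrowed slice.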
Column Architecture
All persistent data is partitioned into 11 column families. Each maps to a dedicated RocksDB column family with independent compaction and bloom filters. The Group column is the most complex, containing 26+ logical key prefixes for governance state including namespace identity, namespace governance ops, group hierarchy, and group encryption keys.
Informal group::… labels in the diagram map to the typed keys and single-byte prefixes in Storage Schema (Group column):
- group::info → GroupMeta (0x20)
- group::member → GroupMember (0x21)
- group::role → role data on GroupMember values (0x21)
- group::context → GroupContextIndex (0x22) / reverse index ContextGroupRef (0x23)
- group::upgrade → GroupUpgradeKey (0x24)
- group::signing_key → GroupSigningKey (0x25)
- group::cap → GroupMemberCapability (0x26)
- group::settings (defaults) → GroupDefaultCaps (0x29)
- migration markers → GroupContextLastMigration (0x2B)
- group::nonce → GroupLocalGovNonce (0x2C)
- group::alias (member) → GroupMemberAlias (0x2D)
- group/context names → GroupAlias (0x2E), GroupContextAlias (0x2F)
- group::oplog → GroupOpLog (0x30)
- group::oplog_head → GroupOpHead (0x31)
- member–context links → GroupMemberContext (0x32), GroupContextMemberCap (0x33)
Some cells (e.g. group::invite, group::state_hash) are illustrative and do not match a single prefix; use the schema tables as the source of truth.
Key Model
All keys are statically typed via Key<T>, a newtype over GenericArray<u8, T::Size>. The AsKeyParts / FromKeyParts traits define how a key is decomposed into column + byte components, giving compile-time column assignment.
pub trait AsKeyParts {
type Size: ArrayLength<u8>;
const COLUMN: Column;
fn as_key(&self) -> Key<Self>;
}
pub trait FromKeyParts: AsKeyParts {
fn from_key(key: Key<Self>) -> Self;
}
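The idea of binding a key type to a fixed size and a column at compile time can be sketched without the generic_array crate. The example below swaps in const generics for `GenericArray`, merges the two traits into one hypothetical `TypedKey` trait, and invents a `ContextStateKey` type purely for illustration; none of these names come from the codebase.

```rust
/// Hypothetical subset of the column set.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Column {
    Identity,
    State,
}

/// Fixed-size key bytes; `N` plays the role of `T::Size`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Key<const N: usize>(pub [u8; N]);

/// Simplified counterpart of `AsKeyParts` and `FromKeyParts` combined.
pub trait TypedKey<const N: usize>: Sized {
    /// Column this key type always belongs to.
    const COLUMN: Column;
    fn as_key(&self) -> Key<N>;
    fn from_key(key: Key<N>) -> Self;
}

/// Hypothetical key: a 32-byte context id followed by a 32-byte state id.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ContextStateKey {
    pub context_id: [u8; 32],
    pub state_id: [u8; 32],
}

impl TypedKey<64> for ContextStateKey {
    const COLUMN: Column = Column::State;

    fn as_key(&self) -> Key<64> {
        let mut bytes = [0u8; 64];
        bytes[..32].copy_from_slice(&self.context_id);
        bytes[32..].copy_from_slice(&self.state_id);
        Key(bytes)
    }

    fn from_key(key: Key<64>) -> Self {
        let mut context_id = [0u8; 32];
        let mut state_id = [0u8; 32];
        context_id.copy_from_slice(&key.0[..32]);
        state_id.copy_from_slice(&key.0[32..]);
        Self { context_id, state_id }
    }
}
```

Because the size is part of the type, passing a 32-byte key where a 64-byte key is expected fails to compile, and the column is read from the type rather than supplied by the caller.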
Key Types by Column
calimero-dag
A generic, in-memory causal DAG used for both context state deltas and governance operation logs. Provides topological ordering, pending queues for out-of-order delivery, and missing-parent detection for catch-up.
crates/dag
CausalDelta<T>
struct CausalDelta<T> {
    id: DeltaId,
    parents: Vec<DeltaId>,
    payload: T,
    timestamp: HLC,
    expected_root_hash: Hash,
}
Each delta records its causal parents, forming a partial order. The expected_root_hash enables fast consistency checks — a peer can verify it arrived at the same state after applying a delta.
DagStore<T>
struct DagStore<T> {
    deltas: HashMap<DeltaId, CausalDelta<T>>,
    applied: HashSet<DeltaId>,
    pending: Vec<CausalDelta<T>>,
    heads: HashSet<DeltaId>,
}
Tracks every known delta, the set that has been applied, the deltas still pending because a parent is missing, and the current DAG head set. The head set is updated automatically as new deltas arrive.
DeltaApplier<T> trait
trait DeltaApplier<T> {
    fn apply_delta(&mut self, delta: &CausalDelta<T>) -> Result<()>;
    fn restore_applied_delta(&mut self, delta: &CausalDelta<T>) -> Result<()>;
}
Key Operations
Topological Ordering
Before applying, all pending deltas are sorted in topological order (parents before children). This ensures deterministic replay regardless of arrival order.
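One common way to obtain a parents-before-children order is Kahn's algorithm. The sketch below is an illustration under assumptions, not the calimero-dag code: `DeltaId` is reduced to `u32`, a batch is represented as a map from id to parent ids, and parents that are not part of the batch are treated as already applied.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Stand-in for the real delta id type.
type DeltaId = u32;

/// Returns the ids of `deltas` so that each delta appears after all of its
/// parents that are in the same batch. Deltas caught in a cycle (which a
/// well-formed DAG never contains) would be left out of the result.
pub fn topological_order(deltas: &BTreeMap<DeltaId, Vec<DeltaId>>) -> Vec<DeltaId> {
    // Number of in-batch parents each delta is still waiting for.
    let mut unmet: BTreeMap<DeltaId, usize> = BTreeMap::new();
    // Reverse edges: parent -> children inside the batch.
    let mut children: BTreeMap<DeltaId, Vec<DeltaId>> = BTreeMap::new();

    for (id, parents) in deltas {
        let in_batch: BTreeSet<DeltaId> = parents
            .iter()
            .copied()
            .filter(|p| deltas.contains_key(p))
            .collect();
        unmet.insert(*id, in_batch.len());
        for parent in in_batch {
            children.entry(parent).or_default().push(*id);
        }
    }

    // Start with every delta that has no unmet parent, in ascending id order.
    let mut ready: VecDeque<DeltaId> = unmet
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(id, _)| *id)
        .collect();

    let mut order = Vec::with_capacity(deltas.len());
    while let Some(id) = ready.pop_front() {
        order.push(id);
        for child in children.get(&id).into_iter().flatten() {
            let n = unmet.get_mut(child).expect("child is part of the batch");
            *n -= 1;
            if *n == 0 {
                ready.push_back(*child);
            }
        }
    }
    order
}
```

Using ordered maps means the result depends only on the set of deltas, not on the order in which they were inserted, which mirrors the determinism claim above.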
Pending Queue
If a delta's parents haven't been seen yet, it enters the pending queue. When missing parents arrive, queued deltas are automatically drained and applied.
restore_applied_delta
Used during node restart to rebuild the in-memory DAG from persisted deltas without re-executing the payload (state is already in storage).
get_missing_parents
Returns the set of delta IDs referenced as parents but not yet received. Used by the sync protocol to request specific deltas from peers.
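The pending queue, head tracking, and missing-parent lookup described above can be sketched together. This is a simplified model, not the real DagStore: `MiniDag`, `Delta`, the `u32` id type, and the `log` field are hypothetical, payloads are omitted, and pending deltas are kept in a map rather than a `Vec`.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Stand-in for the real delta id type.
type DeltaId = u32;

#[derive(Clone, Debug)]
pub struct Delta {
    pub id: DeltaId,
    pub parents: Vec<DeltaId>,
}

#[derive(Default)]
pub struct MiniDag {
    applied: HashSet<DeltaId>,
    pending: HashMap<DeltaId, Delta>,
    heads: BTreeSet<DeltaId>,
    /// Order in which deltas were applied, kept only for illustration.
    pub log: Vec<DeltaId>,
}

impl MiniDag {
    /// Applies the delta if all parents are applied, otherwise parks it.
    /// Returns true when the delta was applied immediately.
    pub fn add(&mut self, delta: Delta) -> bool {
        if self.applied.contains(&delta.id) || self.pending.contains_key(&delta.id) {
            return false; // duplicate delivery
        }
        if !self.is_ready(&delta) {
            self.pending.insert(delta.id, delta);
            return false;
        }
        self.apply(delta);
        self.drain_pending();
        true
    }

    fn is_ready(&self, delta: &Delta) -> bool {
        delta.parents.iter().all(|p| self.applied.contains(p))
    }

    fn apply(&mut self, delta: Delta) {
        // A parent stops being a head once one of its children is applied.
        for parent in &delta.parents {
            self.heads.remove(parent);
        }
        self.heads.insert(delta.id);
        self.applied.insert(delta.id);
        self.log.push(delta.id);
    }

    /// Repeatedly applies parked deltas whose parents are now all applied.
    fn drain_pending(&mut self) {
        loop {
            let mut ready: Vec<DeltaId> = self
                .pending
                .values()
                .filter(|d| self.is_ready(d))
                .map(|d| d.id)
                .collect();
            if ready.is_empty() {
                break;
            }
            ready.sort_unstable();
            for id in ready {
                let delta = self.pending.remove(&id).expect("id came from pending");
                self.apply(delta);
            }
        }
    }

    /// Parent ids that are referenced by parked deltas but unknown locally.
    pub fn get_missing_parents(&self) -> BTreeSet<DeltaId> {
        self.pending
            .values()
            .flat_map(|d| d.parents.iter().copied())
            .filter(|p| !self.applied.contains(p) && !self.pending.contains_key(p))
            .collect()
    }

    pub fn heads(&self) -> &BTreeSet<DeltaId> {
        &self.heads
    }
}
```

Delivering deltas 3, 2, 1 in that order parks the first two and applies all three once delta 1 arrives; a sync layer would request whatever `get_missing_parents` reports in the meantime.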
calimero-storage
Provides CRDT collections used by the WASM runtime for conflict-free replicated state. Each collection implements the Mergeable trait for automatic conflict resolution during sync.
crates/storage
UnorderedMap
Observed-remove map. Concurrent puts to the same key are resolved by LWW using HybridTimestamp. Deletions are tracked as tombstones until causally stable.
UnorderedSet
Observed-remove set. Add/remove conflicts resolved in favor of add (add-wins semantics). Internally backed by an UnorderedMap with unit values.
LwwRegister
Last-writer-wins register. Stores a single value with a HybridTimestamp. On merge, the value with the highest timestamp wins.
Core Traits
pub trait Mergeable {
fn merge(&mut self, other: &Self) -> Result<(), MergeError>;
}
The Mergeable trait is the fundamental building block — any type that implements it can be used as a CRDT value in the storage layer. The runtime's host functions call merge when applying remote deltas.
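As an illustration of what an implementation of this trait can look like, the sketch below implements it for a last-writer-wins register. The `Stamp` type is a simplified stand-in for HybridTimestamp (physical time, logical counter, node id as tie-breaker), and `MergeError` is reduced to a unit struct; both are assumptions rather than the crate's definitions.

```rust
/// Stand-in error type; the real `MergeError` is not shown in this document.
#[derive(Debug, PartialEq, Eq)]
pub struct MergeError;

pub trait Mergeable {
    fn merge(&mut self, other: &Self) -> Result<(), MergeError>;
}

/// Simplified hybrid timestamp. The derived ordering compares fields in
/// declaration order, so `node` only breaks ties between equal clocks.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Stamp {
    pub physical: u64,
    pub logical: u32,
    pub node: u8,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct LwwRegister<T> {
    pub value: T,
    pub stamp: Stamp,
}

impl<T: Clone> Mergeable for LwwRegister<T> {
    /// Keeps whichever side carries the higher timestamp.
    fn merge(&mut self, other: &Self) -> Result<(), MergeError> {
        if other.stamp > self.stamp {
            self.value = other.value.clone();
            self.stamp = other.stamp;
        }
        Ok(())
    }
}
```

Because the comparison is a total order over stamps, merging in either direction leaves both replicas holding the same value.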
RocksDB Implementation
The calimero-store-rocksdb crate provides the concrete Database implementation backed by RocksDB.
crates/store/rocksdb
Column Family Mapping
Each Column enum variant maps 1:1 to a RocksDB column family. CFs are created at DB open time. Each has independent compaction, bloom filters (10 bits/key), and block cache partitions.
WriteBatch Transactions
The Transaction type accumulates puts and deletes, then is atomically committed via WriteBatch. Guarantees all-or-nothing semantics for multi-key operations like delta application.
Snapshot Iteration
Iterators are backed by RocksDB snapshots for consistent point-in-time reads. Prefix iteration uses set_iterate_range for efficient scans within a column family.
Pinned Gets
Uses get_pinned_cf for zero-copy reads where possible. The returned Slice borrows directly from the block cache, avoiding allocation for large values.
CRDT Collections
The calimero-storage crate provides application-level CRDT collections built on top of the storage layer. These are what SDK applications use for state management.
Available Collections
UnorderedMap
Key-value store with LWW (Last-Write-Wins) semantics per entry. Entries can be inserted, updated, and removed independently across nodes.
Vector
Ordered append-only list. Items are pushed to the end. Positional inserts and removes use index-based CRDT logic.
Counter
Generic over ALLOW_DECREMENT: bool. Default Counter<false> (alias GCounter) is grow-only: each node increments its own slot and value() returns the sum across all nodes. Counter<true> (alias PNCounter) layers a second per-node map to also support decrement. Both variants are commutative and idempotent.
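The per-node-slot design can be sketched as follows. This is a model of the idea rather than the calimero-storage type: node ids are plain `u8` values passed in by the caller, both maps are always present (the grow-only alias simply has no way to write to the second one), and merge takes the per-slot maximum, which is the usual rule for this kind of counter and is assumed here.

```rust
use std::collections::BTreeMap;

/// Stand-in for a node identity.
type NodeId = u8;

#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct Counter<const ALLOW_DECREMENT: bool> {
    increments: BTreeMap<NodeId, u64>,
    /// Only ever written when ALLOW_DECREMENT is true.
    decrements: BTreeMap<NodeId, u64>,
}

pub type GCounter = Counter<false>;
pub type PNCounter = Counter<true>;

/// Per-slot maximum: each node's slot only grows, so max is a safe merge.
fn merge_slots(dst: &mut BTreeMap<NodeId, u64>, src: &BTreeMap<NodeId, u64>) {
    for (node, count) in src {
        let slot = dst.entry(*node).or_insert(0);
        *slot = (*slot).max(*count);
    }
}

impl<const ALLOW_DECREMENT: bool> Counter<ALLOW_DECREMENT> {
    /// Adds to the slot owned by `node`.
    pub fn increment(&mut self, node: NodeId, by: u64) {
        *self.increments.entry(node).or_insert(0) += by;
    }

    /// Sum of all increment slots minus the sum of all decrement slots.
    pub fn value(&self) -> i128 {
        let up: u64 = self.increments.values().sum();
        let down: u64 = self.decrements.values().sum();
        up as i128 - down as i128
    }

    pub fn merge(&mut self, other: &Self) {
        merge_slots(&mut self.increments, &other.increments);
        merge_slots(&mut self.decrements, &other.decrements);
    }
}

impl Counter<true> {
    /// Only the PN variant exposes decrement.
    pub fn decrement(&mut self, node: NodeId, by: u64) {
        *self.decrements.entry(node).or_insert(0) += by;
    }
}
```

Calling `decrement` on a `GCounter` is a compile error because the method exists only on `Counter<true>`, which is one way a boolean const parameter can gate an operation.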
Storage Primitives
Beyond the public CRDT collections above (UnorderedMap, Vector, UnorderedSet, LwwRegister, Counter), the storage layer provides wrappers that constrain who can write what. Three flavours of constraint:
- Signature-based (UserStorage, SharedStorage, AuthoredMap, AuthoredVector): the context manager signs each mutating action with the executor's private key (sign_authorized_actions in crates/context/src/handlers/execute/mod.rs) after WASM returns its outcome. Peers verify the signature at merge time in Interface::apply_action in crates/storage/src/interface.rs using the runtime's ed25519_verify host function and reject the action with InvalidSignature on mismatch.
- Structural (FrozenStorage): no per-identity signature; immutability after first write is enforced by content-addressing. Once the SHA-256 hash → value mapping is published, attempts to overwrite the same hash are rejected.
- None (the public collections): anyone in the context can write, update, or remove any entry. Listed below for contrast.
Comparison
| Primitive | Storage stamp | Keyspace | Writes new | Mutates existing | Reads |
|---|---|---|---|---|---|
| UnorderedMap<K,V> (public baseline, no auth) | none | K → V | anyone | anyone | everyone |
| UserStorage<T> | User { owner = executor_id } | PublicKey → T, disjoint per-user slots | only into your own slot | only your own slot | everyone (get_for_user) |
| FrozenStorage<T> (structural, no signature check) | Frozen | Hash(value) → T, content-addressed | anyone | nobody (immutable) | everyone |
| SharedStorage<T> | Shared { writers, frozen } | one slot with T (T often a nested map) | signer ∈ writers | signer ∈ writers; writers rotatable if !frozen | everyone |
| AuthoredMap<K,V> | User { owner } per-entry | shared keyspace, K → V | anyone (becomes owner) | only owner | everyone |
| AuthoredVector<V> | User { owner } per-entry | shared sequence, index → V | anyone (push, becomes owner) | only owner (update / tombstone) | everyone |
Picking one
UserStorage<T>
Per-user slot keyed by PublicKey. The executor can only write into their own slot (env::executor_id()); reads are unrestricted. Internally an UnorderedMap<PublicKey, T>.
SharedStorage<T>
A single value writable by any signer in a mutable writers set. Any current writer can rotate the set unless it is frozen. The context manager signs each write after WASM returns its outcome; peers verify the signature against the stored writer set at merge time. See ADR 0001 for the rotation-during-concurrent-write contract.
FrozenStorage<T>
Content-addressable immutable storage. insert returns a SHA-256 hash; reads are by hash. Same value always yields the same hash, and entries cannot be updated once written. Internally an UnorderedMap<Hash, FrozenValue<T>> with first-write-wins semantics.
AuthoredMap<K, V>
Shared keyspace map with per-entry ownership. Any member can insert a new key; only the inserter can update or remove their own entries. Each entry carries a StorageType::User { owner } stamp set from the executor's public key at insert time.
AuthoredVector<V>
Ordered shared vector with per-entry ownership. Any member can push; only the pusher can update or tombstone their entry. There is no physical remove — shifting indices would complicate concurrent-push merge semantics. Use tombstone(idx) to retract a slot.
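The per-entry ownership rule shared by AuthoredMap and AuthoredVector can be modelled locally in a few lines. The sketch below covers only the in-process check (the short-circuit for non-owner attempts); it does not sign or verify anything, and `AuthoredMapSketch`, `AuthError`, and the string key and value types are hypothetical.

```rust
use std::collections::BTreeMap;

/// Stand-in for an ed25519 public key.
type PublicKey = [u8; 32];

#[derive(Debug, PartialEq, Eq)]
pub enum AuthError {
    NotOwner,
}

/// Each entry stores its owner next to the value.
#[derive(Default)]
pub struct AuthoredMapSketch {
    entries: BTreeMap<String, (PublicKey, String)>,
}

impl AuthoredMapSketch {
    /// Inserts a new key owned by `executor`, or updates the value if
    /// `executor` already owns the key. Anyone else is rejected.
    pub fn insert(&mut self, executor: PublicKey, key: &str, value: &str) -> Result<(), AuthError> {
        if let Some((owner, stored)) = self.entries.get_mut(key) {
            if *owner != executor {
                return Err(AuthError::NotOwner);
            }
            *stored = value.to_string();
            return Ok(());
        }
        self.entries.insert(key.to_string(), (executor, value.to_string()));
        Ok(())
    }

    /// Removes the entry if `executor` owns it. Returns Ok(false) when the
    /// key does not exist.
    pub fn remove(&mut self, executor: PublicKey, key: &str) -> Result<bool, AuthError> {
        let is_owner = match self.entries.get(key) {
            None => return Ok(false),
            Some((owner, _)) => *owner == executor,
        };
        if !is_owner {
            return Err(AuthError::NotOwner);
        }
        self.entries.remove(key);
        Ok(true)
    }

    /// Reads are unrestricted.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.entries.get(key).map(|(_, value)| value.as_str())
    }
}
```

An AuthoredVector would apply the same owner comparison per index, with tombstoning in place of `remove`.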
Merge-time enforcement (signature-based primitives)
Local update/remove calls short-circuit non-owner attempts so bugs surface in-process. The load-bearing check happens at merge time in Interface::apply_action:
- Signature — the runtime's ed25519_verify host function is called against the entry's stored owner (or, for SharedStorage, against the current writer set). Mismatch returns InvalidSignature.
- Replay nonce — incoming nonce must be strictly greater than the stored value; equal nonces are rejected as NonceReplay (i.e. the comparison is incoming > stored, not ≥). The nonce is the action's SignatureData.nonce, set from env::time_now() (a wall-clock nanosecond timestamp from SystemTime::now()) at action build time. Caveat: wall-clock time is not strictly monotonic across NTP slews, leap-second corrections, VM-host clock changes, or process restarts. Two consecutive writes on the same node can collide or step backward under those conditions, in which case the second write is rejected as NonceReplay; conversely a node with a far-future clock can write a nonce that effectively locks the entity until later honest writes catch up. Per the inline comment in interface.rs, the nonce check is itself a transitional v2 mechanism that the project plans to retire after a soak period in favour of DAG-causal verification.
- Per-entity scope — the stored nonce lives on the individual storage entity (the row keyed by entity id), not on the owner globally. For AuthoredMap each map key is its own entity; for UserStorage each per-user slot; for SharedStorage the single value.
- Entity binding — a signed action targeting one entity cannot be replayed against a different entity even when the same key signed both: the entity id is part of the bytes hashed by Action::payload_for_signing(), so the signature is bound to that specific entity.
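The replay-nonce rule and its per-entity scope can be sketched in isolation. The example below assumes a `u64` entity id and a hypothetical `NonceTable`; signature verification and entity binding are deliberately left out, since they depend on the runtime's host functions.

```rust
use std::collections::HashMap;

/// Stand-in for the real entity id type.
type EntityId = u64;

#[derive(Debug, PartialEq, Eq)]
pub enum ApplyError {
    NonceReplay,
}

/// Last accepted nonce per entity (not per owner).
#[derive(Default)]
pub struct NonceTable {
    last: HashMap<EntityId, u64>,
}

impl NonceTable {
    /// Accepts the action only if `nonce` is strictly greater than the
    /// nonce stored for this entity, then records it.
    pub fn check_and_store(&mut self, entity: EntityId, nonce: u64) -> Result<(), ApplyError> {
        if let Some(&stored) = self.last.get(&entity) {
            if nonce <= stored {
                return Err(ApplyError::NonceReplay);
            }
        }
        self.last.insert(entity, nonce);
        Ok(())
    }
}
```

With this rule, a second write carrying an equal or lower timestamp on the same entity is rejected, while a write to a different entity with a lower nonce still succeeds. It also shows the caveat above: once a far-future nonce is stored for an entity, every honest write below it fails until wall-clock time passes that value.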
Key distinctions
UserStorage vs AuthoredMap
Both stamp entries with StorageType::User { owner }. The difference is keying:
- UserStorage<T>: key is the public key, slots are disjoint per user.
- AuthoredMap<K,V>: key is application-defined, owner is recorded in the entry's metadata. Two users can compete to insert the same key; first writer wins, subsequent updates are owner-only.
SharedStorage vs AuthoredMap
Both allow mutation after creation. The difference is the granularity of the writer set:
- SharedStorage<T>: one collection-level writer set governs one logical T. Several named members can co-author the same value.
- AuthoredMap<K,V>: per-entry writer set (currently size-1 = single author). The writer varies per key.
Why composition doesn't replace these primitives
AuthoredMap is not UnorderedMap<K, SharedStorage<V>>
You can nest SharedStorage inside an UnorderedMap for per-key multi-writer values, but the outer map is still public — outer keys can be inserted, overwritten, or removed by anyone. AuthoredMap puts the ownership stamp on the entry itself within a shared keyspace, so K cannot be replaced by non-owners.
SharedStorage is not UnorderedMap<K, UserStorage<V>>
You can nest UserStorage inside an UnorderedMap for per-key single-author values, but again the outer map is public (key-level tampering remains possible). And there is no way to model "several named people co-own this value" without a writer set — exactly what SharedStorage provides.
Merge Semantics
All collections implement the Mergeable trait. When state deltas arrive from peers, the storage layer calls merge() on each affected entry. The merge is:
- Commutative — merge(A, B) = merge(B, A)
- Associative — merge(merge(A, B), C) = merge(A, merge(B, C))
- Idempotent — merge(A, A) = A
These properties guarantee eventual consistency regardless of message ordering or duplication.
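The three laws are easy to check mechanically for a concrete merge function. The sketch below uses set union over `BTreeSet<u32>` as the merge, chosen only because it is a well-known function that satisfies all three; `merge` and `laws_hold` are illustrative helpers, not part of the storage crate.

```rust
use std::collections::BTreeSet;

/// Set union as a stand-in merge function.
pub fn merge(a: &BTreeSet<u32>, b: &BTreeSet<u32>) -> BTreeSet<u32> {
    a.union(b).copied().collect()
}

/// Checks commutativity, associativity, and idempotence for one triple.
pub fn laws_hold(a: &BTreeSet<u32>, b: &BTreeSet<u32>, c: &BTreeSet<u32>) -> bool {
    let commutative = merge(a, b) == merge(b, a);
    let associative = merge(&merge(a, b), c) == merge(a, &merge(b, c));
    let idempotent = merge(a, a) == *a;
    commutative && associative && idempotent
}
```

A check like this only covers the inputs it is given; a property-based test that generates many triples would give more confidence for a custom Mergeable implementation.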