Concepts

Envelope encryption and the three-tier key hierarchy

Every document is sealed with its own random key, and keys are wrapped by keys, in three tiers:

Master KEK (MKEK) - a single 256-bit key held by the Key module. It is the only key the vault does not store itself. pdv never persists it: it requests the key from the Key module for a single wrap or unwrap and zeroizes it immediately afterwards.
Subject KEK - one key per user (the "subject"). It is generated on the user's first write, wrapped with the Master KEK, and stored in pdv only in wrapped form, never in cleartext.
Document DEK - one Data Encryption Key per document. It encrypts the document body and metadata, and is itself wrapped with the owner's Subject KEK and stored alongside the document.

```mermaid
flowchart TD
  MKEK["Master KEK<br>(Key module / OpenBao)"]
  SK["Subject KEK<br>(per user, stored wrapped)"]
  DEK["Document DEK<br>(per document, stored wrapped)"]
  Body["Document body + metadata"]
  MKEK -- wraps --> SK
  SK -- wraps --> DEK
  DEK -- encrypts --> Body
```

To read a document the vault unwraps the Subject KEK with the Master KEK, unwraps the Document DEK with the Subject KEK, then decrypts the body. Raw keys live only for the duration of one operation and are wiped with sodium_memzero.

Cryptography

Bodies and metadata are encrypted with libsodium AEAD. New data is sealed with XChaCha20-Poly1305, which runs on every host (pure software, constant-time, 192-bit nonce), so data stays decryptable on whatever host the vault later runs on. The chosen suite is recorded per document. AES-256-GCM stays supported for decrypting existing records and as an explicit, operator-pinned choice on a homogeneous AES-NI fleet. It is never selected automatically, because a host without AES-NI cannot run it, which would strand AES-GCM-sealed data after a DR failover or a move to non-AES-NI hardware.

Each ciphertext is bound to its context with Additional Authenticated Data (AAD): the document body, its metadata, and its wrapped DEK are each bound to the document UUID (and the body also to the owner). A blob cannot be lifted from one document and replayed into another.
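The read path and the AAD binding described above can be sketched in a few lines. This is only an illustration under stated assumptions: the vault uses libsodium AEAD, whereas the `seal` and `open_sealed` helpers here are a toy encrypt-then-MAC construction built from HMAC-SHA256 so that the example runs with the Python standard library alone. The function names, the AAD encoding (`doc_uuid + b"|" + owner`), and the empty AAD on the wrapped Subject KEK are hypothetical choices, and key zeroization is omitted.

```python
import hashlib
import hmac
import secrets

NONCE_LEN, TAG_LEN = 24, 32


def _subkey(key: bytes, label: bytes) -> bytes:
    # Derive separate encryption and MAC keys from one input key.
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # SHA-256 in counter mode; a toy stand-in, NOT a vetted stream cipher.
    blocks = [
        hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def _tag(key: bytes, nonce: bytes, aad: bytes, ct: bytes) -> bytes:
    # Length-prefix the AAD so the (aad, ciphertext) boundary is unambiguous.
    msg = len(aad).to_bytes(8, "big") + aad + nonce + ct
    return hmac.new(_subkey(key, b"mac"), msg, hashlib.sha256).digest()


def seal(key: bytes, plaintext: bytes, aad: bytes) -> bytes:
    """Toy AEAD stand-in (not libsodium): returns nonce || ciphertext || tag."""
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ct = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + ct + _tag(key, nonce, aad, ct)


def open_sealed(key: bytes, blob: bytes, aad: bytes) -> bytes:
    """Verify the tag, which covers the AAD, before decrypting."""
    nonce, ct, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    if not hmac.compare_digest(tag, _tag(key, nonce, aad, ct)):
        raise ValueError("authentication failed")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ct))
    return bytes(c ^ s for c, s in zip(ct, stream))


def read_document(mkek, wrapped_subject_kek, wrapped_dek, body_ct, doc_uuid, owner):
    # Tier 1: the Master KEK unwraps the Subject KEK.
    subject_kek = open_sealed(mkek, wrapped_subject_kek, b"")
    # Tier 2: the Subject KEK unwraps the DEK, bound to the document UUID.
    dek = open_sealed(subject_kek, wrapped_dek, doc_uuid)
    # Tier 3: the DEK decrypts the body, bound to the UUID and the owner.
    return open_sealed(dek, body_ct, doc_uuid + b"|" + owner)


# Write path for one document (all identifiers are made up for the example).
mkek = secrets.token_bytes(32)
subject_kek = secrets.token_bytes(32)
dek = secrets.token_bytes(32)
doc_uuid, owner = b"doc-1", b"user-42"

wrapped_subject_kek = seal(mkek, subject_kek, b"")
wrapped_dek = seal(subject_kek, dek, doc_uuid)
body_ct = seal(dek, b"hello vault", doc_uuid + b"|" + owner)
```

Calling `read_document` with the original UUID returns the body. Passing the same blobs with a different UUID raises `ValueError`, which is the replay protection the AAD binding is meant to give.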

Owner passphrase (gating against the operator)

By default the Master KEK alone unwraps a Subject KEK, so the operator (anyone who can reach the Key module's Master KEK) can in principle decrypt a vault. An owner may raise that bar by enabling a passphrase: a secret only they know, which no administrator can recover.

When enabled, the Subject KEK is wrapped in two layers rather than one:

```mermaid
flowchart TD
  SK["Subject KEK"]
  PP["Passphrase-derived key<br>(Argon2id, libsodium pwhash)"]
  MKEK["Master KEK"]
  PP -- wraps --> SK
  MKEK -- wraps --> PP
```

An unwrap now needs both the Master KEK and the passphrase, so a database dump plus the Master KEK is no longer enough; the operator is locked out. This lifts the threat model from "protect against database theft" to "protect against the operator". Enabling never lowers at-rest protection, because the Master KEK layer stays on top of the passphrase layer.

The protection is per subject (the whole vault), not per item: enabling it gates all of the owner's content at once. The passphrase is verified by attempting the unwrap, so the vault never stores it or a separate hash of it.

Unlocking a gated vault places the Subject KEK in a short-lived session that is cleared by an idle timeout or by logout. The passphrase itself is never kept:

```mermaid
sequenceDiagram
  actor Owner
  participant Vault
  participant Session as Unlocked-keys session
  Owner->>Vault: Enter passphrase
  Vault->>Vault: Derive key (Argon2id)
  Vault->>Vault: Unwrap with Master KEK, then with the derived key
  Vault->>Session: Hold the Subject KEK (sliding idle TTL)
  Note over Session: Each vault access renews the TTL, idle timeout or logout re-locks
```
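The sliding idle TTL in the diagram above could behave roughly as follows. This is a minimal sketch, not the vault's session code: the class name `UnlockedKeySession`, its methods, and the injectable `clock` parameter are hypothetical, and the byte-by-byte wipe only approximates what `sodium_memzero` does in the real implementation.

```python
import time
from typing import Callable, Optional


class UnlockedKeySession:
    """Holds an unwrapped Subject KEK until it has been idle for too long."""

    def __init__(self, idle_ttl: float, clock: Callable[[], float] = time.monotonic):
        self._idle_ttl = idle_ttl
        self._clock = clock
        self._key: Optional[bytearray] = None
        self._expires_at = 0.0

    def unlock(self, subject_kek: bytes) -> None:
        # Called after a successful passphrase unwrap; the passphrase is not kept.
        self._key = bytearray(subject_kek)
        self._expires_at = self._clock() + self._idle_ttl

    def get_key(self) -> Optional[bytes]:
        # Returns the key and renews the TTL, or None if the session is locked.
        if self._key is None:
            return None
        if self._clock() >= self._expires_at:
            self.lock()
            return None
        self._expires_at = self._clock() + self._idle_ttl
        return bytes(self._key)

    def lock(self) -> None:
        # Logout or idle timeout: overwrite the key bytes, then drop them.
        if self._key is not None:
            for i in range(len(self._key)):
                self._key[i] = 0
            self._key = None
```

With a 10-second TTL, an access at 9 seconds pushes the deadline to 19 seconds, while a gap of 10 seconds or more between accesses re-locks the session and the owner has to enter the passphrase again.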

Two consequences are inherent to the design and are surfaced in the UI:

A forgotten passphrase leaves the vault unrecoverable by anyone, so it is opt-in with a clear warning.
A consumer cannot read or write a gated vault while the owner is absent (the Subject KEK cannot be unwrapped without the passphrase). Cross-site unlock bridges this while the owner is present: see the cross-site threat model.

Cryptographic erasure

Deleting a document removes its wrapped DEK, so the body becomes unrecoverable even if the ciphertext file survives in a backup. This is key destruction, not byte overwriting (NIST SP 800-88 "cryptographic erase").

The same idea works per user: destroying a Subject KEK makes every one of that user's documents unrecoverable at once, which is how a GDPR-style erasure of an entire vault is performed.

Confidentiality at rest

A database dump or a disk image reveals no document content:

Document bodies are ciphertext, whether held in a managed file or inline in the database row.
Metadata (label, original filename, MIME type) is encrypted into a separate per-item blob, not stored in cleartext columns.
File-backed ciphertext is written to opaque random names with no extension, so the file listing does not leak original filenames or types. Ciphertext size remains visible.

Where a body lives is a storage choice, not a confidentiality one. A record is just a file with no filename and a JSON MIME type, so records and file documents are stored the same way: a body at or below a configurable size threshold is kept inline in the row (atomic with it, no orphaned files, no file-usage bookkeeping), and anything larger uses a file. Both forms are AEAD ciphertext; the threshold is set on the vault settings form.
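The placement rule can be sketched as a single comparison. This is an illustration only: `StoredBody`, `place_body`, and the 32-hex-character file name are hypothetical, the sketch picks a name without writing the file, and it assumes the threshold is compared against the ciphertext length, which the text above does not specify.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredBody:
    inline_ciphertext: Optional[bytes]  # set when the body lives in the row
    file_name: Optional[str]            # set when the body lives in a file


def place_body(ciphertext: bytes, inline_threshold: int) -> StoredBody:
    # At or below the threshold: keep the ciphertext inline in the row.
    if len(ciphertext) <= inline_threshold:
        return StoredBody(inline_ciphertext=ciphertext, file_name=None)
    # Larger bodies go to a file with an opaque random name and no extension.
    return StoredBody(inline_ciphertext=None, file_name=secrets.token_hex(16))
```

Either way the stored bytes are AEAD ciphertext, so changing the threshold affects storage layout and bookkeeping, not confidentiality.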

What is not hidden: the existence of a document row, its owner, its kind, and its ciphertext size. The boundary is confidentiality of content, not traffic analysis.

Identity before authorization

A consumer (an in-Drupal workflow or an external API client, modeled by the Consumers module) is always identified before it can be granted anything. Consent to share is captured on the vault site, never asserted by the consumer.

Authorization: trusts, grants, and orthogonal scopes

Once a consumer is identified, two independent kinds of authorization grant it access to a user's items:

a grant - per item (this consumer may read, or write, this item);
a trust - per kind (this consumer may read, or write, any item of this kind, now and in future), so the owner is not re-asked each time.

Both carry a scope, and the scopes are orthogonal: read and write are independent. A write authorization does not confer read, and a read authorization does not confer write. A consumer that needs both - for example a form that prefills from the vault (read) and saves the user's edits back (write) - must hold both, and is prompted for each as it is first needed. This keeps data minimisation tight: a write-only "drop box" consumer can store data it can never read back.

The owner establishes these through the consent ceremony, or directly from their own vault page. A write updates a record by reading and merging inside the vault, so a write-only consumer can update a record without being able to read the result.
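One way to picture the access check is as a lookup in two sets, one for grants and one for trusts, each keyed by scope. The sketch below assumes nothing about the module's actual data model: `Scope`, `Item`, `Authorizations`, and `is_allowed` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class Item:
    item_id: str
    kind: str


@dataclass
class Authorizations:
    # Grants are per item: (consumer, item_id, scope).
    grants: set = field(default_factory=set)
    # Trusts are per kind: (consumer, kind, scope).
    trusts: set = field(default_factory=set)

    def is_allowed(self, consumer: str, item: Item, scope: Scope) -> bool:
        # Scopes are orthogonal: only an exact scope match authorizes access.
        return (
            (consumer, item.item_id, scope) in self.grants
            or (consumer, item.kind, scope) in self.trusts
        )


# A write-only "drop box" consumer: trusted to write any item of kind "address".
auth = Authorizations()
auth.trusts.add(("dropbox-form", "address", Scope.WRITE))
home = Item(item_id="item-1", kind="address")
```

Here `auth.is_allowed("dropbox-form", home, Scope.WRITE)` is true while the same check with `Scope.READ` is false. A form that both prefills and saves would need a second, separate read authorization.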