
Concepts

Envelope encryption and the three-tier key hierarchy

Every document is sealed with its own random key, and keys are wrapped by keys, in three tiers:

  1. Master KEK (MKEK) - a single 256-bit key held by the Key module, and the only key the vault does not store itself. pdv never persists it: it fetches it from the Key module for a single wrap or unwrap and zeroizes it immediately afterwards.
  2. Subject KEK - one key per user (the "subject"). It is generated on the user's first write, wrapped with the Master KEK, and stored in pdv only in wrapped form, never in cleartext.
  3. Document DEK - one Data Encryption Key per document. It encrypts the document body and metadata, and is itself wrapped with the owner's Subject KEK and stored alongside the document.
Master KEK  (Key module / OpenBao)
  wraps ->  Subject KEK   (per user, stored wrapped in pdv)
    wraps ->  Document DEK  (per document, stored wrapped in pdv)
      encrypts ->  document body + metadata

To read a document the vault unwraps the Subject KEK with the Master KEK, unwraps the Document DEK with the Subject KEK, then decrypts the body. Raw keys live only for the duration of one operation and are wiped with sodium_memzero.
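The write and read paths can be sketched in Go. This is a minimal illustration under stated assumptions, not pdv's code: Go's standard library has no XChaCha20-Poly1305, so AES-256-GCM from `crypto/cipher` stands in for it; the helper names (`random`, `seal`, `open`, `wipe`, `writeDoc`, `readDoc`) are hypothetical; AAD is reduced to the document ID; and the Master KEK is a local byte slice where pdv would fetch it from the Key module.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// random returns n bytes from the OS CSPRNG (keys and nonces).
func random(n int) []byte {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return b
}

// gcm builds AES-256-GCM, standing in for libsodium's XChaCha20-Poly1305.
func gcm(key []byte) cipher.AEAD {
	b, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	g, err := cipher.NewGCM(b)
	if err != nil {
		panic(err)
	}
	return g
}

// seal encrypts pt under key, binds aad, and prepends the random nonce.
func seal(key, pt, aad []byte) []byte {
	g := gcm(key)
	nonce := random(g.NonceSize())
	return g.Seal(nonce, nonce, pt, aad)
}

// open splits off the nonce, then authenticates and decrypts the rest.
func open(key, ct, aad []byte) ([]byte, error) {
	g := gcm(key)
	if len(ct) < g.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	return g.Open(nil, ct[:g.NonceSize()], ct[g.NonceSize():], aad)
}

// wipe zeroes a raw key; best effort only, Go gives no sodium_memzero guarantee.
func wipe(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

// storedDoc is what the vault persists: only wrapped keys and ciphertext.
type storedDoc struct{ wrappedSubjectKEK, wrappedDEK, body []byte }

// writeDoc models a user's first write (fresh Subject KEK and DEK). AAD is
// simplified to the document ID; pdv also binds the body to its owner.
func writeDoc(mkek []byte, docID string, body []byte) storedDoc {
	kek, dek := random(32), random(32)
	defer wipe(kek)
	defer wipe(dek)
	return storedDoc{
		wrappedSubjectKEK: seal(mkek, kek, nil),
		wrappedDEK:        seal(kek, dek, []byte(docID)),
		body:              seal(dek, body, []byte(docID)),
	}
}

// readDoc: the MKEK unwraps the Subject KEK, which unwraps the DEK, which
// decrypts the body. Each raw key is wiped when readDoc returns.
func readDoc(mkek []byte, docID string, d storedDoc) ([]byte, error) {
	kek, err := open(mkek, d.wrappedSubjectKEK, nil)
	if err != nil {
		return nil, err
	}
	defer wipe(kek)
	dek, err := open(kek, d.wrappedDEK, []byte(docID))
	if err != nil {
		return nil, err
	}
	defer wipe(dek)
	return open(dek, d.body, []byte(docID))
}

func main() {
	mkek := random(32) // in pdv: fetched from the Key module, never persisted
	defer wipe(mkek)
	d := writeDoc(mkek, "doc-1", []byte("passport scan"))
	body, err := readDoc(mkek, "doc-1", d)
	fmt.Println(string(body), err)
}
```

`writeDoc` always generates a fresh Subject KEK because it models the first write; later writes by the same user would unwrap and reuse the stored one.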

Cryptography

Bodies and metadata are encrypted with libsodium AEAD. New data is sealed with XChaCha20-Poly1305, which runs on every host (pure software, constant-time, 192-bit nonce), so data sealed today stays decryptable on whatever host it is later moved to. The chosen suite is recorded per document. AES-256-GCM remains supported for decrypting existing records and as an explicit, operator-pinned choice on a homogeneous AES-NI fleet, but it is never selected automatically: libsodium's AES-256-GCM requires hardware AES support, so a host without AES-NI cannot run it, and AES-GCM-sealed data would be stranded after a DR failover or a move to non-AES-NI hardware.
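The suite policy comes down to two decisions: which suite seals new data, and whether this host can open a document given the suite recorded on it. The following is a minimal Go sketch. The `Suite` identifier strings, `Policy.PinAES`, and `OpenSuite` are hypothetical names, and AES-NI availability is passed in as a flag rather than probed from the CPU.

```go
package main

import (
	"errors"
	"fmt"
)

// Suite names the AEAD a document was sealed with; it is stored per document.
// The identifier strings are hypothetical, not pdv's stored values.
type Suite string

const (
	XChaCha20Poly1305 Suite = "xchacha20poly1305"
	AES256GCM         Suite = "aes256gcm"
)

// Policy is the operator configuration. PinAES is an explicit opt-in for a
// fleet where every host, DR targets included, has AES-NI.
type Policy struct {
	PinAES bool
}

// SealSuite picks the suite for new data. It deliberately ignores whether the
// current host has AES-NI: only an explicit pin selects AES-256-GCM.
func (p Policy) SealSuite() Suite {
	if p.PinAES {
		return AES256GCM
	}
	return XChaCha20Poly1305
}

// OpenSuite checks whether this host can decrypt a document, using the suite
// recorded on the document rather than the current sealing policy.
func OpenSuite(recorded Suite, hostHasAESNI bool) error {
	switch recorded {
	case XChaCha20Poly1305:
		return nil // pure software: available on every host
	case AES256GCM:
		if !hostHasAESNI {
			return errors.New("aes256gcm record cannot be opened on a host without AES-NI")
		}
		return nil
	default:
		return fmt.Errorf("unknown suite %q", recorded)
	}
}

func main() {
	fmt.Println(Policy{}.SealSuite())
	fmt.Println(OpenSuite(AES256GCM, false))
}
```

Keeping the recorded suite separate from the sealing policy is what lets an operator drop the AES pin later without losing access to records already sealed with AES-256-GCM.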

Each ciphertext is bound to its context with Additional Authenticated Data (AAD): the document body, its metadata, and its wrapped DEK are each bound to the document UUID, and the body is additionally bound to its owner. A blob lifted from one document and replayed into another fails authentication instead of decrypting.
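The replay protection can be shown by letting an attacker with write access to the database copy document A's wrapped DEK and body into document B's row (same owner, same Subject KEK). This is a minimal Go sketch. AES-256-GCM stands in for XChaCha20-Poly1305, the function names are hypothetical, and the `uuid:owner` AAD encoding is an assumption, not pdv's format.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// random returns n bytes from the OS CSPRNG (keys and nonces).
func random(n int) []byte {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return b
}

// gcm builds AES-256-GCM, standing in for libsodium's XChaCha20-Poly1305.
func gcm(key []byte) cipher.AEAD {
	b, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	g, err := cipher.NewGCM(b)
	if err != nil {
		panic(err)
	}
	return g
}

func seal(key, pt, aad []byte) []byte {
	g := gcm(key)
	nonce := random(g.NonceSize())
	return g.Seal(nonce, nonce, pt, aad)
}

func open(key, ct, aad []byte) ([]byte, error) {
	g := gcm(key)
	if len(ct) < g.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	return g.Open(nil, ct[:g.NonceSize()], ct[g.NonceSize():], aad)
}

// row is one document as stored: the wrapped DEK and the body ciphertext.
type row struct{ wrappedDEK, body []byte }

// dekAAD binds a wrapped DEK to its document; bodyAAD binds a body to its
// document and owner. The encoding is hypothetical; UUIDs contain no ':'.
func dekAAD(docID string) []byte         { return []byte(docID) }
func bodyAAD(docID, owner string) []byte { return []byte(docID + ":" + owner) }

// store seals body under a fresh DEK and wraps the DEK with the Subject KEK.
func store(kek []byte, docID, owner string, body []byte) row {
	dek := random(32)
	return row{
		wrappedDEK: seal(kek, dek, dekAAD(docID)),
		body:       seal(dek, body, bodyAAD(docID, owner)),
	}
}

// load reverses store; a mismatched document ID or owner fails authentication.
func load(kek []byte, docID, owner string, r row) ([]byte, error) {
	dek, err := open(kek, r.wrappedDEK, dekAAD(docID))
	if err != nil {
		return nil, err
	}
	return open(dek, r.body, bodyAAD(docID, owner))
}

func main() {
	kek := random(32) // alice's Subject KEK, already unwrapped
	a := store(kek, "uuid-A", "alice", []byte("A's secret"))
	// An attacker with database write access copies A's blobs into B's row.
	_, err := load(kek, "uuid-B", "alice", a)
	fmt.Println(err != nil) // the replay is rejected
}
```

With nil AAD instead, the copied blobs would unwrap and decrypt cleanly under the shared Subject KEK, and document B would silently show A's content.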

Cryptographic erasure

Deleting a document removes its wrapped DEK, so the body becomes unrecoverable even if its ciphertext survives in a backup. This is key destruction, not byte overwriting (NIST SP 800-88 "cryptographic erase").

The same idea works per user: destroying a Subject KEK makes every one of that user's documents unrecoverable at once, which is how a GDPR-style erasure of an entire vault is performed.
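Both forms of erasure can be sketched with an in-memory store. This is a minimal Go sketch with several assumptions. AES-256-GCM stands in for XChaCha20-Poly1305. The MKEK tier and key zeroization are left out to keep it short. The names `vault`, `deleteDoc`, and `eraseSubject` are hypothetical. Deleting a document keeps its body ciphertext on purpose, to model a copy that survived in a backup.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// random returns n bytes from the OS CSPRNG (keys and nonces).
func random(n int) []byte {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return b
}

// gcm builds AES-256-GCM, standing in for libsodium's XChaCha20-Poly1305.
func gcm(key []byte) cipher.AEAD {
	b, err := aes.NewCipher(key)
	if err != nil {
		panic(err)
	}
	g, err := cipher.NewGCM(b)
	if err != nil {
		panic(err)
	}
	return g
}

// seal prepends a random nonce; open expects ciphertexts produced by seal.
func seal(key, pt, aad []byte) []byte {
	g := gcm(key)
	nonce := random(g.NonceSize())
	return g.Seal(nonce, nonce, pt, aad)
}

func open(key, ct, aad []byte) ([]byte, error) {
	g := gcm(key)
	return g.Open(nil, ct[:g.NonceSize()], ct[g.NonceSize():], aad)
}

// doc is one stored row. The MKEK tier is omitted: Subject KEKs sit unwrapped in vault.keks.
type doc struct {
	owner            string
	wrappedDEK, body []byte
}

type vault struct {
	keks map[string][]byte // owner -> Subject KEK
	docs map[string]*doc   // document ID -> stored row
}

func (v *vault) put(owner, id string, body []byte) {
	if v.keks[owner] == nil {
		v.keks[owner] = random(32) // Subject KEK created on the owner's first write
	}
	dek := random(32)
	v.docs[id] = &doc{owner, seal(v.keks[owner], dek, []byte(id)), seal(dek, body, []byte(id))}
}

func (v *vault) get(id string) ([]byte, error) {
	d := v.docs[id]
	if d == nil || d.wrappedDEK == nil {
		return nil, errors.New("DEK destroyed")
	}
	kek := v.keks[d.owner]
	if kek == nil {
		return nil, errors.New("subject KEK destroyed")
	}
	dek, err := open(kek, d.wrappedDEK, []byte(id))
	if err != nil {
		return nil, err
	}
	return open(dek, d.body, []byte(id))
}

// deleteDoc destroys only the wrapped DEK; the body ciphertext stays, like a backup copy.
func (v *vault) deleteDoc(id string) {
	if d := v.docs[id]; d != nil {
		d.wrappedDEK = nil
	}
}

// eraseSubject destroys the Subject KEK and with it every document of that owner.
func (v *vault) eraseSubject(owner string) { delete(v.keks, owner) }

func main() {
	v := &vault{keks: map[string][]byte{}, docs: map[string]*doc{}}
	v.put("alice", "d1", []byte("tax return"))
	v.put("alice", "d2", []byte("payslip"))
	v.deleteDoc("d1")       // per-document erasure
	v.eraseSubject("alice") // per-user erasure
	_, err1 := v.get("d1")
	_, err2 := v.get("d2")
	fmt.Println(err1, "|", err2)
}
```

In both cases no ciphertext is touched. Erasure works because the only key that can open it no longer exists.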

Confidentiality at rest

A database dump or a disk image reveals nothing useful:

  • Document bodies are ciphertext, whether held in a managed file or inline in the database row.
  • Metadata (label, original filename, MIME type) is encrypted into a separate per-item blob, not stored in cleartext columns.
  • File-backed ciphertext is written to opaque random names with no extension, so the file listing does not leak names, types, or sizes by inference.

Where a body lives is a storage choice, not a confidentiality one. A record is just a file with no filename and a JSON MIME type, so records and file documents are stored the same way: a body at or below a configurable size threshold is kept inline in the row (atomic with it, no orphaned files, no file-usage bookkeeping), and anything larger uses a file. Both forms are AEAD ciphertext; the threshold is set on the vault settings form.
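The placement rule and the opaque file naming can be sketched together. This is a minimal Go sketch. `place`, `opaqueName`, and the 64 KiB threshold in `main` are hypothetical. The text above does not say whether the threshold applies to plaintext or ciphertext length, so the sketch assumes ciphertext length. The 128-bit hex name is likewise an assumption.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// Placement says where an (already encrypted) body is stored.
type Placement string

const (
	Inline Placement = "inline" // in the document row, atomic with it
	File   Placement = "file"   // in a managed file with an opaque name
)

// place applies the threshold from the vault settings. The same rule covers
// records (JSON) and file documents; using ciphertext length is an assumption.
func place(ciphertextLen, threshold int) Placement {
	if ciphertextLen <= threshold {
		return Inline
	}
	return File
}

// opaqueName returns a random 128-bit hex name with no extension, so the
// listing reveals neither the original filename nor the MIME type.
func opaqueName() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

func main() {
	const threshold = 64 * 1024 // hypothetical setting value
	fmt.Println(place(2_000, threshold), place(5_000_000, threshold))
	fmt.Println(opaqueName())
}
```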

What is not hidden: the existence of a document row, its owner, its kind, and its ciphertext size. The boundary is confidentiality of content, not traffic analysis.

Identity before authorization

A consumer (an in-Drupal workflow or an external API client, modeled by the Consumers module) is always identified before it can be granted anything. Consent to share is captured on the vault site, never asserted by the consumer.

Authorization: trusts, grants, and orthogonal scopes

Once a consumer is identified, two independent kinds of authorization grant it access to a user's items:

  • a grant - per item (this consumer may read, or write, this item);
  • a trust - per kind (this consumer may read, or write, any item of this kind, now and in future), so the owner is not re-asked each time.

Both carry a scope, and the scopes are orthogonal: read and write are independent. A write authorization does not confer read, and a read authorization does not confer write. A consumer that needs both - for example a form that prefills from the vault (read) and saves the user's edits back (write) - must hold both, and is prompted for each as it is first needed. This keeps data minimisation tight: a write-only "drop box" consumer can store data it can never read back.
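The check implied by grants, trusts, and orthogonal scopes can be sketched as a bit set in which Read and Write are independent bits. This is a minimal Go sketch. `Policy`, `Grant`, `Trust`, and `Allowed` are hypothetical names. Keying trusts by owner as well as kind is an assumption, and so is pooling scopes from a grant and a trust.

```go
package main

import "fmt"

// Scope is a bit set. Read and Write are separate bits and no rule maps one
// to the other, which is what makes the scopes orthogonal.
type Scope uint8

const (
	Read Scope = 1 << iota
	Write
)

// Item is the part of a vault item the check needs.
type Item struct{ ID, Owner, Kind string }

// Policy stores grants per (consumer, item) and trusts per (consumer, owner, kind).
type Policy struct {
	grants map[[2]string]Scope
	trusts map[[3]string]Scope
}

func NewPolicy() *Policy {
	return &Policy{grants: map[[2]string]Scope{}, trusts: map[[3]string]Scope{}}
}

// Grant records a per-item authorization approved by the owner.
func (p *Policy) Grant(consumer, itemID string, s Scope) {
	p.grants[[2]string{consumer, itemID}] |= s
}

// Trust records a per-kind authorization covering current and future items.
func (p *Policy) Trust(consumer, owner, kind string, s Scope) {
	p.trusts[[3]string{consumer, owner, kind}] |= s
}

// Allowed reports whether the consumer holds every scope in need for it.
// Scopes from a grant and a trust are pooled (an assumption of this sketch).
func (p *Policy) Allowed(consumer string, it Item, need Scope) bool {
	have := p.grants[[2]string{consumer, it.ID}] | p.trusts[[3]string{consumer, it.Owner, it.Kind}]
	return need != 0 && have&need == need
}

func main() {
	p := NewPolicy()
	addr := Item{ID: "item-7", Owner: "alice", Kind: "address"}
	p.Trust("dropbox", "alice", "address", Write) // write-only drop box
	fmt.Println(p.Allowed("dropbox", addr, Write), p.Allowed("dropbox", addr, Read))
}
```

No code path derives Read from Write. As a result, the drop-box consumer in `main` can store an address but is refused when it asks to read one.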

The owner establishes these through the consent ceremony, or directly from their own vault page. A write to an existing record is applied by the vault reading and merging the record internally, so a write-only consumer can update a record without ever being able to read the result.
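A write that merges without conferring read can be sketched as follows. This is a minimal Go sketch. Encryption is omitted, records are flat string maps, and the merge is a shallow field overwrite. `Vault`, `Read`, `Write`, and `errDenied` are hypothetical names. The point is that `Write` reads the stored record itself but returns nothing derived from it, so it checks only the Write scope.

```go
package main

import (
	"errors"
	"fmt"
)

var errDenied = errors.New("denied")

// Vault keeps records as plaintext string maps; encryption is omitted here.
type Vault struct {
	records  map[string]map[string]string // item ID -> record fields
	canRead  map[string]bool              // key(consumer, item) -> holds Read
	canWrite map[string]bool              // key(consumer, item) -> holds Write
}

func key(consumer, item string) string { return consumer + "/" + item }

// Read returns a copy of the record and requires the Read scope.
func (v *Vault) Read(consumer, item string) (map[string]string, error) {
	if !v.canRead[key(consumer, item)] {
		return nil, errDenied
	}
	out := map[string]string{}
	for k, val := range v.records[item] {
		out[k] = val
	}
	return out, nil
}

// Write needs only the Write scope. The vault reads the current record
// internally, overlays the patch field by field, stores the result, and
// returns nothing derived from the record to the consumer.
func (v *Vault) Write(consumer, item string, patch map[string]string) error {
	if !v.canWrite[key(consumer, item)] {
		return errDenied
	}
	merged := map[string]string{}
	for k, val := range v.records[item] { // vault-internal read
		merged[k] = val
	}
	for k, val := range patch {
		merged[k] = val
	}
	v.records[item] = merged
	return nil
}

func main() {
	v := &Vault{
		records:  map[string]map[string]string{"addr": {"street": "Main St 1", "city": "Ghent"}},
		canRead:  map[string]bool{},
		canWrite: map[string]bool{key("dropbox", "addr"): true},
	}
	fmt.Println(v.Write("dropbox", "addr", map[string]string{"city": "Leuven"}))
	_, err := v.Read("dropbox", "addr")
	fmt.Println(err, v.records["addr"])
}
```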