Cross-site API: design and threat model

This is the design rationale and threat model for the cross-site surface. For the endpoint contract see the API reference; for setting up the two sites see Client-server configuration.

Goal

Let a consumer on a different site use a user's vault over HTTP. The consumer site does not have pdv installed; it holds OAuth client credentials and talks to the vault's API. Today's same-site shortcuts (the pdv_file element reading the vault in-process, local decryption, in-process grant checks) are replaced by HTTP calls.

Core principle: expose ConsumerApi, never Vault

Vault is the unguarded core: it decrypts any item with the Master KEK and has no notion of a consumer. It is never reachable over the network.

ConsumerApiInterface is the authorization boundary: every call is made as a consumer and bounded by that consumer's grants and trusts. The HTTP API is a thin transport in front of it: authenticate, resolve the acting consumer, delegate, serialize. All gating stays in ConsumerApi where it is tested. The same interface is implemented in-process locally and by a remote HTTP client on the consumer site, so local and remote consumers coexist on one vault with identical gating (see #12.0).

flowchart TD
  subgraph consumer["Remote consumer site"]
    RC["Remote client<br>(implements ConsumerApiInterface)"]
  end
  subgraph vault["Vault site"]
    API["HTTP API controllers"]
    CA["ConsumerApi<br>(the gate: grants / trusts)"]
    V["Vault<br>(crypto + Master KEK, in-process only)"]
  end
  RC -- "HTTPS + OAuth" --> API
  API --> CA
  CA --> V
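
The "one interface, two implementations" idea can be illustrated with a small sketch. It is not the project's code: the interface is reduced to a single hypothetical method (`can_read`), and `LocalConsumerApi`, `RemoteConsumerApi`, `visible_actions`, and the fake transport are names invented for illustration. The point is only that calling code depends on the interface and gets the same answer whether the gate runs in-process or behind a transport.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class ConsumerApiInterface(Protocol):
    """Hypothetical one-method slice of the real interface."""

    def can_read(self, user_ref: str, item_id: str) -> bool: ...


@dataclass
class LocalConsumerApi:
    """In-process implementation: checks the grant set directly."""

    consumer_id: str
    grants: set = field(default_factory=set)  # {(consumer_id, user_ref, item_id)}

    def can_read(self, user_ref: str, item_id: str) -> bool:
        return (self.consumer_id, user_ref, item_id) in self.grants


@dataclass
class RemoteConsumerApi:
    """Remote implementation: same method, answered over a transport."""

    transport: Callable[[str, dict], dict]  # stands in for an HTTPS + OAuth call

    def can_read(self, user_ref: str, item_id: str) -> bool:
        reply = self.transport("canRead", {"user_ref": user_ref, "item_id": item_id})
        return bool(reply.get("allowed", False))


def visible_actions(api: ConsumerApiInterface, user_ref: str, item_id: str) -> list:
    """Caller code depends only on the interface, not on where the gate runs."""
    return ["download"] if api.can_read(user_ref, item_id) else []


# Wire the remote client to the same gate through a fake transport.
local = LocalConsumerApi("consumer-a", {("consumer-a", "user-1", "item-9")})


def fake_transport(operation: str, params: dict) -> dict:
    assert operation == "canRead"
    return {"allowed": local.can_read(params["user_ref"], params["item_id"])}


remote = RemoteConsumerApi(fake_transport)
```

Because both objects satisfy the same protocol, `visible_actions(local, ...)` and `visible_actions(remote, ...)` return the same result for the same grant set.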

Two-layer authorization

Transport / identity (who is calling): an OAuth client-credentials bearer token identifies the consumer (a consumer entity client id). This is all the token proves: which consumer, not which user.
Data authorization (what they may touch): every ConsumerApi call is gated by the consumer's grants (per item) and trusts (per owner+kind). A consumer reaches only data a user explicitly shared with it.

On a multi-tenant site an outermost scope wraps both: each consumer is bound to a single tenant, so it can only ever reach vaults in that realm, independently of any grant or trust.

The user is referenced by an opaque per-consumer handle, never the raw uid. The handle is the uid, encrypted (AEAD) under a key derived from the consumer's client id, so resolution is a stateless decrypt and no handle-to-uid map is stored anywhere (see HandleCodec, #12.2). The handle is minted during the consent ceremony and is all the consumer ever sees or sends. The HTTP layer decodes handle to uid vault-side before delegating to ConsumerApi, which stays uid-based internally. Consequences:

The vault's internal user ids never leave the site.
The same user cannot be linked across different consumers (each gets a distinct handle), so colluding consumers cannot correlate a person.
A handle is meaningless without a grant/trust behind it: even a valid handle only reaches data the owner shared with that consumer, because every call is still gated by ConsumerApi.
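
A stdlib-only sketch of a stateless per-consumer handle follows. It is not the project's HandleCodec: Python's standard library has no AEAD, so this stand-in uses HKDF (RFC 5869) for key derivation and a toy encrypt-then-MAC construction built on HMAC-SHA256. The function names, the 8-byte uid encoding, the handle layout, and the choice of which HKDF input carries the client id are all assumptions made for illustration.

```python
import hashlib
import hmac
import os
import struct


class HandleError(Exception):
    """Raised for any malformed, foreign, or tampered handle."""


def hkdf_sha256(salt: bytes, ikm: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def _keys(site_salt: bytes, client_id: str) -> tuple:
    # Assumption: the client id is bound in via the HKDF "info" input.
    okm = hkdf_sha256(b"handle-sketch", site_salt, client_id.encode(), 64)
    return okm[:32], okm[32:]  # (encryption key, MAC key)


def encode_handle(site_salt: bytes, client_id: str, uid: int) -> bytes:
    enc_key, mac_key = _keys(site_salt, client_id)
    nonce = os.urandom(16)
    stream = hmac.new(enc_key, nonce, hashlib.sha256).digest()[:8]
    ct = bytes(a ^ b for a, b in zip(struct.pack(">Q", uid), stream))
    # The client id is also authenticated as associated data.
    tag = hmac.new(mac_key, client_id.encode() + nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def decode_handle(site_salt: bytes, client_id: str, handle: bytes) -> int:
    if len(handle) != 16 + 8 + 32:
        raise HandleError("malformed handle")
    enc_key, mac_key = _keys(site_salt, client_id)
    nonce, ct, tag = handle[:16], handle[16:24], handle[24:]
    expected = hmac.new(mac_key, client_id.encode() + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise HandleError("authentication failed")
    stream = hmac.new(enc_key, nonce, hashlib.sha256).digest()[:8]
    return struct.unpack(">Q", bytes(a ^ b for a, b in zip(ct, stream)))[0]


def handle_status(site_salt: bytes, client_id: str, handle: bytes) -> int:
    """Collapse every decode failure to one uniform status."""
    try:
        decode_handle(site_salt, client_id, handle)
    except HandleError:
        return 403
    return 200
```

No lookup table appears anywhere: decoding needs only the site secret and the client id taken from the validated token, and a handle presented under a different client id fails the tag check.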

Putting it together, a single cross-site read carries an OAuth token (the consumer's identity) and a handle (the opaque user reference), and is gated and decrypted entirely vault-side:

sequenceDiagram
  participant RC as Remote client
  participant API as Vault HTTP API
  participant CA as ConsumerApi gate
  participant V as Vault
  RC->>API: GET read, with OAuth token and handle
  API->>API: token identifies the consumer, decode handle to uid
  API->>CA: read item, acting as this consumer
  CA->>CA: check grants and trusts, and tenant
  alt Authorized
    CA->>V: unwrap Subject KEK then DEK, decrypt
    V-->>RC: plaintext bytes, no-store
  else Not authorized
    CA-->>RC: 403 uniform, no existence leak
  end
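
The controller's job in this flow (authenticate, resolve the consumer, decode the handle, delegate, serialize) can be sketched as below. Everything here is illustrative: the dictionaries stand in for the OAuth layer, the handle codec, and the ConsumerApi gate, and `read_endpoint`, `gated_read`, and `UNIFORM_403` are hypothetical names. The real handle resolution is a stateless decrypt rather than a lookup; a dict is used only to keep the sketch short.

```python
from dataclasses import dataclass


class NotAuthorized(Exception):
    """Any refusal: bad handle, missing item, wrong owner, no grant."""


@dataclass(frozen=True)
class Response:
    status: int
    body: bytes = b""
    headers: tuple = ()


UNIFORM_403 = Response(403, b"not authorized")

# Toy fixtures standing in for the OAuth layer, the handle codec, and the gate.
TOKENS = {"token-a": "consumer-a"}
HANDLES = {("consumer-a", "handle-x"): 42}
GRANTED = {("consumer-a", 42, "item-9"): b"plaintext bytes"}


def decode_handle(consumer_id: str, handle: str) -> int:
    try:
        return HANDLES[(consumer_id, handle)]
    except KeyError:
        raise NotAuthorized from None


def gated_read(consumer_id: str, uid: int, item_id: str) -> bytes:
    """Stand-in for the ConsumerApi gate: all authorization lives here."""
    try:
        return GRANTED[(consumer_id, uid, item_id)]
    except KeyError:
        raise NotAuthorized from None


def read_endpoint(token: str, handle: str, item_id: str) -> Response:
    """Thin transport: authenticate, resolve, delegate, serialize."""
    consumer_id = TOKENS.get(token)
    if consumer_id is None:
        return UNIFORM_403
    try:
        uid = decode_handle(consumer_id, handle)
        plaintext = gated_read(consumer_id, uid, item_id)
    except NotAuthorized:
        # One response for every refusal, so errors reveal nothing.
        return UNIFORM_403
    return Response(200, plaintext, (("Cache-Control", "no-store"),))
```

Every refusal path returns the same `UNIFORM_403` object, so a caller cannot distinguish a missing item from a missing grant or a foreign handle.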

API surface (maps 1:1 to ConsumerApiInterface)

GET read / readRecord / canRead / listReadableItems / kindAccess (per-kind read+write trust check, side-effect-free, so a UI can offer only actions that will succeed; also reports declined kinds so a re-prompting UI stops asking -- posture, never inventory)
GET kindLabels (/pdv-api/kind-labels) -- the vault's human, translatable kind labels for a language, so a consumer shows kind names in the user's language instead of machine names. Not user-scoped (shared reference metadata); the consumer caches the catalogue per connection and language.
POST/PUT createRecord / updateRecord / saveFile -- the trust-gated writes only (returns the ItemRef). A write succeeds when the consumer already holds standing write authorization and is refused (uniform 403) otherwise. The *WithConsent ConsumerApi variants are deliberately NOT exposed (see Decisions: cross-site writes never use inline consent).
Consent ceremony (#12.4): an unauthorized write returns a consent_required body with a consent_url; the owner is redirected to the existing approval pages, which mint the write authorization, and bounced back to the consumer's return_url.
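
On the consumer side, the write outcomes described above could be interpreted roughly as follows. This is a sketch under assumptions: the text names `consent_required` and `consent_url`, but the exact body shape is not given here, so the `error` and `item_ref` field names, the `WriteOutcome` type, and the `owner_present` flag are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WriteOutcome:
    action: str  # "stored", "redirect_owner", or "refused"
    item_ref: Optional[str] = None
    redirect_to: Optional[str] = None


def interpret_write_response(status: int, body: dict, owner_present: bool) -> WriteOutcome:
    """Map a write response to what the consumer should do next."""
    if 200 <= status < 300:
        return WriteOutcome("stored", item_ref=body.get("item_ref"))
    consent_url = body.get("consent_url")
    if body.get("error") == "consent_required" and consent_url and owner_present:
        # Only the owner can approve, so redirect only when the owner is present.
        return WriteOutcome("redirect_owner", redirect_to=consent_url)
    # Anything else is a plain refusal; the consumer cannot force the write.
    return WriteOutcome("refused")
```

The consumer never retries with an inline consent flag: the only way forward from a refusal is to send the owner through the ceremony.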

Assets and trust boundaries

Assets: users' decrypted PII (record values, file bytes), item metadata (labels, filenames, kinds), the grant/trust graph, OAuth client secrets, the Master KEK.
Trust boundary: the network between consumer site and vault site, and the consumer site itself (a separate operator). The vault must treat every request as hostile until authenticated and gated.
In the TCB, never exposed: Vault, the MKEK, raw ciphertext, another consumer's grants.

Threats and mitigations

| # | Threat | Mitigation |
|---|---|---|
| T1 | Bearer token theft / replay | HTTPS required; short-lived access tokens; rotation; tokens scoped to the consumer only. |
| T2 | Consumer impersonation | Client-credentials secret kept server-side on the consumer; never in a browser. |
| T3 | Over-broad access (consumer reads data it was not given) | Every call gated by grants/trusts in `ConsumerApi`; no token scope grants blanket vault access; least privilege via per-kind / per-item authorization. |
| T4 | IDOR: consumer asserts another user's `uid` | Existing checks: item ownership must match the asserted uid AND the consumer must hold a grant/trust the owner created. A wrong uid fails; a consumer only reaches what each owner shared with it. |
| T5 | Open redirect / leak via consumer `return_url` | Operator origin allowlist (`return_url_origins`); fail-closed for any unlisted origin. The shared guard `GrantRequestCallback::assertSafeReturnUrl` rejects credentials (userinfo) and backslashes, so the parsed host cannot diverge from the host a browser would navigate to (`https://ok\@evil/` no longer passes); the state token correlates the callback. |
| T6 | Consent-callback replay / CSRF | The `state` token is single-use and bound to the request; verified on callback. |
| T7 | Plaintext PII on the wire | Inherent (the consumer needs the data); mitigate with mandatory TLS, minimal fields returned (`ItemRef` carries no body), and audit logging of every read. |
| T8 | Enumeration / existence oracle via errors | Uniform "not authorized" responses; do not reveal whether an item exists, its kind, or decryption failures. |
| T9 | DoS / abuse of the public surface | Two in-app limits: a write-body cap (`pdv_server_api.settings.max_write_bytes`, default 25 MiB) on every write, and per-principal rate limiting (`ConsumerFloodGuard`) on reads, writes and consent starts -- an over-quota call gets HTTP 429 and emits a `pdv.flood.throttled` audit event. Read/write are limited on two dimensions, both on the cross-site API settings (`pdv_server_api.settings.flood`): per consumer (guards the server) and per consumer-per-user (guards one user vault from a single consumer pounding it within its overall budget). The consent-start limit is keyed by uid and lives on the core vault settings (`pdv.settings.flood`). A threshold of 0 disables that dimension. A reverse-proxy / WAF limiter is still recommended in front for network-layer floods. |
| T10 | Confused-deputy via CORS (browser callers) | Default to server-to-server only; no permissive CORS unless a browser flow is explicitly designed and reviewed. |
| T11 | Audit gaps | Actual reads/writes/consent emit an audit event with the correlation id, as same-site already does (ConsumerApi `pdv.read`/`pdv.write`). The cross-site surface additionally audits the operations the in-process API leaves silent -- listing readable items (`pdv.read.listed`) and access probes (`pdv.read.probed`), tagged `source=cross_site_api` -- so metadata harvesting and enumeration attempts are visible, not just successful decrypts (#12.2). Cross-system tracing: every request carries a vault-minted `correlation_id` (one per request via `CorrelationContext`), echoed back in the `X-Correlation-ID` response header so a caller can join its own logs to vault rows. A caller may also send `X-Correlation-ID`; it is sanitized and recorded as `client_ref` -- a join hint only, never an authorization input, so a spoofed value can at most mislabel its own rows. |
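
The T5 guard can be illustrated with a short sketch. It is not `GrantRequestCallback::assertSafeReturnUrl` itself; `is_safe_return_url` and the shape of the allowlist (a set of `scheme://host[:port]` strings) are assumptions. It shows the three properties the table names: backslashes rejected, userinfo rejected, and fail-closed matching against an origin allowlist.

```python
from urllib.parse import urlsplit


def is_safe_return_url(url: str, allowed_origins: frozenset) -> bool:
    # Browsers treat "\" like "/", so a backslash can make the parsed host
    # differ from the host actually navigated to. Reject it outright.
    if "\\" in url:
        return False
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    # Reject credentials (userinfo) in the authority component.
    if "@" in parts.netloc:
        return False
    # Relative or scheme-less URLs have no origin to check: fail closed.
    if not parts.scheme or not parts.hostname:
        return False
    origin = f"{parts.scheme}://{parts.hostname}"
    if port is not None:
        origin += f":{port}"
    # Fail closed: only an exact allowlisted origin passes.
    return origin in allowed_origins
```

With `https://ok\@evil/`, a lenient parser reports the host as `evil` while a browser would navigate to `ok`; rejecting the backslash before parsing removes that ambiguity regardless of which of the two hosts is allowlisted.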

Locked vaults

When an owner has enabled a passphrase, their Subject KEK cannot be unwrapped without it, so a consumer call against a locked vault fails closed: the server returns a vault_locked signal (HTTP 403) and the client surfaces it as a VaultLockedClientException. No data is read or written while the vault is locked.
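
A client might separate the locked-vault signal from an ordinary refusal along these lines. The exception name `VaultLockedClientException` and the HTTP 403 come from the text; how the `vault_locked` signal is carried is not specified here, so the `error` body field, the `NotAuthorizedClientException` class, and the `exception_for` helper are assumptions.

```python
from typing import Optional


class ClientException(Exception):
    """Base class for failures surfaced by the remote client (hypothetical)."""


class NotAuthorizedClientException(ClientException):
    """The uniform refusal: no detail about why."""


class VaultLockedClientException(ClientException):
    """The owner's vault is locked; nothing was read or written."""


def exception_for(status: int, body: dict) -> Optional[ClientException]:
    """Return the exception a response should raise, or None on success."""
    if 200 <= status < 300:
        return None
    if status == 403 and body.get("error") == "vault_locked":
        return VaultLockedClientException("vault is locked; owner must unlock")
    if status == 403:
        return NotAuthorizedClientException("not authorized")
    return ClientException(f"unexpected status {status}")
```

Keeping the locked case as its own exception type lets calling code offer an unlock prompt when the owner is present, while treating every other 403 as a plain refusal.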

A consumer can prompt the owner to unlock, but only while the owner is present in the browser, and without ever handling the passphrase or the Subject KEK itself. Unlock uses a split key:

The owner unlocks on the vault site; the session key Ks lives only in the owner's browser (a vault-domain cookie / X-Pdv-Vault-Unlock header). The vault stores enc_Ks(Subject KEK) keyed by a hash of Ks, never Ks itself.
For a cross-site consumer, the vault hands back a one-time code (single use, consumer-bound, short TTL) on the redirect. The consumer exchanges it server-to-server for Ks, then relays Ks as a request header on subsequent calls. Ks is never persisted on the consumer; it is held in a cookie on the consumer's own domain for the duration.
The code's payload carries Ks encrypted under a key derived from the code itself, so the stored handoff row is useless to anyone who does not present the code. Redemption is single-use and bound to the minting consumer; a wrong consumer or a second redemption fails closed.

The owner passphrase is therefore never transmitted off the vault site, and the operator-lockout property of the passphrase holds across the cross-site flow.
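
The one-time code handoff can be sketched with the standard library alone. This is an illustration, not the project's implementation: `HandoffStore` is a hypothetical name, the row is keyed by a hash of the code as an assumption, and the sealing step is a toy construction (a single-use HMAC-derived keystream plus an HMAC tag) standing in for whatever authenticated encryption the real code uses. Failures return `None` here to keep the example compact.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional


def _derive(code: bytes, label: bytes) -> bytes:
    """Derive a 32-byte key from the code itself."""
    return hmac.new(code, label, hashlib.sha256).digest()


class HandoffStore:
    def __init__(self, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._rows = {}  # sha256(code) -> (sealed Ks, tag, consumer id, expiry)

    def mint(self, ks: bytes, consumer_id: str) -> str:
        if len(ks) != 32:
            raise ValueError("this sketch assumes a 32-byte Ks")
        code = secrets.token_urlsafe(32)
        raw = code.encode()
        sealed = bytes(a ^ b for a, b in zip(ks, _derive(raw, b"enc")))
        tag = hmac.new(_derive(raw, b"mac"), sealed + consumer_id.encode(), hashlib.sha256).digest()
        row_id = hashlib.sha256(raw).hexdigest()
        self._rows[row_id] = (sealed, tag, consumer_id, self._clock() + self._ttl)
        return code  # only the code leaves; the row alone cannot reveal Ks

    def redeem(self, code: str, consumer_id: str) -> Optional[bytes]:
        raw = code.encode()
        # pop() makes redemption single-use: any attempt consumes the row.
        row = self._rows.pop(hashlib.sha256(raw).hexdigest(), None)
        if row is None:
            return None
        sealed, tag, minted_for, expires = row
        expected = hmac.new(_derive(raw, b"mac"), sealed + consumer_id.encode(), hashlib.sha256).digest()
        if minted_for != consumer_id or self._clock() > expires:
            return None
        if not hmac.compare_digest(tag, expected):
            return None
        return bytes(a ^ b for a, b in zip(sealed, _derive(raw, b"enc")))
```

One design choice in this sketch is that a redemption attempt by the wrong consumer also burns the code, so the legitimate consumer would have to restart the unlock; whether the real flow behaves that way is not stated in the text.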

Security review (#12.S)

The built surface was reviewed against the threats above (an adversarial pass on the crypto / auth / IDOR core plus a structured pass on the rest). No exploitable vulnerability was found in the high-severity areas:

Handle crypto (T2/T4): forging or cross-using a handle requires the per-consumer key (HKDF over the server hash salt + client id); the AEAD key and AAD both bind the client id, so a handle replayed under another consumer fails authentication. decode() is failure-closed (every malformed/foreign/tampered case throws and collapses to a uniform 403).
OAuth gate (T2): X-Consumer-ID is attacker-controllable but inert on its own -- the _auth: ['oauth2'] route guard makes simple_oauth overwrite it from the validated token, and a token-less request is anonymous and fails the access pdv api permission before reaching a controller. This rests on the invariant that every API route keeps _auth: ['oauth2']; RouteGuardTest now enforces it.
IDOR (T3/T4): ConsumerApi checks item ownership against the decoded uid and a grant/trust on every read/write; there is no path to assert a raw uid.
Error oracle (T8): missing-item, wrong-owner, no-grant, and unknown-consumer all collapse to one 403; the 409 conflict is reachable only after authorization.
Redirect/CSRF (T5/T6): the return_url origin allowlist is fail-closed; the consent state is single-use on both sides.

Changes made: the T9 write-body cap, per-principal rate limiting (ConsumerFloodGuard, with a pdv.flood.throttled audit event), and RouteGuardTest. Deferred (documented, not code-blocking): the handle-key rotation cadence (see Still open). A reverse-proxy / WAF limiter in front remains recommended for network-layer floods.
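
The two-dimension rate limit can be illustrated with a small fixed-window counter. This is a sketch, not `ConsumerFloodGuard`: the class name, the window length, the in-memory storage, and the choice not to count throttled calls are assumptions. It shows the per-consumer and per-consumer-per-user dimensions, the "threshold 0 disables" rule, and the 429 with a `pdv.flood.throttled` audit event.

```python
import time
from collections import defaultdict


class FloodGuardSketch:
    def __init__(self, per_consumer: int, per_consumer_user: int,
                 window: float = 60.0, clock=time.monotonic):
        self._limits = (per_consumer, per_consumer_user)
        self._window = window
        self._clock = clock
        self._hits = defaultdict(list)  # key -> timestamps of accepted calls
        self.audit = []  # stand-in for the audit log

    def _under_limit(self, key: tuple, threshold: int, now: float) -> bool:
        if threshold == 0:
            return True  # a threshold of 0 disables this dimension
        recent = [t for t in self._hits[key] if now - t < self._window]
        self._hits[key] = recent
        return len(recent) < threshold

    def register(self, consumer_id: str, uid: int) -> int:
        """Return 200 if the call may proceed, 429 if it is over quota."""
        now = self._clock()
        keys = [(consumer_id,), (consumer_id, uid)]
        checks = list(zip(keys, self._limits))
        if not all(self._under_limit(k, limit, now) for k, limit in checks):
            self.audit.append(("pdv.flood.throttled", consumer_id, uid))
            return 429
        for key, limit in checks:
            if limit:
                self._hits[key].append(now)
        return 200
```

With limits of 3 per consumer and 2 per consumer-per-user, a consumer that has used up its budget for one user can still reach a second user until the overall per-consumer budget is spent.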

Out of scope for this note

The OpenBao-backed MKEK (#8) is a companion effort but a separate one; a real multi-site service wants the root secret out of config.
Federated / multi-vault discovery.

Decisions

OAuth stack and mapping. simple_oauth with the client-credentials grant. simple_oauth already uses the consumers module's Consumer entity as its OAuth client, and pdv already uses consumer entities as its consumers, so they are the same entity: one client_id is both the OAuth client and the pdv consumer. No join entity. An access token resolves directly to a consumer.
User identity: opaque per-consumer handle. A consumer never sees or sends a raw uid; it uses a handle minted at consent, unique per (consumer, user). The handle is the uid encrypted under a per-consumer key (HandleCodec), so the vault stores no map -- resolution is a stateless decrypt, and a handle replayed under another consumer's token fails authentication. ConsumerApi stays uid-based internally. (See "Two-layer authorization" above.)
Token scopes: grants are the single source of truth. Read vs write is decided entirely by the consumer's grants/trusts, not duplicated at the token layer. The token carries one coarse scope ("may call the pdv API"); a stolen token is still bounded by the consumer's grants. Per-operation read/write token scopes are deferred as an optional defense-in-depth layer, to avoid two disagreeing authorization sources.
Cross-site writes never use inline consent. ConsumerApi's *WithConsent write methods will, when the owner has not enabled "require explicit consent," store directly with no standing-authorization check -- safe same-site, where that path is only reached because a user is physically submitting a form (the submission is the consent). Over HTTP there is no user present, so the API exposes only the trust-gated createRecord / updateRecord / saveFile (#12.3, the last added so a file write exists without the inline shortcut). A write the consumer is not pre-authorized for does not happen on its assertion; it must go through the consent ceremony (#12.4), where the owner approves.

Still open

Lifetime/rotation policy for the opaque handle. Because the handle is stateless ciphertext, it stays decodable as long as the per-consumer key holds; "revocation" is really the grant/trust going away (a decoded handle then reaches nothing). Rotating the derived key (or the site hash salt) invalidates every handle for that consumer at once, forcing re-consent -- acceptable, but the trigger/cadence is undecided.