Cross-site API: design and threat model

This is the design rationale and threat model for the cross-site surface. For the endpoint contract see the API reference; for setting up the two sites see Client-server configuration.

Goal

Let a consumer on a different site use a user's vault over HTTP. The consumer site does not have pdv installed; it holds OAuth client credentials and talks to the vault's API. Today's same-site shortcuts (the pdv_file element reading the vault in-process, local decryption, in-process grant checks) are replaced by HTTP calls.

Core principle: expose ConsumerApi, never Vault

Vault is the unguarded core: it decrypts any item with the Master KEK and has no notion of a consumer. It is never reachable over the network.

ConsumerApiInterface is the authorization boundary: every call is made as a consumer and bounded by that consumer's grants and trusts. The HTTP API is a thin transport in front of it: authenticate, resolve the acting consumer, delegate, serialize. All gating stays in ConsumerApi where it is tested. The same interface is implemented in-process locally and by a remote HTTP client on the consumer site, so local and remote consumers coexist on one vault with identical gating (see #12.0).

remote consumer site                 vault site
--------------------                 ----------
[remote client]  --HTTPS+OAuth-->  [HTTP API controllers]
(implements                              |
 ConsumerApiInterface)                   v
                                   [ConsumerApi] (the gate: grants/trusts)
                                         |
                                         v
                                   [Vault] (crypto + MKEK, in-process only)
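
The "thin transport" idea can be illustrated with a minimal sketch. It is not the module's controller code: HttpReadController, NotAuthorized, the callables and the in-memory stand-ins are all hypothetical names chosen for this example, and the real interface may shape its methods differently. The point it shows is that the HTTP layer only authenticates, resolves, delegates and serializes, while every refusal collapses to the same 403.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, Tuple


class NotAuthorized(Exception):
    """Any authentication or gating failure; the transport maps it to one 403."""


class ConsumerApiInterface(Protocol):
    """Hypothetical, read-only slice of the gated interface."""

    def read(self, consumer_id: str, uid: int, item_id: str) -> str: ...


@dataclass
class HttpReadController:
    api: ConsumerApiInterface
    resolve_consumer: Callable[[str], str]    # bearer token -> consumer id
    decode_handle: Callable[[str, str], int]  # (consumer id, handle) -> uid

    def get_read(self, bearer_token: str, handle: str, item_id: str) -> Tuple[int, dict]:
        try:
            consumer_id = self.resolve_consumer(bearer_token)  # authenticate, resolve consumer
            uid = self.decode_handle(consumer_id, handle)      # handle -> uid, vault-side
            value = self.api.read(consumer_id, uid, item_id)   # delegate; gating lives here
        except NotAuthorized:
            return 403, {"error": "not authorized"}            # uniform refusal
        return 200, {"value": value}                           # serialize


class InMemoryConsumerApi:
    """Stand-in gate for the demo: one consumer may read one item of one user."""

    def read(self, consumer_id: str, uid: int, item_id: str) -> str:
        if (consumer_id, uid, item_id) != ("shop", 7, "item-1"):
            raise NotAuthorized()
        return "alice@example.org"


def _resolve(token: str) -> str:
    if token != "token-shop":
        raise NotAuthorized()
    return "shop"


def _decode(consumer_id: str, handle: str) -> int:
    if (consumer_id, handle) != ("shop", "h-abc"):
        raise NotAuthorized()
    return 7


controller = HttpReadController(InMemoryConsumerApi(), _resolve, _decode)
print(controller.get_read("token-shop", "h-abc", "item-1"))
print(controller.get_read("stolen", "h-abc", "item-1"))
```

Because the controller holds no authorization logic of its own, swapping the in-memory stand-in for the real gate would not change the transport.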

Two-layer authorization

  1. Transport / identity (who is calling): an OAuth client-credentials bearer token identifies the consumer (a consumer entity client id). This is all the token proves: which consumer, not which user.
  2. Data authorization (what they may touch): every ConsumerApi call is gated by the consumer's grants (per item) and trusts (per owner+kind). A consumer reaches only data a user explicitly shared with it.

On a multi-tenant site an outermost scope wraps both: each consumer is bound to a single tenant, so it can only ever reach vaults in that tenant's realm, independently of any grant or trust.
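
As an illustration of the data-authorization layer, the sketch below models grants as (consumer, item) pairs and trusts as (consumer, owner, kind) triples. Item, Gate and can_read are hypothetical names, the storage is an in-memory set, and read/write distinctions and tenant scoping are left out; it only shows the shape of the check, not the real ConsumerApi.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    item_id: str
    owner_uid: int
    kind: str


@dataclass
class Gate:
    grants: set = field(default_factory=set)  # {(consumer_id, item_id)}
    trusts: set = field(default_factory=set)  # {(consumer_id, owner_uid, kind)}

    def can_read(self, consumer_id: str, asserted_uid: int, item: Item) -> bool:
        # The item must belong to the user the call is made for ...
        if item.owner_uid != asserted_uid:
            return False
        # ... and the owner must have shared it, per item or per kind.
        return (
            (consumer_id, item.item_id) in self.grants
            or (consumer_id, item.owner_uid, item.kind) in self.trusts
        )


gate = Gate()
passport = Item("item-1", owner_uid=7, kind="passport")
address = Item("item-2", owner_uid=7, kind="address")

gate.grants.add(("shop", "item-1"))       # one item shared explicitly
gate.trusts.add(("shop", 7, "address"))   # a whole kind shared by owner 7

print(gate.can_read("shop", 7, passport))   # granted item
print(gate.can_read("shop", 7, address))    # trusted kind
print(gate.can_read("other", 7, passport))  # different consumer, nothing shared
print(gate.can_read("shop", 8, passport))   # wrong owner asserted
```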

The user is referenced by an opaque per-consumer handle, never the raw uid. The handle is the uid, encrypted (AEAD) under a per-consumer key derived from the site hash salt and the consumer's client id, so resolution is a stateless decrypt and no handle->uid map is stored anywhere (see HandleCodec, #12.2). It is minted during the consent ceremony and is all the consumer ever sees or sends. The HTTP layer decodes handle -> uid vault-side before delegating to ConsumerApi, which stays uid-based internally. Consequences:

  • The vault's internal user ids never leave the site.
  • The same user cannot be linked across different consumers (each gets a distinct handle), so colluding consumers cannot correlate a person.
  • A handle is meaningless without a grant/trust behind it: even a valid handle only reaches data the owner shared with that consumer, because every call is still gated by ConsumerApi.
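
The per-consumer key behind the handle can be sketched with a plain HKDF-SHA256 (RFC 5869) built on the standard library. This covers only the key-derivation step: how HandleCodec actually feeds the hash salt and client id into HKDF is not stated here, so the choice of input keying material, the empty HKDF salt and the "pdv-handle-v1:" info label are assumptions made for illustration. The AEAD encryption itself is omitted because the Python standard library ships no AEAD cipher.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF extract-then-expand as specified in RFC 5869, using SHA-256."""
    # Extract: an empty salt defaults to HashLen zero bytes.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) | info | i)
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def consumer_handle_key(site_hash_salt: bytes, client_id: str) -> bytes:
    """Hypothetical derivation: one 32-byte key per consumer, bound to its client id."""
    info = b"pdv-handle-v1:" + client_id.encode("utf-8")  # assumed label, not the real one
    return hkdf_sha256(ikm=site_hash_salt, salt=b"", info=info)


salt = b"example-site-hash-salt"
key_a = consumer_handle_key(salt, "consumer-a")
key_b = consumer_handle_key(salt, "consumer-b")
print(key_a != key_b)  # distinct consumers get distinct keys
```

Since the key depends on the client id, the same uid encrypts to unrelated handles for different consumers, and a handle presented under the wrong consumer would be decrypted with the wrong key and fail AEAD authentication.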

API surface (maps 1:1 to ConsumerApiInterface)

  • GET read / readRecord / canRead / listReadableItems / kindAccess (per-kind read+write trust check, side-effect-free, so a UI can offer only actions that will succeed; also reports declined kinds so a re-prompting UI stops asking -- posture, never inventory)
  • GET kindLabels (/pdv-api/kind-labels) -- the vault's human, translatable kind labels for a language, so a consumer shows kind names in the user's language instead of machine names. Not user-scoped (shared reference metadata); the consumer caches the catalogue per connection and language.
  • POST/PUT createRecord / updateRecord / saveFile -- the trust-gated writes only; each returns the ItemRef. A write succeeds when the consumer already holds standing write authorization and is refused (uniform 403) otherwise. The *WithConsent ConsumerApi variants are deliberately NOT exposed (see Decisions: cross-site writes never use inline consent).
  • Consent ceremony (#12.4): an unauthorized write returns a consent_required body with a consent_url; the owner is redirected to the existing approval pages, which mint the write authorization, and bounced back to the consumer's return_url.
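
On the consumer side, the outcome of a write can be reduced to three cases: stored, refused, or "send the owner to consent". The sketch below is a guess at how a remote client might branch on the response. The exact body shape is an assumption: the source names a consent_required body carrying a consent_url, and this example assumes that marker appears as an "error" field. See the API reference for the real contract.

```python
from typing import Optional, Tuple


def interpret_write_response(status: int, body: dict) -> Tuple[str, Optional[str]]:
    """Map a write response to (action, redirect target).

    Assumed body shape for the consent case (hypothetical):
        {"error": "consent_required", "consent_url": "https://vault.example/..."}
    """
    if 200 <= status < 300:
        return "stored", None
    if body.get("error") == "consent_required" and body.get("consent_url"):
        # The owner must approve on the vault site; the consumer only redirects.
        return "redirect_owner", body["consent_url"]
    # Anything else is treated as the uniform refusal: no detail to act on.
    return "refused", None


print(interpret_write_response(201, {"item_ref": "example"}))
print(interpret_write_response(403, {"error": "consent_required",
                                     "consent_url": "https://vault.example/consent/abc"}))
print(interpret_write_response(403, {"error": "not authorized"}))
```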

Assets and trust boundaries

  • Assets: users' decrypted PII (record values, file bytes), item metadata (labels, filenames, kinds), the grant/trust graph, OAuth client secrets, the Master KEK.
  • Trust boundary: the network between consumer site and vault site, and the consumer site itself (a separate operator). The vault must treat every request as hostile until authenticated and gated.
  • In the TCB, never exposed: Vault, the MKEK, raw ciphertext, another consumer's grants.

Threats and mitigations

#   | Threat | Mitigation
----|--------|-----------
T1  | Bearer token theft / replay | HTTPS required; short-lived access tokens; rotation; tokens scoped to the consumer only.
T2  | Consumer impersonation | Client-credentials secret kept server-side on the consumer; never in a browser.
T3  | Over-broad access (consumer reads data it was not given) | Every call gated by grants/trusts in ConsumerApi; no token scope grants blanket vault access; least privilege via per-kind / per-item authorization.
T4  | IDOR: consumer asserts another user's uid | Existing checks: item ownership must match the asserted uid AND the consumer must hold a grant/trust the owner created. A wrong uid fails; a consumer only reaches what each owner shared with it.
T5  | Open redirect / leak via consumer return_url | Operator origin allowlist (return_url_origins); fail-closed for any unlisted origin. The shared guard GrantRequestCallback::assertSafeReturnUrl rejects credentials (userinfo) and backslashes, so the parsed host cannot diverge from the host a browser would navigate to (https://ok\@evil/ no longer passes); the state token correlates the callback.
T6  | Consent-callback replay / CSRF | The state token is single-use and bound to the request; verified on callback.
T7  | Plaintext PII on the wire | Inherent (the consumer needs the data); mitigate with mandatory TLS, minimal fields returned (ItemRef carries no body), and audit logging of every read.
T8  | Enumeration / existence oracle via errors | Uniform "not authorized" responses; do not reveal whether an item exists, its kind, or decryption failures.
T9  | DoS / abuse of the public surface | Two in-app limits: a write-body cap (pdv_server_api.settings.max_write_bytes, default 25 MiB) on every write, and per-principal rate limiting (ConsumerFloodGuard) on reads, writes and consent starts -- an over-quota call gets HTTP 429 and emits a pdv.flood.throttled audit event. Read/write are limited on two dimensions, both on the cross-site API settings (pdv_server_api.settings.flood): per consumer (guards the server) and per consumer-per-user (guards one user vault from a single consumer pounding it within its overall budget). The consent-start limit is keyed by uid and lives on the core vault settings (pdv.settings.flood). A threshold of 0 disables that dimension. A reverse-proxy / WAF limiter is still recommended in front for network-layer floods.
T10 | Confused-deputy via CORS (browser callers) | Default to server-to-server only; no permissive CORS unless a browser flow is explicitly designed and reviewed.
T11 | Audit gaps | Actual reads/writes/consent emit an audit event with the correlation id, as same-site already does (ConsumerApi pdv.read/pdv.write). The cross-site surface additionally audits the operations the in-process API leaves silent -- listing readable items (pdv.read.listed) and access probes (pdv.read.probed), tagged source=cross_site_api -- so metadata harvesting and enumeration attempts are visible, not just successful decrypts (#12.2). Cross-system tracing: every request carries a vault-minted correlation_id (one per request via CorrelationContext), echoed back in the X-Correlation-ID response header so a caller can join its own logs to vault rows. A caller may also send X-Correlation-ID; it is sanitized and recorded as client_ref -- a join hint only, never an authorization input, so a spoofed value can at most mislabel its own rows.
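
The T5 mitigation can be approximated in a few lines. This is a sketch of the idea behind the return_url guard, not a port of GrantRequestCallback::assertSafeReturnUrl: the function name is hypothetical and the allowlist is passed in as a plain set of origins. It rejects backslashes and userinfo before comparing the origin, and an empty allowlist rejects everything (fail-closed).

```python
from urllib.parse import urlsplit


def is_safe_return_url(url: str, allowed_origins: set) -> bool:
    """Return True only for absolute http(s) URLs whose origin is allowlisted."""
    # Browsers may treat "\" like "/", so the host a parser sees can differ
    # from the host the browser navigates to. Refuse outright.
    if "\\" in url:
        return False
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    # Credentials (userinfo) in the authority are never legitimate here.
    if "@" in parts.netloc:
        return False
    origin = f"{parts.scheme}://{parts.netloc.lower()}"
    return origin in allowed_origins


allowlist = {"https://ok.example"}
print(is_safe_return_url("https://ok.example/cb?x=1", allowlist))
print(is_safe_return_url("https://ok\\@evil/", allowlist))
print(is_safe_return_url("https://user:pw@ok.example/cb", allowlist))
print(is_safe_return_url("https://evil.example/cb", allowlist))
```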

Security review (#12.S)

Reviewed the built surface against the threats above (adversarial pass on the crypto / auth / IDOR core plus a structured pass on the rest). No exploitable vulnerability found in the high-severity areas:

  • Handle crypto (T2/T4): forging or cross-using a handle requires the per-consumer key (HKDF over the server hash salt + client id); the AEAD key and AAD both bind the client id, so a handle replayed under another consumer fails authentication. decode() is fail-closed (every malformed/foreign/tampered case throws and collapses to a uniform 403).
  • OAuth gate (T2): X-Consumer-ID is attacker-controllable but inert on its own -- the _auth: ['oauth2'] route guard makes simple_oauth overwrite it from the validated token, and a token-less request is anonymous and fails the access pdv api permission before reaching a controller. This rests on the invariant that every API route keeps _auth: ['oauth2']; RouteGuardTest now enforces it.
  • IDOR (T3/T4): ConsumerApi checks item ownership against the decoded uid and a grant/trust on every read/write; there is no path to assert a raw uid.
  • Error oracle (T8): missing-item, wrong-owner, no-grant, and unknown-consumer all collapse to one 403; the 409 conflict is reachable only after authorization.
  • Redirect/CSRF (T5/T6): return_url origin allowlist is fail-closed; the consent state is single-use on both sides.

Changes made: the T9 write-body cap, per-principal rate limiting (ConsumerFloodGuard, with a pdv.flood.throttled audit event), and RouteGuardTest. Deferred (documented, not code-blocking): the handle-key rotation cadence (see Still open). A reverse-proxy / WAF limiter in front remains recommended for network-layer floods.
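
The two-dimensional read/write limit can be pictured with a small fixed-window counter. This is a sketch only: the class name, the fixed-window strategy and the 60-second default are assumptions, and ConsumerFloodGuard may well be built differently. What it does mirror from the description above is the pair of keys (per consumer, and per consumer-per-user) and the rule that a threshold of 0 disables a dimension.

```python
import time


class FloodGuardSketch:
    """Fixed-window limiter over two keys; a limit of 0 disables that key."""

    def __init__(self, per_consumer: int, per_consumer_user: int,
                 window: float = 60.0, clock=time.monotonic):
        self.per_consumer = per_consumer
        self.per_consumer_user = per_consumer_user
        self.window = window
        self.clock = clock
        self._windows = {}  # key -> (window start, count)

    def _current(self, key, now):
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window:  # window expired: start a fresh one
            start, count = now, 0
        return start, count

    def allow(self, consumer_id: str, uid: int) -> bool:
        now = self.clock()
        limits = {}
        if self.per_consumer > 0:
            limits[("consumer", consumer_id)] = self.per_consumer
        if self.per_consumer_user > 0:
            limits[("consumer_user", consumer_id, uid)] = self.per_consumer_user
        current = {key: self._current(key, now) for key in limits}
        # Check every dimension first so a refused call consumes no budget.
        if any(count >= limits[key] for key, (_, count) in current.items()):
            return False  # the caller would answer HTTP 429 and audit the event
        for key, (start, count) in current.items():
            self._windows[key] = (start, count + 1)
        return True


fake_now = [0.0]
guard = FloodGuardSketch(per_consumer=3, per_consumer_user=2,
                         clock=lambda: fake_now[0])
results = [guard.allow("shop", 1), guard.allow("shop", 1), guard.allow("shop", 1),
           guard.allow("shop", 2), guard.allow("shop", 3)]
print(results)
```

In the demo the third call trips the per-user limit for user 1, the fourth succeeds for another user, and the fifth trips the consumer-wide limit.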

Out of scope for this note

  • The OpenBao-backed MKEK (#8) is a companion but separate; a real multi-site service wants the root secret out of config.
  • Federated / multi-vault discovery.

Decisions

  • OAuth stack and mapping. simple_oauth with the client-credentials grant. simple_oauth already uses the consumers module's Consumer entity as its OAuth client, and pdv already uses consumer entities as its consumers, so they are the same entity: one client_id is both the OAuth client and the pdv consumer. No join entity. An access token resolves directly to a consumer.
  • User identity: opaque per-consumer handle. A consumer never sees or sends a raw uid; it uses a handle minted at consent, unique per (consumer, user). The handle is the uid encrypted under a per-consumer key (HandleCodec), so the vault stores no map -- resolution is a stateless decrypt, and a handle replayed under another consumer's token fails authentication. ConsumerApi stays uid-based internally. (See "Two-layer authorization" above.)
  • Token scopes: grants are the single source of truth. Read vs write is decided entirely by the consumer's grants/trusts, not duplicated at the token layer. The token carries one coarse scope ("may call the pdv API"); a stolen token is still bounded by the consumer's grants. Per-operation read/write token scopes are deferred as an optional defense-in-depth layer, to avoid two disagreeing authorization sources.

  • Cross-site writes never use inline consent. ConsumerApi's *WithConsent write methods will, when the owner has not enabled "require explicit consent," store directly with no standing-authorization check -- safe same-site, where that path is only reached because a user is physically submitting a form (the submission is the consent). Over HTTP there is no user present, so the API exposes only the trust-gated createRecord / updateRecord / saveFile (#12.3, the last added so a file write exists without the inline shortcut). A write the consumer is not pre-authorized for does not happen on the consumer's assertion alone; it must go through the consent ceremony (#12.4), where the owner approves.

Still open

  • Lifetime/rotation policy for the opaque handle. Because the handle is stateless ciphertext, it stays decodable as long as the per-consumer key holds; "revocation" is really the grant/trust going away (a decoded handle then reaches nothing). Rotating the derived key (or the site hash salt) invalidates every handle for that consumer at once, forcing re-consent -- acceptable, but the trigger/cadence is undecided.