# donto — Substrate PRD

**Document type:** Product requirements document. Successor to
PRD-TRUST-KERNEL-001 (2026-05-07).
**Version:** PRD-SUBSTRATE-002.
**Date:** 2026-05-28.
**Status:** Draft. Authors: Thomas Davis, Ajax Davis.
**Supersedes:** parts of `docs/ROADMAP-NEXT.md` and
`docs/ROADMAP-AFTER-MAY18.md`. Complements (does not replace) the
canonical PRD at `docs/DONTO-PRD.md`.

---

## 0. Executive position

**donto is infrastructure.** Not a memory framework. Not a genealogy
application. Not a knowledge graph product. Not an LLM extraction
service. Not an ontology editor. Not a citation manager. The
substrate is the thing multiple independent consumers run *against*
without colliding and without donto biasing toward any one of them.

The named first-tier consumers we are designing for, listed in no
order of priority so that none of them is the special case:

- **`donto-memory`** — an agentic-memory runtime (the framework
  described in the May 2026 memory-design draft). Long-lived agents
  need bitemporal recall, contradiction-preserving preferences,
  policy-gated retrieval, identity-stable referents, and
  reconsolidation-as-derivation. donto already supplies all of these
  as substrate primitives; donto-memory wraps them with read-time
  dynamics (salience, recall counts, decay clocks) and a sleep-time
  workflow (cluster, reflect, consolidate). donto-memory is *not in
  this repository* and never will be. It is a consumer.

- **`genes`** — a genealogy research workspace built against the
  same donto instance that serves dontopedia.com. The genealogy
  corpus has historically driven much of donto's evolution because
  it stresses every invariant, but genealogy is a consumer like any
  other. The 14-family object model in `DONTO-PRD.md` §6 contains
  *zero* genealogy-specific tables: every column donto-side is
  domain-neutral, and the genealogy-flavoured pieces (parent–child
  age-gap shape, Mary Watson worked example, Annie Davis test
  fixtures) live in the consumer's tree or in Lean shape libraries
  that consumers opt into.

- **`donto-lang`** — the language-documentation pilot (PRD §13).
  Five importers ship today (CLDF, CoNLL-U, UniMorph, LIFT, EAF);
  the consumer code that drives real datasets (Glottolog, UD,
  UniMorph, LIFT, ELAN) is the pilot, not the substrate.

- **third parties** — anyone who can implement the *consumer
  contract* (§4) and read DontoQL. We do not curate the third-party
  consumer list, but we keep the contract stable so that new
  consumers can appear.

The product question this PRD answers:

> Given that a memory runtime, a genealogy app, a language pilot,
> and a legal-evidence system all want to run against the same
> donto instance, what does the substrate need to commit to so
> none of them has to win at any other's expense?

---

## 1. Non-mission

This section is load-bearing. Most product drift in research
infrastructure happens by quietly absorbing the use case of the
loudest consumer. The following are *explicit non-missions*; if a
proposed feature requires them, it does not belong in donto core.

| donto will never own | because |
|---|---|
| Memory salience, recall counts, decay clocks | Read-time dynamics of one consumer; lives in `donto-memory`'s overlay. |
| Family-tree visualisation, GEDCOM-specific edges, kinship-recursion rules | Genealogy-domain logic; lives in `genes` overlays or Lean libraries. |
| Phonological/morphological paradigm validation | Language-domain logic; lives in `donto-lang` Lean shapes. |
| LLM extraction prompts, model choice, cost budgets | Application policy of `donto-api`; the substrate exposes ingest, not extraction. |
| Default reasoning engine (RDFS / OWL closure as a baked-in semantics) | Lean overlay certifies *opt-in*; substrate stays neutral on entailment. |
| Default entity collapse at write time | Identity is a hypothesis (I8). Collapsing belongs in the consumer's query lens. |
| Default predicate normalisation at write time | Alignment is typed and scoped (I7). Closure rides at query time. |
| User-account management, OAuth, session tokens | Sidecar concerns of consumers and gateways. |
| Notification, alerting, paging | Operational concerns; `donto-alert-sink` is a thin pluggable interface. |
| Workflow orchestration | Lives in Temporal / `donto-api-worker`, not in `pg_donto`. |
| Domain-specific UI | TUI exists for substrate operators only; consumer UIs are out of scope. |

A simple test (the **substrate test**) lives in §10. It is a
guardrail against drift; we apply it to every feature proposal.

---

## 2. The substrate contract (what donto guarantees consumers)

donto offers consumers the following, with stability across minor
versions and breaking-change discipline across major versions:

### 2.1 Evidence-anchored claims

Every statement is filed under exactly one context. Every claim of
maturity ≥ E2 carries (or transitively carries) at least one
evidence link. The default for a `hypothesis_only=false` ingest
without anchor is to land at E1 (candidate) until provenance is
attached. Consumers may opt into the legacy
`hypothesis_only=true` path for explicitly speculative recall, as
long as the speculation is flagged at the storage layer.

### 2.2 Bitemporal discipline

`valid_time` (world-time) and `tx_time` (system-time) are
non-optional on `donto_statement`. The same discipline extends to
alignments, identity edges, policies, attestations, reviews, and
releases via `donto_event_log`. The contract: consumers can always
ask *"what did the substrate believe at system-time `t`?"* and
*"what holds in the world at valid-time `t'`?"* and get a
deterministic answer.
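
The two questions above can be illustrated with a minimal in-memory sketch. The `Row` type, the integer clocks, and the half-open interval convention are assumptions made for this example; they are not donto's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    subject: str
    predicate: str
    obj: str
    valid_from: int           # world-time interval [valid_from, valid_to)
    valid_to: Optional[int]   # None means open-ended
    tx_from: int              # system-time interval [tx_from, tx_to)
    tx_to: Optional[int]      # None means still believed


def _covers(lo, hi, t):
    """True when t falls inside the half-open interval [lo, hi)."""
    return lo <= t and (hi is None or t < hi)


def as_of(rows, tx_t, valid_t):
    """Rows the store believed at system-time tx_t about world-time valid_t."""
    return [
        r for r in rows
        if _covers(r.tx_from, r.tx_to, tx_t)
        and _covers(r.valid_from, r.valid_to, valid_t)
    ]


rows = [
    # Believed from tx=10 until it was superseded at tx=20.
    Row("ex:alice", "ex:livesIn", "ex:Perth", 2000, None, 10, 20),
    # The corrected belief, recorded at tx=20 and still current.
    Row("ex:alice", "ex:livesIn", "ex:Sydney", 2000, None, 20, None),
]
```

With these rows, `as_of(rows, 15, 2005)` sees only the Perth row and `as_of(rows, 25, 2005)` sees only the Sydney row, so the earlier belief stays queryable after the correction.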

### 2.3 Paraconsistent storage

Two currently-believed rows with the same subject and predicate but
conflicting objects, polarities, or valid-time intervals are *legal
substrate state*. donto exposes a contradiction frontier; it does
not pick winners. Consumers (memory runtime, genealogy reviewer,
language analyst) decide what to do with contradictions through
their own argument-edge writes, review decisions, or query lenses.
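
As a rough illustration of what a contradiction frontier reports, the sketch below groups currently-believed rows by subject and predicate and returns the groups that disagree. It is a simplification: the tuple layout is hypothetical, valid-time overlap is ignored, and every disagreement is treated as a candidate conflict. Note that it reports conflicts and never removes a row.

```python
from collections import defaultdict


def contradiction_frontier(rows):
    """Return {(subject, predicate): [(object, polarity), ...]} for contested pairs.

    Each row is a (subject, predicate, object, polarity) tuple. A pair is
    contested when it holds more than one distinct (object, polarity) value.
    """
    groups = defaultdict(set)
    for subject, predicate, obj, polarity in rows:
        groups[(subject, predicate)].add((obj, polarity))
    return {key: sorted(values) for key, values in groups.items() if len(values) > 1}


rows = [
    ("ex:alice", "ex:birthYear", "1850", "asserted"),
    ("ex:alice", "ex:birthYear", "1852", "asserted"),
    ("ex:bob", "ex:birthYear", "1900", "asserted"),
]
frontier = contradiction_frontier(rows)
```

Here `frontier` contains one entry, for `("ex:alice", "ex:birthYear")`, and both of its rows remain in `rows`; choosing between them is left to the consumer.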

### 2.4 Typed predicate alignment

Eleven relations × three safety flags (`safe_for_query_expansion`,
`safe_for_export`, `safe_for_logical_inference`). A materialised
closure rides at query time when `PREDICATES EXPAND` is set. The
substrate does not auto-collapse aligned predicates at write time.
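
A minimal sketch of query-time expansion, assuming alignment edges are directed `(source, target, flags)` triples: a predicate expands only along edges that carry the requested safety flag, and the stored rows are never rewritten. The edge layout and the `ex:` predicates are illustrative, not donto's tables.

```python
def expand_predicate(predicate, alignments, flag="safe_for_query_expansion"):
    """Return every predicate reachable from `predicate` via edges carrying `flag`."""
    seen = {predicate}
    stack = [predicate]
    while stack:
        current = stack.pop()
        for source, target, flags in alignments:
            if source == current and flag in flags and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


alignments = [
    ("ex:bornIn", "ex:birthPlace", {"safe_for_query_expansion", "safe_for_export"}),
    ("ex:birthPlace", "ex:placeOfBirth", {"safe_for_query_expansion"}),
    ("ex:bornIn", "ex:bornNear", {"safe_for_export"}),
]
```

Under these assumptions, expanding `ex:bornIn` for queries reaches `ex:birthPlace` and `ex:placeOfBirth` but not `ex:bornNear`, whose edge is flagged for export only.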

### 2.5 Identity as hypothesis

Symbols (freely-minted IRIs) live forever. Identity edges weight
coreference between symbols. Identity hypotheses name clustering
solutions. Queries pick a lens (`strict`, `likely`, `exploratory`,
or a consumer-defined hypothesis IRI) at evaluation time. A merge
accepted under one hypothesis never destroys data; querying under a
different hypothesis returns the unmerged view.
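
The non-destructive property can be sketched with a small union-find over weighted identity edges. The numeric thresholds attached to each lens are hypothetical values chosen for the example, and the edge layout is an assumption; the point is only that clustering happens at evaluation time and the edges are left untouched.

```python
# Hypothetical thresholds; the source names the lenses but not their cut-offs.
LENS_THRESHOLDS = {"strict": 0.95, "likely": 0.75, "exploratory": 0.40}


def clusters(symbols, edges, lens):
    """Cluster symbols under a lens without modifying symbols or edges."""
    threshold = LENS_THRESHOLDS[lens]
    parent = {s: s for s in symbols}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, weight in edges:
        if weight >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in symbols:
        groups.setdefault(find(s), set()).add(s)
    return sorted(sorted(group) for group in groups.values())


symbols = ["ex:a", "ex:b", "ex:c", "ex:d"]
edges = [("ex:a", "ex:b", 0.97), ("ex:b", "ex:c", 0.80), ("ex:c", "ex:d", 0.50)]
```

The same edge list yields three clusters under `strict`, two under `likely`, and one under `exploratory`; switching lens never requires undoing a merge.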

### 2.6 Policy capsules with action-level granularity

Fifteen actions: `read_metadata`, `read_content`, `quote`,
`view_anchor_location`, `derive_claims`, `derive_embeddings`,
`translate`, `summarize`, `export_claims`, `export_sources`,
`export_anchors`, `train_model`, `publish_release`,
`share_with_third_party`, `federated_query`. The default is
fail-closed. Attestation credentials grant subset actions to
specific holders for specific purposes with mandatory rationale.
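
A fail-closed check can be sketched as follows. The action names are the fifteen listed above; the attestation dictionary shape and the `authorise` helper are assumptions for illustration and are not the signature of `donto_authorise`.

```python
ACTIONS = frozenset({
    "read_metadata", "read_content", "quote", "view_anchor_location",
    "derive_claims", "derive_embeddings", "translate", "summarize",
    "export_claims", "export_sources", "export_anchors", "train_model",
    "publish_release", "share_with_third_party", "federated_query",
})


def authorise(holder, action, target, attestations):
    """Allow only when an attestation explicitly grants the action; deny otherwise."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    for att in attestations:
        if (
            att["holder"] == holder
            and att["target"] == target
            and action in att["actions"]
            and att.get("rationale")  # rationale is mandatory
        ):
            return True
    return False  # fail-closed default


grant = {
    "holder": "agent:reader",
    "target": "doc:1",
    "actions": {"read_metadata", "quote"},
    "purpose": "review",
    "rationale": "Reviewer needs to quote the source.",
}
```

With no attestations every action is denied; with `grant` in place, `quote` is allowed on `doc:1` while `export_claims` is still denied.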

### 2.7 DontoQL with lens parameters

The 21-clause language exposes scope, polarity, maturity,
identity, predicate expansion, modality, extraction level,
`POLICY ALLOWS`, `SCHEMA_LENS`, bitemporal `AS_OF`, and
contradiction-pressure ordering as first-class clauses. Consumers
get one query algebra rather than a stack of ad-hoc SDKs.

### 2.8 Append-only event log

Non-statement objects (alignments, identity hypotheses, policies,
attestations, reviews, releases, predicate descriptors, frames)
mutate exclusively through `donto_event_log`. There is no
`UPDATE … SET` on these tables; all changes are events.

### 2.9 Content-addressed blob substrate

`donto_blob` stores content by SHA-256 with pluggable backends
(LocalFS, GCS, mock). The same revision body uploaded by ten
documents lands as one blob. Consumers reference blobs through
revisions, not directly.
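
The deduplication property follows directly from content addressing, as this minimal in-memory sketch shows. `MemoryBlobStore` is a hypothetical stand-in written for the example; it is not the `BlobStore` trait or any of the shipped backends.

```python
import hashlib


class MemoryBlobStore:
    """Toy content-addressed store: the SHA-256 hex digest is the key."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored once.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self):
        return len(self._blobs)


store = MemoryBlobStore()
digests = {store.put(b"same revision body") for _ in range(10)}
```

After ten uploads of the same body, `digests` holds a single digest and `len(store)` is 1.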

### 2.10 Three-tier source-provenance trace

`donto-trace` resolves surface text to byte-offset spans through
exact-line equality → substring-within-line → full-body fallback,
with cross-shard caching. Backward-fill is idempotent and
resumable. Consumers do not implement provenance; they consume it.
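
The three tiers can be sketched as a single lookup function. This is an illustration of the fallback order only, assuming UTF-8 bodies and `\n` line endings; `resolve_span` is a hypothetical helper, and the caching and backward-fill behaviour of `donto-trace` are not modelled.

```python
def resolve_span(body, surface):
    """Return (tier, byte_start, byte_end) for `surface` in `body`, or None."""
    if not surface:
        return None
    size = len(surface.encode("utf-8"))
    lines = body.split("\n")

    # Byte offset at which each line starts ("\n" is one byte in UTF-8).
    starts, offset = [], 0
    for line in lines:
        starts.append(offset)
        offset += len(line.encode("utf-8")) + 1

    # Tier 1: the surface text is exactly one whole line.
    for line, start in zip(lines, starts):
        if line == surface:
            return ("exact_line", start, start + size)

    # Tier 2: the surface text sits inside a single line.
    for line, start in zip(lines, starts):
        idx = line.find(surface)
        if idx != -1:
            begin = start + len(line[:idx].encode("utf-8"))
            return ("substring", begin, begin + size)

    # Tier 3: search the full body, so a match may span line breaks.
    idx = body.find(surface)
    if idx != -1:
        begin = len(body[:idx].encode("utf-8"))
        return ("full_body", begin, begin + size)
    return None
```

Offsets are computed in bytes rather than characters, so slicing the encoded body with the returned span recovers the surface text even when earlier lines contain multi-byte characters.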

### 2.11 Lean overlay — certifies, does not gate

`donto_engine` certifies shapes and rules via a stdio JSON
protocol. The substrate never blocks on the Lean side. Sidecar
absence degrades shape/rule/cert endpoints only.

### 2.12 Signed release envelopes

Ed25519 over a manifest SHA-256 with `did:key` identifiers.
RO-Crate is the portable export format; JSONL is the native one.
Verification is self-contained and offline-safe.

---

## 3. The substrate contract (what consumers see)

For every guarantee in §2 there is a stable surface consumers
program against:

| Guarantee | Surface |
|---|---|
| Evidence-anchored claims | `donto_assert`, `donto_evidence_link`, `POST /assert`, `POST /evidence/link/*` |
| Bitemporal discipline | `tx_time`, `valid_time` columns; DontoQL `AS_OF`; `donto_event_log` |
| Paraconsistency | `donto_argument` table; `ORDER BY contradiction_pressure`; `donto_v_contradiction_frontier` |
| Predicate alignment | `donto_predicate_alignment`; `PREDICATES EXPAND` / `EXPAND_ABOVE N` / `STRICT`; `donto align` CLI |
| Identity hypotheses | `donto_identity_edge`, `donto_identity_hypothesis`; `IDENTITY_LENS` clause |
| Policy capsules | `donto_policy_capsule`, `donto_authorise`, `donto_effective_actions`; `POLICY ALLOWS` clause |
| DontoQL | `POST /dontoql`, `POST /sparql`; `donto query` CLI |
| Event log | `donto_event_log`; no consumer-side UPDATE on event-logged objects |
| Blob substrate | `BlobStore` trait; `donto_register_blob`; `donto blob` CLI |
| Provenance trace | `donto_trace_log`, `donto_revision_line`, `donto:hasSpan` claims; `donto trace` CLI |
| Lean overlay | `POST /shapes/validate`, `POST /rules/derive`; DIR JSON envelope |
| Release envelopes | `donto release {keygen, sign, verify, build, pipeline}`; RO-Crate output |

These are the published surfaces; everything else in the codebase
is implementation. We will version them under
`donto:contract/major.minor` with semver semantics from M11
onward.

---

## 4. The consumer contract (what consumers must do)

In exchange for the guarantees in §2, consumers commit to:

### 4.1 Context namespacing

Every consumer files its writes under a stable IRI prefix.
Recommended forms:

```
ctx:memory/<module>/<session_id>
ctx:genes/<topic>
ctx:linguistic/<language_or_corpus>
ctx:legal/<case_id>
ctx:medical/<cohort>
```

A consumer must not write under another consumer's namespace
without coordination. The substrate does not enforce this; the
convention is what makes multi-consumer operation safe.
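
Because the substrate does not enforce the convention, a consumer may want a client-side guard before it writes. The helper below is a hypothetical sketch; its one subtlety is matching on a path boundary so that `ctx:memory` does not also claim `ctx:memory-other/...`.

```python
def owns_context(consumer_prefix, context_iri):
    """True when context_iri is the consumer's prefix or sits beneath it."""
    prefix = consumer_prefix.rstrip("/")
    return context_iri == prefix or context_iri.startswith(prefix + "/")
```

A consumer configured with the prefix `ctx:memory` would accept `ctx:memory/prefs/session-1` and reject both `ctx:genes/watson` and `ctx:memory-other/x`.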

### 4.2 Predicate registry discipline

A consumer that mints new predicates must register descriptors
(`donto_predicate_descriptor`) including label, gloss,
subject-type hint, object-type hint, an example, and a
nearest-neighbour confidence at mint time. Once the M10 deliverable
in §6.2 lands, the substrate will refuse to register a predicate at
maturity ≥ E2 under a curated context without a descriptor.
Auto-minted candidate predicates remain at E1 in the consumer's
namespace until reviewed.

### 4.3 Policy declaration

A consumer that registers documents must declare the policy IRI it
is using. The substrate's fail-closed default
(`policy:default/restricted_pending_review`) contains misuse
without rejecting the write: undeclared documents land restricted
rather than refused. Declared policies make exports tractable.

### 4.4 Modality and extraction-level declaration

If a consumer cares about modality (a memory runtime certainly
will: `model_output` is different from `oral_history` is different
from `community_protocol`), it must declare modality overlays at
ingest. Queries that filter by modality drop statements without an
explicit overlay row, by design.

### 4.5 Overlay registration

A consumer that introduces consumer-specific state (memory salience,
genealogy DNA-match channels, linguistic paradigm-cell coverage)
must register the overlay tables through `donto_overlay_registry`
(M10 deliverable; see §6.1). Overlay tables are bitemporal-lint-
checked and policy-aware.

### 4.6 No direct core mutation

Consumers may not `UPDATE`, `DELETE`, or `TRUNCATE` against any
`donto_*` core table. Reads are unrestricted (subject to policy);
writes go through `donto_assert` / `donto_retract` /
`donto_correct` / `donto_event_log_append` / overlay-table inserts.

### 4.7 Tripwire contribution

A consumer that depends on a substrate invariant (e.g.,
donto-memory depending on bitemporal `AS_OF` returning the
expected pre-revocation view of a preference claim) commits a
tripwire test to `packages/donto-client/tests/invariants_*.rs`.
This is how the substrate stays honest about the contracts it
serves.

### 4.8 Adapter loss reporting

A consumer that imports from or exports to external systems must
produce a `LossReport` describing what the external format cannot
represent (governance, contradiction, time, n-ary frames, anchors,
review state). The substrate enforces this for adapters that ship
in this repository; consumers extending the adapter surface accept
the same discipline.

---

## 5. What's already there

(See `donto-paper-2026-05-28.html` on this site for the long-form
treatment. This is the one-page recap.)

Live as of 2026-05-28:

- **Storage substrate:** 91 PostgreSQL relations, 131 idempotent
  migrations, applied under an advisory lock with a SHA-256 ledger.
- **Live corpus** (the `genes` consumer): **39,294,083 statements,
  938,918 distinct predicates, 19,230 contexts, 1.84 M evidence
  links, 48 GB on disk, 281 retractions** (a retraction rate of about 7 × 10⁻⁶).
- **Query:** DontoQL v2 (21 clauses) + SPARQL 1.1 subset; nested-
  loop evaluator with unification; sub-100 ms point queries through
  1 M rows on the standard hardware.
- **Extraction:** six apertures (surface, linguistic,
  presupposition, inferential, conceivable, recursive); Temporal-
  orchestrated; vocabulary-aware prompting.
- **Trust Kernel:** policy capsules with 15 allowed actions;
  attestations with rationale; fail-closed default;
  F-1 substrate-side gap closed by migration 0123.
- **Identity & alignment:** typed alignment (11 relations × 3 safety
  flags) with materialised closure; identity edges and hypotheses
  with lenses at query time.
- **Provenance:** three-tier trace with cross-shard cache; content-
  addressed blob store with LocalFS / GCS backends; immutable
  document revisions.
- **Lean overlay:** `donto_engine` certifying shapes and rules via
  stdio JSON; `parent-child age-gap` shipped as the worked example;
  parity with the autoresearch-genealogy library is the open work.
- **Release:** JSONL + RO-Crate + Ed25519 + `did:key`; CLI
  `donto release pipeline`.
- **Test surface:** 77 tripwire test files, ~20 K LOC, 592
  `#[tokio::test]`, 91 `#[test]`, 511 `pg_or_skip!`.

The substrate side of M0–M4 is complete. M5–M9 are
application-level or operations-level work. The gaps that prevent
donto from being a *generic* substrate (as opposed to a
substrate that happens to host genealogy well) are listed in §6.

---

## 6. M10 — Substrate Hardening

**M10 is the milestone that makes donto a real substrate.** It
collects the changes needed so a memory runtime can land on top of
donto without bending the substrate toward memory, while genealogy
keeps running against the same instance without bending the
substrate toward genealogy. M10 is twelve concrete deliverables;
none of them is domain-specific.

### 6.1 Overlay extension API

**Status:** new.
**Problem:** Consumers will introduce state that doesn't belong in
core but does need bitemporal discipline, policy inheritance, and
event-log integration. donto-memory needs salience, recall counts,
reconsolidation queues. genes needs DNA-match channels, blocking
indexes, kin-recursion caches. Today's options are *"add to
core"* (wrong) or *"side database"* (also wrong; reintroduces
split-brain).

**Delta:** Introduce `donto_overlay_registry` (migration 0132):

```sql
create table if not exists donto_overlay_registry (
    overlay_iri      text primary key,    -- e.g. ctx:memory/overlay/access
    consumer_iri     text not null,        -- e.g. ctx:memory
    table_name       text not null,        -- e.g. donto_x_memory_access
    owns_key         text not null,        -- statement_id | record_id | ...
    policy_inherits  text not null default 'from_target',
    bitemporal       boolean not null default true,
    description      text,
    registered_by    text not null,
    registered_at    timestamptz not null default now()
);
```

Plus three CLI verbs: `donto overlay register`, `donto overlay
lint`, `donto overlay drop`. Lint checks:
- Overlay table has `tx_time tstzrange` (or `valid_time` for
  world-time overlays) with `lower_inc` constraint.
- Overlay table references at least one substrate primary key.
- Overlay table is in the consumer's prefix namespace.
- Overlay table has no `ON DELETE CASCADE` against core tables (so
  retraction cannot silently delete consumer overlay state).

**Acceptance:** donto-memory's `donto_x_memory_access`,
`donto_x_memory_state`, and `donto_x_reconsolidation_queue`
register cleanly. genes's `donto_x_dna_match_channel` registers
cleanly. Lint catches a deliberately-broken overlay (missing
`tx_time`) in a tripwire test.
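
The lint checks listed above could be sketched as a pure function over a table description. The dictionary shape, the `donto_x_memory_` prefix argument, and the column types are assumptions for illustration; the `lower_inc` constraint check is omitted, and a real `donto overlay lint` would presumably read the PostgreSQL catalogue instead.

```python
def lint_overlay(table, required_prefix):
    """Return a list of lint problems; an empty list means the overlay passes."""
    problems = []
    columns = table.get("columns", {})
    references = table.get("references", [])

    # Bitemporal discipline: a tx_time or valid_time range column must exist.
    if not any(columns.get(c) == "tstzrange" for c in ("tx_time", "valid_time")):
        problems.append("no tx_time or valid_time tstzrange column")
    # The overlay must point at least one substrate primary key.
    if not references:
        problems.append("no reference to a substrate primary key")
    # The table must live in the consumer's prefix namespace.
    if not table.get("name", "").startswith(required_prefix):
        problems.append("table name outside the consumer prefix")
    # Retraction must not silently delete overlay state.
    if any(r.get("on_delete", "").lower() == "cascade" for r in references):
        problems.append("ON DELETE CASCADE against a core table")
    return problems


good = {
    "name": "donto_x_memory_access",
    "columns": {"statement_id": "uuid", "tx_time": "tstzrange"},
    "references": [{"table": "donto_statement", "on_delete": "restrict"}],
}
broken = dict(good, columns={"statement_id": "uuid"})  # tx_time removed
```

Here `lint_overlay(good, "donto_x_memory_")` returns an empty list, while the deliberately broken variant reports the missing range column, which is the case the acceptance tripwire describes.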

### 6.2 Predicate registry minting controls

**Status:** PRD §6.9 spec; not enforced.
**Problem:** 938 K distinct predicates in production. LLM-driven
proliferation has no write-side brake.

**Delta:** Migration 0133 makes `donto_predicate_descriptor`
required for any predicate registered at maturity ≥ E2 *under a
curated context*. A new column
`donto_predicate.minting_status` (`candidate` / `approved` /
`deprecated` / `merged`) drives the gate. Auto-mint by extraction
writes `candidate` status; review promotes to `approved`. The
existing `donto-api align` worker proposes near-duplicate merges
nightly.

**Acceptance:** A curated-context write attempting to use a
predicate without a descriptor fails with a structured error. The
permissive-context write succeeds. `donto predicates audit`
returns counts by minting status. Three tripwires:
descriptor-required, alignment-proposed, merge-applied.
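
The gate itself is small. A sketch under stated assumptions: maturity is passed as an integer E-level, `descriptors` is the set of predicates that already have a descriptor, and the structured error payload shown here is hypothetical.

```python
class DescriptorRequired(Exception):
    """Raised when a curated-context write uses an undescribed predicate."""

    def __init__(self, predicate):
        super().__init__(f"descriptor required for {predicate}")
        # Hypothetical structured payload for API callers.
        self.error = {"code": "descriptor_required", "predicate": predicate}


def gate_write(predicate, maturity_level, curated, descriptors):
    """Return True when the write may proceed; raise on a gated write."""
    if curated and maturity_level >= 2 and predicate not in descriptors:
        raise DescriptorRequired(predicate)
    return True


descriptors = {"ex:bornIn"}
```

With this gate, an E2 write of an undescribed predicate fails in a curated context and succeeds in a permissive one, and E1 candidates pass in both.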

### 6.3 Hot-path policy projection cache

**Status:** identified in `ROADMAP-AFTER-MAY18.md`; not landed.
**Problem:** §12.3 (H8) shows policy gating scales modestly but
becomes a bottleneck on hot paths at 1 M+ rows. donto-memory's hot
path runs hundreds of recall queries per session; policy cannot
re-evaluate per row.

**Delta:** Migration 0134 introduces `donto_effective_policy_cache`,
a matview keyed by `(target_kind, target_id)` materialising
`donto_effective_actions` output. Refreshed on policy assignment
and revocation events via trigger. Hot-path lookups become a single
B-tree probe.

**Acceptance:** A 100 K-row `POLICY ALLOWS` filter completes in
< 100 ms on the standard hardware (vs ~812 ms today). Tripwire
verifies cache consistency under concurrent policy changes.

### 6.4 Identity-lens precomputed clusters

**Status:** identified; not landed.
**Problem:** Identity-lens query evaluation today walks identity
edges per query. For a memory runtime that wants "all preferences
about this user under the `strict` lens", per-query traversal is
too slow.

**Delta:** Migration 0135 introduces
`donto_identity_cluster_cache(hypothesis_iri, symbol_iri,
referent_id)`, refreshed when identity edges or hypotheses change.
DontoQL `IDENTITY_LENS` consults the cache instead of walking
edges. The cache is per-hypothesis, so adding a new lens does not
invalidate the others.

**Acceptance:** A query under `IDENTITY_LENS strict_identity_v1`
on a 10 K-symbol corpus completes in < 50 ms. Adding a new edge
invalidates only one hypothesis's cache.

### 6.5 HTTP-middleware Trust Kernel enforcement (F-1 follow-on)

**Status:** identified; substrate side closed; sidecar side open.
**Problem:** The substrate fails closed if no policy is provided
(F-1 closure, migration 0123), but consumers writing via legacy
ingest paths can still produce unpoliced rows that land on the
default fail-closed policy rather than being refused outright.

**Delta:** A new middleware in `dontosrv` rejects `POST /assert`,
`POST /documents/register`, `POST /documents/revision`, and
`POST /extract/exhaustive` calls that do not name a policy IRI.
The middleware is controlled by configuration:
`enforce_policy_at_http = true` is the default for new deployments;
existing deployments must explicitly enable it after a migration
window.

**Acceptance:** A `curl -X POST /documents/register` without
`policy_iri` returns 400; with `policy_iri` it succeeds. Migration
notes document the enable-after-window procedure. genes corpus
runs cleanly under the enforced middleware after a one-pass
backfill.
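
The middleware decision can be sketched independently of any web framework. The route list and the configuration flag come from the delta above; the function name, the return shape, and the error message are assumptions, and `dontosrv` itself is not written in Python.

```python
GUARDED_ROUTES = frozenset({
    "/assert",
    "/documents/register",
    "/documents/revision",
    "/extract/exhaustive",
})


def policy_gate(method, path, body, enforce_policy_at_http=True):
    """Return (400, message) to reject, or None to pass the request on."""
    if not enforce_policy_at_http:
        return None
    if method == "POST" and path in GUARDED_ROUTES and not body.get("policy_iri"):
        return (400, "policy_iri is required")
    return None
```

A guarded `POST` without `policy_iri` is rejected with 400, the same request with a policy IRI passes through, and deployments that have not yet enabled the flag see no change in behaviour.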

### 6.6 Characterisation matviews

**Status:** identified; not landed.
**Problem:** Subject cardinality and polarity-mixed contradictions
do not return in routine time at 39 M rows. Consumers asking *"how
big is my namespace?"* or *"how contested is my preference data?"*
hit the same wall.

**Delta:** Three new matviews (migrations 0136–0138):

```sql
donto_subject_stats             -- per-subject row count, last update
donto_contradiction_pressure    -- per-subject-predicate contradiction count
donto_predicate_proliferation   -- predicate count by namespace prefix
```

All three refresh on a daily systemd timer (`donto-matviews.timer`).
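Manual refresh is available via `donto refresh-matviews`. Refresh is
non-blocking where possible (PostgreSQL `REFRESH MATERIALIZED VIEW
CONCURRENTLY`, which recomputes the whole view without locking out
readers and requires a unique index on the matview; it is not
incremental).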

**Acceptance:** `donto status` returns subject cardinality in
< 1 s. donto-memory and genes both rely on these matviews for
their dashboards. Tripwire verifies matview consistency after
inserts.

### 6.7 True-deletion path for legal/privacy

**Status:** new.
**Problem:** "No destructive overwrite" is correct for audit, but
personal memory (donto-memory's domain) and sensitive cultural
material (genes's domain) both have legitimate deletion paths.
GDPR right-to-be-forgotten requests, native-title-sensitive
material, and medical-record retraction requirements all need a
path that preserves audit-of-deletion without preserving the
deleted content.

**Delta:** Migration 0139 introduces *encrypted-blob tombstoning*:

1. Blobs may be ingested with an optional `encryption_key_iri`
   referencing a key managed outside the substrate.
2. A new `donto_blob_tombstone` operation marks a blob as
   permanently inaccessible: the key reference is dropped, the
   bytes are overwritten with zeros, the tombstone records *that*
   the deletion occurred (by whom, when, under what attestation,
   citing what authority).
3. Claims referencing tombstoned blobs continue to exist as audit
   records but their evidence links resolve to a redaction marker.
4. The Trust Kernel grows a new action: `request_deletion`. A
   holder with this action under a policy assigned to a target
   may initiate tombstoning.

**Acceptance:** GDPR-style deletion against a user-preference blob
produces an audit trail without preserving the blob content.
Re-running a release build against a corpus with tombstoned blobs
emits a `LossReport` noting redactions but does not include the
content. Three tripwires: tombstone-creates-audit-but-not-content,
release-respects-tombstones, tombstone-requires-attestation.
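
The four steps can be sketched against a toy in-memory store. Everything here is illustrative: `TombstoningStore`, the audit-record fields, and the redaction marker are hypothetical, and real key management lives outside the substrate as the delta states.

```python
import hashlib

REDACTED = "[redacted: blob tombstoned]"  # hypothetical redaction marker


class TombstoningStore:
    def __init__(self):
        self.blobs = {}       # digest -> bytearray of stored content
        self.key_refs = {}    # digest -> encryption_key_iri
        self.tombstones = {}  # digest -> audit record of the deletion

    def put(self, data, encryption_key_iri=None):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs[digest] = bytearray(data)
        if encryption_key_iri is not None:
            self.key_refs[digest] = encryption_key_iri
        return digest

    def tombstone(self, digest, by, at, attestation, authority):
        if not attestation:
            raise PermissionError("tombstoning requires an attestation")
        self.key_refs.pop(digest, None)   # drop the key reference
        blob = self.blobs[digest]
        blob[:] = bytes(len(blob))        # overwrite the bytes with zeros
        # Record that the deletion happened, not what was deleted.
        self.tombstones[digest] = {
            "by": by, "at": at,
            "attestation": attestation, "authority": authority,
        }

    def resolve(self, digest):
        """Evidence links to tombstoned blobs resolve to a redaction marker."""
        if digest in self.tombstones:
            return REDACTED
        return bytes(self.blobs[digest])
```

After `tombstone` runs, the audit record survives under the original digest while the content and key reference are gone, which matches the first and third acceptance tripwires.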

### 6.8 Schema discovery API

**Status:** partial (`/predicates`, `/schema` exist; not
exhaustive).
**Problem:** Consumers binding to donto need to introspect what's
available — predicates, contexts, modalities, extraction levels,
policies, frame types, alignment relations. Today they have to
read SQL or migration files.

**Delta:** A new endpoint family under `/discovery/*`:

```
GET /discovery/contract-version      contract version + supported clauses
GET /discovery/contexts              context tree
GET /discovery/contexts/<iri>        context detail + parents + policies
GET /discovery/predicates            predicates with minting status
GET /discovery/predicates/<iri>      descriptor + alignment edges
GET /discovery/modalities            allowed modality values
GET /discovery/extraction-levels     allowed extraction levels
GET /discovery/policies              policy capsules
GET /discovery/policies/<iri>        capsule detail + assignments
GET /discovery/frame-types           registered frame types
GET /discovery/alignment-relations   the 11 alignment relations with safety flags
GET /discovery/identity-hypotheses   registered identity hypotheses
GET /discovery/overlays              registered consumer overlays (§6.1)
GET /discovery/dontoql-grammar       BNF of the current DontoQL grammar
GET /discovery/openapi               OpenAPI 3.1 spec for everything above
```

All endpoints policy-gate as `read_metadata`. The OpenAPI spec is
the contract for SDK generators.

**Acceptance:** A new consumer with a fresh donto-client install
can render its own UI by hitting `/discovery/*` exclusively, without
parsing migration files.

### 6.9 Lean overlay parity

**Status:** identified; not landed.
**Problem:** `packages/lean/` is skeletal; the developed shape
library lives in `autoresearch-genealogy/lean/Genealogy/`. The two
should converge in the substrate-neutral parts (functional, typed-
literal, transitive closure, inverse, symmetric, parent–child
age-gap as the worked-genealogy-shape example).

**Delta:** Port the substrate-neutral shapes and rules into
`packages/lean/Donto/Shapes/Stdlib.lean` and `packages/lean/Donto/
Rules/Stdlib.lean`. Domain-specific shapes (kinship-recursion
bounds, paradigm-cell coverage) stay in the consumer's tree;
`donto_engine` loads them via a registry.

**Acceptance:** `lake build` from a fresh checkout produces a
working `donto_engine` against the standard shape library. The
genealogy-specific shapes ship in `genes/lean/` and load on
demand. Tripwires for each standard-library shape ship in the
substrate test suite.

### 6.10 Multi-tenant deployment pattern

**Status:** undocumented.
**Problem:** Two consumers (donto-memory and genes) running
against one Postgres instance need a convention for isolation.
Database-per-consumer is too expensive; schema-per-consumer would
break shared substrate primitives.

**Delta:** Document the *context-namespace-with-policy-isolation*
pattern as the canonical multi-tenant deployment:

- Each consumer owns a context namespace (`ctx:memory/*`,
  `ctx:genes/*`).
- Each consumer registers a default policy that all of its
  contexts inherit.
- Cross-consumer reads require attestation under the producing
  consumer's policy.
- Cross-consumer writes are not supported; consumers must publish
  release envelopes if they want their data consumed elsewhere.

The pattern doc lives at `docs/MULTI-TENANT-DEPLOYMENT.md` with
the systemd unit files, Caddy routes, and policy templates for
both deployment models (single-instance and per-consumer-instance).

**Acceptance:** A second consumer can be onboarded onto an
existing donto instance in under 30 minutes of operator time
(create namespace, create default policy, register overlays).

### 6.11 Consumer SDK promise

**Status:** Rust client exists (2,665 LOC, comprehensive);
TypeScript client is partial; Python client is missing.

**Delta:**

- **`donto-client` (Rust):** the reference SDK. Maintain at
  parity with HTTP routes; promise semantic versioning from
  M10.
- **`client-ts`:** ship a 1.0 release covering reads, asserts,
  retracts, and DontoQL submission. The dontopedia frontend
  becomes the integration test.
- **`client-py`:** new. Synchronous and `async` (httpx) variants.
  Coverage: reads, asserts, retracts, DontoQL, policy, evidence
  links. donto-memory and donto-api both consume this. Generated
  from the OpenAPI spec where possible.

**Acceptance:** All three clients ship a smoke-test suite against
a local donto instance. The Python client handles `donto-memory`'s
hot-path query within 50 ms p50.

### 6.12 Policy-aware recall projection

**Status:** new; specifically motivated by memory consumers but
domain-neutral.
**Problem:** A consumer wants to ask *"give me everything in this
context that this attested agent can quote / export / train on"*
in one round-trip rather than per-row policy decisions.

**Delta:** A new SQL function and HTTP route:

```sql
donto_recall_projection(
    p_holder text,
    p_action text,
    p_scope  jsonb,
    p_pattern jsonb,
    p_lens jsonb default null
) returns setof <statement-with-policy-flags>;
```

The function joins through `donto_effective_policy_cache` (§6.3),
applies the identity lens (§6.4), filters by the attestation
chain, and returns rows pre-marked with which actions the caller
may perform on each. Consumers like donto-memory build their
"Memory Evidence Bundle" by post-processing this single result
set.

**Acceptance:** A `POST /recall` returning a 200-row bundle
completes in < 100 ms on the standard hardware. Tripwire verifies
that policy revocation propagates to the projection within one
matview refresh cycle.
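
The shape of the result can be sketched in a few lines. This is a simplification that assumes the effective-policy cache is a mapping from `(holder, statement_id)` to a set of actions; the identity lens and attestation-chain steps are left out, and the field names are hypothetical.

```python
def recall_projection(holder, action, statements, policy_cache):
    """Return statements the holder may perform `action` on, tagged with all allowed actions.

    A missing cache entry means no actions are allowed (fail-closed).
    """
    bundle = []
    for stmt in statements:
        allowed = policy_cache.get((holder, stmt["id"]), frozenset())
        if action in allowed:
            bundle.append({**stmt, "allowed_actions": sorted(allowed)})
    return bundle


statements = [
    {"id": "s1", "subject": "ex:user", "predicate": "ex:prefers", "object": "ex:tea"},
    {"id": "s2", "subject": "ex:user", "predicate": "ex:prefers", "object": "ex:coffee"},
    {"id": "s3", "subject": "ex:user", "predicate": "ex:dislikes", "object": "ex:milk"},
]
policy_cache = {
    ("agent:reader", "s1"): {"read_content", "quote"},
    ("agent:reader", "s2"): {"read_content"},
}
```

Asking for `quote` returns only `s1`, pre-marked with both of its allowed actions, so the consumer can build its bundle without a second policy round-trip per row.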

---

## 7. M11 — Federation (next after M10)

The M9 federation memo's recommended stack is **RO-Crate + VC + DID
+ DataCite**, with live cross-instance SPARQL `SERVICE` rejected as
the primary layer. M11 lands the *end-to-end demonstration* of this
stack:

- **M11.1** — Two-instance smoke test: instance A signs, instance B
  verifies. No network required; `did:key` round-trip.
- **M11.2** — DataCite registration: a release envelope publishes
  its identifier to the DataCite API; the manifest URL resolves.
- **M11.3** — Cross-instance attestation: instance B's reader can
  prove (via VC) that it is entitled to consume A's release under
  A's policy.
- **M11.4** — Selective disclosure spike: SD-JWT or BBS+ proof that
  redacts claim payloads at presentation time when A's policy
  requires.

The substrate-side work for M11 is minimal (envelope verification
already ships); the work is operational and demonstrational.

---

## 8. M12 — Scale & Calibration

The H10 hard-target run (PRD §25 — sub-100 ms point queries at 10
M rows on standard hardware) is a benchmark, not a feature. M12
lands it formally and adds:

- **M12.1** — H10 lock: run the bench at 10 M rows, publish
  numbers.
- **M12.2** — Predicate audit workflow: a monthly job that
  proposes merge / deprecate candidates from the
  `donto_predicate_proliferation` matview (§6.6).
- **M12.3** — Reviewer-acceptance calibration: comparing
  extraction-time confidence against post-review human
  acceptance, producing per-aperture and per-domain calibration
  curves.
- **M12.4** — Adapter-failure analytics: quarantine-rate dashboards
  per adapter.

---

## 9. Roadmap shape

This is the substrate roadmap. Consumer roadmaps live in their own
trees.

```
M10 Substrate Hardening (2026-Q3)
├─ 6.1  Overlay extension API
├─ 6.2  Predicate minting controls
├─ 6.3  Effective-policy matview
├─ 6.4  Identity-lens cluster cache
├─ 6.5  HTTP-middleware Trust Kernel
├─ 6.6  Characterisation matviews
├─ 6.7  True-deletion path
├─ 6.8  Schema discovery API
├─ 6.9  Lean overlay parity
├─ 6.10 Multi-tenant pattern doc
├─ 6.11 SDK promise (Rust, TS, Python)
└─ 6.12 Recall projection function

M11 Federation (2026-Q4)
├─ 11.1 Two-instance smoke
├─ 11.2 DataCite registration
├─ 11.3 Cross-instance attestation
└─ 11.4 Selective disclosure spike

M12 Scale & Calibration (2027-Q1)
├─ 12.1 H10 lock
├─ 12.2 Predicate audit
├─ 12.3 Reviewer-acceptance calibration
└─ 12.4 Adapter-failure analytics
```

What is *not* on this roadmap, deliberately:

- A "memory module" library inside donto.
- A "kinship recursion" library inside donto.
- A "paradigm cell coverage" library inside donto.
- A built-in LLM extraction loop.
- A built-in identity-resolution model.
- A built-in vector store.
- A built-in UI for any consumer.

Those belong to their consumers.

---

## 10. The substrate test

Before adding any feature, three questions:

**Q1. Would at least two consumers benefit from this feature,
counting `donto-memory`, `genes`, `donto-lang`, and any plausible
third-party consumer?** If only one would benefit, the feature
belongs in the consumer, not the substrate.

**Q2. Does this feature require any consumer-domain vocabulary
baked into a column type or check constraint?** (e.g., a
`kinship_type` enum, a `salience_decay` formula, a
`paradigm_cell_grain` field). If yes, refactor as a registry table
or a consumer overlay (§6.1).

**Q3. Does this feature commit donto to a reasoning, ranking, or
collapse policy a consumer might legitimately disagree with?**
(e.g., a default summarisation behaviour, an automatic identity
merge threshold, a salience-decay schedule). If yes, expose the
behaviour as a lens parameter the consumer chooses at query time.

If the answer to any of these is "I don't know", the feature is
not ready to land in donto. It lands in a consumer first, proves
itself there, and only graduates to substrate after a second
consumer asks for it.
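
The three questions and the "I don't know" rule can be written down as a small decision function. The sketch below only encodes the wording above; the type names, field names, and return strings are hypothetical and nothing like this is claimed to exist in the donto tooling.

```python
from dataclasses import dataclass
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Proposal:
    name: str
    multi_consumer_benefit: Answer      # Q1
    bakes_in_domain_vocabulary: Answer  # Q2
    commits_to_policy: Answer           # Q3


def substrate_test(p: Proposal) -> str:
    answers = (
        p.multi_consumer_benefit,
        p.bakes_in_domain_vocabulary,
        p.commits_to_policy,
    )
    # Any "I don't know" means the feature proves itself in a consumer first.
    if Answer.UNKNOWN in answers:
        return "not ready: land in a consumer first"
    # Q1: a single beneficiary means the feature is consumer work.
    if p.multi_consumer_benefit is Answer.NO:
        return "belongs in the consumer"
    actions = []
    # Q2: domain vocabulary moves to a registry table or overlay.
    if p.bakes_in_domain_vocabulary is Answer.YES:
        actions.append("refactor as registry table or consumer overlay")
    # Q3: contested behaviour becomes a query-time lens parameter.
    if p.commits_to_policy is Answer.YES:
        actions.append("expose as lens parameter")
    if actions:
        return "substrate candidate after: " + "; ".join(actions)
    return "substrate candidate"
```

For example, a proposal that helps two consumers but hard-codes a `kinship_type` enum would come back as a substrate candidate only after the refactor named in Q2.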

---

## 11. Risks

### 11.1 Mission drift

The most likely failure mode for an evidence-grade substrate that
runs a 39 M-row genealogy corpus is that genealogy quietly becomes
the product. Counter-controls:

- The substrate test (§10) is applied to every feature proposal.
- Domain-specific work lands in `genes/` (a separate repository
  tree, not the donto repo) once `donto-lang` and a second
  third-party consumer are real.
- A quarterly PRD review checks that no `donto_*` core table
  contains domain-specific columns.
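
The quarterly column check could be partly automated. The sketch below assumes the reviewer can obtain a list of `(table, column)` pairs from the database catalog; the function name and the term list are hypothetical, with the terms taken from the examples in §10 Q2. A substring match like this will miss cleverly named columns, so it supplements the human review rather than replacing it.

```python
# Illustrative denylist drawn from the §10 Q2 examples; a real list
# would be curated by the reviewers.
DOMAIN_TERMS = ("kinship", "salience", "paradigm_cell")


def domain_specific_columns(columns, terms=DOMAIN_TERMS):
    """Flag columns on donto_* tables whose names contain a domain term.

    columns: iterable of (table_name, column_name) pairs.
    Returns a list of (table_name, column_name, matched_terms).
    """
    findings = []
    for table, column in columns:
        # Assumption: only core tables carry the donto_ prefix.
        if not table.startswith("donto_"):
            continue
        lowered = column.lower()
        hits = [term for term in terms if term in lowered]
        if hits:
            findings.append((table, column, hits))
    return findings
```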

### 11.2 Predicate proliferation

The 938 K-predicate problem (§13 of the May 2026 paper) is the
visible artefact of insufficient minting controls. §6.2 is the
direct fix; §6.6's matview is how we measure it.

### 11.3 Policy bleed

Multi-tenant deployment (§6.10) without careful policy templates
risks one consumer's data leaking into another consumer's exports.
Counter-controls:

- Default policy is fail-closed; cross-consumer reads require
  explicit attestation.
- Release blockers (PRD §17.2) include unresolved policy as a
  hard gate.
- Tripwire: an export from `ctx:memory/*` containing a row from
  `ctx:genes/*` must fail unless explicitly authorised.
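
The tripwire can be expressed as a fail-closed check over the contexts of the rows in an export. This is a sketch under stated assumptions: contexts are treated as plain strings matched by prefix, and `check_export`, `PolicyBleedError`, and `authorised_prefixes` are hypothetical names, not part of any shipped donto API.

```python
class PolicyBleedError(Exception):
    """Raised when an export contains rows from an unauthorised context."""


def check_export(export_prefix, row_contexts, authorised_prefixes=()):
    """Fail closed if any row lies outside the export's own namespace.

    export_prefix: namespace being exported, e.g. "ctx:memory/".
    row_contexts: list of context strings, one per exported row.
    authorised_prefixes: extra namespaces explicitly allowed for this export.
    Returns the number of rows checked.
    """
    allowed = (export_prefix, *authorised_prefixes)
    # str.startswith accepts a tuple of candidate prefixes.
    foreign = [ctx for ctx in row_contexts if not ctx.startswith(allowed)]
    if foreign:
        raise PolicyBleedError(
            f"export from {export_prefix} contains foreign rows: {sorted(set(foreign))}"
        )
    return len(row_contexts)
```

The default of an empty `authorised_prefixes` mirrors the fail-closed policy: a cross-consumer row passes only when the caller names its namespace explicitly.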

### 11.4 Tombstoning vs audit

True-deletion (§6.7) is in tension with append-only discipline
(I3). The encrypted-blob model resolves the tension at the bytes
layer (key dropped, bytes zeroed, *fact-of-deletion* preserved),
but it requires consumers to ingest sensitive blobs with
encryption keys from the start. Retrofitting tombstones over
plaintext blobs is impossible.

Counter-control: document the encrypted-ingest pattern
prominently; ship `donto blob ingest --encrypted` as the default
for sensitive-policy capsules.
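
The bytes-layer behaviour (key dropped, bytes zeroed, fact-of-deletion preserved) can be modelled in a few lines. The sketch below is an in-memory illustration only: `BlobStore` and its methods are hypothetical, the caller is assumed to have encrypted the blob already, and no real cipher or key-management system is shown.

```python
from dataclasses import dataclass, field


@dataclass
class BlobStore:
    ciphertexts: dict = field(default_factory=dict)  # blob_id -> bytearray
    keys: dict = field(default_factory=dict)         # blob_id -> key bytes
    events: list = field(default_factory=list)       # append-only event log

    def ingest_encrypted(self, blob_id, ciphertext, key):
        # The store never sees plaintext; it holds ciphertext and key apart.
        self.ciphertexts[blob_id] = bytearray(ciphertext)
        self.keys[blob_id] = key
        self.events.append(("ingested", blob_id, None))

    def tombstone(self, blob_id, reason):
        if blob_id not in self.keys:
            raise KeyError(f"no live key for blob {blob_id}")
        # Drop the key so the ciphertext can no longer be decrypted.
        del self.keys[blob_id]
        # Zero the stored bytes in place.
        buf = self.ciphertexts[blob_id]
        buf[:] = bytes(len(buf))
        # Append the fact of deletion; earlier events are never rewritten.
        self.events.append(("deleted", blob_id, reason))

    def readable(self, blob_id):
        return blob_id in self.keys
```

The event log grows on deletion rather than shrinking, which is how the sketch keeps the append-only discipline while the payload itself becomes unrecoverable.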

### 11.5 Contract creep

The schema-discovery API (§6.8) commits the substrate to a
publicly-visible surface. Once an SDK is generated from
`/discovery/openapi`, breaking changes affect every consumer.
Counter-control: a contract-version field, semver discipline, and
deprecation cycles measured in quarters, not weeks.
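
A generated SDK could use the contract-version field to refuse to talk to an incompatible substrate. The sketch below assumes plain `MAJOR.MINOR.PATCH` version strings and ignores pre-release tags and the special rules semver applies to `0.x` versions; the function names are hypothetical.

```python
def parse_version(text):
    """Parse a plain MAJOR.MINOR.PATCH string into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(sdk_built_against, server_contract):
    """True if a server contract can serve an SDK generated from an older one.

    Same major version means no breaking changes; the server's minor
    version must be at least the SDK's, so every feature the SDK
    expects is present. Patch level does not affect compatibility.
    """
    sdk = parse_version(sdk_built_against)
    server = parse_version(server_contract)
    return sdk[0] == server[0] and server[1] >= sdk[1]
```

Under this rule an SDK built against `1.4.0` keeps working against `1.6.2`, while a move to `2.0.0` is the signal that a deprecation cycle has completed.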

### 11.6 Overlay sprawl

The overlay-extension API (§6.1) is meant to absorb
consumer-specific state cleanly. The risk is that consumers
register overlays casually and the substrate's table count balloons.
Counter-control: the `donto overlay lint` step enforces bitemporal
discipline and policy inheritance, and overlays are reviewed
quarterly.
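
To make the lint step concrete, the sketch below shows the kind of checks it might run over an overlay descriptor. Everything in it is an assumption made for illustration: the descriptor shape, the column names `valid_time` and `tx_time`, and the keys `inherits_policy_from` and `iri_prefix` are hypothetical and are not taken from the actual `donto overlay lint` implementation.

```python
# Hypothetical names for the two time axes a bitemporal overlay must carry.
REQUIRED_TIME_AXES = {"valid_time", "tx_time"}


def lint_overlay(descriptor):
    """Return a list of human-readable problems; an empty list means clean."""
    problems = []
    columns = set(descriptor.get("columns", ()))

    # Bitemporal discipline: both time axes must be present.
    missing = REQUIRED_TIME_AXES - columns
    if missing:
        problems.append("missing time axes: " + ", ".join(sorted(missing)))

    # Policy inheritance: the overlay must name where its policy comes from.
    if not descriptor.get("inherits_policy_from"):
        problems.append("no policy inheritance declared")

    # Overlays are registered under the owning consumer's IRI prefix.
    if not descriptor.get("iri_prefix"):
        problems.append("no consumer IRI prefix")

    return problems
```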

---

## 12. Definition of done for "donto is a true substrate"

donto graduates from "substrate-flavoured product" to "real
substrate" when *all twelve* of the following hold simultaneously:

1. Three independent consumers (`donto-memory`, `genes`,
   `donto-lang`) run against the same donto instance without
   colliding, each in its own context namespace.
2. A new consumer can be onboarded in < 30 minutes of operator
   time.
3. Every `donto_*` core table is domain-neutral (no genealogy or
   memory column).
4. Every consumer's domain state lives in a registered overlay
   (§6.1) under that consumer's IRI prefix.
5. The schema-discovery API (§6.8) is the *only* surface
   consumers consult to bind to the substrate; no migration-file
   reading.
6. The SDK promise (§6.11) ships in Rust, TypeScript, and Python
   with semver discipline.
7. The HTTP middleware enforces policy presence on every write
   path (§6.5).
8. True deletion (§6.7) is operational with a documented
   encrypted-blob pattern.
9. The recall projection (§6.12) returns a policy-flagged bundle
   in < 100 ms on standard hardware at 10 M rows.
10. Two-instance federation (M11.1) round-trips a signed release.
11. The predicate proliferation matview (§6.6) shows
    `approved` predicates outnumbering `candidate` predicates
    across the live corpus.
12. A quarterly substrate review confirms zero domain-specific
    columns added to core in the prior quarter.

When all twelve hold, the substrate is doing what it claims.

---

## 13. The naming

The May 2026 memory-design draft proposed a clean naming split,
which we adopt:

| Name | Role |
|---|---|
| **`donto`** | the evidence operating system; this PRD. |
| **`donto-memory`** | agentic memory runtime (consumer). |
| **`donto-agent`** | SDK / runtime integration for agents (consumer). |
| **`donto-sleep`** | temporal consolidation workers (consumer-side). |
| **`genes`** | genealogy research workspace (consumer). |
| **`donto-lang`** | language-documentation pilot (consumer). |

donto is the substrate that makes any of these *possible*. None of
them lives inside the donto repository. The substrate's job is to
disappear behind them.

---

## 14. Conclusion

The single thesis of this PRD: **donto succeeds when it
disappears.**

The right test of the substrate is that a memory framework, a
genealogy app, and a language pilot each look at donto, find what
they need, and never have to argue with each other or with us
about the substrate. The May 2026 paper described what donto *is*.
This PRD describes the work that makes that *enough*.

The substrate is two-thirds of the way there. M10 closes the gap
between *"a great database that genealogy happens to run on"* and
*"a domain-neutral evidence operating system anyone can build on"*.
M11 makes it federated. M12 locks the scale numbers. After that,
the substrate's job is to stay out of its consumers' way.

If donto-memory ships in the next year and runs against the same
donto instance that powers `genes.apexpots.com`, without either
consumer compromising the other, the substrate has won. If donto
is ever described as *"the memory framework"* or *"the genealogy
system"* — even by us — the substrate has lost.

---

*End of PRD.*

---

## Appendix A: Mapping from the May 2026 memory-design synthesis

For traceability with the memory-design conversation that
motivated this PRD:

| Memory-design point | donto substrate response |
|---|---|
| "Use donto as the canonical memory database" | §0, §1, §2 — adopted. |
| "Kuzu should not be canonical" | §1 non-mission — not absorbing graph projection systems; consumers may keep Kuzu as a read-optimised cache. |
| "Markdown preferences should not be canonical" | §1 non-mission — markdown is an export/import surface. |
| "Reconsolidation should never rewrite" | §2.8 event log; substrate enforces append-only. Consumer reconsolidation lives in `donto-sleep`. |
| "Memory predicate registry from day one" | §6.2 predicate minting controls. |
| "Policy cost on the hot path" | §6.3 effective-policy projection cache. |
| "Read-time dynamics are not world-time and not belief-time" | §2.2 — explicit. `last_accessed_at` lives in a consumer overlay (§6.1), not in `donto_statement`. |
| "donto_overlay_registry" | §6.1 — adopted. |
| "Memory Evidence Bundle from a `POST /memory/query`" | §6.12 generalised as `donto_recall_projection` so genes and donto-lang can use it too. |
| "DontoDelta language of append/invalidate operations" | Already substrate behaviour; documenting as the *consumer-side* delta format is `donto-memory`'s work, not the substrate's. |
| "Cap per-record reconsolidation frequency" | Consumer concern; lives in `donto-sleep`. |
| "True deletion for legal/privacy" | §6.7 tombstone path. |
| "Naming split: donto / donto-memory / donto-agent / donto-sleep" | §13 — adopted. |

The memory-design draft did most of the conceptual work; this PRD
translates it into substrate-level commitments that hold equally
for genealogy, language documentation, and any other consumer.
