donto — Substrate PRD (2026-05-28)

Document type: Product requirements document. Successor to PRD-TRUST-KERNEL-001 (2026-05-07).
Version: PRD-SUBSTRATE-002.
Date: 2026-05-28.
Status: Draft.
Authors: Thomas Davis, Ajax Davis.
Supersedes: parts of docs/ROADMAP-NEXT.md and docs/ROADMAP-AFTER-MAY18.md.
Complements (does not replace) the canonical PRD at docs/DONTO-PRD.md.


0. Executive position

donto is infrastructure. Not a memory framework. Not a genealogy application. Not a knowledge graph product. Not an LLM extraction service. Not an ontology editor. Not a citation manager. The substrate is the thing multiple independent consumers run against without colliding and without donto biasing toward any one of them.

The named first-tier consumers we are designing for, in alphabetical order so none of them is the special case:

The product question this PRD answers:

Given that a memory runtime, a genealogy app, a language pilot, and a legal-evidence system all want to run against the same donto instance, what does the substrate need to commit to so none of them has to win at any other's expense?


1. Non-mission

This section is load-bearing. Most product drift in research infrastructure happens by quietly absorbing the use case of the loudest consumer. The following are explicit non-missions; if a proposed feature requires them, it does not belong in donto core.

donto will never own | Because
Memory salience, recall counts, decay clocks | Read-time dynamics of one consumer; lives in donto-memory's overlay.
Family-tree visualisation, GEDCOM-specific edges, kinship-recursion rules | Genealogy-domain logic; lives in genes overlays or Lean libraries.
Phonological/morphological paradigm validation | Language-domain logic; lives in donto-lang Lean shapes.
LLM extraction prompts, model choice, cost budgets | Application policy of donto-api; the substrate exposes ingest, not extraction.
Default reasoning engine (RDFS / OWL closure as a baked-in semantics) | Lean overlay certifies opt-in; substrate stays neutral on entailment.
Default entity collapse at write time | Identity is a hypothesis (I8). Collapsing belongs in the consumer's query lens.
Default predicate normalisation at write time | Alignment is typed and scoped (I7). Closure rides at query time.
User-account management, OAuth, session tokens | Sidecar concerns of consumers and gateways.
Notification, alerting, paging | Operational concerns; donto-alert-sink is a thin pluggable interface.
Workflow orchestration | Lives in Temporal / donto-api-worker, not in pg_donto.
Domain-specific UI | TUI exists for substrate operators only; consumer UIs are out of scope.

A simple test (the substrate test) lives in §10. It is a guardrail against drift; we apply it to every feature proposal.


2. The substrate contract (what donto guarantees consumers)

donto offers consumers the following, with stability within each major version and breaking changes reserved for major-version bumps:

2.1 Evidence-anchored claims

Every statement is filed under exactly one context. Every claim of maturity ≥ E2 carries (or transitively carries) at least one evidence link. The default for a hypothesis_only=false ingest without anchor is to land at E1 (candidate) until provenance is attached. Consumers may opt into the legacy hypothesis_only=true path for explicitly speculative recall, as long as the speculation is flagged at the storage layer.

2.2 Bitemporal discipline

valid_time (world-time) and tx_time (system-time) are non-optional on donto_statement. The same discipline extends to alignments, identity edges, policies, attestations, reviews, and releases via donto_event_log. The contract: consumers can always ask "what did the substrate believe at system-time t?" and "what holds in the world at valid-time t'?" and get a deterministic answer.
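The two questions in the contract can be illustrated with a small in-memory model. This is a sketch only: the Row class, the integer timestamps, and the half-open interval convention are assumptions made for illustration, not the donto_statement schema or the DontoQL AS_OF implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    """Hypothetical stand-in for a bitemporal statement row."""
    subject: str
    predicate: str
    obj: str
    valid_from: int            # world-time lower bound (inclusive)
    valid_to: Optional[int]    # world-time upper bound (exclusive); None = open
    tx_from: int               # system-time at which the row became believed
    tx_to: Optional[int]       # system-time at which belief ended; None = still believed


def _contains(lo: int, hi: Optional[int], t: int) -> bool:
    # Half-open interval [lo, hi); an open upper bound never excludes t.
    return lo <= t and (hi is None or t < hi)


def as_of(rows, tx_time: int, valid_time: int):
    """Rows believed at system-time tx_time that hold at world-time valid_time."""
    return [
        r for r in rows
        if _contains(r.tx_from, r.tx_to, tx_time)
        and _contains(r.valid_from, r.valid_to, valid_time)
    ]


rows = [
    # Believed from tx=10 until it was corrected at tx=20.
    Row("ex:alice", "ex:livesIn", "Perth", 2000, None, 10, 20),
    # The correction: two rows believed from tx=20 onward.
    Row("ex:alice", "ex:livesIn", "Perth", 2000, 2010, 20, None),
    Row("ex:alice", "ex:livesIn", "Darwin", 2010, None, 20, None),
]
```

Because the earlier belief is closed rather than overwritten, asking with tx_time=15 and tx_time=25 for the same valid_time gives different, but each time deterministic, answers.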

2.3 Paraconsistent storage

Two currently-believed rows with the same subject and predicate but conflicting objects, polarities, or valid-time intervals are legal substrate state. donto exposes a contradiction frontier; it does not pick winners. Consumers (memory runtime, genealogy reviewer, language analyst) decide what to do with contradictions through their own argument-edge writes, review decisions, or query lenses.
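A rough sketch of what "exposing a frontier without picking winners" could look like, assuming rows are reduced to (subject, predicate, object, polarity) tuples. The tuple shape and the polarity labels are illustrative assumptions; the sketch ignores valid-time intervals and is not the definition of donto_v_contradiction_frontier.

```python
from collections import defaultdict


def contradiction_frontier(rows):
    """Report (subject, predicate) groups holding more than one distinct
    (object, polarity) pair. Nothing is deleted or ranked."""
    groups = defaultdict(set)
    for subject, predicate, obj, polarity in rows:
        groups[(subject, predicate)].add((obj, polarity))
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


rows = [
    ("ex:bob", "ex:bornIn", "1901", "asserted"),
    ("ex:bob", "ex:bornIn", "1903", "asserted"),
    ("ex:bob", "ex:name", "Bob", "asserted"),
]
frontier = contradiction_frontier(rows)
```

A genuinely multi-valued predicate would also show up in this naive grouping, which is one reason the decision about what counts as a conflict stays with the consumer.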

2.4 Typed predicate alignment

Eleven relations × three safety flags (safe_for_query_expansion, safe_for_export, safe_for_logical_inference). A materialised closure rides at query time when PREDICATES EXPAND is set. The substrate does not auto-collapse aligned predicates at write time.

2.5 Identity as hypothesis

Symbols (freely-minted IRIs) live forever. Identity edges weight coreference between symbols. Identity hypotheses name clustering solutions. Queries pick a lens (strict, likely, exploratory, or a consumer-defined hypothesis IRI) at evaluation time. A merge accepted under one hypothesis never destroys data; querying under a different hypothesis returns the unmerged view.

2.6 Policy capsules with action-level granularity

Fifteen actions: read_metadata, read_content, quote, view_anchor_location, derive_claims, derive_embeddings, translate, summarize, export_claims, export_sources, export_anchors, train_model, publish_release, share_with_third_party, federated_query. The default is fail-closed. Attestation credentials grant subset actions to specific holders for specific purposes with mandatory rationale.
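The fail-closed default can be sketched as a lookup that denies whenever no grant is found. The grants mapping and the authorise function below are hypothetical and stand in for attestation credentials; they are not the donto_authorise signature.

```python
# The fifteen action names are taken from the list above.
ACTIONS = frozenset({
    "read_metadata", "read_content", "quote", "view_anchor_location",
    "derive_claims", "derive_embeddings", "translate", "summarize",
    "export_claims", "export_sources", "export_anchors", "train_model",
    "publish_release", "share_with_third_party", "federated_query",
})


def authorise(grants, holder: str, target: str, action: str) -> bool:
    """Return True only if an explicit grant exists; absence means deny."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return action in grants.get((holder, target), frozenset())


# Hypothetical grant: one holder may read and quote one document.
grants = {("agent:ex/reviewer", "doc:ex/1"): frozenset({"read_content", "quote"})}
```

Here authorise(grants, "agent:ex/reviewer", "doc:ex/1", "train_model") returns False even though the holder has other grants on the same target, which is the action-level granularity the section describes.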

2.7 DontoQL with lens parameters

The 21-clause language exposes scope, polarity, maturity, identity, predicate expansion, modality, extraction level, POLICY ALLOWS, SCHEMA_LENS, bitemporal AS_OF, and contradiction-pressure ordering as first-class clauses. Consumers get one query algebra rather than a stack of ad-hoc SDKs.

2.8 Append-only event log

Non-statement objects (alignments, identity hypotheses, policies, attestations, reviews, releases, predicate descriptors, frames) mutate exclusively through donto_event_log. There is no UPDATE … SET on these tables; all changes are events.

2.9 Content-addressed blob substrate

donto_blob stores content by SHA-256 with pluggable backends (LocalFS, GCS, mock). The same revision body uploaded by ten documents lands as one blob. Consumers reference blobs through revisions, not directly.
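The deduplication property follows directly from keying by digest. A minimal in-memory sketch, assuming a hypothetical InMemoryBlobStore class (the real BlobStore trait and its backends are not shown here):

```python
import hashlib


class InMemoryBlobStore:
    """Hypothetical content-addressed store keyed by SHA-256 hex digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # setdefault keeps the first copy; identical bodies share one entry.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)


store = InMemoryBlobStore()
# Ten "documents" uploading the same revision body.
digests = {store.put(b"same revision body") for _ in range(10)}
```

After the loop the store holds a single blob and every upload returned the same digest.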

2.10 Three-tier source-provenance trace

donto-trace resolves surface text to byte-offset spans through exact-line equality → substring-within-line → full-body fallback, with cross-shard caching. Backward-fill is idempotent and resumable. Consumers do not implement provenance; they consume it.
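The three-tier fallback can be sketched as follows. The function name, the tier labels, and the choice to report offsets into the UTF-8 encoding of the body are assumptions for illustration; cross-shard caching and backward-fill are left out.

```python
def resolve_span(body: str, surface: str):
    """Return (tier, start, end) as byte offsets into body's UTF-8 bytes,
    or None if the surface text cannot be located."""
    if not surface:
        return None
    raw = body.encode("utf-8")
    needle = surface.encode("utf-8")
    lines = raw.split(b"\n")

    # Tier 1: the surface text is exactly one whole line.
    offset = 0
    for line in lines:
        if line == needle:
            return ("exact_line", offset, offset + len(line))
        offset += len(line) + 1  # +1 for the newline removed by split

    # Tier 2: the surface text is a substring of a single line.
    offset = 0
    for line in lines:
        idx = line.find(needle)
        if idx != -1:
            return ("substring_in_line", offset + idx, offset + idx + len(needle))
        offset += len(line) + 1

    # Tier 3: fall back to searching the whole body (may span lines).
    idx = raw.find(needle)
    if idx != -1:
        return ("full_body", idx, idx + len(needle))
    return None


body = "John Smith\nborn 1901 in Perth\ndied 1970"
```

Trying the cheapest and most specific match first means a span that crosses a line break is only resolved by the most expensive tier.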

2.11 Lean overlay — certifies, does not gate

donto_engine certifies shapes and rules via a stdio JSON protocol. The substrate never blocks on the Lean side. Sidecar absence degrades shape/rule/cert endpoints only.

2.12 Signed release envelopes

Ed25519 over a manifest SHA-256 with did:key identifiers. RO-Crate is the portable export format; JSONL is the native one. Verification is self-contained and offline-safe.


3. The substrate contract (what consumers see)

For every guarantee in §2 there is a stable surface consumers program against:

Guarantee | Surface
Evidence-anchored claims | donto_assert, donto_evidence_link, POST /assert, POST /evidence/link/*
Bitemporal discipline | tx_time, valid_time columns; DontoQL AS_OF; donto_event_log
Paraconsistency | donto_argument table; ORDER BY contradiction_pressure; donto_v_contradiction_frontier
Predicate alignment | donto_predicate_alignment; PREDICATES EXPAND / EXPAND_ABOVE N / STRICT; donto align CLI
Identity hypotheses | donto_identity_edge, donto_identity_hypothesis; IDENTITY_LENS clause
Policy capsules | donto_policy_capsule, donto_authorise, donto_effective_actions; POLICY ALLOWS clause
DontoQL | POST /dontoql, POST /sparql; donto query CLI
Event log | donto_event_log; no consumer-side UPDATE on event-logged objects
Blob substrate | BlobStore trait; donto_register_blob; donto blob CLI
Provenance trace | donto_trace_log, donto_revision_line, donto:hasSpan claims; donto trace CLI
Lean overlay | POST /shapes/validate, POST /rules/derive; DIR JSON envelope
Release envelopes | donto release {keygen, sign, verify, build, pipeline}; RO-Crate output

These are the published surfaces; everything else in the codebase is implementation. We will version them under donto:contract/major.minor with semver semantics from M11 onward.


4. The consumer contract (what consumers must do)

In exchange for the guarantees in §2, consumers commit to:

4.1 Context namespacing

Every consumer files its writes under a stable IRI prefix. Recommended forms:

ctx:memory/<module>/<session_id>
ctx:genes/<topic>
ctx:linguistic/<language_or_corpus>
ctx:legal/<case_id>
ctx:medical/<cohort>

A consumer must not write under another consumer's namespace without coordination. The substrate does not enforce this; the convention is what makes multi-consumer operation safe.
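Since the substrate does not enforce the convention, a consumer may want a client-side guard before writing. The sketch below assumes a hypothetical mapping from consumer names to the prefixes recommended above; the mapping and the function name are illustrative.

```python
# Hypothetical consumer-to-prefix table built from the recommended forms.
CONSUMER_PREFIXES = {
    "donto-memory": "ctx:memory/",
    "genes": "ctx:genes/",
    "donto-lang": "ctx:linguistic/",
}


def owns_context(consumer: str, context_iri: str) -> bool:
    """True if context_iri falls under the consumer's own namespace."""
    prefix = CONSUMER_PREFIXES.get(consumer)
    # The trailing slash in each prefix stops "ctx:memoryx/..." from
    # matching "ctx:memory/".
    return prefix is not None and context_iri.startswith(prefix)
```

A client library could call this before every write and refuse to proceed when it returns False.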

4.2 Predicate registry discipline

A consumer that mints new predicates must register descriptors (donto_predicate_descriptor) including label, gloss, subject-type hint, object-type hint, an example, and a nearest-neighbour confidence at mint time. The substrate refuses to register a predicate at maturity ≥ E2 without a descriptor (M10 deliverable; see §6.2). Auto-minted candidate predicates remain at E1 in the consumer's namespace until reviewed.

4.3 Policy declaration

A consumer that registers documents must declare the policy IRI it is using. The substrate's fail-closed default (policy:default/restricted_pending_review) catches misuse without preventing it; declared policies make exports tractable.

4.4 Modality and extraction-level declaration

If a consumer cares about modality (a memory runtime certainly will: model_output is different from oral_history is different from community_protocol), it must declare modality overlays at ingest. Queries that filter by modality drop statements without an explicit overlay row, by design.

4.5 Overlay registration

A consumer that introduces consumer-specific state (memory salience, genealogy DNA-match channels, linguistic paradigm-cell coverage) must register the overlay tables through donto_overlay_registry (M10 deliverable; see §6.1). Overlay tables are bitemporal-lint-checked and policy-aware.

4.6 No direct core mutation

Consumers may not UPDATE, DELETE, or TRUNCATE against any donto_* core table. Reads are unrestricted (subject to policy); writes go through donto_assert / donto_retract / donto_correct / donto_event_log_append / overlay-table inserts.

4.7 Tripwire contribution

A consumer that depends on a substrate invariant (e.g., donto-memory depending on bitemporal AS_OF returning the expected pre-revocation view of a preference claim) commits a tripwire test to packages/donto-client/tests/invariants_*.rs. This is how the substrate stays honest about the contracts it serves.

4.8 Adapter loss reporting

A consumer that imports from or exports to external systems must produce a LossReport describing what the external format cannot represent (governance, contradiction, time, n-ary frames, anchors, review state). The substrate enforces this for adapters that ship in this repository; consumers extending the adapter surface accept the same discipline.


5. What's already there

(See donto-paper-2026-05-28.html on this site for the long-form treatment. This is the one-page recap.)

Live as of 2026-05-28:

The substrate side of M0–M4 is complete. M5–M9 are application-level or operations-level work. The gaps that prevent donto from being a generic substrate (as opposed to a substrate that happens to host genealogy well) are listed in §6.


6. M10 — Substrate Hardening

M10 is the milestone that makes donto a real substrate. It collects the changes needed so a memory runtime can land on top of donto without bending the substrate toward memory, while genealogy keeps running against the same instance without bending the substrate toward genealogy. M10 is twelve concrete deliverables; none of them is domain-specific.

6.1 Overlay extension API

Status: new. Problem: Consumers will introduce state that doesn't belong in core but does need bitemporal discipline, policy inheritance, and event-log integration. donto-memory needs salience, recall counts, reconsolidation queues. genes needs DNA-match channels, blocking indexes, kin-recursion caches. Today's options are "add to core" (wrong) or "side database" (also wrong; reintroduces split-brain).

Delta: Introduce donto_overlay_registry (migration 0132):

create table if not exists donto_overlay_registry (
    overlay_iri      text primary key,    -- e.g. ctx:memory/overlay/access
    consumer_iri     text not null,        -- e.g. ctx:memory
    table_name       text not null,        -- e.g. donto_x_memory_access
    owns_key         text not null,        -- statement_id | record_id | ...
    policy_inherits  text not null default 'from_target',
    bitemporal       boolean not null default true,
    description      text,
    registered_by    text not null,
    registered_at    timestamptz not null default now()
);

Plus three CLI verbs: donto overlay register, donto overlay lint, donto overlay drop. Lint checks:

  • Overlay table has tx_time tstzrange (or valid_time for world-time overlays) with lower_inc constraint.
  • Overlay table references at least one substrate primary key.
  • Overlay table is in the consumer's prefix namespace.
  • Overlay table has no ON DELETE CASCADE against core tables (so retraction cannot silently delete consumer overlay state).
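The four lint checks can be sketched against a plain description of a table. The dictionary layout (columns, core_references, cascade_deletes_from_core) and the table-name prefix convention are assumptions for illustration; the actual donto overlay lint would inspect the PostgreSQL catalogue, and the lower_inc constraint check is omitted here.

```python
def lint_overlay(table: dict, consumer_table_prefix: str) -> list:
    """Return a list of human-readable problems; empty means the overlay passes."""
    problems = []
    columns = table.get("columns", {})

    # Check 1: a tstzrange time column (tx_time, or valid_time for world-time overlays).
    if not any(columns.get(c) == "tstzrange" for c in ("tx_time", "valid_time")):
        problems.append("missing tx_time/valid_time tstzrange column")

    # Check 2: at least one reference to a substrate primary key.
    if not table.get("core_references"):
        problems.append("references no substrate primary key")

    # Check 3: the table sits in the consumer's prefix namespace.
    if not table["name"].startswith(consumer_table_prefix):
        problems.append("table name is outside the consumer's prefix")

    # Check 4: no ON DELETE CASCADE against core tables.
    if table.get("cascade_deletes_from_core"):
        problems.append("ON DELETE CASCADE against a core table")

    return problems


good = {
    "name": "donto_x_memory_access",
    "columns": {"statement_id": "uuid", "tx_time": "tstzrange"},
    "core_references": ["donto_statement"],
    "cascade_deletes_from_core": False,
}
# Deliberately broken: no time column at all.
broken = {**good, "columns": {"statement_id": "uuid"}}
```

Returning every problem instead of stopping at the first lets an operator fix an overlay in one pass.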

Acceptance: donto-memory's donto_x_memory_access, donto_x_memory_state, and donto_x_reconsolidation_queue register cleanly. genes's donto_x_dna_match_channel registers cleanly. Lint catches a deliberately-broken overlay (missing tx_time) in a tripwire test.

6.2 Predicate registry minting controls

Status: PRD §6.9 spec; not enforced. Problem: 938 K distinct predicates in production. LLM-driven proliferation has no write-side brake.

Delta: Migration 0133 makes donto_predicate_descriptor required for any predicate registered at maturity ≥ E2 under a curated context. A new column donto_predicate.minting_status (candidate / approved / deprecated / merged) drives the gate. Auto-mint by extraction writes candidate status; review promotes to approved. The existing donto-api align worker proposes near-duplicate merges nightly.

Acceptance: A curated-context write attempting to use a predicate without a descriptor fails with a structured error. The permissive-context write succeeds. donto predicates audit returns counts by minting status. Three tripwires: descriptor-required, alignment-proposed, merge-applied.
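A sketch of the write-side gate described above, assuming maturity labels of the form "E" followed by an integer and a set of predicate IRIs that already have descriptors. The exception class, its error code, and the function signature are hypothetical.

```python
class DescriptorRequired(Exception):
    """Hypothetical structured error for a gated predicate write."""

    def __init__(self, predicate: str):
        super().__init__(f"predicate {predicate} has no descriptor")
        self.predicate = predicate

    def to_dict(self) -> dict:
        return {"code": "descriptor_required", "predicate": self.predicate}


def check_mint(predicate: str, maturity: str, context_is_curated: bool,
               descriptors: set) -> bool:
    """Refuse curated-context writes at maturity >= E2 without a descriptor."""
    level = int(maturity[1:])  # assumes labels like "E1", "E2"
    if context_is_curated and level >= 2 and predicate not in descriptors:
        raise DescriptorRequired(predicate)
    return True


descriptors = {"ex:bornIn"}
```

The same predicate passes in a permissive context or at E1, which keeps auto-minted candidates flowing while curated contexts stay gated.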

6.3 Hot-path policy projection cache

Status: identified in ROADMAP-AFTER-MAY18.md; not landed. Problem: §12.3 (H8) shows policy gating scales modestly but becomes a bottleneck on hot paths at 1 M+ rows. donto-memory's hot path runs hundreds of recall queries per session; policy cannot re-evaluate per row.

Delta: Migration 0134 introduces donto_effective_policy_cache, a matview keyed by (target_kind, target_id) materialising donto_effective_actions output. Refreshed on policy assignment and revocation events via trigger. Hot-path lookups become a single B-tree probe.

Acceptance: A 100 K-row POLICY ALLOWS filter completes in < 100 ms on the standard hardware (vs ~812 ms today). Tripwire verifies cache consistency under concurrent policy changes.

6.4 Identity-lens precomputed clusters

Status: identified; not landed. Problem: Identity-lens query evaluation today walks identity edges per query. For a memory runtime that wants "all preferences about this user under the strict lens", per-query traversal is too slow.

Delta: Migration 0135 introduces donto_identity_cluster_cache(hypothesis_iri, symbol_iri, referent_id), refreshed when identity edges or hypotheses change. DontoQL IDENTITY_LENS consults the cache instead of walking edges. The cache is per-hypothesis, so adding a new lens does not invalidate the others.

Acceptance: A query under IDENTITY_LENS strict_identity_v1 on a 10 K-symbol corpus completes in < 50 ms. Adding a new edge invalidates only one hypothesis's cache.
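The per-hypothesis invalidation property can be sketched with a union-find rebuilt lazily for each hypothesis. This is an in-memory illustration only: the class name is hypothetical, edge weights are ignored (every edge is treated as accepted under its hypothesis), and the referent id is simply the lexicographically smallest symbol in a cluster.

```python
class IdentityClusterCache:
    def __init__(self):
        self._edges = {}   # hypothesis_iri -> list of (symbol_a, symbol_b)
        self._cache = {}   # hypothesis_iri -> {symbol_iri: referent_id}

    def add_edge(self, hypothesis: str, a: str, b: str) -> None:
        self._edges.setdefault(hypothesis, []).append((a, b))
        # Invalidate only the hypothesis that changed.
        self._cache.pop(hypothesis, None)

    def referent(self, hypothesis: str, symbol: str) -> str:
        if hypothesis not in self._cache:
            self._cache[hypothesis] = self._build(hypothesis)
        # Symbols with no edges are their own referent.
        return self._cache[hypothesis].get(symbol, symbol)

    def cached_hypotheses(self) -> set:
        return set(self._cache)

    def _build(self, hypothesis: str) -> dict:
        parent = {}

        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for a, b in self._edges.get(hypothesis, []):
            ra, rb = find(a), find(b)
            if ra != rb:
                # The smaller IRI becomes the root, so the root is the cluster minimum.
                parent[max(ra, rb)] = min(ra, rb)
        return {s: find(s) for s in list(parent)}


cache = IdentityClusterCache()
cache.add_edge("hyp:strict", "ex:a", "ex:b")
cache.add_edge("hyp:likely", "ex:a", "ex:c")
```

Querying the same symbol under two hypotheses gives two different referents, and adding an edge under one hypothesis leaves the other's cached clusters in place.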

6.5 HTTP-middleware Trust Kernel enforcement (F-1 follow-on)

Status: identified; substrate side closed; sidecar side open. Problem: The substrate fails closed if no policy is provided (F-1 closure, migration 0123), but consumers writing via legacy ingest paths can still produce unpoliced rows that land on the default fail-closed policy rather than being refused outright.

Delta: A new middleware in dontosrv rejects POST /assert, POST /documents/register, POST /documents/revision, and POST /extract/exhaustive calls that do not name a policy IRI. The middleware is controlled by configuration: enforce_policy_at_http = true is the default for new deployments; existing deployments must explicitly enable it after a migration window.

Acceptance: A curl -X POST /documents/register without policy_iri returns 400; with policy_iri it succeeds. Migration notes document the enable-after-window procedure. genes corpus runs cleanly under the enforced middleware after a one-pass backfill.
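The rejection rule can be sketched as a pure function over the request. The function name, the return convention (None to pass the request through, a status and error body to reject), and the error payload are assumptions; the guarded routes, the policy_iri field, and the enforce_policy_at_http setting come from the text above.

```python
GUARDED_ROUTES = {
    ("POST", "/assert"),
    ("POST", "/documents/register"),
    ("POST", "/documents/revision"),
    ("POST", "/extract/exhaustive"),
}


def check_policy_presence(method: str, path: str, body: dict,
                          enforce_policy_at_http: bool = True):
    """Return None to let the request through, or (status, error) to reject it."""
    if not enforce_policy_at_http:
        return None
    if (method.upper(), path) not in GUARDED_ROUTES:
        return None
    if not body.get("policy_iri"):
        return (400, {"error": "policy_iri is required", "path": path})
    return None
```

Keeping the check free of I/O makes it straightforward to cover with a tripwire test before wiring it into the HTTP stack.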

6.6 Characterisation matviews

Status: identified; not landed. Problem: Subject cardinality and polarity-mixed contradictions do not return in routine time at 39 M rows. Consumers asking "how big is my namespace?" or "how contested is my preference data?" hit the same wall.

Delta: Three new matviews (migrations 0136–0138):

donto_subject_stats             -- per-subject row count, last update
donto_contradiction_pressure    -- per-subject-predicate contradiction count
donto_predicate_proliferation   -- predicate count by namespace prefix

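All three refresh on a daily systemd timer (donto-matviews.timer). Manual refresh is available via donto refresh-matviews. Refresh is non-blocking where possible: PostgreSQL's REFRESH MATERIALIZED VIEW CONCURRENTLY keeps the matview readable during a refresh and requires a unique index on it, but it still recomputes the full view rather than applying an incremental delta.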

Acceptance: donto status returns subject cardinality in < 1 s. donto-memory and genes both rely on these matviews for their dashboards. Tripwire verifies matview consistency after inserts.

6.7 True-deletion path for legal/privacy

Status: new. Problem: "No destructive overwrite" is correct for audit, but personal memory (donto-memory's domain) and sensitive cultural material (genes's domain) both have legitimate deletion paths. GDPR right-to-be-forgotten, native-title-sensitive material, medical-record retraction requirements all need a path that preserves audit-of-deletion without preserving the deleted content.

Delta: Migration 0139 introduces encrypted-blob tombstoning:

  1. Blobs may be ingested with an optional encryption_key_iri referencing a key managed outside the substrate.
  2. A new donto_blob_tombstone operation marks a blob as permanently inaccessible: the key reference is dropped, the bytes are overwritten with zeros, and the tombstone records that the deletion occurred (by whom, when, under what attestation, citing what authority).
  3. Claims referencing tombstoned blobs continue to exist as audit records but their evidence links resolve to a redaction marker.
  4. The Trust Kernel grows a new action: request_deletion. A holder with this action under a policy assigned to a target may initiate tombstoning.
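The four steps above can be sketched with an in-memory store. Everything here is illustrative: the TombstoneStore class, its field names, and the redaction marker are assumptions, and no real encryption or key management is performed.

```python
import hashlib
from datetime import datetime, timezone

REDACTED = "[redacted: blob tombstoned]"  # hypothetical redaction marker


class TombstoneStore:
    def __init__(self):
        self.blobs = {}       # digest -> {"bytes", "encryption_key_iri", "tombstoned"}
        self.tombstones = []  # audit records; these never contain blob content

    def ingest(self, data: bytes, encryption_key_iri: str) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs[digest] = {
            "bytes": data,
            "encryption_key_iri": encryption_key_iri,
            "tombstoned": False,
        }
        return digest

    def tombstone(self, digest: str, actor: str, attestation: str, authority: str):
        if not attestation:
            raise PermissionError("tombstoning requires an attestation")
        blob = self.blobs[digest]
        blob["encryption_key_iri"] = None          # drop the key reference
        blob["bytes"] = bytes(len(blob["bytes"]))  # overwrite with zeros
        blob["tombstoned"] = True
        self.tombstones.append({
            "digest": digest,
            "by": actor,
            "at": datetime.now(timezone.utc).isoformat(),
            "attestation": attestation,
            "authority": authority,
        })

    def resolve(self, digest: str):
        """Evidence links to tombstoned blobs resolve to the redaction marker."""
        blob = self.blobs[digest]
        return REDACTED if blob["tombstoned"] else blob["bytes"]
```

The audit record keeps the digest and the circumstances of the deletion, so the fact of deletion survives while the content does not.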

Acceptance: GDPR-style deletion against a user-preference blob produces an audit trail without preserving the blob content. Re-running a release build against a corpus with tombstoned blobs emits a LossReport noting redactions but does not include the content. Three tripwires: tombstone-creates-audit-but-not-content, release-respects-tombstones, tombstone-requires-attestation.

6.8 Schema discovery API

Status: partial (/predicates, /schema exist; not exhaustive). Problem: Consumers binding to donto need to introspect what's available — predicates, contexts, modalities, extraction levels, policies, frame types, alignment relations. Today they have to read SQL or migration files.

Delta: A new endpoint family under /discovery/*:

GET /discovery/contract-version      contract version + supported clauses
GET /discovery/contexts              context tree
GET /discovery/contexts/<iri>        context detail + parents + policies
GET /discovery/predicates            predicates with minting status
GET /discovery/predicates/<iri>      descriptor + alignment edges
GET /discovery/modalities            allowed modality values
GET /discovery/extraction-levels     allowed extraction levels
GET /discovery/policies              policy capsules
GET /discovery/policies/<iri>        capsule detail + assignments
GET /discovery/frame-types           registered frame types
GET /discovery/alignment-relations   the 11 alignment relations with safety flags
GET /discovery/identity-hypotheses   registered identity hypotheses
GET /discovery/overlays              registered consumer overlays (§6.1)
GET /discovery/dontoql-grammar       BNF of the current DontoQL grammar
GET /discovery/openapi               OpenAPI 3.1 spec for everything above

All endpoints policy-gate as read_metadata. The OpenAPI spec is the contract for SDK generators.

Acceptance: A new consumer with a fresh donto-client install can render its own UI by hitting /discovery/* exclusively, without parsing migration files.

6.9 Lean overlay parity

Status: identified; not landed. Problem: packages/lean/ is skeletal; the developed shape library lives in autoresearch-genealogy/lean/Genealogy/. The two should converge in the substrate-neutral parts (functional, typed-literal, transitive closure, inverse, symmetric, parent–child age-gap as the worked-genealogy-shape example).

Delta: Port the substrate-neutral shapes and rules into packages/lean/Donto/Shapes/Stdlib.lean and packages/lean/Donto/Rules/Stdlib.lean. Domain-specific shapes (kinship-recursion bounds, paradigm-cell coverage) stay in the consumer's tree; donto_engine loads them via a registry.

Acceptance: lake build from a fresh checkout produces a working donto_engine against the standard shape library. The genealogy-specific shapes ship in genes/lean/ and load on demand. Tripwires for each standard-library shape ship in the substrate test suite.

6.10 Multi-tenant deployment pattern

Status: undocumented. Problem: Two consumers (donto-memory and genes) running against one Postgres instance need a convention for isolation. Database-per-consumer is too expensive; schema-per-consumer would break shared substrate primitives.

Delta: Document the context-namespace-with-policy-isolation pattern as the canonical multi-tenant deployment:

  • Each consumer owns a context namespace (ctx:memory/*, ctx:genes/*).
  • Each consumer registers a default policy that all of its contexts inherit.
  • Cross-consumer reads require attestation under the producing consumer's policy.
  • Cross-consumer writes are not supported; consumers must publish release envelopes if they want their data consumed elsewhere.

The pattern doc lives at docs/MULTI-TENANT-DEPLOYMENT.md with the systemd unit files, Caddy routes, and policy templates for both deployment models (single-instance and per-consumer-instance).

Acceptance: A second consumer can be onboarded onto an existing donto instance in under 30 minutes of operator time (create namespace, create default policy, register overlays).

6.11 Consumer SDK promise

Status: Rust client exists (2,665 LOC, comprehensive); TypeScript client is partial; Python client is missing.

Delta:

  • donto-client (Rust): the reference SDK. Maintain at parity with HTTP routes; promise semantic versioning from M10.
  • client-ts: ship a 1.0 release covering reads, asserts, retracts, and DontoQL submission. The dontopedia frontend becomes the integration test.
  • client-py: new. Synchronous and async (httpx) variants. Coverage: reads, asserts, retracts, DontoQL, policy, evidence links. donto-memory and donto-api both consume this. Generated from the OpenAPI spec where possible.

Acceptance: All three clients ship a smoke-test suite against a local donto instance. The Python client handles donto-memory's hot-path query within 50 ms p50.

6.12 Policy-aware recall projection

Status: new; specifically motivated by memory consumers but domain-neutral. Problem: A consumer wants to ask "give me everything in this context that this attested agent can quote / export / train on" in one round-trip rather than per-row policy decisions.

Delta: A new SQL function and HTTP route:

donto_recall_projection(
    p_holder text,
    p_action text,
    p_scope  jsonb,
    p_pattern jsonb,
    p_lens jsonb default null
) returns setof <statement-with-policy-flags>;

The function joins through donto_effective_policy_cache (§6.3), applies the identity lens (§6.4), filters by the attestation chain, and returns rows pre-marked with which actions the caller may perform on each. Consumers like donto-memory build their "Memory Evidence Bundle" by post-processing this single result set.
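A simplified in-memory analogue of the projection, assuming statements are dictionaries with an id and a context, and that the effective-policy cache is a mapping from (holder, statement id) to a set of actions. The pattern and identity-lens parameters are omitted, and none of the names below are the real SQL function's internals.

```python
def recall_projection(statements, effective_actions, holder: str,
                      action: str, context_prefix: str):
    """Return in-scope statements the holder may perform `action` on,
    each annotated with every action the holder is allowed."""
    bundle = []
    for st in statements:
        if not st["context"].startswith(context_prefix):
            continue
        # Missing cache entries mean no grant (fail-closed).
        allowed = effective_actions.get((holder, st["id"]), frozenset())
        if action in allowed:
            bundle.append({**st, "allowed_actions": sorted(allowed)})
    return bundle


statements = [
    {"id": 1, "context": "ctx:memory/prefs/s1", "text": "prefers tea"},
    {"id": 2, "context": "ctx:memory/prefs/s1", "text": "prefers coffee"},
    {"id": 3, "context": "ctx:genes/davis", "text": "born 1901"},
]
effective_actions = {
    ("agent:ex/1", 1): frozenset({"quote", "read_content"}),
    ("agent:ex/1", 2): frozenset({"read_content"}),
    ("agent:ex/1", 3): frozenset({"quote"}),
}
bundle = recall_projection(statements, effective_actions,
                           "agent:ex/1", "quote", "ctx:memory/")
```

Returning the full allowed-action list with each row is what lets a consumer assemble its bundle without a second policy round-trip.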

Acceptance: A POST /recall returning a 200-row bundle completes in < 100 ms on the standard hardware. Tripwire verifies that policy revocation propagates to the projection within one matview refresh cycle.


7. M11 — Federation (next after M10)

The M9 federation memo's recommended stack is **RO-Crate + VC + DID

The substrate-side work for M11 is minimal (envelope verification already ships); the work is operational and demonstrational.


8. M12 — Scale & Calibration

The H10 hard-target run (PRD §25 — sub-100 ms point queries at 10 M rows on standard hardware) is a benchmark, not a feature. M12 lands it formally and adds:


9. Roadmap shape

This is the substrate roadmap. Consumer roadmaps live in their own trees.

M10 Substrate Hardening (2026-Q3)
├─ 6.1  Overlay extension API
├─ 6.2  Predicate minting controls
├─ 6.3  Effective-policy matview
├─ 6.4  Identity-lens cluster cache
├─ 6.5  HTTP-middleware Trust Kernel
├─ 6.6  Characterisation matviews
├─ 6.7  True-deletion path
├─ 6.8  Schema discovery API
├─ 6.9  Lean overlay parity
├─ 6.10 Multi-tenant pattern doc
├─ 6.11 SDK promise (Rust, TS, Python)
└─ 6.12 Recall projection function

M11 Federation (2026-Q4)
├─ 11.1 Two-instance smoke
├─ 11.2 DataCite registration
├─ 11.3 Cross-instance attestation
└─ 11.4 Selective disclosure spike

M12 Scale & Calibration (2027-Q1)
├─ 12.1 H10 lock
├─ 12.2 Predicate audit
├─ 12.3 Reviewer-acceptance calibration
└─ 12.4 Adapter-failure analytics

What is not on this roadmap, deliberately:

Those belong to their consumers.


10. The substrate test

Before adding any feature, three questions:

Q1. Would at least two consumers (counting donto-memory, genes, donto-lang, and any plausible third-party consumer) benefit from this feature? If only one would benefit, the feature belongs in the consumer, not the substrate.

Q2. Does this feature require any consumer-domain vocabulary baked into a column type or check constraint? (e.g., a kinship_type enum, a salience_decay formula, a paradigm_cell_grain field). If yes, refactor as a registry table or a consumer overlay (§6.1).

Q3. Does this feature commit donto to a reasoning, ranking, or collapse policy a consumer might legitimately disagree with? (e.g., a default summarisation behaviour, an automatic identity merge threshold, a salience-decay schedule). If yes, expose the behaviour as a lens parameter the consumer chooses at query time.

If the answer to any of these is "I don't know", the feature is not ready to land in donto. It lands in a consumer first, proves itself there, and only graduates to substrate after a second consumer asks for it.


11. Risks

11.1 Mission drift

The most likely failure mode for an evidence-grade substrate that runs a 39 M-row genealogy corpus is that genealogy quietly becomes the product. Counter-controls:

  • The substrate test (§10) is applied to every feature proposal.
  • Domain-specific work lands in genes/ (a separate repository tree, not the donto repo) once donto-lang and a second third-party consumer are real.
  • Quarterly PRD reviews check that no donto_* core table contains domain-specific columns.

11.2 Predicate proliferation

The 938 K-predicate problem (§13 of the May 2026 paper) is the visible artefact of insufficient minting controls. §6.2 is the direct fix; §6.6's matview is how we measure it.

11.3 Policy bleed

Multi-tenant deployment (§6.10) without careful policy templates risks one consumer's data leaking into another consumer's exports. Counter-controls:

  • Default policy is fail-closed; cross-consumer reads require explicit attestation.
  • Release blockers (PRD §17.2) include unresolved policy as a hard gate.
  • Tripwire: an export from ctx:memory/* containing a row from ctx:genes/* must fail unless explicitly authorised.

11.4 Tombstoning vs audit

True-deletion (§6.7) is in tension with append-only discipline (I3). The encrypted-blob model resolves the tension at the bytes layer (key dropped, bytes zeroed, fact-of-deletion preserved), but it requires consumers to ingest sensitive blobs with encryption keys from the start. Retrofitting tombstones over plaintext blobs is impossible.

Counter-control: document the encrypted-ingest pattern prominently; ship donto blob ingest --encrypted as the default for sensitive-policy capsules.

11.5 Contract creep

The schema-discovery API (§6.8) commits the substrate to a publicly-visible surface. Once an SDK is generated from /discovery/openapi, breaking changes affect every consumer. Counter-control: contract-version field, semver discipline, deprecation cycles measured in quarters not weeks.

11.6 Overlay sprawl

The overlay-extension API (§6.1) is meant to absorb consumer-specific state cleanly. The risk is that consumers register overlays casually and the substrate's table count balloons. Counter-control: the donto overlay lint step enforces bitemporal discipline and policy inheritance; quarterly overlay review.


12. Definition of done for "donto is a true substrate"

donto graduates from "substrate-flavoured product" to "real substrate" when all twelve of the following hold simultaneously:

  1. Three independent consumers (donto-memory, genes, donto-lang) run against the same donto instance without colliding, each in its own context namespace.
  2. A new consumer can be onboarded in < 30 minutes of operator time.
  3. Every donto_* core table is domain-neutral (no genealogy or memory column).
  4. Every consumer's domain state lives in a registered overlay (§6.1) under that consumer's IRI prefix.
  5. The schema-discovery API (§6.8) is the only surface consumers consult to bind to the substrate; no migration-file reading.
  6. The SDK promise (§6.11) ships in Rust, TypeScript, and Python with semver discipline.
  7. The HTTP middleware enforces policy presence on every write path (§6.5).
  8. True deletion (§6.7) is operational with a documented encrypted-blob pattern.
  9. The recall projection (§6.12) returns a policy-flagged bundle in < 100 ms on standard hardware at 10 M rows.
  10. Two-instance federation (M11.1) round-trips a signed release.
  11. The predicate proliferation matview (§6.6) shows approved predicates outnumbering candidate predicates across the live corpus.
  12. A quarterly substrate review confirms zero domain-specific columns added to core in the prior quarter.

When all twelve hold, the substrate is doing what it claims.


13. The naming

The May 2026 memory-design draft proposed a clean naming split, which we adopt:

Name | Role
donto | the evidence operating system; this PRD.
donto-memory | agentic memory runtime (consumer).
donto-agent | SDK / runtime integration for agents (consumer).
donto-sleep | Temporal consolidation workers (consumer-side).
genes | genealogy research workspace (consumer).
donto-lang | language-documentation pilot (consumer).

donto is the substrate that makes any of these possible. None of them lives inside the donto repository. The substrate's job is to disappear behind them.


14. Conclusion

The single thesis of this PRD: donto succeeds when it disappears.

The right test of the substrate is that a memory framework, a genealogy app, and a language pilot each look at donto, find what they need, and never have to argue with each other or with us about the substrate. The May 2026 paper described what donto is. This PRD describes the work that makes that enough.

The substrate is two-thirds of the way there. M10 closes the gap between "a great database that genealogy happens to run on" and "a domain-neutral evidence operating system anyone can build on". M11 makes it federated. M12 locks the scale numbers. After that, the substrate's job is to stay out of its consumers' way.

If donto-memory ships in the next year and runs against the same donto instance that powers genes.apexpots.com, without either consumer compromising the other, the substrate has won. If donto is ever described as "the memory framework" or "the genealogy system" — even by us — the substrate has lost.


End of PRD.


Appendix A: Mapping from the May 2026 memory-design synthesis

For traceability with the memory-design conversation that motivated this PRD:

Memory-design point | donto substrate response
"Use donto as the canonical memory database" | §0, §1, §2: adopted.
"Kuzu should not be canonical" | §1 non-mission: not absorbing graph projection systems; consumers may keep Kuzu as a read-optimised cache.
"Markdown preferences should not be canonical" | §1 non-mission: markdown is an export/import surface.
"Reconsolidation should never rewrite" | §2.8 event log; substrate enforces append-only. Consumer reconsolidation lives in donto-sleep.
"Memory predicate registry from day one" | §6.2 predicate minting controls.
"Policy cost on the hot path" | §6.3 effective-policy projection cache.
"Read-time dynamics are not world-time and not belief-time" | §2.2: explicit. last_accessed_at lives in a consumer overlay (§6.1), not in donto_statement.
"donto_overlay_registry" | §6.1: adopted.
"Memory Evidence Bundle from a POST /memory/query" | §6.12 generalised as donto_recall_projection so genes and donto-lang can use it too.
"DontoDelta language of append/invalidate operations" | Already substrate behaviour; documenting as the consumer-side delta format is donto-memory's work, not the substrate's.
"Cap per-record reconsolidation frequency" | Consumer concern; lives in donto-sleep.
"True deletion for legal/privacy" | §6.7 tombstone path.
"Naming split: donto / donto-memory / donto-agent / donto-sleep" | §13: adopted.

The memory-design draft did most of the conceptual work; this PRD translates it into substrate-level commitments that hold up for genealogy, language documentation, and any third consumer equally.