Document type: Product requirements document.
Successor to PRD-TRUST-KERNEL-001 (2026-05-07).
Version: PRD-SUBSTRATE-002.
Date: 2026-05-28.
Status: Draft.
Authors: Thomas Davis, Ajax Davis.
Supersedes: parts of docs/ROADMAP-NEXT.md and docs/ROADMAP-AFTER-MAY18.md.
Complements (does not replace) the canonical PRD at docs/DONTO-PRD.md.
donto is infrastructure. Not a memory framework. Not a genealogy application. Not a knowledge graph product. Not an LLM extraction service. Not an ontology editor. Not a citation manager. The substrate is the thing multiple independent consumers run against without colliding and without donto biasing toward any one of them.
The named first-tier consumers we are designing for, in alphabetical order so none of them is the special case:
donto-memory — an agentic-memory
runtime (the framework described in the May 2026 memory-design draft).
Long-lived agents need bitemporal recall, contradiction-preserving
preferences, policy-gated retrieval, identity-stable referents, and
reconsolidation-as-derivation. donto already supplies all of these as
substrate primitives; donto-memory wraps them with read-time dynamics
(salience, recall counts, decay clocks) and a sleep-time workflow
(cluster, reflect, consolidate). donto-memory is not in this
repository and never will be. It is a consumer.
genes — a genealogy research
workspace built against the same donto instance that serves
dontopedia.com. The genealogy corpus has historically driven much of
donto's evolution because it stresses every invariant, but genealogy is
a consumer like any other. The 14-family object model in
DONTO-PRD.md §6 contains zero genealogy-specific
tables: every column donto-side is domain-neutral, and the
genealogy-flavoured pieces (parent–child age-gap shape, Mary Watson
worked example, Annie Davis test fixtures) live in the consumer's tree
or in Lean shape libraries that consumers opt into.
donto-lang — the
language-documentation pilot (PRD §13). Five importers ship today (CLDF,
CoNLL-U, UniMorph, LIFT, EAF); the consumer code that drives real
datasets (Glottolog, UD, UniMorph, LIFT, ELAN) is the pilot, not the
substrate.
third parties — anyone who can implement the consumer contract (§4) and read DontoQL. We do not curate the third-tier consumer list, but we keep the contract stable so that new consumers can appear.
The product question this PRD answers:
Given that a memory runtime, a genealogy app, a language pilot, and a legal-evidence system all want to run against the same donto instance, what does the substrate need to commit to so none of them has to win at any other's expense?
This section is load-bearing. Most product drift in research infrastructure happens by quietly absorbing the use case of the loudest consumer. The following are explicit non-missions; if a proposed feature requires them, it does not belong in donto core.
| donto will never own | because |
|---|---|
| Memory salience, recall counts, decay clocks | Read-time dynamics of one consumer; lives in donto-memory's overlay. |
| Family-tree visualisation, GEDCOM-specific edges, kinship-recursion rules | Genealogy-domain logic; lives in genes overlays or Lean libraries. |
| Phonological/morphological paradigm validation | Language-domain logic; lives in donto-lang Lean shapes. |
| LLM extraction prompts, model choice, cost budgets | Application policy of donto-api; the substrate exposes ingest, not extraction. |
| Default reasoning engine (RDFS / OWL closure as a baked-in semantics) | Lean overlay certifies opt-in; substrate stays neutral on entailment. |
| Default entity collapse at write time | Identity is a hypothesis (I8). Collapsing belongs in the consumer's query lens. |
| Default predicate normalisation at write time | Alignment is typed and scoped (I7). Closure rides at query time. |
| User-account management, OAuth, session tokens | Sidecar concerns of consumers and gateways. |
| Notification, alerting, paging | Operational concerns; donto-alert-sink is a thin pluggable interface. |
| Workflow orchestration | Lives in Temporal / donto-api-worker, not in pg_donto. |
| Domain-specific UI | TUI exists for substrate operators only; consumer UIs are out of scope. |
A simple test (the substrate test) lives in §10. It is a guardrail against drift; we apply it to every feature proposal.
donto offers consumers the following, with stability across major versions and breaking-change discipline across minor versions:
Every statement is filed under exactly one context. Every claim of
maturity ≥ E2 carries (or transitively carries) at least one evidence
link. The default for a hypothesis_only=false ingest
without anchor is to land at E1 (candidate) until provenance is
attached. Consumers may opt into the legacy
hypothesis_only=true path for explicitly speculative
recall, as long as the speculation is flagged at the storage layer.
valid_time (world-time) and tx_time
(system-time) are non-optional on donto_statement. The same
discipline extends to alignments, identity edges, policies,
attestations, reviews, and releases via donto_event_log.
The contract: consumers can always ask "what did the substrate
believe at system-time t?" and "what holds in the
world at valid-time t'?" and get a deterministic
answer.
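The two questions above can be illustrated with a small in-memory model. This is a minimal sketch, not donto's implementation: the `Row` type, the integer timestamps, and the `as_of` helper are all hypothetical stand-ins for `donto_statement` rows and the DontoQL `AS_OF` clause.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    subject: str
    predicate: str
    obj: str
    valid_from: int            # world-time interval [valid_from, valid_to)
    valid_to: Optional[int]    # None means open-ended
    tx_from: int               # system-time interval [tx_from, tx_to)
    tx_to: Optional[int]       # None means still believed


def _contains(lo, hi, t):
    """True when t falls inside the half-open interval [lo, hi)."""
    return lo <= t and (hi is None or t < hi)


def as_of(rows, tx_time, valid_time):
    """Rows believed at system-time tx_time that hold at world-time valid_time."""
    return [
        r for r in rows
        if _contains(r.tx_from, r.tx_to, tx_time)
        and _contains(r.valid_from, r.valid_to, valid_time)
    ]


ROWS = [
    # Believed from tx=10 until superseded at tx=20: lives in Leeds from 1900 on.
    Row("ex:alice", "ex:livesIn", "Leeds", 1900, None, 10, 20),
    # Correction recorded at tx=20: Leeds only until 1910, York afterwards.
    Row("ex:alice", "ex:livesIn", "Leeds", 1900, 1910, 20, None),
    Row("ex:alice", "ex:livesIn", "York", 1910, None, 20, None),
]
```

Because the superseded row is closed rather than deleted, asking about world-time 1915 gives a different answer at system-time 15 than at system-time 25, and both answers are reproducible.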
Two currently-believed rows with the same subject and predicate but conflicting objects, polarities, or valid-time intervals are legal substrate state. donto exposes a contradiction frontier; it does not pick winners. Consumers (memory runtime, genealogy reviewer, language analyst) decide what to do with contradictions through their own argument-edge writes, review decisions, or query lenses.
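A contradiction frontier can be pictured as a grouping over currently-believed rows. The sketch below is an assumption-laden simplification: it treats any two distinct (object, polarity) pairs for the same subject and predicate as a conflict, ignores valid-time overlap, and uses a hypothetical `contradiction_frontier` function rather than the `donto_v_contradiction_frontier` view.

```python
from collections import defaultdict


def contradiction_frontier(rows):
    """Return the (subject, predicate) groups that hold more than one
    distinct (object, polarity) pair. Nothing is deleted or ranked."""
    groups = defaultdict(set)
    for subject, predicate, obj, polarity in rows:
        groups[(subject, predicate)].add((obj, polarity))
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


rows = [
    ("ex:mary", "ex:bornIn", "1850", "asserted"),
    ("ex:mary", "ex:bornIn", "1852", "asserted"),
    ("ex:mary", "ex:name", "Mary", "asserted"),
]
frontier = contradiction_frontier(rows)
```

The function only reports the conflict; choosing between 1850 and 1852 stays with the consumer.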
Eleven relations × three safety flags
(safe_for_query_expansion, safe_for_export,
safe_for_logical_inference). A materialised closure rides
at query time when PREDICATES EXPAND is set. The substrate
does not auto-collapse aligned predicates at write time.
Symbols (freely-minted IRIs) live forever. Identity edges weight
coreference between symbols. Identity hypotheses name clustering
solutions. Queries pick a lens (strict,
likely, exploratory, or a consumer-defined
hypothesis IRI) at evaluation time. A merge accepted under one
hypothesis never destroys data; querying under a different hypothesis
returns the unmerged view.
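One way to read the lens mechanism is as a threshold over weighted identity edges, evaluated at query time. The following is a sketch under stated assumptions: the threshold values, the `clusters` function, and the union-find clustering are illustrative choices, not the substrate's actual clustering method.

```python
# Hypothetical thresholds; the source names the lenses but not their weights.
LENS_THRESHOLDS = {"strict": 0.95, "likely": 0.75, "exploratory": 0.40}


def clusters(symbols, edges, lens):
    """Cluster symbols under a lens without touching the symbols or edges."""
    threshold = LENS_THRESHOLDS[lens]
    parent = {s: s for s in symbols}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, weight in edges:
        if weight >= threshold:
            parent[find(a)] = find(b)

    grouped = {}
    for s in symbols:
        grouped.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in grouped.values())


symbols = ["ex:a", "ex:b", "ex:c"]
edges = [("ex:a", "ex:b", 0.97), ("ex:b", "ex:c", 0.60)]
```

The same edges yield two clusters under `strict` and one under `exploratory`; switching lens never mutates the stored symbols or edges.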
Fifteen actions: read_metadata,
read_content, quote,
view_anchor_location, derive_claims,
derive_embeddings, translate,
summarize, export_claims,
export_sources, export_anchors,
train_model, publish_release,
share_with_third_party, federated_query. The
default is fail-closed. Attestation credentials grant subset actions to
specific holders for specific purposes with mandatory rationale.
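Fail-closed behaviour means the absence of a grant is a denial. A minimal sketch, assuming a hypothetical in-memory `grants` dictionary and hypothetical `grant` / `authorise` helpers (the real surfaces are `donto_authorise` and `donto_effective_actions`):

```python
ACTIONS = frozenset({
    "read_metadata", "read_content", "quote", "view_anchor_location",
    "derive_claims", "derive_embeddings", "translate", "summarize",
    "export_claims", "export_sources", "export_anchors", "train_model",
    "publish_release", "share_with_third_party", "federated_query",
})


def grant(grants, holder, target, actions, purpose, rationale):
    """Record an attestation; a rationale is mandatory."""
    if not rationale.strip():
        raise ValueError("rationale is mandatory")
    unknown = set(actions) - ACTIONS
    if unknown:
        raise ValueError(f"unknown actions: {sorted(unknown)}")
    grants[(holder, target)] = {
        "actions": frozenset(actions),
        "purpose": purpose,
        "rationale": rationale,
    }


def authorise(grants, holder, target, action):
    """Fail closed: no matching grant means the action is denied."""
    entry = grants.get((holder, target))
    return entry is not None and action in entry["actions"]
```

A holder with no attestation is denied every action, and a holder granted `read_metadata` is still denied `export_claims`.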
The 21-clause language exposes scope, polarity, maturity, identity,
predicate expansion, modality, extraction level,
POLICY ALLOWS, SCHEMA_LENS, bitemporal
AS_OF, and contradiction-pressure ordering as first-class
clauses. Consumers get one query algebra rather than a stack of ad-hoc
SDKs.
Non-statement objects (alignments, identity hypotheses, policies,
attestations, reviews, releases, predicate descriptors, frames) mutate
exclusively through donto_event_log. There is no
UPDATE … SET on these tables; all changes are events.
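The append-only rule can be sketched as a log from which current state is derived by replay. The `EventLog` class and its event shape below are hypothetical; they illustrate the discipline, not the schema of `donto_event_log`.

```python
class EventLog:
    """Append-only log; object state is derived, never updated in place."""

    def __init__(self):
        self._events = []

    def append(self, object_iri, kind, payload):
        seq = len(self._events) + 1
        self._events.append(
            {"seq": seq, "object": object_iri, "kind": kind, "payload": dict(payload)}
        )
        return seq

    def state(self, object_iri, up_to_seq=None):
        """Replay events for one object, optionally stopping at a sequence number."""
        current = {}
        for event in self._events:
            if up_to_seq is not None and event["seq"] > up_to_seq:
                break
            if event["object"] == object_iri:
                current.update(event["payload"])
        return current
```

Because earlier events are never rewritten, the state of a policy or alignment at any past sequence number remains recoverable.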
donto_blob stores content by SHA-256 with pluggable
backends (LocalFS, GCS, mock). The same revision body uploaded by ten
documents lands as one blob. Consumers reference blobs through
revisions, not directly.
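Content addressing is what makes the ten-uploads-one-blob property hold. A minimal sketch with a hypothetical `InMemoryBlobStore` standing in for a real backend:

```python
import hashlib


class InMemoryBlobStore:
    """Stores each distinct body once, keyed by its SHA-256 hex digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, body: bytes) -> str:
        digest = hashlib.sha256(body).hexdigest()
        # setdefault keeps the first copy; identical bodies add nothing.
        self._blobs.setdefault(digest, body)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self):
        return len(self._blobs)
```

Uploading the same revision body from ten documents returns the same digest ten times and leaves a single stored blob.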
donto-trace resolves surface text to byte-offset spans
through exact-line equality → substring-within-line → full-body
fallback, with cross-shard caching. Backward-fill is idempotent and
resumable. Consumers do not implement provenance; they consume it.
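The three-stage fallback can be sketched as follows. This is an illustration only: `resolve_span` and its strategy labels are hypothetical, offsets are computed over the UTF-8 encoding of the body, and the cross-shard caching mentioned above is left out.

```python
def resolve_span(body: str, surface: str):
    """Return (start, end, strategy) as byte offsets into the UTF-8 body,
    or None when the surface text cannot be found."""
    raw = body.encode("utf-8")
    needle = surface.encode("utf-8")
    if not needle:
        return None
    lines = raw.split(b"\n")

    # Stage 1: the surface text is exactly one whole line.
    offset = 0
    for line in lines:
        if line == needle:
            return (offset, offset + len(line), "exact_line")
        offset += len(line) + 1  # +1 for the newline byte

    # Stage 2: the surface text sits inside a single line.
    offset = 0
    for line in lines:
        idx = line.find(needle)
        if idx != -1:
            return (offset + idx, offset + idx + len(needle), "substring_in_line")
        offset += len(line) + 1

    # Stage 3: search the whole body, which catches text spanning a line break.
    idx = raw.find(needle)
    if idx != -1:
        return (idx, idx + len(needle), "full_body")
    return None
```

Each stage is tried only when the previous one fails, so the most specific match wins.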
donto_engine certifies shapes and rules via a stdio JSON
protocol. The substrate never blocks on the Lean side. Sidecar absence
degrades shape/rule/cert endpoints only.
Ed25519 over a manifest SHA-256 with did:key
identifiers. RO-Crate is the portable export format; JSONL is the native
one. Verification is self-contained and offline-safe.
For every guarantee in §2 there is a stable surface consumers program against:
| Guarantee | Surface |
|---|---|
| Evidence-anchored claims | donto_assert, donto_evidence_link, POST /assert, POST /evidence/link/* |
| Bitemporal discipline | tx_time, valid_time columns; DontoQL AS_OF; donto_event_log |
| Paraconsistency | donto_argument table; ORDER BY contradiction_pressure; donto_v_contradiction_frontier |
| Predicate alignment | donto_predicate_alignment; PREDICATES EXPAND / EXPAND_ABOVE N / STRICT; donto align CLI |
| Identity hypotheses | donto_identity_edge, donto_identity_hypothesis; IDENTITY_LENS clause |
| Policy capsules | donto_policy_capsule, donto_authorise, donto_effective_actions; POLICY ALLOWS clause |
| DontoQL | POST /dontoql, POST /sparql; donto query CLI |
| Event log | donto_event_log; no consumer-side UPDATE on event-logged objects |
| Blob substrate | BlobStore trait; donto_register_blob; donto blob CLI |
| Provenance trace | donto_trace_log, donto_revision_line, donto:hasSpan claims; donto trace CLI |
| Lean overlay | POST /shapes/validate, POST /rules/derive; DIR JSON envelope |
| Release envelopes | donto release {keygen, sign, verify, build, pipeline}; RO-Crate output |
These are the published surfaces; everything else in the codebase is
implementation. We will version them under
donto:contract/major.minor with semver semantics from M11
onward.
In exchange for the guarantees in §2, consumers commit to:
Every consumer files its writes under a stable IRI prefix. Recommended forms:
ctx:memory/<module>/<session_id>
ctx:genes/<topic>
ctx:linguistic/<language_or_corpus>
ctx:legal/<case_id>
ctx:medical/<cohort>
A consumer must not write under another consumer's namespace without coordination. The substrate does not enforce this; the convention is what makes multi-consumer operation safe.
A consumer that mints new predicates must register descriptors
(donto_predicate_descriptor) including label, gloss,
subject-type hint, object-type hint, an example, and a nearest-neighbour
confidence at mint time. The substrate refuses to register a
predicate at maturity ≥ E2 without a descriptor (M10 deliverable; see
§6.2). Auto-minted candidate predicates remain at E1 in the consumer's
namespace until reviewed.
A consumer that registers documents must declare the policy IRI it is
using. The substrate's fail-closed default
(policy:default/restricted_pending_review) catches misuse
without preventing it; declared policies make exports tractable.
If a consumer cares about modality (a memory runtime certainly will:
model_output is different from oral_history is
different from community_protocol), it must declare
modality overlays at ingest. Queries that filter by modality drop
statements without an explicit overlay row, by design.
A consumer that introduces consumer-specific state (memory salience,
genealogy DNA-match channels, linguistic paradigm-cell coverage) must
register the overlay tables through donto_overlay_registry
(M10 deliverable; see §6.1). Overlay tables are bitemporal-lint-checked
and policy-aware.
Consumers may not UPDATE, DELETE, or
TRUNCATE against any donto_* core table. Reads
are unrestricted (subject to policy); writes go through
donto_assert / donto_retract /
donto_correct / donto_event_log_append /
overlay-table inserts.
A consumer that depends on a substrate invariant (e.g., donto-memory
depending on bitemporal AS_OF returning the expected
pre-revocation view of a preference claim) commits a tripwire test to
packages/donto-client/tests/invariants_*.rs. This is how
the substrate stays honest about the contracts it serves.
A consumer that imports from or exports to external systems must
produce a LossReport describing what the external format
cannot represent (governance, contradiction, time, n-ary frames,
anchors, review state). The substrate enforces this for adapters that
ship in this repository; consumers extending the adapter surface accept
the same discipline.
(See donto-paper-2026-05-28.html on this site for the
long-form treatment. This is the one-page recap.)
Live as of 2026-05-28:
- genes consumer): 39,294,083 statements, 938,918 distinct predicates, 19,230 contexts, 1.84 M evidence links, 48 GB on disk, 281 retractions (7 × 10⁻⁶).
- donto_engine certifying shapes and rules via stdio JSON; parent-child age-gap shipped as the worked example; parity with the autoresearch-genealogy library is the open work.
- did:key; CLI donto release pipeline.
- #[tokio::test], 91 #[test], 511 pg_or_skip!.

The substrate side of M0–M4 is complete. M5–M9 are application-level or operations-level work. The gaps that prevent donto from being a generic substrate (as opposed to a substrate that happens to host genealogy well) are listed in §6.
M10 is the milestone that makes donto a real substrate. It collects the changes needed so a memory runtime can land on top of donto without bending the substrate toward memory, while genealogy keeps running against the same instance without bending the substrate toward genealogy. M10 is twelve concrete deliverables; none of them is domain-specific.
Status: new. Problem: Consumers will introduce state that doesn't belong in core but does need bitemporal discipline, policy inheritance, and event-log integration. donto-memory needs salience, recall counts, reconsolidation queues. genes needs DNA-match channels, blocking indexes, kin-recursion caches. Today's options are "add to core" (wrong) or "side database" (also wrong; reintroduces split-brain).
Delta: Introduce donto_overlay_registry
(migration 0132):
    create table if not exists donto_overlay_registry (
        overlay_iri      text primary key,   -- e.g. ctx:memory/overlay/access
        consumer_iri     text not null,      -- e.g. ctx:memory
        table_name       text not null,      -- e.g. donto_x_memory_access
        owns_key         text not null,      -- statement_id | record_id | ...
        policy_inherits  text not null default 'from_target',
        bitemporal       boolean not null default true,
        description      text,
        registered_by    text not null,
        registered_at    timestamptz not null default now()
    );

Plus three CLI verbs: donto overlay register,
donto overlay lint, donto overlay drop. Lint
checks:
- tx_time tstzrange (or valid_time for world-time overlays) with lower_inc constraint.
- No ON DELETE CASCADE against core tables (so retraction cannot silently delete consumer overlay state).

Acceptance: donto-memory's
donto_x_memory_access, donto_x_memory_state,
and donto_x_reconsolidation_queue register cleanly. genes's
donto_x_dna_match_channel registers cleanly. Lint catches a
deliberately-broken overlay (missing tx_time) in a tripwire
test.
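Two of the lint checks named above can be sketched as a pure function over an overlay's column and foreign-key metadata. The `lint_overlay` function, its input shapes, and the rule that `donto_x_*` tables count as overlays rather than core are assumptions made for illustration; the other checks run by `donto overlay lint` are not modelled.

```python
CORE_PREFIX = "donto_"
OVERLAY_PREFIX = "donto_x_"  # assumption: overlay tables use this prefix


def lint_overlay(columns, foreign_keys):
    """columns: dict of column name -> SQL type.
    foreign_keys: list of dicts with 'references' and 'on_delete'.
    Returns a list of human-readable problems (empty when clean)."""
    problems = []
    time_cols = [c for c in ("tx_time", "valid_time") if c in columns]
    if not time_cols:
        problems.append("missing tx_time (or valid_time) column")
    for col in time_cols:
        if columns[col].lower() != "tstzrange":
            problems.append(f"{col} must be tstzrange")
    for fk in foreign_keys:
        target = fk["references"]
        is_core = target.startswith(CORE_PREFIX) and not target.startswith(OVERLAY_PREFIX)
        if is_core and fk.get("on_delete", "").upper() == "CASCADE":
            problems.append(f"ON DELETE CASCADE against core table {target}")
    return problems
```

A deliberately broken overlay (no `tx_time`, cascading delete against `donto_statement`) produces two findings, which is the shape of failure the tripwire test is meant to catch.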
Status: PRD §6.9 spec; not enforced. Problem: 938 K distinct predicates in production. LLM-driven proliferation has no write-side brake.
Delta: Migration 0133 makes
donto_predicate_descriptor required for any predicate
registered at maturity ≥ E2 under a curated context. A new
column donto_predicate.minting_status
(candidate / approved /
deprecated / merged) drives the gate.
Auto-mint by extraction writes candidate status; review
promotes to approved. The existing
donto-api align worker proposes near-duplicate merges
nightly.
Acceptance: A curated-context write attempting to
use a predicate without a descriptor fails with a structured error. The
permissive-context write succeeds. donto predicates audit
returns counts by minting status. Three tripwires: descriptor-required,
alignment-proposed, merge-applied.
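The write-side gate can be sketched as a single check. Everything named here is hypothetical (the `gate_write` function, the integer maturity levels, the `descriptor_required` error code); the point is only that curated contexts refuse undescribed predicates at maturity E2 or above while permissive contexts do not.

```python
def gate_write(predicate_iri, maturity_level, context_is_curated, descriptors):
    """Return a structured result instead of raising, mirroring the
    'fails with a structured error' acceptance criterion.
    maturity_level: integer n for maturity E<n> (assumption)."""
    needs_descriptor = context_is_curated and maturity_level >= 2
    if needs_descriptor and predicate_iri not in descriptors:
        return {
            "ok": False,
            "error": "descriptor_required",
            "predicate": predicate_iri,
        }
    return {"ok": True}


descriptors = {"ex:bornIn"}
```

A curated write that uses `ex:livedNear` at E2 is rejected with the structured error, while the same write in a permissive context succeeds.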
Status: identified in
ROADMAP-AFTER-MAY18.md; not landed.
Problem: §12.3 (H8) shows policy gating scales modestly
but becomes a bottleneck on hot paths at 1 M+ rows. donto-memory's hot
path runs hundreds of recall queries per session; policy cannot
re-evaluate per row.
Delta: Migration 0134 introduces
donto_effective_policy_cache, a matview keyed by
(target_kind, target_id) materialising
donto_effective_actions output. Refreshed on policy
assignment and revocation events via trigger. Hot-path lookups become a
single B-tree probe.
Acceptance: A 100 K-row POLICY ALLOWS
filter completes in < 100 ms on the standard hardware (vs ~812 ms
today). Tripwire verifies cache consistency under concurrent policy
changes.
Status: identified; not landed.
Problem: Identity-lens query evaluation today walks
identity edges per query. For a memory runtime that wants "all
preferences about this user under the strict lens",
per-query traversal is too slow.
Delta: Migration 0135 introduces
donto_identity_cluster_cache(hypothesis_iri, symbol_iri, referent_id),
refreshed when identity edges or hypotheses change. DontoQL
IDENTITY_LENS consults the cache instead of walking edges.
The cache is per-hypothesis, so adding a new lens does not invalidate
the others.
Acceptance: A query under
IDENTITY_LENS strict_identity_v1 on a 10 K-symbol corpus
completes in < 50 ms. Adding a new edge invalidates only one
hypothesis's cache.
Status: identified; substrate side closed; sidecar side open. Problem: The substrate fails closed if no policy is provided (F-1 closure, migration 0123), but consumers writing via legacy ingest paths can still produce unpoliced rows that land on the default fail-closed policy rather than being refused outright.
Delta: A new middleware in dontosrv
rejects POST /assert,
POST /documents/register,
POST /documents/revision, and
POST /extract/exhaustive calls that do not name a policy
IRI. The middleware is controlled by configuration:
enforce_policy_at_http = true is the default for new
deployments; existing deployments must explicitly enable it after a
migration window.
Acceptance: A
curl -X POST /documents/register without
policy_iri returns 400; with policy_iri it
succeeds. Migration notes document the enable-after-window procedure.
genes corpus runs cleanly under the enforced middleware after a one-pass
backfill.
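The middleware's decision can be sketched independently of any web framework. The `policy_middleware` function and the error body are hypothetical; the gated routes, the `policy_iri` field, the 400 status, and the `enforce_policy_at_http` switch come from the text above.

```python
GATED_ROUTES = frozenset({
    "/assert",
    "/documents/register",
    "/documents/revision",
    "/extract/exhaustive",
})


def policy_middleware(method, path, body, enforce_policy_at_http=True):
    """Return (status, payload) to short-circuit the request,
    or None to pass it through to the route handler."""
    if not enforce_policy_at_http:
        return None
    if method == "POST" and path in GATED_ROUTES and not body.get("policy_iri"):
        return 400, {"error": "policy_iri is required", "path": path}
    return None
```

With enforcement off, legacy callers keep working during the migration window; with it on, a registration without `policy_iri` is refused before it reaches the substrate.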
Status: identified; not landed. Problem: Subject cardinality and polarity-mixed contradictions do not return in routine time at 39 M rows. Consumers asking "how big is my namespace?" or "how contested is my preference data?" hit the same wall.
Delta: Three new matviews (migrations 0136–0138):
    donto_subject_stats            -- per-subject row count, last update
    donto_contradiction_pressure   -- per-subject-predicate contradiction count
    donto_predicate_proliferation  -- predicate count by namespace prefix

All three refresh on a daily systemd timer
(donto-matviews.timer). Manual refresh available via
donto refresh-matviews. Refresh is non-blocking where
possible (PostgreSQL REFRESH MATERIALIZED VIEW CONCURRENTLY, which
recomputes the view in full but does not lock out readers, and
requires a unique index on the view).
Acceptance: donto status returns
subject cardinality in < 1 s. donto-memory and genes both rely on
these matviews for their dashboards. Tripwire verifies matview
consistency after inserts.
Status: new. Problem: "No destructive overwrite" is correct for audit, but personal memory (donto-memory's domain) and sensitive cultural material (genes's domain) both have legitimate deletion paths. GDPR right-to-be-forgotten, native-title-sensitive material, medical-record retraction requirements all need a path that preserves audit-of-deletion without preserving the deleted content.
Delta: Migration 0139 introduces encrypted-blob tombstoning:
- encryption_key_iri referencing a key managed outside the substrate.
- donto_blob_tombstone operation marks a blob as permanently inaccessible: the key reference is dropped, the bytes are overwritten with zeros, the tombstone records that the deletion occurred (by whom, when, under what attestation, citing what authority).
- request_deletion. A holder with this action under a policy assigned to a target may initiate tombstoning.

Acceptance: GDPR-style deletion against a
user-preference blob produces an audit trail without preserving the blob
content. Re-running a release build against a corpus with tombstoned
blobs emits a LossReport noting redactions but does not
include the content. Three tripwires:
tombstone-creates-audit-but-not-content, release-respects-tombstones,
tombstone-requires-attestation.
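The tombstone operation can be sketched over an in-memory blob record. The `EncryptedBlob` type and `tombstone_blob` function are hypothetical; the sketch shows the three effects described above (key reference dropped, bytes zeroed, fact of deletion recorded) and the attestation requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EncryptedBlob:
    sha256: str
    ciphertext: bytearray
    encryption_key_iri: Optional[str]
    tombstone: Optional[dict] = None


def tombstone_blob(blob, holder, holder_actions, attestation_iri, authority, when):
    """Make a blob permanently inaccessible while keeping an audit record."""
    if "request_deletion" not in holder_actions:
        raise PermissionError("holder lacks the request_deletion action")
    if not attestation_iri:
        raise ValueError("tombstoning requires an attestation")
    blob.encryption_key_iri = None                          # drop the key reference
    blob.ciphertext[:] = bytes(len(blob.ciphertext))        # overwrite with zeros
    blob.tombstone = {
        "by": holder,
        "at": when,
        "attestation": attestation_iri,
        "authority": authority,
    }
    return blob.tombstone
```

After the call, the record still says that a deletion happened, by whom, and under what authority, but neither the key reference nor the original bytes remain.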
Status: partial (/predicates,
/schema exist; not exhaustive). Problem:
Consumers binding to donto need to introspect what's available —
predicates, contexts, modalities, extraction levels, policies, frame
types, alignment relations. Today they have to read SQL or migration
files.
Delta: A new endpoint family under
/discovery/*:
GET /discovery/contract-version contract version + supported clauses
GET /discovery/contexts context tree
GET /discovery/contexts/<iri> context detail + parents + policies
GET /discovery/predicates predicates with minting status
GET /discovery/predicates/<iri> descriptor + alignment edges
GET /discovery/modalities allowed modality values
GET /discovery/extraction-levels allowed extraction levels
GET /discovery/policies policy capsules
GET /discovery/policies/<iri> capsule detail + assignments
GET /discovery/frame-types registered frame types
GET /discovery/alignment-relations the 11 alignment relations with safety flags
GET /discovery/identity-hypotheses registered identity hypotheses
GET /discovery/overlays registered consumer overlays (§6.1)
GET /discovery/dontoql-grammar BNF of the current DontoQL grammar
GET /discovery/openapi OpenAPI 3.1 spec for everything above
All endpoints policy-gate as read_metadata. The OpenAPI
spec is the contract for SDK generators.
Acceptance: A new consumer with a fresh donto-client
install can render its own UI by hitting /discovery/*
exclusively, without parsing migration files.
Status: identified; not landed.
Problem: packages/lean/ is skeletal; the
developed shape library lives in
autoresearch-genealogy/lean/Genealogy/. The two should
converge in the substrate-neutral parts (functional, typed-literal,
transitive closure, inverse, symmetric, parent–child age-gap as the
worked-genealogy-shape example).
Delta: Port the substrate-neutral shapes and rules
into packages/lean/Donto/Shapes/Stdlib.lean and
packages/lean/Donto/Rules/Stdlib.lean. Domain-specific
shapes (kinship-recursion bounds, paradigm-cell coverage) stay in the
consumer's tree; donto_engine loads them via a
registry.
Acceptance: lake build from a fresh
checkout produces a working donto_engine against the
standard shape library. The genealogy-specific shapes ship in
genes/lean/ and load on demand. Tripwires for each
standard-library shape ship in the substrate test suite.
Status: undocumented. Problem: Two consumers (donto-memory and genes) running against one Postgres instance need a convention for isolation. Database-per-consumer is too expensive; schema-per-consumer would break shared substrate primitives.
Delta: Document the context-namespace-with-policy-isolation pattern as the canonical multi-tenant deployment:
- ctx:memory/*, ctx:genes/*).

The pattern doc lives at docs/MULTI-TENANT-DEPLOYMENT.md
with the systemd unit files, Caddy routes, and policy templates for both
deployment models (single-instance and per-consumer-instance).
Acceptance: A second consumer can be onboarded onto an existing donto instance in under 30 minutes of operator time (create namespace, create default policy, register overlays).
Status: Rust client exists (2,665 LOC, comprehensive); TypeScript client is partial; Python client is missing.
Delta:
- donto-client (Rust): the reference SDK. Maintain at parity with HTTP routes; promise semantic versioning from M10.
- client-ts: ship a 1.0 release covering reads, asserts, retracts, and DontoQL submission. The dontopedia frontend becomes the integration test.
- client-py: new. Synchronous and async (httpx) variants. Coverage: reads, asserts, retracts, DontoQL, policy, evidence links. donto-memory and donto-api both consume this. Generated from the OpenAPI spec where possible.

Acceptance: All three clients ship a smoke-test
suite against a local donto instance. The Python client handles
donto-memory's hot-path query within 50 ms p50.
Status: new; specifically motivated by memory consumers but domain-neutral. Problem: A consumer wants to ask "give me everything in this context that this attested agent can quote / export / train on" in one round-trip rather than per-row policy decisions.
Delta: A new SQL function and HTTP route:
    donto_recall_projection(
        p_holder   text,
        p_action   text,
        p_scope    jsonb,
        p_pattern  jsonb,
        p_lens     jsonb default null
    ) returns setof <statement-with-policy-flags>;

The function joins through donto_effective_policy_cache
(§6.3), applies the identity lens (§6.4), filters by the attestation
chain, and returns rows pre-marked with which actions the caller may
perform on each. Consumers like donto-memory build their "Memory
Evidence Bundle" by post-processing this single result set.
Acceptance: A POST /recall returning a
200-row bundle completes in < 100 ms on the standard hardware.
Tripwire verifies that policy revocation propagates to the projection
within one matview refresh cycle.
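The idea of returning rows pre-marked with permitted actions can be sketched in a few lines. This is a simplification under stated assumptions: `recall_projection` here is a hypothetical Python stand-in, the cache is keyed by (holder, statement id) rather than (target_kind, target_id), and the scope, pattern, and lens parameters are omitted.

```python
def recall_projection(statements, policy_cache, holder, action):
    """Keep statements on which `holder` may perform `action`, and annotate
    each with the full set of actions the holder may perform on it."""
    projected = []
    for statement in statements:
        allowed = policy_cache.get((holder, statement["id"]), frozenset())
        if action in allowed:  # a missing cache entry means denied
            projected.append({**statement, "allowed_actions": sorted(allowed)})
    return projected


statements = [
    {"id": 1, "object": "tea"},
    {"id": 2, "object": "coffee"},
]
policy_cache = {
    ("agent:a", 1): frozenset({"quote", "read_content"}),
    ("agent:a", 2): frozenset({"read_metadata"}),
}
```

The caller gets one result set and can decide per row what it may quote, export, or train on without a second policy round-trip.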
The M9 federation memo's recommended stack is **RO-Crate + VC + DID
SERVICE
rejected as the primary layer. M11 lands the end-to-end
demonstration of this stack:
- did:key round-trip.

The substrate-side work for M11 is minimal (envelope verification already ships); the work is operational and demonstrational.
The H10 hard-target run (PRD §25 — sub-100 ms point queries at 10 M rows on standard hardware) is a benchmark, not a feature. M12 lands it formally and adds:
- donto_predicate_proliferation matview (§6.6).

This is the substrate roadmap. Consumer roadmaps live in their own trees.
M10 Substrate Hardening (2026-Q3)
├─ 6.1 Overlay extension API
├─ 6.2 Predicate minting controls
├─ 6.3 Effective-policy matview
├─ 6.4 Identity-lens cluster cache
├─ 6.5 HTTP-middleware Trust Kernel
├─ 6.6 Characterisation matviews
├─ 6.7 True-deletion path
├─ 6.8 Schema discovery API
├─ 6.9 Lean overlay parity
├─ 6.10 Multi-tenant pattern doc
├─ 6.11 SDK promise (Rust, TS, Python)
└─ 6.12 Recall projection function
M11 Federation (2026-Q4)
├─ 11.1 Two-instance smoke
├─ 11.2 DataCite registration
├─ 11.3 Cross-instance attestation
└─ 11.4 Selective disclosure spike
M12 Scale & Calibration (2027-Q1)
├─ 12.1 H10 lock
├─ 12.2 Predicate audit
├─ 12.3 Reviewer-acceptance calibration
└─ 12.4 Adapter-failure analytics
What is not on this roadmap, deliberately:
Those belong to their consumers.
Before adding any feature, three questions:
Q1. Would more than one of donto-memory,
genes, donto-lang, and at least one plausible
third-party consumer benefit from this feature? If only one
would benefit, the feature belongs in the consumer, not the
substrate.
Q2. Does this feature require any consumer-domain vocabulary
baked into a column type or check constraint? (e.g., a
kinship_type enum, a salience_decay formula, a
paradigm_cell_grain field). If yes, refactor as a registry
table or a consumer overlay (§6.1).
Q3. Does this feature commit donto to a reasoning, ranking, or collapse policy a consumer might legitimately disagree with? (e.g., a default summarisation behaviour, an automatic identity merge threshold, a salience-decay schedule). If yes, expose the behaviour as a lens parameter the consumer chooses at query time.
If the answer to any of these is "I don't know", the feature is not ready to land in donto. It lands in a consumer first, proves itself there, and only graduates to substrate after a second consumer asks for it.
The most likely failure mode for an evidence-grade substrate that runs a 39 M-row genealogy corpus is that genealogy quietly becomes the product. Counter-controls:
- genes/ (a separate repository tree, not the donto repo) once donto-lang and a second third-party consumer are real.
- donto_* core table contains domain-specific columns.

The 938 K-predicate problem (§13 of the May 2026 paper) is the visible artefact of insufficient minting controls. §6.2 is the direct fix; §6.6's matview is how we measure it.
Multi-tenant deployment (§6.10) without careful policy templates risks one consumer's data leaking into another consumer's exports. Counter-controls:
- ctx:memory/* containing a row from ctx:genes/* must fail unless explicitly authorised.

True-deletion (§6.7) is in tension with append-only discipline (I3). The encrypted-blob model resolves the tension at the bytes layer (key dropped, bytes zeroed, fact-of-deletion preserved), but it requires consumers to ingest sensitive blobs with encryption keys from the start. Retrofitting tombstones over plaintext blobs is impossible.
Counter-control: document the encrypted-ingest pattern prominently;
ship donto blob ingest --encrypted as the default for
sensitive-policy capsules.
The schema-discovery API (§6.8) commits the substrate to a
publicly-visible surface. Once an SDK is generated from
/discovery/openapi, breaking changes affect every consumer.
Counter-control: contract-version field, semver discipline, deprecation
cycles measured in quarters not weeks.
The overlay-extension API (§6.1) is meant to absorb consumer-specific
state cleanly. The risk is that consumers register overlays
casually and the substrate's table count balloons. Counter-control: the
donto overlay lint step enforces bitemporal discipline and
policy inheritance; quarterly overlay review.
donto graduates from "substrate-flavoured product" to "real substrate" when all twelve of the following hold simultaneously:
- donto-memory, genes, donto-lang) run against the same donto instance without colliding, each in its own context namespace.
- donto_* core table is domain-neutral (no genealogy or memory column).
- approved predicates outnumbering candidate predicates across the live corpus.

When all twelve hold, the substrate is doing what it claims.
The May 2026 memory-design draft proposed a clean naming split, which we adopt:
| Name | Role |
|---|---|
| donto | the evidence operating system; this PRD. |
| donto-memory | agentic memory runtime (consumer). |
| donto-agent | SDK / runtime integration for agents (consumer). |
| donto-sleep | Temporal consolidation workers (consumer-side). |
| genes | genealogy research workspace (consumer). |
| donto-lang | language-documentation pilot (consumer). |
donto is the substrate that makes any of these possible. None of them lives inside the donto repository. The substrate's job is to disappear behind them.
The single thesis of this PRD: donto succeeds when it disappears.
The right test of the substrate is that a memory framework, a genealogy app, and a language pilot each look at donto, find what they need, and never have to argue with each other or with us about the substrate. The May 2026 paper described what donto is. This PRD describes the work that makes that enough.
The substrate is two-thirds of the way there. M10 closes the gap between "a great database that genealogy happens to run on" and "a domain-neutral evidence operating system anyone can build on". M11 makes it federated. M12 locks the scale numbers. After that, the substrate's job is to stay out of its consumers' way.
If donto-memory ships in the next year and runs against the same
donto instance that powers genes.apexpots.com, without
either consumer compromising the other, the substrate has won. If donto
is ever described as "the memory framework" or "the
genealogy system" — even by us — the substrate has lost.
End of PRD.
For traceability with the memory-design conversation that motivated this PRD:
| Memory-design point | donto substrate response |
|---|---|
| "Use donto as the canonical memory database" | §0, §1, §2: adopted. |
| "Kuzu should not be canonical" | §1 non-mission: not absorbing graph projection systems; consumers may keep Kuzu as a read-optimised cache. |
| "Markdown preferences should not be canonical" | §1 non-mission: markdown is an export/import surface. |
| "Reconsolidation should never rewrite" | §2.8 event log; substrate enforces append-only. Consumer reconsolidation lives in donto-sleep. |
| "Memory predicate registry from day one" | §6.2 predicate minting controls. |
| "Policy cost on the hot path" | §6.3 effective-policy projection cache. |
| "Read-time dynamics are not world-time and not belief-time" | §2.2: explicit. last_accessed_at lives in a consumer overlay (§6.1), not in donto_statement. |
| "donto_overlay_registry" | §6.1: adopted. |
| "Memory Evidence Bundle from a POST /memory/query" | §6.12 generalised as donto_recall_projection so genes and donto-lang can use it too. |
| "DontoDelta language of append/invalidate operations" | Already substrate behaviour; documenting as the consumer-side delta format is donto-memory's work, not the substrate's. |
| "Cap per-record reconsolidation frequency" | Consumer concern; lives in donto-sleep. |
| "True deletion for legal/privacy" | §6.7 tombstone path. |
| "Naming split: donto / donto-memory / donto-agent / donto-sleep" | §13: adopted. |
The memory-design draft did most of the conceptual work; this PRD translates it into substrate-level commitments that hold up for genealogy, language documentation, and any third consumer equally.