A technical report

donto-memory v0.1.0 —
A persistent memory layer for AI agents on the donto evidence substrate

Author: Thomas Davis · Date: 2026-05-28 · Live: memories.apexpots.com · Code: github.com/thomasdavis/donto-memory · License: Apache-2.0 OR MIT

Abstract. This report describes donto-memory, a persistent memory layer for long-lived AI agents, built as a thin Rust binary on top of the donto bitemporal paraconsistent quad store. Memory is captured as plain text and expanded by a multi-aperture LLM extractor into typed ontological statements that are stored under the same evidence regime as all other claims in the substrate. Recall returns a Memory Evidence Bundle — a ranked list of statements with full provenance, identity-lens resolution, bitemporal time-travel, and per-row policy gating. The system is deployed at memories.apexpots.com, shares a Postgres instance with the 39-million-statement genealogy corpus that exercises the substrate, and is operated as five Rust crates totalling roughly 7,200 lines, two SQL migrations registering six overlay tables, and a 16-action Trust Kernel that the consumer inherits unchanged from the substrate. The report explains the architecture, the save and recall contracts, the five extraction apertures, the substrate handshake at contract 0.1.0-m10, the audit-log observability surface, and the limitations that the v0.1.0 implementation deliberately leaves for later milestones.


1. Motivation

Modern AI agents are stateless. Each conversation begins with a blank prompt; everything an agent learns about a user, a task, or the world is lost unless an application developer writes their own persistence layer. The current de-facto answer is a vector database with two collections: chat transcripts and "extracted facts." That approach works for surface-level recall but discards the structure that makes the captured information durable across versions of the agent: source attestations, contradictions between sources, the difference between belief and access, the difference between when a fact was true in the world and when we came to believe it, and the policy gate that decides whether a downstream caller is even allowed to see the row.

donto-memory was built to ask a different question. The donto substrate is a bitemporal paraconsistent quad store already deployed in production at genes.apexpots.com, where it holds 39 million statements about North-Queensland genealogy, Aboriginal history, DNA matches, and oral-history corpora. The substrate already supports contradictions as data (two sources disagreeing about Annie Davis's birth year both live forever), bitemporal provenance (valid_time × tx_time), identity lenses, 16-action policy capsules, and an HTTP gateway (dontosrv) with a published M10 contract version. The question is: what does an agentic memory layer look like if it inherits all of that for free?

This report is the v0.1.0 answer. donto-memory is a single Rust binary that exposes a POST /memorize endpoint (raw text in, plus typically 50–300 LLM-extracted ontological statements out) and a POST /recall endpoint (a Memory Evidence Bundle out). Every fact it captures is a real donto_statement row sharing the same paraconsistent + bitemporal + policy-aware semantics as the genealogy corpus next to it. The agent gets a memory layer with the audit, the contradiction handling, and the access governance already wired in.

2. Design principles

donto-memory makes three commitments that distinguish it from a typical vector-database-plus-LLM memory stack:

2.1. No silent rewrite

When you re-memorise a contradicting fact, the prior fact is not overwritten. Both live forever as separate donto_statement rows under distinct tx_time ranges. Recall surfaces the latest by default and flags the contradiction. The application chooses what to do. This is identical to the substrate's posture elsewhere: the genealogy corpus stores both "Annie Davis born 1879" and "Annie Davis born 1881" if two reputable sources disagree, and neither belief is silently destroyed.

2.2. Read events are not belief events

Calling /recall does not affect the truth value of any claim. It writes an access event to a separate overlay (donto_x_memory_access), bumps a private recall counter in donto_x_memory_state, and optionally enqueues a reconsolidation task. None of these touch donto_statement or the substrate's tx_time discipline. Reading what we believe is not the same as changing what we believe.

2.3. Policy-aware by default

Every recall asks for a specific action (one of 16; see §12), and every returned row carries an effective_actions map plus an action_allowed shortcut for the requested action. Rows are not silently dropped when the holder is not attested: they appear with action_allowed=false and (if the action is content-exposing) their values redacted. The permitted_only flag described in §6 controls whether such denied rows are included in the bundle or filtered out. The default policy is fail-closed for content-exposing actions; read_metadata is the always-permitted floor. The whole gate runs inside the substrate's existing POST /recall Trust Kernel pipeline.

3. Architecture

┌──────────────────────────────────────────────────────────────────┐
│  donto-memory (single Rust binary, four clap subcommands)        │
│                                                                  │
│   donto-memory migrate      donto-memory api                     │
│   donto-memory substrate    donto-memory worker                  │
│           │                       │                              │
│           ▼                       ▼                              │
│   ┌────────────────────────────────────────────────────────┐     │
│   │   donto_memory_core (library)                          │     │
│   │   ───────────────────────────                          │     │
│   │   modules:    episodic / semantic-claim / preference   │     │
│   │   hot_path:   recall composer + RRF fusion             │     │
│   │   sleep_path: reflect + apply DontoDelta               │     │
│   │   substrate:  reqwest → dontosrv                       │     │
│   │   overlays:   tokio-postgres helpers                   │     │
│   │   extract:    5-aperture LLM via OpenAI-compatible     │     │
│   └────────────────────────────┬───────────────────────────┘     │
└────────────────────────────────┼─────────────────────────────────┘
                                 ▼
                  ┌────────────────────────────┐
                  │   donto substrate          │
                  │   dontosrv :7879           │
                  │   contract 0.1.0-m10       │
                  │   (any donto instance)     │
                  └────────────────────────────┘

Figure 1 — donto-memory ships as one binary with two long-running modes (api + worker) plus operational subcommands. The library (donto-memory-core) is the only place that knows the substrate's protocol; the binary wraps it in axum routes and a tokio loop.

The deployment topology on the host VM donto-db mirrors the genealogy stack:

donto-memory-api (systemd) listens on 127.0.0.1:7900 and serves the API + documentation surface.
Caddy terminates TLS for memories.apexpots.com (Cloudflare in front, Full SSL mode), reverse-proxying to :7900.
dontosrv on :7879 is the same sidecar that serves genes.apexpots.com; it points at the donto-pg Docker container on :5432.
Postgres is shared. The six donto-memory overlay tables sit alongside the substrate's core schema; an M10 overlay-naming lint guarantees the consumer cannot accidentally shadow substrate identifiers.

The binary's four subcommands are:

migrate — applies the SQL files in migrations/ in lexical order, then calls donto_overlay_register on the substrate for each overlay so the M10 lint accepts them.
api — the long-running axum server.
worker — the sleep-path tokio loop that drains the reconsolidation queue (default poll 5 s).
substrate — the handshake utility that echoes GET /discovery/contract-version and /discovery/substrate-health.

4. The save contract

POST /memorize is the single entry point most callers will use. The handler runs four steps in strict order:

4.1. Episodic storage (always)

The raw text is asserted verbatim as a mem:episodic/chunk statement filed under ctx:memory/episodic/session/<session_id>. This is the canonical bytes-on-disk record of what the caller sent. It happens before any LLM call so that the chunk is durable even if extraction fails.

4.2. LLM extraction (optional)

If the runtime has DONTO_MEMORY_LLM_API_KEY configured (production currently routes through OpenRouter to z-ai/glm-5), the chunk is sent to an extractor that asks for typed ontological statements as JSON. Two modes are supported:

single — one LLM call; expects 20–40 facts; latency 5–10 s; cost approximately $0.005–$0.015 per call on the current model.
exhaustive — five aperture prompts (§5) run in parallel via futures::future::join_all; expects 80–300 facts; latency 30–180 s depending on input length; cost approximately $0.04–$0.07.

The default is exhaustive, controllable per-request via "mode" and per-deployment via DONTO_MEMORY_EXTRACT_MODE.

4.3. Semantic ingest

Each extracted fact becomes a typed statement (subject, predicate, object) filed under ctx:memory/claims/session/<session_id> with a source_record_iri link back to the episodic chunk. Content-hash deduplication runs across the union of aperture outputs; duplicates from later apertures increment a dedup_collisions counter rather than producing a second row.

4.4. Receipt

The caller receives the episodic record IDs, every semantic record ID, the per-aperture yields (with elapsed time and any aperture-level errors), token usage, and — as of the 2026-05-28 release — the full list of extracted facts in the response body. The facts payload makes the response self-contained: an agent does not need a follow-up /recall just to see what was captured.
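The ordering of the four steps can be sketched in a few lines. This is a minimal illustration, not the handler itself: `InMemoryStore`, `memorize`, `resolve_mode`, and the receipt keys are hypothetical names, and the precedence of the per-request "mode" over DONTO_MEMORY_EXTRACT_MODE is an assumption. What it does show is that the episodic chunk is written before the extractor is called, so it survives an extraction failure.

```python
import os

def resolve_mode(request_mode=None, env=None):
    """Assumed precedence: per-request "mode", then the env var, then the default."""
    env = os.environ if env is None else env
    return request_mode or env.get("DONTO_MEMORY_EXTRACT_MODE") or "exhaustive"

class InMemoryStore:
    """Hypothetical stand-in for the substrate: an append-only list of rows."""
    def __init__(self):
        self.rows = []

    def assert_row(self, context, payload, source=None):
        row_id = len(self.rows) + 1
        self.rows.append({"id": row_id, "context": context,
                          "payload": payload, "source": source})
        return row_id

def memorize(store, session_id, text, extractor=None, request_mode=None, env=None):
    # Step 1: episodic storage, always, and before any LLM call.
    chunk_id = store.assert_row(
        f"ctx:memory/episodic/session/{session_id}", ("mem:episodic/chunk", text))
    receipt = {"episodic_ids": [chunk_id], "semantic_ids": [], "facts": [], "error": None}

    # Step 2: extraction is optional (no extractor models "no API key configured").
    if extractor is None:
        return receipt
    try:
        facts = extractor(text, resolve_mode(request_mode, env))
    except Exception as exc:
        # The chunk written in step 1 stays durable even though extraction failed.
        receipt["error"] = str(exc)
        return receipt

    # Step 3: each fact becomes a statement linked back to the episodic chunk.
    for fact in facts:
        receipt["semantic_ids"].append(store.assert_row(
            f"ctx:memory/claims/session/{session_id}", fact, source=chunk_id))

    # Step 4: the receipt carries the facts, so no follow-up recall is needed.
    receipt["facts"] = list(facts)
    return receipt
```

Calling `memorize` with an extractor that raises still leaves exactly one row in the store: the verbatim chunk.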

5. Five apertures

The exhaustive extractor runs five LLM calls in parallel, each with a different system prompt that asks for a different kind of statement. The apertures span an increasing range of speculation:

Aperture	Captures	Confidence band	Modality tag
surface	What the text explicitly states. "The user lives in Brooklyn" yields `(user, ex:residesIn, brooklyn)`.	0.95–1.0	`asserted`
linguistic	Clause-by-clause decomposition. Every noun phrase → entity, verb phrase → event, modifier → property. Pulls 2–3× the fact count of surface alone.	0.85–1.0	`asserted`
presupposition	What the text takes for granted. "told me" presupposes that user and agent exist and communicate.	0.70–0.95	`hypothesis_only`
inferential	Common-knowledge consequences of stated facts. "lives in Brooklyn" yields "lives in New York City", "lives in USA".	0.40–0.70	`inferred`
conceivable	Claims that could plausibly hold given entity types. "the user has fingers", "the restaurant has a menu".	~0.85	`hypothesis_only`

Apertures run independently, and one aperture's failure (LLM JSON parse error, timeout, etc.) does not abort the others. The per-aperture error field is preserved in the response and the job log, so the caller can see which slice of the input the model failed on. In the measured football example (§15) only two of five apertures succeeded and the response still contained 48 deduplicated facts.
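The failure isolation described above can be illustrated with Python's asyncio, used here as an analogue of the Rust futures::future::join_all call. This is a sketch under assumptions: `run_apertures` and `fake_aperture` are hypothetical names, and the fake aperture simply raises for the three apertures that failed in the football example (§15) instead of calling a model.

```python
import asyncio

APERTURES = ("surface", "linguistic", "presupposition", "inferential", "conceivable")

async def run_apertures(text, call_aperture):
    """Run every aperture concurrently; a failure in one never cancels the rest."""
    results = await asyncio.gather(
        *(call_aperture(name, text) for name in APERTURES),
        return_exceptions=True,  # exceptions come back as values instead of propagating
    )
    yields = {}
    for name, result in zip(APERTURES, results):
        if isinstance(result, Exception):
            # Keep the per-aperture error so the caller can see what failed.
            yields[name] = {"facts": [], "error": str(result)}
        else:
            yields[name] = {"facts": list(result), "error": None}
    return yields

async def fake_aperture(name, text):
    """Hypothetical stand-in for an LLM call: three apertures fail to parse."""
    await asyncio.sleep(0)
    if name in ("surface", "linguistic", "conceivable"):
        raise ValueError("LLM JSON decode error")
    return [(f"ex:{name}", "ex:sees", text)]

yields = asyncio.run(run_apertures("Bob used the app", fake_aperture))
failed = sorted(name for name, y in yields.items() if y["error"])
print(failed)
```

The two surviving apertures still contribute their facts, and the three errors travel with the result rather than aborting the call.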

5.1. Why aperture decomposition

A single "extract every fact you can" prompt under-performs even modest decomposition. The phenomenon is similar to the extraction maximalism work in donto's own extraction pipeline (Davis 2026, internal): asking one model one question implicitly biases it toward surface assertions. Asking the same model five different questions, each scoped to a particular kind of claim, reliably produces an order of magnitude more captured statements without obviously degrading precision. The confidence bands and hypothesis_only tagging give downstream filters (maturity floors, polarity choices) a way to throw out the speculative tail when precision matters.

5.2. Content-key deduplication

Every ExtractedFact exposes a stable content_key() computed as sha256(subject ‖ 0x1f ‖ predicate ‖ 0x1f ‖ object). Apertures are merged into a single ordered output by walking them in order and skipping any fact whose content key has already been seen. Confidence, modality, notes, and the aperture tag are ignored for purposes of dedup. The rationale is that one stored row for a triple that two apertures both produced is more useful than two stored rows that say the same thing under different metadata.
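A minimal sketch of the content key and the ordered merge follows. The UTF-8 encoding, the hex digest, and the dictionary shape of a fact are assumptions; only the hash, the 0x1f separator, the walk order, and the collision counter come from the description above.

```python
import hashlib

SEP = b"\x1f"  # ASCII unit separator between subject, predicate and object

def content_key(subject, predicate, obj):
    """sha256(subject || 0x1f || predicate || 0x1f || object), as a hex string."""
    data = SEP.join(part.encode("utf-8") for part in (subject, predicate, obj))
    return hashlib.sha256(data).hexdigest()

def merge_apertures(aperture_outputs):
    """Walk apertures in order; keep the first fact seen for each content key."""
    seen = set()
    merged = []
    dedup_collisions = 0
    for aperture, facts in aperture_outputs:
        for fact in facts:
            key = content_key(fact["subject"], fact["predicate"], fact["object"])
            if key in seen:
                # Same triple from a later aperture: count it, do not store it.
                dedup_collisions += 1
                continue
            seen.add(key)
            merged.append({**fact, "aperture": aperture})
    return merged, dedup_collisions

outputs = [
    ("presupposition", [
        {"subject": "ex:Bob", "predicate": "rdf:type", "object": "ex:Person", "confidence": 0.9},
    ]),
    ("inferential", [
        {"subject": "ex:Bob", "predicate": "rdf:type", "object": "ex:Person", "confidence": 0.6},
        {"subject": "ex:Bob", "predicate": "ex:follows", "object": "ex:Arsenal", "confidence": 0.6},
    ]),
]
merged, dedup_collisions = merge_apertures(outputs)
print(len(merged), dedup_collisions)
```

The separator byte matters: without it, ("ab", "c", "d") and ("a", "bc", "d") would hash to the same key.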

6. The recall contract

POST /recall produces a Memory Evidence Bundle: a ranked list of statements with their provenance, gated by policy, optionally resolved through an identity lens, optionally time-shifted to a past tx_time. The handler performs six steps:

Module dispatch. Each enabled memory module (episodic, semantic-claim, preference) runs its own retrieval against the substrate via POST /recall on dontosrv. The substrate returns its own rows; the module wraps them as candidate BundleRows.
Policy gate. Every candidate row passes through the substrate's Trust Kernel. If the requesting holder is not attested for the requested action on the row's source policy, action_allowed is false. permitted_only=true (default) filters denied rows.
Identity-lens resolution. If lens_name is set, the substrate returns the cluster representative for each subject / object IRI. With no lens, every IRI is returned verbatim — useful when you want strict identity semantics.
Bitemporal time-travel. If as_of_tx is set, the substrate returns the rows it believed at that timestamp. A claim retracted last week is still visible to an as_of_tx query pointing at "before last week."
Fusion. Candidate rows from each module are merged via Reciprocal Rank Fusion with k=60. A row surfaced by multiple modules ranks higher than a row found by only one.
Side effects. For each returned row, donto-memory writes a row to donto_x_memory_access, bumps recall state in donto_x_memory_state, and (if enabled) enqueues a reconsolidation task in donto_x_memory_reconsolidation_queue. None of these side effects touch donto_statement.

The response shape is fully serialised: every row carries the substrate's statement_id, subject, predicate, both object_iri and object_lit, context, polarity, maturity, the tx_lo/tx_hi range, the lens-resolved subject and object, the full effective_actions map, the action_allowed shortcut, the record_iri, the module_iri that surfaced it, and the fused score + rank. There is no hidden post-processing.
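The fusion step can be sketched as follows. Reciprocal Rank Fusion sums 1/(k + rank) over every ranked list a row appears in; the function name, the use of statement IDs as keys, ranks starting at 1, and the alphabetical tie-break are assumptions of this sketch.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Fuse per-module rankings: score(row) = sum of 1 / (k + rank) over modules."""
    scores = defaultdict(float)
    for ranking in ranked_lists.values():
        for rank, statement_id in enumerate(ranking, start=1):
            scores[statement_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by statement id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

fused = rrf_fuse({
    "mem:module/episodic": ["s1", "s2"],
    "mem:module/semantic-claim": ["s3", "s1"],
})
for statement_id, score in fused:
    print(statement_id, round(score, 5))
```

Here s1 is surfaced by both modules and scores 1/61 + 1/62, so it ranks above s3 (1/61) and s2 (1/62), which were each found by one module only.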

7. Memory modules

A memory module is a (form, function, version) tuple implementing two methods: ingest(input) → MemoryRecord and retrieve(query) → Vec<BundleRow>. The default registry ships three modules:

module_iri	form	function	Role
`mem:module/episodic`	token	experiential	Verbatim event / chunk recall. Each ingest writes one `mem:episodic/chunk` statement with the raw text as the object literal. Used for raw user utterances and the always-saved bytes-on-disk record of a memorize call.
`mem:module/semantic-claim`	structured	factual	Extracted typed claims. Each fact becomes one substrate statement; the record anchors to the statement's `statement_id`. The vast majority of post-memorize rows are here.
`mem:module/preference`	structured	preference	Append-only key/value. A subsequent preference on the same key with a different value creates a new statement plus a `supersedes` argument edge to the prior. Recall returns the most recent; the older value remains queryable under `as_of_tx`.

The registry is open: a consumer can add additional modules by inserting a row into donto_x_memory_module and providing a runtime implementation. The substrate cares about the overlay table naming convention, not the module count.
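The append-only behaviour of the preference module can be illustrated with a small in-memory model. `PreferenceLog` is a hypothetical name, timestamps are ISO-8601 UTC strings (which sort lexicographically), and treating a repeated identical value as a no-op is an assumption; the text above only specifies what happens when the value differs.

```python
class PreferenceLog:
    """Append-only key/value preferences with supersedes edges."""

    def __init__(self):
        self.statements = []   # never updated or deleted
        self.supersedes = []   # (new_statement_id, prior_statement_id)

    def get(self, key, as_of_tx=None):
        """Most recent statement for key, optionally as believed at as_of_tx."""
        candidates = [
            s for s in self.statements
            if s["key"] == key and (as_of_tx is None or s["tx_lo"] <= as_of_tx)
        ]
        return max(candidates, key=lambda s: (s["tx_lo"], s["id"]), default=None)

    def set(self, key, value, tx):
        prior = self.get(key)
        if prior is not None and prior["value"] == value:
            return prior["id"]  # assumption: same value again is a no-op
        new_id = len(self.statements) + 1
        self.statements.append({"id": new_id, "key": key, "value": value, "tx_lo": tx})
        if prior is not None:
            # The older value stays stored; the edge records the lineage.
            self.supersedes.append((new_id, prior["id"]))
        return new_id

prefs = PreferenceLog()
first = prefs.set("editor", "vim", "2026-05-01T00:00:00Z")
second = prefs.set("editor", "helix", "2026-05-20T00:00:00Z")
print(prefs.get("editor")["value"])
print(prefs.get("editor", as_of_tx="2026-05-10T00:00:00Z")["value"])
```

The default read returns the newer value, while the as_of_tx read returns the older one, and both statements remain in the log.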

8. Overlay tables

donto-memory adds six tables to the shared Postgres database, all prefixed donto_x_memory_* per the substrate's M10 §6.1 overlay-naming lint. The lint requires that each overlay carry a tx_time tstzrange with a lower_inc check and reference at least one substrate (non-overlay) primary key, which guarantees that the substrate's bitemporal contract is preserved even when consumer code mutates the overlay.

Table	Role
`donto_x_memory_module`	Registered modules. `form`, `function`, `label`, `description`, `config` jsonb, `enabled` boolean.
`donto_x_memory_record`	One row per unit of memory, anchored to exactly one of `statement_id` / `frame_id` / `context_iri`. Carries `holder_iri`, `session_iri`, `expected_policy_iri`.
`donto_x_memory_access`	Append-only access events. Five kinds — retrieved, surfaced, cited, ignored, corrected. Powers the read / belief distinction.
`donto_x_memory_state`	Per-record derived state: salience, recall count, `last_accessed_at`, `next_review_at`, decay clock. Bitemporal: each state update closes `tx_time` on the prior row and opens a new one.
`donto_x_memory_reconsolidation_queue`	Sleep-path work items with five reasons — recall, contradiction, policy_change, scheduled_review, explicit. Coalesced inside a configurable window to avoid duplicate processing.
`donto_x_memory_job_log`	(Added 2026-05-28.) Per-request audit row capturing full request, full response, endpoint, holder, session id, status code, elapsed ms, model usage. Powers the `/jobs` observability page (§13).

All six are registered with the substrate via donto_overlay_register at migrate time. After registration the substrate's lint considers them part of its own bookkeeping for purposes of contract-version assertions.

9. Substrate handshake

donto-memory speaks the substrate's M10 contract floor and refuses to run against an older substrate. At process startup the binary calls GET /discovery/contract-version on the configured dontosrv and aborts unless the reported version is at or above 0.1.0-m10:

2026-05-28T09:35:53Z  INFO donto_memory_core::substrate:
  substrate contract floor satisfied
  actual=0.1.0-m10 floor=0.1.0-m10

The handshake establishes a small set of invariants the runtime can rely on: the existence of POST /recall, POST /arguments/assert, POST /ingest/batch, the 16 named actions, the bitemporal contract for donto_statement.tx_time, and the overlay-registration function. None of these are duck-typed at request time — the floor covers them all up-front.
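The floor comparison can be sketched as below. The report does not specify how contract versions are ordered, so this sketch assumes the format MAJOR.MINOR.PATCH-mN and compares the four numbers as a tuple; the function names are hypothetical.

```python
import re

# Assumed format: MAJOR.MINOR.PATCH-mN, for example 0.1.0-m10.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)-m(\d+)$")

def parse_contract_version(text):
    match = _VERSION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised contract version: {text!r}")
    return tuple(int(part) for part in match.groups())

def satisfies_floor(actual, floor="0.1.0-m10"):
    """True when the substrate's reported version is at or above the floor."""
    return parse_contract_version(actual) >= parse_contract_version(floor)

print(satisfies_floor("0.1.0-m10"))
print(satisfies_floor("0.1.0-m9"))
```

Comparing integers rather than strings is the point of the sketch: as plain text, "m9" would sort after "m10".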

The diagnostic GET /substrate endpoint on donto-memory echoes the upstream handshake so an operator can verify the binding without having to know dontosrv's URL. Operators monitoring the binding can call this in a loop.

10. Identity lenses

The substrate stores every entity reference verbatim. When two references actually refer to the same real-world entity, that fact is recorded as a weighted identity edge — never collapsed at storage. At query time, an identity lens parameter controls how strict the equivalence judgement is. donto-memory passes lens_name through unchanged; the substrate does the clustering work.

Default seeded lenses:

strict_identity_v1 — only edges with confidence ≥ 0.98.
likely_identity_v1 — ≥ 0.85.
exploratory_identity_v1 — ≥ 0.60.

For most agent workloads, null (no lens) is the right default — every IRI is itself, no expansion. The lens parameter becomes interesting when the agent is recalling memories about a person whose name varies across sources, the canonical case in the genealogy testbed: "Annie Davis", "Mrs Watson", "Mary Watson" all index back to the same person if the substrate has accumulated enough corroborating identity edges. With likely_identity_v1, a query about ex:annie-davis returns rows about all three names.

The lens model is the project's answer to the question that LLM-only memory systems answer with vector similarity: "what other memories are about the same thing?" The substrate's answer is more conservative — it requires evidence-backed identity edges to cross between names — but it preserves the ability to disagree about identity, to mark some edges as low-confidence, and to time-shift through periods when the system did not yet believe two names were the same person.
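To make the lens thresholds concrete, the sketch below expands an IRI into its identity cluster by following edges whose confidence meets the lens floor. The thresholds are the seeded values listed above; the edge confidences, the ex:anne-davies IRI, and the assumption that clustering is transitive are illustrative only, since the substrate's actual clustering algorithm is not described here.

```python
LENS_THRESHOLDS = {
    "strict_identity_v1": 0.98,
    "likely_identity_v1": 0.85,
    "exploratory_identity_v1": 0.60,
}

def identity_cluster(iri, edges, lens_name=None):
    """IRIs reachable from iri over identity edges at or above the lens threshold."""
    if lens_name is None:
        return {iri}  # no lens: every IRI is only itself
    floor = LENS_THRESHOLDS[lens_name]
    members = {iri}
    frontier = [iri]
    while frontier:
        current = frontier.pop()
        for left, right, confidence in edges:
            if confidence < floor:
                continue
            if left == current:
                other = right
            elif right == current:
                other = left
            else:
                continue
            if other not in members:
                members.add(other)
                frontier.append(other)
    return members

# Hypothetical weighted identity edges; the confidences are made up.
edges = [
    ("ex:annie-davis", "ex:mrs-watson", 0.91),
    ("ex:mrs-watson", "ex:mary-watson", 0.87),
    ("ex:annie-davis", "ex:anne-davies", 0.65),
]
print(sorted(identity_cluster("ex:annie-davis", edges, "likely_identity_v1")))
```

With these edges, likely_identity_v1 gathers the three names from the genealogy example, strict_identity_v1 returns only the queried IRI, and exploratory_identity_v1 also pulls in the low-confidence spelling variant.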

11. Bitemporal time-travel

Every claim has two times:

valid_time — when the fact was true in the world.
tx_time — when we believed it.

Recall with "as_of_tx": "2026-05-01T00:00:00Z" returns the rows the system believed on that date. If a fact was retracted on 2026-05-15, an as_of_tx=2026-05-10 query still sees it. This is the "what did we know on date X?" pattern, useful for agent self-explanation, regression analysis after a corrective ingest, and forensic debugging of "why did I say that two weeks ago?"

The discipline carries through donto-memory's own overlays. The donto_x_memory_state table is bitemporal: each salience update closes tx_time on the prior state row and opens a new one. So an agent asking "what was this memory's salience on the day I responded to that question?" gets a real answer instead of a current snapshot.
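The as_of_tx filter amounts to a range-containment test on tx_time. The sketch below assumes half-open ranges [tx_lo, tx_hi) with an open tx_hi (None) meaning "still believed"; the helper names and the row shape are hypothetical, and the dates reuse the retraction example above.

```python
from datetime import datetime

def ts(text):
    """Parse an ISO-8601 timestamp such as 2026-05-10T00:00:00Z."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))

def believed_at(rows, as_of_tx=None):
    """Rows whose tx_time range contains as_of_tx (or is still open if None)."""
    if as_of_tx is None:
        return [row for row in rows if row["tx_hi"] is None]
    return [
        row for row in rows
        if row["tx_lo"] <= as_of_tx and (row["tx_hi"] is None or as_of_tx < row["tx_hi"])
    ]

rows = [
    # Believed from 20 April, retracted on 15 May.
    {"id": "s1", "tx_lo": ts("2026-04-20T00:00:00Z"), "tx_hi": ts("2026-05-15T00:00:00Z")},
    # The corrected claim, believed from 15 May onward.
    {"id": "s2", "tx_lo": ts("2026-05-15T00:00:00Z"), "tx_hi": None},
]
print([row["id"] for row in believed_at(rows, ts("2026-05-10T00:00:00Z"))])
print([row["id"] for row in believed_at(rows)])
```

The same close-then-open pattern applies to donto_x_memory_state: closing tx_hi on the prior state row and opening a new one keeps every past salience value answerable by this kind of query.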

Importantly, the audit-log overlay (donto_x_memory_job_log) is also bitemporal-compatible — a constraint that mattered for the M10 overlay lint to accept it.

12. Policy & Trust Kernel

Every recall asks for a specific action. The substrate gates each row based on the source's policy capsule plus the holder's attestation. The 16 actions:

Action	When to ask
`read_metadata`	See that the row exists. Default-permitted, the always-allowed floor.
`read_content`	Read the actual values. The common agent recall case.
`quote`	Include verbatim in a user-visible answer.
`view_anchor_location`	See where in the source the claim was extracted.
`derive_claims`	Extract new derived statements.
`derive_embeddings`	Generate embeddings.
`translate` / `summarize`	Translate or summarise content.
`export_claims` / `export_sources` / `export_anchors`	Include in a release.
`train_model`	Use in model training.
`publish_release`	Include in a citable release.
`share_with_third_party`	Pass to another agent or system.
`federated_query`	Answer a federated query against another donto instance.
`request_deletion`	Initiate tombstoning. Heavyweight; requires its own attestation.

The Trust Kernel is fail-closed for content-exposing actions. In the live deployment, where most sources have the substrate's default policy capsule with no holder-specific attestations, a recall asking read_content sees rows but with values redacted and action_allowed=false. The same recall asking read_metadata sees the same rows with values intact. This produces the "transparent about denial" behaviour described in the M10 PRD: the system never silently drops a row, it always explains which actions it would and would not permit.

A consumer that wants permissive memory semantics (for example a single-user agent reading back its own memories) attaches a donto_attestation to its own holder IRI covering the actions it needs. This is the same attestation mechanism the genealogy stack uses to grant fieldworkers different action sets than external readers.
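The gating behaviour described in this section can be modelled in a few lines. This is a simplified sketch, not the Trust Kernel: it treats every action other than read_metadata as requiring attestation, represents redaction by setting object_iri and object_lit to None, and models attestation as a plain set of action names. All three are assumptions.

```python
ACTIONS = (
    "read_metadata", "read_content", "quote", "view_anchor_location",
    "derive_claims", "derive_embeddings", "translate", "summarize",
    "export_claims", "export_sources", "export_anchors", "train_model",
    "publish_release", "share_with_third_party", "federated_query",
    "request_deletion",
)

def gate(rows, action, attested, permitted_only=True):
    """Fail-closed gate: read_metadata is always allowed, the rest need attestation."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    bundle = []
    for row in rows:
        effective = {
            name: (name == "read_metadata" or name in attested) for name in ACTIONS
        }
        allowed = effective[action]
        if permitted_only and not allowed:
            continue  # denied rows are filtered only when the caller asks for that
        gated = dict(row, effective_actions=effective, action_allowed=allowed)
        if not allowed:
            # Denied rows stay visible, but their values are redacted.
            gated["object_iri"] = None
            gated["object_lit"] = None
        bundle.append(gated)
    return bundle

rows = [{"statement_id": "s1", "subject": "ex:Bob", "predicate": "ex:hasName",
         "object_iri": None, "object_lit": "Bob"}]
denied = gate(rows, "read_content", attested=set(), permitted_only=False)
print(denied[0]["action_allowed"], denied[0]["object_lit"])
```

Passing attested={"read_content"} models the single-user case above, where an agent attests its own holder IRI and reads its memories back intact.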

13. Audit log

Every POST /memorize, POST /memorize/batch, POST /recall, and POST /ingest/<module> writes one row to donto_x_memory_job_log on return. The row captures: the full request body, the full response body, endpoint, holder, session id, status code, elapsed milliseconds, and (when applicable) facts-extracted / facts-ingested / rows-returned / model name / token usage.

The log feeds a small observability surface:

GET /jobs — HTML list view. Filter by endpoint or holder. Sortable columns for ms / facts / rows / tokens.
GET /jobs/list.json — JSON variant for programmatic consumers.
GET /jobs/:id — HTML detail page. Renders every extracted fact (for memorize jobs) as a sortable table showing subject, predicate, object, polarity, modality, confidence, and aperture.
GET /jobs/:id/raw — JSON for the same detail.

The detail page answers the operator's question: "what did the system actually do?" An operator clicking through to the football example sees the 48 extracted facts in their raw form, keyed by aperture, with the per-aperture errors preserved so the underlying LLM behaviour is legible.

Failures to write the audit row are logged and swallowed; the user-visible API is never blocked by audit-log unavailability.
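The log-and-swallow behaviour can be sketched as a wrapper around a request handler. `with_audit`, `write_job_row`, and the row keys are hypothetical names for illustration; the real row carries more fields, as listed above.

```python
import logging
import time

log = logging.getLogger("donto_memory.audit")

def with_audit(endpoint, handler, write_job_row):
    """Wrap a handler so one audit row is written on return, best effort."""
    def wrapped(request):
        started = time.monotonic()
        response = handler(request)
        row = {
            "endpoint": endpoint,
            "request": request,
            "response": response,
            "elapsed_ms": int((time.monotonic() - started) * 1000),
        }
        try:
            write_job_row(row)
        except Exception:
            # Audit failures are logged and swallowed; the caller still gets a response.
            log.exception("audit row write failed")
        return response
    return wrapped

audit_rows = []
recall = with_audit("/recall", lambda request: {"rows": []}, audit_rows.append)
print(recall({"holder": "agent:ajax"}))
```

If `write_job_row` raises, for example because Postgres is briefly unavailable, the wrapped handler returns the same response it would have returned otherwise.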

14. Documentation surfaces

The live deployment ships four complementary documentation surfaces; the static ones are served from assets baked into the binary at build time:

GET / — long-form homepage, 16 sections, ~2,800 words. The canonical entry point for a human reader.
GET /agent.md / GET /llms.txt — the agent guide. A self-contained markdown document (~3,400 words) aimed at an AI agent that needs to implement memory storage and recall. Includes the contract version, the base URL, the two endpoints, all 16 actions, all five apertures, cost expectations, and cookbook patterns. The /llms.txt variant is served as text/plain per the emerging convention; the /agent.md variant is served as text/markdown for tools that prefer it.
GET /openapi.json / GET /docs — OpenAPI 3.1 and Swagger UI. Machine-readable schemas for every endpoint.
GET /jobs — the observability surface described in §13. The "what the system actually did" complement to the contract-defining documentation above.

The four surfaces are deliberately redundant. A human reader stays on /. A static-analysis tool reads /openapi.json. An AI agent reads /agent.md or /llms.txt. An operator diagnosing a failure reads /jobs/:id. The surfaces are not generated from one another: the OpenAPI doc is hand-written, and the homepage and agent guide are hand-written but kept in sync, so the cost of redundancy is editorial discipline rather than codegen complexity.

15. Live measurements

The system has been live for several hours as of the report date. Quantitative state of the deployment:

Substrate statements (shared with genes)	39.29M
Distinct substrate predicates	939K
Substrate contexts	19,941
Memory-context statements (ctx:memory/*)	752
Memory records across 3 modules	357
Semantic-claim records	318
Episodic-chunk records	36
Preference records	3
Overlay tables registered	6
Rust LoC	~7,200
Rust files in the workspace	31
Substrate contract floor	0.1.0-m10

15.1. The football example

A representative POST /memorize call captured for this report. Input:

Bob suggested I build a football app to track Premier League fixtures. I coded it in TypeScript with React frontend, Node.js backend, and the football-data.org API. Bob used it during the 2024 season to follow Arsenal.

Result (job b03baee0-…):

Mode	`exhaustive` (5 apertures)
Facts extracted	48
Facts ingested	48
Dedup collisions	2
Total tokens	7,240
Elapsed	181.4 s (3 minutes)
Model	`z-ai/glm-5` via OpenRouter
Surface aperture	0 facts (LLM JSON decode error)
Linguistic aperture	0 facts (LLM JSON decode error)
Presupposition aperture	42 facts in 24.2 s
Inferential aperture	8 facts in 96.6 s
Conceivable aperture	0 facts (LLM JSON decode error)

A sample of the 48 extracted statements:

(ex:Bob,    rdf:type,     ex:Person)
(ex:Bob,    ex:hasName,   "Bob")
(ex:Ajax,   rdf:type,     ex:Person)
(ex:Ajax,   ex:hasName,   "Ajax")
(ex:Ajax,   ex:capableOf, ex:BuildingSoftware)
(ex:Ajax,   ex:capableOf, ex:Coding)
(ex:Ajax,   ex:hasSkill,  ex:TypeScript)
(ex:Ajax,   ex:hasSkill,  ex:React)
(ex:Ajax,   ex:hasSkill,  ex:NodeJS)
…

Three observations from this single call:

The presupposition aperture is the workhorse. With three of five apertures returning zero, the captured count still cleared the "explicit-only" baseline by a wide margin — the presuppositional decomposition alone produced 42 facts from three sentences of input.
The aperture-level errors are visible, not hidden. The response and the audit log both preserve the per-aperture error field. An operator diagnosing extraction yield can see immediately that surface, linguistic, and conceivable apertures failed to return parseable JSON on this call.
The elapsed time (~3 minutes) exceeds Cloudflare's 100-second free-tier proxy timeout. Long memorize calls must hit the host on its private IP or fall back to mode: "single"; this is a deployment limitation that §18 discusses.

15.2. Recall behaviour

Recall against the same session — holder=agent:ajax, session_id=football-app-bob-2026, no other filters — with the default fail-closed policy returns 0 rows under read_content (no attestation), 15+ rows under read_metadata (always-permitted floor), with the recalled rows spanning all three modules. Fusion scores hover around 1/(60+rank) per RRF; cross-module duplicates are rare in the current dataset because each fact is anchored to exactly one module.

16. Deployment

The full stack on donto-db at the time of writing:

Component	Process	Port
donto-memory API	`donto-memory-api.service`	127.0.0.1:7900
Caddy (TLS + reverse-proxy)	`caddy.service`	:443 → :7900
dontosrv (substrate gateway)	`dontosrv.service`	127.0.0.1:7879
donto-pg (Postgres 16)	Docker container	127.0.0.1:5432
OpenRouter (LLM)	external HTTPS	—

Environment (chmod 600 at /etc/donto-memory/env):

DONTO_MEMORY_DONTOSRV_URL=http://localhost:7879
DONTO_MEMORY_DONTO_DSN=postgres://donto:…@127.0.0.1:5432/donto
DONTO_MEMORY_CONSUMER_IRI=ctx:memory
DONTO_MEMORY_API_BIND=127.0.0.1:7900
DONTO_MEMORY_LLM_BASE_URL=https://openrouter.ai/api/v1
DONTO_MEMORY_LLM_API_KEY=sk-or-v1-…
DONTO_MEMORY_LLM_MODEL=z-ai/glm-5
DONTO_MEMORY_LLM_TEMPERATURE=0.2
DONTO_MEMORY_EXTRACT_MODE=exhaustive

Caddy routes memories.apexpots.com (TLS internal, terminated again by Cloudflare in front) to the local :7900. The genealogy and memory surfaces share Cloudflare zone settings; both run in Cloudflare Full mode, not Full-strict.

A migration run is a single command that applies the SQL to Postgres and then registers the overlays with the substrate:

donto-memory migrate \
    --dontosrv-url "$DONTO_MEMORY_DONTOSRV_URL" \
    --dsn          "$DONTO_MEMORY_DONTO_DSN" \
    --dir          /home/ajax/donto-memory/migrations

A typical iteration loop is: edit Rust source under /mnt/donto-data/workspace/donto-memory/ as user ajaxdavis, cargo build --release into the shared target directory at /mnt/donto-data/cargo-target-memory, install to /usr/local/bin/donto-memory, systemctl restart donto-memory-api, verify via https://memories.apexpots.com/health.

17. Related work

The agentic-memory literature falls into three rough camps:

17.1. Vector-DB memory (LangChain Memory, mem0, Letta/MemGPT)

These systems treat memory as semantic search over embeddings of chat turns (and sometimes "extracted facts"). The embeddings layer is responsible for surface-form normalisation; retrieval is approximate-nearest-neighbour over a vector index. The strength is single-call simplicity: write, embed, retrieve. The weakness is structural: contradictions are not first-class, sources are not first-class, time is not first-class. Vector-DB memory is also opinionated about updates (overwrite or coexist) and rarely preserves the lineage from an extracted fact back to the source chunk in a queryable way.

donto-memory inherits the surface-form normalisation problem (it solves it via identity lenses + predicate alignment, not embeddings) and adds structure, provenance, contradiction handling, and time. The cost is that retrieval is not yet semantic-similarity-based; recall queries combine subject / predicate / session / module filters plus an optional free-text query. Adding vector retrieval on top of the existing module dispatch is a roadmap item, not a v0.1.0 commitment.

17.2. Knowledge-graph memory (Zep, Cognee, neo4j-based stacks)

Graph-based memory systems share donto-memory's commitment to typed triples and explicit relations. They differ in three ways. First, they lack an explicit substrate-vs-consumer split — the graph schema and the memory schema are the same thing. Second, they are not bitemporal in the strict valid_time × tx_time sense; most are single-time. Third, they almost universally lack a policy gate of the kind the substrate's Trust Kernel provides.

donto-memory's split is deliberate: the consumer code is small and focused on the memory abstraction; everything to do with belief, provenance, contradictions, identity, time, and policy lives in the substrate and is shared with the genealogy workload and any other consumer that bootstraps against the same contract.

17.3. Stateful-agent frameworks (CAMEL, MetaGPT, AutoGen)

These frameworks embed memory inside the agent's runtime — usually as in-process Python objects checkpointed to disk. They optimise for ergonomics; persistence is a serialisation problem, not a database problem. donto-memory occupies a different niche: it expects to outlive any single agent, to be queried by multiple agents, and to interoperate with other consumers via its substrate.

18Limitations

v0.1.0 deliberately ships an incomplete picture. The known limitations:

18.1. LLM extraction is brittle at the boundaries

The football-example measurements (§15) show three of five apertures failing JSON decode on a 3-sentence input. z-ai/glm-5 is reliable for tightly-structured outputs when the prompt fits in a comfortable token budget; aperture-level JSON-parse failures need investigation. Mitigations under consideration: per-aperture retry with a stricter response_format schema, fallback to a smaller-output prompt when the first attempt fails to parse, or replacing the JSON-mode contract with a tool-use contract that the model is more heavily trained on. None of these is implemented in v0.1.0.
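The first two mitigations could be combined roughly as follows. This is a sketch under stated assumptions, not code from v0.1.0: `Attempt`, `extract_with_fallback`, and the `call` closure (which stands in for one LLM request plus JSON decode) are hypothetical names.

```rust
/// Which prompt variant an attempt used. Both variants are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Attempt {
    StrictSchema,
    SmallerOutput,
}

/// Runs one aperture: tries the strict-schema prompt up to `max_strict` times,
/// then falls back once to the smaller-output prompt.
/// On total failure, returns every error collected along the way.
fn extract_with_fallback<T, E>(
    max_strict: usize,
    mut call: impl FnMut(Attempt) -> Result<T, E>,
) -> Result<(T, Attempt), Vec<E>> {
    let mut errors = Vec::new();
    for _ in 0..max_strict {
        match call(Attempt::StrictSchema) {
            Ok(v) => return Ok((v, Attempt::StrictSchema)),
            Err(e) => errors.push(e),
        }
    }
    // Strict attempts are exhausted; try the fallback prompt exactly once.
    match call(Attempt::SmallerOutput) {
        Ok(v) => Ok((v, Attempt::SmallerOutput)),
        Err(e) => {
            errors.push(e);
            Err(errors)
        }
    }
}
```

Returning the successful `Attempt` alongside the value would let the audit log record which apertures needed the fallback, which is useful input for the investigation mentioned above.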

18.2. Latency exceeds the Cloudflare proxy timeout

Cloudflare's free tier drops a proxied request if the origin has not responded within 100 seconds. Exhaustive-mode extraction routinely exceeds this, so public-facing /memorize?mode=exhaustive calls return 524. Callers can fall back to mode: "single", hit the host directly on its private IP, or wait for a queue-based /memorize/async endpoint that returns a job ID immediately and finishes the work in the background; that endpoint is designed but not implemented.
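The job-ID pattern behind such an endpoint can be sketched with the standard library alone. Everything here is an assumption for illustration: `JobStore`, `JobState`, and `submit` are hypothetical names, and `std::thread` stands in for whatever background runtime the real endpoint would use.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

#[derive(Debug, Clone, PartialEq)]
enum JobState {
    Running,
    Done(String),
}

/// Hypothetical in-memory job table shared between request handlers and workers.
#[derive(Clone, Default)]
struct JobStore {
    next_id: Arc<Mutex<u64>>,
    jobs: Arc<Mutex<HashMap<u64, JobState>>>,
}

impl JobStore {
    /// Registers the job, starts the work in the background, and returns at once.
    /// The JoinHandle is returned only so a caller can wait deterministically.
    fn submit<F>(&self, work: F) -> (u64, JoinHandle<()>)
    where
        F: FnOnce() -> String + Send + 'static,
    {
        let id = {
            let mut n = self.next_id.lock().unwrap();
            *n += 1;
            *n
        };
        self.jobs.lock().unwrap().insert(id, JobState::Running);
        let jobs = Arc::clone(&self.jobs);
        let handle = thread::spawn(move || {
            let result = work(); // the slow extraction would run here
            jobs.lock().unwrap().insert(id, JobState::Done(result));
        });
        (id, handle)
    }

    /// What a polling client would see for a given job ID.
    fn status(&self, id: u64) -> Option<JobState> {
        self.jobs.lock().unwrap().get(&id).cloned()
    }
}
```

The HTTP response would carry only the ID, so the request completes well inside the proxy timeout regardless of how long extraction takes. A production version would also need persistence across restarts, which this sketch omits.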

18.3. No vector retrieval yet

There is no embedding column on the overlay tables and no nearest-neighbour fallback in the recall composer. Free-text recall is filter-based, not similarity-based. For agentic workloads where the user is likely to recall under a different surface form than the original save, this is a real gap. The roadmap item is to add derive_embeddings to the standard policy capsule and a pgvector-backed extension to the semantic-claim module.

18.4. Default policy is fail-closed

On the live deployment, where most sources have the substrate's default policy with no holder attestations, read_content recalls return rows with action_allowed=false. This is the substrate's correct behaviour but it does mean a fresh deployment needs an attestation to be useful for agents that need to quote content back to users. The bootstrap flow ("how do I get an attestation for my agent?") is currently a manual process and is the highest-priority documentation gap.
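The fail-closed rule itself is simple to state in code. The sketch below is an assumption-laden illustration, not the Trust Kernel's implementation: `Attestations`, `grant`, and the (holder, action) pair representation are hypothetical; only the action name read_content and the action_allowed flag come from the text above.

```rust
use std::collections::HashSet;

/// Hypothetical attestation store: (holder IRI, action) pairs that were granted.
#[derive(Default)]
struct Attestations {
    granted: HashSet<(String, String)>,
}

impl Attestations {
    fn grant(&mut self, holder: &str, action: &str) {
        self.granted.insert((holder.to_string(), action.to_string()));
    }

    /// Fail-closed: anything not explicitly granted is denied.
    fn action_allowed(&self, holder: &str, action: &str) -> bool {
        self.granted
            .contains(&(holder.to_string(), action.to_string()))
    }
}
```

With an empty store, every check returns false, which matches the behaviour a fresh deployment exhibits until someone grants the agent an attestation.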

18.5. The sleep-path worker is structural, not functional

donto-memory worker exists and drains the reconsolidation queue, but the only "work" it does is mark items completed. The reflection step that would derive new claims from recalled ones — "every recall touched the predicate ex:codedWith, so we should add an inferred preference 'agent:ajax ex:prefersTechStack ex:react'" — is designed (the DontoDelta vocabulary is in donto-memory-core/src/types.rs) but not yet implemented.

18.6. Identity-lens authoring is out of scope

v0.1.0 reads existing lenses but does not author new ones. A production memory system needs the ability to say "these two IRIs refer to the same entity" — at the moment that has to go through the substrate's separate predicate-alignment pipeline.

19Roadmap

The v0.2.0 milestones, in rough priority order:

Async memorize. Return a job ID immediately and process the LLM extraction in a background tokio task. Solves the Cloudflare-timeout problem and enables long inputs.
Vector recall. pgvector embeddings on the semantic-claim module, served behind an opt-in flag so the default behaviour (filter-based recall) remains deterministic.
Reflection. Wire the sleep-path worker to actually derive new claims from recalled ones, write them as DontoDelta steps, and apply via the substrate's existing delta machinery.
Identity-lens authoring. A POST /identity/merge endpoint that an agent can call when it has decided two IRIs refer to the same entity.
Bootstrap attestations. A first-run wizard that grants the deployment's default agent IRI a read_content attestation, with clear documentation on what it would take to revoke or audit.
Multi-tenant boundaries. Today, every memory is filed under the same consumer IRI. The substrate already supports per-tenant policy capsules; donto-memory needs a convention for binding a holder IRI to a tenant scope at bootstrap.

20Conclusion

donto-memory v0.1.0 is a deliberately small consumer over a deliberately rich substrate. The library and the binary together come to roughly 7,200 lines of Rust. Six overlay tables sit alongside the substrate's core schema, all registered through the M10 donto_overlay_register function so the substrate's bitemporal and policy invariants extend to consumer-owned data without modification. Three default memory modules — episodic, semantic-claim, preference — cover the bulk of what an AI-agent memory layer needs, and the architecture supports additional modules without touching the binary.

The contribution lies less in any single piece (five-aperture LLM extraction, RRF fusion across modules, identity-lens-aware recall) than in their integration on top of an evidence substrate that already supports the hard parts. Saving a fact preserves its provenance. Recalling it goes through a policy gate. When sources contradict each other, both claims are kept indefinitely. Bitemporal queries answer "what did we know on date X." All of these are behaviours that an LLM-and-vector-DB memory system would have to design, implement, and maintain itself. Because donto-memory is a thin consumer over donto, it inherits them from the substrate instead.

The system is live at memories.apexpots.com. The current state of every /memorize and /recall call is browsable at /jobs. The agent-facing contract is documented at /agent.md. The source is at github.com/thomasdavis/donto-memory, Apache-2.0 OR MIT.
