A technical report
donto-memory v0.1.0 —
A persistent memory layer for AI agents on the donto evidence substrate
donto-memory runs at memories.apexpots.com, shares a Postgres instance with
the 39-million-statement genealogy corpus that exercises the substrate,
and is operated as five Rust crates totalling roughly 7,200 lines, two
SQL migrations registering six overlay tables, and a 16-action Trust
Kernel that the consumer inherits unchanged from the substrate. The
report explains the architecture, the save and recall contracts, the
five extraction apertures, the substrate handshake at contract
0.1.0-m10, the audit-log observability surface, and the
limitations that the v0.1.0 implementation deliberately leaves for
later milestones.
1. Motivation
Modern AI agents are stateless. Each conversation begins with a blank prompt; everything an agent learns about a user, a task, or the world is lost unless an application developer writes their own persistence layer. The current de-facto answer is a vector database with two collections: chat transcripts and "extracted facts." That approach works for surface-level recall but discards the structure that makes the captured information durable across versions of the agent: source attestations, contradictions between sources, the difference between belief and access, the difference between when a fact was true in the world and when we came to believe it, and the policy gate that decides whether a downstream caller is even allowed to see the row.
donto-memory was built to ask a different question. The
donto substrate is a bitemporal paraconsistent quad store
already deployed in production at genes.apexpots.com,
where it holds 39 million statements about North-Queensland
genealogy, Aboriginal history, DNA matches, and oral-history corpora.
The substrate already supports contradictions as data (two sources
disagreeing about Annie Davis's birth year both live forever),
bitemporal provenance (valid_time × tx_time),
identity lenses, 16-action policy capsules, and an HTTP gateway
(dontosrv) with a published M10 contract version. The
question is: what does an agentic memory layer look like if it
inherits all of that for free?
This report is the v0.1.0 answer. donto-memory is a single Rust binary
that exposes a POST /memorize endpoint (raw text in, plus
typically 50–300 LLM-extracted ontological statements out) and a
POST /recall endpoint (a Memory Evidence Bundle out).
Every fact it captures is a real donto_statement row
sharing the same paraconsistent + bitemporal + policy-aware semantics
as the genealogy corpus next to it. The agent gets a memory layer with
the audit, the contradiction handling, and the access governance
already wired in.
2. Design principles
donto-memory makes three commitments that distinguish it from a typical vector-database-plus-LLM memory stack:
2.1. No silent rewrite
When you re-memorise a contradicting fact, the prior fact is not
overwritten. Both live forever as separate donto_statement
rows under distinct tx_time ranges. Recall surfaces the
latest by default and flags the contradiction. The application
chooses what to do. This is identical to the substrate's posture
elsewhere: the genealogy corpus stores both "Annie Davis born
1879" and "Annie Davis born 1881" if two reputable
sources disagree, and neither belief is silently destroyed.
2.2. Read events are not belief events
Calling /recall does not affect the truth value of any
claim. It writes an access event to a separate overlay
(donto_x_memory_access), bumps a private recall counter
in donto_x_memory_state, and optionally enqueues a
reconsolidation task. None of these touch donto_statement
or the substrate's tx_time discipline. Reading what we
believe is not the same as changing what we believe.
2.3. Policy-aware by default
Every recall asks for a specific action (one of 16 — see §12) and
every returned row carries an effective_actions map plus
an action_allowed shortcut for the requested action.
Rows are not silently dropped when the holder is not attested:
they appear with action_allowed=false and (if the action
is content-exposing) their values redacted. The default policy is
fail-closed for content-exposing actions; read_metadata
is the always-permitted floor. The whole gate runs inside the
substrate's existing POST /recall Trust Kernel pipeline.
3. Architecture
┌──────────────────────────────────────────────────────────────────┐
│ donto-memory (single Rust binary, four clap subcommands) │
│ │
│ donto-memory migrate donto-memory api │
│ donto-memory substrate donto-memory worker │
│ │ │ │
│ ▼ ▼ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ donto_memory_core (library) │ │
│ │ ─────────────────────────── │ │
│ │ modules: episodic / semantic-claim / preference │ │
│ │ hot_path: recall composer + RRF fusion │ │
│ │ sleep_path: reflect + apply DontoDelta │ │
│ │ substrate: reqwest → dontosrv │ │
│ │ overlays: tokio-postgres helpers │ │
│ │ extract: 5-aperture LLM via OpenAI-compatible │ │
│ └────────────────────────────┬───────────────────────────┘ │
└────────────────────────────────┼─────────────────────────────────┘
▼
┌────────────────────────────┐
│ donto substrate │
│ dontosrv :7879 │
│ contract 0.1.0-m10 │
│ (any donto instance) │
└────────────────────────────┘
The core library (donto-memory-core) is the only place that knows the substrate's protocol; the binary wraps it in axum routes and a tokio loop.
The deployment topology on the host VM donto-db
mirrors the genealogy stack:
- donto-memory-api (systemd) listens on 127.0.0.1:7900 and serves the API + documentation surface.
- Caddy terminates TLS for memories.apexpots.com (Cloudflare in front, Full SSL mode), reverse-proxying to :7900.
- dontosrv on :7879 is the same sidecar that serves genes.apexpots.com; it points at the donto-pg Docker container on :5432.
- Postgres is shared. The six donto-memory overlay tables sit alongside the substrate's core schema; an M10 overlay-naming lint guarantees the consumer cannot accidentally shadow substrate identifiers.
The binary's four subcommands are:
- migrate: applies the SQL files in migrations/ in lexical order, then calls donto_overlay_register on the substrate for each overlay so the M10 lint accepts them.
- api: the long-running axum server.
- worker: the sleep-path tokio loop that drains the reconsolidation queue (default poll 5 s).
- substrate: the handshake utility that echoes GET /discovery/contract-version and /discovery/substrate-health.
4. The save contract
POST /memorize is the single entry point most callers
will use. The handler runs four steps in strict order:
4.1. Episodic storage (always)
The raw text is asserted verbatim as a
mem:episodic/chunk statement filed under
ctx:memory/episodic/session/<session_id>. This is
the canonical bytes-on-disk record of what the caller sent. It
happens before any LLM call so that the chunk is durable even if
extraction fails.
4.2. LLM extraction (optional)
If the runtime has DONTO_MEMORY_LLM_API_KEY configured
(production currently routes through OpenRouter to
z-ai/glm-5), the chunk is sent to an extractor that
asks for typed ontological statements as JSON. Two modes are
supported:
- single: one LLM call; expects 20–40 facts; latency 5–10 s; cost approximately $0.005–$0.015 per call on the current model.
- exhaustive: five aperture prompts (§5) run in parallel via futures::future::join_all; expects 80–300 facts; latency 30–180 s depending on input length; cost approximately $0.04–$0.07.
The default is exhaustive, controllable per-request via
"mode" and per-deployment via
DONTO_MEMORY_EXTRACT_MODE.
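The mode selection can be sketched as a small resolver. This is an illustrative sketch, not the handler's actual code: the function name is hypothetical, and the precedence (request field first, then the environment variable, then the exhaustive default) is an assumption consistent with the per-request and per-deployment controls described above.

```python
import os

VALID_MODES = {"single", "exhaustive"}
DEFAULT_MODE = "exhaustive"  # the default stated in the report


def resolve_extract_mode(request_body, env=None):
    """Pick the extraction mode: request field, then env var, then default.

    The precedence order is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    mode = (
        request_body.get("mode")
        or env.get("DONTO_MEMORY_EXTRACT_MODE")
        or DEFAULT_MODE
    )
    if mode not in VALID_MODES:
        raise ValueError(f"unknown extraction mode: {mode!r}")
    return mode


# A request that names a mode wins over the deployment-wide setting.
print(resolve_extract_mode({"mode": "single"}, env={"DONTO_MEMORY_EXTRACT_MODE": "exhaustive"}))
```

Rejecting unknown modes early keeps a typo in the request from silently falling back to the slower, costlier exhaustive path.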
4.3. Semantic ingest
Each extracted fact becomes a typed statement
(subject, predicate, object) filed under
ctx:memory/claims/session/<session_id> with a
source_record_iri link back to the episodic chunk.
Content-hash deduplication runs across the union of aperture outputs;
duplicates from later apertures increment a
dedup_collisions counter rather than producing a second
row.
4.4. Receipt
The caller receives the episodic record IDs, every semantic record
ID, the per-aperture yields (with elapsed time and any aperture-level
errors), token usage, and — as of the 2026-05-28 release — the full
list of extracted facts in the response body. The facts payload makes
the response self-contained: an agent does not need a follow-up
/recall just to see what was captured.
5. Five apertures
The exhaustive extractor runs five LLM calls in parallel, each with a different system prompt that asks for a different kind of statement. The apertures span an increasing range of speculation:
| Aperture | Captures | Confidence band | Modality tag |
|---|---|---|---|
| surface | What the text explicitly states. "The user lives in Brooklyn" yields (user, ex:residesIn, brooklyn). | 0.95–1.0 | asserted |
| linguistic | Clause-by-clause decomposition. Every noun phrase → entity, verb phrase → event, modifier → property. Pulls 2–3× the fact count of surface alone. | 0.85–1.0 | asserted |
| presupposition | What the text takes for granted. "told me" presupposes that user and agent exist and communicate. | 0.70–0.95 | hypothesis_only |
| inferential | Common-knowledge consequences of stated facts. "lives in Brooklyn" yields "lives in New York City", "lives in USA". | 0.40–0.70 | inferred |
| conceivable | Claims that could plausibly hold given entity types. "the user has fingers", "the restaurant has a menu". | ~0.85 | hypothesis_only |
Apertures run independently and one aperture's failure (LLM JSON
parse error, timeout, etc.) does not abort the others. The per-aperture
error field is preserved in the response and the
job log, so the caller can see which slice of the input the model
failed on. In the measured football example (§15) only two of five
apertures succeeded and the response still contained 48 deduplicated
facts.
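The failure isolation described above can be illustrated with a small concurrency sketch. The production extractor uses futures::future::join_all in Rust; the Python below is only an analogue using asyncio.gather, and the fake_llm stand-in, its canned replies, and the result field names are all hypothetical.

```python
import asyncio
import json

APERTURES = ["surface", "linguistic", "presupposition", "inferential", "conceivable"]


async def run_aperture(name, call_llm, text):
    """Run one aperture and capture its error instead of raising."""
    try:
        raw = await call_llm(name, text)
        return {"aperture": name, "facts": json.loads(raw), "error": None}
    except Exception as exc:  # JSON decode errors, timeouts, and so on
        return {"aperture": name, "facts": [], "error": f"{type(exc).__name__}: {exc}"}


async def run_all_apertures(call_llm, text):
    """Run every aperture concurrently; results keep the aperture order."""
    return await asyncio.gather(*(run_aperture(a, call_llm, text) for a in APERTURES))


async def fake_llm(name, text):
    # Stand-in for the real model call: two apertures answer, three return broken JSON.
    if name == "presupposition":
        return '[["ex:Bob", "rdf:type", "ex:Person"]]'
    if name == "inferential":
        return '[["ex:Bob", "ex:follows", "ex:Arsenal"]]'
    return "{not json"


results = asyncio.run(run_all_apertures(fake_llm, "Bob used it to follow Arsenal."))
for r in results:
    print(r["aperture"], len(r["facts"]), "error" if r["error"] else "ok")
```

Because each aperture converts its own exception into an error field, one malformed reply never cancels the sibling calls, and the caller still receives one result entry per aperture.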
5.1. Why aperture decomposition
A single "extract every fact you can" prompt under-performs even
modest decomposition. The phenomenon is similar to the
extraction maximalism work in donto's own extraction
pipeline (Davis 2026, internal): asking one model one question
implicitly biases it toward surface assertions. Asking the same model
five different questions, each scoped to a particular kind of claim,
typically produces several times more captured statements (80–300 versus 20–40 facts per §4.2)
without obviously degrading precision. The confidence bands and
hypothesis_only tagging give downstream filters
(maturity floors, polarity choices) a way to throw out the speculative
tail when precision matters.
5.2. Content-key deduplication
Every ExtractedFact exposes a stable
content_key() computed as
sha256(subject ‖ 0x1f ‖ predicate ‖ 0x1f ‖ object).
Apertures are merged into a single ordered output by walking them in
order and skipping any fact whose content key has already been seen.
Confidence, modality, notes, and the aperture tag are
ignored for purposes of dedup — the rationale is that two
apertures producing the same triple is more useful than two stored
rows that say the same thing under different metadata.
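A minimal sketch of the content key and the ordered merge follows. The UTF-8 encoding, the hex digest, and the dictionary shape of a fact are assumptions made for illustration; only the hash formula and the first-seen-wins walk come from the description above.

```python
import hashlib

SEP = b"\x1f"  # ASCII unit separator, as in the formula above


def content_key(subject, predicate, obj):
    """sha256(subject || 0x1f || predicate || 0x1f || object), hex-encoded."""
    joined = SEP.join(part.encode("utf-8") for part in (subject, predicate, obj))
    return hashlib.sha256(joined).hexdigest()


def merge_apertures(aperture_outputs):
    """Walk apertures in order and keep the first fact seen per content key."""
    seen, merged, collisions = set(), [], 0
    for aperture, facts in aperture_outputs:
        for fact in facts:
            key = content_key(fact["subject"], fact["predicate"], fact["object"])
            if key in seen:
                collisions += 1  # counted, not stored as a second row
                continue
            seen.add(key)
            merged.append({**fact, "aperture": aperture})
    return merged, collisions


outputs = [
    ("surface", [{"subject": "ex:Bob", "predicate": "rdf:type", "object": "ex:Person", "confidence": 0.97}]),
    ("presupposition", [
        {"subject": "ex:Bob", "predicate": "rdf:type", "object": "ex:Person", "confidence": 0.80},
        {"subject": "ex:Bob", "predicate": "ex:hasName", "object": "Bob", "confidence": 0.90},
    ]),
]
merged, collisions = merge_apertures(outputs)
print(len(merged), collisions)
```

The second copy of the (ex:Bob, rdf:type, ex:Person) triple differs in confidence but hashes to the same key, so it only increments the collision counter.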
6. The recall contract
POST /recall produces a Memory Evidence Bundle:
a ranked list of statements with their provenance, gated by policy,
optionally resolved through an identity lens, optionally time-shifted
to a past tx_time. The handler performs six steps:
- Module dispatch. Each enabled memory module (episodic, semantic-claim, preference) runs its own retrieval against the substrate via POST /recall on dontosrv. The substrate returns its own rows; the module wraps them as candidate BundleRows.
- Policy gate. Every candidate row passes through the substrate's Trust Kernel. If the requesting holder is not attested for the requested action on the row's source policy, action_allowed is false. permitted_only=true (default) filters denied rows.
- Identity-lens resolution. If lens_name is set, the substrate returns the cluster representative for each subject / object IRI. With no lens, every IRI is returned verbatim, which is useful when you want strict identity semantics.
- Bitemporal time-travel. If as_of_tx is set, the substrate returns the rows it currently believed at that timestamp. A claim retracted last week is still visible to an as_of_tx query pointing at "before last week."
- Fusion. Candidate rows from each module are merged via Reciprocal Rank Fusion with k=60. A row surfaced by multiple modules ranks higher than a row found by only one.
- Side effects. For each returned row, donto-memory writes a row to donto_x_memory_access, bumps recall state in donto_x_memory_state, and (if enabled) enqueues a reconsolidation task in donto_x_memory_reconsolidation_queue. None of these side effects touch donto_statement.
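The fusion step can be sketched in a few lines. This is the standard Reciprocal Rank Fusion formula with k=60 as stated above; the 1-based ranks, the tie-break by statement id, and the sample statement ids are assumptions for illustration.

```python
from collections import defaultdict

RRF_K = 60  # the k stated in the fusion step


def rrf_fuse(ranked_lists, k=RRF_K):
    """Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists.values():
        for rank, statement_id in enumerate(ranking, start=1):
            scores[statement_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical per-module rankings, best candidate first.
candidates = {
    "episodic": ["s1", "s2"],
    "semantic-claim": ["s2", "s3"],
    "preference": ["s4"],
}
fused = rrf_fuse(candidates)
# s2 ranks first because two modules surfaced it.
for statement_id, score in fused:
    print(statement_id, round(score, 5))
```

A row found by a single module at rank r scores exactly 1/(60 + r), which is why single-module results cluster tightly and cross-module agreement stands out.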
The response shape is fully serialised: every row carries the
substrate's statement_id, subject,
predicate, both object_iri and
object_lit, context, polarity,
maturity, the
tx_lo/tx_hi range, the lens-resolved
subject and object, the full effective_actions map, the
action_allowed shortcut, the record_iri,
the module_iri that surfaced it, and the fused
score + rank. There is no hidden
post-processing.
7. Memory modules
A memory module is a (form, function, version) tuple
implementing two methods: ingest(input) → MemoryRecord
and retrieve(query) → Vec<BundleRow>. The default
registry ships three modules:
| module_iri | form | function | Role |
|---|---|---|---|
| mem:module/episodic | token | experiential | Verbatim event / chunk recall. Each ingest writes one mem:episodic/chunk statement with the raw text as the object literal. Used for raw user utterances and the always-saved bytes-on-disk record of a memorize call. |
| mem:module/semantic-claim | structured | factual | Extracted typed claims. Each fact becomes one substrate statement; the record anchors to the statement's statement_id. The vast majority of post-memorize rows are here. |
| mem:module/preference | structured | preference | Append-only key/value. A subsequent preference on the same key with a different value creates a new statement plus a supersedes argument edge to the prior. Recall returns the most recent; the older value remains queryable under as_of_tx. |
The registry is open: a consumer can add additional modules by
inserting a row into donto_x_memory_module and providing
a runtime implementation. The substrate cares about the overlay table
naming convention, not the module count.
8. Overlay tables
donto-memory adds six tables to the shared Postgres database, all
prefixed donto_x_memory_* per the substrate's M10
§6.1 overlay-naming lint. The lint requires that each overlay
carry a tx_time tstzrange with a
lower_inc check and reference at least one substrate
(non-overlay) primary key. Together these requirements guarantee that
the substrate's bitemporal contract is preserved even when consumer
code mutates the overlay.
| Table | Role |
|---|---|
| donto_x_memory_module | Registered modules. form, function, label, description, config jsonb, enabled boolean. |
| donto_x_memory_record | One row per unit of memory, anchored to exactly one of statement_id / frame_id / context_iri. Carries holder_iri, session_iri, expected_policy_iri. |
| donto_x_memory_access | Append-only access events. Five kinds: retrieved, surfaced, cited, ignored, corrected. Powers the read / belief distinction. |
| donto_x_memory_state | Per-record derived state: salience, recall count, last_accessed_at, next_review_at, decay clock. Bitemporal: each state update closes tx_time on the prior row and opens a new one. |
| donto_x_memory_reconsolidation_queue | Sleep-path work items with five reasons: recall, contradiction, policy_change, scheduled_review, explicit. Coalesced inside a configurable window to avoid duplicate processing. |
| donto_x_memory_job_log | (Added 2026-05-28.) Per-request audit row capturing full request, full response, endpoint, holder, session id, status code, elapsed ms, model usage. Powers the /jobs observability page (§13). |
All six are registered with the substrate via
donto_overlay_register at migrate time.
After registration the substrate's lint considers them part of its
own bookkeeping for purposes of contract-version assertions.
9. Substrate handshake
donto-memory speaks the substrate's M10 contract floor and refuses to
run against an older substrate. At process startup the binary calls
GET /discovery/contract-version on the configured
dontosrv and aborts unless the reported version is at
or above 0.1.0-m10:
2026-05-28T09:35:53Z INFO donto_memory_core::substrate:
substrate contract floor satisfied
actual=0.1.0-m10 floor=0.1.0-m10
The handshake establishes a small set of invariants the runtime can
rely on: the existence of POST /recall, POST
/arguments/assert, POST /ingest/batch, the 16
named actions, the bitemporal contract for
donto_statement.tx_time, and the overlay-registration
function. None of these are duck-typed at request time — the floor
covers them all up-front.
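The floor comparison can be sketched as follows. The report does not spell out how contract versions are ordered, so this sketch assumes a numeric comparison of the three version components followed by the milestone number; the function names are hypothetical.

```python
import re

CONTRACT_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-m(\d+)$")


def parse_contract(version):
    """Turn '0.1.0-m10' into a comparable tuple (0, 1, 0, 10)."""
    match = CONTRACT_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognised contract version: {version!r}")
    return tuple(int(part) for part in match.groups())


def satisfies_floor(actual, floor="0.1.0-m10"):
    """True when the reported version is at or above the floor (assumed ordering)."""
    return parse_contract(actual) >= parse_contract(floor)


print(satisfies_floor("0.1.0-m10"))  # the version reported in the log line above
print(satisfies_floor("0.1.0-m9"))   # an older milestone
```

Comparing parsed integers rather than raw strings matters here: as plain text, "m9" sorts after "m10", which would wrongly accept an older substrate.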
The diagnostic GET /substrate endpoint on donto-memory
echoes the upstream handshake so an operator can verify the binding
without having to know dontosrv's URL. Operators monitoring the
binding can call this in a loop.
10. Identity lenses
The substrate stores every entity reference verbatim. When two
references actually refer to the same real-world entity, that fact is
recorded as a weighted identity edge — never collapsed at storage.
At query time, an identity lens parameter controls how
strict the equivalence judgement is. donto-memory passes
lens_name through unchanged; the substrate does the
clustering work.
Default seeded lenses:
- strict_identity_v1: only edges with confidence ≥ 0.98.
- likely_identity_v1: ≥ 0.85.
- exploratory_identity_v1: ≥ 0.60.
For most agent workloads, null (no lens) is the right
default — every IRI is itself, no expansion. The lens parameter
becomes interesting when the agent is recalling memories about a
person whose name varies across sources, the canonical case in the
genealogy testbed: "Annie Davis", "Mrs Watson",
"Mary Watson" all index back to the same person if the
substrate has accumulated enough corroborating identity edges. With
likely_identity_v1, a query about
ex:annie-davis returns rows about all three names.
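How a lens threshold changes what a query reaches can be illustrated with a toy model. The clustering itself happens inside the substrate; the sketch below assumes a simple transitive closure over identity edges at or above the lens threshold, and the edge list, IRIs, and confidences are invented for illustration.

```python
LENS_THRESHOLDS = {
    "strict_identity_v1": 0.98,
    "likely_identity_v1": 0.85,
    "exploratory_identity_v1": 0.60,
}


def cluster_for(iri, edges, lens_name=None):
    """Return every IRI reachable from `iri` over edges the lens accepts."""
    if lens_name is None:
        return {iri}  # no lens: every IRI is only itself
    threshold = LENS_THRESHOLDS[lens_name]
    cluster, frontier = {iri}, [iri]
    while frontier:
        current = frontier.pop()
        for a, b, confidence in edges:
            if confidence < threshold or current not in (a, b):
                continue
            other = b if current == a else a
            if other not in cluster:
                cluster.add(other)
                frontier.append(other)
    return cluster


# Hypothetical identity edges; the IRIs and confidences are illustrative only.
edges = [
    ("ex:annie-davis", "ex:mrs-watson", 0.90),
    ("ex:mrs-watson", "ex:mary-watson", 0.87),
]
print(sorted(cluster_for("ex:annie-davis", edges, "likely_identity_v1")))
print(sorted(cluster_for("ex:annie-davis", edges, "strict_identity_v1")))
```

With these sample edges the likely lens links all three names, while the strict lens and the no-lens default leave ex:annie-davis on its own.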
The lens model is the project's answer to the question that LLM-only memory systems answer with vector similarity: "what other memories are about the same thing?" The substrate's answer is more conservative — it requires evidence-backed identity edges to cross between names — but it preserves the ability to disagree about identity, to mark some edges as low-confidence, and to time-shift through periods when the system did not yet believe two names were the same person.
11. Bitemporal time-travel
Every claim has two times:
- valid_time: when the fact was true in the world.
- tx_time: when we believed it.
Recall with "as_of_tx": "2026-05-01T00:00:00Z" returns
the rows the system currently believed on that date. If a
fact was retracted on 2026-05-15, an as_of_tx=2026-05-10
query still sees it. This is the "what did we know on date X?"
pattern, useful for agent self-explanation, regression analysis after
a corrective ingest, and forensic debugging of "why did I say
that two weeks ago?"
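The as_of_tx filter amounts to a range-containment test on each row's transaction-time range. The sketch below assumes half-open [tx_lo, tx_hi) ranges with an open-ended tx_hi represented as None; the row shape, the helper names, and the assertion date are illustrative, while the retraction and query dates mirror the example above.

```python
from datetime import datetime, timezone


def ts(text):
    """Parse an ISO date and mark it as UTC."""
    return datetime.fromisoformat(text).replace(tzinfo=timezone.utc)


def visible_at(rows, as_of_tx):
    """Rows whose half-open [tx_lo, tx_hi) range contains as_of_tx."""
    return [
        row for row in rows
        if row["tx_lo"] <= as_of_tx and (row["tx_hi"] is None or as_of_tx < row["tx_hi"])
    ]


rows = [
    # Asserted in April (assumed date), retracted on 2026-05-15: the range is closed, not deleted.
    {"statement_id": "s1", "tx_lo": ts("2026-04-20"), "tx_hi": ts("2026-05-15")},
    # Still believed today: the range is open-ended.
    {"statement_id": "s2", "tx_lo": ts("2026-05-15"), "tx_hi": None},
]

print([r["statement_id"] for r in visible_at(rows, ts("2026-05-10"))])
print([r["statement_id"] for r in visible_at(rows, ts("2026-05-20"))])
```

The retracted row remains in storage with a closed range, which is what lets a query dated before the retraction still see it.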
The discipline carries through donto-memory's own overlays. The
donto_x_memory_state table is bitemporal: each
salience update closes tx_time on the prior state row and
opens a new one. So an agent asking "what was this memory's salience
on the day I responded to that question?" gets a real answer instead
of a current snapshot.
Importantly, the audit-log overlay (donto_x_memory_job_log)
is also bitemporal-compatible — a constraint that
mattered for the M10 overlay lint to accept it.
12. Policy & Trust Kernel
Every recall asks for a specific action. The substrate gates each row based on the source's policy capsule plus the holder's attestation. The 16 actions:
| Action | When to ask |
|---|---|
| read_metadata | See that the row exists. Default-permitted, the always-allowed floor. |
| read_content | Read the actual values. The common agent recall case. |
| quote | Include verbatim in a user-visible answer. |
| view_anchor_location | See where in the source the claim was extracted. |
| derive_claims | Extract new derived statements. |
| derive_embeddings | Generate embeddings. |
| translate / summarize | Translate or summarise content. |
| export_claims / export_sources / export_anchors | Include in a release. |
| train_model | Use in model training. |
| publish_release | Include in a citable release. |
| share_with_third_party | Pass to another agent or system. |
| federated_query | Answer a federated query against another donto instance. |
| request_deletion | Initiate tombstoning. Heavyweight; requires its own attestation. |
The Trust Kernel is fail-closed for content-exposing actions. In the
live deployment, where most sources have the substrate's default
policy capsule with no holder-specific attestations, a recall asking
read_content sees rows but with values redacted and
action_allowed=false. The same recall asking
read_metadata sees the same rows with values intact.
This produces the "transparent about denial" behaviour described in
the M10 PRD: the system never silently drops a row, it always
explains which actions it would and would not permit.
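The "transparent about denial" behaviour can be sketched as a per-row annotation step. The real gate runs inside the substrate's Trust Kernel; this toy version assumes that every action other than read_metadata is content-exposing and that redaction means blanking object_iri and object_lit. The function name and the sample row are hypothetical.

```python
ALWAYS_PERMITTED = "read_metadata"


def gate_row(row, effective_actions, action):
    """Annotate a row with the policy decision and redact values on denial."""
    allowed = action == ALWAYS_PERMITTED or bool(effective_actions.get(action, False))
    gated = dict(row, effective_actions=effective_actions, action_allowed=allowed)
    if not allowed:
        # Fail-closed: keep the row visible but hide its values (assumed redaction shape).
        gated["object_iri"] = None
        gated["object_lit"] = None
    return gated


row = {
    "statement_id": "s1",
    "subject": "ex:Bob",
    "predicate": "ex:hasName",
    "object_iri": None,
    "object_lit": "Bob",
}
default_capsule = {"read_metadata": True}  # no holder-specific attestations

denied = gate_row(row, default_capsule, "read_content")
metadata = gate_row(row, default_capsule, "read_metadata")
print(denied["action_allowed"], denied["object_lit"])
print(metadata["action_allowed"], metadata["object_lit"])
```

The denied row is still returned, carrying the full effective_actions map, so the caller can see which action would have been needed.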
A consumer that wants permissive memory semantics (for example a
single-user agent reading back its own memories) attaches a
donto_attestation to its own holder IRI covering the
actions it needs. This is the same attestation mechanism the
genealogy stack uses to grant fieldworkers different action sets than
external readers.
13. Audit log
Every POST /memorize, POST /memorize/batch,
POST /recall, and POST /ingest/<module>
writes one row to donto_x_memory_job_log on return. The
row captures: the full request body, the full response body,
endpoint, holder, session id, status code, elapsed milliseconds,
and (when applicable) facts-extracted / facts-ingested / rows-returned /
model name / token usage.
The log feeds a small observability surface:
- GET /jobs: HTML list view. Filter by endpoint or holder. Sortable columns for ms / facts / rows / tokens.
- GET /jobs/list.json: JSON variant for programmatic consumers.
- GET /jobs/:id: HTML detail page. Renders every extracted fact (for memorize jobs) as a sortable table showing subject, predicate, object, polarity, modality, confidence, and aperture.
- GET /jobs/:id/raw: JSON for the same detail.
The detail page is the answer to the question that opened this report: "what did the system actually do?" An operator clicking through to the football example sees the 48 extracted facts in their raw form, keyed by aperture, with the per-aperture errors preserved so the underlying LLM behaviour is legible.
Failures to write the audit row are logged and swallowed; the user-visible API is never blocked by audit-log unavailability.
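The log-and-swallow behaviour can be sketched as a wrapper around a request handler. This is a generic illustration rather than the axum implementation; the wrapper name, the audit-row shape, and the failing writer are hypothetical.

```python
import logging

log = logging.getLogger("job_log")


def with_audit(handler, write_audit_row):
    """Wrap a handler so an audit-write failure never reaches the caller."""
    def wrapped(request):
        response = handler(request)
        try:
            write_audit_row({"request": request, "response": response})
        except Exception:
            # Logged and swallowed: the response is returned regardless.
            log.exception("audit row write failed; continuing")
        return response
    return wrapped


def failing_writer(audit_row):
    raise ConnectionError("job log unavailable")


recall = with_audit(lambda request: {"rows": [], "status": 200}, failing_writer)
response = recall({"holder": "agent:ajax"})
print(response["status"])
```

The audit write happens after the handler has produced its response, so an unavailable log can cost an audit row but never a user-visible failure.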
14. Documentation surfaces
The live deployment ships four documentation surfaces, all served from assets baked into the binary at build time:
- GET /: long-form homepage, 16 sections, ~2,800 words. The canonical entry point for a human reader.
- GET /agent.md / GET /llms.txt: the agent guide. A self-contained markdown document (~3,400 words) aimed at an AI agent that needs to implement memory storage and recall. Includes the contract version, the base URL, the two endpoints, all 16 actions, all five apertures, cost expectations, and cookbook patterns. The /llms.txt variant is served as text/plain per the emerging convention; the /agent.md variant is served as text/markdown for tools that prefer it.
- GET /openapi.json / GET /docs: OpenAPI 3.1 and Swagger UI. Machine-readable schemas for every endpoint.
- GET /jobs: the observability surface described in §13. The "what the system actually did" complement to the contract-defining documentation above.
The four surfaces are deliberately redundant. A human reader stays
on /. A static-analysis tool reads
/openapi.json. An AI agent reads
/agent.md or /llms.txt. An operator
diagnosing a failure reads /jobs/:id. Each surface is
generated from the same source-of-truth — the OpenAPI doc is
hand-written; the homepage and agent guide are hand-written but kept
in sync — so the cost of redundancy is editorial discipline rather
than codegen complexity.
15. Live measurements
The system has been live for several hours as of the report date. The measurements in this section were captured from that deployment.
15.1. The football example
A representative POST /memorize call captured for this
report. Input:
Bob suggested I build a football app to track Premier League fixtures. I coded it in TypeScript with React frontend, Node.js backend, and the football-data.org API. Bob used it during the 2024 season to follow Arsenal.
Result (job b03baee0-…):
| Metric | Value |
|---|---|
| Mode | exhaustive (5 apertures) |
| Facts extracted | 48 |
| Facts ingested | 48 |
| Dedup collisions | 2 |
| Total tokens | 7,240 |
| Elapsed | 181.4 s (~3 minutes) |
| Model | z-ai/glm-5 via OpenRouter |
| Surface aperture | 0 facts (LLM JSON decode error) |
| Linguistic aperture | 0 facts (LLM JSON decode error) |
| Presupposition aperture | 42 facts in 24.2 s |
| Inferential aperture | 8 facts in 96.6 s |
| Conceivable aperture | 0 facts (LLM JSON decode error) |
A sample of the 48 extracted statements:
(ex:Bob, rdf:type, ex:Person)
(ex:Bob, ex:hasName, "Bob")
(ex:Ajax, rdf:type, ex:Person)
(ex:Ajax, ex:hasName, "Ajax")
(ex:Ajax, ex:capableOf, ex:BuildingSoftware)
(ex:Ajax, ex:capableOf, ex:Coding)
(ex:Ajax, ex:hasSkill, ex:TypeScript)
(ex:Ajax, ex:hasSkill, ex:React)
(ex:Ajax, ex:hasSkill, ex:NodeJS)
…
Three observations from this single call:
- The presupposition aperture is the workhorse. With three of five apertures returning zero, the captured count still cleared the "explicit-only" baseline by a wide margin: the presuppositional decomposition alone produced 42 facts from three sentences of input.
- The aperture-level errors are visible, not hidden. The response and the audit log both preserve the per-aperture error field. An operator diagnosing extraction yield can see immediately that surface, linguistic, and conceivable apertures failed to return parseable JSON on this call.
- The elapsed time (~3 minutes) exceeds Cloudflare's 100-second free-tier proxy timeout. Long memorize calls must hit the host on its private IP or fall back to mode: "single"; this is a deployment limitation that §18 discusses.
15.2. Recall behaviour
Recall against the same session — holder=agent:ajax,
session_id=football-app-bob-2026, no other filters — with
the default fail-closed policy returns 0 rows under
read_content (no attestation), 15+ rows under
read_metadata (always-permitted floor), with the
recalled rows spanning all three modules. Fusion scores hover around
1/(60+rank) per RRF; cross-module duplicates are rare in the
current dataset because each fact is anchored to exactly one module.
16. Deployment
The full stack on donto-db at the time of writing:
| Component | Process | Port |
|---|---|---|
| donto-memory API | donto-memory-api.service | 127.0.0.1:7900 |
| Caddy (TLS + reverse-proxy) | caddy.service | :443 → :7900 |
| dontosrv (substrate gateway) | dontosrv.service | 127.0.0.1:7879 |
| donto-pg (Postgres 16) | Docker container | 127.0.0.1:5432 |
| OpenRouter (LLM) | external HTTPS | — |
Environment (chmod 600 at /etc/donto-memory/env):
DONTO_MEMORY_DONTOSRV_URL=http://localhost:7879
DONTO_MEMORY_DONTO_DSN=postgres://donto:…@127.0.0.1:5432/donto
DONTO_MEMORY_CONSUMER_IRI=ctx:memory
DONTO_MEMORY_API_BIND=127.0.0.1:7900
DONTO_MEMORY_LLM_BASE_URL=https://openrouter.ai/api/v1
DONTO_MEMORY_LLM_API_KEY=sk-or-v1-…
DONTO_MEMORY_LLM_MODEL=z-ai/glm-5
DONTO_MEMORY_LLM_TEMPERATURE=0.2
DONTO_MEMORY_EXTRACT_MODE=exhaustive
Caddy routes memories.apexpots.com (TLS internal, terminated
again by Cloudflare in front) to the local
:7900. The genealogy and memory surfaces share Cloudflare
zone settings; both run in Cloudflare Full mode, not
Full-strict.
A migration run is a single command that does two things, applying the SQL to Postgres and registering the overlays with the substrate:
donto-memory migrate \
  --dontosrv-url "$DONTO_MEMORY_DONTOSRV_URL" \
  --dsn "$DONTO_MEMORY_DONTO_DSN" \
  --dir /home/ajax/donto-memory/migrations
A typical iteration loop is: edit Rust source under
/mnt/donto-data/workspace/donto-memory/ as user
ajaxdavis, cargo build --release into the
shared target directory at
/mnt/donto-data/cargo-target-memory,
install to /usr/local/bin/donto-memory,
systemctl restart donto-memory-api, verify via
https://memories.apexpots.com/health.
17. Related work
The agentic-memory literature falls into three rough camps:
17.1. Vector-DB memory (LangChain Memory, mem0, Letta/MemGPT)
These systems treat memory as semantic search over embeddings of chat turns (and sometimes "extracted facts"). The embeddings layer is responsible for surface-form normalisation; retrieval is approximate-nearest-neighbour over a vector index. The strength is single-call simplicity: write, embed, retrieve. The weakness is structural: contradictions are not first-class, sources are not first-class, time is not first-class. Vector-DB memory is also opinionated about updates (overwrite or coexist) and rarely preserves the lineage from an extracted fact back to the source chunk in a queryable way.
donto-memory inherits the surface-form normalisation problem (it
solves it via identity lenses + predicate alignment, not embeddings)
and adds structure, provenance, contradiction handling, and time. The
cost is that retrieval is not yet semantic-similarity-based;
permitted_only=false queries combine subject /
predicate / session / module filters plus an optional free-text
query. Adding vector retrieval on top of the existing module dispatch
is a roadmap item, not a v0.1.0 commitment.
17.2. Knowledge-graph memory (Zep, Cognee, neo4j-based stacks)
Graph-based memory systems share donto-memory's commitment to typed triples and explicit relations. They differ in three ways. First, they lack an explicit substrate-vs-consumer split — the graph schema and the memory schema are the same thing. Second, they are not bitemporal in the strict valid_time × tx_time sense; most are single-time. Third, they almost universally lack a policy gate of the kind the substrate's Trust Kernel provides.
donto-memory's split is deliberate: the consumer code is small and focused on the memory abstraction; everything to do with belief, provenance, contradictions, identity, time, and policy lives in the substrate and is shared with the genealogy workload and any other consumer that bootstraps against the same contract.
17.3. Stateful-agent frameworks (CAMEL, MetaGPT, AutoGen)
These frameworks embed memory inside the agent's runtime — usually as in-process Python objects checkpointed to disk. They optimise for ergonomics; persistence is a serialisation problem, not a database problem. donto-memory occupies a different niche: it expects to outlive any single agent, to be queried by multiple agents, and to interoperate with other consumers via its substrate.
18. Limitations
v0.1.0 deliberately ships an incomplete picture. The known limitations:
18.1. LLM extraction is brittle at the boundaries
The football-example measurements (§15) show three of five
apertures failing JSON decode on a 3-sentence input.
z-ai/glm-5 is reliable for tightly-structured outputs
when the prompt fits in a comfortable token budget; aperture-level
JSON-parse failures need investigation. Mitigations under
consideration: per-aperture retry with a stricter
response_format schema, fallback to a smaller-output
prompt when the first attempt fails to parse, or replacing the
JSON-mode contract with a tool-use contract that the model is more
heavily trained on. None of these is implemented in v0.1.0.
18.2. Latency exceeds the Cloudflare proxy timeout
Cloudflare's free tier disconnects after 100 seconds. Exhaustive-mode
extraction routinely exceeds this. Public-facing
/memorize?mode=exhaustive calls therefore return HTTP 524 (Cloudflare's timeout status).
Callers either fall back to mode: "single", hit the
host directly on its private IP, or wait for a queue-based
/memorize/async endpoint that returns a job ID
immediately and finishes work in the background — designed but not
implemented.
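The job-ID pattern can be sketched as follows. This is an illustration under
stated assumptions, not the designed endpoint: JobRegistry, JobStatus, and
submit are hypothetical names, and the sketch uses std::thread so that it has
no dependencies, where a real implementation would more likely spawn an async
task.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// State of one background job (hypothetical type for this sketch).
#[derive(Clone, Debug, PartialEq)]
pub enum JobStatus {
    Running,
    Done(String),
}

/// In-memory job table shared between the request path and the workers.
#[derive(Clone, Default)]
pub struct JobRegistry {
    next_id: Arc<AtomicU64>,
    jobs: Arc<Mutex<HashMap<u64, JobStatus>>>,
}

impl JobRegistry {
    /// Record the job as Running, start the work on another thread, and
    /// return the job ID without waiting for the work to finish.
    /// The JoinHandle is returned only so callers (and tests) can wait;
    /// an HTTP handler would simply drop it.
    pub fn submit<F>(&self, work: F) -> (u64, JoinHandle<()>)
    where
        F: FnOnce() -> String + Send + 'static,
    {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        self.jobs.lock().unwrap().insert(id, JobStatus::Running);
        let jobs = Arc::clone(&self.jobs);
        let handle = thread::spawn(move || {
            let result = work();
            jobs.lock().unwrap().insert(id, JobStatus::Done(result));
        });
        (id, handle)
    }

    /// Look up a job; None means the ID was never issued.
    pub fn status(&self, id: u64) -> Option<JobStatus> {
        self.jobs.lock().unwrap().get(&id).cloned()
    }
}
```

Because the response carries only the ID, the request completes well inside
the proxy timeout regardless of how long extraction takes; the caller polls
status afterwards.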
18.3. No vector retrieval yet
There is no embedding column on the overlay tables and no nearest-
neighbour fallback in the recall composer. Free-text recall is filter-
based, not similarity-based. For agentic workloads where the user is
likely to recall under a different surface form than the original
save, this is a real gap. The roadmap item is to add
derive_embeddings to the standard policy capsule and a
pgvector-backed extension to the semantic-claim module.
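To make the gap concrete, a nearest-neighbour fallback ranks stored rows by
similarity to the query embedding instead of requiring a filter match. The
sketch below is a brute-force, in-memory stand-in for what a pgvector query
would do; it is not the planned implementation. The function names, the
(id, embedding) row shape, and the choice of cosine similarity are all
assumptions.

```rust
/// Cosine similarity of two vectors.
/// Returns 0.0 when the lengths differ or either vector has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Return the `k` rows most similar to `query`, best first.
/// `rows` pairs a row identifier with its (hypothetical) stored embedding.
pub fn top_k<'a>(
    query: &[f32],
    rows: &'a [(String, Vec<f32>)],
    k: usize,
) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&'a str, f32)> = rows
        .iter()
        .map(|(id, emb)| (id.as_str(), cosine(query, emb)))
        .collect();
    // Sort by descending similarity; total_cmp gives a total order on f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

A row saved under one surface form can then be found from a query phrased
differently, provided the two embeddings are close, which is exactly the case
filter-based recall misses.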
18.4. Default policy is fail-closed
On the live deployment, where most sources have the substrate's
default policy with no holder attestations, read_content
recalls return rows with action_allowed=false. This is
the substrate's correct behaviour but it does mean a fresh
deployment needs an attestation to be useful for agents that need
to quote content back to users. The bootstrap flow ("how do I get
an attestation for my agent?") is currently a manual process and is
the highest-priority documentation gap.
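The fail-closed behaviour can be illustrated with a toy gate. This sketch is
a deliberate simplification and does not reproduce the Trust Kernel: the
RecallRow and Attestation shapes and the gate function are hypothetical. It
shows only the property described above, namely that rows are still returned
but carry action_allowed=false unless a matching attestation exists.

```rust
use std::collections::HashSet;

/// One recalled row, reduced to the two fields this sketch needs.
#[derive(Debug, PartialEq)]
pub struct RecallRow {
    pub source: String,
    pub action_allowed: bool,
}

/// (holder IRI, source IRI, action). A hypothetical flattening of an attestation.
pub type Attestation = (String, String, String);

/// Annotate each source with whether `holder` may perform `action` on it.
/// Fail-closed: absence of an attestation means the action is not allowed.
pub fn gate(
    holder: &str,
    action: &str,
    sources: &[&str],
    attestations: &HashSet<Attestation>,
) -> Vec<RecallRow> {
    sources
        .iter()
        .map(|source| {
            let key = (holder.to_string(), source.to_string(), action.to_string());
            RecallRow {
                source: source.to_string(),
                action_allowed: attestations.contains(&key),
            }
        })
        .collect()
}
```

With an empty attestation set every row comes back with action_allowed=false,
which is the state a fresh deployment starts in.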
18.5. The sleep-path worker is structural, not functional
donto-memory worker exists and drains the
reconsolidation queue, but the only "work" it does is mark items
completed. The reflection step that would derive new claims from
recalled ones — "every recall touched the predicate
ex:codedWith, so we should add an inferred preference
'agent:ajax ex:prefersTechStack ex:react'" — is designed (the
DontoDelta vocabulary is in donto-memory-core/src/types.rs)
but not yet implemented.
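One possible shape for that reflection rule, following the ex:codedWith
example, is sketched below. It is an assumption about how such a rule could
look, not the designed step: the Claim struct and the reflect function are
hypothetical, the sketch does not use the DontoDelta vocabulary, and the rule
itself (every recalled claim shares one predicate, so the most frequent object
becomes the inferred preference) is only one reading of the example.

```rust
use std::collections::HashMap;

/// A bare subject-predicate-object claim (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub subject: String,
    pub predicate: String,
    pub object: String,
}

/// If every recalled claim uses the `watched` predicate, infer a new claim
/// `agent inferred <most frequent object>`. Returns None when the recall set
/// is empty or touches any other predicate.
pub fn reflect(agent: &str, recalled: &[Claim], watched: &str, inferred: &str) -> Option<Claim> {
    if recalled.is_empty() || recalled.iter().any(|c| c.predicate != watched) {
        return None;
    }
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for claim in recalled {
        *counts.entry(claim.object.as_str()).or_insert(0) += 1;
    }
    // Highest count wins; ties go to the lexicographically smallest object
    // so the result does not depend on HashMap iteration order.
    let (object, _) = counts
        .into_iter()
        .max_by(|a, b| a.1.cmp(&b.1).then(b.0.cmp(a.0)))?;
    Some(Claim {
        subject: agent.to_string(),
        predicate: inferred.to_string(),
        object: object.to_string(),
    })
}
```

A real reflection step would also need to record which recalls justified the
inference, so that the derived claim carries provenance like any other
statement in the substrate.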
18.6. Identity-lens authoring is out of scope
v0.1.0 reads existing lenses but does not author new ones. A production memory system needs the ability to say "these two IRIs refer to the same entity" — at the moment that has to go through the substrate's separate predicate-alignment pipeline.
19 Roadmap
The v0.2.0 milestones, in rough priority order:
- Async memorize. Return a job ID immediately and process the LLM extraction in a background tokio task. Solves the Cloudflare-timeout problem and enables long inputs.
- Vector recall. pgvector embeddings on the semantic-claim module, served behind an opt-in flag so the default behaviour (filter-based recall) remains deterministic.
- Reflection. Wire the sleep-path worker to actually derive new claims from recalled ones, write them as DontoDelta steps, and apply via the substrate's existing delta machinery.
- Identity-lens authoring. A POST /identity/merge endpoint that an agent can call when it has decided two IRIs refer to the same entity.
- Bootstrap attestations. A first-run wizard that grants the deployment's default agent IRI a read_content attestation, with clear documentation on what it would take to revoke or audit.
- Multi-tenant boundaries. Today, every memory is filed under the same consumer IRI. The substrate already supports per-tenant policy capsules; donto-memory needs a convention for binding a holder IRI to a tenant scope at bootstrap.
20 Conclusion
donto-memory v0.1.0 is a deliberately small consumer over a
deliberately rich substrate. The library and the binary together come
to roughly 7,200 lines of Rust. Six overlay tables sit alongside the
substrate's core schema, all registered through the M10
donto_overlay_register function so the substrate's
bitemporal and policy invariants extend to consumer-owned data
without modification. Three default memory modules — episodic,
semantic-claim, preference — cover the bulk of what an AI-agent
memory layer needs, and the architecture supports additional modules
without touching the binary.
The contribution is less in any single piece (five-aperture LLM extraction, RRF fusion across modules, identity-lens-aware recall) and more in their integration on top of an evidence substrate that already supports the hard parts. Saving a fact preserves its provenance. Recalling it goes through a policy gate. When sources contradict each other, both statements live forever. Bitemporal queries answer "what did we know on date X." All of these are behaviours that an LLM-and-vector-DB memory system would have to design, implement, and maintain itself. By choosing to be a thin consumer over donto, donto-memory inherits them from the substrate instead of rebuilding them.
The system is live at
memories.apexpots.com.
The current state of every /memorize and
/recall call is browsable at
/jobs.
The agent-facing contract is documented at
/agent.md.
The source is at
github.com/thomasdavis/donto-memory,
licensed under Apache-2.0 OR MIT.