genes.apexpots.com / research source: donto-memory-deep-mode-engine-2026-05-31.md

donto-memory deep-mode — engine reference

Date: 2026-05-31
Scope: Everything the mode: "deep" extraction pipeline does, from a POST /memorize request to the rows that land in the substrate. Covers code paths, prompts, dedup, salvage, token usage, the async queue, the audit log, the job-detail surface, observed empirics, and the known limits.

This document complements the empirical case study at /research/donto-deep-mode-eternal-recurrence-2026-05-31.html. Where that report reads results, this one explains the machinery that produced them.

1. What deep mode is

mode: "deep" is the iterative-novelty extraction lane inside donto-memory. Where mode: "single" does one LLM call with a maximalist prompt (~30 facts target) and mode: "exhaustive" does five parallel calls under different rhetorical “apertures” (surface / linguistic / presupposition / inferential / conceivable), mode: "deep" does N sequential calls of the same prompt, each shown a list of the facts the prior passes already produced. The model’s task each pass is only to find things the earlier passes missed.

The design intent:

No rigid per-pass prompt rotation. The user explicitly rejected a fixed sequence (“avoid any rigid prompts”). The model picks its own divergence direction, prompted only by what it has and hasn’t already said.
One call at a time. Each pass depends on the cumulative fact set so far, so passes can't be precomputed or run in parallel.
A fixed pass budget (passes param, default 3, currently 7 from omega-bot). There is no automatic saturation detection; the operator decides how hard to push.
Hard dedup after every pass via content-key hashing. The model’s incentive is breadth + novelty; duplicates get dropped silently.

Configuration is a single HTTP knob:

{
  "mode": "deep",
  "passes": 7,
  "modality": "descriptive",
  "holder": "agent:omega-bot",
  "session_id": "discord:1349727923434815519:1497274794586931220",
  "text": "...",
  "images": []
}

Synonyms: "sequential", "iterative" route to the same code path.

2. Request lifecycle

omega-bot
   │  POST https://memories.apexpots.com/memorize
   ▼
caddy (TLS, gzip)
   │
   ▼
donto-memory-api (127.0.0.1:7900, axum)
   │
   ├─► should_defer(req, default_mode)?
   │     true if mode ∈ {deep, exhaustive, sequential, iterative, multi, apertures}
   │     OR req.r#async == Some(true)
   │
   │  ── deferred path ──
   ├─► write "POST /memorize (queued)" audit row     (immediate)
   ├─► return HTTP 202 {status: "queued", queue_id, ...}  (immediate)
   ├─► spawn tokio task
   │       │
   │       ▼
   │   acquire AppState.async_memorize_lock  (single-permit Mutex)
   │       │
   │       ▼
   │   memorize_one(...)  ← see §3
   │       │
   │       ▼
   │   write "POST /memorize (async)" audit row with final stats
   │
   │  ── sync path ──
   └─► memorize_one(...) inline, return 200 with full body
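The defer decision in the diagram reduces to a small predicate. The sketch below makes several assumptions:

- The request's mode arrives as an optional string and the async flag as Option<bool>, mirroring req.r#async.
- A missing mode falls back to default_mode.
- Matching is case-insensitive.

None of these are taken from the route handler's actual code.

```rust
/// Modes that always take the deferred (202) path, per the diagram above.
const DEFERRED_MODES: [&str; 6] = ["deep", "exhaustive", "sequential", "iterative", "multi", "apertures"];

/// Hypothetical sketch of `should_defer`: defer when the effective mode is a
/// long-running one, or when the client explicitly set `async: true`.
/// Assumption: a missing mode falls back to `default_mode`, and matching is
/// case-insensitive (as in extraction dispatch).
fn should_defer(mode: Option<&str>, async_flag: Option<bool>, default_mode: &str) -> bool {
    let effective = mode.unwrap_or(default_mode).to_lowercase();
    DEFERRED_MODES.contains(&effective.as_str()) || async_flag == Some(true)
}
```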

Why the deferred path exists

Cloudflare in front of memories.apexpots.com cuts proxied HTTP requests at ~100 s. The Nietzsche run took 854 s end-to-end, so deep mode cannot be served synchronously through a Cloudflare-fronted endpoint without the request failing with Cloudflare's 524 timeout error. The deferred path returns 202 immediately so the client never blocks, and the actual work runs to completion in the background.

Why a single-permit Mutex

async_memorize_lock: Arc<tokio::sync::Mutex<()>> lives on the AppState. Each spawned task acquires it before running memorize_one. The lock is intentionally narrow: it serialises deferred extractions (the long-running modes plus any request sent with async: true), not the sync memorize path, because:

Concurrent deep runs against the same LLM endpoint blow rate-limit quota with little wall-time gain.
Concurrent ingest into the same substrate hits hot-spot row locks on the predicate/holder indexes.
Single-permit + simple = no priority inversions, no rebalancing.

The cost: queue is a Mutex, not a real queue. Multiple queued tasks pile up as parked futures inside the running process. Restarting the binary loses all in-flight + queued work. We’ve seen this twice: the binary was restarted at 14:44:58 mid-Nietzsche-rerun and again later, marking 2 orphaned (queued) rows as (lost) in the audit log. The startup path does the right thing here — it stamps surviving (queued) rows with a (lost) endpoint label and status_code=500 so they don’t ghost in the job list.

3. `memorize_one` — the inner engine

memorize_one(s: &AppState, req: &MemorizeReq)
│
├─► §3.1  OCR (if images attached)
├─► §3.2  Episodic ingest (always — raw text → substrate)
├─► §3.3  Optional LLM extraction (single | exhaustive | deep)
└─► §3.4  Semantic-claim ingest (per-fact ingest into substrate)

3.1 OCR

If req.images is non-empty and s.settings.ocr_enabled is true, a separate LLM call is made before extraction:

Endpoint: same OpenAI-compatible chat endpoint
System prompt: OCR_SYSTEM_PROMPT (transcribe every visible word, return {"transcripts": [...]} per image in order)
temperature: 0.0, max_tokens: 4000, response_format: { type: "json_object" }
Result: an OCR’d text block per image, joined with [OCR text from image #N] headers and prepended to req.text.

The augmented effective_text becomes both the episodic record body and the extractor input. An OCR failure is not fatal — it logs a warning and proceeds with the original text. There are no images in either of the runs analysed here, so the OCR path didn’t run.
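A sketch of how effective_text could be assembled from the OCR transcripts. The following are assumptions:

- the function name;
- the 1-based image numbering;
- the blank-line separators.

Only the [OCR text from image #N] header and the prepend-to-req.text order come from the description above.

```rust
/// Hypothetical helper: fold OCR transcripts into the text that feeds both the
/// episodic record and the extractor. Numbering from 1 and the "\n\n"
/// separators are assumptions.
fn build_effective_text(text: &str, transcripts: &[String]) -> String {
    if transcripts.is_empty() {
        // No images, or OCR failed (non-fatal): keep the original text.
        return text.to_string();
    }
    let ocr_block: Vec<String> = transcripts
        .iter()
        .enumerate()
        .map(|(i, t)| format!("[OCR text from image #{}]\n{}", i + 1, t))
        .collect();
    // OCR text goes first, then the user's own text.
    format!("{}\n\n{}", ocr_block.join("\n\n"), text)
}
```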

3.2 Episodic ingest

The raw text is always written as an episodic record, regardless of extraction mode. This is the substrate’s atomic “something happened” anchor.

let episodic_input = IngestInput {
    holder: req.holder,           // "agent:omega-bot"
    session_id: req.session_id,   // "discord:<guild>:<channel>"
    text: effective_text,         // raw + OCR
    modality: req.modality,       // "descriptive"
    ...
};
episodic.ingest(substrate, pool, consumer_iri, &episodic_input)
    → (episodic_record_id, episodic_record_iri)

The episodic record gets a stable IRI like donto:record:<uuid> that is then handed to every subsequent semantic-claim ingest as source_record_iri so the provenance chain is episodic_record ← semantic_claim ← fact.

3.3 Extraction dispatch

req.mode (or s.settings.extract_mode if unset) is lowercased and matched:

Mode keyword	Function	Concurrency	Used by
`single` (default)	`extract_single`	1 call	omega-bot historic
`exhaustive` / `multi` / `apertures`	`extract_exhaustive`	5 parallel	research/testing
`deep` / `sequential` / `iterative`	`extract_deep`	N sequential	omega-bot current

Unrecognised modes fall through to single. Deep mode also accepts req.passes (default 3, clamped to 1..=10).
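The dispatch rule can be sketched as a single match. ExtractMode, dispatch, and the u32 pass count are illustrative names and types, not the crate's. The synonyms, the fall-through to single, and the 1..=10 clamp with default 3 follow the table and text above.

```rust
/// The three extraction lanes from the dispatch table (illustrative enum).
#[derive(Debug, PartialEq)]
enum ExtractMode {
    Single,
    Exhaustive,
    Deep { passes: u32 },
}

/// Sketch of dispatch: lowercase the request mode (else the configured
/// default), map synonyms onto a lane, and clamp deep passes to 1..=10.
fn dispatch(req_mode: Option<&str>, settings_mode: &str, passes: Option<u32>) -> ExtractMode {
    match req_mode.unwrap_or(settings_mode).to_lowercase().as_str() {
        "exhaustive" | "multi" | "apertures" => ExtractMode::Exhaustive,
        "deep" | "sequential" | "iterative" => ExtractMode::Deep {
            passes: passes.unwrap_or(3).clamp(1, 10),
        },
        // "single" and every unrecognised keyword take the single-call lane.
        _ => ExtractMode::Single,
    }
}
```

With this shape, a request for "deep" with passes: 50 runs 10 passes. A typo such as "depp" silently runs a single call, which is worth remembering when a run yields unexpectedly few facts.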

3.4 Semantic-claim ingest

Each surviving fact (post-dedup) is written via the mem:module/semantic-claim module. The ingest is per-fact and sequential; there is no batch. Progress is logged every 5 s with ingest progress lines (239/697 ingested, 0 errors). Errors are accumulated but do not abort the run — a failed fact gets logged and the loop continues.
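The per-fact loop can be sketched as below. A closure stands in for the semantic-claim module call; ingest_all, IngestStats, and the closure signature are hypothetical. The sketch shows the two properties described above:

- errors are counted rather than fatal;
- progress is logged on a time interval rather than per fact.

```rust
use std::time::{Duration, Instant};

/// Outcome of the per-fact ingest loop (hypothetical type).
#[derive(Debug, PartialEq)]
struct IngestStats {
    ingested: usize,
    errors: usize,
}

/// Sequential, non-batched ingest: one call per fact, failures logged and
/// counted but never fatal, and a progress line at most once per `progress_every`.
fn ingest_all<T, E: std::fmt::Display>(
    facts: &[T],
    mut ingest_fact: impl FnMut(&T) -> Result<(), E>,
    progress_every: Duration,
) -> IngestStats {
    let mut stats = IngestStats { ingested: 0, errors: 0 };
    let mut last_log = Instant::now();
    for (i, fact) in facts.iter().enumerate() {
        match ingest_fact(fact) {
            Ok(()) => stats.ingested += 1,
            Err(e) => {
                // Record and move on: one rejected fact must not abort the run.
                stats.errors += 1;
                eprintln!("fact {} failed to ingest: {}", i, e);
            }
        }
        if last_log.elapsed() >= progress_every {
            eprintln!("{}/{} ingested, {} errors", stats.ingested, facts.len(), stats.errors);
            last_log = Instant::now();
        }
    }
    stats
}
```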

Each semantic claim ingest produces a row at the substrate level, anchored to episodic_record.record_iri, typed by the holder’s overlay, and bitemporal — visible from now() onward through the substrate’s tx_time machinery.

4. `extract_deep` — the orchestrator

The full function is ~140 lines at crates/donto-memory-core/src/extract.rs:507. The control flow:

extract_deep(text, holder, session_id, source_record_iri, images, passes)
│
│   seen: BTreeSet<content_key>  — global dedup
│   all_facts: Vec<ExtractedFact>  — running fact list
│   pass_yields: Vec<ApertureYield>  — per-pass audit
│   merged_usage: ChatUsage  — accumulated tokens
│
└─► for pass_n in 1..=passes {
        prior_block = if all_facts.is_empty() {
            None
        } else {
            Some(format_prior_facts_block(&all_facts))   ← see §5
        }

        call_one_with_context(SINGLE_PROMPT, pass_id, text, ..., prior_block)
        │
        ├─► on Ok(yield):
        │      for fact in yield.facts:
        │          fact.aperture = Some(pass_id)         ← authoritative pass label
        │          key = sha256(subject | predicate | object_iri_or_lit)
        │          if seen.insert(key):
        │              all_facts.push(fact); added += 1
        │          else:
        │              dedup_collisions += 1; collided += 1
        │      merge usage; log "deep pass complete"
        │
        └─► on Err(e):
               pass_yields.push(ApertureYield { error: Some(e) });
               log "deep pass failed"
               continue with next pass  ← failure of one pass does not abort
    }
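The control flow above, as a compilable sketch, with these simplifications:

- Fact and PassYield are simplified stand-ins for ExtractedFact and ApertureYield.
- The closure stands in for call_one_with_context. It receives the accumulated facts rather than a formatted prior-facts block.
- The dedup key is the raw (subject, predicate, object) tuple instead of its SHA-256.
- Usage merging is omitted.
- Pushing a yield for successful passes is an assumption; the pseudocode only shows the error case.

```rust
use std::collections::BTreeSet;

/// Simplified stand-in for `ExtractedFact` (object IRI/literal collapsed to one string).
#[derive(Clone, Debug)]
struct Fact {
    subject: String,
    predicate: String,
    object: String,
    aperture: Option<String>,
}

/// Per-pass audit record, loosely modelled on `ApertureYield`.
#[derive(Debug)]
struct PassYield {
    pass_id: String,
    raw: usize,
    added: usize,
    collided: usize,
    error: Option<String>,
}

/// N sequential passes, first-write-wins dedup, failed passes recorded and skipped.
fn extract_deep(
    passes: u32,
    mut call: impl FnMut(u32, &[Fact]) -> Result<Vec<Fact>, String>,
) -> (Vec<Fact>, Vec<PassYield>) {
    let mut seen: BTreeSet<(String, String, String)> = BTreeSet::new();
    let mut all_facts: Vec<Fact> = Vec::new();
    let mut yields: Vec<PassYield> = Vec::new();
    for pass_n in 1..=passes {
        let pass_id = format!("pass_{}", pass_n);
        match call(pass_n, &all_facts) {
            Ok(facts) => {
                let raw = facts.len();
                let (mut added, mut collided) = (0, 0);
                for mut fact in facts {
                    // The orchestrator, not the model, owns the pass label.
                    fact.aperture = Some(pass_id.clone());
                    let key = (fact.subject.clone(), fact.predicate.clone(), fact.object.clone());
                    if seen.insert(key) {
                        all_facts.push(fact);
                        added += 1;
                    } else {
                        collided += 1;
                    }
                }
                yields.push(PassYield { pass_id, raw, added, collided, error: None });
            }
            // A failed pass contributes nothing; the loop moves on without retrying.
            Err(e) => yields.push(PassYield { pass_id, raw: 0, added: 0, collided: 0, error: Some(e) }),
        }
    }
    (all_facts, yields)
}
```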

The orchestrator does not retry failed passes. A pass that fails (e.g. pass_2 prose-not-JSON on the Nietzsche run) contributes zero facts and the loop moves on. This is intentional — the cost of a wasted pass is bounded — but as recommended in the previous report, a single retry on JSON parse failure would recover ~14% of capacity on a 7-pass run.
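The recommended single retry is small. In this hypothetical sketch, with_one_retry and PassError are illustrative names, not existing code. It retries once only when the error is classified as retryable (such as a prose-not-JSON reply) and passes any other error straight through.

```rust
/// Illustrative error classes for one LLM pass.
#[derive(Debug, PartialEq)]
enum PassError {
    NotJson,
    Http(u16),
}

/// Run `attempt`; if it fails with a retryable error, run it exactly once more.
fn with_one_retry<T, E>(
    mut attempt: impl FnMut() -> Result<T, E>,
    is_retryable: impl Fn(&E) -> bool,
) -> Result<T, E> {
    match attempt() {
        Err(e) if is_retryable(&e) => attempt(),
        other => other,
    }
}
```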

5. The prior-facts block

This is the only piece of the prompt that varies between passes. From pass 2 onward it is inserted into the user message as a single block, after the holder/session/source header lines and before the chunk, formatted like:

Earlier passes over this same chunk already extracted the facts below.
Your job in this pass is to find EVERY remaining fact the previous passes
missed. Do NOT repeat anything in the list — content-hash dedup will drop
repeats anyway, so your job is pure novelty. Push harder: deeper inferences,
unstated assumptions, additional entities (including abstract/conceptual
ones, time/place anchors, counterfactuals), alternate framings,
finer-grained properties, temporal and spatial nuance, causal and dependency
links, contrastive readings, parts of named entities, generic-class facts
("X is a Y", "Y has property Z"), metalinguistic facts about the utterance
itself (sentence count, mood, register, sentiment, politeness, addressee,
speech act), pragmatic implicatures, conventional implicatures, scalar
implicatures, conversational maxims, intent, plan, prerequisite,
consequence, related concepts in the same domain, related practitioners,
related tools/standards/formats, the user's evident expertise level, the
user's evident emotional state, the user's evident workflow, the user's
evident dependencies, the user's evident substitutes-avoided, the user's
evident counterfactual world ("would be lost without X"), domain knowledge
implied. Aim for 30-60+ NEW facts in this pass. Repeat content will be
dropped — your incentive is breadth + novelty. Only return {"facts": []}
if you genuinely cannot think of one more angle.

ALREADY EXTRACTED (subject | predicate | object):
- discord:user:ajaxdavis | rdf:type | donto:DiscordUser
- discord:channel:donto | rdf:type | donto:DiscordChannel
- donto:Song | rdf:type | donto:ArtisticWork
- ...

Key behaviours:

Window: the block contains at most the most recent 300 facts (start = facts.len().saturating_sub(300)). It is built from the facts accumulated before the current pass. On cat-is-red, which had 533 facts after pass 6, pass 7 therefore saw only facts 234–533; earlier facts are left out of the prompt entirely. This is a deliberate cap to bound prompt size, but it means the model can re-derive a fact already extracted in pass 2 if that fact is outside pass 7’s window.
The “find new angles” laundry list is deliberately exhaustive. It enumerates the conceptual moves we want the model to make — speech-act analysis, modal inference, parts-of-entity, idiomatic expression analysis, presupposition, scalar implicature. Empirically pass 6–7 reach into these later items (face-saving, illocutionary force, epistemic-reputation management).
Aim for 30-60+ NEW facts in this pass is a soft target. Pass 4 on the Nietzsche run actually delivered 204; pass 3 on cat-is-red delivered only 42 unique (108/150 collisions).
Empty result is allowed ({"facts": []}). The model is told it’s a legitimate output if it really has nothing left. In practice it never returns this — the model finds something to extract even when the input is 3 words.
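The windowing rule can be sketched as follows. This is a simplification of the real format_prior_facts_block:

- the signature takes a slice of subject/predicate/object string tuples;
- the exact line format is simplified;
- the instruction preamble is omitted.

The saturating_sub(300) start index is the rule quoted above.

```rust
/// Number of most-recent facts shown to the model.
const PRIOR_WINDOW: usize = 300;

/// Hypothetical signature: render the last PRIOR_WINDOW facts as
/// "- s | p | o" lines under the ALREADY EXTRACTED header.
fn format_prior_facts_block(facts: &[(String, String, String)]) -> String {
    // Facts before `start` are simply not shown to the model.
    let start = facts.len().saturating_sub(PRIOR_WINDOW);
    let mut out = String::from("ALREADY EXTRACTED (subject | predicate | object):\n");
    for (s, p, o) in &facts[start..] {
        out.push_str(&format!("- {} | {} | {}\n", s, p, o));
    }
    out
}
```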

6. `call_one_with_context` — the LLM call

The function builds an OpenAI-compatible chat-completion request body:

{
  "model": "z-ai/glm-5",
  "temperature": 0.2,
  "max_tokens": 8000,
  "response_format": { "type": "json_object" },
  "messages": [
    { "role": "system", "content": SINGLE_PROMPT },
    { "role": "user", "content":
        "holder: agent:omega-bot\n" +
        "session_id: discord:...\n" +
        "source_record_iri: donto:record:...\n\n" +
        prior_facts_block +              ← only present pass 2+
        "chunk:\n" + text +
        "\n\n" + COMMON_FRAGMENT          ← JSON schema reminder
    }
  ]
}
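The user-message layout above, as a sketch. build_user_message is a hypothetical helper. The separator between the prior-facts block and "chunk:" is an assumption: here the block is expected to end with its own newline. The field order follows the request body shown.

```rust
/// Hypothetical helper assembling the user message for one pass.
fn build_user_message(
    holder: &str,
    session_id: &str,
    source_record_iri: &str,
    prior_block: Option<&str>,
    chunk: &str,
    common_fragment: &str,
) -> String {
    let mut msg = format!(
        "holder: {}\nsession_id: {}\nsource_record_iri: {}\n\n",
        holder, session_id, source_record_iri
    );
    if let Some(block) = prior_block {
        msg.push_str(block); // absent on pass 1
    }
    msg.push_str("chunk:\n");
    msg.push_str(chunk);
    msg.push_str("\n\n");
    msg.push_str(common_fragment); // JSON schema reminder
    msg
}
```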

Notes:

temperature: 0.2 is a configurable default (DONTO_MEMORY_LLM_TEMPERATURE). Low but not zero. We could justify pushing this up to 0.5–0.7 for later passes specifically — the model is being asked for novelty, which is exactly what temperature is for.
max_tokens: 8000 is hardcoded in the request body. This is the choke point that produced the salvage cases across both runs. Bumping to 12000 is the standing recommendation.
response_format: { "type": "json_object" } is the OpenAI-style JSON-mode hint. Z.AI’s GLM-5 honours it most of the time but pass_2 on the Nietzsche run still returned prose.
HTTP client: reqwest::Client with a 900-second per-request timeout (raised from the original 180 s after a pass_1 call hit elapsed_ms=180002 during the Pandoc experiment). The right limit is “longer than the worst observed pass plus margin”; 900 s is comfortable.
Image inputs (if any) are encoded as { "type": "image_url", "image_url": { "url": "..." } } content parts on the user message, same shape as OpenAI vision. Deep mode hasn’t been exercised with images yet.
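The per-pass temperature idea mentioned in the temperature note could be a one-line interpolation. The roadmap lists it as 0.2 on pass 1 rising to 0.6 on pass 7. This is a sketch of a proposal, not current behaviour, which uses a flat 0.2. Linear interpolation is an assumption, since the roadmap only fixes the endpoints.

```rust
/// Hypothetical ramp: `start` on pass 1, `end` on the final pass, linear in between.
fn pass_temperature(pass_n: u32, passes: u32, start: f64, end: f64) -> f64 {
    if passes <= 1 {
        return start;
    }
    let t = pass_n.saturating_sub(1) as f64 / (passes - 1) as f64;
    start + (end - start) * t
}
```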

The response is parsed into ChatCompletion { choices, usage, model }. The choices[0].message.content string is parsed as { "facts": [ ExtractedFact, ... ] }.

7. JSON salvage

max_tokens: 8000 is a hard truncation ceiling, and the model exhausts it on roughly half of all passes. When the JSON is structurally invalid (because the closing ] and } got cut off), the orchestrator does not throw away the entire pass. Instead:

1. Try strict JSON parse.
2. On failure, walk forward through the string looking for the
   "facts": [ marker, then scan element by element using a small
   bracket-balance state machine.
3. For each well-formed object found before the EOF point, parse it
   individually as ExtractedFact.
4. Discard the malformed tail (the last partial fact).
5. Return the salvaged Vec<ExtractedFact> with the original raw count,
   logged as `WARN  LLM JSON truncated; recovered partial facts`.
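The salvage steps above can be sketched as a std-only scanner. salvage_fact_objects is a hypothetical name. It returns the raw text of each complete object rather than parsed ExtractedFact values; the per-object parse in step 3 would use a JSON library such as serde_json. The subtlety it shows is that brackets inside JSON strings must not count toward the balance.

```rust
/// Return the text of every complete object in the `"facts"` array,
/// dropping a trailing object that was cut off mid-way.
fn salvage_fact_objects(raw: &str) -> Vec<&str> {
    let mut out = Vec::new();
    // Locate the `"facts"` key (skipping any prose prefix), then its `[`.
    let key = match raw.find("\"facts\"") {
        Some(k) => k,
        None => return out,
    };
    let body_start = match raw[key..].find('[') {
        Some(i) => key + i + 1,
        None => return out,
    };

    let mut depth = 0usize; // current `{` nesting depth
    let mut in_str = false; // inside a JSON string literal?
    let mut escaped = false; // previous char was a backslash inside a string
    let mut obj_start = 0usize; // byte offset of the current top-level `{`
    for (i, c) in raw[body_start..].char_indices() {
        let pos = body_start + i;
        if in_str {
            // Inside a string only an unescaped quote matters.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_str = false;
            }
            continue;
        }
        match c {
            '"' => in_str = true,
            '{' => {
                if depth == 0 {
                    obj_start = pos;
                }
                depth += 1;
            }
            '}' if depth > 0 => {
                depth -= 1;
                if depth == 0 {
                    out.push(&raw[obj_start..=pos]);
                }
            }
            ']' if depth == 0 => break, // end of the facts array
            _ => {}
        }
    }
    out
}
```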

Empirically this saves a large share of pass yield.

- On cat-is-red’s pass_1, strict parsing failed at position 1:1 (line 1, column 1; almost the whole output opened with a malformed prose prefix). The salvager still recovered 10 facts.
- On the Nietzsche pass_4 it recovered all 204 complete facts (202 of them new after dedup), losing only the truncated final fact.

Without this path each truncation would discard an entire 100+-fact pass.

The salvager has its own test (assert_eq!(out.len(), 2, ...) at extract.rs:1237) and is unit-tested against synthetic truncation cases.

8. Dedup — content-key hashing

For every fact a pass returns, the orchestrator computes a SHA-256 of the fact’s content tuple:

content_key = SHA256(
    subject.bytes
    | 0x1f
    | predicate.bytes
    | 0x1f
    | (object_iri OR JSON-serialised object_lit).bytes
)

The key is then BTreeSet.insert(key) — first-write-wins. Confidence, modality, hypothesis_only, aperture label, and notes are deliberately excluded from the key so that a second pass restating a known fact with higher confidence still collides (we keep the earlier copy with its lower confidence).
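The key construction can be sketched as follows. The Object enum and function name are illustrative. The sketch keys the BTreeSet on the 0x1f-joined bytes directly rather than on their SHA-256 digest. That keeps it dependency-free and gives the same first-write-wins behaviour. The 0x1f separator matters: without it, ("ab", "c") and ("a", "bc") would produce the same bytes.

```rust
/// Object side of a fact: an IRI, or a literal already serialised to JSON text.
enum Object<'a> {
    Iri(&'a str),
    LitJson(&'a str),
}

/// The content-key preimage: subject, predicate and object bytes joined by
/// the 0x1f unit separator. The engine hashes these bytes with SHA-256;
/// this sketch uses them as the key directly.
fn content_key_preimage(subject: &str, predicate: &str, object: &Object) -> Vec<u8> {
    let obj = match object {
        Object::Iri(iri) => *iri,
        Object::LitJson(lit) => *lit,
    };
    let mut key = Vec::with_capacity(subject.len() + predicate.len() + obj.len() + 2);
    key.extend_from_slice(subject.as_bytes());
    key.push(0x1f);
    key.extend_from_slice(predicate.as_bytes());
    key.push(0x1f);
    key.extend_from_slice(obj.as_bytes());
    key
}
```

Because confidence and modality never enter the preimage, a restated triple collides regardless of how confidently it is restated.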

Limitations of string-key dedup:

Synonymous IRIs are not collapsed. discord:user:ajaxdavis and donto:AjaxDavis are different keys; both land. Same for rdf:type vs donto:isA. This is the identity-collapse phenomenon called out in the case-study report.
Sub/super class duplicates are not collapsed. (donto:Cat, rdf:type, donto:Animal) and (donto:Cat, rdf:type, donto:DomesticAnimal) are both kept. This shows up as suspiciously-zero collision rates in passes where the model is just generating finer-grained type assertions of already-known entities.
Predicate aliases are not collapsed. donto:requires vs donto:needs look distinct to the hash.

These are not bugs of the dedup function — they’re a known limit of going content-key-only. The standing recommendation is a semantic-dedup pass at the end, before substrate ingest.

9. Token + cost accounting

When the LLM endpoint returns usage in the response body, the orchestrator merges it into merged_usage. The substrate stores the final tally in the audit row’s columns:

Column	Source	Notes
`prompt_tokens`	sum of `usage.prompt_tokens` across passes	accurate when endpoint returns usage
`completion_tokens`	sum of `usage.completion_tokens` across passes	on Z.AI endpoint, this is an estimate, currently `passes_succeeded × max_tokens`
`total_tokens`	sum	accurate when individual sums are accurate
`model`	top-level `model` field (echoed by endpoint)	`z-ai/glm-5-20260211` for current runs

The “completion_tokens looks like an upper bound” footnote in the cost analysis is exactly this — when the endpoint omits usage.completion_tokens, we fall back to max_tokens × successful_passes, which is a pessimistic estimate.

Pricing (OpenRouter `z-ai/glm-5`)

Input: $0.60 / M tokens
Output: $1.92 / M tokens
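The worst-case figures in the per-run cost table are straight arithmetic on these prices. A minimal sketch reproduces them; the function and constant names are illustrative. It uses the same fallback of max_tokens per successful pass for completion tokens.

```rust
/// OpenRouter list prices for `z-ai/glm-5`, in USD per token.
const INPUT_USD_PER_TOKEN: f64 = 0.60 / 1_000_000.0;
const OUTPUT_USD_PER_TOKEN: f64 = 1.92 / 1_000_000.0;

/// Worst-case run cost: real prompt tokens plus the completion-token
/// estimate (max_tokens per successful pass) used when usage is missing.
fn worst_case_cost(prompt_tokens: u64, passes_succeeded: u64, max_tokens: u64) -> f64 {
    let completion_tokens = passes_succeeded * max_tokens;
    prompt_tokens as f64 * INPUT_USD_PER_TOKEN + completion_tokens as f64 * OUTPUT_USD_PER_TOKEN
}
```

For the Nietzsche run, worst_case_cost(33_644, 6, 8_000) comes to about $0.1123, of which roughly 82% is output tokens.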

Empirical per-run cost

Job	Input	Prompt tk	Completion tk*	Cost (worst case)	Facts	$/fact
Nietzsche, 7 passes (6 succeeded)	109 words	33,644	48,000*	$0.112	1000	$0.000112
cat is red, 7 passes (7 succeeded)	3 words	45,839	56,000*	$0.135	697	$0.000194

* completion_tokens estimated at 8000 × passes_succeeded; the real value is likely 60–80% of that figure.

A 7-pass deep run costs roughly $0.11–0.14 per message at worst-case GLM-5 prices. The cost is dominated by output tokens (~80% of the total in both runs). The three biggest cost levers are:

- reducing passes for short messages;
- raising max_tokens, so less truncated output has to be re-derived in later passes;
- caching the prior-facts prefix.

10. Audit log — `donto_x_memory_job_log`

Every memorize and recall touches this table. The deep mode pipeline writes two rows per request:

-- Row 1: "queued" placeholder, written immediately after the defer decision
INSERT INTO donto_x_memory_job_log
    (job_id, endpoint, status_code, elapsed_ms, request, response, holder, session_id)
VALUES
    ($1, 'POST /memorize (queued)', 202, 0,
     $2,        -- full request body
     $3,        -- {status: queued, queue_id: ...}
     $4, $5);   -- holder, session_id

-- Row 2: "async" final row, written when memorize_one returns
INSERT INTO donto_x_memory_job_log
    (job_id, endpoint, status_code, elapsed_ms, request, response,
     facts_extracted, facts_ingested,
     model, prompt_tokens, completion_tokens, total_tokens,
     error, holder, session_id)
VALUES
    ($1, 'POST /memorize (async)',
     $2,        -- 200, or 500 on failure
     $3,        -- final elapsed_ms
     $4, $5,    -- full request body, full response body
     $6, $7,    -- facts_extracted, facts_ingested
     $8, $9, $10, $11,  -- model, prompt/completion/total tokens
     $12,       -- NULL on success, else the error message
     $13, $14); -- holder, session_id

If the binary restarts mid-run, the startup path stamps any orphaned (queued) rows with endpoint POST /memorize (lost) and status_code=500 so they don’t appear “still running” forever. This is the marker we saw at 14:44:58 in this session.

Sync-mode requests write one row with endpoint POST /memorize (sync) or POST /memorize depending on path.

The audit table is the source of truth for the /jobs index and /jobs/<id> detail page.

11. The `/jobs` UI surface

There are three routes:

Route	Purpose
`GET /jobs`	Paginated index of recent jobs (status, route, holder, text preview, elapsed)
`GET /jobs/<id>`	HTML detail page: request payload, response payload, per-pass yields, per-fact table
`GET /jobs/<id>/raw`	JSON detail (same shape as the HTML page’s underlying data)

Detail-page features specific to deep mode:

Per-pass section headers — when facts come from multiple apertures (passes), the fact table renders a section header between each aperture group: ── pass_1 (132 facts) ──.
“Facts by pass” summary line — counts per pass at the top of the table for quick scanning.
Aperture column — every fact shows its source pass (pass_3, pass_5, etc.).
Confidence + modality columns — to help spot the “low-confidence / inferred” speculation cluster.
Salvage warnings: meant to surface in the pass-yields panel when a pass had its JSON truncated (TODO: today they appear only in the journal; they should be carried in the response body).

The operator endpoint /jobs was previously gated by DONTO_MEMORY_OPS_TOKEN. It is currently set to empty in /etc/donto-memory/env so the page is publicly browsable. Anyone with the URL can read every memorize that has ever happened — fine for the current single-user deployment, will need a real auth model before multi-tenant.

12. What lands in the substrate

Per memorize, the substrate receives:

One episodic record (donto:record:<uuid>) typed via mem:module/episodic. The full text + modality + holder + session_id. Bitemporal, visible from now().
N semantic-claim records (one per fact, post-dedup, post-substrate-policy-gating) typed via mem:module/semantic-claim. Each carries:
- subject, predicate, object_iri or object_lit
- confidence, modality, hypothesis_only
- aperture (= the pass label, e.g. pass_3)
- source_record_iri (= the episodic record’s IRI from step 1)
- holder, session_id, consumer_iri (always ctx:memory)

The substrate’s policy gate can reject a fact (e.g. predicate-domain violation, identity policy violation). The orchestrator counts these as errors in the ingest progress log and the audit row’s facts_ingested will be lower than facts_extracted. On the Nietzsche run this gap was 2 facts (998/1000); on the cat-is-red run it was 0.

The substrate also handles federation, sleep-path reconsolidation, and the trust kernel — none of which are deep-mode-specific. They apply to every memorize regardless of mode.

13. Recall — the other side

Deep mode is a write mode. The complementary read mode is POST /recall, which:

Embeds the query.
Runs Reciprocal Rank Fusion across multiple retrievers (BM25, vector, IRI-prefix, holder).
Filters by identity lens (which subset of statements is this holder allowed to see).
Returns a ranked list with provenance back to the originating episodic record.
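Reciprocal Rank Fusion itself is compact. The sketch assumes each retriever returns document IDs best-first. It uses the conventional constant k = 60 from the original RRF formulation. Donto's actual k, any retriever weights, and its tie-breaking are not stated here, so treat those as assumptions.

```rust
use std::collections::HashMap;

/// Fuse several best-first rankings: each document scores
/// sum over lists of 1 / (k + rank), with 1-based ranks.
/// Ties break by document id so the output is deterministic.
fn rrf(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in rankings {
        for (i, doc) in list.iter().enumerate() {
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

A document ranked moderately by several retrievers can outrank one that a single retriever put first. That is the property that makes RRF a reasonable way to combine lexical, vector, and structural signals without calibrating their raw scores.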

The more facts deep mode ingests per message, the better recall should perform on queries that touch those facts, provided the dedup and identity work properly. This is untested: both runs in this analysis have been writes, and recall has not yet been exercised against either output corpus.

14. Observed empirics so far

Two end-to-end deep-mode runs at production parameters (passes=7, modality=descriptive, holder=agent:omega-bot):

Run A — Nietzsche / Eternal Recurrence (109 words)

Pass	Raw	New unique	Collisions	Elapsed	Notes
1	132	132	0	124 s	clean cold start
2	FAIL	0	0	<1 s	prose-not-JSON
3	202	202	0	62 s	prior-facts redirect; fastest pass
4	204	202	2	127 s	truncated, salvaged
5	159	157	2	109 s
6	158	156	2	142 s
7	158	151	7	124 s	highest collision rate
Σ	1013	1000	13	854 s (end-to-end)	998 ingested

Run B — “cat is red” (3 words)

Pass	Raw	New unique	Collisions	Elapsed	Notes
1	10	10	0	46 s	early truncation, salvaged
2	121	121	0	112 s	prior-facts redirect
3	150	42	108	100 s	72% collision — model stuck
4	127	127	0	105 s	re-energised
5	139	139	0	105 s	re-energised
6	103	94	9	96 s
7	165	164	1	120 s
Σ	815	697	118	703 s (end-to-end)	697 ingested

Key contrasts

	Run A	Run B
facts/input-word	9.2	232
cost	$0.112	$0.135
Passes truncated	~30%	~86%
pass_2 outcome	prose-fail	121 unique
substrate rejections	2/1000	0/697

The non-linearity in facts/word (3-word input produces 25× more facts per word than the 109-word input) is the headline finding from the empirics: the model’s elaboration capacity is decoupled from the input. Deep mode on tiny inputs is largely hallucinating consistent ontology. This is a feature for some use cases (priming recall on common concepts) and a bug for others (signal-to-noise in the substrate).

15. Known issues & quirks

In rough priority order:

No retry on JSON parse failure. A single prose-not-JSON failure forfeits the entire pass (14% capacity loss on a 7-pass run).
max_tokens: 8000 is too low. Most passes truncate; salvage works but loses the malformed tail. Recommend 12000.
Completion-token usage isn’t reported by the Z.AI endpoint. Cost accounting falls back to successful passes × max_tokens, an upper-bound estimate. It overstates real cost and should be read as a per-message cost ceiling.
String-key dedup misses semantic duplicates. (Cat, rdf:type, Animal) and (Cat, rdf:type, Mammal) both land. Identity collapse (discord:user:ajax vs donto:Ajax) similarly slips through.
Prior-facts window is hard-capped at 300. Beyond 300 cumulative facts, older facts fall out of the model’s context and can be re-derived (then dedup-dropped). For 7-pass runs producing 700-1000 facts this is a meaningful blind spot.
Queue is an in-memory Mutex, not a durable table. A restart loses in-flight and queued work. Startup marks orphaned (queued) rows as (lost), but the work itself is gone.
Per-pass progress is logged at info! level. If the binary’s RUST_LOG defaults to WARN, journalctl shows nothing useful during a run. Verified this session: the original Nietzsche run was silent in the journal because the level filter was suppressing info-level events.
The format_prior_facts_block function has dead code. let take = facts.len().saturating_sub(facts.len().saturating_sub(300)); computes min(len, 300) and is then discarded by let _ = take;. Cosmetic, but it should be cleaned up.
No backpressure signal to clients. The 202 response gives a queue_id but no estimated_position or estimated_wait. With a single-permit Mutex and 7-pass runs taking 12–14 min, a job that arrives behind one running job and four queued ones will wait roughly 60–70 min with no signal.
/jobs page is unauthenticated. Acceptable for single-user; needs auth before multi-tenant.

16. Roadmap (next concrete changes)

In order from cheapest-fastest-win to deepest:

JSON-mode retry on parse failure — 5 lines of code, recovers the prose-not-JSON case.
Bump max_tokens to 12000 — one constant change in build_request_body.
Input-length-aware default passes on the bot side: passes = clamp(2, ceil(words/25), 7). Cuts short messages from 7 passes to 2, roughly a 70% saving.
Promote tracing level filter — explicit RUST_LOG=info,donto_memory=info,donto_memory_core=info in the systemd unit.
Prompt caching on the prior-facts prefix — Anthropic-style cache_control or OpenAI prefix caching. Cuts input cost ~60% on later passes.
Semantic dedup post-pass — collapse class hierarchies + identity synonyms before substrate ingest. Saves storage + improves recall precision.
Quality-grounding score per fact — cheap LLM pass scoring 0–1 for direct-support vs speculation; route below-threshold facts to a candidate overlay.
Real queue table — donto_x_memory_extract_queue with FOR UPDATE SKIP LOCKED, multiple workers, restart-safe.
Per-pass temperature ramp — start at 0.2 for pass 1, climb to 0.6 for pass 7. Push novelty harder where novelty is the explicit goal.
Identity-resolution module — proper alias graph in the substrate so discord:user:ajaxdavis ↔︎ donto:AjaxDavis ↔︎ donto:Reader collapse at query time.
Multi-tenant auth on /jobs and /memorize.
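The input-length heuristic from the roadmap, as a sketch. Counting words by whitespace split is an assumption. The clamp bounds and the 25-words-per-pass divisor are the roadmap's.

```rust
/// Hypothetical bot-side default: about one pass per 25 words, clamped to 2..=7.
fn default_passes(text: &str) -> u32 {
    let words = text.split_whitespace().count() as u32;
    let by_length = (words + 24) / 25; // integer ceil(words / 25)
    by_length.clamp(2, 7)
}
```

Under this rule the 3-word cat-is-red message would get 2 passes and the 109-word Nietzsche message 5, instead of 7 each.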

17. File map

If you’re reading source:

Concern	File	Lines
Deep orchestrator	`donto-memory-core/src/extract.rs`	507–646
Prior-facts block formatter	`donto-memory-core/src/extract.rs`	961–1003
Per-call HTTP + salvage	`donto-memory-core/src/extract.rs`	676–870
SINGLE_PROMPT constant	`donto-memory-core/src/extract.rs`	863–872
Content-key dedup	`donto-memory-core/src/extract.rs`	112–135 (`content_key`)
Route handler + dispatch	`donto-memory/src/api/routes/memorize.rs`	100–340
`memorize_one` inner engine	`donto-memory/src/api/routes/memorize.rs`	333–560
Ingest progress logging	`donto-memory/src/api/routes/memorize.rs`	497–540
Async lock initialisation	`donto-memory/src/main.rs`	106–133
Job audit log table	migrations: `*_donto_x_memory_job_log.sql`	—
Job detail page rendering	`donto-memory/src/api/routes/jobs.rs`	—

Operator endpoints in production:

API binds 127.0.0.1:7900 on donto-db VM (us-central1-a, apex-494316).
Caddy fronts it as https://memories.apexpots.com/....
systemd unit: donto-memory-api.service (loaded, enabled, autorestart).
Audit DB DSN: postgres://donto:***@127.0.0.1:5432/donto_db.

18. Glossary

Aperture — historically a per-perspective extraction prompt (surface, linguistic, presupposition, etc.); in deep mode the term is reused as the pass label (pass_1 … pass_7).
Content key — SHA256 of (subject, predicate, object) used for first-write-wins dedup across passes.
Episodic record — the always-stored “this text happened” anchor that semantic claims hang off.
Holder — the identity that owns the memory. agent:omega-bot for the Discord bot’s auto-memorize.
Modality — descriptive, imperative, interrogative, etc. Currently always descriptive from omega-bot.
Salvage — recovering partial JSON when the model truncates at max_tokens.
Substrate — the underlying donto data store (statements, predicates, holders, overlays, lenses).
Trust kernel / sleep path — substrate-level subsystems that reconcile and consolidate facts after ingest. Not deep-mode-specific.
Aperture yield — per-pass record: (aperture, raw_facts, elapsed_ms, error). Forms the input to the /jobs/<id> page’s per-pass breakdown.

Related reports:

Deep-mode extraction on a 109-word Discord message yields 1000 facts — the empirical case study this engine doc complements
donto-memory omega-bot corpus audit (2026-05-30) — first-light qualitative read
donto — Substrate PRD (2026-05-28) — where deep mode sits in the broader substrate architecture
donto-memory early activation report (2026-05-28)