# donto-memory deep-mode — engine reference

**Date:** 2026-05-31
**Scope:** Everything the `mode: "deep"` extraction pipeline does, from a `POST /memorize` request to the rows that land in the substrate. Covers code paths, prompts, dedup, salvage, token usage, the async queue, the audit log, the job-detail surface, observed empirics, and the known limits.

This document complements the empirical case study at [`/research/donto-deep-mode-eternal-recurrence-2026-05-31.html`](/research/donto-deep-mode-eternal-recurrence-2026-05-31.html). Where that report reads results, this one explains the machinery that produced them.

---

## 1. What deep mode is

`mode: "deep"` is the **iterative-novelty** extraction lane inside donto-memory. Where `mode: "single"` does one LLM call with a maximalist prompt (~30 facts target) and `mode: "exhaustive"` does five parallel calls under different rhetorical "apertures" (surface / linguistic / presupposition / inferential / conceivable), `mode: "deep"` does **N sequential calls of the *same* prompt**, each shown a list of the facts the prior passes already produced. The model's task each pass is *only* to find things the earlier passes missed.

The design intent:

- No rigid per-pass prompt rotation. The user explicitly rejected a fixed sequence ("avoid any rigid prompts"). The model picks its own divergence direction, prompted only by what it has and hasn't already said.
- One call at a time. The next pass can't be cached or parallelised because it depends on the cumulative fact set so far.
- Stop when told (`passes` param, default 3, currently 7 from omega-bot). No automatic saturation detection — the operator decides how hard to push.
- Hard dedup at the end via content-key hashing. The model's incentive is breadth + novelty; duplicates get dropped silently.

Configuration is a single HTTP knob:

```json
{
  "mode": "deep",
  "passes": 7,
  "modality": "descriptive",
  "holder": "agent:omega-bot",
  "session_id": "discord:1349727923434815519:1497274794586931220",
  "text": "...",
  "images": []
}
```

The synonyms `"sequential"` and `"iterative"` route to the same code path.

---

## 2. Request lifecycle

```
omega-bot
   │  POST https://memories.apexpots.com/memorize
   ▼
caddy (TLS, gzip)
   │
   ▼
donto-memory-api (127.0.0.1:7900, axum)
   │
   ├─► should_defer(req, default_mode)?
   │     true if mode ∈ {deep, exhaustive, sequential, iterative, multi, apertures}
   │     OR req.r#async == Some(true)
   │
   │  ── deferred path ──
   ├─► write "POST /memorize (queued)" audit row     (immediate)
   ├─► return HTTP 202 {status: "queued", queue_id, ...}  (immediate)
   ├─► spawn tokio task
   │       │
   │       ▼
   │   acquire AppState.async_memorize_lock  (single-permit Mutex)
   │       │
   │       ▼
   │   memorize_one(...)  ← see §3
   │       │
   │       ▼
   │   write "POST /memorize (async)" audit row with final stats
   │
   │  ── sync path ──
   └─► memorize_one(...) inline, return 200 with full body
```
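The defer decision in the diagram can be sketched as a small predicate. This is a minimal sketch, not the real handler: `DeferInput` is a hypothetical stand-in for the relevant request fields, and lowercasing the mode before matching is an assumption borrowed from the dispatch step in §3.3.

```rust
/// Modes that always take the deferred (202) path, per the diagram above.
const DEFERRED_MODES: [&str; 6] = [
    "deep", "exhaustive", "sequential", "iterative", "multi", "apertures",
];

/// Hypothetical stand-in for the relevant fields of the memorize request.
pub struct DeferInput<'a> {
    pub mode: Option<&'a str>, // req.mode, if the client sent one
    pub r#async: Option<bool>, // req.r#async
}

/// Returns true when the request should be queued instead of served inline.
pub fn should_defer(req: &DeferInput, default_mode: &str) -> bool {
    // Assumption: the mode is lowercased before matching.
    let mode = req.mode.unwrap_or(default_mode).to_ascii_lowercase();
    DEFERRED_MODES.iter().any(|m| *m == mode.as_str()) || req.r#async == Some(true)
}
```

Under these assumptions a `single` request is served inline unless the client also sets `async: true`, and a request with no mode inherits whatever the configured default implies.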

### Why the deferred path exists

Cloudflare in front of `memories.apexpots.com` cuts proxied HTTP at ~100 s. The Nietzsche run took 854 s end-to-end; deep mode cannot be served synchronously over a Cloudflare-fronted endpoint without 524-ing. The deferred path returns 202 immediately so the client never blocks, and the actual work runs to completion in the background.

### Why a single-permit Mutex

`async_memorize_lock: Arc<tokio::sync::Mutex<()>>` lives on the `AppState`. Each spawned task acquires it before running `memorize_one`. The lock is intentionally narrow: it serialises only the *deferred extractions* (deep runs and anything else that takes the deferred path in §2), not all memorize traffic, because:

- Concurrent deep runs against the same LLM endpoint blow rate-limit quota with little wall-time gain.
- Concurrent ingest into the same substrate hits hot-spot row locks on the predicate/holder indexes.
- Single-permit + simple = no priority inversions, no rebalancing.

The cost: the queue is a Mutex, not a real queue. Multiple queued tasks pile up as parked futures inside the running process. **Restarting the binary loses all in-flight + queued work.** We've seen this twice: the binary was restarted at 14:44:58 mid-Nietzsche-rerun and again later, leaving 2 orphaned `(queued)` rows that were then marked `(lost)` in the audit log. The startup path does the right thing here: it stamps surviving `(queued)` rows with a `(lost)` endpoint label and `status_code=500` so they don't ghost in the job list.
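The serialising effect of a single-permit lock can be demonstrated with the standard library alone. The sketch below deliberately swaps `tokio::sync::Mutex` and spawned tasks for `std::sync::Mutex` and OS threads so that it runs without a runtime; `max_concurrent_runs` and the 5 ms sleep standing in for `memorize_one` are illustrative, not taken from the service.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Spawns `jobs` workers that all contend for one lock and returns the
/// highest number of workers seen inside the critical section at once.
pub fn max_concurrent_runs(jobs: usize) -> usize {
    let lock = Arc::new(Mutex::new(())); // stand-in for async_memorize_lock
    let active = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..jobs)
        .map(|_| {
            let (lock, active, peak) = (lock.clone(), active.clone(), peak.clone());
            thread::spawn(move || {
                // Every other worker waits here while one job runs.
                let _permit = lock.lock().unwrap();
                let now = active.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(5)); // stand-in for memorize_one
                active.fetch_sub(1, Ordering::SeqCst);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

However many jobs are spawned, the observed peak stays at one. The waiting workers exist only in process memory, which is the same property that makes a restart lose queued work.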

---

## 3. `memorize_one` — the inner engine

```
memorize_one(s: &AppState, req: &MemorizeReq)
│
├─► §3.1  OCR (if images attached)
├─► §3.2  Episodic ingest (always — raw text → substrate)
├─► §3.3  Optional LLM extraction (single | exhaustive | deep)
└─► §3.4  Semantic-claim ingest (per-fact ingest into substrate)
```

### 3.1 OCR

If `req.images` is non-empty and `s.settings.ocr_enabled`, a separate LLM call is made *before* extraction:

- Endpoint: same OpenAI-compatible chat endpoint
- System prompt: `OCR_SYSTEM_PROMPT` (transcribe every visible word, return `{"transcripts": [...]}` per image in order)
- `temperature: 0.0`, `max_tokens: 4000`, `response_format: { type: "json_object" }`
- Result: an OCR'd text block per image, joined with `[OCR text from image #N]` headers and prepended to `req.text`.

The augmented `effective_text` becomes both the episodic record body and the extractor input. An OCR failure is *not* fatal — it logs a warning and proceeds with the original text. There are no images in either of the runs analysed here, so the OCR path didn't run.

### 3.2 Episodic ingest

The raw text is always written as an episodic record, regardless of extraction mode. This is the substrate's atomic "something happened" anchor.

```rust
let episodic_input = IngestInput {
    holder: req.holder,           // "agent:omega-bot"
    session_id: req.session_id,   // "discord:<guild>:<channel>"
    text: effective_text,         // raw + OCR
    modality: req.modality,       // "descriptive"
    ...
};
episodic.ingest(substrate, pool, consumer_iri, &episodic_input)
    → (episodic_record_id, episodic_record_iri)
```

The episodic record gets a stable IRI like `donto:record:<uuid>` that is then handed to every subsequent semantic-claim ingest as `source_record_iri` so the provenance chain is `episodic_record ← semantic_claim ← fact`.

### 3.3 Extraction dispatch

`req.mode` (or `s.settings.extract_mode` if unset) is lowercased and matched:

| Mode keyword | Function | Concurrency | Default usage |
|---|---|---|---|
| `single` (default) | `extract_single` | 1 call | omega-bot historic |
| `exhaustive` / `multi` / `apertures` | `extract_exhaustive` | 5 parallel | research/testing |
| `deep` / `sequential` / `iterative` | `extract_deep` | N sequential | omega-bot current |

Unrecognised modes fall through to `single`. Deep mode also accepts `req.passes` (default 3, clamped to `1..=10`).
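A minimal sketch of the dispatch table and the `passes` clamp, assuming the mode arrives as an optional string. `ExtractMode`, `resolve_mode` and `resolve_passes` are hypothetical names; the real code routes to `extract_single`, `extract_exhaustive` and `extract_deep`.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ExtractMode {
    Single,
    Exhaustive,
    Deep,
}

/// Lowercases the requested mode (or the configured default) and maps it
/// onto one of the three extraction lanes.
pub fn resolve_mode(req_mode: Option<&str>, default_mode: &str) -> ExtractMode {
    match req_mode.unwrap_or(default_mode).to_lowercase().as_str() {
        "exhaustive" | "multi" | "apertures" => ExtractMode::Exhaustive,
        "deep" | "sequential" | "iterative" => ExtractMode::Deep,
        // "single" and anything unrecognised fall through to single mode.
        _ => ExtractMode::Single,
    }
}

/// Applies the documented default (3) and clamp (1..=10) to `req.passes`.
pub fn resolve_passes(req_passes: Option<u32>) -> u32 {
    req_passes.unwrap_or(3).clamp(1, 10)
}
```

With this shape a typo such as `"deeep"` silently degrades to single mode, which matches the documented fall-through.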

### 3.4 Semantic-claim ingest

Each surviving fact (post-dedup) is written via the `mem:module/semantic-claim` module. The ingest is **per-fact and sequential**; there is no batch. Progress is logged every 5 s with `ingest progress` lines (`239/697 ingested, 0 errors`). Errors are accumulated but do not abort the run — a failed fact gets logged and the loop continues.

Each semantic claim ingest produces a row at the substrate level, anchored to `episodic_record.record_iri`, typed by the holder's overlay, and bitemporal — visible from `now()` onward through the substrate's tx_time machinery.

---

## 4. `extract_deep` — the orchestrator

The full function is ~140 lines at `crates/donto-memory-core/src/extract.rs:507`. The control flow:

```
extract_deep(text, holder, session_id, source_record_iri, images, passes)
│
│   seen: BTreeSet<content_key>  — global dedup
│   all_facts: Vec<ExtractedFact>  — running fact list
│   pass_yields: Vec<ApertureYield>  — per-pass audit
│   merged_usage: ChatUsage  — accumulated tokens
│
└─► for pass_n in 1..=passes {
        prior_block = if all_facts.is_empty() {
            None
        } else {
            Some(format_prior_facts_block(&all_facts))   ← see §5
        }

        call_one_with_context(SINGLE_PROMPT, pass_id, text, ..., prior_block)
        │
        ├─► on Ok(yield):
        │      for fact in yield.facts:
        │          fact.aperture = Some(pass_id)         ← authoritative pass label
        │          key = sha256(subject | predicate | object_iri_or_lit)
        │          if seen.insert(key):
        │              all_facts.push(fact); added += 1
        │          else:
        │              dedup_collisions += 1; collided += 1
        │      merge usage; log "deep pass complete"
        │
        └─► on Err(e):
               pass_yields.push(ApertureYield { error: Some(e) });
               log "deep pass failed"
               continue with next pass  ← failure of one pass does not abort
    }
```

The orchestrator does **not** retry failed passes. A pass that fails (e.g. pass_2 prose-not-JSON on the Nietzsche run) contributes zero facts and the loop moves on. This is intentional — the cost of a wasted pass is bounded — but as recommended in the previous report, a single retry on JSON parse failure would recover ~14% of capacity on a 7-pass run.
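The control flow above can be condensed into a runnable sketch. Several simplifications are assumptions made for illustration: the LLM call is replaced by a caller-supplied closure, `Fact` and `DeepStats` are hypothetical reduced types, and the dedup key is the plain `(subject, predicate, object)` tuple instead of its SHA256.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, PartialEq)]
pub struct Fact {
    pub subject: String,
    pub predicate: String,
    pub object: String,
    pub aperture: Option<String>,
}

#[derive(Debug, Default)]
pub struct DeepStats {
    pub facts: Vec<Fact>,
    pub collisions: usize,
    pub failed_passes: Vec<String>,
}

/// `call` stands in for the LLM call: given the pass number and the facts
/// accumulated so far, it returns that pass's raw facts or an error.
pub fn extract_deep<F>(passes: u32, mut call: F) -> DeepStats
where
    F: FnMut(u32, &[Fact]) -> Result<Vec<Fact>, String>,
{
    let mut seen: BTreeSet<(String, String, String)> = BTreeSet::new();
    let mut stats = DeepStats::default();

    for pass_n in 1..=passes {
        let pass_id = format!("pass_{pass_n}");
        let result = call(pass_n, &stats.facts);
        match result {
            Ok(raw) => {
                for mut fact in raw {
                    // The orchestrator, not the model, sets the pass label.
                    fact.aperture = Some(pass_id.clone());
                    let key = (
                        fact.subject.clone(),
                        fact.predicate.clone(),
                        fact.object.clone(),
                    );
                    if seen.insert(key) {
                        stats.facts.push(fact); // first write wins
                    } else {
                        stats.collisions += 1;
                    }
                }
            }
            // A failed pass is recorded and skipped; later passes still run.
            Err(_) => stats.failed_passes.push(pass_id),
        }
    }
    stats
}
```

Feeding it a closure that fails on pass 2 reproduces the Nietzsche pattern in miniature: the failed pass contributes nothing, and pass 3 still runs against the facts from pass 1.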

---

## 5. The prior-facts block

This is the only piece of the prompt that varies between passes. It's prepended to the user prompt as a single block, formatted like:

```
Earlier passes over this same chunk already extracted the facts below.
Your job in this pass is to find EVERY remaining fact the previous passes
missed. Do NOT repeat anything in the list — content-hash dedup will drop
repeats anyway, so your job is pure novelty. Push harder: deeper inferences,
unstated assumptions, additional entities (including abstract/conceptual
ones, time/place anchors, counterfactuals), alternate framings,
finer-grained properties, temporal and spatial nuance, causal and dependency
links, contrastive readings, parts of named entities, generic-class facts
("X is a Y", "Y has property Z"), metalinguistic facts about the utterance
itself (sentence count, mood, register, sentiment, politeness, addressee,
speech act), pragmatic implicatures, conventional implicatures, scalar
implicatures, conversational maxims, intent, plan, prerequisite,
consequence, related concepts in the same domain, related practitioners,
related tools/standards/formats, the user's evident expertise level, the
user's evident emotional state, the user's evident workflow, the user's
evident dependencies, the user's evident substitutes-avoided, the user's
evident counterfactual world ("would be lost without X"), domain knowledge
implied. Aim for 30-60+ NEW facts in this pass. Repeat content will be
dropped — your incentive is breadth + novelty. Only return {"facts": []}
if you genuinely cannot think of one more angle.

ALREADY EXTRACTED (subject | predicate | object):
- discord:user:ajaxdavis | rdf:type | donto:DiscordUser
- discord:channel:donto | rdf:type | donto:DiscordChannel
- donto:Song | rdf:type | donto:ArtisticWork
- ...
```

Key behaviours:

- **Window:** the block contains *at most* the **last 300 facts** (`start = facts.len().saturating_sub(300)`). Run B (cat-is-red) finishes with 697 facts, of which 533 exist when pass 7 starts, so pass 7 is shown facts 234–533 and the first 233 are left out of the prompt. This is a deliberate cap to bound prompt size, but it means the model can re-derive in pass 7 a fact already extracted in pass 2. The global `seen` set still drops the repeat, so the cost is a wasted output slot, not a duplicate row.
- **The "find new angles" laundry list is deliberately exhaustive.** It enumerates the conceptual moves we want the model to make: speech-act analysis, modal inference, parts-of-entity, idiomatic expression analysis, presupposition, scalar implicature. Empirically, passes 6–7 reach into these later items (face-saving, illocutionary force, epistemic-reputation management).
- **`Aim for 30-60+ NEW facts in this pass`** is a soft target. Pass 4 on the Nietzsche run actually delivered 204; pass 3 on cat-is-red delivered only 42 unique (108/150 collisions).
- **Empty result is allowed** (`{"facts": []}`). The model is told it's a legitimate output if it really has nothing left. In practice it never returns this — the model finds *something* to extract even when the input is 3 words.
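The windowing rule is easy to check with a small sketch. It assumes facts are plain `(subject, predicate, object)` string tuples and omits the long instruction preamble; `PRIOR_WINDOW` and `window_bounds` are names introduced here for illustration, not identifiers from `extract.rs`.

```rust
/// Maximum number of prior facts shown to the model.
const PRIOR_WINDOW: usize = 300;

/// Renders the "ALREADY EXTRACTED" list for the last PRIOR_WINDOW facts.
pub fn format_prior_facts_block(facts: &[(String, String, String)]) -> String {
    let start = facts.len().saturating_sub(PRIOR_WINDOW);
    let mut block = String::from("ALREADY EXTRACTED (subject | predicate | object):\n");
    for (s, p, o) in &facts[start..] {
        block.push_str(&format!("- {s} | {p} | {o}\n"));
    }
    block
}

/// 1-indexed first and last fact visible to the next pass, if any.
pub fn window_bounds(total: usize) -> Option<(usize, usize)> {
    if total == 0 {
        return None;
    }
    Some((total.saturating_sub(PRIOR_WINDOW) + 1, total))
}
```

With the 533 facts that Run B had accumulated before pass 7 (697 minus the 164 new facts from pass 7), `window_bounds(533)` gives facts 234 through 533, so the first 233 facts are invisible to the final pass.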

---

## 6. `call_one_with_context` — the LLM call

The function builds an OpenAI-compatible chat-completion request body:

```json
{
  "model": "z-ai/glm-5",
  "temperature": 0.2,
  "max_tokens": 8000,
  "response_format": { "type": "json_object" },
  "messages": [
    { "role": "system", "content": SINGLE_PROMPT },
    { "role": "user", "content":
        "holder: agent:omega-bot\n" +
        "session_id: discord:...\n" +
        "source_record_iri: donto:record:...\n\n" +
        prior_facts_block +              ← only present pass 2+
        "chunk:\n" + text +
        "\n\n" + COMMON_FRAGMENT          ← JSON schema reminder
    }
  ]
}
```

Notes:

- `temperature: 0.2` is a *configurable* default (`DONTO_MEMORY_LLM_TEMPERATURE`). Low but not zero. We could justify pushing this up to 0.5–0.7 for later passes specifically — the model is being asked for novelty, which is exactly what temperature is for.
- `max_tokens: 8000` is **hardcoded** in the request body. This is the choke point that produced the salvage cases across both runs. Bumping to 12000 is the standing recommendation.
- `response_format: { "type": "json_object" }` is the OpenAI-style JSON-mode hint. Z.AI's GLM-5 honours it most of the time but pass_2 on the Nietzsche run still returned prose.
- HTTP client: `reqwest::Client` with a **900-second per-request timeout** (bumped from the original 180 s after a pass-1 hit `elapsed_ms=180002` mid-Pandoc experiment). The right limit is "longer than the worst observed pass plus margin"; 900 s is comfortable.
- Image inputs (if any) are encoded as `{ "type": "image_url", "image_url": { "url": "..." } }` content parts on the user message, same shape as OpenAI vision. Deep mode hasn't been exercised with images yet.

The response is parsed into `ChatCompletion { choices, usage, model }`. The `choices[0].message.content` string is parsed as `{ "facts": [ ExtractedFact, ... ] }`.
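For reference, the user-message assembly shown in the request body can be sketched with plain string building. `build_user_content` is a hypothetical helper, and the newline inserted between the prior-facts block and `chunk:` is an assumption, since the exact separator is not shown above.

```rust
/// Assembles the user-message content for one pass.
/// `prior_block` is None on pass 1 and Some(..) from pass 2 onward.
pub fn build_user_content(
    holder: &str,
    session_id: &str,
    source_record_iri: &str,
    prior_block: Option<&str>,
    text: &str,
    common_fragment: &str,
) -> String {
    let mut out = format!(
        "holder: {holder}\nsession_id: {session_id}\nsource_record_iri: {source_record_iri}\n\n"
    );
    if let Some(block) = prior_block {
        out.push_str(block);
        // Assumption: make sure "chunk:" starts on its own line.
        if !block.ends_with('\n') {
            out.push('\n');
        }
    }
    out.push_str("chunk:\n");
    out.push_str(text);
    out.push_str("\n\n");
    out.push_str(common_fragment); // JSON schema reminder
    out
}
```

Because everything except `prior_block` is identical across passes, the first three header lines form a stable prefix while the rest of the message changes from pass to pass.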

---

## 7. JSON salvage

`max_tokens: 8000` is a hard truncation ceiling, and the model exhausts it on roughly half of all passes. When the JSON is structurally invalid (because the closing `]` and `}` got cut off), the orchestrator does *not* throw away the entire pass. Instead:

```
1. Try strict JSON parse.
2. On failure, walk forward through the string looking for the
   "facts": [ marker, then scan element by element using a small
   bracket-balance state machine.
3. For each well-formed object found before the EOF point, parse it
   individually as ExtractedFact.
4. Discard the malformed tail (the last partial fact).
5. Return the salvaged Vec<ExtractedFact> with the original raw count,
   logged as `WARN  LLM JSON truncated; recovered partial facts`.
```

Empirically this saves enormous amounts of pass yield. On cat-is-red's pass_1, the model output truncated at position 1:1 (extreme: almost the whole output was malformed prose-prefix) and the salvager still recovered 10 facts. On the Nietzsche pass_4 the salvager recovered 204 facts from the truncated response, 202 of them unique after dedup. Without this path each truncation would dump an entire 100+-fact pass.

The salvager has its own test (`assert_eq!(out.len(), 2, ...)` at `extract.rs:1237`) and is unit-tested against synthetic truncation cases.
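Steps 2 to 4 can be illustrated with a small bracket-balance scanner. This is a sketch of the technique, not the function in `extract.rs`: `salvage_fact_objects` is a hypothetical name, and it returns the raw text of each complete object, leaving the per-object parse into `ExtractedFact` (step 3) to a real JSON parser.

```rust
/// Returns the source text of every complete top-level object inside the
/// "facts" array, dropping a truncated final element if there is one.
pub fn salvage_fact_objects(raw: &str) -> Vec<&str> {
    let mut out = Vec::new();
    // Find the "facts" marker, then the '[' that opens the array.
    let Some(key_pos) = raw.find("\"facts\"") else { return out };
    let Some(rel) = raw[key_pos..].find('[') else { return out };
    let body_start = key_pos + rel + 1;

    let mut depth = 0usize; // nesting depth of {} and [] inside the array
    let mut in_string = false; // inside a JSON string literal
    let mut escaped = false; // previous char was a backslash in a string
    let mut obj_start: Option<usize> = None; // where the current element began

    for (i, ch) in raw[body_start..].char_indices() {
        let pos = body_start + i;
        if in_string {
            if escaped {
                escaped = false;
            } else if ch == '\\' {
                escaped = true;
            } else if ch == '"' {
                in_string = false;
            }
            continue; // brackets inside strings do not count
        }
        match ch {
            '"' => in_string = true,
            '{' | '[' => {
                if depth == 0 && ch == '{' {
                    obj_start = Some(pos);
                }
                depth += 1;
            }
            '}' | ']' => {
                if depth == 0 {
                    break; // the ']' that closes the facts array
                }
                depth -= 1;
                if depth == 0 {
                    if let Some(start) = obj_start.take() {
                        out.push(&raw[start..=pos]);
                    }
                }
            }
            _ => {}
        }
    }
    out // an element still open at EOF is never pushed
}
```

Tracking string state matters here: fact notes routinely contain braces and brackets, and a scanner that counted those would mis-split the array.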

---

## 8. Dedup — content-key hashing

After each pass the orchestrator computes a SHA256 of the fact's content tuple:

```
content_key = SHA256(
    subject.bytes
    | 0x1f
    | predicate.bytes
    | 0x1f
    | (object_iri OR JSON-serialised object_lit).bytes
)
```

Each key goes through `BTreeSet::insert`, which makes dedup first-write-wins. Confidence, modality, hypothesis_only, aperture label, and notes are deliberately excluded from the key so that a later pass restating a known fact with higher confidence still collides (we keep the earlier copy with its lower confidence).

Limitations of string-key dedup:

- **Synonymous IRIs are not collapsed.** `discord:user:ajaxdavis` and `donto:AjaxDavis` are different keys; both land. Same for `rdf:type` vs `donto:isA`. This is the identity-collapse phenomenon called out in the case-study report.
- **Sub/super class duplicates are not collapsed.** `(donto:Cat, rdf:type, donto:Animal)` and `(donto:Cat, rdf:type, donto:DomesticAnimal)` are both kept. This shows up as suspiciously-zero collision rates in passes where the model is just generating finer-grained type assertions of already-known entities.
- **Predicate aliases are not collapsed.** `donto:requires` vs `donto:needs` look distinct to the hash.

These are not bugs of the dedup function — they're a known limit of going content-key-only. The standing recommendation is a semantic-dedup pass at the end, before substrate ingest.
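The key layout and its blind spots can be shown without a hashing crate. In this sketch the set stores the `0x1f`-separated byte string directly; the real code hashes that string with SHA256, a step skipped here because the Rust standard library has no SHA256. `key_preimage` and `insert_if_new` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Builds the byte string that the real pipeline would feed to SHA256:
/// subject, predicate and object joined by the 0x1f unit separator.
pub fn key_preimage(subject: &str, predicate: &str, object: &str) -> Vec<u8> {
    let mut buf = Vec::with_capacity(subject.len() + predicate.len() + object.len() + 2);
    buf.extend_from_slice(subject.as_bytes());
    buf.push(0x1f);
    buf.extend_from_slice(predicate.as_bytes());
    buf.push(0x1f);
    buf.extend_from_slice(object.as_bytes());
    buf
}

/// First-write-wins dedup: returns true if the fact was new.
pub fn insert_if_new(
    seen: &mut BTreeSet<Vec<u8>>,
    subject: &str,
    predicate: &str,
    object: &str,
) -> bool {
    seen.insert(key_preimage(subject, predicate, object))
}
```

An exact repeat collides, while `rdf:type` versus `donto:isA` on the same subject and object produces two distinct keys, which is the predicate-alias limit listed above. The separator byte also keeps `("ab", "c", ...)` from colliding with `("a", "bc", ...)`.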

---

## 9. Token + cost accounting

When the LLM endpoint returns `usage` in the response body, the orchestrator merges it into `merged_usage`. The substrate stores the final tally in the audit row's columns:

| Column | Source | Notes |
|---|---|---|
| `prompt_tokens` | sum of `usage.prompt_tokens` across passes | accurate when endpoint returns usage |
| `completion_tokens` | sum of `usage.completion_tokens` across passes | **on Z.AI endpoint, this is an estimate**, currently `passes_succeeded × max_tokens` |
| `total_tokens` | sum | accurate when individual sums are accurate |
| `model` | top-level `model` field of the response (echoed by endpoint) | `z-ai/glm-5-20260211` for current runs |

The "completion_tokens looks like an upper bound" footnote in the cost analysis is exactly this — when the endpoint omits `usage.completion_tokens`, we fall back to `max_tokens × successful_passes`, which is a pessimistic estimate.

### Pricing (OpenRouter `z-ai/glm-5`)

- Input: $0.60 / M tokens
- Output: $1.92 / M tokens

### Empirical per-run cost

| Job | Input | Prompt tk | Completion tk* | Cost (worst case) | Facts | $/fact |
|---|---|---|---|---|---|---|
| Nietzsche, 7 passes (6 succeeded) | 109 words | 33,644 | 48,000* | $0.112 | 1000 | $0.000112 |
| cat is red, 7 passes (7 succeeded) | 3 words | 45,839 | 56,000* | $0.135 | 697 | $0.000194 |

\* completion_tokens estimated at `8000 × passes_succeeded`; the real value is likely 60–80% of that.

A 7-pass deep run costs roughly **$0.11–0.14 per message** at worst-case GLM-5 prices. The cost is dominated by output tokens (~80% of the total in both runs). Reducing passes for short messages, raising max_tokens (so fewer passes truncate-and-redo), and caching the prior-facts prefix are the three biggest cost levers.
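The worst-case figures in the table follow directly from the listed prices and the `max_tokens` fallback. A minimal sketch of that arithmetic, with hypothetical function and constant names:

```rust
const INPUT_USD_PER_MTOK: f64 = 0.60; // OpenRouter z-ai/glm-5 input price
const OUTPUT_USD_PER_MTOK: f64 = 1.92; // OpenRouter z-ai/glm-5 output price
const MAX_TOKENS: u64 = 8000; // per-pass completion ceiling

/// Uses the endpoint's reported completion tokens when present, otherwise
/// falls back to the pessimistic `passes_succeeded × max_tokens` estimate.
pub fn completion_tokens(reported: Option<u64>, passes_succeeded: u64) -> u64 {
    reported.unwrap_or(passes_succeeded * MAX_TOKENS)
}

/// Cost of one run in USD for the given token counts.
pub fn run_cost_usd(prompt_tokens: u64, completion_tokens: u64) -> f64 {
    (prompt_tokens as f64 * INPUT_USD_PER_MTOK
        + completion_tokens as f64 * OUTPUT_USD_PER_MTOK)
        / 1_000_000.0
}
```

For the Nietzsche run this gives `run_cost_usd(33_644, completion_tokens(None, 6))`, about $0.112, of which $0.092 is output. The cat-is-red run comes to about $0.135 with seven successful passes.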

---

## 10. Audit log — `donto_x_memory_job_log`

Every memorize and recall touches this table. The deep mode pipeline writes **two rows per request**:

```sql
-- Row 1: "queued" placeholder, written immediately after the defer decision
INSERT INTO donto_x_memory_job_log
    (job_id, endpoint, status_code, elapsed_ms, request, response, holder, session_id)
VALUES
    ($1, 'POST /memorize (queued)', 202, 0,
     $2,      -- full request body
     $3,      -- {status: queued, queue_id: ...}
     $4, $5);

-- Row 2: "async" final row, written when memorize_one returns
INSERT INTO donto_x_memory_job_log
    (job_id, endpoint, status_code, elapsed_ms, request, response,
     facts_extracted, facts_ingested,
     model, prompt_tokens, completion_tokens, total_tokens,
     error, holder, session_id)
VALUES
    ($1, 'POST /memorize (async)',
     $2,      -- 200, or 500 on failure
     $3,      -- final elapsed_ms
     $4,      -- full request body
     $5,      -- full response body
     $6, $7,
     $8, $9, $10, $11,
     $12,     -- NULL, or the error message
     $13, $14);
```

If the binary restarts mid-run, the startup path stamps any orphaned `(queued)` rows with endpoint `POST /memorize (lost)` and `status_code=500` so they don't appear "still running" forever. This is the marker we saw at 14:44:58 in this session.

Sync-mode requests write **one row** with endpoint `POST /memorize (sync)` or `POST /memorize` depending on path.

The audit table is the source of truth for the `/jobs` index and `/jobs/<id>` detail page.

---

## 11. The `/jobs` UI surface

There are three routes:

| Route | Purpose |
|---|---|
| `GET /jobs` | Paginated index of recent jobs (status, route, holder, text preview, elapsed) |
| `GET /jobs/<id>` | HTML detail page: request payload, response payload, per-pass yields, per-fact table |
| `GET /jobs/<id>/raw` | JSON detail (same shape as the HTML page's underlying data) |

Detail-page features specific to deep mode:

- **Per-pass section headers** — when facts come from multiple apertures (passes), the fact table renders a section header between each `aperture` group: `── pass_1 (132 facts) ──`.
- **"Facts by pass" summary line** — counts per pass at the top of the table for quick scanning.
- **Aperture column** — every fact shows its source pass (`pass_3`, `pass_5`, etc.).
- **Confidence + modality columns** — to help spot the "low-confidence / inferred" speculation cluster.
- **Salvage warnings** (TODO): meant to surface in the pass-yields panel when a pass had `JSON truncated` in its logs. They currently appear in the journal only and should be added to the response body.

The operator endpoint `/jobs` was previously gated by `DONTO_MEMORY_OPS_TOKEN`. That variable is currently set to empty in `/etc/donto-memory/env`, so the page is publicly browsable. Anyone with the URL can read every memorize that has ever happened. That is fine for the current single-user deployment, but it will need a real auth model before multi-tenant use.

---

## 12. What lands in the substrate

Per memorize, the substrate receives:

1. **One episodic record** (`donto:record:<uuid>`) typed via `mem:module/episodic`. The full text + modality + holder + session_id. Bitemporal, visible from `now()`.
2. **N semantic-claim records** (one per fact, post-dedup, post-substrate-policy-gating) typed via `mem:module/semantic-claim`. Each carries:
   - `subject`, `predicate`, `object_iri` or `object_lit`
   - `confidence`, `modality`, `hypothesis_only`
   - `aperture` (= the pass label, e.g. `pass_3`)
   - `source_record_iri` (= the episodic record's IRI from step 1)
   - holder, session_id, consumer_iri (always `ctx:memory`)

The substrate's policy gate can reject a fact (e.g. predicate-domain violation, identity policy violation). The orchestrator counts these as `errors` in the ingest progress log and the audit row's `facts_ingested` will be lower than `facts_extracted`. On the Nietzsche run this gap was 2 facts (998/1000); on the cat-is-red run it was 0.

The substrate also handles federation, sleep-path reconsolidation, and the trust kernel — none of which are deep-mode-specific. They apply to every memorize regardless of mode.

---

## 13. Recall — the other side

Deep mode is a **write** mode. The complementary read mode is `POST /recall`, which:

1. Embeds the query.
2. Runs Reciprocal Rank Fusion across multiple retrievers (BM25, vector, IRI-prefix, holder).
3. Filters by identity lens (which subset of statements is this holder allowed to see).
4. Returns a ranked list with provenance back to the originating episodic record.

The more facts deep mode ingests per message, the higher the recall precision for queries that touch those facts — provided the dedup and identity work properly. Both runs in this analysis have been writes; we have not yet exercised recall against either output corpus.

---

## 14. Observed empirics so far

Two end-to-end deep-mode runs at production parameters (`passes=7`, modality=descriptive, holder=agent:omega-bot):

### Run A — Nietzsche / Eternal Recurrence (109 words)

| Pass | Raw | New unique | Collisions | Elapsed | Notes |
|---|---|---|---|---|---|
| 1 | 132 | 132 | 0 | 124 s | clean cold start |
| 2 | **FAIL** | 0 | 0 | <1 s | prose-not-JSON |
| 3 | 202 | 202 | 0 | 62 s | prior-facts redirect; fastest pass |
| 4 | 204 | 202 | 2 | 127 s | truncated, salvaged |
| 5 | 159 | 157 | 2 | 109 s | |
| 6 | 158 | 156 | 2 | 142 s | |
| 7 | 158 | 151 | 7 | 124 s | highest collision rate |
| **Σ** | **1013** | **1000** | **13** | **854 s** | 998 ingested; elapsed is end-to-end (the passes alone sum to ~688 s) |

### Run B — "cat is red" (3 words)

| Pass | Raw | New unique | Collisions | Elapsed | Notes |
|---|---|---|---|---|---|
| 1 | 10 | 10 | 0 | 46 s | early truncation, salvaged |
| 2 | 121 | 121 | 0 | 112 s | prior-facts redirect |
| 3 | 150 | 42 | 108 | 100 s | 72% collision — model stuck |
| 4 | 127 | 127 | 0 | 105 s | re-energised |
| 5 | 139 | 139 | 0 | 105 s | re-energised |
| 6 | 103 | 94 | 9 | 96 s | |
| 7 | 165 | 164 | 1 | 120 s | |
| **Σ** | **815** | **697** | **118** | **703 s** | 697 ingested; elapsed is end-to-end (the passes alone sum to 684 s) |

### Key contrasts

| | Run A | Run B |
|---|---|---|
| facts/input-word | 9.2 | **232** |
| cost | $0.112 | $0.135 |
| %-truncations (passes truncated) | ~30% | **~86%** |
| pass_2 outcome | prose-fail | 121 unique |
| substrate rejections | 2/1000 | 0/697 |

The non-linearity in facts/word (the 3-word input produces 25× more facts per word than the 109-word input) is the headline finding from the empirics: the model's elaboration capacity is decoupled from input length. Deep mode on tiny inputs is largely *hallucinating consistent ontology*. This is a feature for some use cases (priming recall on common concepts) and a bug for others (signal-to-noise in the substrate).

---

## 15. Known issues & quirks

In rough priority order:

1. **No retry on JSON parse failure.** A single prose-not-JSON failure forfeits the entire pass (14% capacity loss on a 7-pass run).
2. **`max_tokens: 8000` is too low.** Most passes truncate; salvage works but loses the malformed tail. Recommend 12000.
3. **Completion-token usage isn't reported by Z.AI endpoint.** Cost accounting falls back to `passes_succeeded × max_tokens`, an upper-bound estimate, so the logged figure overstates real spend and is best read as a per-message cost ceiling.
4. **String-key dedup misses semantic duplicates.** `(Cat, rdf:type, Animal)` and `(Cat, rdf:type, Mammal)` both land. Identity collapse (`discord:user:ajax` vs `donto:Ajax`) similarly slips through.
5. **Prior-facts window is hard-capped at 300.** Beyond 300 cumulative facts, older facts fall out of the model's context and can be re-derived (then dedup-dropped). For 7-pass runs producing 700-1000 facts this is a meaningful blind spot.
6. **Queue is an in-memory Mutex, not a durable table.** Restart loses in-flight + queued work. Startup marks orphaned `(queued)` rows as `(lost)` but they're gone.
7. **Per-pass tracing visibility lives at `info!` level.** If the binary's `RUST_LOG` defaults to WARN, journalctl shows nothing useful during a run. Verified this session: the original Nietzsche run was silent in the journal because the level filter had not been promoted to `info`.
8. **The `format_prior_facts_block` function has dead code.** `let take = facts.len().saturating_sub(facts.len().saturating_sub(300));` evaluates to `min(facts.len(), 300)`, but the only use of `take` is the `let _ = take;` that follows it. Cosmetic but should be cleaned up.
9. **No backpressure signal to clients.** The 202 response gives a `queue_id` but no `estimated_position` or `estimated_wait`. With a single-permit Mutex and 7-pass runs taking 12–14 min, the 5th queued job will wait ~70 min with no signal.
10. **`/jobs` page is unauthenticated.** Acceptable for single-user; needs auth before multi-tenant.

---

## 16. Roadmap (next concrete changes)

In order from cheapest-fastest-win to deepest:

1. **JSON-mode retry on parse failure** — 5 lines of code, recovers the prose-not-JSON case.
2. **Bump `max_tokens` to 12000** — one constant change in `build_request_body`.
3. **Input-length-aware default `passes` on the bot side** — `passes = clamp(2, ceil(words/25), 7)`. Saves 70-80% on short messages.
4. **Promote tracing level filter** — explicit `RUST_LOG=info,donto_memory=info,donto_memory_core=info` in the systemd unit.
5. **Prompt caching on the prior-facts prefix** — Anthropic-style `cache_control` or OpenAI prefix caching. Cuts input cost ~60% on later passes.
6. **Semantic dedup post-pass** — collapse class hierarchies + identity synonyms before substrate ingest. Saves storage + improves recall precision.
7. **Quality-grounding score per fact** — cheap LLM pass scoring 0–1 for direct-support vs speculation; route below-threshold facts to a `candidate` overlay.
8. **Real queue table** — `donto_x_memory_extract_queue` with `FOR UPDATE SKIP LOCKED`, multiple workers, restart-safe.
9. **Per-pass temperature ramp** — start at 0.2 for pass 1, climb to 0.6 for pass 7. Push novelty harder where novelty is the explicit goal.
10. **Identity-resolution module** — proper alias graph in the substrate so `discord:user:ajaxdavis` ↔ `donto:AjaxDavis` ↔ `donto:Reader` collapse at query time.
11. **Multi-tenant auth** on `/jobs` and `/memorize`.
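Items 3 and 9 are simple enough to sketch. The passes formula is the one given in item 3. The temperature ramp assumes linear interpolation between the stated endpoints (0.2 on the first pass, 0.6 on the last), which is only one possible reading of "climb". Both function names are hypothetical.

```rust
/// Item 3: passes = clamp(2, ceil(words / 25), 7).
pub fn default_passes(words: usize) -> u32 {
    let by_length = (words + 24) / 25; // integer ceil(words / 25)
    by_length.clamp(2, 7) as u32
}

/// Item 9: temperature for `pass_n` (1-indexed) in a run of `passes`.
/// Assumption: a linear ramp from 0.2 on the first pass to 0.6 on the last.
pub fn pass_temperature(pass_n: u32, passes: u32) -> f64 {
    const START: f64 = 0.2;
    const END: f64 = 0.6;
    if passes <= 1 {
        return START;
    }
    let step = (pass_n.clamp(1, passes) - 1) as f64 / (passes - 1) as f64;
    START + (END - START) * step
}
```

Under this formula the 3-word "cat is red" message would get 2 passes instead of 7, and the 109-word Nietzsche message would get 5.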

---

## 17. File map

If you're reading source:

| Concern | File | Lines |
|---|---|---|
| Deep orchestrator | `donto-memory-core/src/extract.rs` | 507–646 |
| Prior-facts block formatter | `donto-memory-core/src/extract.rs` | 961–1003 |
| Per-call HTTP + salvage | `donto-memory-core/src/extract.rs` | 676–870 |
| SINGLE_PROMPT constant | `donto-memory-core/src/extract.rs` | 863–872 |
| Content-key dedup | `donto-memory-core/src/extract.rs` | 112–135 (`content_key`) |
| Route handler + dispatch | `donto-memory/src/api/routes/memorize.rs` | 100–340 |
| `memorize_one` inner engine | `donto-memory/src/api/routes/memorize.rs` | 333–560 |
| Ingest progress logging | `donto-memory/src/api/routes/memorize.rs` | 497–540 |
| Async lock initialisation | `donto-memory/src/main.rs` | 106–133 |
| Job audit log table | migrations: `*_donto_x_memory_job_log.sql` | — |
| Job detail page rendering | `donto-memory/src/api/routes/jobs.rs` | — |

Operator endpoints in production:

- API binds `127.0.0.1:7900` on `donto-db` VM (us-central1-a, apex-494316).
- Caddy fronts it as `https://memories.apexpots.com/...`.
- systemd unit: `donto-memory-api.service` (loaded, enabled, autorestart).
- Audit DB DSN: `postgres://donto:***@127.0.0.1:5432/donto_db`.

---

## 18. Glossary

- **Aperture** — historically a per-perspective extraction prompt (surface, linguistic, presupposition, etc.); in deep mode the term is reused as the pass label (`pass_1` … `pass_7`).
- **Content key** — SHA256 of `(subject, predicate, object)` used for first-write-wins dedup across passes.
- **Episodic record** — the always-stored "this text happened" anchor that semantic claims hang off.
- **Holder** — the identity that owns the memory. `agent:omega-bot` for the Discord bot's auto-memorize.
- **Modality** — `descriptive`, `imperative`, `interrogative`, etc. Currently always `descriptive` from omega-bot.
- **Salvage** — recovering partial JSON when the model truncates at `max_tokens`.
- **Substrate** — the underlying donto data store (statements, predicates, holders, overlays, lenses).
- **Trust kernel / sleep path** — substrate-level subsystems that reconcile and consolidate facts after ingest. Not deep-mode-specific.
- **Aperture yield** — per-pass record: `(aperture, raw_facts, elapsed_ms, error)`. Forms the input to the `/jobs/<id>` page's per-pass breakdown.

---

**Related reports:**

- [Deep-mode extraction on a 109-word Discord message yields 1000 facts](/research/donto-deep-mode-eternal-recurrence-2026-05-31.html) — the empirical case study this engine doc complements
- [donto-memory omega-bot corpus audit (2026-05-30)](/research/donto-memory-omega-bot-corpus-2026-05-30.html) — first-light qualitative read
- [donto — Substrate PRD (2026-05-28)](/research/donto-substrate-prd-2026-05-28.html) — where deep mode sits in the broader substrate architecture
- [donto-memory early activation report (2026-05-28)](/research/donto-memory-report-2026-05-28.html)
