Subject: A single Discord message about Nietzsche’s Eternal Recurrence, posted in #donto, was passed through donto-memory’s new "deep" extraction mode, which runs 7 sequential LLM passes. Pipeline:
omega-bot → /memorize (async, mode=deep, passes=7) → tokio queue → 7× GLM-4.7 → dedup → ingest
Job: d7ed356c-4c95-4ad6-a1e2-22bb2324ad3d
Holder: agent:omega-bot
Session: discord:1349727923434815519:1497274794586931220
ajaxdavis in #donto: It were the height of presumption to attempt to fix any
particular interpretation of my own to the words of this song. With what has
gone before, the reader, while reading it as poetry, should be able to seek
and find his own meaning in it. The doctrine of the Eternal Recurrence
appears for the last time here, in an art-form. Nietzsche lays stress upon
the fact that all happiness, all delight, longs for repetitions, and just as
a child cries "Again! Again!" to the adult who happens to be amusing him; so
the man who sees a meaning, and a joyful meaning, in existence must also cry
"Again!" and yet "Again!" to all his life.
(For reference: this is from Anthony M. Ludovici’s commentary on Thus Spake Zarathustra. The user noted afterward that they “just copied that from a HN thread.”)
| Pass | Raw facts | Surviving (post-dedup) | Collisions | Elapsed |
|---|---|---|---|---|
| 1 | 132 | 132 | 0 | 124 s |
| 2 | FAILED — prose, no JSON | 0 | 0 | <1 s |
| 3 | 202 | 202 | 0 | 62 s |
| 4 | 204 | 202 | 2 | 127 s |
| 5 | 159 | 157 | 2 | 109 s |
| 6 | 158 | 156 | 2 | 142 s |
| 7 | 158 | 151 | 7 | 124 s |
| Σ | 1013 | 1000 | 13 | 854 s (incl. ingest) |
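The dedup stage can be pictured as one set of already-seen facts shared across all passes: each pass's raw facts are checked against it, and every exact repeat counts as a collision. A minimal sketch, assuming facts are compared as exact (subject, predicate, object) string triples; `Triple`, `PassStats`, and `dedup_passes` are hypothetical names, not donto-memory's actual code.

```rust
use std::collections::HashSet;

/// A fact as an exact (subject, predicate, object) string triple.
/// Hypothetical representation for illustration.
pub type Triple = (String, String, String);

/// Per-pass accounting, mirroring the Raw / Surviving / Collisions columns.
#[derive(Debug, PartialEq)]
pub struct PassStats {
    pub raw: usize,
    pub surviving: usize,
    pub collisions: usize,
}

impl PassStats {
    /// Share of this pass's raw facts that duplicated an already-seen fact.
    pub fn collision_rate(&self) -> f64 {
        if self.raw == 0 {
            0.0
        } else {
            self.collisions as f64 / self.raw as f64
        }
    }
}

/// Deduplicates facts across sequential passes, keeping the first
/// occurrence of each triple and counting every later repeat as a collision.
/// A failed pass is represented by an empty Vec, so it contributes nothing.
pub fn dedup_passes(passes: Vec<Vec<Triple>>) -> (Vec<Triple>, Vec<PassStats>) {
    let mut seen: HashSet<Triple> = HashSet::new();
    let mut kept = Vec::new();
    let mut stats = Vec::new();
    for pass in passes {
        let raw = pass.len();
        let mut surviving = 0;
        for fact in pass {
            // `insert` returns false when the triple was already present.
            if seen.insert(fact.clone()) {
                kept.push(fact);
                surviving += 1;
            }
        }
        stats.push(PassStats { raw, surviving, collisions: raw - surviving });
    }
    (kept, stats)
}
```

With this accounting, pass 7's rate is 7 / 158 ≈ 4.4%. Exact-match dedup of this kind only catches literal repeats; the same fact stated with a different IRI for the same entity would survive as "new".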
Observations on the curve:
Pass 2’s worker logged "LLM response decode: error decoding response body" and contributed 0 facts. The model returned a chatty preamble (“Here are the new facts I’ve extracted that go beyond what the previous pass found…”) instead of bare JSON. The orchestrator did the right thing: it logged the failure, appended an empty result to the running set, and continued to pass 3.
Action item: add one JSON-mode retry on parse failure. The model would very likely have produced JSON on a second attempt, and losing one pass out of seven is a 14% capacity loss on a deep-mode run.
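The proposed retry can be sketched as a wrapper around the model call: try once with the normal prompt, and if the response does not parse, make exactly one more call in a JSON-only mode before giving up. This is a sketch under stated assumptions: `call_model` and `parse` are caller-supplied closures standing in for the real GLM client and fact parser, and the `json_mode` flag is a hypothetical switch (whether and how the GLM-4.7 endpoint exposes a JSON mode is not established here).

```rust
/// Outcome of one extraction pass with at most one retry.
#[derive(Debug, PartialEq)]
pub enum PassOutcome<T> {
    Parsed { facts: T, attempts: u32 },
    Failed { last_error: String },
}

/// Calls the model and, on a parse failure, retries exactly once with
/// `json_mode = true`. Both closures are hypothetical stand-ins:
/// `call_model(json_mode)` returns the raw response text, and `parse`
/// turns that text into facts or an error message.
pub fn extract_with_retry<T>(
    mut call_model: impl FnMut(bool) -> String,
    parse: impl Fn(&str) -> Result<T, String>,
) -> PassOutcome<T> {
    // First attempt: the normal prompt.
    let first = call_model(false);
    if let Ok(facts) = parse(first.as_str()) {
        return PassOutcome::Parsed { facts, attempts: 1 };
    }
    // Single retry, asking for bare JSON only.
    let second = call_model(true);
    match parse(second.as_str()) {
        Ok(facts) => PassOutcome::Parsed { facts, attempts: 2 },
        // Still unparseable: report the failure so the orchestrator can
        // log it and append an empty result, as it did for pass 2.
        Err(last_error) => PassOutcome::Failed { last_error },
    }
}
```

Capping the retry at one keeps the worst case at two calls per pass, which matters when a single call can already take two minutes.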
| Facts | Namespace | Covers |
|---|---|---|
| 987 | donto: | ontology entities (concepts, classes, instances) |
| 13 | discord: | Discord-level provenance |
The Discord-shell facts (user, channel, message ID, posted-in) fit in just 13 statements. The other 987 facts are all about the content: Nietzsche, the doctrine, the reader, repetition, joy, the song form, the disclaimer, the rhetorical move, the lexicon.
| Facts | Kind | Meaning |
|---|---|---|
| 42 | descriptive | claims directly recoverable from the text surface |
| 958 | inferred | interpretive moves: "X presupposes Y", "X exemplifies Y", "X conveys Y" |
96% of the facts (958 of 1000) are inferred rather than descriptive. That is what you should expect from a ~110-word input when you ask for “every possible fact”: the literal surface is small, while the interpretive surface is enormous.
| Confidence | Facts | Notes |
|---|---|---|
| 1.00 | 26 | descriptive facts the model is sure about |
| 0.95 | 193 | high-confidence inferences (e.g. "reader has-interpretive-freedom") |
| 0.90 | 361 | median inference confidence |
| 0.85 | 234 | softer inferences |
| 0.80 | 159 | fuzzy / late-pass material |
| 0.75 | 20 | |
| 0.70 | 7 | weakest inferences (typically passes 6–7 social-strategy speculation) |
The model appears to self-calibrate the way you’d hope: the bedrock claims (the author, the channel, the literal phrases) get 0.99–1.00, the doctrinal content makes up the mid-confidence material, and the lowest-confidence facts cluster in the later passes, where the model is stretching for novelty.
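The "median" label on the 0.90 bucket follows from the cumulative counts: counting up from the lowest bucket (7, then 27, 186, 420), the running total first reaches the 500th of 1000 facts inside the 0.90 bucket (781). A small sketch of that computation over a bucketed histogram; `bucket_median` is an illustrative name, not part of donto.

```rust
/// Returns the bucket value that contains the median fact, given
/// (confidence, count) pairs in any order. For an even total this is the
/// lower median (the n/2-th fact in ascending order), which equals the
/// true median whenever the two middle facts share a bucket.
pub fn bucket_median(buckets: &[(f64, usize)]) -> Option<f64> {
    let total: usize = buckets.iter().map(|&(_, n)| n).sum();
    if total == 0 {
        return None;
    }
    let mut sorted = buckets.to_vec();
    // Ascending by confidence; total_cmp gives a total order over f64.
    sorted.sort_by(|a, b| a.0.total_cmp(&b.0));
    // 1-based rank of the (lower) median fact.
    let target = (total + 1) / 2;
    let mut running = 0;
    for (value, count) in sorted {
        running += count;
        if running >= target {
            return Some(value);
        }
    }
    None // not reached when total > 0
}
```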
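| Facts | Predicate | Notes |
|---|---|---|
| 352 | rdf:type | typed instance assertions (ontology growth) |
| 16 | donto:concerns | |
| 14 | donto:usesPhrase | |
| 12 | donto:involves | |
| 11 | donto:expresses | |
| 9 | donto:includes | |
| 8 | donto:requires | |
| 8 | donto:hasObject | |
| 8 | donto:contrastsWith | |
| 7 | donto:produces | |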
rdf:type dominates because deep mode discovers ontology, not just statements. The single sentence “the reader should seek his own meaning” produces donto:Reader rdf:type donto:Agent, donto:Meaning rdf:type donto:Concept, donto:MeaningSeeking rdf:type donto:CognitiveAct, and so on: each newly introduced concept gets typed. After 7 passes the model has minted hundreds of new classes, which is good if you want a rich ontology to grow and bad if you want a controlled vocabulary.
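Breakdowns like the namespace and predicate counts above are plain frequency tallies over the deduplicated triples. A sketch, assuming each fact is an exact (subject, predicate, object) string triple, that the namespace breakdown is keyed on the subject, and that a term's namespace is the text before its first colon; `tally` and `namespace` are illustrative names.

```rust
use std::collections::HashMap;

/// Namespace prefix of a compact IRI, e.g. "donto:Reader" -> "donto".
/// A term without a colon is returned unchanged as its own bucket.
pub fn namespace(term: &str) -> &str {
    term.split(':').next().unwrap_or(term)
}

/// Counts facts per predicate and per subject namespace. Predicate counts
/// come back sorted by descending frequency, ties broken by name.
pub fn tally(
    facts: &[(String, String, String)],
) -> (Vec<(String, usize)>, HashMap<String, usize>) {
    let mut by_predicate: HashMap<String, usize> = HashMap::new();
    let mut by_namespace: HashMap<String, usize> = HashMap::new();
    for (subject, predicate, _object) in facts {
        *by_predicate.entry(predicate.clone()).or_insert(0) += 1;
        *by_namespace.entry(namespace(subject).to_string()).or_insert(0) += 1;
    }
    let mut ranked: Vec<(String, usize)> = by_predicate.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    (ranked, by_namespace)
}
```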
The 1000 facts cluster into four bands. Sampling from each:
Band 1: Discord provenance
discord:user:ajaxdavis rdf:type donto:DiscordUser
discord:channel:donto rdf:type donto:DiscordChannel
discord:message:f58d1d3b… donto:hasAuthor discord:user:ajaxdavis
discord:message:f58d1d3b… donto:postedIn discord:channel:donto
discord:message:f58d1d3b… donto:hasSentenceCount 5
discord:message:f58d1d3b… donto:hasMood donto:DeclarativeMood
discord:message:f58d1d3b… donto:hasRegister donto:FormalRegister
discord:message:f58d1d3b… donto:hasSentiment donto:ReflectiveSentiment
Clean. This is exactly the kind of provenance scaffolding the substrate needs to attach the content to a holder/session/message.
Band 2: doctrinal content
donto:Song rdf:type donto:ArtisticWork
donto:EternalRecurrence rdf:type donto:PhilosophicalDoctrine
donto:DoctrineOfEternalRecurrence donto:hasFinalAppearanceIn donto:ArtForm
donto:Nietzsche donto:laysStressOn donto:Repetition
donto:Repetition donto:isLongedForBy donto:Happiness
donto:Repetition donto:isLongedForBy donto:Delight
donto:Reader donto:readsAs donto:Poetry
These are the doctrinal claims the text actually makes, all at high confidence (0.9–1.0), either descriptive or strong inferences.
Band 3: hermeneutic structure (pass 3+)
donto:Interpretation donto:hasRisk donto:Presumption
donto:Interpretation donto:isFallible True
donto:Song donto:hasMultipleInterpretations True
donto:Song donto:isInterpretivelyOpen True
donto:Existence donto:canHaveJoyfulMeaning True
donto:MeaningConstruction donto:isSubjective True
donto:AjaxDavis donto:presupposes donto:ReaderCompetence
donto:ReaderCompetence donto:includes donto:PoeticSensitivity
donto:PoeticSensitivity rdf:type donto:AestheticCapacity
This is where deep mode earns its keep. Passes 3 and later surface structural claims about hermeneutics, the reader-author relationship, and the rhetorical apparatus; none of them are stated directly in the text, but all are legitimately implied by it. A single-shot extraction is unlikely to reach this layer.
Band 4: speech acts and rhetoric (late passes)
donto:InterpretiveDisclaimer donto:hasIllocutionaryForce donto:DeclarativeAct
donto:InterpretiveDisclaimer donto:servesInterpersonalFunction donto:FaceSaving
donto:FaceSaving rdf:type donto:SocialStrategy
donto:AjaxDavis donto:manages donto:EpistemicReputation
donto:HeightOfPresumption rdf:type donto:IdiomaticExpression
donto:HeightOfPresumption donto:conveys donto:EpistemicHumility
donto:EpistemicHumility donto:contrastsWith donto:Presumption
donto:AjaxDavis donto:usesPhrase "with what has gone before"
donto:ContextualPreparation donto:enables donto:ReaderAutonomy
By pass 7 the model is reaching for speech-act theory, face-saving, idiom analysis, and rhetorical scaffolding. This is genuinely novel material: none of these claims overlap with bands 1–3. The collision rate jumping to 4.4% in pass 7 (7 of 158, up from about 1% in passes 4–6) says we’re scraping the bottom; an eighth pass would likely collide more and add less.
Note that the model invented two equivalent identifiers for the same author:
discord:user:ajaxdavis rdf:type donto:DiscordUser (pass 1)
donto:AjaxDavis rdf:type donto:DiscordUser (pass 5)
donto:AjaxDavis donto:hasUsername "ajaxdavis" (pass 5)
This is a known donto problem (identity alignment is its own family in the spec), but it’s worth surfacing because deep mode amplifies it. Passes 5+ tend to introduce capitalized, domain-style names for entities that earlier passes referred to with namespaced IDs. A post-extract identity-resolution pass (or stricter instructions to reuse prior IRIs verbatim) would help.
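A post-extract identity-resolution pass could start as an alias table that rewrites every known alias to one canonical IRI, followed by another dedup so that facts which now coincide collapse into one. A minimal sketch; the alias table is assumed to be supplied by hand or by some upstream matcher, and `canonicalize` is a hypothetical name, not part of donto.

```rust
use std::collections::{HashMap, HashSet};

pub type Triple = (String, String, String);

/// Rewrites subjects and objects through `aliases` (alias -> canonical IRI),
/// then drops facts that became exact duplicates. Returns the surviving
/// facts in their original order plus the number of facts merged away.
pub fn canonicalize(facts: &[Triple], aliases: &HashMap<&str, &str>) -> (Vec<Triple>, usize) {
    let resolve = |term: &str| -> String {
        aliases.get(term).copied().unwrap_or(term).to_string()
    };
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for (s, p, o) in facts {
        // Predicates are left alone; only entity positions are aliased.
        let fact = (resolve(s.as_str()), p.clone(), resolve(o.as_str()));
        if seen.insert(fact.clone()) {
            out.push(fact);
        }
    }
    let merged = facts.len() - out.len();
    (out, merged)
}
```

A global table handles donto:AjaxDavis cleanly, but not donto:Reader, which only sometimes refers to the author; those cases need a per-fact decision rather than a blanket rewrite.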
We had to raise the HTTP client timeout from 180 s to 900 s mid-experiment. Typical passes fit under the old 180 s ceiling (pass 1 took ~125 s, pass 4 ~127 s, and the slowest completed pass, pass 6, 142 s), but a worst-case pass ran into it and failed. Under load, Z.AI’s GLM-4.7 endpoint can take more than 3 minutes to return a long JSON response. A 15-minute (900 s) client timeout gives more than 6× headroom over the slowest completed pass in this run.
Cloudflare cuts off proxied HTTP requests at ~100 s, so a synchronous 854 s /memorize call would always fail with a 524. The new pipeline:
1. /memorize validates the input, writes a (queued) row to donto_x_memory_job_log, and immediately returns 202 with {queue_id, status:"queued"}.
2. A background task acquires async_memorize_lock (a single-permit Mutex, so one extraction at a time) and runs memorize_one to completion.
3. The client polls /jobs/<queue_id>/raw for progress.
The single-permit lock matters: deep mode is expensive, and running two deep jobs in parallel against the same GLM endpoint would burn quota for very little wall-time speedup.
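The accept-then-work shape of the async /memorize can be sketched without tokio or HTTP: a submit call records a queued row and returns an ID at once, while a single worker drains jobs one at a time, which gives the same one-extraction-at-a-time property as the single-permit lock. This sketch swaps in std threads and an mpsc channel for the real tokio queue; `JobQueue`, `submit`, `status`, and `shutdown` are hypothetical names, and the in-memory map stands in for donto_x_memory_job_log.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

#[derive(Clone, Debug, PartialEq)]
pub enum Status {
    Queued,
    Running,
    Done { facts: usize },
}

/// Hypothetical stand-in for the job log plus the background worker.
pub struct JobQueue {
    tx: Option<Sender<(u64, String)>>,
    log: Arc<Mutex<HashMap<u64, Status>>>,
    next_id: u64,
    worker: Option<JoinHandle<()>>,
}

impl JobQueue {
    /// `extract` stands in for memorize_one. It runs on one worker thread,
    /// so at most one extraction is in flight at any time.
    pub fn new(extract: impl Fn(&str) -> usize + Send + 'static) -> Self {
        let (tx, rx) = channel::<(u64, String)>();
        let log = Arc::new(Mutex::new(HashMap::new()));
        let worker_log = Arc::clone(&log);
        let worker = thread::spawn(move || {
            // Jobs are received and processed strictly in submission order.
            for (id, text) in rx {
                worker_log.lock().unwrap().insert(id, Status::Running);
                let facts = extract(text.as_str());
                worker_log.lock().unwrap().insert(id, Status::Done { facts });
            }
        });
        JobQueue { tx: Some(tx), log, next_id: 0, worker: Some(worker) }
    }

    /// Mirrors the 202 path: record a queued row, then return the ID at once.
    pub fn submit(&mut self, text: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.log.lock().unwrap().insert(id, Status::Queued);
        self.tx
            .as_ref()
            .expect("queue is shut down")
            .send((id, text.to_string()))
            .expect("worker thread has exited");
        id
    }

    /// What a client polling the job endpoint would see.
    pub fn status(&self, id: u64) -> Option<Status> {
        self.log.lock().unwrap().get(&id).cloned()
    }

    /// Closes the queue and waits for the worker to finish pending jobs.
    pub fn shutdown(&mut self) {
        self.tx.take(); // dropping the sender ends the worker's loop
        if let Some(handle) = self.worker.take() {
            handle.join().unwrap();
        }
    }
}
```

Recording the queued row before handing the job to the worker means a poll can never observe a job ID that has no row yet.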
The per-pass info! lines didn’t reach journalctl during this run because the tracing subscriber’s effective level filter was raised to WARN at the binary level. The work completed fine (the audit row in donto_x_memory_job_log confirmed facts=1000), but observability during the run was thin. Fix: set RUST_LOG=info,donto_memory=info,donto_memory_core=info explicitly in the systemd unit, and make sure the tracing init at donto-memory/src/main.rs:60 doesn’t get clipped.
- max_tokens, 8000 → 12000: the largest raw-yield pass (pass 4, at 204 facts) was approaching the token ceiling, and some facts had to be salvaged from truncated JSON by the extract.rs salvage path.
- Identity resolution: merge discord:user:ajaxdavis / donto:AjaxDavis / donto:Reader (in the cases where Reader refers to ajax) before ingest. Alternatively, instruct the model: “use the exact IRIs from prior facts; do not create new IRIs for entities already introduced.”
- Ingest accounting: ingested=998 vs unique=1000. What was rejected, and why?
- Passes per modality: a ~110-word philosophical passage benefits enormously from 7 passes; a 3-word reply like “yeah for real.” does not. The omega-bot client should pick passes based on input length (e.g. max(1, min(7, Math.ceil(wordCount / 30)))).
Full job detail + 1000 facts: /jobs/d7ed356c-4c95-4ad6-a1e2-22bb2324ad3d (raw JSON: append /raw)
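The input-length heuristic suggested above for choosing passes, ported from its JavaScript-style expression to Rust for consistency with the other sketches here. The function name is illustrative, and counting words by whitespace splitting is an assumed definition of wordCount.

```rust
/// Picks a pass count from input length: one pass per ~30 words,
/// clamped to 1..=7. Mirrors max(1, min(7, Math.ceil(wordCount / 30))).
pub fn passes_for(text: &str) -> u32 {
    let words = text.split_whitespace().count() as u32;
    // Integer ceiling division: ceil(words / 30).
    let by_length = (words + 29) / 30;
    by_length.clamp(1, 7)
}
```

As written, a divisor of 30 gives a ~110-word passage 4 passes (ceil(110 / 30) = 4), not the 7 used in this run; reaching 7 at that length would take a divisor of about 15 (ceil(110 / 15) = 8, clamped to 7).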
Related context: donto substrate PRD, donto-memory omega-bot corpus audit.