# Extraction Engineering for Generative Abundance: Provider Rotation, Gleaning Loops, and the Coverage-not-Count Principle

*donto research report — 2026-06-04 — operational companion to [donto — The Substrate for Generative Abundance](donto-abundance-2026-06-02.html)*

> **Scope and honesty contract.** This is an engineering report, not a pitch. Every fact count, anchor rate, and namespace split below was re-verified **live** against `donto_statement` and `donto_evidence_link` on `donto-pg` on 2026-06-04 (read-only `SELECT`s; donto invariant I3 honored, nothing mutated). Where a number comes from a run log or an analysis script rather than the database, it is **labelled as such** — and the run-logs themselves (`run_codex_glean.py` outputs, `glean-spark.log`, `glean_smoke.out`, the v2-citer source) are preserved on the box, not asserted from memory. Where a claim is an emerging finding rather than a settled measurement, it is flagged. This is an **n=1 single-source study** (one article, EntryId 23778) — the findings are *directional engineering evidence*, not a population benchmark; that caveat applies to the whole report and is not repeated at every line. The companion report [The Cerebras / Codex Bake-off](donto-cerebras-bakeoff-2026-06-04.html) covers provider economics and the 5-way bake-off in more depth; this report adds the **gleaning loop**, the **count-vs-coverage saturation principle**, and the **"why 7×" decomposition** — the new material.

---

## 1. Executive summary

donto's thesis is that **generating typed knowledge is no longer the scarce step.** A guided frontier LLM emits an essentially unbounded, multi-directional space of evidence-anchored claims about any entity for fractions of a cent. The engineering question is therefore not *"can we generate enough?"* but *"how do we drive a model to extract **maximally** (exhaust the meaningful content of a source) and **sustainably** (keep the firehose running under real-world quota limits), while keeping each claim anchored to its source — and defer typing, alignment, and identity resolution to query time?"*

This session answered that question with measurements rather than assertions. The settled findings:

- **Provider economics force rotation, not a single lane.** The operator runs on flat-subscription / prepaid lanes only (per-token API is TOS-clean but unaffordable at this volume). Every such lane hard-caps the sustained firehose — z.ai's GLM coding sub hit its weekly cap (error 1310, reset 2026-06-10), Cerebras PAYG returned HTTP 402 (account out of credit), and ChatGPT-Pro/Codex carries a hidden weekly cap plus datacenter-IP ban risk. The answer is a **multi-lane router that rotates by leftover quota**, with a *declarative* cap-detection registry. Built and tested: the whole `donto-extract` repo suite is **91 tests passing**; the lanes module's own file `tests/test_lanes.py` holds **17 tests**.

- **A 5-way model bake-off on one source.** On the same article (EntryId 23778), five extraction configurations produced wildly different live fact counts and anchor rates. The **Codex-CLI runs anchored best** (76.5%, 86.4%, and up to 99.1% after gleaning) versus the opencode runs (47.2%, 69.6%) — but with an important harness caveat (below).

- **Models self-stop by *choice*, not by *budget*.** Three baseline single-shot runs landed at {496, 511, 426} facts; the gleaning loop that followed them was **never quota-capped** (the spark glean log records `usage_capped: false`, stopping on the pass cap with its last pass still adding 616 new keys). The model judges itself "done" long before it is exhausted. The supporting per-run reasoning-token figures (e.g. a 511-fact run reportedly spending ~587 reasoning tokens) come from run-logs, not the DB — see §5.1 and the appendix.

- **A harness gleaning loop fixes this.** `model_reasoning_effort=xhigh` + a resume-and-re-prompt loop ("you missed many, append ≥150 new, do not repeat") raised one gpt-5.4 run from **511 → 1,915 facts at 99.1% anchored, a single clean namespace, and 0.00% exact-duplicate triples**. A spark run reached **3,227 facts in 6 passes** and was *still climbing* (last pass added 616 new keys) — it stopped on the pass cap, not on saturation.

- **Count is the wrong target; coverage / saturation is right.** Pushing a raw count floor makes models **pad with noise**: a gpt-5.4 run chasing volume degenerated into garbage predicates (`containsCommaCharacter`, `answerWordCount`, `coordinateMentionsLatitudeLabel`). The disciplined spark glean (3,227 facts, ~90% camelCase, single namespace, 0 exact-dups) is *healthier* than count-chasing. **Abundance is not noise.**

- **The headline "7× more facts" from glm decomposes to ~1.4–1.7× real unique knowledge.** The naive ratio (glm 3,590 ÷ codex 511 = 7.0×) is mostly harness artifact: glm's run was **two un-deduped merged runs** under two near-disjoint namespaces, plus subject-level dual-namespace identity redundancy (31.8% of facts) and bare-flag restatements (18.5%). Raw counts are **not comparable across harnesses**.

The through-line: **maximize *meaningful* evidence-anchored coverage, anchor as a separable stage, rotate providers to stay alive, and let the substrate fold identity/typing at query time.** That is the abundance vision, operationalized and measured.

---

## 2. The abundance premise

For sixty years the bottleneck in every knowledge system was *generation* — a human had to author each typed fact. That scarcity is gone. The literature is unambiguous: GPTKB extracted ~105M triples at ~$0.00009 each; AutoSchemaKG built a 900M-node schema-free graph; per-token costs are falling roughly an order of magnitude per year. **Generation is now the cheap, abundant step.**

When generation is abundant, three sub-problems become the real work:

1. **Maximize meaningful yield.** A model left to its own devices stops extracting when it *feels* done, not when the source is exhausted (Section 5 shows the run was not budget-bound). Getting the last large fraction of a source's content out requires deliberate harness engineering.

2. **Keep every claim anchored.** A fact without a retrievable source span is, for donto's evidence-first model, a liability. Anchoring is a measurable quality axis independent of count (Sections 4–5), and it is best handled as a **separate stage** (Section 8).

3. **Defer joining.** Models invent predicates and entity IRIs as they go. Two runs will describe the same entity under `ex:` and `exo:`; one model will mint `birthPlace`, another `placeOfBirth`. donto's design answer is **emit free / untyped now, reconcile by similarity at query time** — never a hand-maintained synonym table. Section 7 shows this is not just philosophy: most of glm's apparent "7×" advantage *is* exactly this kind of identity redundancy, which query-time alignment is designed to fold.

This report is the operational companion to that vision: how to actually drive the firehose.

---

## 3. Provider economics — why every flat-sub lane caps, and the rotation answer

### 3.1 The constraint

The operator's hard constraint is cost. Per-token public APIs (OpenAI, Anthropic, etc.) are **TOS-clean and uncapped**, but **unaffordable** at sustained extraction volume. So extraction runs on **flat-subscription or prepaid lanes** — and *every one of them hard-caps the firehose*. Observed **live** this session:

| Lane | Cap mechanism (observed) | State this session |
|---|---|---|
| **z.ai GLM coding subscription** | Weekly cap → error **1310** (returned over HTTP 429) | Capped; reset **2026-06-10 16:43:35** |
| **Cerebras PAYG** | Out of credit → **HTTP 402** `payment_required` (account-wide) | Capped |
| **ChatGPT-Pro / Codex CLI** | Hidden weekly cap (~1–2 days at volume) + GCP-datacenter-IP ban risk for automated non-coding use | The only live route at time of writing |

> **TOS nuance (stated plainly).** The per-token public APIs are the *clean* path — they are simply unaffordable here. The subscription lanes are affordable but carry real terms-of-service exposure: the z.ai/GLM "coding" subscription is intended for *coding* assistance, and driving a 24/7 non-coding extraction firehose through it (or through ChatGPT-Pro from a datacenter IP) is **TOS-risky and an expiring subsidy**, not a durable foundation. The router below is an *availability* mechanism for research on a budget; it is **not** an endorsement of using a coding subscription as a production extraction backend. The durable answer is paying per-token (or self-hosting) once the work is funded.

The economic logic is inescapable: a flat subscription is priced for *interactive coding*, not a 24/7 extraction firehose. Any single lane will throttle. **The answer is to rotate lanes by leftover usage** — when one caps, jump to the next with quota remaining.

### 3.2 The multi-lane router

Built at `donto-extract/src/donto_extract/lanes/`. Design principles, all consistent with the no-brittle-logic rule (verified in the source this session):

- **Declarative cap-detection registry.** Each lane declares its cap *signatures* as a `CapSignature` data record (`registry.py`), not as an `if/elif` ladder buried in driver code: z.ai's `body_code=("1310",)`, Cerebras's `http_status=(402,)`, codex's usage-snapshot cap. The matcher in `caps.py` "matches the *data*" — there is no `if lane == ...` branch. New lanes/signatures are added declaratively.
- **Shared-pool awareness.** Codex exposes four model lanes that all carry `pool="codex"` (verified in `registry.py`) — they draw from **one** ChatGPT-Pro quota pool. So failover *within* Codex buys no headroom — when one Codex model caps, *all four* are capped. The registry's `pool` field is what lets the router know **failover must jump pool**, not just model. (The two Cerebras lanes likewise share `pool="cerebras"`.)
- **Non-cap discrimination.** A Codex `sunset-400` is *not* a quota cap (it's a deprecated-model signal) — the registry/matcher is built to treat it as not-a-cap so the router doesn't waste a failover hop on it.
- **CLI surface.** Verified flags in `lanes/cli.py`: `--lane`, `--auto`, `--context`, `--json`, `--no-probe`, `--timeout`. **There is no literal `--status` flag** — lane status is read through the registry view, not a top-level flag. (An earlier internal note claimed a `--status` flag and a router "91-test" count; both are corrected here — see the test footprint below.)

**Test footprint (corrected for honesty).** Verified by running the suite this session: the **whole** `donto-extract` repo suite is **91 tests passing** (`pytest tests/ -q` → `91 passed`); the lanes module's own file `tests/test_lanes.py` contains **17 tests**. The router is **not** independently "91 tests" — that figure is the entire repo. Both numbers are now correct in this report.

The rotation strategy is what converts a set of individually-capped lanes into a *sustained* firehose. None of it changes extraction *quality* — that's the next section.

---

## 4. The 5-way model bake-off

### 4.1 Read these caveats first

1. **Two different agentic harnesses.** Configurations A/B/C ran under **opencode**; D/E ran under the **Codex CLI**. These are different agent drivers. Extraction *quality* (anchor rate, namespace cleanliness) is comparable across them; **raw speed and raw count are not like-for-like.** Do not read the count column as a clean model ranking.
2. **glm's 3,590 was two merged runs.** The glm-4.7@Cerebras context is **not** one extraction — it is two un-deduped runs concatenated under two near-disjoint namespaces (live: `exo:` 2,327 + `ex:` 1,258 facts; only **30** distinct local subject-names appear in *both*, ~6% overlap). Treat its 3,590 as a *merged* figure, not a single-run yield. Section 7 decomposes this in full.

### 4.2 The table (all numbers live-verified 2026-06-04)

| # | Model / config | Harness | Live facts | Subj | Pred | Anchored | Anchor % | Namespaces |
|---|---|---|---|---|---|---|---|---|
| A | gpt-oss-120b @ Cerebras | opencode | 320 | 145 | 95 | 151 | **47.2%** | 1 (`ex:`) |
| B | glm-4.7 @ Cerebras *(two merged runs)* | opencode | 3,590 | 486 | 1,852 | 2,498 | **69.6%** | 5 (dom. `exo:`+`ex:`) |
| — | glm-4.7 @ z.ai | opencode | — | — | — | — | — | *never ran — capped (code 1310)* |
| D | gpt-5.4 (codex-normal) | Codex CLI | 511 | 119 | 292 | 391 | **76.5%** | 1 (`ex:`) |
| E | gpt-5.3-codex-spark (single run) | Codex CLI | 426 | 78 | 213 | 368 | **86.4%** | 1 |

Contexts (live): `ctx:test/cerebras-gptoss/23778`, `ctx:test/cerebras-glm/23778`, `ctx:test/codex-normal/23778`, `ctx:test/codex-spark/23778`. The z.ai glm lane (`ctx:test/zai-glm/23778`) has **0 rows** — it never executed because the lane was capped, an unintended but instructive demonstration of Section 3.

### 4.3 What the table says

- **Anchoring leadership is real and belongs to the Codex-CLI runs.** 76.5% and 86.4% (single-run) beat the opencode runs' 47.2% and 69.6%. A supplementary Codex smoke run (`ctx:test/codex-smoke/23778`, 496 facts) anchored at **99.2%** (492/496, live-verified). The Codex harness simply tracks its citations better.
- **Count alone is misleading.** glm's 3,590 looks dominant but is two merged runs riddled with dual-namespace identity redundancy (Section 7). The single-run Codex configs are *smaller but cleaner* (one namespace, higher anchoring).
- **glm's 1,852 distinct predicates** vs Codex's ~290 is the abundance signature — and the alignment problem — in miniature: free-minted predicates, to be reconciled at query time, not pruned at write time.

---

## 5. The gleaning loop — models self-stop by choice, not budget

### 5.1 The evidence that "done" is a judgment, not a ceiling

Three baseline single-shot Codex runs on the same source landed at **{496, 511, 426} facts**. The substantive claim — that the model is *satisficing*, not running out of budget — is supported two ways:

- **Directly, from the live glean log (the strongest evidence).** The 6-pass spark glean records `"usage_capped": false` and `"stop_reason": "max_passes (6)"` with its **final pass still adding 616 new keys**. The run was *not* throttled; it would have kept producing. This is verified in `glean-spark.log` on the box.
- **From per-run token accounting (run-log, not DB).** Internal notes record the {496,511,426}-fact runs against token budgets of roughly {127.6k, 104.1k, 89.1k}, with the 511-fact run reportedly spending only **~587 reasoning tokens** — i.e. count does not track spend. **This specific per-run reasoning-token figure is a run-log observation that the surviving logs did not let me re-derive line-for-line; treat it as directional, not DB-verified.** It is consistent with, but weaker than, the `usage_capped:false` evidence above.

Either way the operational insight holds: left alone, a frontier model extracts what it considers a reasonable, representative set and stops — it *satisfices*. For donto's maximal-extraction goal that is a bug, not a feature.

### 5.2 The fix: xhigh effort + a harness resume-and-re-prompt loop

Two levers, applied together:

1. **`model_reasoning_effort = xhigh`** — push the model to think harder per pass. (Verified in the spark run's rollout: all 10 turn-context efforts were honored as `xhigh`.)
2. **A harness gleaning loop** — `codex exec --resume <session-id>`, re-prompting each pass with coverage framing: *"you missed many; append ≥150 NEW facts; do not repeat what you already emitted."* Stop when a pass yields **<30 new keys twice in a row** (a saturation gate), or on a max-pass cap.

The framing matters: telling the model it *missed* things and asking for *coverage* (not "more facts") is what overrides the satisfice instinct.

### 5.3 Results (live-verified)

**gpt-5.4 glean** — `ctx:test/codex-glean-smoke/23778`:
- **511 → 1,915 facts** (a 3.7× lift over the single-shot baseline), **2 passes**.
- **99.1% anchored** (1,897 / 1,915, live-verified) — anchoring went *up*, not down.
- **Single `ex:` namespace; 0.00% exact-duplicate triples** (verified: 1,915 distinct subject+predicate+object triples, zero exact repeats).
- 376 subjects / 398 predicates. Wall time ~**2,083 s (~34.7 min)** (run-log). Caveat from the log: pass 1 hit the 1,500 s timeout (`timed_out: true`) yet still appended 1,298 facts before the cut; pass 2 added 617 in 583 s.

**spark glean** — `ctx:test/codex-glean-spark/23778`:
- **3,227 facts in 6 passes.** Per-pass cumulative keys (from the log): **1,245 → 1,922 → 2,074 → 2,267 → 2,611 → 3,227** (new keys per pass: 1,245 / 677 / 152 / 193 / 344 / **616**).
- **It stopped on the max-pass cap, NOT on the dry gate** (`stop_reason: max_passes (6)`, `usage_capped: false`) — the *last* pass added 616 new keys. The source was **not saturated**; there was more to extract.
- 50.0% self-anchored (1,613 / 3,227, live-verified) — lower, addressed by the post-hoc citer in Section 8. **~90% camelCase** predicates (live + log agree: **280 / 306 distinct = 91.5%**; **2,897 / 3,227 facts = 89.8%**). Single `ex:` namespace; 0.00% exact-duplicate triples. Wall time ~**1,306 s (~22 min)** (run-log).

### 5.4 Why the gpt-5.4 glean is the right exemplar

`511 → 1,915` is the cleanest "maximize meaningful yield" result in the whole set: a **3.7× count lift** that *simultaneously* **raised** anchor coverage to 99.1%, kept a **single clean namespace**, and produced **zero exact-duplicate triples**. That is meaningful abundance — more real anchored claims, not padding. It is the pattern the engine should default to.

---

## 6. Count is the wrong target — coverage and saturation are right

### 6.1 The cautionary tale

When the loop was instead given a raw **count floor** ("emit thousands"), a gpt-5.4 run obeyed — and **degenerated into noise.** The predicate tail collapsed into trivially-true string-property assertions — predicates like `answerWordCount`, `containsCommaCharacter`, `containsApostropheCharacter`, and (this source carries lat/long) `coordinateMentionsLatitudeLabel`. These are not knowledge *about the entity*; they are the model manufacturing filler to hit a number.

> **Caveat (honesty — this one is NOT a live DB context).** This padded run was **not** retained in live `donto_statement` (consistent with it being a rejected output). So its raw line count (recorded in run-logs around ~7,500–8,400 valid lines / ~560 distinct predicates) and the specific garbage-predicate names are **harness-log / on-disk observations, not a queryable context** — they are presented as observed-during-the-run, and I do not claim a precise live figure for it. The *contrast* it illustrates, however, rests entirely on live data: the disciplined spark glean's clean profile (**89.8% camelCase, 0.00% exact-dup, single namespace** — all verified above) is in the database.

### 6.2 The principle

**Abundance ≠ noise.** A count floor optimizes the wrong objective and the model games it. The right target is **exhaustive *meaningful* coverage, with saturation deciding "done"** — the operator's own framing: *"not an absolute number, just everything possible."*

The two glean runs make this concrete:
- The **spark glean (3,227, ~90% camelCase, single namespace, 0 exact-dups)** is *healthier* than a count-chasing run padded toward thousands of `containsCommaCharacter`-grade lines — even with fewer facts.
- And it *should* have kept going: it stopped on the pass cap with its last pass still adding 616 new keys. Saturation, not a number, is the correct stop condition — and here saturation had **not** been reached.

The operational rule that falls out: **drive to saturation (a falling new-key curve), reject count floors, and let the predicate-cleanliness profile (camelCase share, exact-dup rate, namespace count) be a live quality monitor.**

---

## 7. The "why 7×" decomposition — measured

glm's 3,590 vs codex-normal's 511 is a naive **7.0×**. Decomposed against live data, the *real* unique-knowledge multiple is **~1.4–1.7×**. The factors (DB-verified except the one row explicitly marked):

| Factor | Multiplier | Evidence |
|---|---|---|
| **Harness — two merged runs** | ~1.9× | glm context is two un-deduped runs: `exo:` 2,327 + `ex:` 1,258 facts (live); only **30** of ~483 distinct local subject-names appear in *both* namespaces (~6% overlap → near-disjoint, never deduped) |
| **Granularity / redundancy** | ~1.5× | **31.8%** of glm facts (**1,140 / 3,590**, live) sit on subjects whose local name exists in **both** `ex:` and `exo:` — the *same entity described twice under two IRIs*. Plus **18.5%** (**663 / 3,590**, live) bare `true`/`1`/`yes` flag restatements |
| **Q&A reification** | ~1.1× | The source is a Q&A transcript; glm reified questions/answers as extra subjects |
| **Real extra source coverage** | ~1.45× | Span-union analysis: glm touched ~93.3% of source characters vs codex ~64.6% — **analysis-script figure, not recomputable from DB counts alone (see §7.1)** |

**Net:** ≈ 1.9 × 1.5 × 1.1 × 1.45 ≈ 4.5× of "apparent" advantage is artifact; the **honest unique-knowledge multiple is ~1.4–1.7×.** (These four factors are not cleanly orthogonal, so the decomposition is an estimate, not an identity — but every input number is sourced.)

### 7.1 Two precisions that matter for honesty

- **The dual-namespace figure is *subject-level* redundancy, not exact-triple restatement.** Live: **31.8%** (1,140/3,590) of facts are on subjects appearing under *both* IRIs — an identity-resolution tax. The *exact* subject+predicate+object dual-namespace duplication is only **~1.2%** (42 facts / 21 shared keys, live). Both are real; this report uses the right one in each place. The 31.8% is precisely what donto's **query-time identity alignment** is designed to fold — it is not waste, it is *deferred joining made visible*.
- **The 93.3%-vs-64.6% source-character coverage is the one number not recomputable from the DB alone** — it needs the source text plus span char-offsets, so it is a **span-union measurement from the analysis script**, labelled as such. The anchor-span *counts* that feed it **are** DB-verified (glm 2,498 anchored facts; codex-normal 391).

### 7.2 The lesson

**Raw counts are not comparable across harnesses.** glm's "7×" is mostly merged-run concatenation plus dual-namespace identity redundancy — exactly the kind of thing the substrate reconciles at query time. Meanwhile **Codex stops by *choice*, not budget**, and its single disciplined namespace means multi-session UNION dedups *cleanly* — no `ex:`/`exo:` identity-duplication tax to pay later. The right target remains **exhaustive meaningful coverage judged by saturation, with identity and typing reconciled downstream** — donto's emit-free / defer-joining thesis, validated by the numbers.

---

## 8. Forward: always-on post-hoc citing (emerging — honest)

The gleaning loop maximizes *yield*; anchoring is a *separate* quality axis, and the cleanest way to handle it is to **separate extraction from anchoring** — let the model extract freely, then run an **always-on post-hoc citer** that locates a supporting span for each emitted fact.

**v1 result (verified two ways).** The citer lifted the spark glean's anchoring substantially. Measured against the *citer's own input fact file*, it raised anchor coverage **47.1% → 90.2%**. Measured the donto-native way — distinct live self-anchored statements in `donto_statement`/`donto_evidence_link` — the re-ingested context `ctx:test/spark-cited/23778` (3,229 facts; the re-ingest added 2 vs the 3,227 source) goes from **50.0% → 91.9% anchored (2,969 / 3,229, live-verified)**. (The two bases — input-file anchor rate vs DB self-anchor rate — differ slightly; both are reported so neither is cherry-picked.) Either way it is a large, real coverage win, and it confirms the architectural bet: anchoring as a downstream stage works.

**v1 problem (honest, do not overclaim).** An adversarial audit found that a *locatable* citation is not always a *supporting* one. On a small, deliberately relational-skewed sample, roughly **~40–46% of the recovered *relational* citations were wrong** — the citer found a span that *mentions* an entity but does not *support the asserted relation* (e.g. `question-34 askedBy mr-watts` anchored to a different question's "By Mr. WATTS:" line because both share the token `watts`). The ~40% figure is documented in the v2-citer source header; the ~46% figure comes from an n=30 adversarial-judge sample (relational subset n≈26, ~12 wrong) recorded in the run-logs. This is a **worst-case adversarial probe on a curated relational sample, not a population rate** — most facts are simpler attributive/literal claims the citer handles correctly (those anchored fine in v1). But it is a genuine correctness gap, and the exact rate is a run-log/audit figure, not a DB-verified population statistic.

**v2 (in progress).** An **instance-aware co-location gate** (real, in-tree at `cite_facts_v2.py`): classify each fact *structurally* by object type — literal objects keep the v1 lexical layer; IRI (relational) objects must find a span that **co-locates BOTH endpoints** (the subject's *distinguishing* token, weighted by inverse batch document-frequency — IDF computed from the data, never a hand-maintained stopword/synonym list — AND the object's value token) in the *same* window, else route to semantic, else mark `unanchorable` rather than attach a plausible-looking neighbour. **A wrong span is worse than none.** This is **emerging work, not a settled result** — its task is in progress this session and **no precise post-fix correctness number is claimed here.**

The takeaway for the architecture: **anchor as a separate, auditable stage; measure not just coverage but correctness; and treat "locatable ≠ supporting" as a first-class problem.**

---

## 9. What this means for the substrate

Operationalizing generative abundance comes down to four engineering commitments, each now backed by live measurement:

1. **Maximize *meaningful* coverage, not count.** Models satisfice and self-stop *while still producing* (the spark glean was `usage_capped: false` with a 616-new-key final pass). Drive them to **saturation** with xhigh effort + a resume/re-prompt gleaning loop and coverage framing. The exemplar: **511 → 1,915 facts at 99.1% anchored, single namespace, 0 exact-dups.** Reject count floors — they produce `containsCommaCharacter`-grade noise.

2. **Anchor as a separable stage.** Extraction and citation are different quality axes. An always-on post-hoc citer lifted anchoring **50.0% → 91.9%** (DB-verified) — but "locatable ≠ supporting," so the citer needs a co-location correctness gate (v2, in progress). Measure coverage *and* correctness.

3. **Rotate providers to stay alive.** Every flat-sub/prepaid lane caps (z.ai 1310, Cerebras 402, Codex weekly + IP risk). A declarative, pool-aware multi-lane router (91-test repo, 17 lane-specific tests) converts individually-capped lanes into a sustained firehose. This is purely an *availability* mechanism — it does not alter quality, and it is a budget-research expedient, not a TOS-clean production backend (§3.1).

4. **Defer alignment and typing to query time.** Raw counts are not comparable across harnesses, and most of glm's "7×" advantage is identity redundancy (31.8% of facts on dual-IRI subjects, two merged runs) — *exactly* what query-time identity alignment is designed to fold. Emit free, untyped, multi-namespace now; reconcile by similarity later. Do **not** maintain synonym tables or namespace-merge maps by hand.

The data demonstrates the abundance thesis three ways: the gleaning loop **raises yield 3.7× while raising anchor coverage to 99.1%** and keeping one clean namespace (meaningful, not padded); the **saturation principle beats a count floor** (clean spark glean > degenerate count-padding); and the **why-7× decomposition** shows the substrate's deferred-joining design is not a hedge but the *right* place to absorb the identity/typing tax that abundant generation necessarily produces.

Generation is cheap. The engineering — and the win — is in maximizing meaningful, anchored coverage and pushing everything else to query time. This is **n=1, directional**: one source, one session. The next step is to repeat it across many sources, domains, and models before any of these multipliers is treated as a constant.

---

### Appendix — verification method

All live counts: `donto_statement WHERE upper(tx_time) IS NULL` with exact `context =` equality on `donto-pg`, 2026-06-04. **Anchored** = distinct live statements whose `statement_id` is the source of ≥1 live `donto_evidence_link` (`upper(tx_time) IS NULL`), via `LEFT JOIN` on a distinct-`statement_id` subquery. `statement_timeout` 120–300 s. Read-only `SELECT`s only; donto invariant **I3** (no destructive overwrite) honored — nothing mutated. The 91-test suite was re-run this session (`pytest tests/ -q` → `91 passed`); the lanes registry/cap/CLI claims were read from `donto-extract/src/donto_extract/lanes/{registry,caps,cli}.py`.

**Figures that are NOT recomputable from `donto_statement` and are labelled so in-text:** per-pass new-key counts, wall times, and the saturation/`usage_capped` flags come from `glean-spark.log` and `glean_smoke.out` (run-logs preserved on the box); per-run reasoning-token budgets (incl. the ~587-token datum) come from `run_codex_glean.py` run-logs and are directional; the source-character-coverage percentages (93.3% / 64.6%) come from a span-union analysis script that needs source text + span offsets; the padded-run line counts and garbage-predicate names are on-disk/log observations of a run that was *not* retained in the DB; and the adversarial mis-anchor rate (~40–46%) comes from the v2-citer source header plus an n=30 audit log. Every *other* number in this report is DB-verified.

---

**Companion reports:** [donto — The Substrate for Generative Abundance (canonical vision)](donto-abundance-2026-06-02.html) · [Driving Frontier LLMs to Extract Maximally and Sustainably](donto-maximal-extraction-2026-06-04.html) · [The Cerebras / Codex Bake-off (provider economics + 5-way detail)](donto-cerebras-bakeoff-2026-06-04.html) · [The donto Extraction System](donto-extraction-system-2026-06-03.html)