# donto — Extraction-Provider Bake-Off: Cerebras vs. z.ai vs. Codex, a Faithful 5-Way Run

**A measured, single-article comparison of five LLM extraction back-ends — across TWO different agentic harnesses (OpenCode and the Codex CLI) — driving donto's real lens-sweep controller. Goal: pick a cheaper, faster, *available* extraction provider for the months ahead without losing faithfulness. 2026-06-04.**

> **What this is.** donto's extraction front door currently runs **GLM-4.7 via z.ai's "coding" subscription** through an agentic OpenCode driver. That path is fast enough but it is an expiring subsidy, TOS-risky for non-coding use, and rate-capped. This report tests four alternatives against the incumbent, on the *same* article, with the *same* lens-sweep prompt and ingest path, reading results out of *live* `donto_statement`. Three (A, B, C) ran through **OpenCode** inside the omega container; two (D, E) ran through the **Codex CLI on the host** on the user's ChatGPT Pro subscription — a **different agentic harness, no API credit**. Every number below is from the production box on 2026-06-04, not estimated. Where a provider could not run, that is stated plainly and the cap was probed live, not assumed. **As of this writing, both Cerebras and z.ai are quota-exhausted, and the ChatGPT-Pro Codex path is the only live extraction route.**

---

## 0. One-paragraph orientation

donto is a **bitemporal, paraconsistent, evidence-first** knowledge substrate built for the age of generative abundance: generation of typed claims is now cheap, so the engine's job is **maximal faithful capture** — emit free/untyped, multi-directional, evidence-anchored claims and defer typing/alignment/identity/joining to query time. The extraction engine that feeds it drives an **agentic CLI** over an LLM provider, sweeping a document through a broad lens prompt across multiple passes ("loop until dry"), then ingests the parsed facts as anchored statements. In this experiment **two things vary**: (1) *which model on which hardware* does the inference, and (2) *which agentic harness* drives the loop — OpenCode (A/B/C) or the Codex CLI (D/E). Everything else — the article, the lens-sweep prompt, the multi-pass "loop until dry" mechanism, the `{s,p,o,a,c,h}` fact shape, the parse/normalise/ingest path, and the live-DB measurement — is held fixed, so the five contexts are directly comparable.

---

## 1. Why a provider bake-off, and why five-way

The incumbent extraction provider (GLM-4.7 on z.ai's coding subscription) has three problems we want to engineer away:

1. **Cost trajectory.** It is a flat-rate *coding* subscription used for non-coding extraction — an expiring subsidy, and per CLAUDE.md §4 a TOS-risk. We need a path we can run at volume for months.
2. **Throughput / latency.** Extraction is the bottleneck in every consumer (genealogy, memory). Faster inference directly multiplies how much of the firehose we can capture per day.
3. **Hard caps.** The subscription enforces weekly/monthly limits; when it caps, *all* extraction stops.

**Cerebras** was the first candidate: wafer-scale inference is the fastest tokens/sec generally available, and it hosts both a small open model (**gpt-oss-120b**) and — in preview — **GLM-4.7 itself** (`zai-glm-4.7`). That second fact made a clean OpenCode-internal three-way design possible (model effect on identical hardware; hardware effect on the identical model).

But **both Cerebras and z.ai are now quota-exhausted** (§4, §5). So this iteration adds a **second harness and a second account entirely**: the **Codex CLI** (`codex-cli 0.130.0`) running OpenAI models on the user's **ChatGPT Pro subscription** — which uses **no API credit** and is **not subject to either capped account**. That gives two more live providers:

- **(A) gpt-oss-120b @ Cerebras** — small, fast, cheap open model on wafer-scale hardware (via OpenCode).
- **(B) glm-4.7 (`zai-glm-4.7`) @ Cerebras** — the incumbent *model*, on the challenger *hardware*, preview (via OpenCode).
- **(C) glm-4.7 @ z.ai** — the **incumbent production path**, unchanged (via OpenCode).
- **(D) codex-normal — gpt-5.4 via Codex CLI** — OpenAI's deeper general model on ChatGPT Pro (host, no API credit).
- **(E) codex-spark — gpt-5.3-codex-spark via Codex CLI** — OpenAI's Cerebras-accelerated codex model on ChatGPT Pro (host, no API credit).

What the five-way design now isolates:

| Comparison | Holds fixed | Varies | What it isolates |
|---|---|---|---|
| **A vs B** | OpenCode harness, article, **hardware (Cerebras)** | the model (gpt-oss-120b vs glm-4.7) | **model effect** on identical hardware |
| **B vs C** | **model (glm-4.7)**, OpenCode harness, article | the hardware/provider (Cerebras vs z.ai) | **hardware/provider effect** on the identical model |
| **D vs E** | **Codex CLI harness**, article, ChatGPT-Pro account | the OpenAI model (gpt-5.4 vs gpt-5.3-codex-spark) | **depth-vs-speed** within the live Codex path |
| **(A–C) vs (D–E)** | article, lens prompt, fact shape, ingest path | the **agentic harness + provider account** | **harness/availability effect** — the only currently-live route |

---

## 2. The two integration problems we hit on the OpenCode/Cerebras path (and fixed / worked around)

Standing up Cerebras behind donto's OpenCode driver surfaced two distinct, non-obvious failures. Both are documented here because they will recur for anyone wiring an agentic CLI to a reasoning-model endpoint. **Neither applies to the Codex CLI path (D/E)**, which runs on the host with its own shell tool and its own multi-turn loop — see §2.3.

### 2.1 BLOCKER — `reasoning_content` echo → Cerebras HTTP 400 (FIXED)

OpenCode (v1.15.13) uses the Vercel AI-SDK, which, on a multi-turn agentic loop, **echoes the assistant's prior `reasoning_content` / `reasoning` fields back into the next request's message array.** Cerebras's chat-completions endpoint rejects those fields on inbound requests with an HTTP **400**, so the agent loop died on turn two and produced **0 facts**. This is purely an integration artifact — the model is fine; the SDK is replaying a field the upstream won't accept on input.

**Fix:** a tiny **sanitizing reverse-proxy** in front of Cerebras (`/mnt/donto-data/workspace/donto-align/cerebras_proxy.py`, listening on `172.18.0.1:8089`). It forwards chat-completions to `api.cerebras.ai` but **strips `reasoning_content`/`reasoning` from every message in the outbound request body** before relaying. OpenCode points its `provider.cerebras.options.baseURL` at the proxy instead of the upstream. With the proxy in place the agent loop runs to completion. (For the preview `zai-glm-4.7` model the proxy is a **hard runtime dependency** — kill the proxy and B fails even with billing intact. The proxy was verified up, HTTP 200 on `/v1/models`, throughout this session.)

### 2.2 BUG — OpenCode slot-pool deadlock, blind to free slots (WORKED AROUND)

donto bounds concurrent OpenCode subprocesses host-wide with a `flock` over `OPENCODE_MAX_CONCURRENT` slot files. The acquire loop only ever **looks at slot indices `0 .. MAX_CONCURRENT-1`.** The ~10 production frontier-extraction jobs already running hold a set of slot files; if a benchmark process is launched with a **lower** `MAX_CONCURRENT` than the number of busy slots, it iterates only the low indices, finds them all held, and **never even checks the free higher-numbered slot files** — it deadlocks behind production instead of waiting for a genuinely free slot.

**Work-around (per the run protocol):** set **`OPENCODE_MAX_CONCURRENT=16`** on every benchmark invocation so the acquire loop scans *all* slot files and waits for a free one. Benchmark extractions were run **sequentially** as **root**. This is a work-around, not a fix; the underlying loop should scan a dynamically-sized pool, not a fixed range. **(D/E avoid this entirely — the Codex CLI does not touch the OpenCode slot pool.)**

### 2.3 The Codex CLI harness (D/E) — what differs

Providers D and E do **not** use OpenCode at all. They run the **Codex CLI** (`codex-cli 0.130.0`) headless **on the host** under the user's ChatGPT Pro auth:

```
codex exec --dangerously-bypass-approvals-and-sandbox -C <run_dir> -c model="<MODEL>" "<lens-sweep prompt>"
```

- Codex has its **own shell tool and its own multi-turn loop**. Given the **verbatim `extract_broad.txt` brief** (plus a short preamble pinning it to the OUTPUT MECHANISM), it reads `source.txt`, then **`cat >> facts.jsonl <<'JSONL_EOF' … JSONL_EOF`-appends** batches of compact JSONL across multiple passes, re-scanning the source each pass, until it self-judges the source exhausted — i.e. it performs donto's "loop until dry" natively.
- **One required workaround:** Codex's *default* model `gpt-5.3-codex` is **SUNSET for ChatGPT accounts (HTTP 400 "model not supported")**, so every invocation must pass `-c model=` with a supported id. Verified-working supported ids: `gpt-5.4`, `gpt-5.4-mini`, `gpt-5.3-codex-spark` (Pro), `gpt-5.5`. We used **`gpt-5.4` for D** and **`gpt-5.3-codex-spark` for E**.
- The resulting `facts.jsonl` is parsed and ingested by the **exact same path** as the OpenCode providers — `opencode_extract._parse_jsonl` / `_normalize_fact` / `_valid` / `_key`, then `helpers.register_source_document` + `helpers.ingest_facts` — so the output is directly comparable and anchored to a registered source revision just like A/B. Run script: `/tmp/cerebras-test/run_codex_extract.py`.

> **Harness caveat (faithful representation).** D and E are a **different agentic harness on a different account** than A–C. The *prompt, fact shape, and ingest path are identical*, so the **extraction quality** numbers are comparable — but **wall-time and "passes" are not** strictly apples-to-apples across harnesses (different tool-call overhead, different loop heuristics, host vs container). Read D/E wall-times as Codex-CLI figures, not as a like-for-like speed test against OpenCode.

---

## 3. Setup

| Component | Value |
|---|---|
| **Article** | frontier `EntryId 23778` — *"Attack on Aboriginal people — Bundamba Lagoon (August 1860)"*, ~14,600 chars. Reused verbatim: `/tmp/cerebras-test/source.txt`. |
| **Prompt** | the broad lens-sweep `prompts/extract_broad.txt`, identical for all five (D/E prepend only a short harness preamble that restates the prompt's own OUTPUT MECHANISM). |
| **Controllers** | A/B/C: production multi-pass "loop-until-dry" driver `opencode_extract.extract_facts_opencode` over `OpenCodeAgent` (headless OpenCode). D/E: the **Codex CLI's** own shell-tool + multi-turn loop (`codex exec`, host). |
| **Ingest** | the normal donto-api path for **all five**: register source document + revision, parse `{s,p,o,a,c,h}` via `opencode_extract._parse_jsonl`, ingest facts, attach evidence spans. |
| **Measurement** | counts read from **live `donto_statement`** (rows where `upper(tx_time) IS NULL`); anchoring from `donto_evidence_link`. Spot-checked 2026-06-04. |
| **Contexts** | A → `ctx:test/cerebras-gptoss/23778`; B → `ctx:test/cerebras-glm/23778`; C → `ctx:test/zai-glm/23778`; **D → `ctx:test/codex-normal/23778`; E → `ctx:test/codex-spark/23778`**. |

Provider configs: **A/B** point OpenCode's `cerebras` provider at the sanitizing proxy (`baseURL=http://172.18.0.1:8089/v1`), key from `/etc/donto/cerebras.env`; **C** uses the unchanged production z.ai config (`https://api.z.ai/api/coding/paas/v4`, model `glm-4.7`, `GLM_API_KEY`); **D/E** use the Codex CLI on host ChatGPT-Pro auth (no API key/credit), `codex exec … -c model=gpt-5.4` / `gpt-5.3-codex-spark`.

---

## 4. Results

### 4.1 The 5-way table (all figures from live `donto_statement`, spot-checked 2026-06-04)

| Axis | (A) gpt-oss-120b @ Cerebras | (B) glm-4.7 @ Cerebras | (C) glm-4.7 @ z.ai (INCUMBENT) | (D) codex-normal gpt-5.4 | (E) codex-spark gpt-5.3-codex-spark |
|---|---|---|---|---|---|
| **Harness** | OpenCode (container) | OpenCode (container) | OpenCode (container) | **Codex CLI (host)** | **Codex CLI (host)** |
| **Account / billing** | Cerebras PAYG | Cerebras PAYG (preview) | z.ai coding sub | **ChatGPT Pro (no API credit)** | **ChatGPT Pro (no API credit)** |
| **Live context** | `…/cerebras-gptoss/23778` | `…/cerebras-glm/23778` | `…/zai-glm/23778` | `…/codex-normal/23778` | `…/codex-spark/23778` |
| **Status** | ✅ real run | ✅ real run (prior); fresh re-run BLOCKED | ❌ blocked — 0 facts | ✅ **real run, LIVE path** | ✅ **real run, LIVE path** |
| **Live facts** | **320** | **3,590** | **0** | **511** | **426** |
| **Distinct subjects** | 145 | 486 | 0 | 119 | 78 |
| **Distinct predicates** | 95 | 1,852 | 0 | 292 | 213 |
| **Anchored (≥1 evidence_link)** | **151 / 320 = 47.2%** | **2,498 / 3,590 = 69.6%** | — | **391 / 511 = 76.5%** | **368 / 426 = 86.4%** |
| **Object split (IRI / literal)** | 141 / 179 | 748 / 2,842 | — | 265 / 246 | 162 / 264 |
| **Predicate-style hygiene** | 79/93 bare-camel ≈ 83.2%; 2 vocab (`rdf:type`,`rdfs:label`); 0 kebab/space | ≈89.7% camel among bare; 674 `:`-prefixed minted preds dilute all-pred camel to 56.9% | n/a | **273/292 = 93.5% camelCase**; 2 vocab (`rdf:type`,`rdfs:label`); 0 kebab, 0 clause/spaced | **90.1% camelCase** (192/213 distinct); 0 kebab, 0 clause/spaced; one faithful artifact `damagedCattle?` (trailing `?` preserved as emitted) |
| **Source attribution** | edge-style (`reportedIn`/`attestedBy`) | edge-style + some source-baked preds | n/a | **clean edges** (`reportedIn`,`attestedBy`,`accordingTo`) not baked into predicate names | **clean edges** — e.g. 6+ distinct `attestedBy` edges on the event |
| **JSONL cleanliness** | clean | clean | n/a | clean | clean |
| **Controller / loop** | toolloop fired; 3 passes (196→303→320) | toolloop fired; multi-pass | did **not** fire | toolloop fired; multi-pass, self-judged done | toolloop fired; multi-pass (58→…→432 raw → 426 ingested), self-judged done; **retry-on-empty NOT needed** |
| **Wall time** | **71.8 s** | ~prior run; re-run fast-fails at cap | 0 s | **461.1 s** (104k tokens) | **60.3 s** (89k tokens) |
| **Block reason** | — | HTTP **402 `payment_required`** — Cerebras account-wide cap (proxy + direct, ~0.14 s) | z.ai code **1310** "Weekly/Monthly Limit Exhausted, reset **2026-06-10 16:43:35**" | — | — |

> **Note on E's camel-%.** A strict classifier that rejects the trailing `?` counts 192/213 = **90.1%** camelCase (the `damagedCattle?` artifact is the one excluded). A lenient regex that tolerates the `?` counts 193/213. Either way, **0 kebab and 0 clause/spaced** predicates — strong style discipline, the artifact faithfully preserved rather than silently normalised.

### 4.2 The isolated axes

- **Model effect on Cerebras hardware (A vs B).** glm-4.7 (B) extracts **11.2× more facts** than gpt-oss-120b (A): 3,590 vs 320, higher anchoring (69.6% vs 47.2%), far more entities (486 vs 145). On identical hardware the bigger model is the depth winner by an order of magnitude. **(Both currently blocked for fresh runs.)**
- **Hardware/provider effect on glm-4.7 (B vs C).** Still **could not be measured head-to-head** — C is z.ai-1310-capped (reset 2026-06-10) and B's fresh re-run hit Cerebras 402. Same-model-different-hardware remains open.
- **Depth vs speed within the live Codex path (D vs E).** codex-spark (E) is **~7.6× faster** than codex-normal (D) — **60.3 s vs 461.1 s** — for **~83% of the facts** (426 vs 511) at **higher anchor fidelity** (86.4% vs 76.5%) and a tighter, more focused entity set (78 vs 119 subjects). D goes deeper/broader (more facts, more subjects, more predicates) at ~7.6× the wall-time; E is the fast, high-anchor option. Both used clean edge-style source attribution and disciplined camelCase predicates.
- **Harness / availability effect ((A–C) vs (D–E)).** With Cerebras (402) and z.ai (1310) **both capped**, the **only currently-live extraction route is the Codex CLI on ChatGPT Pro.** It is not the deepest path measured (B's 3,590 dwarfs everything), but among *runnable-right-now* providers it is the depth leader: D's 511 and E's 426 both exceed A's 320, at **higher anchoring than any OpenCode run** (76.5% / 86.4% vs A's 47.2% / B's 69.6%) and the cleanest predicate hygiene (93.5% / 90.1% camel, 0 kebab/clause).

### 4.3 On predicate counts and the abundance signature

B's 1,852 distinct predicates are inflated by **674 `:`-prefixed, clause-style minted predicates** — high-resolution but ballooning the raw count (all-pred camel diluted to 56.9%; ≈89.7% among un-prefixed). The Codex runs (D/E) show the **opposite, tighter** profile: 292 / 213 distinct predicates, **0 clause-style and 0 kebab**, 93.5% / 90.1% camelCase, and source attribution modeled as **clean edges** (`reportedIn`, `attestedBy`, `accordingTo`) rather than baked into predicate names — exactly what `extract_broad.txt` asks for. Either profile is aligned **at query time** by the substrate's alignment engine, **not** a static map (CLAUDE.md no-brittle-logic rule); but the Codex output needs *less* query-time predicate-folding to be load-bearing.

---

## 5. Quality verdict

**FAITHFUL on the live data. The headline change since the 3-way run: with Cerebras (402) and z.ai (1310) BOTH quota-exhausted, the ChatGPT-Pro Codex CLI is now the only live extraction path — and it works well.**

- **Depth (all-time).** (B) glm-4.7 @ Cerebras remains the depth champion — 3,590 facts — but it is **not runnable right now** (402). Among **currently-live** providers, the Codex runs lead: **(D) gpt-5.4 = 511**, **(E) gpt-5.3-codex-spark = 426**, both above (A)'s 320; (C) is zero (never executed).
- **Anchoring.** The Codex runs are the **best-anchored of the whole field**: **(E) 86.4%** and **(D) 76.5%**, beating (B) 69.6% and (A) 47.2%. Higher anchoring means more facts carry a retrievable evidence span — directly the donto evidence-first goal.
- **Cleanliness / style.** All live artifacts are clean (0 empty objects, 0 JSON-leak). The Codex runs have the **cleanest predicate hygiene** (93.5% / 90.1% camel, **0 kebab, 0 clause/spaced**) and model source attribution as **clean edges**, not source-baked predicates. One faithful artifact in E (`damagedCattle?`, trailing `?` preserved as emitted) — recorded, not hidden.
- **Did each faithfully drive the loop?** (A) yes (3 passes). (B) yes on its standing artifact (multi-pass, 69.6% anchored). (C) **no** — provider returns 1310 before any completion (true cap failure, not a code bug). **(D) yes** — Codex toolloop fired, multi-pass, self-judged done, exit 0, 461.1 s, 104k tokens. **(E) yes** — Codex toolloop fired, multi-pass (58→…→432 raw lines → 426 ingested), self-judged done, exit 0, **60.3 s, 89k tokens; the codex-cli retry-on-empty path was NOT needed** — genuine extraction on the first attempt, `retries=0`.
- **Harness caveat stated plainly.** D and E run via the **Codex CLI on the host on ChatGPT Pro (no API credit)** — a *different agentic harness and account* than the OpenCode-driven A/B/C. Same prompt + fact shape + ingest path → extraction-quality numbers are comparable; wall-time/passes are Codex-CLI figures, not a like-for-like harness speed test.
- **Caps verified live, this session.** Cerebras = account-wide `payment_required`/402 (both keys, proxy + direct, ~0.14 s); z.ai = code 1310 weekly/monthly cap (host + container keys, identical reset = one account). Neither is a slot deadlock, proxy fault, or code bug.
- **I3 honored.** All five contexts verified by `SELECT` only. The two new Codex contexts are **inserts-only**: D = 511 rows, all 511 live, 0 closed/retracted/superseded; E = 426 rows, all 426 live, 0 closed/retracted/superseded. No retract / supersede / delete touched `donto_statement` anywhere.

---

## 6. Economics

donto's extraction cost is dominated by tokens generated per document × documents per day, **and by which accounts are not capped.**

| Provider / plan | Shape | Indicative rate | Fit for donto extraction |
|---|---|---|---|
| **Cerebras PAYG** (gpt-oss-120b) | per-token | low-single-digit **\$ / Mtok** | best \$/fact for shallow sweeps — **currently 402-capped** |
| **Cerebras PAYG** (`zai-glm-4.7`, preview) | per-token | preview pricing; verify | best depth/\$ if rate reasonable; needs sanitizing proxy — **currently 402-capped** |
| **z.ai GLM coding subscription** (incumbent) | flat-rate | fixed \$ / month | in use; **weekly/monthly cap stops all extraction** (1310, reset 2026-06-10); TOS-risky; expiring subsidy |
| **ChatGPT Pro — Codex CLI / gpt-5.3-codex-spark (E)** | flat-rate Pro sub | **\$0 marginal — no API credit** | **the only live path now**: fast (60 s), high-anchor (86.4%), clean predicates; Cerebras-accelerated OpenAI codex on an already-paid Pro sub |
| **ChatGPT Pro — Codex CLI / gpt-5.4 (D)** | flat-rate Pro sub | **\$0 marginal — no API credit** | live, deeper/slower (511 facts, 461 s); use when depth matters more than wall-time |

The economic story has shifted. The all-time depth argument (B's ~11× over A) still holds *when Cerebras billing is live* — but **right now both per-token providers are capped, and the ChatGPT-Pro Codex path costs no marginal API credit** (it draws on a subscription the user already pays for). Among runnable providers, **codex-spark (E)** gives the best speed × anchoring × cleanliness at zero marginal cost; **codex-normal (D)** trades ~7.6× wall-time for more depth. The flat-rate posture means it does not meter per fact — attractive for steady volume, subject to whatever throughput limits the Pro sub enforces.

**Caveat:** exact \$/Mtok for the preview `zai-glm-4.7` on Cerebras was never pinned (account in `payment_required`); the Codex path has no per-token meter to pin, only the Pro subscription's own usage limits.

---

## 7. Recommendation

**Live-now primary — use the Codex CLI on ChatGPT Pro as the working extraction path while Cerebras and z.ai are capped.** It is the **only runnable route today**, costs **no marginal API credit**, and on this article it produced the **best-anchored, cleanest-predicate** extractions in the whole field.

- **For throughput now → (E) codex-spark (`gpt-5.3-codex-spark`).** Cerebras-accelerated OpenAI codex via the ChatGPT-Pro sub: **60.3 s**, 426 facts, **86.4% anchored** (the highest in the field), 90.1% camel, 0 kebab/clause. ~7.6× faster than gpt-5.4 for ~83% of the facts at higher anchor fidelity. The default live engine.
- **For depth now → (D) codex-normal (`gpt-5.4`).** Deeper and broader (511 facts / 119 subjects / 292 predicates, 76.5% anchored, 93.5% camel) at ~7.6× the wall-time (461 s). Use it when coverage matters more than latency on a given document, still at zero marginal cost.

**All-time depth primary — adopt glm-4.7 on Cerebras (Provider B) once Cerebras billing is restored.** On the same article + prompt it extracted **3,590** faithful facts (~11× gpt-oss, ~7× the Codex runs) at 69.6% anchoring — the depth ceiling measured. Keep it as the heavy-extraction engine when its account is live; it keeps the exact glm-4.7 model donto already runs, on faster hardware, off the z.ai subsidy.

**Secondary tier — gpt-oss-120b on Cerebras (A).** Fast/cheap recall-floor when Cerebras is live, but ~9% of B's yield and the lowest anchoring (47.2%); a coverage floor, not the main engine.

**Incumbent (glm-4.7 @ z.ai, C) — fallback only, currently dead.** Hard-capped now (1310, reset 2026-06-10 16:43:35); TOS-risky; expiring subsidy. Migrate off it.

**Action items:**

1. **Run the genealogy/memory engines on the Codex CLI path now** (it is the only live route): wire `donto-extract`'s swappable provider to `codex exec -c model=gpt-5.3-codex-spark` (default) / `gpt-5.4` (depth) on ChatGPT-Pro auth; remember the **sunset-default workaround** (always pass `-c model=`, never the default `gpt-5.3-codex`).
2. **Restore Cerebras billing**, then re-run B fresh into a clean context to confirm the 3,590-fact result reproduces (current B figure is a standing prior run).
3. **After 2026-06-10 16:43:35**, run `run_glm_zai.py` to populate C and finally measure the same-model-different-hardware axis (B vs C).
4. **Treat the harness as a first-class swappable abstraction** in `donto-extract` (OpenCode *and* Codex CLI), so capping one account or one harness never stops all extraction — exactly the resilience this session demonstrated.
5. **Fix the OpenCode slot-pool acquire loop** to scan a dynamically-sized pool, not `range(MAX_CONCURRENT)` (§2.2).
6. **Pin per-token pricing** for preview `zai-glm-4.7` on Cerebras with a billed run (§6).

---

## 8. Honest limits

- **n = 1.** A single article (23778). Provider-quality conclusions are **directional, not statistically robust** — re-run across a corpus before committing.
- **Two harnesses, not one.** A–C ran through **OpenCode** (container); D–E through the **Codex CLI** (host, ChatGPT Pro, no API credit). Prompt, fact shape, and ingest path are identical → extraction-quality numbers are comparable; **wall-time and "passes" are NOT a like-for-like harness speed test** (different tool-call overhead and loop heuristics).
- **Incomplete OpenCode comparison.** Only **A** has a fresh same-session OpenCode run. **B**'s 3,590 is a **prior** successful run (fresh re-run hit 402); **C** has **zero** data (1310, reset 2026-06-10). The same-model-different-hardware axis (B vs C) **could not be measured.**
- **Both per-token accounts capped now.** Cerebras = account-wide `payment_required`/402 (both keys, proxy + direct); z.ai = code 1310 weekly/monthly (host + container keys, same reset = one account). The Codex/ChatGPT-Pro path is the only live route at writing.
- **Codex model-default workaround.** Codex's default `gpt-5.3-codex` is **sunset for ChatGPT accounts (400)**; every D/E invocation passes `-c model=` with a supported id (`gpt-5.4` for D, `gpt-5.3-codex-spark` for E). Without it, the Codex path returns 0 facts for an account-policy reason, not a model-quality one.
- **One faithful Codex-spark artifact.** Predicate `damagedCattle?` carries a trailing `?` exactly as the model emitted it — preserved, not normalised; it is the single non-clean distinct predicate in E (90.1% vs a lenient-regex 193/213).
- **Preview-model dependency (B).** `zai-glm-4.7` on Cerebras is preview and **requires** the sanitizing proxy (172.18.0.1:8089) — a hard runtime dependency; verified up this session.
- **Anchoring method.** Anchored % = distinct live statements with ≥1 `donto_evidence_link` over live statements. The Codex runs' own ingest meta (D: 391 anchors / 511; E: 368 / 426) **matches** the DB-side 76.5% / 86.4%, cross-validating the method.
- **Excluded empty contexts.** The recent "clean" B re-run wrote `ctx:test/cerebras-glm/23778-clean` (0 rows, cap fast-fail) — excluded; C's `ctx:test/zai-glm/23778` is 0 rows (never executed).
- **I3 honored.** Read-only `SELECT`s against `donto_statement`; the two new Codex contexts are inserts-only (D 511/511 live, E 426/426 live, 0 closed/retracted/superseded); no retract/supersede/delete anywhere; empty target contexts left untouched.

---

*Measured on `donto-db` (apex-494316), 2026-06-04. Counts from live `donto_statement` (`upper(tx_time) IS NULL`); anchoring from `donto_evidence_link`. Caps probed live this session. Mechanisms: production `opencode_extract` multi-pass controller over `OpenCodeAgent` with broad lens prompt `extract_broad.txt` (A/B/C, `OPENCODE_MAX_CONCURRENT=16`); the Codex CLI (`codex exec`, codex-cli 0.130.0, ChatGPT Pro, host) with the same prompt + `{s,p,o,a,c,h}` fact shape + the same parse/ingest path (D/E). Codex run script: `/tmp/cerebras-test/run_codex_extract.py`.*