genes.apexpots.com / research source: donto-cerebras-bakeoff-2026-06-04.md

donto — Extraction-Provider Bake-Off: Cerebras vs. z.ai vs. Codex, a Faithful 5-Way Run

A measured, single-article comparison of five LLM extraction back-ends — across TWO different agentic harnesses (OpenCode and the Codex CLI) — driving donto's real lens-sweep controller. Goal: pick a cheaper, faster, available extraction provider for the months ahead without losing faithfulness. 2026-06-04.

What this is. donto's extraction front door currently runs GLM-4.7 via z.ai's "coding" subscription through an agentic OpenCode driver. That path is fast enough but it is an expiring subsidy, TOS-risky for non-coding use, and rate-capped. This report tests four alternatives against the incumbent, on the same article, with the same lens-sweep prompt and ingest path, reading results out of live donto_statement. Three (A, B, C) ran through OpenCode inside the omega container; two (D, E) ran through the Codex CLI on the host on the user's ChatGPT Pro subscription — a different agentic harness, no API credit. Every number below is from the production box on 2026-06-04, not estimated. Where a provider could not run, that is stated plainly and the cap was probed live, not assumed. As of this writing, both Cerebras and z.ai are quota-exhausted, and the ChatGPT-Pro Codex path is the only live extraction route.

0. One-paragraph orientation

donto is a bitemporal, paraconsistent, evidence-first knowledge substrate built for the age of generative abundance: generation of typed claims is now cheap, so the engine's job is maximal faithful capture — emit free/untyped, multi-directional, evidence-anchored claims and defer typing/alignment/identity/joining to query time. The extraction engine that feeds it drives an agentic CLI over an LLM provider, sweeping a document through a broad lens prompt across multiple passes ("loop until dry"), then ingests the parsed facts as anchored statements. In this experiment two things vary: (1) which model on which hardware does the inference, and (2) which agentic harness drives the loop — OpenCode (A/B/C) or the Codex CLI (D/E). Everything else — the article, the lens-sweep prompt, the multi-pass "loop until dry" mechanism, the {s,p,o,a,c,h} fact shape, the parse/normalise/ingest path, and the live-DB measurement — is held fixed, so the five contexts are directly comparable.
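The "loop until dry" mechanism described above can be illustrated with a minimal sketch. This is not donto's controller: the `run_pass` callable, the `max_passes` guard, and the choice to deduplicate on the (s, p, o) triple are assumptions made purely for illustration.

```python
from typing import Callable, Iterable

Fact = dict  # one parsed {s,p,o,a,c,h} record


def fact_key(fact: Fact) -> tuple:
    # Assumption: two facts count as the same when s, p and o all match.
    return (fact.get("s"), fact.get("p"), fact.get("o"))


def loop_until_dry(run_pass: Callable[[int], Iterable[Fact]],
                   max_passes: int = 8) -> list[Fact]:
    """Call run_pass(pass_index) repeatedly; stop when a pass adds nothing new."""
    seen: set[tuple] = set()
    kept: list[Fact] = []
    for index in range(max_passes):
        new_in_pass = 0
        for fact in run_pass(index):
            key = fact_key(fact)
            if key not in seen:
                seen.add(key)
                kept.append(fact)
                new_in_pass += 1
        if new_in_pass == 0:  # dry: this pass surfaced no unseen fact
            break
    return kept
```

The cumulative length of `kept` after each pass is the kind of figure reported later as a pass trace (for example 196→303→320 for provider A).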

1. Why a provider bake-off, and why five-way

The incumbent extraction provider (GLM-4.7 on z.ai's coding subscription) has three problems we want to engineer away:

Cost trajectory. It is a flat-rate coding subscription used for non-coding extraction — an expiring subsidy, and per CLAUDE.md §4 a TOS-risk. We need a path we can run at volume for months.
Throughput / latency. Extraction is the bottleneck in every consumer (genealogy, memory). Faster inference directly multiplies how much of the firehose we can capture per day.
Hard caps. The subscription enforces weekly/monthly limits; when it caps, all extraction stops.

Cerebras was the first candidate: wafer-scale inference is the fastest tokens/sec generally available, and it hosts both a small open model (gpt-oss-120b) and — in preview — GLM-4.7 itself (zai-glm-4.7). That second fact made a clean OpenCode-internal three-way design possible (model effect on identical hardware; hardware effect on the identical model).

But both Cerebras and z.ai are now quota-exhausted (§4, §5). So this iteration adds a second harness and a second account entirely: the Codex CLI (codex-cli 0.130.0) running OpenAI models on the user's ChatGPT Pro subscription — which uses no API credit and is not subject to either capped account. That adds two live providers (D, E) to the three OpenCode ones (A, B, C), for five in total:

(A) gpt-oss-120b @ Cerebras — small, fast, cheap open model on wafer-scale hardware (via OpenCode).
(B) glm-4.7 (zai-glm-4.7) @ Cerebras — the incumbent model, on the challenger hardware, preview (via OpenCode).
(C) glm-4.7 @ z.ai — the incumbent production path, unchanged (via OpenCode).
(D) codex-normal — gpt-5.4 via Codex CLI — OpenAI's deeper general model on ChatGPT Pro (host, no API credit).
(E) codex-spark — gpt-5.3-codex-spark via Codex CLI — OpenAI's Cerebras-accelerated codex model on ChatGPT Pro (host, no API credit).

What the five-way design now isolates:

Comparison	Holds fixed	Varies	What it isolates
A vs B	OpenCode harness, article, hardware (Cerebras)	the model (gpt-oss-120b vs glm-4.7)	model effect on identical hardware
B vs C	model (glm-4.7), OpenCode harness, article	the hardware/provider (Cerebras vs z.ai)	hardware/provider effect on the identical model
D vs E	Codex CLI harness, article, ChatGPT-Pro account	the OpenAI model (gpt-5.4 vs gpt-5.3-codex-spark)	depth-vs-speed within the live Codex path
(A–C) vs (D–E)	article, lens prompt, fact shape, ingest path	the agentic harness + provider account	harness/availability effect — the only currently-live route

2. The two integration problems we hit on the OpenCode/Cerebras path (and fixed / worked around)

Standing up Cerebras behind donto's OpenCode driver surfaced two distinct, non-obvious failures. Both are documented here because they will recur for anyone wiring an agentic CLI to a reasoning-model endpoint. Neither applies to the Codex CLI path (D/E), which runs on the host with its own shell tool and its own multi-turn loop — see §2.3.

2.1 BLOCKER — `reasoning_content` echo → Cerebras HTTP 400 (FIXED)

OpenCode (v1.15.13) uses the Vercel AI-SDK, which, on a multi-turn agentic loop, echoes the assistant's prior reasoning_content / reasoning fields back into the next request's message array. Cerebras's chat-completions endpoint rejects those fields on inbound requests with an HTTP 400, so the agent loop died on turn two and produced 0 facts. This is purely an integration artifact — the model is fine; the SDK is replaying a field the upstream won't accept on input.

Fix: a tiny sanitizing reverse-proxy in front of Cerebras (/mnt/donto-data/workspace/donto-align/cerebras_proxy.py, listening on 172.18.0.1:8089). It forwards chat-completions to api.cerebras.ai but strips reasoning_content/reasoning from every message in the outbound request body before relaying. OpenCode points its provider.cerebras.options.baseURL at the proxy instead of the upstream. With the proxy in place the agent loop runs to completion. (For the preview zai-glm-4.7 model the proxy is a hard runtime dependency — kill the proxy and B fails even with billing intact. The proxy was verified up, HTTP 200 on /v1/models, throughout this session.)
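The sanitizing step at the heart of the proxy can be sketched as a pure function. This is an illustrative reconstruction, not the contents of cerebras_proxy.py: the function name and the deep-copy choice are assumptions; only the two stripped field names come from the text above.

```python
import copy

# Fields the upstream rejects on inbound messages, per the description above.
REASONING_FIELDS = ("reasoning_content", "reasoning")


def strip_reasoning(body: dict) -> dict:
    """Return a copy of a chat-completions request body with the
    reasoning fields removed from every message."""
    cleaned = copy.deepcopy(body)  # leave the caller's payload untouched
    for message in cleaned.get("messages", []):
        if isinstance(message, dict):
            for field in REASONING_FIELDS:
                message.pop(field, None)
    return cleaned
```

A proxy built around this would apply `strip_reasoning` to the parsed JSON body of each chat-completions request before relaying it upstream, and pass every other field through unchanged.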

2.2 OpenCode slot-pool acquire loop deadlocks behind production (WORKED AROUND)

donto bounds concurrent OpenCode subprocesses host-wide with a flock over OPENCODE_MAX_CONCURRENT slot files. The acquire loop only ever looks at slot indices 0 .. MAX_CONCURRENT-1. The ~10 production frontier-extraction jobs already running hold a set of slot files; if a benchmark process is launched with a lower MAX_CONCURRENT than the number of busy slots, it iterates only the low indices, finds them all held, and never even checks the free higher-numbered slot files — it deadlocks behind production instead of waiting for a genuinely free slot.

Work-around (per the run protocol): set OPENCODE_MAX_CONCURRENT=16 on every benchmark invocation so the acquire loop scans all slot files and waits for a free one. Benchmark extractions were run sequentially as root. This is a work-around, not a fix; the underlying loop should scan a dynamically-sized pool, not a fixed range. (D/E avoid this entirely — the Codex CLI does not touch the OpenCode slot pool.)
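The failure mode can be reproduced with a small flock sketch. It is a simplified stand-in for the real acquire loop, assuming a Unix host: the slot file naming (`slot-N.lock`) and the single non-blocking sweep are hypothetical choices, not donto's code.

```python
import fcntl
import os
from typing import Optional


def try_acquire(slot_dir: str, max_concurrent: int) -> Optional[tuple[int, int]]:
    """Sweep slot indices 0..max_concurrent-1 once.

    Returns (slot_index, fd) for the first free slot, or None if every
    slot in that range is held. The caller keeps fd open while working.
    """
    for index in range(max_concurrent):
        path = os.path.join(slot_dir, f"slot-{index}.lock")  # hypothetical name
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return index, fd
        except BlockingIOError:
            os.close(fd)  # slot busy, try the next index
    return None
```

With three slots already held, `try_acquire(slot_dir, 2)` sees only indices 0 and 1 and returns None on every sweep, while `try_acquire(slot_dir, 16)` reaches the free slot 3. That is the behaviour the OPENCODE_MAX_CONCURRENT=16 work-around relies on.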

2.3 The Codex CLI harness (D/E) — what differs

Providers D and E do not use OpenCode at all. They run the Codex CLI (codex-cli 0.130.0) headless on the host under the user's ChatGPT Pro auth:

codex exec --dangerously-bypass-approvals-and-sandbox -C <run_dir> -c model="<MODEL>" "<lens-sweep prompt>"

Codex has its own shell tool and its own multi-turn loop. Given the verbatim extract_broad.txt brief (plus a short preamble pinning it to the OUTPUT MECHANISM), it reads source.txt, then appends batches of compact JSONL with cat >> facts.jsonl <<'JSONL_EOF' … JSONL_EOF across multiple passes, re-scanning the source each pass, until it self-judges the source exhausted; i.e. it performs donto's "loop until dry" natively.
One required workaround: Codex's default model gpt-5.3-codex is SUNSET for ChatGPT accounts (HTTP 400 "model not supported"), so every invocation must pass -c model= with a supported id. Verified-working supported ids: gpt-5.4, gpt-5.4-mini, gpt-5.3-codex-spark (Pro), gpt-5.5. We used gpt-5.4 for D and gpt-5.3-codex-spark for E.
The resulting facts.jsonl is parsed and ingested by the exact same path as the OpenCode providers — opencode_extract._parse_jsonl / _normalize_fact / _valid / _key, then helpers.register_source_document + helpers.ingest_facts — so the output is directly comparable and anchored to a registered source revision just like A/B. Run script: /tmp/cerebras-test/run_codex_extract.py.
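As a sketch of how the invocation above might be assembled programmatically, the helper below builds the argument vector and refuses the sunset default model. It is not taken from run_codex_extract.py: the function name and the guard are assumptions; the flags are exactly those shown in the command line above.

```python
SUNSET_DEFAULT = "gpt-5.3-codex"  # rejected for ChatGPT accounts, per the text above


def build_codex_argv(run_dir: str, model: str, prompt: str) -> list[str]:
    """Build the argv for a headless `codex exec` run with an explicit model."""
    if not model or model == SUNSET_DEFAULT:
        # Guard is an assumption of this sketch: fail early instead of
        # letting the CLI return HTTP 400 and produce 0 facts.
        raise ValueError("pass an explicit, supported model id")
    return [
        "codex", "exec",
        "--dangerously-bypass-approvals-and-sandbox",
        "-C", run_dir,
        "-c", f"model={model}",
        prompt,
    ]
```

The resulting list can be handed to `subprocess.run(argv, check=True)`; passing a list rather than a shell string avoids quoting problems with a long multi-line prompt.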

Harness caveat (faithful representation). D and E are a different agentic harness on a different account than A–C. The prompt, fact shape, and ingest path are identical, so the extraction quality numbers are comparable — but wall-time and "passes" are not strictly apples-to-apples across harnesses (different tool-call overhead, different loop heuristics, host vs container). Read D/E wall-times as Codex-CLI figures, not as a like-for-like speed test against OpenCode.

3. Setup

Component	Value
Article	frontier `EntryId 23778` — "Attack on Aboriginal people — Bundamba Lagoon (August 1860)", ~14,600 chars. Reused verbatim: `/tmp/cerebras-test/source.txt`.
Prompt	the broad lens-sweep `prompts/extract_broad.txt`, identical for all five (D/E prepend only a short harness preamble that restates the prompt's own OUTPUT MECHANISM).
Controllers	A/B/C: production multi-pass "loop-until-dry" driver `opencode_extract.extract_facts_opencode` over `OpenCodeAgent` (headless OpenCode). D/E: the Codex CLI's own shell-tool + multi-turn loop (`codex exec`, host).
Ingest	the normal donto-api path for all five: register source document + revision, parse `{s,p,o,a,c,h}` via `opencode_extract._parse_jsonl`, ingest facts, attach evidence spans.
Measurement	counts read from live `donto_statement` (rows where `upper(tx_time) IS NULL`); anchoring from `donto_evidence_link`. Spot-checked 2026-06-04.
Contexts	A → `ctx:test/cerebras-gptoss/23778`; B → `ctx:test/cerebras-glm/23778`; C → `ctx:test/zai-glm/23778`; D → `ctx:test/codex-normal/23778`; E → `ctx:test/codex-spark/23778`.

Provider configs: A/B point OpenCode's cerebras provider at the sanitizing proxy (baseURL=http://172.18.0.1:8089/v1), key from /etc/donto/cerebras.env; C uses the unchanged production z.ai config (https://api.z.ai/api/coding/paas/v4, model glm-4.7, GLM_API_KEY); D/E use the Codex CLI on host ChatGPT-Pro auth (no API key/credit), codex exec … -c model=gpt-5.4 / gpt-5.3-codex-spark.

4. Results

4.1 The 5-way table (all figures from live `donto_statement`, spot-checked 2026-06-04)

Axis	(A) gpt-oss-120b @ Cerebras	(B) glm-4.7 @ Cerebras	(C) glm-4.7 @ z.ai (INCUMBENT)	(D) codex-normal gpt-5.4	(E) codex-spark gpt-5.3-codex-spark
Harness	OpenCode (container)	OpenCode (container)	OpenCode (container)	Codex CLI (host)	Codex CLI (host)
Account / billing	Cerebras PAYG	Cerebras PAYG (preview)	z.ai coding sub	ChatGPT Pro (no API credit)	ChatGPT Pro (no API credit)
Live context	`…/cerebras-gptoss/23778`	`…/cerebras-glm/23778`	`…/zai-glm/23778`	`…/codex-normal/23778`	`…/codex-spark/23778`
Status	✅ real run	✅ real run (prior); fresh re-run BLOCKED	❌ blocked — 0 facts	✅ real run, LIVE path	✅ real run, LIVE path
Live facts	320	3,590	0	511	426
Distinct subjects	145	486	0	119	78
Distinct predicates	95	1,852	0	292	213
Anchored (≥1 evidence_link)	151 / 320 = 47.2%	2,498 / 3,590 = 69.6%	—	391 / 511 = 76.5%	368 / 426 = 86.4%
Object split (IRI / literal)	141 / 179	748 / 2,842	—	265 / 246	162 / 264
Predicate-style hygiene	79 camelCase of 95 distinct ≈ 83.2% (93 bare + 2 vocab: `rdf:type`,`rdfs:label`); 0 kebab/space	≈89.7% camel among bare; 674 `:`-prefixed minted preds dilute all-pred camel to 56.9%	n/a	273/292 = 93.5% camelCase; 2 vocab (`rdf:type`,`rdfs:label`); 0 kebab, 0 clause/spaced	90.1% camelCase (192/213 distinct); 0 kebab, 0 clause/spaced; one faithful artifact `damagedCattle?` (trailing `?` preserved as emitted)
Source attribution	edge-style (`reportedIn`/`attestedBy`)	edge-style + some source-baked preds	n/a	clean edges (`reportedIn`,`attestedBy`,`accordingTo`) not baked into predicate names	clean edges — e.g. 6+ distinct `attestedBy` edges on the event
JSONL cleanliness	clean	clean	n/a	clean	clean
Controller / loop	toolloop fired; 3 passes (196→303→320)	toolloop fired; multi-pass	did not fire	toolloop fired; multi-pass, self-judged done	toolloop fired; multi-pass (58→…→432 raw → 426 ingested), self-judged done; retry-on-empty NOT needed
Wall time	71.8 s	~prior run; re-run fast-fails at cap	0 s	461.1 s (104k tokens)	60.3 s (89k tokens)
Block reason	—	HTTP 402 `payment_required` — Cerebras account-wide cap (proxy + direct, ~0.14 s)	z.ai code 1310 "Weekly/Monthly Limit Exhausted, reset 2026-06-10 16:43:35"	—	—

Note on E's camel-%. A strict classifier that rejects the trailing ? counts 192/213 = 90.1% camelCase (the damagedCattle? artifact is the one excluded). A lenient regex that tolerates the ? counts 193/213 ≈ 90.6%. Either way, 0 kebab and 0 clause/spaced predicates — strong style discipline, the artifact faithfully preserved rather than silently normalised.
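The strict-versus-lenient distinction can be made concrete with a small classifier. The regexes and category names below are assumptions for illustration and are not the classifier used to produce the table; in particular, this sketch treats a single lowercase word as camelCase.

```python
import re

# Hypothetical patterns: lowercase start, then optional capitalised segments.
STRICT_CAMEL = re.compile(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*")
LENIENT_CAMEL = re.compile(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*\??")


def classify(predicate: str, lenient: bool = False) -> str:
    """Bucket a predicate name by surface style."""
    if ":" in predicate:
        return "prefixed"  # e.g. rdf:type
    if any(ch.isspace() for ch in predicate):
        return "spaced"
    if "-" in predicate:
        return "kebab"
    pattern = LENIENT_CAMEL if lenient else STRICT_CAMEL
    return "camel" if pattern.fullmatch(predicate) else "other"
```

Under these patterns `classify("damagedCattle?")` is "other" while `classify("damagedCattle?", lenient=True)` is "camel", which is the one-predicate difference between the 192 and 193 counts.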

4.2 The isolated axes

Model effect on Cerebras hardware (A vs B). glm-4.7 (B) extracts 11.2× as many facts as gpt-oss-120b (A), 3,590 vs 320, with higher anchoring (69.6% vs 47.2%) and far more distinct subjects (486 vs 145). On identical hardware the bigger model is the depth winner by an order of magnitude. (Both currently blocked for fresh runs.)
Hardware/provider effect on glm-4.7 (B vs C). Still could not be measured head-to-head — C is z.ai-1310-capped (reset 2026-06-10) and B's fresh re-run hit Cerebras 402. Same-model-different-hardware remains open.
Depth vs speed within the live Codex path (D vs E). codex-spark (E) is ~7.6× faster than codex-normal (D) — 60.3 s vs 461.1 s — for ~83% of the facts (426 vs 511) at higher anchor fidelity (86.4% vs 76.5%) and a tighter, more focused entity set (78 vs 119 subjects). D goes deeper/broader (more facts, more subjects, more predicates) at ~7.6× the wall-time; E is the fast, high-anchor option. Both used clean edge-style source attribution and disciplined camelCase predicates.
Harness / availability effect ((A–C) vs (D–E)). With Cerebras (402) and z.ai (1310) both capped, the only currently-live extraction route is the Codex CLI on ChatGPT Pro. It is not the deepest path measured (B's 3,590 dwarfs everything), but it leads on depth among the runs completed fresh this session: D's 511 and E's 426 both exceed A's 320, at higher anchoring than any OpenCode run (76.5% / 86.4% vs A's 47.2% / B's 69.6%) and with the cleanest predicate hygiene (93.5% / 90.1% camel, 0 kebab/clause).
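The ratios quoted in this subsection follow directly from the table in §4.1. A short check, using only figures that appear there:

```python
# Live fact counts and wall times copied from the 5-way table (section 4.1).
facts = {"A": 320, "B": 3590, "D": 511, "E": 426}
wall_s = {"D": 461.1, "E": 60.3}

model_effect = facts["B"] / facts["A"]     # B vs A on Cerebras hardware
spark_speedup = wall_s["D"] / wall_s["E"]  # how much faster E is than D
spark_yield = facts["E"] / facts["D"]      # E's facts as a share of D's

print(f"{model_effect:.1f}x  {spark_speedup:.1f}x  {spark_yield:.1%}")
# prints: 11.2x  7.6x  83.4%
```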

4.3 On predicate counts and the abundance signature

B's 1,852 distinct predicates are inflated by 674 :-prefixed, clause-style minted predicates — high-resolution but ballooning the raw count (all-pred camel diluted to 56.9%; ≈89.7% among un-prefixed). The Codex runs (D/E) show the opposite, tighter profile: 292 / 213 distinct predicates, 0 clause-style and 0 kebab, 93.5% / 90.1% camelCase, and source attribution modeled as clean edges (reportedIn, attestedBy, accordingTo) rather than baked into predicate names — exactly what extract_broad.txt asks for. Either profile is aligned at query time by the substrate's alignment engine, not a static map (CLAUDE.md no-brittle-logic rule); but the Codex output needs less query-time predicate-folding to be load-bearing.

5. Quality verdict

FAITHFUL on the live data. The headline change since the 3-way run: with Cerebras (402) and z.ai (1310) BOTH quota-exhausted, the ChatGPT-Pro Codex CLI is now the only live extraction path — and it works well.

Depth (all-time). (B) glm-4.7 @ Cerebras remains the depth champion — 3,590 facts — but it is not runnable right now (402). Among the runs completed fresh this session, the Codex runs lead: (D) gpt-5.4 = 511, (E) gpt-5.3-codex-spark = 426, both above (A)'s 320; (C) is zero (never executed).
Anchoring. The Codex runs are the best-anchored of the whole field: (E) 86.4% and (D) 76.5%, beating (B) 69.6% and (A) 47.2%. Higher anchoring means more facts carry a retrievable evidence span — directly the donto evidence-first goal.
Cleanliness / style. All live artifacts are clean (0 empty objects, 0 JSON-leak). The Codex runs have the cleanest predicate hygiene (93.5% / 90.1% camel, 0 kebab, 0 clause/spaced) and model source attribution as clean edges, not source-baked predicates. One faithful artifact in E (damagedCattle?, trailing ? preserved as emitted) — recorded, not hidden.
Did each faithfully drive the loop? (A) yes (3 passes). (B) yes on its standing artifact (multi-pass, 69.6% anchored). (C) no — provider returns 1310 before any completion (true cap failure, not a code bug). (D) yes — Codex toolloop fired, multi-pass, self-judged done, exit 0, 461.1 s, 104k tokens. (E) yes — Codex toolloop fired, multi-pass (58→…→432 raw lines → 426 ingested), self-judged done, exit 0, 60.3 s, 89k tokens; the codex-cli retry-on-empty path was NOT needed — genuine extraction on the first attempt, retries=0.
Harness caveat stated plainly. D and E run via the Codex CLI on the host on ChatGPT Pro (no API credit) — a different agentic harness and account than the OpenCode-driven A/B/C. Same prompt + fact shape + ingest path → extraction-quality numbers are comparable; wall-time/passes are Codex-CLI figures, not a like-for-like harness speed test.
Caps verified live, this session. Cerebras = account-wide payment_required/402 (both keys, proxy + direct, ~0.14 s); z.ai = code 1310 weekly/monthly cap (host + container keys, identical reset = one account). Neither is a slot deadlock, proxy fault, or code bug.
I3 honored. All five contexts verified by SELECT only. The two new Codex contexts are inserts-only: D = 511 rows, all 511 live, 0 closed/retracted/superseded; E = 426 rows, all 426 live, 0 closed/retracted/superseded. No retract / supersede / delete touched donto_statement anywhere.

6. Economics

donto's extraction cost is dominated by tokens generated per document × documents per day, and by which accounts are not capped.

Provider / plan	Shape	Indicative rate	Fit for donto extraction
Cerebras PAYG (gpt-oss-120b)	per-token	low-single-digit $ / Mtok	best $/fact for shallow sweeps — currently 402-capped
Cerebras PAYG (`zai-glm-4.7`, preview)	per-token	preview pricing; verify	best depth/$ if rate reasonable; needs sanitizing proxy — currently 402-capped
z.ai GLM coding subscription (incumbent)	flat-rate	fixed $ / month	in use; weekly/monthly cap stops all extraction (1310, reset 2026-06-10); TOS-risky; expiring subsidy
ChatGPT Pro — Codex CLI / gpt-5.3-codex-spark (E)	flat-rate Pro sub	$0 marginal — no API credit	the only live path now: fast (60 s), high-anchor (86.4%), clean predicates; Cerebras-accelerated OpenAI codex on an already-paid Pro sub
ChatGPT Pro — Codex CLI / gpt-5.4 (D)	flat-rate Pro sub	$0 marginal — no API credit	live, deeper/slower (511 facts, 461 s); use when depth matters more than wall-time

The economic story has shifted. The all-time depth argument (B's ~11× over A) still holds when Cerebras billing is live — but right now both per-token providers are capped, and the ChatGPT-Pro Codex path costs no marginal API credit (it draws on a subscription the user already pays for). Among runnable providers, codex-spark (E) gives the best speed × anchoring × cleanliness at zero marginal cost; codex-normal (D) trades ~7.6× wall-time for more depth. Because the Pro subscription is flat-rate, the Codex path is not metered per fact, which is attractive for steady volume but subject to whatever throughput limits the Pro sub enforces.

Caveat: exact $/Mtok for the preview zai-glm-4.7 on Cerebras was never pinned (account in payment_required); the Codex path has no per-token meter to pin, only the Pro subscription's own usage limits.

7. Recommendation

Live-now primary — use the Codex CLI on ChatGPT Pro as the working extraction path while Cerebras and z.ai are capped. It is the only runnable route today, costs no marginal API credit, and on this article it produced the best-anchored, cleanest-predicate extractions in the whole field.

For throughput now → (E) codex-spark (gpt-5.3-codex-spark). Cerebras-accelerated OpenAI codex via the ChatGPT-Pro sub: 60.3 s, 426 facts, 86.4% anchored (the highest in the field), 90.1% camel, 0 kebab/clause. ~7.6× faster than gpt-5.4 for ~83% of the facts at higher anchor fidelity. The default live engine.
For depth now → (D) codex-normal (gpt-5.4). Deeper and broader (511 facts / 119 subjects / 292 predicates, 76.5% anchored, 93.5% camel) at ~7.6× the wall-time (461 s). Use it when coverage matters more than latency on a given document, still at zero marginal cost.

All-time depth primary — adopt glm-4.7 on Cerebras (Provider B) once Cerebras billing is restored. On the same article + prompt it extracted 3,590 faithful facts (~11× gpt-oss, ~7× codex-normal, ~8.4× codex-spark) at 69.6% anchoring — the depth ceiling measured. Keep it as the heavy-extraction engine when its account is live; it keeps the exact glm-4.7 model donto already runs, on faster hardware, off the z.ai subsidy.

Secondary tier — gpt-oss-120b on Cerebras (A). Fast/cheap recall-floor when Cerebras is live, but ~9% of B's yield and the lowest anchoring (47.2%); a coverage floor, not the main engine.

Incumbent (glm-4.7 @ z.ai, C) — fallback only, currently dead. Hard-capped now (1310, reset 2026-06-10 16:43:35); TOS-risky; expiring subsidy. Migrate off it.

Action items:

Run the genealogy/memory engines on the Codex CLI path now (it is the only live route): wire donto-extract's swappable provider to codex exec -c model=gpt-5.3-codex-spark (default) / gpt-5.4 (depth) on ChatGPT-Pro auth; remember the sunset-default workaround (always pass -c model=, never the default gpt-5.3-codex).
Restore Cerebras billing, then re-run B fresh into a clean context to confirm the 3,590-fact result reproduces (current B figure is a standing prior run).
After 2026-06-10 16:43:35, run run_glm_zai.py to populate C and finally measure the same-model-different-hardware axis (B vs C).
Treat the harness as a first-class swappable abstraction in donto-extract (OpenCode and Codex CLI), so capping one account or one harness never stops all extraction — exactly the resilience this session demonstrated.
Fix the OpenCode slot-pool acquire loop to scan a dynamically-sized pool, not range(MAX_CONCURRENT) (§2.2).
Pin per-token pricing for preview zai-glm-4.7 on Cerebras with a billed run (§6).
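One way the "harness as a swappable abstraction" action item could look is sketched below. None of these names exist in donto-extract as far as this report shows: `ProviderCapped`, `Harness`, and `extract_with_fallback` are hypothetical, and how a real harness would detect a cap (HTTP 402, code 1310) is left to its implementation.

```python
from typing import Callable, Sequence


class ProviderCapped(Exception):
    """Raised by a harness when its account is quota-exhausted."""


# A harness takes the source text and returns a list of fact dicts.
Harness = Callable[[str], list]


def extract_with_fallback(source: str,
                          harnesses: Sequence[tuple[str, Harness]]) -> tuple[str, list]:
    """Try harnesses in priority order; skip any that report a cap."""
    skipped = []
    for name, run in harnesses:
        try:
            return name, run(source)
        except ProviderCapped as exc:
            skipped.append(f"{name}: {exc}")
    raise RuntimeError("all harnesses capped: " + "; ".join(skipped))
```

Only cap errors trigger the fallback; any other exception propagates, so a genuine bug in one harness is not silently masked by switching to the next.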

8. Honest limits

n = 1. A single article (23778). Provider-quality conclusions are directional, not statistically robust — re-run across a corpus before committing.
Two harnesses, not one. A–C ran through OpenCode (container); D–E through the Codex CLI (host, ChatGPT Pro, no API credit). Prompt, fact shape, and ingest path are identical → extraction-quality numbers are comparable; wall-time and "passes" are NOT a like-for-like harness speed test (different tool-call overhead and loop heuristics).
Incomplete OpenCode comparison. Only A has a fresh same-session OpenCode run. B's 3,590 is a prior successful run (fresh re-run hit 402); C has zero data (1310, reset 2026-06-10). The same-model-different-hardware axis (B vs C) could not be measured.
Both per-token accounts capped now. Cerebras = account-wide payment_required/402 (both keys, proxy + direct); z.ai = code 1310 weekly/monthly (host + container keys, same reset = one account). The Codex/ChatGPT-Pro path is the only live route at writing.
Codex model-default workaround. Codex's default gpt-5.3-codex is sunset for ChatGPT accounts (400); every D/E invocation passes -c model= with a supported id (gpt-5.4 for D, gpt-5.3-codex-spark for E). Without it, the Codex path returns 0 facts for an account-policy reason, not a model-quality one.
One faithful Codex-spark artifact. Predicate damagedCattle? carries a trailing ? exactly as the model emitted it — preserved, not normalised; it is the single non-clean distinct predicate in E (strict count 192/213 = 90.1% vs a lenient-regex 193/213 ≈ 90.6%).
Preview-model dependency (B). zai-glm-4.7 on Cerebras is preview and requires the sanitizing proxy (172.18.0.1:8089) — a hard runtime dependency; verified up this session.
Anchoring method. Anchored % = distinct live statements with ≥1 donto_evidence_link over live statements. The Codex runs' own ingest meta (D: 391 anchors / 511; E: 368 / 426) matches the DB-side 76.5% / 86.4%, cross-validating the method.
Excluded empty contexts. The recent "clean" B re-run wrote ctx:test/cerebras-glm/23778-clean (0 rows, cap fast-fail) — excluded; C's ctx:test/zai-glm/23778 is 0 rows (never executed).
I3 honored. Read-only SELECTs against donto_statement; the two new Codex contexts are inserts-only (D 511/511 live, E 426/426 live, 0 closed/retracted/superseded); no retract/supersede/delete anywhere; empty target contexts left untouched.
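The anchoring definition in the list above (distinct live statements with at least one evidence link, over live statements) can be sketched over in-memory rows. The argument shapes are assumptions chosen for illustration; the real measurement reads donto_statement and donto_evidence_link directly.

```python
def anchored_share(live_statement_ids, evidence_links):
    """Compute (anchored, live, share).

    live_statement_ids: iterable of ids of live statements.
    evidence_links: iterable of (link_id, statement_id) pairs (hypothetical shape).
    """
    live = set(live_statement_ids)
    # A statement counts once no matter how many links point at it,
    # and links to non-live statements are ignored.
    anchored = {stmt for _, stmt in evidence_links if stmt in live}
    share = len(anchored) / len(live) if live else 0.0
    return len(anchored), len(live), share
```

Counting distinct statements rather than links matters here: a statement with several evidence spans would otherwise inflate the percentage.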

Measured on donto-db (apex-494316), 2026-06-04. Counts from live donto_statement (upper(tx_time) IS NULL); anchoring from donto_evidence_link. Caps probed live this session. Mechanisms: production opencode_extract multi-pass controller over OpenCodeAgent with broad lens prompt extract_broad.txt (A/B/C, OPENCODE_MAX_CONCURRENT=16); the Codex CLI (codex exec, codex-cli 0.130.0, ChatGPT Pro, host) with the same prompt + {s,p,o,a,c,h} fact shape + the same parse/ingest path (D/E). Codex run script: /tmp/cerebras-test/run_codex_extract.py.