A measured, single-article comparison of five LLM extraction back-ends — across TWO different agentic harnesses (OpenCode and the Codex CLI) — driving donto's real lens-sweep controller. Goal: pick a cheaper, faster, available extraction provider for the months ahead without losing faithfulness. 2026-06-04.
What this is. donto's extraction front door currently runs GLM-4.7 via z.ai's "coding" subscription through an agentic OpenCode driver. That path is fast enough but it is an expiring subsidy, TOS-risky for non-coding use, and rate-capped. This report tests four alternatives against the incumbent, on the same article, with the same lens-sweep prompt and ingest path, reading results out of live
donto_statement. Three (A, B, C) ran through OpenCode inside the omega container; two (D, E) ran through the Codex CLI on the host on the user's ChatGPT Pro subscription — a different agentic harness, no API credit. Every number below is from the production box on 2026-06-04, not estimated. Where a provider could not run, that is stated plainly and the cap was probed live, not assumed. As of this writing, both Cerebras and z.ai are quota-exhausted, and the ChatGPT-Pro Codex path is the only live extraction route.
donto is a bitemporal, paraconsistent,
evidence-first knowledge substrate built for the age of
generative abundance: generation of typed claims is now cheap, so the
engine's job is maximal faithful capture — emit
free/untyped, multi-directional, evidence-anchored claims and defer
typing/alignment/identity/joining to query time. The extraction engine
that feeds it drives an agentic CLI over an LLM
provider, sweeping a document through a broad lens prompt across
multiple passes ("loop until dry"), then ingests the parsed facts as
anchored statements. In this experiment two things
vary: (1) which model on which hardware does the
inference, and (2) which agentic harness drives the loop —
OpenCode (A/B/C) or the Codex CLI (D/E). Everything else — the article,
the lens-sweep prompt, the multi-pass "loop until dry" mechanism, the
{s,p,o,a,c,h} fact shape, the parse/normalise/ingest path,
and the live-DB measurement — is held fixed, so the five contexts are
directly comparable.
The incumbent extraction provider (GLM-4.7 on z.ai's coding subscription) has three problems we want to engineer away:
Cerebras was the first candidate: wafer-scale
inference is the fastest tokens/sec generally available, and it hosts
both a small open model (gpt-oss-120b) and — in preview
— GLM-4.7 itself (zai-glm-4.7). That
second fact made a clean OpenCode-internal three-way design possible
(model effect on identical hardware; hardware effect on the identical
model).
But both Cerebras and z.ai are now quota-exhausted
(§4, §5). So this iteration adds a second harness and a second
account entirely: the Codex CLI
(codex-cli 0.130.0) running OpenAI models on the user's
ChatGPT Pro subscription — which uses no API
credit and is not subject to either capped
account. That gives two more live providers:
zai-glm-4.7) @ Cerebras —
the incumbent model, on the challenger hardware,
preview (via OpenCode).What the five-way design now isolates:
| Comparison | Holds fixed | Varies | What it isolates |
|---|---|---|---|
| A vs B | OpenCode harness, article, hardware (Cerebras) | the model (gpt-oss-120b vs glm-4.7) | model effect on identical hardware |
| B vs C | model (glm-4.7), OpenCode harness, article | the hardware/provider (Cerebras vs z.ai) | hardware/provider effect on the identical model |
| D vs E | Codex CLI harness, article, ChatGPT-Pro account | the OpenAI model (gpt-5.4 vs gpt-5.3-codex-spark) | depth-vs-speed within the live Codex path |
| (A–C) vs (D–E) | article, lens prompt, fact shape, ingest path | the agentic harness + provider account | harness/availability effect — the only currently-live route |
Standing up Cerebras behind donto's OpenCode driver surfaced two distinct, non-obvious failures. Both are documented here because they will recur for anyone wiring an agentic CLI to a reasoning-model endpoint. Neither applies to the Codex CLI path (D/E), which runs on the host with its own shell tool and its own multi-turn loop — see §2.3.
reasoning_content echo → Cerebras HTTP 400
(FIXED)OpenCode (v1.15.13) uses the Vercel AI-SDK, which, on a multi-turn
agentic loop, echoes the assistant's prior
reasoning_content / reasoning fields back into
the next request's message array. Cerebras's chat-completions
endpoint rejects those fields on inbound requests with an HTTP
400, so the agent loop died on turn two and produced
0 facts. This is purely an integration artifact — the
model is fine; the SDK is replaying a field the upstream won't accept on
input.
Fix: a tiny sanitizing
reverse-proxy in front of Cerebras
(/mnt/donto-data/workspace/donto-align/cerebras_proxy.py,
listening on 172.18.0.1:8089). It forwards chat-completions
to api.cerebras.ai but strips
reasoning_content/reasoning from every message
in the outbound request body before relaying. OpenCode points
its provider.cerebras.options.baseURL at the proxy instead
of the upstream. With the proxy in place the agent loop runs to
completion. (For the preview zai-glm-4.7 model the proxy is
a hard runtime dependency — kill the proxy and B fails
even with billing intact. The proxy was verified up, HTTP 200 on
/v1/models, throughout this session.)
donto bounds concurrent OpenCode subprocesses host-wide with a
flock over OPENCODE_MAX_CONCURRENT slot files.
The acquire loop only ever looks at slot indices
0 .. MAX_CONCURRENT-1. The ~10 production
frontier-extraction jobs already running hold a set of slot files; if a
benchmark process is launched with a lower
MAX_CONCURRENT than the number of busy slots, it iterates
only the low indices, finds them all held, and never even checks
the free higher-numbered slot files — it deadlocks behind
production instead of waiting for a genuinely free slot.
Work-around (per the run protocol): set
OPENCODE_MAX_CONCURRENT=16 on every
benchmark invocation so the acquire loop scans all slot files
and waits for a free one. Benchmark extractions were run
sequentially as root. This is a
work-around, not a fix; the underlying loop should scan a
dynamically-sized pool, not a fixed range. (D/E avoid this
entirely — the Codex CLI does not touch the OpenCode slot
pool.)
Providers D and E do not use OpenCode at all. They
run the Codex CLI (codex-cli 0.130.0)
headless on the host under the user's ChatGPT Pro
auth:
codex exec --dangerously-bypass-approvals-and-sandbox -C <run_dir> -c model="<MODEL>" "<lens-sweep prompt>"
extract_broad.txt
brief (plus a short preamble pinning it to the OUTPUT
MECHANISM), it reads source.txt, then
cat >> facts.jsonl <<'JSONL_EOF' … JSONL_EOF-appends
batches of compact JSONL across multiple passes, re-scanning the source
each pass, until it self-judges the source exhausted — i.e. it performs
donto's "loop until dry" natively.gpt-5.3-codex is SUNSET for ChatGPT accounts
(HTTP 400 "model not supported"), so every invocation must pass
-c model= with a supported id. Verified-working supported
ids: gpt-5.4, gpt-5.4-mini,
gpt-5.3-codex-spark (Pro), gpt-5.5. We used
gpt-5.4 for D and
gpt-5.3-codex-spark for E.facts.jsonl is parsed and ingested by the
exact same path as the OpenCode providers —
opencode_extract._parse_jsonl /
_normalize_fact / _valid / _key,
then helpers.register_source_document +
helpers.ingest_facts — so the output is directly comparable
and anchored to a registered source revision just like A/B. Run script:
/tmp/cerebras-test/run_codex_extract.py.Harness caveat (faithful representation). D and E are a different agentic harness on a different account than A–C. The prompt, fact shape, and ingest path are identical, so the extraction quality numbers are comparable — but wall-time and "passes" are not strictly apples-to-apples across harnesses (different tool-call overhead, different loop heuristics, host vs container). Read D/E wall-times as Codex-CLI figures, not as a like-for-like speed test against OpenCode.
| Component | Value |
|---|---|
| Article | frontier EntryId 23778 — "Attack on Aboriginal
people — Bundamba Lagoon (August 1860)", ~14,600 chars. Reused
verbatim: /tmp/cerebras-test/source.txt. |
| Prompt | the broad lens-sweep prompts/extract_broad.txt,
identical for all five (D/E prepend only a short harness preamble that
restates the prompt's own OUTPUT MECHANISM). |
| Controllers | A/B/C: production multi-pass "loop-until-dry" driver
opencode_extract.extract_facts_opencode over
OpenCodeAgent (headless OpenCode). D/E: the Codex
CLI's own shell-tool + multi-turn loop
(codex exec, host). |
| Ingest | the normal donto-api path for all five: register
source document + revision, parse {s,p,o,a,c,h} via
opencode_extract._parse_jsonl, ingest facts, attach
evidence spans. |
| Measurement | counts read from live donto_statement
(rows where upper(tx_time) IS NULL); anchoring from
donto_evidence_link. Spot-checked 2026-06-04. |
| Contexts | A → ctx:test/cerebras-gptoss/23778; B →
ctx:test/cerebras-glm/23778; C →
ctx:test/zai-glm/23778; D →
ctx:test/codex-normal/23778; E →
ctx:test/codex-spark/23778. |
Provider configs: A/B point OpenCode's
cerebras provider at the sanitizing proxy
(baseURL=http://172.18.0.1:8089/v1), key from
/etc/donto/cerebras.env; C uses the
unchanged production z.ai config
(https://api.z.ai/api/coding/paas/v4, model
glm-4.7, GLM_API_KEY); D/E
use the Codex CLI on host ChatGPT-Pro auth (no API key/credit),
codex exec … -c model=gpt-5.4 /
gpt-5.3-codex-spark.
donto_statement,
spot-checked 2026-06-04)| Axis | (A) gpt-oss-120b @ Cerebras | (B) glm-4.7 @ Cerebras | (C) glm-4.7 @ z.ai (INCUMBENT) | (D) codex-normal gpt-5.4 | (E) codex-spark gpt-5.3-codex-spark |
|---|---|---|---|---|---|
| Harness | OpenCode (container) | OpenCode (container) | OpenCode (container) | Codex CLI (host) | Codex CLI (host) |
| Account / billing | Cerebras PAYG | Cerebras PAYG (preview) | z.ai coding sub | ChatGPT Pro (no API credit) | ChatGPT Pro (no API credit) |
| Live context | …/cerebras-gptoss/23778 |
…/cerebras-glm/23778 |
…/zai-glm/23778 |
…/codex-normal/23778 |
…/codex-spark/23778 |
| Status | ✅ real run | ✅ real run (prior); fresh re-run BLOCKED | ❌ blocked — 0 facts | ✅ real run, LIVE path | ✅ real run, LIVE path |
| Live facts | 320 | 3,590 | 0 | 511 | 426 |
| Distinct subjects | 145 | 486 | 0 | 119 | 78 |
| Distinct predicates | 95 | 1,852 | 0 | 292 | 213 |
| Anchored (≥1 evidence_link) | 151 / 320 = 47.2% | 2,498 / 3,590 = 69.6% | — | 391 / 511 = 76.5% | 368 / 426 = 86.4% |
| Object split (IRI / literal) | 141 / 179 | 748 / 2,842 | — | 265 / 246 | 162 / 264 |
| Predicate-style hygiene | 79/93 bare-camel ≈ 83.2%; 2 vocab
(rdf:type,rdfs:label); 0 kebab/space |
≈89.7% camel among bare; 674 :-prefixed minted preds
dilute all-pred camel to 56.9% |
n/a | 273/292 = 93.5% camelCase; 2 vocab
(rdf:type,rdfs:label); 0 kebab, 0
clause/spaced |
90.1% camelCase (192/213 distinct); 0 kebab, 0
clause/spaced; one faithful artifact damagedCattle?
(trailing ? preserved as emitted) |
| Source attribution | edge-style (reportedIn/attestedBy) |
edge-style + some source-baked preds | n/a | clean edges
(reportedIn,attestedBy,accordingTo)
not baked into predicate names |
clean edges — e.g. 6+ distinct
attestedBy edges on the event |
| JSONL cleanliness | clean | clean | n/a | clean | clean |
| Controller / loop | toolloop fired; 3 passes (196→303→320) | toolloop fired; multi-pass | did not fire | toolloop fired; multi-pass, self-judged done | toolloop fired; multi-pass (58→…→432 raw → 426 ingested), self-judged done; retry-on-empty NOT needed |
| Wall time | 71.8 s | ~prior run; re-run fast-fails at cap | 0 s | 461.1 s (104k tokens) | 60.3 s (89k tokens) |
| Block reason | — | HTTP 402 payment_required — Cerebras
account-wide cap (proxy + direct, ~0.14 s) |
z.ai code 1310 "Weekly/Monthly Limit Exhausted, reset 2026-06-10 16:43:35" | — | — |
Note on E's camel-%. A strict classifier that rejects the trailing
?counts 192/213 = 90.1% camelCase (thedamagedCattle?artifact is the one excluded). A lenient regex that tolerates the?counts 193/213. Either way, 0 kebab and 0 clause/spaced predicates — strong style discipline, the artifact faithfully preserved rather than silently normalised.
B's 1,852 distinct predicates are inflated by 674
:-prefixed, clause-style minted predicates —
high-resolution but ballooning the raw count (all-pred camel diluted to
56.9%; ≈89.7% among un-prefixed). The Codex runs (D/E) show the
opposite, tighter profile: 292 / 213 distinct
predicates, 0 clause-style and 0 kebab, 93.5% / 90.1%
camelCase, and source attribution modeled as clean
edges (reportedIn, attestedBy,
accordingTo) rather than baked into predicate names —
exactly what extract_broad.txt asks for. Either profile is
aligned at query time by the substrate's alignment
engine, not a static map (CLAUDE.md no-brittle-logic
rule); but the Codex output needs less query-time
predicate-folding to be load-bearing.
FAITHFUL on the live data. The headline change since the 3-way run: with Cerebras (402) and z.ai (1310) BOTH quota-exhausted, the ChatGPT-Pro Codex CLI is now the only live extraction path — and it works well.
damagedCattle?, trailing ? preserved as
emitted) — recorded, not hidden.retries=0.payment_required/402 (both keys, proxy +
direct, ~0.14 s); z.ai = code 1310 weekly/monthly cap (host + container
keys, identical reset = one account). Neither is a slot deadlock, proxy
fault, or code bug.SELECT only. The two new Codex contexts are
inserts-only: D = 511 rows, all 511 live, 0
closed/retracted/superseded; E = 426 rows, all 426 live, 0
closed/retracted/superseded. No retract / supersede / delete touched
donto_statement anywhere.donto's extraction cost is dominated by tokens generated per document × documents per day, and by which accounts are not capped.
| Provider / plan | Shape | Indicative rate | Fit for donto extraction |
|---|---|---|---|
| Cerebras PAYG (gpt-oss-120b) | per-token | low-single-digit $ / Mtok | best $/fact for shallow sweeps — currently 402-capped |
Cerebras PAYG (zai-glm-4.7,
preview) |
per-token | preview pricing; verify | best depth/$ if rate reasonable; needs sanitizing proxy — currently 402-capped |
| z.ai GLM coding subscription (incumbent) | flat-rate | fixed $ / month | in use; weekly/monthly cap stops all extraction (1310, reset 2026-06-10); TOS-risky; expiring subsidy |
| ChatGPT Pro — Codex CLI / gpt-5.3-codex-spark (E) | flat-rate Pro sub | $0 marginal — no API credit | the only live path now: fast (60 s), high-anchor (86.4%), clean predicates; Cerebras-accelerated OpenAI codex on an already-paid Pro sub |
| ChatGPT Pro — Codex CLI / gpt-5.4 (D) | flat-rate Pro sub | $0 marginal — no API credit | live, deeper/slower (511 facts, 461 s); use when depth matters more than wall-time |
The economic story has shifted. The all-time depth argument (B's ~11× over A) still holds when Cerebras billing is live — but right now both per-token providers are capped, and the ChatGPT-Pro Codex path costs no marginal API credit (it draws on a subscription the user already pays for). Among runnable providers, codex-spark (E) gives the best speed × anchoring × cleanliness at zero marginal cost; codex-normal (D) trades ~7.6× wall-time for more depth. The flat-rate posture means it does not meter per fact — attractive for steady volume, subject to whatever throughput limits the Pro sub enforces.
Caveat: exact $/Mtok for the preview
zai-glm-4.7 on Cerebras was never pinned (account in
payment_required); the Codex path has no per-token meter to
pin, only the Pro subscription's own usage limits.
Live-now primary — use the Codex CLI on ChatGPT Pro as the working extraction path while Cerebras and z.ai are capped. It is the only runnable route today, costs no marginal API credit, and on this article it produced the best-anchored, cleanest-predicate extractions in the whole field.
gpt-5.3-codex-spark). Cerebras-accelerated OpenAI
codex via the ChatGPT-Pro sub: 60.3 s, 426 facts,
86.4% anchored (the highest in the field), 90.1% camel,
0 kebab/clause. ~7.6× faster than gpt-5.4 for ~83% of the facts at
higher anchor fidelity. The default live engine.gpt-5.4). Deeper and broader (511 facts / 119
subjects / 292 predicates, 76.5% anchored, 93.5% camel) at ~7.6× the
wall-time (461 s). Use it when coverage matters more than latency on a
given document, still at zero marginal cost.All-time depth primary — adopt glm-4.7 on Cerebras (Provider B) once Cerebras billing is restored. On the same article + prompt it extracted 3,590 faithful facts (~11× gpt-oss, ~7× the Codex runs) at 69.6% anchoring — the depth ceiling measured. Keep it as the heavy-extraction engine when its account is live; it keeps the exact glm-4.7 model donto already runs, on faster hardware, off the z.ai subsidy.
Secondary tier — gpt-oss-120b on Cerebras (A). Fast/cheap recall-floor when Cerebras is live, but ~9% of B's yield and the lowest anchoring (47.2%); a coverage floor, not the main engine.
Incumbent (glm-4.7 @ z.ai, C) — fallback only, currently dead. Hard-capped now (1310, reset 2026-06-10 16:43:35); TOS-risky; expiring subsidy. Migrate off it.
Action items:
donto-extract's swappable provider to
codex exec -c model=gpt-5.3-codex-spark (default) /
gpt-5.4 (depth) on ChatGPT-Pro auth; remember the
sunset-default workaround (always pass
-c model=, never the default
gpt-5.3-codex).run_glm_zai.py to populate C and finally measure the
same-model-different-hardware axis (B vs C).donto-extract (OpenCode
and Codex CLI), so capping one account or one harness never
stops all extraction — exactly the resilience this session
demonstrated.range(MAX_CONCURRENT)
(§2.2).zai-glm-4.7 on Cerebras with a billed run (§6).payment_required/402 (both keys, proxy +
direct); z.ai = code 1310 weekly/monthly (host + container keys, same
reset = one account). The Codex/ChatGPT-Pro path is the only live route
at writing.gpt-5.3-codex is sunset for ChatGPT accounts
(400); every D/E invocation passes -c model= with
a supported id (gpt-5.4 for D,
gpt-5.3-codex-spark for E). Without it, the Codex path
returns 0 facts for an account-policy reason, not a model-quality
one.damagedCattle? carries a trailing ? exactly as
the model emitted it — preserved, not normalised; it is the single
non-clean distinct predicate in E (90.1% vs a lenient-regex
193/213).zai-glm-4.7 on Cerebras is preview and
requires the sanitizing proxy (172.18.0.1:8089) — a
hard runtime dependency; verified up this session.donto_evidence_link over live
statements. The Codex runs' own ingest meta (D: 391 anchors / 511; E:
368 / 426) matches the DB-side 76.5% / 86.4%,
cross-validating the method.ctx:test/cerebras-glm/23778-clean (0 rows, cap
fast-fail) — excluded; C's ctx:test/zai-glm/23778 is 0 rows
(never executed).SELECTs against
donto_statement; the two new Codex contexts are
inserts-only (D 511/511 live, E 426/426 live, 0
closed/retracted/superseded); no retract/supersede/delete anywhere;
empty target contexts left untouched.Measured on donto-db (apex-494316), 2026-06-04.
Counts from live donto_statement
(upper(tx_time) IS NULL); anchoring from
donto_evidence_link. Caps probed live this session.
Mechanisms: production opencode_extract multi-pass
controller over OpenCodeAgent with broad lens prompt
extract_broad.txt (A/B/C,
OPENCODE_MAX_CONCURRENT=16); the Codex CLI
(codex exec, codex-cli 0.130.0, ChatGPT Pro, host) with the
same prompt + {s,p,o,a,c,h} fact shape + the same
parse/ingest path (D/E). Codex run script:
/tmp/cerebras-test/run_codex_extract.py.