# Claim-Substrate Report — Research Appendix (raw findings)
_Companion to the iteration-3 report. Structured output of the 5-area study (2026-06-02)._

---

## resume-job-skills-intelligence

The skills-intelligence market runs on a handful of large public skill taxonomies that all model skills as graphs of related nodes, exactly the structure a claim substrate would emit. Sizes: ESCO (EU) ~14,575 skills/competences linked to 3,039 occupations across 28 languages; Lightcast Open Skills ~32,000-34,000 skills in 31 categories, refreshed every 2 weeks from 1B+ job postings; LinkedIn Skills Graph ~39,000 skills, 374,000 aliases, 200,000+ explicit skill-to-skill links ("knowledge lineages"), spanning 875M people and 59M companies; O*NET (US gov, free) 1,016 occupations with ~35 skills + 277 descriptors per the Content Model; SkillsFuture Singapore ~11,000 skill competencies across 34 frameworks/38 sectors. Crucially, these are skill-to-skill relationship graphs (prerequisite, alias, adjacency), not flat lists, and the newest research (Skill-LLM, SkiLLMo, "LLMs as zero-shot ESCO matchers", ESCOX) uses LLMs to EXTRACT and NORMALIZE skill claims from free text against ESCO with ~79-91% precision. That extraction step IS donto's typed-claim ingestion in another vocabulary.

The commercial layer is large and consolidating around "talent intelligence" / knowledge graphs. Eightfold AI: ~$96.6M ARR 2024 (up from $58.8M in 2023), ~$2.1B valuation, $410M raised; it pitches an AI "Talent Intelligence Platform" with a deep-learning talent graph that infers latent/adjacent skills. Beamery: ~$112.8M ARR 2024; it reached a $1B (unicorn) valuation in Dec 2022 on a "Talent Graph", but laid off 12% of staff (2023) and then ~25% (2024) chasing profitability. SeekOut: ~$25.2M ARR 2024; valued at $1.2B in 2022 and ~$435M now; sources from 700M+ profiles; cut 30% of staff in 2024. Draup: $20M Series A; talent + sales intelligence on 80,000+ sources for workforce/reskilling planning. hiring.cafe: a scrappy aggregator scraping real employer ATS feeds (Greenhouse/Lever/Workday) with GPT-based per-job summaries and natural-language search, free to seekers. The pattern: the unicorns are CASH-CONSTRAINED and their graphs are proprietary, opaque deep-learning embeddings; explainability and evidence-grounding are exactly where they are weak. Underlying market: talent-acquisition software ~$22-26B in 2025, with the AI-recruiting segment at ~$3.2B growing at ~12% CAGR. Skills-based hiring is claimed by 85% of employers in 2025 (up from 57% in 2022), though only a tiny fraction of hires are actually affected (the rhetoric/reality gap is itself a wedge).

JSON Resume (the founder's asset) is an MIT-licensed open standard (schema v1.0.0; the resume-schema repo has ~2.4k stars; jsonresume.org monorepo) that already has a clean typed structure (basics, work, education, skills{name,level,keywords}, projects, awards, languages, etc.) plus a draft job-schema.json. Reported adoption is "10,000+ developers / tens of thousands of users" and 400+ themes; the registry renders any GitHub Gist named resume.json at registry.jsonresume.org/<user> and already has nascent AI "suggestions". So the founder controls both sides of the matching graph (a typed-resume corpus AND a draft typed-job schema) plus the registry distribution channel, in a format that is already half-decomposed into claims.
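To make "already half-decomposed into claims" concrete, here is a minimal sketch that walks the `skills` and `work` sections of a JSON Resume document and emits one typed claim per field, each with a JSON-pointer-style path back to its source. The `Claim` layout, the predicate names (`skill`, `skill_level`, `skill_keyword`, `role`, `employer`, `role_start`) and the `gist:` identifier are hypothetical illustrations, not donto's schema; only the resume field names (`skills[].name/level/keywords`, `work[].name/position/startDate`) come from the JSON Resume schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    subject: str    # hypothetical resume identifier (e.g. a gist id)
    predicate: str  # hypothetical predicate name, not donto's real vocabulary
    value: str
    source: str     # JSON-pointer-style path back into the resume document


def resume_to_claims(resume: dict, resume_id: str) -> list[Claim]:
    claims = []
    for i, skill in enumerate(resume.get("skills", [])):
        base = f"/skills/{i}"
        if "name" in skill:
            claims.append(Claim(resume_id, "skill", skill["name"], base + "/name"))
        if "level" in skill:
            level = f'{skill.get("name", "")}={skill["level"]}'
            claims.append(Claim(resume_id, "skill_level", level, base + "/level"))
        for j, kw in enumerate(skill.get("keywords", [])):
            claims.append(Claim(resume_id, "skill_keyword", kw, f"{base}/keywords/{j}"))
    for i, job in enumerate(resume.get("work", [])):
        base = f"/work/{i}"
        if "position" in job:
            claims.append(Claim(resume_id, "role", job["position"], base + "/position"))
        if "name" in job:
            claims.append(Claim(resume_id, "employer", job["name"], base + "/name"))
        if "startDate" in job:
            claims.append(Claim(resume_id, "role_start", job["startDate"], base + "/startDate"))
    return claims


example = json.loads("""{
  "skills": [{"name": "Cloud", "level": "Senior", "keywords": ["AWS", "Kubernetes"]}],
  "work": [{"name": "ExampleCo", "position": "Staff Engineer", "startDate": "2019-03-01"}]
}""")

for c in resume_to_claims(example, "gist:example"):
    print(c.predicate, c.value, c.source)
```

Keeping the source path on every claim is what later lets a match edge say which field of which resume backs it.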

Academic person-job fit has moved from CNN/RNN co-attention models (PJFCANN) to graph neural nets (Fine-Grained Semantics-Enhanced GNN 2025, graph adaptive fusion 2025) and to LLM re-ranking. The most candid admission in the state of the art restates the donto thesis almost verbatim: ConFit v2/v3 (ACL 2025/2026) note that embedding rankers "lack controllability and explainability as the ranking process happens entirely in latent embedding space" and bolt on LLM re-ranking to recover interpretability. Bias is also real (one benchmark is 72% male). That is the precise gap a claim substrate fills: evidence-anchored, contradiction-aware, re-rankable matching with a human-readable WHY.

**Players / datasets / taxonomies / standards:**

- **ESCO (European Skills, Competences, Qualifications and Occupations)** — EU public skills/occupations ontology; ~14,575 skills/competences linked to 3,039 occupations, 28 languages; skill-occupation matrix is an explicit relationship graph. The de facto target vocabulary for LLM skill-extraction research. A donto skill claim can be canonicalized to ESCO IRIs. _[14,575 skills, 3,039 occupations, 28 languages]_ https://esco.ec.europa.eu/en/classification
- **Lightcast Open Skills (ex-Emsi Burning Glass)** — Largest commercial-but-open skills library; 32,000-34,000 skills in 31 categories, updated every 2 weeks from 1B+ job postings / 40,000 sources. Specialized vs common vs software skills. Free taxonomy download; paid data feeds. Benchmark for skill coverage/freshness. _[~33,000 skills; 1B+ postings; refreshed biweekly]_ https://lightcast.io/open-skills
- **LinkedIn Skills Graph** — ~39,000 skills, 374,000 aliases, 200,000+ skill-to-skill 'knowledge lineage' edges, mapped onto 875M people / 59M companies. The reference example of a skills KNOWLEDGE GRAPH (nodes+typed edges) powering skills-first matching. Proprietary, not exportable. _[39K skills, 374K aliases, 200K+ links, 875M people]_ https://www.linkedin.com/blog/engineering/skills-graph/building-linkedin-s-skills-graph-to-power-a-skills-first-world
- **O*NET (US DOL)** — Free US occupational database; 1,016 O*NET-SOC occupations, ~35 skills + 277 Content Model descriptors per occupation, work activities, abilities, knowledge. Public, citable, ideal seed ontology for occupation/role claims and skill importance weights. _[1,016 occupations, 277 descriptors]_ https://www.onetcenter.org/database.html
- **SkillsFuture Singapore Skills Frameworks** — National skills taxonomy; ~11,000 skill competencies across 34 frameworks / 38 sectors / 1,000+ job roles, ML-clustered into ~9 layers, plus 16 Critical Core Skills. Demonstrates government-scale skill-cluster hierarchies (career-path adjacency). _[~11,000 competencies, 34 frameworks, 38 sectors]_ https://www.skillsfuture.gov.sg/skills-framework
- **Eightfold AI** — Talent intelligence platform built on a deep-learning 'talent graph' that infers latent/adjacent skills and potential. Closest commercial analog to the donto vision but with opaque embedding graph (low explainability). _[~$96.6M ARR 2024; ~$2.1B val; $410M raised]_ https://eightfold.ai/
- **Beamery** — Talent CRM + 'Talent Graph' for skills-based talent management. Unicorn 2022 then heavy layoffs (12% 2023, ~25% 2024) chasing profitability; shows the category is real but capital-stressed. _[~$112.8M ARR 2024; $1B val (2022); $223M raised]_ https://beamery.com/
- **SeekOut** — AI talent sourcing across 700M+ profiles from 30+ sources; agentic recruiting. Valuation fell $1.2B(2022)->$435M(2024), 30% layoffs 2024. Evidence that profile-aggregation alone (no claim/evidence layer) is a fragile moat. _[~$25.2M ARR 2024; 700M+ profiles; val $1.2B->$435M]_ https://www.seekout.com/
- **Draup** — Talent + sales intelligence; workforce-planning/reskilling from 80,000+ sources, career-pathing and skill-gap analytics. Relevant for the skill-adjacency / career-path network-effect use cases. _[$20M Series A; 80,000+ sources]_ https://draup.com/talent/
- **hiring.cafe** — Free job aggregator scraping real employer ATS (Greenhouse/Lever/Workday/Workable) with GPT per-job summaries + natural-language search; filters ghost jobs. Cheap, fast distribution model; a realistic ingestion source for the jobs side of a JSON Resume matcher. _[aggregates from 14,000+ companies; free to seekers]_ https://hiring.cafe/
- **JSON Resume (jsonresume.org)** — Founder-owned MIT open resume standard (schema v1.0.0) + Gist-backed registry + draft job-schema.json. Already typed (basics/work/education/skills{name,level,keywords}/projects). Controls both sides of the graph and the distribution channel. _[resume-schema ~2.4k stars; 10,000+ devs; 400+ themes]_ https://jsonresume.org/schema
- **ConFit v2 / v3 (resume-job matching, ACL 2025/2026)** — SOTA embedding matcher + LLM re-ranking. Explicitly states embedding rankers 'lack controllability and explainability... entirely in latent embedding space' and adds LLM re-ranking to recover the WHY. Directly validates the claim-substrate explainability thesis. +13.8% recall / +17.5% nDCG over BM25/OpenAI embeddings. _[+13.8% recall, +17.5% nDCG vs baselines]_ https://arxiv.org/html/2502.12361v1
- **LLM skill extractors (Skill-LLM, SkiLLMo, ESCOX, 'Zero-shot ESCO matchers')** — Research line that extracts and normalizes typed skill claims from free-text resumes/jobs to ESCO IRIs at ~79-91% precision. This IS donto's typed-claim ingestion stage in skills vocabulary; reusable extraction prior art. _[91% extract / 80% standardize precision (SkiLLMo)]_ https://arxiv.org/html/2410.12052v1
- **Person-Job Fit GNN models (PJFCANN; Fine-Grained Semantics-Enhanced GNN 2025)** — Graph-based matching using co-occurrence/skill graphs + attention. Shows the field already believes the graph is the right structure; donto provides the persistent, evidence-anchored, contradiction-aware version they lack. _[unknown (academic)]_ https://www.mdpi.com/1099-4300/27/7/703
- **Skills-based hiring market context** — 85% of employers claim skills-based hiring in 2025 (57% in 2022); removing degree filters expands pools ~19x; skills 5x more predictive than education (McKinsey); but rhetoric>>reality gap. Talent-acquisition software ~$22-26B 2025, AI-recruiting ~$3.2B at ~12% CAGR. _[85% adoption claim; TAM ~$22-26B; AI seg ~$3.2B]_ https://blog.theinterviewguys.com/the-state-of-skills-based-hiring/

**Volume verdict:** High-volume typed extraction is unambiguously HELPFUL in this domain, more clearly than in almost any other, but only when it is split across two layers. WHERE IT HELPS (extraction/decomposition): a single resume bullet ('Led migration of monolith to microservices on AWS for a 5M-user fintech') legitimately contains dozens of typed claims, both explicit (AWS, microservices, leadership) AND implicit/inferred (distributed systems, observability, PCI/regulatory exposure, team scale, domain=fintech, seniority signal). The founder is right: that latent typed structure is vast, and extracting it densely is pure recall gain, the raw fuel. Denser per-document claim graphs are PRECONDITIONS for discovery: you cannot find a hidden-candidate-to-job (A-C) link unless the intermediate skill/role/trajectory (B) nodes exist (Swanson ABC). Skill-adjacency and career-path graphs only become rich at volume; the network effect is real, and the open taxonomies (ESCO/Lightcast/LinkedIn) prove the relationship structure exists and is large (200K+ LinkedIn skill links). So volume at the typed-extraction layer is the asset. WHERE IT BECOMES NOISE (relationship-hypothesis layer): the cross product resumes x jobs x skill-combinations is combinatorially huge, and naively materializing every candidate match edge produces overwhelming false discovery (most skill overlaps are spurious, most implicit-skill inferences are weak, most 'matches' are degree-of-overlap noise). Here volume MUST flow through a verifier/ranker that scores by evidentiary support, novelty, contradiction value and verification cost, re-ranks on new evidence, and gates low-evidence match hypotheses as hypothesis_only. The reconciliation: VOLUME FEEDS THE VERIFIER. More extracted claims => denser graph => more candidate relationships, but the verifier decides which candidate edges become surfaced matches.
Volume and the verifier are complementary, not opposed; the failure mode is skipping the verifier and treating every extracted relationship as a finding. A useful concrete guardrail: every emitted match or implicit-skill claim must be individually evidence-anchored and falsifiable, and claims with no backing bullet are hypothesis_only and never shown as confident matches.
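The split between a volume-loving extraction layer and a precision-gated hypothesis layer can be sketched as a small ranker. This is a minimal illustration under stated assumptions: the `MatchHypothesis` fields, the multiplicative score and the `min_support` threshold are hypothetical choices, not donto's implementation. The only rule taken from the text is the guardrail that a match with no backing evidence stays `hypothesis_only` instead of being surfaced.

```python
from dataclasses import dataclass, field


@dataclass
class MatchHypothesis:
    candidate: str
    job: str
    evidence: list[str] = field(default_factory=list)  # ids of backing bullets/claims
    novelty: float = 0.5              # 0..1, hypothetical: how non-obvious the bridge is
    plausibility: float = 0.5         # 0..1, e.g. derived from extractor confidence
    contradiction_value: float = 0.0  # 0..1, how much resolving it would teach us
    verification_cost: float = 1.0   # > 0, hypothetical cost units to confirm
    status: str = "candidate"


def score(h: MatchHypothesis) -> float:
    # Hypothetical scoring: evidential support saturates with evidence count,
    # and cost divides the score so cheap-to-verify hypotheses rank higher.
    support = len(h.evidence) / (len(h.evidence) + 1)
    boost = 1 + h.contradiction_value
    return h.novelty * h.plausibility * support * boost / h.verification_cost


def rerank(hypotheses: list[MatchHypothesis], min_support: int = 1) -> list[MatchHypothesis]:
    # Meant to be re-run over ALL stored hypotheses whenever new evidence arrives.
    for h in hypotheses:
        h.status = "surfaced" if len(h.evidence) >= min_support else "hypothesis_only"
    return sorted(hypotheses, key=score, reverse=True)


hs = [
    MatchHypothesis("alice", "job-42", evidence=["bullet-3", "bullet-7"], novelty=0.8, plausibility=0.9),
    MatchHypothesis("bob", "job-42", evidence=[], novelty=0.9, plausibility=0.6),
]
for h in rerank(hs):
    print(h.candidate, h.status, round(score(h), 3))
```

A multiplicative score means any factor near zero (no evidence, implausible) sinks the hypothesis regardless of how novel it looks, which is the behaviour the verdict asks for; an additive score would let novelty alone carry an unsupported match.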

**Design implications:**
- Make ESCO + O*NET (both free/open) the canonical skill/occupation IRI vocabulary in donto, and treat each extracted resume/job skill as a typed claim that resolves (non-destructively, identity-as-hypothesis) to one-or-more taxonomy nodes. Keep raw surface form + normalized IRI + confidence as separate evidence-anchored claims so you never destroy the original bullet text. This directly reuses the Skill-LLM/SkiLLMo extraction pattern donto already has via opencode.
- Decompose BOTH resume and job into the SAME typed claim schema (skill-claim, role-claim, seniority-claim, domain-claim, tenure-claim, education-claim, trajectory-claim, and crucially IMPLICIT/inferred-skill-claim). Matching then becomes scored relationship discovery between two claim sets, not cosine similarity between two opaque vectors. Each match edge carries its provenance: 'requirement R met by skill-claim S extracted from bullet B of resume gist G'. This is the 'show the human WHY' the whole thesis hinges on, and it is exactly what ConFit/embedding rankers admit they cannot do.
- Use the contradiction frontier for real recruiting signals: resume asserts seniority:senior but tenure-claims sum to 18 months (rebuts); resume claims skill X but no evidence-bearing bullet supports it (undercut, hypothesis_only); two gists/profiles for the same person disagree (identity-as-hypothesis + typed argument edges). Surface these as a confidence/risk panel rather than silently averaging them away. This is a feature no embedding matcher can offer and it maps onto donto's existing argument/identity machinery (which is currently barely exercised).
- Lean on bitemporality for two concrete product features: (a) skill decay / freshness (a skill-claim valid-from 2018 with no recent reinforcement is weighted down; market demand for that skill is itself a time-series), and (b) 'replay why we matched' for audit/fairness/compliance (EU AI Act-style explainability for hiring). Bitemporal replay becomes a defensible compliance and trust story that incumbents with opaque graphs cannot easily retrofit.
- VOLUME belongs at extraction: run many extraction lenses over each resume/job and EMIT TYPED CLAIMS, not prose. A 'skill lens' emits normalized skill-claims; an 'implicit-skill lens' infers skills from role+company+project (built a payments API => infers idempotency/PCI/distributed-systems claims as hypothesis_only with provenance); a 'trajectory lens' emits career-path edges; a 'seniority lens' emits time-bounded seniority assertions. Each candidate claim must be evidence-linked and individually verifiable/falsifiable. This is where '1M facts' is correct and fuels denser graphs (Swanson ABC: you need the intermediate skill/role B-nodes to exist to discover candidate-to-job A-C links).
- Put the verifier at the relationship-hypothesis layer to control false-discovery: rank candidate match edges by novelty (hidden candidate / non-obvious skill bridge), plausibility, evidentiary support (count + quality of backing claims), contradiction-value, and verification cost. Re-rank ALL stored match hypotheses when new evidence arrives (candidate adds a project, job edits a requirement, market shifts). The verifier is what stops the combinatorial explosion of resume x job x skill from becoming noise; volume feeds it, it does not fight volume.
- Exploit network effects deliberately and make them the moat: as resumes+jobs accumulate, build a SKILL-ADJACENCY graph (skills that co-occur in evidence become candidate prerequisite/substitute edges), CAREER-PATH graph (observed role transitions become trajectory hypotheses with frequency evidence), and HIDDEN-CANDIDATE discovery (people whose implicit-skill-claims match a job even though their surface resume keywords do not). These are emergent typed relationships the incumbents charge enterprise prices for; donto generates them as a byproduct of the claim lifecycle.
- Ship it on assets the founder already owns to get a real corpus cheaply: the JSON Resume registry is the resume side (Gist-backed, already typed, version-controlled = free bitemporal valid-time); harvest the jobs side from open ATS feeds the way hiring.cafe does (Greenhouse/Lever/Workday public boards). Use the existing draft job-schema.json. This gives donto a true second, non-genealogy domain that demonstrates it is domain-neutral, with a built-in distribution channel (registry.jsonresume.org/<user>).
- Lead the product UX with explainability and contradiction, since that is the demonstrable gap vs Eightfold/Beamery/embedding research: every match shows the backing claim chain, the counter-evidence, the confidence, and a verification path ('confirm this inferred skill by adding the repo link'). This turns donto's current weakness (contradiction/identity machinery barely used) into the headline differentiator, in a domain where that machinery is naturally exercised.
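Two of the signals above can be sketched directly: a rebut check (a `senior` seniority claim against summed tenure, using the 18-month example), an undercut check (a skill claim with no evidence-bearing bullet), and a freshness weight for skill decay. The 36-month seniority threshold, the 3-year half-life and the simplified record shape are hypothetical assumptions for illustration; the text gives no thresholds.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    return (end.year - start.year) * 12 + (end.month - start.month)


def contradiction_panel(resume: dict, senior_min_months: int = 36) -> list[tuple[str, str]]:
    """Return (kind, message) findings for a risk panel instead of averaging them away."""
    findings = []
    tenure = sum(months_between(start, end) for start, end in resume["roles"])
    if resume.get("seniority") == "senior" and tenure < senior_min_months:
        findings.append(("rebut", f"seniority=senior but total tenure is {tenure} months"))
    for skill in resume["skills"]:
        if not resume["evidence"].get(skill):
            findings.append(("undercut", f"skill {skill!r} has no backing bullet; hypothesis_only"))
    return findings


def freshness_weight(last_reinforced: date, as_of: date, half_life_years: float = 3.0) -> float:
    # Exponential decay: the weight halves every half_life_years without reinforcement.
    years = (as_of - last_reinforced).days / 365.25
    return 0.5 ** (years / half_life_years)


resume = {
    "seniority": "senior",
    "roles": [(date(2023, 1, 1), date(2023, 10, 1)), (date(2024, 1, 1), date(2024, 10, 1))],
    "skills": ["Kubernetes", "Rust"],
    "evidence": {"Kubernetes": ["bullet-2"]},
}
for kind, msg in contradiction_panel(resume):
    print(kind, msg)
print(round(freshness_weight(date(2018, 6, 1), date(2024, 6, 1)), 2))
```

Both findings are returned side by side rather than folded into a single score, matching the "confidence/risk panel" framing.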

**Honest caveats:**
- The incumbents are not naive: Eightfold/Beamery/LinkedIn already infer latent and adjacent skills via deep-learning talent graphs. donto's differentiation is NOT 'we have a skills graph' (they have bigger ones) but specifically evidence-anchoring + contradiction-preservation + bitemporal replay + human-readable WHY. If the product collapses claims to scores like everyone else, there is no moat.
- Cold-start / corpus-size reality check: JSON Resume's real adoption is modest (10,000+ developers, ~tens of thousands of users, dev-skewed) versus LinkedIn's 875M profiles and Lightcast's 1B postings. The network effects the thesis needs are real in principle but donto starts orders of magnitude behind on volume; the dev-resume niche may be too narrow to bootstrap broad skill-adjacency. Start where the data is dense (software roles) and be honest that it is a wedge, not the whole market on day one.
- The skills-based-hiring TAM is partly hype: 85% of employers CLAIM it but reported real impact on hires is tiny (rhetoric/reality gap). Buyers may not actually pay for better explainable matching if their hiring is still degree/pedigree-driven in practice. Monetization is plausible (TA software ~$22-26B, AI seg ~$3.2B/12% CAGR) but the explainability/fairness angle, not raw matching accuracy, is the likeliest paid wedge (EU AI Act / hiring-audit compliance).
- Bias and fairness cut both ways: evidence-anchoring can REDUCE bias (decisions traceable to job-relevant claims) but the extraction LLM itself inherits bias (ConFit benchmark 72% male; skill inference can encode proxies for protected attributes). Claiming donto is 'fairer' requires actual measurement, not architecture alone.
- Extraction precision is not free: the best ESCO skill-extraction pipelines hit ~79-91% precision, meaning roughly 9-21% of emitted skill-claims are wrong. At high volume that is a lot of false claims entering the graph. The contradiction/hypothesis_only machinery must actually be wired to demote them, and this is exactly the part of donto that is currently barely exercised (argument edges ~2,426, evidence ~4.7%), so the domain will stress-test unbuilt code paths.
- There is NO cross-entity relationship-generation step in donto's code today, per the brief. The entire 'generate match hypotheses from claim combinations + verifier/ranker' layer that makes this a product is net-new build, not a config of existing genealogy pipelines. The taxonomies and extraction prior art exist; the discovery/verifier engine is the real engineering risk.
- Privacy/consent: matching real people's resumes to jobs at scale (especially Gist-scraped or ATS-scraped data) has GDPR/CCPA exposure that genealogy of deceased ancestors does not. Bitemporal 'remember everything forever' is a compliance liability for living job-seekers (right-to-erasure conflicts with contradiction-preserving permanence).
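The precision caveat is simple arithmetic worth making explicit. Assuming the '1M facts' volume mentioned under design implications and the ~79-91% precision range cited above (and ignoring that precision varies by skill and by pipeline stage), the expected number of wrong claims is n x (1 - precision):

```python
def expected_false_claims(n_claims: int, precision: float) -> int:
    # Precision p means a fraction (1 - p) of emitted claims are wrong.
    return round(n_claims * (1 - precision))


for p in (0.79, 0.91):
    print(p, expected_false_claims(1_000_000, p))
```

Even at the optimistic end that is on the order of 90,000 wrong claims per million, which is why demotion has to be automatic rather than manual review.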


## linguistic-decomposition-dimensionality

The founder's intuition is literally true and measurable. Modern NLP defines a stack of ~12 distinct, gold-standard, FALSIFIABLE annotation layers over the same text, each with public datasets and inter-annotator agreement. They are not redundant: each emits a different TYPE of claim, so every token and every predicate gets decomposed many times over. Universal Dependencies annotates EVERY token with 6 core fields: form, lemma, 1 of 17 universal POS tags, a bundle drawn from ~30 morphological feature types (200+ possible values: Case has 40+, plus Tense, Mood, Aspect, Voice, Number, Person, Gender, Definite, etc.), a HEAD pointer, and 1 of 37 dependency relations. That alone is ~5-6 typed claims per token before you touch meaning. Abstract Meaning Representation (AMR 3.0, ~59K gold sentences, LDC2020T02) then re-encodes the same sentence as a rooted directed graph of concepts and relations (roughly one node per content word plus :ARG/:mod/:time edges). PropBank (every WSJ verb, sense-numbered, e.g. leave.01, with numbered args ARG0-ARG5) and FrameNet (1,000+ frames, 13,000+ lexical units, named roles like Buyer/Seller) label predicate-argument structure / semantic roles. Word-sense disambiguation tags content words with WordNet senses (content words average ~3-8 senses; SemCor is the gold corpus). Coreference, named entities and entity linking add further layers (OntoNotes 5.0 stacks coreference and named entities alongside treebank, PropBank and word-sense layers on the same text, across 3 languages and ~400K words). Universal Decompositional Semantics (UDS1.0, decomp.io) replaces categorical roles with 16-18 GRADED real-valued proto-role properties per predicate-argument edge (instigation, volition, awareness, sentience, change-of-state, existed-before/during/after...) PLUS predicate-level factuality (did it happen?) and genericity (kind/hypothetical/dynamic), so each single edge becomes ~20 scalar claims.
Temporal annotation follows: TimeML/TimeBank events + TIMEX3 + TLINKs using Allen interval relations. TimeBank-Dense shows the combinatorial blow-up directly: 36 documents and 1.6K events yield 5.7K temporal links (~10x denser than sparse TimeBank, because annotators label ALL event/time pairs). Then discourse: PDTB-3 (53,631 annotated discourse-relation tokens, 3-level sense hierarchy) and RST/eRST (rhetorical trees). Then multiword expressions (PARSEME 1.3: 455K sentences, 26 languages, 62K+ verbal MWEs). Sentiment/stance and presupposition/implicature layers complete the stack.
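The per-token UD decomposition can be shown on a real CoNLL-U fragment. This minimal sketch parses one CoNLL-U token line (10 tab-separated fields: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) into `(token, predicate, value)` claims. The `ud:` predicate names follow the per-layer namespace idea in this section's design implications but are otherwise hypothetical. Counting FEATS as one bundle gives the ~5-6 claims per token quoted above; splitting it into one claim per feature, as here, gives more.

```python
def token_claims(line: str) -> list[tuple[str, str, str]]:
    """Turn one CoNLL-U token line into (token_id, predicate, value) claims."""
    if not line.strip() or line.startswith("#"):
        return []  # blank sentence separator or sentence-level comment
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError("a CoNLL-U token line has exactly 10 tab-separated fields")
    tok_id, form, lemma, upos, _xpos, feats, head, deprel, _deps, _misc = cols
    if "-" in tok_id or "." in tok_id:
        return []  # multiword-token range or empty node: skipped in this sketch
    tid = f"tok{tok_id}"
    claims = [
        (tid, "ud:form", form),
        (tid, "ud:lemma", lemma),
        (tid, "ud:upos", upos),
        (tid, "ud:head", f"tok{head}"),  # HEAD 0 means the token is the root
        (tid, "ud:deprel", deprel),
    ]
    if feats != "_":
        for feat in feats.split("|"):
            name, value = feat.split("=", 1)
            claims.append((tid, f"ud:feat:{name}", value))
    return claims


rows = [
    ["1", "The", "the", "DET", "DT", "Definite=Def|PronType=Art", "2", "det", "_", "_"],
    ["2", "dog", "dog", "NOUN", "NN", "Number=Sing", "3", "nsubj", "_", "_"],
    ["3", "barks", "bark", "VERB", "VBZ", "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin", "0", "root", "_", "_"],
]
claims = [c for row in rows for c in token_claims("\t".join(row))]
print(len(claims), "claims for", len(rows), "tokens")
```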

So a realistic count for ONE rich 100-word paragraph: ~100 tokens x ~5 UD claims = ~500 syntactic/morphological claims; +~80-120 AMR nodes/edges; +~10-20 predicates x (sense + ~3 args, each arg edge carrying ~20 UDS graded properties) = roughly 600-1,300 semantic claims; +coref chains, +entity links, +5-15 temporal events/links, +3-8 discourse relations, +MWEs, +sense tags. That is conservatively 1,000-2,000+ typed, falsifiable annotations per paragraph WITHOUT any free-form prose. At corpus scale this is not hand-waving: SemMedDB is the existence proof. SemRep extracted 96.3 MILLION typed subject-predicate-object predications from 29.1M PubMed abstracts, and that database is the literal substrate of automated biomedical hypothesis generation. So "millions of typed linguistic facts from a corpus" is not aspirational; it is below the demonstrated ceiling.
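The paragraph-level estimate is plain arithmetic, so it can be made reproducible. This sketch encodes the densities quoted above, reading the ~20 UDS properties as applying per argument edge (consistent with the UDS description earlier), and sums only the layers that carry a number. Coreference chains, entity links, MWEs and sense tags are left out, so under this reading the totals are a floor, not a measurement.

```python
# Per-layer densities taken from the estimate in the text (projections, not measured).
LOW = dict(tokens=100, ud_per_token=5, amr=80, predicates=10, args=3, uds_per_arg=20, temporal=5, discourse=3)
HIGH = dict(tokens=100, ud_per_token=5, amr=120, predicates=20, args=3, uds_per_arg=20, temporal=15, discourse=8)


def paragraph_claims(d: dict) -> int:
    ud = d["tokens"] * d["ud_per_token"]
    # Per predicate: 1 sense tag + one role label per arg + graded UDS properties per arg edge.
    per_predicate = 1 + d["args"] + d["args"] * d["uds_per_arg"]
    semantic = d["predicates"] * per_predicate
    return ud + d["amr"] + semantic + d["temporal"] + d["discourse"]


print(paragraph_claims(LOW), paragraph_claims(HIGH))
```

The low and high cases come out at about 1,200 and 1,900, which is where the "conservatively 1,000-2,000+" range sits; the semantic layer dominates because UDS multiplies every argument edge by ~20.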

Crucially for donto's thesis, these typed claims DO enable cross-text relationship discovery, and the mechanism is exactly Swanson's ABC / literature-based discovery: shared intermediate typed facts are the bridge. Swanson's Raynaud-fish-oil discovery worked because the bridging B-concept (blood viscosity) appeared as a typed claim in BOTH literatures even though only 4 of 489 articles co-mentioned A and C — the discovery LIVED in the intermediate typed facts, not the prose. Shared FrameNet frames, shared WordNet senses, shared entity links, and shared AMR concept nodes are precisely the join keys that let two texts that never cite each other be connected. This is the honest reconciliation of the volume debate: dense typed extraction is the FUEL (high recall, the raw B-facts must exist or no bridge is findable), and the danger is purely at the relationship-HYPOTHESIS layer where all-pairs combinatorics (TimeBank-Dense's 1.6K events -> 5.7K links, and far worse cross-document) demand a verifier/ranker. Volume feeds the verifier; they are not opposed.
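The ABC mechanism reduces to a join on the intermediate B-term. This minimal sketch groups `(concept, b_term)` claims by B, proposes every pair of concepts sharing a B that is not already directly linked, and ranks pairs by how many B-terms they share. The data is toy data (the `drug_x` concept and the extra intermediates are illustrative, not a reproduction of Swanson's corpus), and the sketch ignores the sign and direction of each B-claim, which is one reason a verifier is still needed downstream.

```python
from collections import defaultdict
from itertools import combinations


def abc_candidates(claims, direct_links):
    """Return [((A, C), shared_Bs)] for concept pairs bridged by a B but not directly linked.

    claims: iterable of (concept, b_term) typed claims.
    direct_links: set of frozenset({A, C}) pairs already linked directly.
    """
    by_b = defaultdict(set)
    for concept, b in claims:
        by_b[b].add(concept)
    bridges = defaultdict(set)
    for b, concepts in by_b.items():
        for a, c in combinations(sorted(concepts), 2):
            if frozenset((a, c)) not in direct_links:
                bridges[(a, c)].add(b)
    # More shared intermediates = more support, but each pair is still only a hypothesis.
    return sorted(bridges.items(), key=lambda kv: len(kv[1]), reverse=True)


claims = [
    ("raynaud", "blood_viscosity"),
    ("fish_oil", "blood_viscosity"),
    ("raynaud", "platelet_aggregation"),
    ("fish_oil", "platelet_aggregation"),
    ("raynaud", "vasoconstriction"),
    ("drug_x", "vasoconstriction"),
]
direct = {frozenset(("raynaud", "drug_x"))}
result = abc_candidates(claims, direct)
print(result)
```

The already-linked pair is suppressed and only the hidden raynaud/fish_oil bridge is proposed; without the B-claims having been extracted first, no pair would be proposed at all.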

**Players / datasets / taxonomies / standards:**

- **Universal Dependencies (UD v2)** — Per-token gold annotation: 17 universal POS tags, 37 dependency relations, ~30 morphological feature types (200+ values; Case alone 40+). CoNLL-U fields: FORM, LEMMA, UPOS, FEATS, HEAD, DEPREL = ~5-6 typed claims per token. Falsifiable, huge gold standard. _[200+ treebanks, 150+ languages, 600+ contributors; every token fully decomposed]_ https://universaldependencies.org/
- **Abstract Meaning Representation (AMR 3.0 / LDC2020T02)** — Whole-sentence meaning as a rooted directed acyclic graph: concept nodes (PropBank predicates + entities) and :ARGn/:mod/:time/:location edges. Roughly one node per content word plus relation edges. Gold-standard semantic parsing task. _[~59,255 gold AMRs (English); MASSIVE-AMR 84K+ annotations, 50+ languages]_ https://catalog.ldc.upenn.edu/LDC2020T02
- **PropBank** — Predicate-argument / semantic role labeling: every verb sense-numbered (leave.01 vs leave.02) with numbered args ARG0-ARG5 + ArgM modifiers. Annotated over all WSJ verbs in Penn Treebank. The workhorse SRL standard. _[Full WSJ verb coverage; foundation of OntoNotes propositions]_ https://www.cs.rochester.edu/~gildea/palmer-propbank-cl.pdf
- **FrameNet** — Frame-semantic SRL: evokes 1 of 1,000+ hierarchically-related frames with named roles (Buyer/Seller/Goods). Richer/typed roles vs PropBank's numbered args. Different typed claim type from the same predicate. _[1,000+ frames, 13,000+ lexical units, 200K+ annotated instances]_ https://framenet.icsi.berkeley.edu/
- **Universal Decompositional Semantics (UDS1.0, decomp.io)** — Replaces categorical roles with GRADED real-valued claims: 16-18 semantic proto-role properties per pred-arg edge (instigation, volition, awareness, sentient, change_of_state, existed_before/during/after, was_used...) + predicate-level factuality + genericity (kind/hypothetical/dynamic). Turns ONE edge into ~20 scalar typed claims. _[Unifies 5 decompositional annotation sets on a UD-anchored graph; SPARQL-queryable]_ http://decomp.io/
- **OntoNotes 5.0 (LDC2013T19)** — Multi-layer gold corpus stacking Treebank + PropBank + word sense + named entities + coreference over the SAME text, 3 languages. Proves the layers compose on one corpus with cross-layer DB access. _[400K words treebank/prop; 300K words coref+NER; 200K words sense; ~5,000 noun/verb senses]_ https://catalog.ldc.upenn.edu/LDC2013T19
- **Word-Sense Disambiguation (WordNet / SemCor)** — Tags each content word to a WordNet synset. Content words average ~3-8 senses; sense identity is a join key across texts. SemCor is the gold all-words corpus. _[SemCor = largest WordNet-tagged corpus; WordNet 117K+ synsets]_ https://web.stanford.edu/~jurafsky/slp3/19.pdf
- **TimeML / TimeBank + TimeBank-Dense** — Events (EVENT), time expressions (TIMEX3), and temporal links (TLINK) using Allen interval relations (before/after/during/overlaps...). TimeBank-Dense labels ALL event/time pairs — the combinatorial-blowup canary. _[TimeBank 1.2: 183 docs, 27K+ event/temporal annotations. TB-Dense: 36 docs, 1.6K events -> 5.7K TLINKs (~10x denser)]_ https://timeml.github.io/site/timebank/documentation-1.2.html
- **PDTB-3 (Penn Discourse Treebank) + RST/eRST** — Discourse relations between spans (explicit + implicit connectives) with a 3-level sense hierarchy (Temporal/Contingency/Comparison/Expansion). RST/eRST = rhetorical-structure trees. Typed inter-clause/inter-sentence claims. _[PDTB-3: 53,631 annotated discourse-relation tokens; DISRPT 2023 shared task spans 26 datasets, 13 languages]_ https://catalog.ldc.upenn.edu/docs/LDC2019T05/PDTB3-Annotation-Manual.pdf
- **PARSEME (multiword expressions)** — Verbal MWE annotation (idioms, light-verb constructions, verb-particle) — non-compositional units that change which typed claims are valid. Unified guidelines across many languages. _[v1.3: 455K+ sentences, 26 languages, 62K+ verbal MWEs annotated]_ https://www.lisn.upsaclay.fr/corpus-software/parseme-corpora-annotated-for-verbal-multiword-expressions-en/
- **SemMedDB / SemRep (real-world proof of typed-extraction-at-scale)** — NLM's SemRep extracts typed subject-PREDICATE-object predications (TREATS, CAUSES, INHIBITS, LOCATION_OF...) from PubMed. The single best evidence that high-volume typed extraction is real AND drives discovery — it is the substrate of automated biomedical literature-based discovery. _[96.3 MILLION predications from 29.1M PubMed citations]_ https://academic.oup.com/bioinformatics/article/28/23/3158/195282
- **Swanson literature-based discovery (ABC model)** — The canonical proof that shared INTERMEDIATE typed facts enable cross-text relationship discovery. Raynaud (A) <- blood viscosity (B) -> fish oil (C); the bridge lived in the typed B-facts even though only 4 of 489 articles co-mentioned A and C. Directly validates donto's relationship-hypothesis layer. _[4 of 489 seed articles bridged A-C; hypothesis later clinically confirmed]_ https://en.wikipedia.org/wiki/Literature-based_discovery

**Volume verdict:** HIGH-VOLUME TYPED EXTRACTION HELPS, decisively, at the decomposition/recall layer and as the supply of intermediate B-facts for discovery. The founder is correct and the numbers back him: ~12 distinct gold annotation layers, ~5-6 UD claims per token, ~20 UDS scalar claims per pred-arg edge, conservatively 1,000-2,000+ typed falsifiable annotations per rich paragraph, and SemMedDB's 96.3M predications prove corpus-scale typed extraction is already real and already drives discovery. Denser typed graphs strictly enable MORE discovery, because Swanson-style bridges only exist if the intermediate typed facts were extracted in the first place (4-of-489: the discovery was invisible in prose, visible only in the shared typed B-concept). So volume = recall = fuel: extract aggressively, multi-layer, with provenance. It becomes NOISE at exactly one place: the relationship-HYPOTHESIS layer, where you combine claims into NEW candidate edges. There the math is combinatorial (TimeBank-Dense: 1.6K events -> 5.7K links in 36 docs; cross-document entity pairs are quadratic), so undisciplined generation produces a false-discovery flood. The fix is not less extraction — it is a verifier/ranker between the dense typed-claim layer and the asserted-relationship layer (rank by novelty x plausibility x evidentiary-support x contradiction-value x verification-cost; re-rank on new evidence). Concretely: typed EXTRACTION should be maximal-recall and volume-loving; typed RELATIONSHIP-GENERATION should be precision-gated. They are complementary, not in tension. The one thing that IS pure noise regardless of volume is FREE-FORM PROSE output from a lens — it has no join key, no falsifiability, no provenance type, so it can neither feed the ranker nor bridge two texts. Every lens must emit rows in a declared typed-claim family or it is decorative.

**Design implications:**
- Model the annotation LAYER as a first-class claim namespace. donto already has typed predicates; reserve predicate families per layer (ud:deprel, amr:ARGn, srl:frame/role, wn:sense, uds:proto-role-property, tml:tlink, pdtb:relation) so a single token/predicate legally carries claims from many layers at once — exactly OntoNotes' multi-layer-over-one-text design. This is the schema that makes '1,000+ typed claims per paragraph' representable rather than hand-wavy.
- Make every typed claim point at a source byte AND its annotation layer + extractor version. UD/AMR/UDS disagree by design; treating layer-of-origin as provenance is what lets contradictions be legal state (UDS factuality=0.2 vs an SRL layer asserting the event occurred is a real, useful contradiction, not a bug).
- Adopt UDS's GRADED claims, not just categorical ones. Proto-role properties, factuality, and genericity are real-valued [-3,3]. donto's confidence/ranking machinery should ingest these scalars directly as evidence weights — they are pre-built plausibility signals for the ranker.
- Build cross-text JOIN KEYS explicitly: shared WordNet sense, shared FrameNet frame, shared entity-link IRI, shared AMR concept. These are the literal bridges for relationship discovery (Swanson ABC). A 'discovery query' is: find entities/claims that share an intermediate typed B-fact but are not yet directly linked. This is the cross-entity relationship-generation step the codebase currently lacks.
- Treat all-pairs typed relations (Allen temporal links, coreference, cross-doc entity matches) as the place where the verifier/ranker is mandatory. TimeBank-Dense (1.6K events -> 5.7K links in 36 docs) shows the blow-up within documents; across documents the candidate count grows quadratically in the number of entities. Generate candidates densely, but gate them by novelty x plausibility x evidentiary-support x contradiction-value x verification-cost before they become asserted edges.
- Define each lens as emitting a specific typed-claim family with a named target schema (a temporal lens MUST emit tml:event/tml:tlink rows; a role lens MUST emit srl:frame/role + uds:property scalars). Borrow the 12 NLP layers as the canonical lens catalog so 'is this lens decorative?' has an objective test: does it write rows in its declared predicate family that the ranker can use?
- Use SemMedDB as the architectural benchmark and de-risker: a rule-based extractor already produced 96.3M typed predications and powers real discovery. donto's claim lifecycle (contradiction-preserving, bitemporal, evidence-anchored, re-rankable) is precisely the layer SemMedDB LACKS — that gap is the product.
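
The layer-namespaced predicates, the provenance fields, and the cross-text join keys above can be sketched together. This is a minimal Python sketch, assuming a hypothetical `TypedClaim` record and hypothetical `med:` predicate names; the discovery query returns (A, C) pairs that share an intermediate B through a join-key predicate but have no direct claim between them. Its output is a candidate set for the ranker, not a set of asserted edges.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedClaim:
    subject: str
    predicate: str                 # layer-namespaced, e.g. "srl:role" or "wn:sense"
    obj: str
    source_span: tuple[int, int]   # byte offsets into the source text
    extractor: str                 # extractor name + version, kept as provenance

    @property
    def layer(self) -> str:
        # The namespace prefix records which annotation layer produced the claim.
        return self.predicate.split(":", 1)[0]


def abc_candidates(claims, join_predicates):
    """Swanson-style open discovery: (A, C) pairs sharing an intermediate B via a
    join-key predicate, with no direct claim already linking A and C."""
    by_bridge = defaultdict(set)   # B concept -> subjects that reach it
    direct = set()                 # unordered subject pairs already directly linked
    subjects = {c.subject for c in claims}
    for c in claims:
        if c.predicate in join_predicates:
            by_bridge[c.obj].add(c.subject)
        if c.obj in subjects:
            direct.add(frozenset((c.subject, c.obj)))
    found = defaultdict(set)
    for bridge, members in by_bridge.items():
        ordered = sorted(members)
        for i, a_ent in enumerate(ordered):
            for c_ent in ordered[i + 1:]:
                if frozenset((a_ent, c_ent)) not in direct:
                    found[(a_ent, c_ent)].add(bridge)  # every shared B is kept as the "why"
    return dict(found)


claims = [
    TypedClaim("fish_oil", "med:reduces", "blood_viscosity", (0, 42), "extractor-v1"),
    TypedClaim("raynaud", "med:involves", "blood_viscosity", (43, 97), "extractor-v1"),
    TypedClaim("aspirin", "med:reduces", "platelet_aggregation", (98, 140), "extractor-v1"),
]
print(abc_candidates(claims, {"med:reduces", "med:involves"}))
# {('fish_oil', 'raynaud'): {'blood_viscosity'}}
```

Pairs are enumerated only inside each bridge group, which is exactly where the combinatorial growth lives, so the result should feed the verifier/ranker rather than be written back as edges.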

**Honest caveats:**
- The per-paragraph count (1,000-2,000+) is an arithmetic projection from layer densities (UD per-token, UDS per-edge, etc.), not a measured single-corpus figure — no one corpus has ALL layers stacked on the same text; OntoNotes stacks ~5, UDS stacks decompositional ones on UD. The decomposition is real and falsifiable, but the headline number is a synthesis, not a citation.
- These are HUMAN gold-standard layers; automatic extractors are noisier. SRL/AMR parsers run high-80s/low-90s F1; word-sense disambiguation (WSD) on rare senses, implicit and cross-document coreference, and temporal TLINK classification are all materially worse. donto would ingest extractor output, so calibrated per-claim confidence (and the contradiction machinery) is load-bearing, not optional.
- An LLM-based extractor (donto's actual path via GLM) does NOT natively emit UD/AMR/PropBank IDs — it must be constrained to the target schemas. Without a typed output contract you get prose dressed as facts, which is exactly the noise case. The gold standards are the schema to target, not what the extractor produces for free.
- Some layers genuinely overlap (PropBank vs FrameNet vs UDS roles describe the same predicate-argument structure three ways). That overlap is a FEATURE for contradiction-preservation and cross-source corroboration, but it inflates raw counts; 'distinct typed claims' should be deduplicated by (layer, target) or the volume figure double-counts.
- SemMedDB and Swanson are biomedical, where typed predicate vocabularies (UMLS, predication types) are mature and curated. General-domain typed-claim vocabularies are messier; the jsonresume->jobs and genealogy domains will need their own controlled predicate/role inventories before the 'shared B-fact bridge' trick works as cleanly as it does in biomedicine.
- Cross-document entity linking and cross-document coreference — the join keys that make intertextual discovery work — are among the HARDEST and least-solved tasks here. The bridges donto needs for discovery depend on the weakest links in the stack, so identity-as-hypothesis (non-destructive merge, query-time lenses) is well-matched but the underlying signal is noisy.
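
A typed output contract can be enforced mechanically before anything reaches the graph. This is a minimal sketch, assuming the extractor is prompted to return a JSON array of rows and using the lens families named in the design implications (`tml:event`/`tml:tlink` for a temporal lens, `srl:frame`/`srl:role`/`uds:property` for a role lens); the row field names are hypothetical. Free-form prose fails the contract outright because it cannot be parsed into joinable rows.

```python
import json

# Declared predicate family per lens (the examples named in the design implications).
LENS_CONTRACTS = {
    "temporal": {"tml:event", "tml:tlink"},
    "role": {"srl:frame", "srl:role", "uds:property"},
}
# Hypothetical row schema: every row must be joinable and source-anchored.
REQUIRED_FIELDS = {"subject", "predicate", "object", "source_span"}


def check_lens_output(lens: str, raw: str):
    """Split raw extractor output into (accepted_rows, rejection_reasons)."""
    allowed = LENS_CONTRACTS[lens]
    try:
        rows = json.loads(raw)
    except json.JSONDecodeError:
        return [], ["not JSON: free-form prose has no join key"]
    if not isinstance(rows, list):
        return [], ["output must be a JSON array of rows"]
    accepted, rejected = [], []
    for i, row in enumerate(rows):
        if not isinstance(row, dict) or not REQUIRED_FIELDS.issubset(row):
            rejected.append(f"row {i}: missing required fields")
        elif row["predicate"] not in allowed:
            rejected.append(f"row {i}: {row['predicate']} is outside the {lens} family")
        else:
            accepted.append(row)
    return accepted, rejected


good = '[{"subject": "e1", "predicate": "tml:tlink", "object": "e2", "source_span": [10, 24]}]'
print(check_lens_output("temporal", good))
print(check_lens_output("temporal", "The meeting probably happened before lunch."))
```

Rejected rows are returned with a reason rather than silently dropped, so a lens that keeps failing its contract is visibly decorative.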


## volume-vs-precision-reconciliation

The literature cleanly splits volume into two layers: the founder is right about the first, and the prior report was right about the second.

LAYER 1, typed extraction / graph density (volume HELPS): knowledge-graph completion and link prediction degrade sharply as graphs get sparser. Common-neighbor and triplet-closure methods "falter in sparse graphs since they rely on closed triplets," and commonsense-KB completion models "implicitly assume densely connected graphs, with performance degrading quickly as graph density is reduced" (Malaviya et al., arXiv:1910.02915). Real KGs are brutally sparse (commonsense and biochemical graphs average degree ~2), and denser benchmarks (FB15k-237 vs WN18RR) yield materially better link prediction. The mechanism is exactly Swanson's ABC: an A→C discovery is impossible unless the intermediate B-facts physically exist in the graph. If extraction is shallow, the bridge term is simply absent and the true latent link is unrecoverable, a recall ceiling no downstream ranker can lift. So the founder's "vast latent typed structure of one text" intuition is the correct objective at this layer: more typed claims = denser graph = more reachable true links = higher recall. This is the raw fuel, and starving it is the one mistake you cannot fix later.

LAYER 2, unverified relationship hypotheses (volume HURTS without a gate): the Calude-Longo theorem ("The Deluge of Spurious Correlations in Big Data," Foundations of Science 2017) proves via Ramsey / Van der Waerden theory that sufficiently large databases MUST contain arbitrary regularities purely as a function of size, independent of the data's nature: "most correlations are spurious," and they can be found even in randomly generated data. Candidate relationships between N entities scale ~O(N²) (or worse for typed/path hypotheses), so the number of false candidates grows combinatorially while true links grow ~linearly: the signal-to-noise ratio of unverified candidates collapses as you scale. This is the classic multiple-comparisons regime, where Benjamini-Hochberg FDR control (bounding E[V/R], the expected share of false discoveries V among the R hypotheses promoted) is the appropriate tool and Bonferroni-style FWER control is "too conservative to be useful" (BH 1995). The applied evidence agrees: OpenIE's documented coverage-vs-utility tension ("cover more information... at the cost of utility and compactness," with noisier triples at high recall), and the finding that automatically constructed KGs "inevitably bring in plenty of noise," requiring trustiness/error-detection layers (Entropy 21(11):1083).

RECONCILIATION: volume and precision are NOT opposed; they live at different layers. Volume at extraction feeds the verifier; the verifier (scoring + evidence + contradiction + human-in-loop) is what makes volume at the hypothesis layer safe instead of a deluge. donto's architecture is literally the missing gate: every claim is evidence-anchored (so a hypothesis carries its provenance / "why"), contradictions are first-class (a generated link that rebuts existing evidence is a feature, not corruption), and the ranking step (novelty/plausibility/support/contradiction-value/verification-cost) IS the FDR controller. The boundary is therefore concrete: extract maximally (high recall; accept noisy typed claims, since each is individually source-checkable), but NEVER promote a generated relationship to "believed" state without passing the scorer + evidence threshold; hold the rest as hypothesis_only.
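
For reference, the Benjamini-Hochberg step-up procedure is short enough to state as code. The sketch below is the textbook procedure, shown next to Bonferroni for contrast. It assumes each candidate relationship already has a genuine p-value, which donto's heuristic composite scores are not, so treat it as the design target described here rather than a guarantee.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses promoted at false-discovery-rate level q (BH step-up).

    Sort p-values ascending, find the largest rank k with p_(k) <= (k / m) * q,
    and promote every hypothesis ranked 1..k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank  # step-up: keep the largest passing rank
    return sorted(order[:k_max])


def bonferroni(p_values, alpha=0.05):
    """Indices passing a Bonferroni (FWER) cut of alpha / m, for comparison."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))  # [0, 1]
print(bonferroni(p))          # [0]
```

On this toy input BH promotes two candidates where Bonferroni promotes one, which is the "too conservative" gap in miniature.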

**Players / datasets / taxonomies / standards:**

- **Calude & Longo — The Deluge of Spurious Correlations in Big Data (Foundations of Science, 2017)** — The load-bearing theoretical citation. Uses Ramsey theory / the Van der Waerden theorem plus ergodic and algorithmic information theory to PROVE that large enough databases necessarily contain arbitrary correlations as a function of size alone, not of the data's nature; 'most correlations are spurious' and appear even in randomly generated data. Conclusion: correlation-mining must be accompanied by causation/meaning/verification, not replace it. This is the rigorous justification for why the relationship-hypothesis layer needs a verifier. _[Foundational theorem; widely cited; Foundations of Science, 2017]_ https://link.springer.com/article/10.1007/s10699-016-9489-4
- **Benjamini-Hochberg FDR procedure (1995) + control under dependency (2001)** — The operational model for the ranking/gate step. Controls expected proportion of false discoveries E(V/R) rather than any-false-positive (FWER); explicitly the right tool when the number of candidate hypotheses m is large, where Bonferroni/FWER is 'too conservative to be useful.' Maps directly to donto's hypothesis-ranking: treat each generated relationship as a test, control the false-discovery proportion among promoted links. _[BH 1995 is one of the most-cited statistics papers; Benjamini-Yekutieli 2001 extends to dependent tests]_ https://en.wikipedia.org/wiki/False_discovery_rate
- **Swanson ABC / Literature-Based Discovery** — The canonical proof that density at the extraction layer enables discovery: an A-C hypothesis is only findable if intermediate B-terms EXIST in the graph (Raynaud-fish oil via blood viscosity). Also documents the dark side: the 'sheer number of intermediate terms exponentially expands the search space,' producing too many uninterpretable targets — i.e. recall benefit at B-layer, precision cost at hypothesis-layer. Exactly donto's two-layer split. _[Field founded by Swanson 1986; active LBD literature 2021-2024]_ https://pmc.ncbi.nlm.nih.gov/articles/PMC5771422/
- **Malaviya et al. — Commonsense KB Completion with Structural and Semantic Context (2019)** — Direct evidence that sparsity kills completion: existing KB-completion models 'implicitly assume densely connected graphs, with performance degrading quickly as graph density is reduced.' Real commonsense/biochemical KGs have avg degree ~2 (very sparse). Motivates graph densification + transfer learning. Supports: denser extracted graph = better link recall. _[avg degree ~2 in real sparse KGs; perf degrades quickly with sparsity]_ https://arxiv.org/pdf/1910.02915
- **Stanovsky et al. / OpenIE6 / OpenIE survey — recall-precision tradeoff in extraction** — Quantifies the extraction-layer tradeoff donto must own: confidence/rank modeling is used to trade precision vs recall; neural OpenIE 'covers more information... at the cost of utility and compactness'; 'incorrect extractions of one sentence often have higher likelihood than correct extractions of another' (cross-sentence confidence is unreliable). Implication: extract high-recall, but DON'T trust a global confidence score to gate — gate with evidence + downstream verification instead. _[Rank-aware iterative learning improves AUC; medical QA needs high-precision/low-recall settings]_ https://arxiv.org/pdf/1905.13413
- **Microsoft GraphRAG + LazyGraphRAG / KET-RAG — indexing cost economics** — The cost reality of high-volume LLM extraction: graph extraction accounts for ~75% of indexing cost; building a KG over 1M tokens costs ~$20-50 and takes hours, rising to hundreds of dollars at corpus scale. LazyGraphRAG cuts indexing cost to ~0.1% of full GraphRAG indexing by deferring LLM summarization. Lesson for donto: extract typed claims eagerly, but defer the expensive relationship generation and summarization until query time, or until a candidate clears a cheap pre-filter. _[~75% of index cost is extraction; LazyGraphRAG ~0.1% of full cost; ~$20-50 per 1M tokens]_ https://www.microsoft.com/en-us/research/blog/graphrag-new-tool-for-complex-data-discovery-now-on-github/
- **Embedding Learning with Triple Trustiness on Noisy KGs (Entropy 2019, 21(11):1083)** — Confirms automatically-constructed KGs 'inevitably bring in plenty of noise' and that naive embedding models wrongly assume all triples correct. Solution = per-triple trustiness scoring from entity types + descriptions. Direct analog to donto's evidence-anchoring + confidence-update: noise at extraction is expected and managed by a trust/evidence layer, not by refusing to extract. _[Trustiness-weighted embeddings outperform on noisy KGs]_ https://www.mdpi.com/1099-4300/21/11/1083
- **ESCO + implicit-skill extraction (resume-jobs flagship grounding)** — For the jsonresume->jobs domain: ESCO is the EU standard with 3,039 occupations + 13,939 skills across 28 languages — a ready-made typed-claim ontology and relationship backbone. Implicit-skill extraction (skills not explicitly stated, inferred from role/industry/geography) hits ~78-80% precision / ~86-88% recall at industrial scale — proving the typed-extraction layer is tractable and that implicit/inferred claims (the founder's 'IMPLICIT skills') are a real, measured category. _[ESCO: 3,039 occupations, 13,939 skills, 28 languages; implicit-skill extraction ~80% precision / ~86% recall]_ https://esco.ec.europa.eu/en/classification
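
The GraphRAG lesson (extract eagerly, defer the expensive step behind a cheap pre-filter) can be sketched as a lazy candidate generator. This minimal sketch assumes each entity carries a type and a set of shared keys (here hypothetical skill ids). Pairs with no shared key are never enumerated, candidate pairs are produced on demand, and `itertools.islice` caps how many reach the expensive step (for example an LLM articulation call, not shown).

```python
from collections import defaultdict
from itertools import combinations, islice


def prefiltered_candidates(entities, keys_of, compatible):
    """Lazily yield (a, b) pairs that share at least one key and have compatible
    types. Pairs with no shared key are never enumerated, and candidate pairs are
    produced on demand, so callers pull only as many as they can afford."""
    by_key = defaultdict(set)
    for ent in entities:
        for key in keys_of[ent["id"]]:
            by_key[key].add(ent["id"])
    types = {ent["id"]: ent["type"] for ent in entities}
    seen = set()  # a pair sharing several keys is yielded once
    for key in sorted(by_key):
        for a, b in combinations(sorted(by_key[key]), 2):
            if (a, b) not in seen and compatible(types[a], types[b]):
                seen.add((a, b))
                yield a, b


entities = [
    {"id": "r1", "type": "resume"},
    {"id": "j1", "type": "job"},
    {"id": "j2", "type": "job"},
    {"id": "r2", "type": "resume"},
]
keys_of = {  # hypothetical shared skill ids
    "r1": {"skill:python"},
    "j1": {"skill:python"},
    "j2": {"skill:sql"},
    "r2": {"skill:python", "skill:sql"},
}


def resume_job(type_a, type_b):
    return {type_a, type_b} == {"resume", "job"}


budget = 2  # only this many candidates reach the expensive step
for a, b in islice(prefiltered_candidates(entities, keys_of, resume_job), budget):
    print(a, b)
```

Type compatibility and shared keys are the two cheap filters named in the text; an embedding-neighborhood filter would slot into the same place as another predicate.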

**Volume verdict:** HELPS at the typed-extraction/decomposition layer; HURTS at the unverified-relationship layer. Concretely: (1) HELPS: denser extraction raises recall of true latent links because discovery is bridge-dependent (Swanson ABC: no B-fact, no A-C link), and completion models degrade empirically on sparse graphs (real commonsense/biochemical KGs average degree ~2, which is the regime where performance drops). Extract everything, including implicit/inferred typed claims; each is individually source-checkable, so extraction noise is recoverable. This is the founder's correct intuition, and donto is currently STARVED here (4.7% evidence coverage). (2) HURTS: candidate relationships scale ~O(N^2) or worse while true links grow ~linearly, and Calude-Longo proves that large data MUST contain spurious regularities by size alone, so the believed-relationship set degrades toward noise unless gated. THE BOUNDARY is a state transition, not a volume cap: a typed claim entering the graph from extraction is allowed at any volume (it is evidence-anchored and stays a claim), whereas a generated relationship being PROMOTED from hypothesis_only to believed must pass the scorer (novelty/plausibility/support/contradiction-value/verification-cost) under an explicit FDR budget. Volume feeds the verifier; the verifier is what makes volume safe. They are sequential, not opposed.
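
The state-transition boundary can be sketched as a tiny state machine. This is a minimal sketch under assumptions: the state names mirror the text, while the `Substrate` class, its method names, and the 0.8 threshold are hypothetical (in practice the threshold would be derived from an FDR-style budget).

```python
from enum import Enum


class State(Enum):
    EXTRACTED = "extracted"              # source-grounded single statement
    HYPOTHESIS_ONLY = "hypothesis_only"  # generated by combining several claims
    BELIEVED = "believed"                # promoted after passing the gate


class Substrate:
    """Hypothetical store illustrating the extraction/promotion boundary."""

    def __init__(self, promotion_threshold: float):
        self.threshold = promotion_threshold
        self.items = {}

    def ingest_extracted(self, item_id, source_span):
        # Allowed at any volume, as long as the row is anchored to source bytes.
        if source_span is None:
            raise ValueError("extracted claims must point at a source span")
        self.items[item_id] = {"state": State.EXTRACTED, "score": None,
                               "evidence": [source_span]}

    def propose(self, item_id, score, evidence):
        # Generated relationships always enter as hypothesis_only.
        self.items[item_id] = {"state": State.HYPOTHESIS_ONLY, "score": score,
                               "evidence": list(evidence)}

    def try_promote(self, item_id) -> bool:
        item = self.items[item_id]
        if item["state"] is not State.HYPOTHESIS_ONLY:
            return False
        if item["score"] >= self.threshold and item["evidence"]:
            item["state"] = State.BELIEVED
            return True
        return False  # stays hypothesis_only; nothing is deleted


store = Substrate(promotion_threshold=0.8)  # hypothetical threshold
store.ingest_extracted("c1", (120, 188))
store.propose("h1", score=0.91, evidence=["c1"])
store.propose("h2", score=0.55, evidence=["c1"])
print(store.try_promote("h1"), store.try_promote("h2"))  # True False
```

A failed promotion leaves the hypothesis in the pool, so later evidence can still lift it without regenerating it.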

**Design implications:**
- TWO-LAYER VOLUME POLICY (state it explicitly in the report): (a) extraction layer = maximize volume/recall, accept noisy typed claims because each is individually evidence-anchored and re-checkable; (b) relationship-hypothesis layer = volume is a liability, every generated link stays in hypothesis_only state until it clears the scorer. Never let extraction-volume leak directly into believed relationships.
- The ranker IS the FDR controller — frame it that way. Score = f(novelty, plausibility, evidentiary-support, contradiction-value, verification-cost). Set an explicit promotion threshold and report an estimated false-discovery proportion among promoted hypotheses (a Benjamini-Hochberg-style budget), so 'how many of our generated links are probably real' is a first-class, defensible number — not vibes.
- Make density a measured, monitored input. Track average node degree and evidence-coverage % per context; donto today sits at ~4.7% evidence coverage and 2,426 argument edges, i.e. starved at BOTH layers. The report should say: the extraction layer is currently UNDER-fed (good news for the volume thesis) and the relationship layer is essentially EMPTY (the missing generation+gate step is the actual product gap).
- Lazy/deferred relationship generation, per GraphRAG cost data. Extract typed claims eagerly (cheap, source-checkable); generate the O(N^2) candidate relationships lazily or behind a cheap pre-filter (type-compatibility, shared-evidence, embedding-neighborhood) BEFORE spending an LLM on full hypothesis articulation. Don't materialize the combinatorial blob.
- Contradiction is the precision asset, not a bug. Because false candidates are inevitable at scale (Calude-Longo), the differentiator is that donto can hold a generated link AND its rebutting evidence simultaneously and rank by contradiction-value. Competitors that collapse to a single 'truth' must silently drop or overwrite — donto turns the noise into typed argument edges. Lead with this.
- Each lens must emit typed claims that move the graph (re-state the accepted reframe with teeth): a lens whose output cannot change a confidence, add an evidence link, create a contradiction, or propose a verifiable relationship is decorative and should be cut. The test for any lens = does it raise recall of true latent links OR sharpen the gate? If neither, delete.
- Resume-jobs is the clean FDR demo: matching N resumes x M jobs is exactly O(N*M) candidate relationships — the combinatorial-explosion case made concrete. Use it to show the gate working: extract dense typed skill claims (incl. implicit/inferred, ~80% precision is fine pre-gate), generate match candidates, then promote only matches with evidence-anchored skill->requirement satisfaction above threshold, and surface contradictions (claims-senior vs tenure-junior) as ranked signals. Report a precision@k on promoted matches as the proof number.
- Bitemporal re-ranking is the volume-safety valve over time: because new evidence arrives, old low-ranked hypotheses get re-scored rather than re-generated. This bounds cost (you don't re-run the combinatorial generation) and is the honest answer to 'won't volume just keep producing more spurious links?' — no, the believed set is gated and re-ranked, the hypothesis pool is held cheaply as hypothesis_only.
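
The scorer and the re-rank-on-new-evidence loop can be sketched as follows. The text names the five factors but not how to combine them, so the linear form, the weights, the cost penalty, and the [0, 1] normalization are all hypothetical assumptions. The point is that new evidence updates a held hypothesis and re-ranks the pool without re-running candidate generation.

```python
from dataclasses import dataclass

# Hypothetical weights and penalty: the text names the factors, not the formula.
WEIGHTS = {"novelty": 0.2, "plausibility": 0.3, "support": 0.3, "contradiction_value": 0.2}
COST_PENALTY = 0.25


@dataclass
class Hypothesis:
    hyp_id: str
    novelty: float               # all factors assumed normalized to [0, 1]
    plausibility: float
    support: float
    contradiction_value: float
    verification_cost: float

    def score(self) -> float:
        gain = sum(w * getattr(self, name) for name, w in WEIGHTS.items())
        return gain - COST_PENALTY * self.verification_cost


def rank(pool):
    return sorted(pool, key=lambda h: h.score(), reverse=True)


def add_evidence(h, delta_support):
    # Re-score in place when new evidence arrives; the candidate is not re-generated.
    h.support = min(1.0, max(0.0, h.support + delta_support))


a = Hypothesis("a", novelty=0.9, plausibility=0.4, support=0.2, contradiction_value=0.1, verification_cost=0.5)
b = Hypothesis("b", novelty=0.3, plausibility=0.7, support=0.6, contradiction_value=0.0, verification_cost=0.2)
print([h.hyp_id for h in rank([a, b])])  # ['b', 'a']
add_evidence(a, 0.6)                      # new supporting evidence for "a"
print([h.hyp_id for h in rank([a, b])])  # ['a', 'b']
```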

**Honest caveats:**
- The two-layer model is clean in theory but the boundary is fuzzy in practice: extraction itself already implies relationships (a typed claim 'X HAS_SKILL Y' is both a fact and an edge). The report should be honest that 'extraction' vs 'relationship-hypothesis' is a spectrum — the real distinction is source-grounded-single-statement (extracted, trust the byte) vs combinatorially-generated-from-multiple-claims (hypothesis, must gate).
- Calude-Longo is a statement about spurious correlations in arbitrary/random data; it does NOT prove that real-domain discovery is hopeless. It's a justification for gating, not for pessimism. Don't overclaim it as 'big data is useless' — it specifically says mining ENRICHES but cannot REPLACE causal/verification reasoning.
- No paper I found gives a clean closed-form 'recall improves by X% per unit of graph density' — the density->recall link is strongly directionally supported (multiple completion/LBD papers) but the exact dose-response curve is dataset-specific and unmeasured for donto's own data. The report should present density-helps-recall as well-evidenced direction, not a quantified guarantee.
- FDR/Benjamini-Hochberg assumes you can assign something like a p-value or calibrated score to each candidate. donto's hypothesis scores are heuristic composites (novelty/plausibility/etc.), not true p-values, so 'FDR control' is an analogy/design target, not a literal statistical guarantee. Be honest that the promoted-hypothesis false-discovery number is an estimate from a heuristic, validated empirically (precision@k on a held-out verified set), not a proven bound.
- OpenIE confidence-scoring evidence cuts BOTH ways: it shows extraction CAN be ranked, but also that cross-sentence confidence is unreliable — which means donto cannot lean on a global extractor-confidence to gate; it must use evidence-anchoring + downstream verification. This is a strength of donto's design but also more work than 'just threshold the LLM score.'
- The resume-jobs flagship inherits a real risk: implicit/inferred skill claims (~80% precision) are exactly the kind of high-volume noisy typed claims that, if treated as believed rather than hypothesis, would produce bad matches and erode trust fast. The gate is not optional in the monetizable domain — it's the difference between a useful matcher and a spam engine.
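
The empirical check proposed here (precision@k against a held-out verified set) is simple to compute. This is a minimal sketch, assuming verdicts are stored as a mapping from item id to True/False; top-k items without a verdict are skipped rather than counted as correct, and the false-discovery estimate is one minus that precision, an estimate rather than a bound.

```python
def precision_at_k(ranked_ids, verdicts, k):
    """Precision over the top-k promoted items that have a held-out verdict.

    verdicts maps item id -> True (confirmed) or False (refuted); top-k items
    without a verdict are skipped. Returns None if none of them were judged.
    """
    judged = [verdicts[i] for i in ranked_ids[:k] if i in verdicts]
    if not judged:
        return None
    return sum(judged) / len(judged)


def estimated_fdp(ranked_ids, verdicts, k):
    # Empirical false-discovery proportion among judged top-k items: an estimate, not a bound.
    p = precision_at_k(ranked_ids, verdicts, k)
    return None if p is None else 1.0 - p


promoted = ["m1", "m2", "m3", "m4", "m5"]                     # ranked promoted matches
verdicts = {"m1": True, "m2": True, "m3": False, "m5": True}  # m4 not yet verified
print(precision_at_k(promoted, verdicts, 5), estimated_fdp(promoted, verdicts, 5))  # 0.75 0.25
```

Reporting how many of the top k were actually judged alongside the precision keeps the number honest when verification lags behind promotion.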


## claim-hypothesis-substrate-prior-art

The "claim/hypothesis/evidence substrate" is well-trodden territory — donto must cite it, not claim it. The shipped reference point is Wikidata: 120M+ items, ~1.65B statements, with a real claim-status model — every statement carries qualifiers (time/role/context), a list of references (each a "snak"/source), and a RANK of preferred/normal/deprecated. Crucially, deprecated statements are KEPT, not deleted ("known to be erroneous but still listed... in order to prevent them being constantly added and removed") — i.e. Wikidata already ships a contradiction-tolerant, source-attributed, three-valued claim store at billion-scale. ~73% of statements have provenance. Nanopublications are the academic gold standard for the per-claim envelope: three named RDF graphs — assertion + provenance (where it came from) + publication-info (who/when minted it) — with immutable Trusty-URI hashing; >10M published, decentralized registry (Nanopub Registry / Nanodash / Knowledge Pixels). The 2025 "knowledge-provenance" extension adds a 4th graph specifically to track an assertion aggregated from a body of supporting AND conflicting evidence with truth values — i.e. nanopubs are actively bolting on what donto has natively. Bucur et al. (2021) "super-pattern" formalizes scientific claims in logic to auto-DETECT contradictory claims (evaluated on the Cooperation Databank). So the claim-level provenance + contradiction-detection problem is solved-ish; what nobody ships is contradiction as permanent first-class legal state.
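
Wikidata's rank semantics show how "kept but not believed" works in practice: its simplified "truthy" view exposes only the best-ranked statements for a property (preferred if any exist, otherwise normal), while deprecated statements stay stored with their references. A minimal sketch of that selection rule follows; the classes are hypothetical and are not Wikidata's data model or API, and the dates are made-up values for property P569 (date of birth).

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Rank(IntEnum):
    DEPRECATED = 0
    NORMAL = 1
    PREFERRED = 2


@dataclass
class Statement:
    prop: str
    value: str
    rank: Rank = Rank.NORMAL
    references: list = field(default_factory=list)
    qualifiers: dict = field(default_factory=dict)


def best_rank(statements, prop):
    """'Truthy'-style view: preferred statements for prop if any exist, otherwise
    normal ones. Deprecated statements are never returned but stay in the store."""
    live = [s for s in statements if s.prop == prop and s.rank != Rank.DEPRECATED]
    if not live:
        return []
    top = max(s.rank for s in live)
    return [s for s in live if s.rank == top]


store = [
    Statement("P569", "1901-05-02", Rank.DEPRECATED, references=["ref:old-obituary"]),
    Statement("P569", "1900-05-02", Rank.NORMAL, references=["ref:census"]),
]
print([s.value for s in best_rank(store, "P569")])  # ['1900-05-02']
print(len(store))                                    # 2: the deprecated claim is kept
```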

On typed-claim extraction (the VOLUME layer): SemMedDB/SemRep is the scale proof — 130M+ subject-predicate-object "semantic predications" from 37M+ PubMed abstracts, concepts normalized to UMLS. It is the canonical "decompose a corpus into millions of typed claims" system — and it was DEPRECATED Dec 31 2024 (final v43), leaving a vacuum donto's extraction layer directly addresses. SciClaim (EMNLP 2021) is the fine-grained schema: typed graph annotations (causal/comparative/predictive/statistical/proportional associations + qualifying attributes), 12,738 labels, >2x the label density of prior datasets — concrete proof that one text yields dense typed structure (validates the founder's "vast latent structure" intuition). ClaimsKG (28,383 fact-checked claims → 6.6M triples) shows claims-with-truth-ratings as a queryable KG. FEVER (185,445 claims labeled SUPPORTED/REFUTED/NOTENOUGHINFO with evidence-sentence sets) is the canonical evidence-anchored verification dataset — directly analogous to scored resume↔job matching ("this skill-claim meets this requirement").

On argumentation + re-ranking (the HYPOTHESIS-LIFECYCLE layer): AIF (Argument Interchange Format) is the standard ontology for typed argument edges — it already distinguishes REBUT (attacks a conclusion) vs UNDERCUT (attacks the inference itself), exactly donto's supports/rebuts/undercuts; AIFdb holds 14,000+ argument maps / 160,000+ claims. The re-ranking story has a flagship 2025-26 exemplar: Google DeepMind's AI Co-Scientist (Nature, May 2026) runs a literal "tournament of ideas" — Generation → Reflection (peer-review for correctness+novelty) → Ranking (pairwise Elo debates) → Evolution (refine top hypotheses, re-enter tournament) → Proximity (dedup). This is precisely "generate hypotheses, rank by novelty/plausibility, re-rank as new evidence arrives" — but it runs in-memory per-session over literature, NOT over a persistent paraconsistent substrate. Swanson's ABC literature-based discovery (fish-oil↔Raynaud's) is the foundational A-B-C relationship-generation pattern, and the key reconciliation point: ABC discovery REQUIRES the intermediate B-facts to already exist in the graph — denser typed extraction (volume) is the fuel that makes A→C discovery possible. Inconsistency-tolerant KG reasoning is a 2025 survey area (paraconsistent logics, QC-negation/QCDL avoid the principle of explosion) — academically mature, but as research, not shipped infrastructure.
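
The "tournament of ideas" ranks hypotheses from pairwise verdicts with Elo ratings. Below is a minimal sketch of the standard Elo update; the K-factor of 32, the 1200 starting rating, and the debate outcomes are assumptions for illustration, not the Co-Scientist's settings, and the judge that decides each debate is not shown. The difference for donto would be that these ratings persist with the hypotheses instead of living in one session.

```python
def expected(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(ratings, winner, loser, k=32.0):
    # The winner gains exactly what the loser gives up, so total rating is conserved.
    delta = k * (1.0 - expected(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


ratings = {"h1": 1200.0, "h2": 1200.0, "h3": 1200.0}  # assumed starting rating
debates = [("h1", "h2"), ("h1", "h3"), ("h3", "h2")]  # (winner, loser), from a judge not shown
for winner, loser in debates:
    elo_update(ratings, winner, loser)
print(sorted(ratings, key=ratings.get, reverse=True))  # ['h1', 'h3', 'h2']
```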

**Players / datasets / taxonomies / standards:**

- **Wikidata (ranks: preferred/normal/deprecated)** — Shipped, billion-scale claim-status model. Statements carry qualifiers + a list of references (snaks) + a RANK. Deprecated = known-wrong but KEPT to preserve the record — a contradiction-tolerant three-valued status that ships in production. The closest existing thing to donto's claim-status idea. _[120M+ items, ~1.65B statements, ~73% with provenance/references (early 2025)]_ https://www.wikidata.org/wiki/Help:Ranking
- **Nanopublications (assertion + provenance + pub-info)** — Per-claim RDF envelope: 3 named graphs (the claim; where it came from; who minted it/when), immutable via Trusty-URI hashing, decentralized registry. The gold standard for evidence-anchored atomic claims. Lacks: native contradiction-as-legal-state, identity-as-hypothesis, bitemporality. _[>10M published nanopubs (mostly life-science); Nanopub Registry / Nanodash / Knowledge Pixels infra]_ https://nanopub.net/
- **Nanopub knowledge-provenance extension (2025)** — Adds a 4th named graph to track an assertion derived from a BODY of evidence with supporting AND conflicting pieces + truth values from a truth-discovery process. Direct evidence the field is retrofitting contradiction-tracking that donto has natively. _[unknown (2025 paper, Int'l J. on Digital Libraries)]_ https://link.springer.com/article/10.1007/s00799-025-00431-x
- **Bucur et al. 2021 — contradictory-claim detection via nanopubs (super-pattern)** — Formalizes high-level scientific claims in logic (the 'super-pattern') and uses nanopub provenance to DETECT + EXPLAIN contradictory claims across studies. Detects contradictions; does not preserve them as queryable permanent state. _[Evaluated on Cooperation Databank (CoDa) social-science repository]_ https://ieeexplore.ieee.org/document/9582393/
- **SemMedDB / SemRep / Semantic MEDLINE** — PubMed-scale repository of typed subject-predicate-object 'semantic predications' auto-extracted from abstracts, normalized to UMLS. THE proof that high-volume typed extraction over a corpus works — and the gap donto fills. DEPRECATED Dec 31 2024 (final v43), archived only. _[130M+ predications from 37M+ PubMed citations; deprecated 2024]_ https://lhncbc.nlm.nih.gov/temp/SemRep_SemMedDB_SKR/dbinfo.html
- **SciClaim (EMNLP 2021, SIFT)** — Fine-grained typed claim-graph schema: entity spans as nodes, typed relations as edges + qualifying attributes; captures causal/comparative/predictive/statistical/proportional associations. Proves one text yields dense typed structure (>2x prior label density) — validates the 'vast latent structure' thesis. _[12,738 labels across SBS/PubMed/CORD-19; transformer joint extraction baselines]_ https://github.com/siftech/SciClaim
- **ClaimsKG** — Knowledge graph of fact-checked claims with harmonized truth ratings, authors, dates, DBpedia-linked entities; supports queries like 'all false claims by X in 2017 mentioning Y'. Claims-with-status as a queryable KG, but status is curated truth-ratings, not generated/re-ranked hypotheses. _[28,383 claims since 1996 → 6.6M triples (2019)]_ http://users.ics.forth.gr/~fafalios/files/pubs/ISWC2019_ClaimsKG.pdf
- **FEVER / FEVEROUS** — Canonical evidence-anchored claim-verification dataset: claims labeled SUPPORTED/REFUTED/NOTENOUGHINFO, each tied to specific evidence sentences (often multi-sentence/multi-page). The closest analogue to scored, evidence-grounded relationship verification (e.g. resume-skill meets job-requirement). _[185,445 claims verified vs Wikipedia; FEVEROUS adds structured-table evidence]_ https://fever.ai/dataset/fever.html
- **Argument Interchange Format (AIF) + argument mining** — Standard ontology for typed argument structure. Already distinguishes REBUT (attacks a conclusion) vs UNDERCUT (attacks the inference) — exactly donto's typed argument edges. Models argument structure but has no extraction-at-scale, no bitemporality, no identity layer. _[AIFdb: 14,000+ argument maps, 160,000+ claims, 14 languages]_ http://www.arg-tech.org/wp-content/uploads/2011/09/aif-spec.pdf
- **Google DeepMind AI Co-Scientist (Nature, May 2026)** — The flagship re-ranking exemplar. Generation→Reflection(novelty+correctness review)→Ranking(pairwise Elo 'tournament of ideas')→Evolution(refine top, re-enter)→Proximity(dedup). Exactly 'generate, rank by novelty/plausibility, re-rank.' BUT in-memory per-session over literature — no persistent paraconsistent substrate, no bitemporal replay. _[Nature-validated May 2026 (AML drug repurposing, liver fibrosis); lab-validated, not clinic]_ https://deepmind.google/blog/co-scientist-a-multi-agent-ai-partner-to-accelerate-research/
- **Swanson ABC literature-based discovery** — Foundational relationship-generation pattern: if A-B in one literature and B-C in a disjoint literature, infer candidate A-C. The reconciliation anchor: ABC discovery REQUIRES the intermediate B-facts to already exist — high-volume typed extraction is the fuel for hypothesis generation. _[Origin: fish-oil↔Raynaud's; recovered/extended via semantic predications]_ https://pubmed.ncbi.nlm.nih.gov/23026233/
- **Inconsistency-tolerant / paraconsistent KG reasoning (2025 survey)** — Academic basis for donto's paraconsistency: paraconsistent logics admit contradictions without the principle of explosion; QC-negation/QCDL reason over inconsistent KGs; detect/repair/tolerate framing. Mature as theory — but research, not shipped infrastructure. _[Survey area (arXiv 2502.19023, 2025)]_ https://arxiv.org/html/2502.19023v1
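
The rebut/undercut distinction maps onto a simple edge model in which support edges are addressable, so an undercut can target an inference rather than a claim. This minimal sketch is only loosely modeled on AIF (it does not use AIF's node vocabulary); the claim ids are hypothetical and reuse the "claims-senior vs tenure-junior" contradiction from the resume-jobs example. Nothing is deleted when an attack lands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    edge_id: str
    kind: str     # "supports", "rebuts", or "undercuts"
    source: str   # the claim making the move
    target: str   # a claim id (supports / rebuts) or an edge id (undercuts)


def standing(claim_id, edges):
    """Return (live_supports, rebuttals) for a claim.

    A rebut attacks the claim itself. An undercut attacks one support edge, which
    then stops counting toward the claim but remains stored as first-class state."""
    undercut = {e.target for e in edges if e.kind == "undercuts"}
    live = [e.edge_id for e in edges
            if e.kind == "supports" and e.target == claim_id and e.edge_id not in undercut]
    rebuttals = [e.edge_id for e in edges if e.kind == "rebuts" and e.target == claim_id]
    return live, rebuttals


edges = [
    Edge("s1", "supports", "c:led-team-bullet", "c:senior"),
    Edge("s2", "supports", "c:job-title", "c:senior"),
    Edge("u1", "undercuts", "c:title-inflation-at-employer", "s2"),
    Edge("r1", "rebuts", "c:tenure-18-months", "c:senior"),
]
print(standing("c:senior", edges))  # (['s1'], ['r1'])
```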

**Volume verdict:** Volume HELPS unambiguously at the typed-extraction/decomposition layer, and the prior art proves it: SemMedDB shipped 130M predications from 37M abstracts, and SciClaim showed that one text yields >2x the prior typed-label density. The founder is correct that the latent typed structure of even one text is vast, and denser graphs are a precondition, not a luxury. Swanson's ABC is the clinching argument: you cannot discover A→C unless the intermediate B-facts physically exist in the graph, so under-extraction directly caps discovery. For jsonresume→jobs, volume is a genuine network effect: more resumes+jobs → richer skill-adjacency and career-path priors → better hidden-candidate discovery. So at the recall/fuel layer, more typed claims = strictly more discoverable relationships. Volume becomes NOISE at the relationship-HYPOTHESIS layer, where combinatorics explode (N typed claims → up to O(N^2) candidate relationships) and false discoveries dominate. This is exactly why every serious system in this space pairs generation with a VERIFIER/RANKER: the AI Co-Scientist's Elo tournament + reflection reviewer, FEVER's evidence-sentence requirement, Bucur's logical super-pattern for contradiction, time-aware evidence ranking for fact-checking. The reconciliation the report must state plainly: volume FEEDS the verifier; it does not compete with it. Donto's job is to make the extraction layer as high-volume as possible (the fuel) AND to put a ranker (novelty/plausibility/evidentiary-support/contradiction-value/verification-cost), and ideally a Lean-certified typed gate, between candidate generation and accepted relationships. A lens or extraction pass that raises recall of typed B-facts is always valuable; a step that emits unranked, unverified A→C relationships at volume is the decorative/dangerous case the prior art uniformly guards against.

**Design implications:**
- Cite, don't claim originality: the report must explicitly position donto against Wikidata-ranks (shipped claim status), nanopublications (per-claim provenance envelope), AIF (typed rebut/undercut), SemMedDB (typed extraction at scale), and the AI Co-Scientist (Elo re-ranking). Donto's novelty is the COMBINATION held as permanent legal state — paraconsistency + identity-as-hypothesis + bitemporality + Lean shapes — not any one piece.
- Sharpen the one-line wedge: existing systems either DETECT contradictions then resolve/collapse them (Bucur super-pattern, KG repair) or RANK hypotheses in-memory per session (AI Co-Scientist) — donto is the only one that KEEPS mutually-incompatible claims as queryable first-class state forever AND re-ranks them as evidence arrives. Lead with that.
- Wikidata's 'deprecated but kept' rank is the proof-of-concept to cite for why preserving wrong/contested claims is operationally necessary (it stops the add/remove churn war) — frame donto's contradiction frontier as the principled generalization of a pattern Wikidata already needed at billion-scale.
- Adopt the nanopublication 3-graph (assertion/provenance/pub-info) vocabulary explicitly as the per-claim envelope so donto interoperates rather than reinvents; note donto's evidence-byte anchoring is a stricter form of nanopub provenance. The 2025 4th-graph 'knowledge-provenance' extension is direct evidence the field is converging toward donto's native model — cite it as validation.
- SemMedDB's Dec-2024 deprecation is a concrete, dated market gap: a 130M-predication typed-claim resource just went unmaintained. Position donto's high-volume opencode extraction as the live successor to that decomposition layer (and a domain-neutral one, not biomedical-only).
- Map the lifecycle to named prior art so each stage is credible: extract→SemMedDB/SciClaim; hold-incompatible→Wikidata-ranks/paraconsistent-KG; typed argument edges→AIF rebut/undercut; verify-against-evidence→FEVER; generate+rank+re-rank→AI Co-Scientist Elo tournament; A→C discovery→Swanson ABC. Donto = the persistent substrate that unifies all six stages that today live in six separate systems.
- For the jsonresume→jobs flagship: frame matching as FEVER-style evidence-anchored claim verification (skill-claim from a resume bullet SUPPORTS/REFUTES a job requirement) plus Swanson-ABC adjacency discovery (resume→skill→adjacent-skill→hidden-job). This grounds the matching thesis in two named research lineages.
- Lean-4 shape certification has NO direct analogue in any cited system — it is donto's most genuinely differentiated component (others use SHACL/ShEx at best). Give it weight as the verifier/typing guarantee, but don't oversell: today it certifies shapes, it is not yet the FDR-controlling relationship verifier the report's thesis needs.
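One way to picture the per-claim envelope, sketched with Python dataclasses instead of RDF named graphs: the three parts mirror the nanopublication assertion/provenance/pub-info split, while the hashed-source byte span in `Provenance` is a hypothetical illustration of the stricter evidence-byte anchoring, not part of the nanopublication vocabulary. All class and field names are assumptions.

```python
from dataclasses import dataclass
from hashlib import sha256


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Provenance:
    # Hypothetical anchor: exact byte span inside a source identified by its hash.
    source_sha256: str
    byte_start: int
    byte_end: int
    extractor: str


@dataclass(frozen=True)
class PubInfo:
    created: str  # ISO-8601 time the claim was recorded
    creator: str


@dataclass(frozen=True)
class ClaimEnvelope:
    assertion: Assertion
    provenance: Provenance
    pubinfo: PubInfo


def evidence_bytes(env: ClaimEnvelope, source: bytes) -> bytes:
    """Return the anchored span, refusing if `source` is not the hashed document."""
    if sha256(source).hexdigest() != env.provenance.source_sha256:
        raise ValueError("source does not match the anchored hash")
    return source[env.provenance.byte_start:env.provenance.byte_end]


doc = b"Magnesium deficiency may contribute to migraine."
env = ClaimEnvelope(
    assertion=Assertion("magnesium_deficiency", "CONTRIBUTES_TO", "migraine"),
    provenance=Provenance(sha256(doc).hexdigest(), 0, 20, "hypothetical-extractor-v0"),
    pubinfo=PubInfo(created="2026-06-02T00:00:00Z", creator="donto"),
)
```

Because the anchor is checked against a content hash, an edited or substituted source fails loudly instead of silently pointing at different bytes.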

**Honest caveats:**
- No single cited system combines all four donto pillars, but that cuts both ways: it is genuine whitespace AND a sign that holding contradictions forever may be operationally costly enough that mature projects (Wikidata) chose a lighter 3-value rank instead. The report should not assume 'nobody did it' equals 'nobody should' — argue the use case, don't assume the gap is pure opportunity.
- Donto's contradiction machinery is barely exercised today (2,426 argument edges, 77 identity edges, ~4.7% evidence coverage on 39.5M statements). Cited systems have orders of magnitude more traction in their niches: Wikidata 1.65B statements with 73% provenance, SemMedDB 130M predications, FEVER 185K verified claims, AIFdb 160K claims. The prior art is more proven than donto's own contradiction layer; be honest that donto's differentiators are largely latent/aspirational right now.
- The AI Co-Scientist already ships the exact generate→rank→re-rank loop the reframe centers on, was Nature-validated (May 2026), and is backed by DeepMind. Donto's claim to differentiation is the PERSISTENT PARACONSISTENT SUBSTRATE underneath (the Co-Scientist re-ranks in memory per session; donto would re-rank over durable bitemporal state). That is a real distinction, but a narrower one than 'nobody does hypothesis ranking.' Frame precisely.
- Lean-4 shape certification has no analogue and is differentiated, but today it certifies shapes, not the false-discovery-controlling relationship verifier the volume-vs-noise thesis actually requires. Do not let 'Lean overlay' stand in for a verifier that does not yet exist.
- SemMedDB's deprecation is a real gap but also a caution: an NLM-backed, 130M-predication, decade-running resource was shut down — typed-extraction-at-scale has a sustainability/maintenance problem, not just a technical one. The opportunity is real but the graveyard is instructive.
- Nanopublication network scale (>10M, ~2021 figure) is the best public number found; live registry totals were not retrievable, so treat it as a floor, not current. Wikidata 'deprecated' semantics are 'known-erroneous-but-kept,' which is contradiction-TOLERANT but not contradiction-PRESERVING-as-equal-legal-state — don't overstate the equivalence to donto's frontier.
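To make the tolerant-versus-preserving distinction concrete, a small sketch assuming Wikidata's three ranks and its 'truthy' reading (deprecated statements excluded, only the best remaining rank returned), next to a hypothetical `frontier` view that keeps every value. The birth-date values are illustrative.

```python
from enum import IntEnum


class Rank(IntEnum):
    DEPRECATED = 0
    NORMAL = 1
    PREFERRED = 2


Statement = tuple[str, Rank]  # (value, rank) for one property of one item


def truthy(statements: list[Statement]) -> list[str]:
    """Wikidata-style best-rank view: drop deprecated values, then keep only
    the values at the highest remaining rank."""
    live = [(v, r) for v, r in statements if r != Rank.DEPRECATED]
    if not live:
        return []
    best = max(r for _, r in live)
    return [v for v, r in live if r == best]


def frontier(statements: list[Statement]) -> list[Statement]:
    """Hypothetical contradiction-preserving view: every value stays queryable
    with its rank attached; this orders but never filters."""
    return sorted(statements, key=lambda s: s[1], reverse=True)


birth_date: list[Statement] = [
    ("1879-03-14", Rank.PREFERRED),
    ("1879-03-15", Rank.DEPRECATED),
    ("1879", Rank.NORMAL),
]
```

The truthy view answers with one value and hides the deprecated one; the frontier view still returns the deprecated value, only ordered lower.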


## cross-domain-application-survey

There is a real, large, mostly unsolved market for a substrate that holds source-grounded typed claims, keeps contradictory claims as legal state, treats identity as a query-time hypothesis, and generates and ranks relationship hypotheses. The clearest existing proof is biomedical literature-based discovery (LBD): Swanson's ABC model is exactly "generate A-C relationship hypotheses from a dense graph of A-B and B-C claims," and the canonical fuel is SemMedDB: ~98M semantic predications (subject-predicate-object) auto-extracted by SemRep from ~29M PubMed abstracts in the 2019 release (later releases reach ~130M predications from ~37M abstracts). SemMedDB demonstrates donto's volume thesis (you need the intermediate B-facts to exist before any A-C hypothesis can be found) AND its core gap (SemMedDB stores predications but has weak first-class machinery for contradiction, provenance-as-byte, and identity-as-hypothesis; conflicting predications are noise, not preserved legal state). Every domain surveyed splits the same way: volume is unambiguously good at the typed-extraction/recall layer (more claims = denser graph = more candidate relationships) and dangerous at the hypothesis layer (combinatorial blowup of candidate A-C links demands a ranker/verifier or the false-discovery rate buries the signal).
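A sketch of the difference between treating conflicting predications as noise and keeping them as legal state, assuming each predication carries a negation flag and a PMID-style evidence anchor (both the flag and the IDs are hypothetical here). `collapse_majority` drops the minority polarity; `contradictions` keeps both sides together with their evidence.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Predication:
    subject: str
    predicate: str
    obj: str
    negated: bool
    pmid: str  # evidence anchor (illustrative IDs, not real PubMed records)


def collapse_majority(preds: list[Predication]) -> list[Predication]:
    """Noise-style store: per triple, keep only the majority polarity
    (ties keep the positive side) and discard the rest."""
    groups = defaultdict(list)
    for p in preds:
        groups[(p.subject, p.predicate, p.obj)].append(p)
    kept = []
    for group in groups.values():
        neg = sum(p.negated for p in group)
        majority_negated = neg > len(group) - neg
        kept.extend(p for p in group if p.negated == majority_negated)
    return kept


def contradictions(preds: list[Predication]) -> dict:
    """Preserving store: report every triple asserted with both polarities,
    as (supporting evidence, refuting evidence)."""
    groups = defaultdict(lambda: {True: [], False: []})
    for p in preds:
        groups[(p.subject, p.predicate, p.obj)][p.negated].append(p.pmid)
    return {k: (v[False], v[True]) for k, v in groups.items() if v[False] and v[True]}


preds = [
    Predication("drug_x", "TREATS", "disease_y", False, "PMID:A"),
    Predication("drug_x", "TREATS", "disease_y", False, "PMID:B"),
    Predication("drug_x", "TREATS", "disease_y", True, "PMID:C"),
]
```

The majority store silently loses PMID:C; the preserving store surfaces it as the refuting side of a live contradiction.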

The strongest "this genuinely needs all four properties" domains are: (1) LBD/drug-repurposing (SemMedDB, DRKG, Hetionet): contradictions between studies are scientifically real, identity-as-hypothesis matters because gene/drug synonymy is messy, and evidence-anchoring to the PMID/sentence is the whole game; (2) financial-crime/AML beneficial ownership (OpenSanctions with 2M+ entities from 337 sources, OCCRP Aleph + the FollowTheMoney schema, Open Ownership): entity resolution IS identity-as-hypothesis (is this "John Smith" the sanctioned one?), false positives are the #1 industry pain, every match must be explainable to a regulator, and ownership changes over time (bitemporal); (3) OSINT/investigative journalism: leaked datasets contradict each other and official records, confidence tiers (confirmed/probable/possible) are already standard practice, and "why do we believe X" must survive legal/editorial review; (4) legal case law: citation graphs (Caselaw Access Project, 6.7M cases) with TYPED edges (cites/distinguishes/overrules/follows) are literally a paraconsistent argument graph in which an overruled precedent stays in the record as superseded-not-deleted (bitemporal valid time), and "what was good law on date T" is the killer bitemporal query.
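The "good law on date T" query can be sketched as a bitemporal filter over typed treatment edges, assuming each edge records a valid time (`decided`) and a transaction time (`recorded`). Case names and dates are hypothetical, and a real treatment vocabulary would also need partial overrulings and later reversals, which this sketch ignores.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Treatment:
    citing: str
    cited: str
    kind: str       # "cites" | "follows" | "distinguishes" | "overrules"
    decided: date   # valid time: when the citing opinion was decided
    recorded: date  # transaction time: when the graph learned of this edge


def good_law(case: str, edges: list[Treatment], as_of: date, known_at: date) -> bool:
    """True unless an overruling decided on or before `as_of` had already been
    recorded on or before `known_at`. Overruled cases are never deleted."""
    return not any(
        e.cited == case
        and e.kind == "overrules"
        and e.decided <= as_of
        and e.recorded <= known_at
        for e in edges
    )


edges = [
    Treatment("case_b", "case_a", "overrules", date(2010, 6, 1), date(2024, 1, 15)),
    Treatment("case_c", "case_a", "follows", date(2005, 3, 1), date(2024, 1, 15)),
]
```

`case_a` is good law as of 2008, not as of 2012, yet a query asking "what did we believe in 2023?" still reports it as good law for 2012, because the overruling edge had not been ingested yet.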

The flagship jsonresume→jobs case is the cleanest non-genealogy proof because matching is literally scored relationship-discovery over typed claims: resumes and jobs both decompose into skill/role/seniority/domain/education/trajectory claims (ESCO ~3,000 occupations + ~13,900 skills; O*NET ~900 occupations as the public typed vocabularies); the match is an evidence-anchored edge ("you meet this requirement BECAUSE this skill-claim came from this resume bullet"); it is contradiction-aware (resume claims senior, tenure dates say junior; two roles overlap in time); it is bitemporal (skills decay, the market re-prices them, so old match-hypotheses must RE-RANK when a new job posting or a new resume arrives); identity-as-hypothesis handles duplicate/variant profiles and inferred-vs-stated skills; and volume genuinely produces network effects (more resumes+jobs → richer skill-adjacency and career-path graphs → hidden-candidate and "adjacent role you didn't know to apply for" discovery, which is exactly Swanson ABC applied to careers). It is monetizable today and proves donto is domain-neutral.
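Two pieces of that matching thesis sketched in Python, under loud assumptions: skills are already normalized to shared labels (an ESCO-style vocabulary is assumed, not implemented), and `match` and `overlapping_roles` are hypothetical helpers. The first returns evidence-anchored edges (requirement → supporting resume bullets); the second flags overlapping roles as a contradiction to explain rather than an error.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SkillClaim:
    skill: str   # normalized skill label (ESCO-style labels assumed)
    bullet: str  # the resume bullet this claim was extracted from: the evidence anchor


@dataclass(frozen=True)
class Role:
    title: str
    start: date
    end: date


def match(requirements: list[str], claims: list[SkillClaim]) -> dict[str, Optional[list[str]]]:
    """Evidence-anchored match: each requirement maps to the bullets that support
    it, or to None when nothing does (kept visible rather than dropped)."""
    support: dict[str, list[str]] = {}
    for c in claims:
        support.setdefault(c.skill, []).append(c.bullet)
    return {req: support.get(req) for req in requirements}


def overlapping_roles(roles: list[Role]) -> list[tuple[str, str]]:
    """Flag (not reject) role pairs whose date ranges intersect; concurrent jobs
    can be legitimate, so this is a contradiction to explain, not an error."""
    flags = []
    for i, a in enumerate(roles):
        for b in roles[i + 1:]:
            if a.start <= b.end and b.start <= a.end:
                flags.append((a.title, b.title))
    return flags


claims = [
    SkillClaim("python", "Built ETL pipelines in Python"),
    SkillClaim("sql", "Tuned Postgres queries"),
]
roles = [
    Role("Senior Engineer", date(2020, 1, 1), date(2022, 6, 30)),
    Role("Analyst", date(2022, 1, 1), date(2023, 1, 1)),
]
```

An unmet requirement stays in the output as `None`, so the explanation can say both why a match holds and where it is thin.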

The honest caveats: existing players already do a lot. SemMedDB, Hetionet/DRKG, OpenSanctions/Aleph, Mem0 (agent memory with temporal validity windows), ESCO/CareerBERT — none of them combine all four properties, but each does its slice well, so donto's wedge is the COMBINATION (paraconsistent + evidence-byte + identity-as-hypothesis + bitemporal + re-ranking hypothesis lifecycle), not any single property. The places where contradiction-preservation is decorative rather than load-bearing are domains where one authoritative source dominates and disagreement is rare/uninteresting (much of routine supply-chain provenance, single-EHR clinical data without cross-source conflict). Those make weaker example projects.
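The contrast with supersede-on-write agent memory can be shown with two toy stores. `SupersedingMemory` only loosely follows the behavior described for Mem0 and is not its API; `PreservingMemory` is a hypothetical append-only alternative in which an older or conflicting value stays queryable by recording time.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SupersedingMemory:
    """Supersede-on-write store: a new value for a key overwrites the old one."""
    facts: dict = field(default_factory=dict)

    def write(self, key: str, value: str, t: int) -> None:
        self.facts[key] = (value, t)

    def read(self, key: str) -> str:
        return self.facts[key][0]


@dataclass
class PreservingMemory:
    """Append-only store: every write is kept with its recording time, so
    earlier and conflicting beliefs remain queryable."""
    log: list = field(default_factory=list)

    def write(self, key: str, value: str, t: int) -> None:
        self.log.append((key, value, t))

    def read(self, key: str, as_of: Optional[int] = None) -> Optional[str]:
        """Latest value recorded at or before `as_of` (or overall if None)."""
        rows = [(v, t) for k, v, t in self.log if k == key and (as_of is None or t <= as_of)]
        return max(rows, key=lambda row: row[1])[0] if rows else None

    def history(self, key: str) -> list:
        return [v for k, v, _ in self.log if k == key]


superseding, preserving = SupersedingMemory(), PreservingMemory()
for store in (superseding, preserving):
    store.write("employer", "Acme", 1)
    store.write("employer", "Globex", 2)
```

Both stores give the same current answer; only the preserving one can still answer "what did we believe at time 1?" or list the full history for later re-ranking.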

**Players / datasets / taxonomies / standards:**

- **SemMedDB / SemRep (NLM)** — Canonical LBD fuel: subject-predicate-object semantic predications auto-extracted from PubMed. THE proof that volume-of-typed-claims enables Swanson ABC discovery. Weak on contradiction-as-legal-state, byte-level evidence, identity-as-hypothesis — donto's exact gap to fill. _[~98M predications from ~29M PubMed abstracts (2019 release)]_ https://pmc.ncbi.nlm.nih.gov/articles/PMC3509487/
- **Hetionet / DRKG (drug-repurposing KGs)** — Integrative biomedical knowledge graphs (genes, compounds, diseases, pathways) used for repurposing via metapath/embedding link prediction. Show the relationship-hypothesis layer; do NOT preserve inter-study contradictions. _[Hetionet ~47k nodes / ~2.25M edges; DRKG ~97k entities / ~5.8M triples]_ https://www.nature.com/articles/s41598-019-42806-6
- **Swanson ABC model (Fish oil–Raynaud, Migraine–Magnesium)** — Foundational LBD method: A-B and B-C claims imply candidate A-C. Directly validates the founder's volume thesis — intermediate B-facts MUST exist for discovery; denser graph = more candidates. Also shows the verifier need (combinatorial A-C explosion). _[unknown (foundational; 1986–present body of work)]_ https://arxiv.org/pdf/2506.12385
- **Caselaw Access Project (Harvard LIL)** — Full US case corpus + citation graph with TYPED treatment edges (cites/follows/distinguishes/overrules). A real paraconsistent argument graph: overruled precedent stays as superseded-not-deleted; 'what was good law on date T' is the killer bitemporal query. _[6.7M cases / 40M+ pages; full citation graph released]_ https://lil.law.harvard.edu/blog/2020/04/22/caselaw-access-project-citation-graph/
- **LAMUS legal argument-mining corpus** — Sentence-level legal argument mining (facts/issues/rules/analysis/conclusion) from US caselaw via LLM annotation. Proves typed-claim extraction from legal prose is tractable at scale — the lens layer for the legal domain. _[unknown (corpus built from SCOTUS + Texas appellate opinions)]_ https://arxiv.org/pdf/2603.08286
- **SciFact / SciFact-Open / Check-Covid** — Scientific claim-verification benchmarks (claim + evidence + SUPPORTS/REFUTES/NEI label). The verifier layer donto needs: ranks evidence and detects contradiction. Directly maps to supports/rebuts/undercuts typed argument edges. _[unknown (SciFact ~1.4k claims, widely used benchmark)]_ https://www.researchgate.net/publication/372933925_SciFact-Open_Towards_open-domain_scientific_claim_verification
- **Retraction Watch Database** — Tracks retractions/corrections — i.e. claims whose truth-status changed over time. Pure bitemporal use case: 'we believed paper X at tx-time T1; retracted at T2' and downstream re-ranking of every hypothesis that cited it. _[60k+ retractions tracked; retractions spiked 2020-2024 (paper mills)]_ https://retractionwatch.com/
- **OCCRP Aleph (FollowTheMoney schema)** — Investigative KG tying people/companies/payments/leaks/public records; cross-references entity lists against all datasets. Built on the open FollowTheMoney (FtM) ontology — a ready-made typed-claim schema for the finance/OSINT domain. _[unknown (used by global investigative-journalism network; large leak corpora e.g. Panama/Pandora Papers)]_ https://aleph.occrp.org/
- **OpenSanctions** — Open DB of sanctions/PEPs/criminal-interest entities. Entity resolution IS identity-as-hypothesis; false positives are the #1 AML pain; new 'logic-v2' matcher is explicitly explainable. FtM-schema, integrates Open Ownership beneficial-ownership data. _[2M+ entities from 337 sources (default bundles 325)]_ https://www.opensanctions.org/datasets/default/
- **Open Ownership + GLEIF (beneficial ownership)** — Beneficial-ownership data converted to FtM and joined with sanctions/LEI data (OpenScreening). Ownership %/control changes over time = bitemporal; 'who really controls X' = identity-as-hypothesis through shell layers. _[unknown (GLEIF LEI registry millions of legal entities)]_ https://www.gleif.org/en/newsroom/blog/how-opensanctions-open-ownership-and-gleif-are-collaborating-to-enhance-sanctions-and-anti-money-laundering-screening
- **SAO / SAOx patent mining + PatentsView** — Subject-Action-Object extraction from patents encodes function-behavior-structure as typed claims; cross-domain analogy mining = find a function in domain A solved by structure in domain B (Swanson ABC for invention). PatentsView is the AI-cleaned USPTO corpus. _[USPTO ~8M+ granted patents; SAO extraction F1 ~89.6% (Gemini 1.5 Pro)]_ https://www.nature.com/articles/s41598-026-49727-1
- **EU Digital Product Passport (ESPR)** — Mandatory per-product provenance/material/impact claims across supply chains from 2027 (batteries/textiles/electronics first). Source-level traceability = evidence-anchoring; conflicting supplier claims = contradiction; supplier swaps over time = bitemporal. _[DPP market USD 185.9M (2024) → ~USD 1.78B (2030), CAGR ~45.7%]_ https://data.europa.eu/en/news-events/news/eus-digital-product-passport-advancing-transparency-and-sustainability
- **Getty Provenance Index + Central Registry of Looted Cultural Property** — Provenance chains for artworks; contested attribution and competing ownership claims are the core problem. Identity-as-hypothesis (is this the same object?), contradiction (rival claims), evidence-anchoring (archival inventory byte). _[Getty index 2.3M+ records; Central Registry 25k+ objects; WJRO est. 100k+ covered objects in US museums, ~10% researched]_ https://art.claimscon.org/resources/overview-of-worldwide-looted-art-and-provenance-research-databases/
- **OMOP CDM (OHDSI) + clinical-guideline argumentation** — Standardized observational patient data; temporal events. Guideline conflicts under multimorbidity are real and resolvable via argumentation. Bitemporal (diagnosis revised), contradiction (conflicting recs), evidence (which trial supports which rec). _[OHDSI network covers ~800M+ patient records globally (OMOP-mapped)]_ https://ohdsi.github.io/CommonDataModel/background.html
- **Mem0 (agent memory layer)** — Closest direct competitor for personal-AI-memory: hybrid vector+graph+KV store, temporal validity windows, supersedes contradictory facts. KEY CONTRAST: Mem0 REPLACES outdated/contradictory facts; donto's thesis is to PRESERVE them as legal state and re-rank. _[claims ~26% better than OpenAI built-in memory on its benchmark; open-source, widely adopted]_ https://mem0.ai/blog/state-of-ai-agent-memory-2026
- **ESCO / O*NET (skills+occupation taxonomies)** — Public typed vocabularies for the jsonresume→jobs flagship: occupations, skills, and skill-relations (requires/complements/evolves-into/co-occurs). The schema backbone for decomposing resumes+jobs into typed claims. _[ESCO ~3,000 occupations + ~13,900 skills/competences; O*NET ~900 occupations]_ https://esco.ec.europa.eu/en/classification
- **CareerBERT / TalentCLEF 2025 / ESCO matching** — State of the art in resume↔job matching: shared-embedding and KG-GNN methods over ESCO/O*NET, with inferred (not just stated) skills. Proves the matching-as-scored-relationship-discovery layer; lacks contradiction/bitemporal/evidence-byte lifecycle. _[unknown (CareerBERT reports SOTA vs prior embedding baselines)]_ https://arxiv.org/pdf/2503.02056
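Surfacing an owner "through shell layers" is, at its simplest, path arithmetic: effective ownership is the product of fractions along each path, summed over all paths. A sketch assuming an acyclic ownership graph and a 25% threshold (a commonly used beneficial-ownership cutoff; the applicable rule varies by jurisdiction). The entities and stakes are invented, and the returned trail is the raw material for a regulator-readable why-trail.

```python
from typing import Iterator

# owner -> {owned entity: ownership fraction}. Illustrative data, not real filings.
Edges = dict[str, dict[str, float]]


def ownership_paths(edges: Edges, person: str) -> Iterator[tuple[str, float, list[str]]]:
    """Yield (entity, effective_share, path) for every ownership path starting at
    `person`; the effective share is the product of fractions along the path.
    Assumes an acyclic graph (a cycle would make this loop forever)."""
    stack = [(person, 1.0, [person])]
    while stack:
        node, share, path = stack.pop()
        for owned, frac in edges.get(node, {}).items():
            eff = share * frac
            yield owned, eff, path + [owned]
            stack.append((owned, eff, path + [owned]))


def beneficial_owner_of(edges: Edges, person: str, target: str, threshold: float = 0.25):
    """Sum effective shares over all paths to `target`; return
    (is_beneficial_owner, total_share, why_trail)."""
    trail = [(eff, path) for ent, eff, path in ownership_paths(edges, person) if ent == target]
    total = sum(eff for eff, _ in trail)
    return total > threshold, total, trail


edges: Edges = {
    "person_1": {"shell_a": 0.6, "target_co": 0.05},
    "shell_a": {"shell_b": 0.5},
    "shell_b": {"target_co": 1.0},
}
```

The 5% direct stake alone stays under the threshold; the 60% × 50% × 100% chain through `shell_a` and `shell_b` adds 30%, bringing the total to about 35%.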

**Volume verdict:** Volume HELPS, decisively, at the typed-extraction/decomposition layer in every domain, and the founder is right that one text's latent typed structure is vast: SemMedDB (~98M predications from ~29M abstracts) is the existence proof that you cannot do Swanson-ABC discovery until the intermediate B-claims have been extracted at scale — denser graphs strictly enable more discoverable A-C relationships. Same logic in jobs (more skill/role claims per resume → richer skill-adjacency and career-path graphs → hidden-candidate and adjacent-role discovery = network effects), patents (more SAO function-claims → more cross-domain analogies), and AML (more ownership/payment claims → more reachable shell chains). Volume becomes NOISE at the relationship-HYPOTHESIS layer, where it is combinatorial: N typed claims yield ~O(N^2) candidate A-C edges, and without a ranker/verifier the false-discovery rate buries the few real findings — this is precisely why LBD work added contextual/semantic constraints beyond raw co-occurrence, why AML lives or dies on false-positive suppression (OpenSanctions' explainable logic-v2 matcher), and why scientific claim-verification (SciFact) exists as a separate stage. Reconciliation: volume FEEDS the verifier, it does not oppose it. Decompose maximally (high recall, cheap), then rank/verify ruthlessly (high precision, evidence-anchored). The danger isn't extracting too many claims; it's PROMOTING unranked claim-combinations to asserted relationships. donto's bitemporal re-ranking turns volume into a strength even at the hypothesis layer, because more evidence arriving over time monotonically improves the ranking of a fixed hypothesis set rather than multiplying it.
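The source does not name an FDR procedure; Benjamini-Hochberg is swapped in here as the textbook example of one, assuming each candidate A-C hypothesis can be given a calibrated p-value (itself a strong assumption for LBD). The p-values below are made up.

```python
def benjamini_hochberg(pvalues: dict[str, float], q: float = 0.05) -> list[str]:
    """Return the hypotheses accepted under Benjamini-Hochberg FDR control at level q:
    sort p-values ascending, find the largest rank k with p_(k) <= (k / m) * q,
    and accept the k smallest."""
    ranked = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ranked)
    cutoff = 0
    for k, (_, p) in enumerate(ranked, start=1):
        if p <= k / m * q:
            cutoff = k
    return [name for name, _ in ranked[:cutoff]]


# Hypothetical p-values for ten generated A-C candidates.
pvals = {
    "h1": 0.001, "h2": 0.008, "h3": 0.039, "h4": 0.041, "h5": 0.042,
    "h6": 0.060, "h7": 0.074, "h8": 0.205, "h9": 0.212, "h10": 0.216,
}
```

Five of the ten made-up candidates clear a naive p < 0.05 cut, while Benjamini-Hochberg at q = 0.05 accepts only h1 and h2. Adding more low-quality candidates raises m and tightens the threshold at every rank, which is the volume-versus-noise trade-off in miniature.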

**Design implications:**
- Ship the claim lifecycle as the product, not the lenses. Every domain example must demonstrate: extract typed claims → hold contradictions → generate candidate relationships → attach evidence+counter-evidence → rank → RE-RANK on new evidence → explain WHY. A lens that doesn't change the graph (new candidate edge / evidence link / contradiction / confidence update / verification path) is decorative and should be cut.
- Adopt or mirror existing open typed schemas per domain instead of inventing — it makes each example a same-day proof and an importable dataset: FollowTheMoney (finance/OSINT), ESCO/O*NET (jobs), SemMedDB predication types + UMLS (biomedical), legal treatment-edge vocabulary (cites/distinguishes/overrules/follows) for caselaw, OMOP (clinical). donto's value is the lifecycle on top, not the vocabulary.
- Make the verifier/ranker a first-class, swappable component because volume forces it. In LBD, AML, and jobs the candidate-relationship set is combinatorial; the founder's volume thesis is correct at extraction and only survives at the hypothesis layer if a ranker (novelty × plausibility × evidentiary-support × contradiction-value × verification-cost) caps the false-discovery rate. Demonstrate the ranker explicitly on at least the LBD and jobs examples.
- Lead the 10 examples with the four domains where all four properties are load-bearing AND a public dataset exists for a same-week demo: (a) drug-repurposing on SemMedDB/Hetionet, (b) AML beneficial-ownership on OpenSanctions+FtM, (c) legal good-law-as-of-date on Caselaw Access Project, (d) jsonresume→jobs on ESCO/O*NET. These are the credibility anchors; the other six (patents-analogy, scientific-integrity/retraction-aware, OSINT, clinical-guideline-conflict, cultural-provenance, personal-memory) round out domain-neutrality.
- Frame the personal-AI-memory example explicitly AGAINST Mem0: Mem0 supersedes/replaces contradictory facts; donto's differentiator is preserving them as legal bitemporal state and re-ranking beliefs. Without that contrast the example looks like a me-too memory layer.
- For every example state a falsifiable success test up front (e.g. LBD: rediscover a held-out, later-confirmed drug-disease link from pre-discovery literature; jobs: beat ESCO-embedding baseline on a held-out hire/interview outcome with an evidence-anchored, contradiction-flagged explanation; AML: surface a true sanctioned-beneficial-owner through ≥2 shell layers with a regulator-readable why-trail).
- Show the re-ranking loop concretely with bitemporal triggers: a retraction (Retraction Watch), a new sanctions listing, a new job posting, or a new resume edit must each demonstrably re-rank previously generated hypotheses. This is the single behavior no surveyed competitor does end-to-end and should be the demo money-shot.
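A hypothetical ranker sketch using the five factors listed in these design implications. Two assumptions are not in the source: the factors are pre-normalized to [0, 1] and multiplied, and verification cost enters as (1 − cost) so cheaper checks rank higher. Support saturates with the number of non-retracted papers, so a retraction re-ranks hypotheses without deleting the evidence record.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    name: str
    novelty: float            # all factors assumed pre-normalized to [0, 1]
    plausibility: float
    contradiction_value: float
    verification_cost: float  # 0 = trivial to check, 1 = prohibitively expensive
    evidence: set = field(default_factory=set)  # supporting paper IDs


def score(h: Hypothesis, retracted: set) -> float:
    live = h.evidence - retracted
    support = len(live) / (len(live) + 1)  # assumed saturating form: 0 with no evidence
    return (h.novelty * h.plausibility * support
            * h.contradiction_value * (1.0 - h.verification_cost))


def rank(hyps: list, retracted: set = frozenset()) -> list:
    """Order hypothesis names by score; retracted papers stop counting as support
    but remain in each hypothesis's evidence set."""
    return [h.name for h in sorted(hyps, key=lambda h: score(h, retracted), reverse=True)]


h_a = Hypothesis("drug_x->disease_y", 0.8, 0.7, 0.9, 0.2, {"paper1", "paper2", "paper3"})
h_b = Hypothesis("drug_z->disease_y", 0.9, 0.8, 0.9, 0.2, {"paper4"})
```

Before any retraction `h_a` ranks first (about 0.30 vs 0.26); retracting `paper1` and `paper2`, as a Retraction Watch trigger might, drops it to about 0.20 and flips the order, while both papers stay on record.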

**Honest caveats:**
- Each domain already has competent incumbents doing a slice well (SemMedDB/Hetionet, OpenSanctions/Aleph, Mem0, CareerBERT, ESCO). donto's wedge is the COMBINATION of all four properties plus the re-ranking lifecycle, not any single property — the survey must not overclaim novelty on properties taken individually.
- Contradiction-preservation is load-bearing in only ~6 of the 10 domains. In routine supply-chain provenance and single-source clinical data, one authoritative source usually dominates and disagreement is rare, so paraconsistency is decorative there; those make the weakest examples and should be framed as 'multi-source-conflict' variants or demoted.
- Identity-as-hypothesis is the hardest property to make demonstrably better than mature entity-resolution stacks (OpenSanctions logic-v2, dedupe pipelines). Claiming it as a differentiator requires showing query-time, non-destructive, evidence-justified merges beating a strong ER baseline — not just asserting the design.
- The biomedical and clinical examples are credibility-rich but regulatory/validation-heavy; an LBD hypothesis is only valuable if it survives wet-lab/clinical scrutiny, which donto cannot provide. Position these as hypothesis-generation-and-prioritization, never as evidence of efficacy.
- jsonresume→jobs has a cold-start/coverage reality: the JSON Resume registry is far smaller than commercial resume corpora (LinkedIn-scale), so the network-effects volume argument is true in principle but unproven at the founder's current data scale; the falsifiable test should be run on whatever real registry+jobs volume exists, not on a hypothetical corpus of millions.
- Several cited extraction F1/benchmark numbers (SAO ~89.6%, claim-verification scores) are domain-and-dataset specific and will not transfer to donto's open-domain GLM extraction without its own evaluation; treat them as feasibility signals, not promised performance.
- Public 'good law as of date T' and looted-art contested-claim demos touch legal liability. Outputs are interpretive (no source is treated as ground truth), so these examples should ship as decision support with explicit source attribution, never as authoritative rulings.
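Identity-as-hypothesis can be sketched as union-find run at query time over stored same-as hypotheses, so merges are non-destructive and threshold-dependent. Record IDs, confidences, and evidence strings are invented, and this is not how OpenSanctions' matcher works; it only illustrates the query-time, evidence-justified behavior that would need to be benchmarked against a strong ER baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SameAs:
    """A stored identity hypothesis: 'a and b may be the same real-world entity'."""
    a: str
    b: str
    confidence: float
    evidence: str  # human-readable justification for the merge


def resolve(records: list[str], hypotheses: list[SameAs], threshold: float) -> list[list[str]]:
    """Query-time clustering with union-find: records linked by hypotheses at or
    above `threshold` are grouped. Inputs are never modified, so a different
    threshold (or a withdrawn hypothesis) simply yields a different answer."""
    parent = {r: r for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for h in hypotheses:
        if h.confidence >= threshold:
            parent[find(h.a)] = find(h.b)

    clusters: dict[str, set[str]] = {}
    for r in records:
        clusters.setdefault(find(r), set()).add(r)
    return sorted(sorted(c) for c in clusters.values())


records = ["john_smith#ofac", "john_smith#registry", "j_smith#leak"]
hyps = [
    SameAs("john_smith#ofac", "john_smith#registry", 0.92, "same DOB and passport number"),
    SameAs("john_smith#registry", "j_smith#leak", 0.55, "name similarity only"),
]
```

At 0.9 only the passport-backed link merges; lowering the threshold to 0.5 also pulls in the name-only match, without either call changing the stored hypotheses.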

