Claim-Substrate Report — Research Appendix
(2026-06-02)
Claim-Substrate
Report — Research Appendix (raw findings)
Companion to the iteration-3 report. Structured output of the
5-area study (2026-06-02).
resume-job-skills-intelligence
The skills-intelligence market runs on a handful of large public
skill taxonomies that all model skills as graphs of related nodes,
exactly the structure a claim substrate would emit. Sizes: ESCO (EU)
~14,575 skills/competences linked to 3,039 occupations across 28
languages; Lightcast Open Skills ~32,000-34,000 skills in 31 categories,
refreshed every 2 weeks from 1B+ job postings; LinkedIn Skills Graph
~39,000 skills, 374,000 aliases, 200,000+ explicit skill-to-skill links
("knowledge lineages"), spanning 875M people and 59M companies; O*NET
(US gov, free) 1,016 occupations with ~35 skills + 277 descriptors per
the Content Model; SkillsFuture Singapore ~11,000 skill competencies
across 34 frameworks/38 sectors. Crucially these are skill-to-skill
relationship graphs (prerequisite, alias, adjacency), not flat lists,
and the newest research (Skill-LLM, SkiLLMo, "LLMs as zero-shot ESCO
matchers", ESCOX) uses LLMs to EXTRACT and NORMALIZE skill claims from
free text against ESCO with ~79-91% precision. That extraction step IS
donto's typed-claim ingestion in another vocabulary.
The commercial layer is large and consolidating around "talent
intelligence" / knowledge graphs. Eightfold AI: ~$96.6M ARR 2024 (up
from $58.8M 2023), ~$2.1B valuation, $410M raised, pitches an AI "Talent
Intelligence Platform" with a deep-learning talent graph inferring
latent/adjacent skills. Beamery: ~$112.8M ARR 2024, hit unicorn $1B in
Dec 2022 on a "Talent Graph", but laid off 12% (2023) then ~25% (2024)
chasing profitability. SeekOut: ~$25.2M ARR 2024, was $1.2B (2022) now
~$435M, sources from 700M+ profiles, cut 30% staff in 2024. Draup: $20M
Series A, talent+sales intelligence on 80,000+ sources for
workforce/reskilling planning. hiring.cafe: scrappy aggregator scraping
real employer ATS feeds (Greenhouse/Lever/Workday) with GPT-based
per-job summaries and natural-language search, free to seekers. The
pattern: the unicorns are CASH-CONSTRAINED and their graphs are
proprietary/opaque deep-learning embeddings; explainability and
evidence-grounding are exactly where they are weak. Underlying market:
talent-acquisition software ~$22-26B in 2025, AI-recruiting segment
~$3.2B growing ~12% CAGR; skills-based hiring claimed by 85% of
employers in 2025 (up from 57% in 2022), though only a tiny fraction of
hires are actually affected (the rhetoric/reality gap is itself a
wedge).
JSON Resume (the founder's asset): an MIT-licensed open standard
(schema v1.0.0, the resume-schema repo ~2.4k stars, jsonresume.org
monorepo) with a clean typed structure already (basics, work, education,
skills{name,level,keywords}, projects, awards, languages, etc.) plus a
draft job-schema.json. Adoption is "10,000+ developers / tens of
thousands of users", 400+ themes; the registry renders any GitHub Gist
named resume.json at registry.jsonresume.org/ and already has
nascent AI "suggestions". So the founder controls BOTH a typed-resume
corpus AND a draft typed-job schema and the registry distribution
channel, the two sides of the matching graph in a format that is already
half-decomposed into claims.
Academic person-job fit has moved from CNN/RNN co-attention models
(PJFCANN) to graph neural nets (Fine-Grained Semantics-Enhanced GNN
2025, graph adaptive fusion 2025) and to LLM re-ranking. The
state-of-the-art honest admission is the donto thesis verbatim: ConFit
v2/v3 (ACL 2025/2026) note embedding rankers "lack controllability and
explainability as the ranking process happens entirely in latent
embedding space" and bolt on LLM re-ranking to recover interpretability;
bias is real (one benchmark 72% male). That is the precise gap a claim
substrate fills: evidence-anchored, contradiction-aware, re-rankable
matching with a human-readable WHY.
Players / datasets / taxonomies / standards:
ESCO (European Skills, Competences, Qualifications and
Occupations) — EU public skills/occupations ontology; ~14,575
skills/competences linked to 3,039 occupations, 28 languages;
skill-occupation matrix is an explicit relationship graph. The de facto
target vocabulary for LLM skill-extraction research. A donto skill claim
can be canonicalized to ESCO IRIs. [14,575 skills, 3,039
occupations, 28 languages]https://esco.ec.europa.eu/en/classification
Lightcast Open Skills (ex-Emsi Burning Glass) —
Largest commercial-but-open skills library; 32,000-34,000 skills in 31
categories, updated every 2 weeks from 1B+ job postings / 40,000
sources. Specialized vs common vs software skills. Free taxonomy
download; paid data feeds. Benchmark for skill coverage/freshness.
[~33,000 skills; 1B+ postings; refreshed biweekly]https://lightcast.io/open-skills
Eightfold AI — Talent intelligence platform built
on a deep-learning 'talent graph' that infers latent/adjacent skills and
potential. Closest commercial analog to the donto vision but with opaque
embedding graph (low explainability). [~$96.6M ARR 2024; ~$2.1B val;
$410M raised]https://eightfold.ai/
Beamery — Talent CRM + 'Talent Graph' for
skills-based talent management. Unicorn 2022 then heavy layoffs (12%
2023, ~25% 2024) chasing profitability; shows the category is real but
capital-stressed. _[~$112.8M ARR 2024; $1B val (2022); $223M raised]_ https://beamery.com/
SeekOut — AI talent sourcing across 700M+ profiles
from 30+ sources; agentic recruiting. Valuation fell
$1.2B(2022)->$435M(2024), 30% layoffs 2024. Evidence that
profile-aggregation alone (no claim/evidence layer) is a fragile moat.
[~$25.2M ARR 2024; 700M+ profiles; val $1.2B->$435M]https://www.seekout.com/
Draup — Talent + sales intelligence;
workforce-planning/reskilling from 80,000+ sources, career-pathing and
skill-gap analytics. Relevant for the skill-adjacency / career-path
network-effect use cases. [$20M Series A; 80,000+ sources]https://draup.com/talent/
hiring.cafe — Free job aggregator scraping real
employer ATS (Greenhouse/Lever/Workday/Workable) with GPT per-job
summaries + natural-language search; filters ghost jobs. Cheap, fast
distribution model; a realistic ingestion source for the jobs side of a
JSON Resume matcher. [aggregates from 14,000+ companies; free to
seekers]https://hiring.cafe/
JSON Resume (jsonresume.org) — Founder-owned MIT
open resume standard (schema v1.0.0) + Gist-backed registry + draft
job-schema.json. Already typed
(basics/work/education/skills{name,level,keywords}/projects). Controls
both sides of the graph and the distribution channel. [resume-schema
~2.4k stars; 10,000+ devs; 400+ themes]https://jsonresume.org/schema
ConFit v2 / v3 (resume-job matching, ACL 2025/2026)
— SOTA embedding matcher + LLM re-ranking. Explicitly states embedding
rankers 'lack controllability and explainability... entirely in latent
embedding space' and adds LLM re-ranking to recover the WHY. Directly
validates the claim-substrate explainability thesis. +13.8% recall /
+17.5% nDCG over BM25/OpenAI embeddings. [+13.8% recall, +17.5% nDCG
vs baselines]https://arxiv.org/html/2502.12361v1
LLM skill extractors (Skill-LLM, SkiLLMo, ESCOX, 'Zero-shot
ESCO matchers') — Research line that extracts and normalizes
typed skill claims from free-text resumes/jobs to ESCO IRIs at ~79-91%
precision. This IS donto's typed-claim ingestion stage in skills
vocabulary; reusable extraction prior art. [91% extract / 80%
standardize precision (SkiLLMo)]https://arxiv.org/html/2410.12052v1
Person-Job Fit GNN models (PJFCANN; Fine-Grained
Semantics-Enhanced GNN 2025) — Graph-based matching using
co-occurrence/skill graphs + attention. Shows the field already believes
the graph is the right structure; donto provides the persistent,
evidence-anchored, contradiction-aware version they lack. [unknown
(academic)]https://www.mdpi.com/1099-4300/27/7/703
Skills-based hiring market context — 85% of
employers claim skills-based hiring in 2025 (57% in 2022); removing
degree filters expands pools ~19x; skills 5x more predictive than
education (McKinsey); but rhetoric>>reality gap.
Talent-acquisition software ~$22-26B 2025, AI-recruiting ~$3.2B at ~12%
CAGR. [85% adoption claim; TAM ~$22-26B; AI seg ~$3.2B]https://blog.theinterviewguys.com/the-state-of-skills-based-hiring/
Volume verdict: High-volume typed extraction is
unambiguously HELPFUL on this domain, more clearly than almost any
other, but only when split across the two layers. WHERE IT HELPS
(extraction/decomposition): a single resume bullet ('Led migration of
monolith to microservices on AWS for a 5M-user fintech') legitimately
contains dozens of typed claims, explicit (AWS, microservices,
leadership) AND implicit/inferred (distributed systems, observability,
PCI/regulatory exposure, team-scale, domain=fintech, seniority signal).
The founder is right: that latent typed structure is vast, and
extracting it densely is pure recall gain, the raw fuel. Denser
per-document claim graphs are PRECONDITIONS for discovery: you cannot
find a hidden-candidate-to-job (A-C) link without the intermediate
skill/role/trajectory (B) nodes existing (Swanson ABC). Skill-adjacency
and career-path graphs only become rich at volume, the network effect is
real, and the open taxonomies (ESCO/Lightcast/LinkedIn) prove the
relationship structure exists and is large (200K+ LinkedIn skill links).
So volume at the typed-extraction layer is the asset. WHERE IT BECOMES
NOISE (relationship-hypothesis layer): the cross product resumes x jobs
x skill-combinations is combinatorially huge, and naively materializing
every candidate match edge produces overwhelming false-discovery (most
skill overlaps are spurious, most implicit-skill inferences are weak,
most 'matches' are degree-of-overlap noise). Here volume MUST flow
through a verifier/ranker that scores by evidentiary support, novelty,
contradiction-value and verification cost, re-ranks on new evidence, and
gates low-evidence match hypotheses as hypothesis_only. The
reconciliation: VOLUME FEEDS THE VERIFIER. More extracted claims =>
denser graph => more candidate relationships => but the verifier
decides which candidate edges become surfaced matches. Volume and the
verifier are complementary, not opposed; the failure mode is skipping
the verifier and treating every extracted relationship as a finding. A
useful concrete guardrail: every emitted match/implicit-skill claim must
be individually evidence-anchored and falsifiable, claims with no
backing bullet are hypothesis_only and never shown as confident
matches.
Design implications:
Make ESCO + O*NET (both free/open) the canonical skill/occupation
IRI vocabulary in donto, and treat each extracted resume/job skill as a
typed claim that resolves (non-destructively, identity-as-hypothesis) to
one-or-more taxonomy nodes. Keep raw surface form + normalized IRI +
confidence as separate evidence-anchored claims so you never destroy the
original bullet text. This directly reuses the Skill-LLM/SkiLLMo
extraction pattern donto already has via opencode.
Decompose BOTH resume and job into the SAME typed claim schema
(skill-claim, role-claim, seniority-claim, domain-claim, tenure-claim,
education-claim, trajectory-claim, and crucially
IMPLICIT/inferred-skill-claim). Matching then becomes scored
relationship discovery between two claim sets, not cosine similarity
between two opaque vectors. Each match edge carries its provenance:
'requirement R met by skill-claim S extracted from bullet B of resume
gist G'. This is the 'show the human WHY' the whole thesis hinges on,
and it is exactly what ConFit/embedding rankers admit they cannot
do.
Use the contradiction frontier for real recruiting signals: resume
asserts seniority:senior but tenure-claims sum to 18 months (rebuts);
resume claims skill X but no evidence-bearing bullet supports it
(undercut, hypothesis_only); two gists/profiles for the same person
disagree (identity-as-hypothesis + typed argument edges). Surface these
as a confidence/risk panel rather than silently averaging them away.
This is a feature no embedding matcher can offer and it maps onto
donto's existing argument/identity machinery (which is currently barely
exercised).
Lean on bitemporality for two concrete product features: (a) skill
decay / freshness (a skill-claim valid-from 2018 with no recent
reinforcement is weighted down; market demand for that skill is itself a
time-series), and (b) 'replay why we matched' for
audit/fairness/compliance (EU AI Act-style explainability for hiring).
Bitemporal replay turns into a defensible compliance and trust story the
incumbents (opaque graphs) cannot easily retrofit.
VOLUME belongs at extraction: run many extraction lenses over each
resume/job and EMIT TYPED CLAIMS, not prose. A 'skill lens' emits
normalized skill-claims; an 'implicit-skill lens' infers skills from
role+company+project (built a payments API => infers
idempotency/PCI/distributed-systems claims as hypothesis_only with
provenance); a 'trajectory lens' emits career-path edges; a 'seniority
lens' emits time-bounded seniority assertions. Each candidate claim must
be evidence-linked and individually verifiable/falsifiable. This is
where '1M facts' is correct and fuels denser graphs (Swanson ABC: you
need the intermediate skill/role B-nodes to exist to discover
candidate-to-job A-C links).
Put the verifier at the relationship-hypothesis layer to control
false-discovery: rank candidate match edges by novelty (hidden candidate
/ non-obvious skill bridge), plausibility, evidentiary support (count +
quality of backing claims), contradiction-value, and verification cost.
Re-rank ALL stored match hypotheses when new evidence arrives (candidate
adds a project, job edits a requirement, market shifts). The verifier is
what stops the combinatorial explosion of resume x job x skill from
becoming noise; volume feeds it, it does not fight volume.
Exploit network effects deliberately and make them the moat: as
resumes+jobs accumulate, build a SKILL-ADJACENCY graph (skills that
co-occur in evidence become candidate prerequisite/substitute edges),
CAREER-PATH graph (observed role transitions become trajectory
hypotheses with frequency evidence), and HIDDEN-CANDIDATE discovery
(people whose implicit-skill-claims match a job even though their
surface resume keywords do not). These are emergent typed relationships
the incumbents charge enterprise prices for; donto generates them as a
byproduct of the claim lifecycle.
Ship it on assets the founder already owns to get a real corpus
cheaply: the JSON Resume registry is the resume side (Gist-backed,
already typed, version-controlled = free bitemporal valid-time); harvest
the jobs side from open ATS feeds the way hiring.cafe does
(Greenhouse/Lever/Workday public boards). Use the existing draft
job-schema.json. This gives a true second non-genealogy domain proving
donto is domain-neutral, with a built-in distribution channel
(registry.jsonresume.org/).
Lead the product UX with explainability and contradiction, since
that is the demonstrable gap vs Eightfold/Beamery/embedding research:
every match shows the backing claim chain, the counter-evidence, the
confidence, and a verification path ('confirm this inferred skill by
adding the repo link'). This converts donto's weakness-today
(contradiction/identity machinery barely used) into the headline
differentiator in a domain where it is naturally exercised.
Honest caveats:
The incumbents are not naive: Eightfold/Beamery/LinkedIn already
infer latent and adjacent skills via deep-learning talent graphs.
donto's differentiation is NOT 'we have a skills graph' (they have
bigger ones) but specifically evidence-anchoring +
contradiction-preservation + bitemporal replay + human-readable WHY. If
the product collapses claims to scores like everyone else, there is no
moat.
Cold-start / corpus-size reality check: JSON Resume's real adoption
is modest (10,000+ developers, ~tens of thousands of users, dev-skewed)
versus LinkedIn's 875M profiles and Lightcast's 1B postings. The network
effects the thesis needs are real in principle but donto starts orders
of magnitude behind on volume; the dev-resume niche may be too narrow to
bootstrap broad skill-adjacency. Start where the data is dense (software
roles) and be honest that it is a wedge, not the whole market on day
one.
The skills-based-hiring TAM is partly hype: 85% of employers CLAIM
it but reported real impact on hires is tiny (rhetoric/reality gap).
Buyers may not actually pay for better explainable matching if their
hiring is still degree/pedigree-driven in practice. Monetization is
plausible (TA software ~$22-26B, AI seg ~$3.2B/12% CAGR) but the
explainability/fairness angle, not raw matching accuracy, is the
likeliest paid wedge (EU AI Act / hiring-audit compliance).
Bias and fairness cut both ways: evidence-anchoring can REDUCE bias
(decisions traceable to job-relevant claims) but the extraction LLM
itself inherits bias (ConFit benchmark 72% male; skill inference can
encode proxies for protected attributes). Claiming donto is 'fairer'
requires actual measurement, not architecture alone.
Extraction precision is not free: best ESCO skill-extraction
pipelines hit ~79-91% precision, meaning a meaningful fraction of
emitted skill-claims are wrong. At high volume that is a lot of false
claims entering the graph. The contradiction/hypothesis_only machinery
must actually be wired to demote them, this is exactly the part of donto
that is currently barely exercised (argument edges ~2,426, evidence
~4.7%), so the domain will stress-test unbuilt code paths.
There is NO cross-entity relationship-generation step in donto's
code today, per the brief. The entire 'generate match hypotheses from
claim combinations + verifier/ranker' layer that makes this a product is
net-new build, not a config of existing genealogy pipelines. The
taxonomies and extraction prior art exist; the discovery/verifier engine
is the real engineering risk.
Privacy/consent: matching real people's resumes to jobs at scale
(especially Gist-scraped or ATS-scraped data) has GDPR/CCPA exposure
that genealogy of deceased ancestors does not. Bitemporal 'remember
everything forever' is a compliance liability for living job-seekers
(right-to-erasure conflicts with contradiction-preserving
permanence).
linguistic-decomposition-dimensionality
The founder's intuition is literally true and measurable. Modern NLP
defines a stack of ~12 distinct, gold-standard, FALSIFIABLE annotation
layers over the same text, each with public datasets and inter-annotator
agreement. They are not redundant — each emits a different TYPE of
claim. Layer by layer, every token and every predicate gets decomposed
many times over. Universal Dependencies annotates EVERY token with 6
fields: form, lemma, 1 of 17 universal POS tags, a bundle drawn from ~30
morphological feature types (200+ possible values: Case has 40+, Tense,
Mood, Aspect, Voice, Number, Person, Gender, Definite, etc.), a HEAD
pointer, and 1 of 37 dependency relations. That alone is ~5-6 typed
claims per token before you touch meaning. Then Abstract Meaning
Representation (AMR 3.0, ~59K gold sentences, LDC2020T02) re-encodes the
same sentence as a rooted directed graph of concepts+relations (roughly
one node per content word plus :ARG/:mod/:time edges). Then PropBank
(every WSJ verb, sense-numbered e.g. leave.01 + numbered args ARG0-ARG5)
and FrameNet (1,000+ frames, 13,000+ lexical units, named roles like
Buyer/Seller) label predicate-argument structure / semantic roles. Then
word-sense disambiguation tags content words to WordNet (content words
average ~3-8 senses; SemCor is the gold corpus). Then coreference +
named-entity + entity-linking (OntoNotes 5.0 layers all of these over
400K words across 3 languages). Then Universal Decompositional Semantics
(UDS1.0, decomp.io) replaces categorical roles with 16-18 GRADED
real-valued proto-role properties per predicate-argument edge
(instigation, volition, awareness, sentience, change-of-state,
existed-before/during/after...) PLUS predicate-level factuality (did it
happen?) and genericity (kind/hypothetical/dynamic) — i.e. each single
edge becomes ~20 scalar claims. Then temporal: TimeML/TimeBank events +
TIMEX3 + TLINKs using Allen interval relations; TimeBank-Dense shows the
combinatorial blow-up directly — 36 documents, 1.6K events yield 5.7K
temporal links (~10x denser than sparse TimeBank, because you label ALL
event/time pairs). Then discourse: PDTB-3 (53,631 annotated
discourse-relation tokens, 3-level sense hierarchy) and RST/eRST
(rhetorical trees). Then multiword expressions (PARSEME 1.3: 455K
sentences, 26 languages, 62K+ verbal MWEs). Plus sentiment/stance and
presupposition/implicature layers.
So a realistic count for ONE rich 100-word paragraph: ~100 tokens x
~5 UD claims = ~500 syntactic/morphological claims; +~80-120 AMR
nodes/edges; +~10-20 predicates x (sense + ~3 args + ~20 UDS graded
properties each) = several hundred semantic claims; +coref chains,
+entity links, +5-15 temporal events/links, +3-8 discourse relations,
+MWEs, +sense tags. That is conservatively 1,000-2,000+ typed,
falsifiable annotations per paragraph WITHOUT any free-form prose. At
corpus scale this is not hand-waving: SemMedDB is the existence proof —
SemRep extracted 96.3 MILLION typed subject-predicate-object
predications from 29.1M PubMed abstracts, and that database is the
literal substrate of automated biomedical hypothesis generation. So
"millions of typed linguistic facts from a corpus" is not aspirational;
it is below the demonstrated ceiling.
Crucially for donto's thesis, these typed claims DO enable cross-text
relationship discovery, and the mechanism is exactly Swanson's ABC /
literature-based discovery: shared intermediate typed facts are the
bridge. Swanson's Raynaud-fish-oil discovery worked because the bridging
B-concept (blood viscosity) appeared as a typed claim in BOTH
literatures even though only 4 of 489 articles co-mentioned A and C —
the discovery LIVED in the intermediate typed facts, not the prose.
Shared FrameNet frames, shared WordNet senses, shared entity links, and
shared AMR concept nodes are precisely the join keys that let two texts
that never cite each other be connected. This is the honest
reconciliation of the volume debate: dense typed extraction is the FUEL
(high recall, the raw B-facts must exist or no bridge is findable), and
the danger is purely at the relationship-HYPOTHESIS layer where
all-pairs combinatorics (TimeBank-Dense's 1.6K events -> 5.7K links,
and far worse cross-document) demand a verifier/ranker. Volume feeds the
verifier; they are not opposed.
Abstract Meaning Representation (AMR 3.0 /
LDC2020T02) — Whole-sentence meaning as a rooted directed
acyclic graph: concept nodes (PropBank predicates + entities) and
:ARGn/:mod/:time/:location edges. Roughly one node per content word plus
relation edges. Gold-standard semantic parsing task. [~59,255 gold
AMRs (English); MASSIVE-AMR 84K+ annotations, 50+ languages]https://catalog.ldc.upenn.edu/LDC2020T02
PropBank — Predicate-argument / semantic role
labeling: every verb sense-numbered (leave.01 vs leave.02) with numbered
args ARG0-ARG5 + ArgM modifiers. Annotated over all WSJ verbs in Penn
Treebank. The workhorse SRL standard. [Full WSJ verb coverage;
foundation of OntoNotes propositions]https://www.cs.rochester.edu/~gildea/palmer-propbank-cl.pdf
FrameNet — Frame-semantic SRL: evokes 1 of 1,000+
hierarchically-related frames with named roles (Buyer/Seller/Goods).
Richer/typed roles vs PropBank's numbered args. Different typed claim
type from the same predicate. [1,000+ frames, 13,000+ lexical units,
200K+ annotated instances]https://framenet.icsi.berkeley.edu/
Universal Decompositional Semantics (UDS1.0,
decomp.io) — Replaces categorical roles with GRADED real-valued
claims: 16-18 semantic proto-role properties per pred-arg edge
(instigation, volition, awareness, sentient, change_of_state,
existed_before/during/after, was_used...) + predicate-level factuality +
genericity (kind/hypothetical/dynamic). Turns ONE edge into ~20 scalar
typed claims. [Unifies 5 decompositional annotation sets on a
UD-anchored graph; SPARQL-queryable]http://decomp.io/
OntoNotes 5.0 (LDC2013T19) — Multi-layer gold
corpus stacking Treebank + PropBank + word sense + named entities +
coreference over the SAME text, 3 languages. Proves the layers compose
on one corpus with cross-layer DB access. [400K words treebank/prop;
300K words coref+NER; 200K words sense; ~5,000 noun/verb senses]https://catalog.ldc.upenn.edu/LDC2013T19
Word-Sense Disambiguation (WordNet / SemCor) — Tags
each content word to a WordNet synset. Content words average ~3-8
senses; sense identity is a join key across texts. SemCor is the gold
all-words corpus. [SemCor = largest WordNet-tagged corpus; WordNet
117K+ synsets]https://web.stanford.edu/~jurafsky/slp3/19.pdf
TimeML / TimeBank + TimeBank-Dense — Events
(EVENT), time expressions (TIMEX3), and temporal links (TLINK) using
Allen interval relations (before/after/during/overlaps...).
TimeBank-Dense labels ALL event/time pairs — the combinatorial-blowup
canary. [TimeBank 1.2: 183 docs, 27K+ event/temporal annotations.
TB-Dense: 36 docs, 1.6K events -> 5.7K TLINKs (~10x denser)]https://timeml.github.io/site/timebank/documentation-1.2.html
SemMedDB / SemRep (real-world proof of
typed-extraction-at-scale) — NLM's SemRep extracts typed
subject-PREDICATE-object predications (TREATS, CAUSES, INHIBITS,
LOCATION_OF...) from PubMed. The single best evidence that high-volume
typed extraction is real AND drives discovery — it is the substrate of
automated biomedical literature-based discovery. [96.3 MILLION
predications from 29.1M PubMed citations]https://academic.oup.com/bioinformatics/article/28/23/3158/195282
Swanson literature-based discovery (ABC model) —
The canonical proof that shared INTERMEDIATE typed facts enable
cross-text relationship discovery. Raynaud (A) <- blood viscosity (B)
-> fish oil (C); the bridge lived in the typed B-facts even though
only 4 of 489 articles co-mentioned A and C. Directly validates donto's
relationship-hypothesis layer. [4 of 489 seed articles bridged A-C;
hypothesis later clinically confirmed]https://en.wikipedia.org/wiki/Literature-based_discovery
Volume verdict: HIGH-VOLUME TYPED EXTRACTION HELPS,
decisively, at the decomposition/recall layer and as the supply of
intermediate B-facts for discovery. The founder is correct and the
numbers back him: ~12 distinct gold annotation layers, ~5-6 UD claims
per token, ~20 UDS scalar claims per pred-arg edge, conservatively
1,000-2,000+ typed falsifiable annotations per rich paragraph, and
SemMedDB's 96.3M predications prove corpus-scale typed extraction is
already real and already drives discovery. Denser typed graphs strictly
enable MORE discovery, because Swanson-style bridges only exist if the
intermediate typed facts were extracted in the first place (4-of-489:
the discovery was invisible in prose, visible only in the shared typed
B-concept). So volume = recall = fuel: extract aggressively,
multi-layer, with provenance. It becomes NOISE at exactly one place: the
relationship-HYPOTHESIS layer, where you combine claims into NEW
candidate edges. There the math is combinatorial (TimeBank-Dense: 1.6K
events -> 5.7K links in 36 docs; cross-document entity pairs are
quadratic), so undisciplined generation produces a false-discovery
flood. The fix is not less extraction — it is a verifier/ranker between
the dense typed-claim layer and the asserted-relationship layer (rank by
novelty x plausibility x evidentiary-support x contradiction-value x
verification-cost; re-rank on new evidence). Concretely: typed
EXTRACTION should be maximal-recall and volume-loving; typed
RELATIONSHIP-GENERATION should be precision-gated. They are
complementary, not in tension. The one thing that IS pure noise
regardless of volume is FREE-FORM PROSE output from a lens — it has no
join key, no falsifiability, no provenance type, so it can neither feed
the ranker nor bridge two texts. Every lens must emit rows in a declared
typed-claim family or it is decorative.
Design implications:
Model the annotation LAYER as a first-class claim namespace. donto
already has typed predicates; reserve predicate families per layer
(ud:deprel, amr:ARGn, srl:frame/role, wn:sense, uds:proto-role-property,
tml:tlink, pdtb:relation) so a single token/predicate legally carries
claims from many layers at once — exactly OntoNotes'
multi-layer-over-one-text design. This is the schema that makes '1,000+
typed claims per paragraph' representable rather than hand-wavy.
Make every typed claim point at a source byte AND its annotation
layer + extractor version. UD/AMR/UDS disagree by design; treating
layer-of-origin as provenance is what lets contradictions be legal state
(UDS factuality=0.2 vs an SRL layer asserting the event occurred is a
real, useful contradiction, not a bug).
Adopt UDS's GRADED claims, not just categorical ones. Proto-role
properties, factuality, and genericity are real-valued [-3,3]. donto's
confidence/ranking machinery should ingest these scalars directly as
evidence weights — they are pre-built plausibility signals for the
ranker.
Build cross-text JOIN KEYS explicitly: shared WordNet sense, shared
FrameNet frame, shared entity-link IRI, shared AMR concept. These are
the literal bridges for relationship discovery (Swanson ABC). A
'discovery query' is: find entities/claims that share an intermediate
typed B-fact but are not yet directly linked. This is the cross-entity
relationship-generation step the codebase currently lacks.
Treat all-pairs typed relations (Allen temporal links, coref,
cross-doc entity matches) as the place the verifier/ranker is mandatory.
TimeBank-Dense (1.6K events -> 5.7K links in 36 docs) shows the
blow-up in-document; cross-document it is quadratic in entities.
Generate candidates densely, but gate them by novelty x plausibility x
evidentiary-support x verification-cost before they become asserted
edges.
Define each lens as emitting a specific typed-claim family with a
named target schema (a temporal lens MUST emit tml:event/tml:tlink rows;
a role lens MUST emit srl:frame/role + uds:property scalars). Borrow the
12 NLP layers as the canonical lens catalog so 'is this lens
decorative?' has an objective test: does it write rows in its declared
predicate family that the ranker can use?
Use SemMedDB as the architectural benchmark and de-risker: a
rule-based extractor already produced 96.3M typed predications and
powers real discovery. donto's claim lifecycle
(contradiction-preserving, bitemporal, evidence-anchored, re-rankable)
is precisely the layer SemMedDB LACKS — that gap is the product.
Honest caveats:
The per-paragraph count (1,000-2,000+) is an arithmetic projection
from layer densities (UD per-token, UDS per-edge, etc.), not a measured
single-corpus figure — no one corpus has ALL layers stacked on the same
text; OntoNotes stacks ~5, UDS stacks decompositional ones on UD. The
decomposition is real and falsifiable, but the headline number is a
synthesis, not a citation.
These are HUMAN gold-standard layers; automatic extractors are
noisier. SRL/AMR parsers run high-80s/low-90s F1; WSD on rare senses and
implicit/cross-document coreference and temporal TLINK classification
are materially worse. donto would ingest extractor output, so a
calibrated per-claim confidence (and the contradiction machinery) is
load-bearing, not optional.
An LLM-based extractor (donto's actual path via GLM) does NOT
natively emit UD/AMR/PropBank IDs — it must be constrained to the target
schemas. Without a typed output contract you get prose dressed as facts,
which is exactly the noise case. The gold standards are the schema to
target, not what the extractor produces for free.
Some layers genuinely overlap (PropBank vs FrameNet vs UDS roles
describe the same predicate-argument structure three ways). That overlap
is a FEATURE for contradiction-preservation and cross-source
corroboration, but it inflates raw counts; 'distinct typed claims'
should be deduplicated by (layer, target) or the volume figure
double-counts.
SemMedDB and Swanson are biomedical, where typed predicate
vocabularies (UMLS, predication types) are mature and curated.
General-domain typed-claim vocabularies are messier; the
jsonresume->jobs and genealogy domains will need their own controlled
predicate/role inventories before the 'shared B-fact bridge' trick works
as cleanly as it does in biomedicine.
Cross-document entity linking and cross-document coreference — the
join keys that make intertextual discovery work — are among the HARDEST
and least-solved tasks here. The bridges donto needs for discovery
depend on the weakest links in the stack, so identity-as-hypothesis
(non-destructive merge, query-time lenses) is well-matched but the
underlying signal is noisy.
volume-vs-precision-reconciliation
The literature cleanly splits volume into two layers, and the founder
is right on the first while the prior report was right on the second.
LAYER 1 — typed extraction / graph density (volume HELPS):
Knowledge-graph completion and link prediction degrade sharply as graphs
get sparser. Common-neighbor and triplet-closure methods "falter in
sparse graphs since they rely on closed triplets," and commonsense-KB
completion models "implicitly assume densely connected graphs, with
performance degrading quickly as graph density is reduced" (Malaviya et
al., arXiv:1910.02915). Real KGs are brutally sparse (commonsense and
biochemical graphs avg degree ~2); denser benchmarks (FB15k-237 vs
WN18RR) yield materially better link prediction. The mechanism is
exactly Swanson's ABC: an A→C discovery is impossible unless the
intermediate B-facts physically exist in the graph. If extraction is
shallow, the bridge term is simply absent and the true latent link is
unrecoverable — a recall ceiling no downstream ranker can lift. So the
founder's "vast latent typed structure of one text" intuition is the
correct objective at this layer: more typed claims = denser graph = more
reachable true links = higher recall. This is the raw fuel, and starving
it is the one mistake you cannot fix later. LAYER 2 — unverified
relationship hypotheses (volume HURTS without a gate): The Calude-Longo
theorem ("The Deluge of Spurious Correlations in Big Data," Foundations
of Science 2017) proves via Ramsey / Van der Waerden theory that
sufficiently large databases MUST contain arbitrary regularities purely
as a function of size, independent of the data's nature — "most
correlations are spurious," findable even in randomly generated data.
Candidate relationships between N entities scale ~O(N²) (or worse for
typed/path hypotheses), so the number of false candidates grows
combinatorially while true links grow ~linearly: the signal-to-noise of
unverified candidates collapses as you scale. This is the classic
multiple-comparisons regime where even Benjamini-Hochberg FDR control
(E[V/R]) is needed and Bonferroni FWER is "too conservative to be
useful" (BH 1995). The applied evidence agrees: OpenIE's documented
coverage-vs-utility tension ("cover more information... at the cost of
utility and compactness," noisier triples at high recall), and the fact
that automatically constructed KGs "inevitably bring in plenty of noise"
requiring trustiness/error-detection layers (Entropy 21(11):1083).
RECONCILIATION: volume and precision are NOT opposed — they live at
different layers. Volume at extraction feeds the verifier; the verifier
(scoring + evidence + contradiction + human-in-loop) is what makes
volume at the hypothesis layer safe instead of a deluge. donto's
architecture is literally the missing gate: every claim is
evidence-anchored (so a hypothesis carries its provenance / "why"),
contradictions are first-class (a generated link that rebuts existing
evidence is a feature, not corruption), and the ranking step
(novelty/plausibility/support/contradiction-value/verification-cost) IS
the FDR controller. The boundary is therefore concrete: extract
maximally (high recall, accept noisy typed claims, they're individually
source-checkable), but NEVER promote a generated relationship to
"believed" state without passing the scorer + evidence threshold; hold
the rest as hypothesis_only.
Players / datasets / taxonomies / standards:
Calude & Longo — The Deluge of Spurious Correlations in
Big Data (Foundations of Science, 2017) — The load-bearing
theoretical citation. Uses Ramsey theory / Van der Waerden theorem +
ergodic + algorithmic information theory to PROVE large enough databases
necessarily contain arbitrary correlations as a function of size alone,
not nature; 'most correlations are spurious' and appear even in randomly
generated data. Conclusion: correlation-mining must be accompanied by
causation/meaning/verification, not replace it. This is the rigorous
justification for why the relationship-hypothesis layer needs a
verifier. [Foundational theorem; widely cited (~Foundations of
Science 2017)]https://link.springer.com/article/10.1007/s10699-016-9489-4
Benjamini-Hochberg FDR procedure (1995) + control under
dependency (2001) — The operational model for the ranking/gate
step. Controls expected proportion of false discoveries E(V/R) rather
than any-false-positive (FWER); explicitly the right tool when the
number of candidate hypotheses m is large, where Bonferroni/FWER is 'too
conservative to be useful.' Maps directly to donto's hypothesis-ranking:
treat each generated relationship as a test, control the false-discovery
proportion among promoted links. [BH 1995 is one of the most-cited
statistics papers; Benjamini-Yekutieli 2001 extends to dependent
tests]https://en.wikipedia.org/wiki/False_discovery_rate
Swanson ABC / Literature-Based Discovery — The
canonical proof that density at the extraction layer enables discovery:
an A-C hypothesis is only findable if intermediate B-terms EXIST in the
graph (Raynaud-fish oil via blood viscosity). Also documents the dark
side: the 'sheer number of intermediate terms exponentially expands the
search space,' producing too many uninterpretable targets — i.e. recall
benefit at B-layer, precision cost at hypothesis-layer. Exactly donto's
two-layer split. [Field founded by Swanson 1986; active LBD
literature 2021-2024]https://pmc.ncbi.nlm.nih.gov/articles/PMC5771422/
Malaviya et al. — Commonsense KB Completion with Structural
and Semantic Context (2019) — Direct evidence that sparsity
kills completion: existing KB-completion models 'implicitly assume
densely connected graphs, with performance degrading quickly as graph
density is reduced.' Real commonsense/biochemical KGs have avg degree ~2
(very sparse). Motivates graph densification + transfer learning.
Supports: denser extracted graph = better link recall. [avg degree
~2 in real sparse KGs; perf degrades quickly with sparsity]https://arxiv.org/pdf/1910.02915
Stanovsky et al. / OpenIE6 / OpenIE survey —
recall-precision tradeoff in extraction — Quantifies the
extraction-layer tradeoff donto must own: confidence/rank modeling is
used to trade precision vs recall; neural OpenIE 'covers more
information... at the cost of utility and compactness'; 'incorrect
extractions of one sentence often have higher likelihood than correct
extractions of another' (cross-sentence confidence is unreliable).
Implication: extract high-recall, but DON'T trust a global confidence
score to gate — gate with evidence + downstream verification instead.
[Rank-aware iterative learning improves AUC; medical QA needs
high-precision/low-recall settings]https://arxiv.org/pdf/1905.13413
Microsoft GraphRAG + LazyGraphRAG / KET-RAG — indexing cost
economics — The cost reality of high-volume LLM extraction:
graph extraction is ~75% of indexing cost; building a KG over 1M tokens
costs ~$20-50 and hours, hundreds of dollars at corpus scale.
LazyGraphRAG achieves ~0.1% of full indexing cost by deferring LLM
summarization. Lesson for donto: extract typed claims eagerly but
defer/lazy the expensive relationship-generation+summarization until
query-time or until a candidate clears a cheap pre-filter. _[~75% of
index cost is extraction; LazyGraphRAG ~0.1% of full cost; ~$20-50 per
1M tokens]_ https://www.microsoft.com/en-us/research/blog/graphrag-new-tool-for-complex-data-discovery-now-on-github/
Embedding Learning with Triple Trustiness on Noisy KGs
(Entropy 2019, 21(11):1083) — Confirms
automatically-constructed KGs 'inevitably bring in plenty of noise' and
that naive embedding models wrongly assume all triples correct. Solution
= per-triple trustiness scoring from entity types + descriptions. Direct
analog to donto's evidence-anchoring + confidence-update: noise at
extraction is expected and managed by a trust/evidence layer, not by
refusing to extract. [Trustiness-weighted embeddings outperform on
noisy KGs]https://www.mdpi.com/1099-4300/21/11/1083
ESCO + implicit-skill extraction (resume-jobs flagship
grounding) — For the jsonresume->jobs domain: ESCO is the EU
standard with 3,039 occupations + 13,939 skills across 28 languages — a
ready-made typed-claim ontology and relationship backbone.
Implicit-skill extraction (skills not explicitly stated, inferred from
role/industry/geography) hits ~78-80% precision / ~86-88% recall at
industrial scale — proving the typed-extraction layer is tractable and
that implicit/inferred claims (the founder's 'IMPLICIT skills') are a
real, measured category. [ESCO: 3,039 occupations, 13,939 skills, 28
languages; implicit-skill extraction ~80% precision / ~86% recall]https://esco.ec.europa.eu/en/classification
Volume verdict: HELPS at the
typed-extraction/decomposition layer; HURTS at the
unverified-relationship layer. Concretely: (1) HELPS — denser extraction
raises recall of true latent links because discovery is bridge-dependent
(Swanson ABC: no B-fact, no A-C link) and completion models provably
degrade on sparse graphs (avg degree ~2 is where they fail). Extract
everything, including implicit/inferred typed claims; each is
individually source-checkable, so extraction noise is recoverable. This
is the founder's correct intuition and donto is currently STARVED here
(4.7% evidence coverage). (2) HURTS — candidate relationships scale
~O(N^2)/combinatorially while true links grow ~linearly, and
Calude-Longo proves large data MUST contain spurious regularities by
size alone, so the believed-relationship set degrades toward noise
unless gated. THE BOUNDARY is a state transition, not a volume cap: a
typed claim entering the graph from extraction = allowed at any volume
(it's evidence-anchored and stays a claim). A generated relationship
being PROMOTED from hypothesis_only to believed = must pass the scorer
(novelty/plausibility/support/contradiction-value/verification-cost)
under an explicit FDR budget. Volume feeds the verifier; the verifier is
what makes volume safe. They are sequential, not opposed.
Design implications:
TWO-LAYER VOLUME POLICY (state it explicitly in the report): (a)
extraction layer = maximize volume/recall, accept noisy typed claims
because each is individually evidence-anchored and re-checkable; (b)
relationship-hypothesis layer = volume is a liability, every generated
link stays in hypothesis_only state until it clears the scorer. Never
let extraction-volume leak directly into believed relationships.
The ranker IS the FDR controller — frame it that way. Score =
f(novelty, plausibility, evidentiary-support, contradiction-value,
verification-cost). Set an explicit promotion threshold and report an
estimated false-discovery proportion among promoted hypotheses (a
Benjamini-Hochberg-style budget), so 'how many of our generated links
are probably real' is a first-class, defensible number — not vibes.
Make density a measured, monitored input. Track avg node degree /
evidence-coverage % per context; donto today is ~4.7% evidence and 2,426
argument edges — i.e. starved at BOTH layers. The report should say: the
extraction layer is currently UNDER-fed (good news for the volume
thesis) and the relationship layer is essentially EMPTY (the missing
generation+gate step is the actual product gap).
Lazy/deferred relationship generation, per GraphRAG cost data.
Extract typed claims eagerly (cheap, source-checkable); generate the
O(N^2) candidate relationships lazily or behind a cheap pre-filter
(type-compatibility, shared-evidence, embedding-neighborhood) BEFORE
spending an LLM on full hypothesis articulation. Don't materialize the
combinatorial blob.
Contradiction is the precision asset, not a bug. Because false
candidates are inevitable at scale (Calude-Longo), the differentiator is
that donto can hold a generated link AND its rebutting evidence
simultaneously and rank by contradiction-value. Competitors that
collapse to a single 'truth' must silently drop or overwrite — donto
turns the noise into typed argument edges. Lead with this.
Each lens must emit typed claims that move the graph (re-state the
accepted reframe with teeth): a lens whose output cannot change a
confidence, add an evidence link, create a contradiction, or propose a
verifiable relationship is decorative and should be cut. The test for
any lens = does it raise recall of true latent links OR sharpen the
gate? If neither, delete.
Resume-jobs is the clean FDR demo: matching N resumes x M jobs is
exactly O(N*M) candidate relationships — the combinatorial-explosion
case made concrete. Use it to show the gate working: extract dense typed
skill claims (incl. implicit/inferred, ~80% precision is fine pre-gate),
generate match candidates, then promote only matches with
evidence-anchored skill->requirement satisfaction above threshold,
and surface contradictions (claims-senior vs tenure-junior) as ranked
signals. Report a precision@k on promoted matches as the proof
number.
Bitemporal re-ranking is the volume-safety valve over time: because
new evidence arrives, old low-ranked hypotheses get re-scored rather
than re-generated. This bounds cost (you don't re-run the combinatorial
generation) and is the honest answer to 'won't volume just keep
producing more spurious links?' — no, the believed set is gated and
re-ranked, the hypothesis pool is held cheaply as hypothesis_only.
Honest caveats:
The two-layer model is clean in theory but the boundary is fuzzy in
practice: extraction itself already implies relationships (a typed claim
'X HAS_SKILL Y' is both a fact and an edge). The report should be honest
that 'extraction' vs 'relationship-hypothesis' is a spectrum — the real
distinction is source-grounded-single-statement (extracted, trust the
byte) vs combinatorially-generated-from-multiple-claims (hypothesis,
must gate).
Calude-Longo is a statement about spurious correlations in
arbitrary/random data; it does NOT prove that real-domain discovery is
hopeless. It's a justification for gating, not for pessimism. Don't
overclaim it as 'big data is useless' — it specifically says mining
ENRICHES but cannot REPLACE causal/verification reasoning.
No paper I found gives a clean closed-form 'recall improves by X%
per unit of graph density' — the density->recall link is strongly
directionally supported (multiple completion/LBD papers) but the exact
dose-response curve is dataset-specific and unmeasured for donto's own
data. The report should present density-helps-recall as well-evidenced
direction, not a quantified guarantee.
FDR/Benjamini-Hochberg assumes you can assign something like a
p-value or calibrated score to each candidate. donto's hypothesis scores
are heuristic composites (novelty/plausibility/etc.), not true p-values,
so 'FDR control' is an analogy/design target, not a literal statistical
guarantee. Be honest that the promoted-hypothesis false-discovery number
is an estimate from a heuristic, validated empirically (precision@k on a
held-out verified set), not a proven bound.
OpenIE confidence-scoring evidence cuts BOTH ways: it shows
extraction CAN be ranked, but also that cross-sentence confidence is
unreliable — which means donto cannot lean on a global
extractor-confidence to gate; it must use evidence-anchoring +
downstream verification. This is a strength of donto's design but also
more work than 'just threshold the LLM score.'
The resume-jobs flagship inherits a real risk: implicit/inferred
skill claims (~80% precision) are exactly the kind of high-volume noisy
typed claims that, if treated as believed rather than hypothesis, would
produce bad matches and erode trust fast. The gate is not optional in
the monetizable domain — it's the difference between a useful matcher
and a spam engine.
claim-hypothesis-substrate-prior-art
The "claim/hypothesis/evidence substrate" is well-trodden territory —
donto must cite it, not claim it. The shipped reference point is
Wikidata: 120M+ items, ~1.65B statements, with a real claim-status model
— every statement carries qualifiers (time/role/context), a list of
references (each a "snak"/source), and a RANK of
preferred/normal/deprecated. Crucially, deprecated statements are KEPT,
not deleted ("known to be erroneous but still listed... in order to
prevent them being constantly added and removed") — i.e. Wikidata
already ships a contradiction-tolerant, source-attributed, three-valued
claim store at billion-scale. ~73% of statements have provenance.
Nanopublications are the academic gold standard for the per-claim
envelope: three named RDF graphs — assertion + provenance (where it came
from) + publication-info (who/when minted it) — with immutable
Trusty-URI hashing; >10M published, decentralized registry (Nanopub
Registry / Nanodash / Knowledge Pixels). The 2025 "knowledge-provenance"
extension adds a 4th graph specifically to track an assertion aggregated
from a body of supporting AND conflicting evidence with truth values —
i.e. nanopubs are actively bolting on what donto has natively. Bucur et
al. (2021) "super-pattern" formalizes scientific claims in logic to
auto-DETECT contradictory claims (evaluated on the Cooperation
Databank). So the claim-level provenance + contradiction-detection
problem is solved-ish; what nobody ships is contradiction as permanent
first-class legal state.
On typed-claim extraction (the VOLUME layer): SemMedDB/SemRep is the
scale proof — 130M+ subject-predicate-object "semantic predications"
from 37M+ PubMed abstracts, concepts normalized to UMLS. It is the
canonical "decompose a corpus into millions of typed claims" system —
and it was DEPRECATED Dec 31 2024 (final v43), leaving a vacuum donto's
extraction layer directly addresses. SciClaim (EMNLP 2021) is the
fine-grained schema: typed graph annotations
(causal/comparative/predictive/statistical/proportional associations +
qualifying attributes), 12,738 labels, >2x the label density of prior
datasets — concrete proof that one text yields dense typed structure
(validates the founder's "vast latent structure" intuition). ClaimsKG
(28,383 fact-checked claims → 6.6M triples) shows
claims-with-truth-ratings as a queryable KG. FEVER (185,445 claims
labeled SUPPORTED/REFUTED/NOTENOUGHINFO with evidence-sentence sets) is
the canonical evidence-anchored verification dataset — directly
analogous to scored resume↔︎job matching ("this skill-claim meets this
requirement").
On argumentation + re-ranking (the HYPOTHESIS-LIFECYCLE layer): AIF
(Argument Interchange Format) is the standard ontology for typed
argument edges — it already distinguishes REBUT (attacks a conclusion)
vs UNDERCUT (attacks the inference itself), exactly donto's
supports/rebuts/undercuts; AIFdb holds 14,000+ argument maps / 160,000+
claims. The re-ranking story has a flagship 2025-26 exemplar: Google
DeepMind's AI Co-Scientist (Nature, May 2026) runs a literal "tournament
of ideas" — Generation → Reflection (peer-review for
correctness+novelty) → Ranking (pairwise Elo debates) → Evolution
(refine top hypotheses, re-enter tournament) → Proximity (dedup). This
is precisely "generate hypotheses, rank by novelty/plausibility, re-rank
as new evidence arrives" — but it runs in-memory per-session over
literature, NOT over a persistent paraconsistent substrate. Swanson's
ABC literature-based discovery (fish-oil↔︎Raynaud's) is the foundational
A-B-C relationship-generation pattern, and the key reconciliation point:
ABC discovery REQUIRES the intermediate B-facts to already exist in the
graph — denser typed extraction (volume) is the fuel that makes A→C
discovery possible. Inconsistency-tolerant KG reasoning is a 2025 survey
area (paraconsistent logics, QC-negation/QCDL avoid the principle of
explosion) — academically mature, but as research, not shipped
infrastructure.
Players / datasets / taxonomies / standards:
Wikidata (ranks: preferred/normal/deprecated) —
Shipped, billion-scale claim-status model. Statements carry qualifiers +
a list of references (snaks) + a RANK. Deprecated = known-wrong but KEPT
to preserve the record — a contradiction-tolerant three-valued status
that ships in production. The closest existing thing to donto's
claim-status idea. [120M+ items, ~1.65B statements, ~73% with
provenance/references (early 2025)]https://www.wikidata.org/wiki/Help:Ranking
Nanopublications (assertion + provenance +
pub-info) — Per-claim RDF envelope: 3 named graphs (the claim;
where it came from; who minted it/when), immutable via Trusty-URI
hashing, decentralized registry. The gold standard for evidence-anchored
atomic claims. Lacks: native contradiction-as-legal-state,
identity-as-hypothesis, bitemporality. [>10M published nanopubs
(mostly life-science); Nanopub Registry / Nanodash / Knowledge Pixels
infra]https://nanopub.net/
Nanopub knowledge-provenance extension (2025) —
Adds a 4th named graph to track an assertion derived from a BODY of
evidence with supporting AND conflicting pieces + truth values from a
truth-discovery process. Direct evidence the field is retrofitting
contradiction-tracking that donto has natively. [unknown (2025
paper, Int'l J. on Digital Libraries)]https://link.springer.com/article/10.1007/s00799-025-00431-x
Bucur et al. 2021 — contradictory-claim detection via
nanopubs (super-pattern) — Formalizes high-level scientific
claims in logic (the 'super-pattern') and uses nanopub provenance to
DETECT + EXPLAIN contradictory claims across studies. Detects
contradictions; does not preserve them as queryable permanent state.
[Evaluated on Cooperation Databank (CoDa) social-science
repository]https://ieeexplore.ieee.org/document/9582393/
SemMedDB / SemRep / Semantic MEDLINE — PubMed-scale
repository of typed subject-predicate-object 'semantic predications'
auto-extracted from abstracts, normalized to UMLS. THE proof that
high-volume typed extraction over a corpus works — and the gap donto
fills. DEPRECATED Dec 31 2024 (final v43), archived only. [130M+
predications from 37M+ PubMed citations; deprecated 2024]https://lhncbc.nlm.nih.gov/temp/SemRep_SemMedDB_SKR/dbinfo.html
SciClaim (EMNLP 2021, SIFT) — Fine-grained typed
claim-graph schema: entity spans as nodes, typed relations as edges +
qualifying attributes; captures
causal/comparative/predictive/statistical/proportional associations.
Proves one text yields dense typed structure (>2x prior label
density) — validates the 'vast latent structure' thesis. [12,738
labels across SBS/PubMed/CORD-19; transformer joint extraction
baselines]https://github.com/siftech/SciClaim
ClaimsKG — Knowledge graph of fact-checked claims
with harmonized truth ratings, authors, dates, DBpedia-linked entities;
supports queries like 'all false claims by X in 2017 mentioning Y'.
Claims-with-status as a queryable KG, but status is curated
truth-ratings, not generated/re-ranked hypotheses. [28,383 claims
since 1996 → 6.6M triples (2019)]http://users.ics.forth.gr/~fafalios/files/pubs/ISWC2019_ClaimsKG.pdf
FEVER / FEVEROUS — Canonical evidence-anchored
claim-verification dataset: claims labeled
SUPPORTED/REFUTED/NOTENOUGHINFO, each tied to specific evidence
sentences (often multi-sentence/multi-page). The closest analogue to
scored, evidence-grounded relationship verification (e.g. resume-skill
meets job-requirement). [185,445 claims verified vs Wikipedia;
FEVEROUS adds structured-table evidence]https://fever.ai/dataset/fever.html
Argument Interchange Format (AIF) + argument mining
— Standard ontology for typed argument structure. Already distinguishes
REBUT (attacks a conclusion) vs UNDERCUT (attacks the inference) —
exactly donto's typed argument edges. Models argument structure but has
no extraction-at-scale, no bitemporality, no identity layer. [AIFdb:
14,000+ argument maps, 160,000+ claims, 14 languages]http://www.arg-tech.org/wp-content/uploads/2011/09/aif-spec.pdf
Google DeepMind AI Co-Scientist (Nature, May 2026)
— The flagship re-ranking exemplar.
Generation→Reflection(novelty+correctness review)→Ranking(pairwise Elo
'tournament of ideas')→Evolution(refine top, re-enter)→Proximity(dedup).
Exactly 'generate, rank by novelty/plausibility, re-rank.' BUT in-memory
per-session over literature — no persistent paraconsistent substrate, no
bitemporal replay. [Nature-validated May 2026 (AML drug repurposing,
liver fibrosis); lab-validated, not clinic]https://deepmind.google/blog/co-scientist-a-multi-agent-ai-partner-to-accelerate-research/
Swanson ABC literature-based discovery —
Foundational relationship-generation pattern: if A-B in one literature
and B-C in a disjoint literature, infer candidate A-C. The
reconciliation anchor: ABC discovery REQUIRES the intermediate B-facts
to already exist — high-volume typed extraction is the fuel for
hypothesis generation. [Origin: fish-oil↔︎Raynaud's;
recovered/extended via semantic predications]https://pubmed.ncbi.nlm.nih.gov/23026233/
Inconsistency-tolerant / paraconsistent KG reasoning (2025
survey) — Academic basis for donto's paraconsistency:
paraconsistent logics admit contradictions without the principle of
explosion; QC-negation/QCDL reason over inconsistent KGs;
detect/repair/tolerate framing. Mature as theory — but research, not
shipped infrastructure. [Survey area (arXiv 2502.19023, 2025)]https://arxiv.org/html/2502.19023v1
Volume verdict: Volume HELPS unambiguously at the
typed-extraction/decomposition layer and the prior art proves it:
SemMedDB shipped 130M predications from 37M abstracts and SciClaim
showed one text yields >2x prior typed-label density — the founder is
correct that the latent typed structure of even one text is vast, and
denser graphs are a precondition, not a luxury. Swanson's ABC is the
clinching argument: you cannot discover A→C unless the intermediate
B-facts physically exist in the graph, so under-extraction directly caps
discovery. For jsonresume→jobs, volume is genuine network effect — more
resumes+jobs → richer skill-adjacency and career-path priors → better
hidden-candidate discovery. So at the recall/fuel layer, more typed
claims = strictly more discoverable relationships. Volume becomes NOISE
at the relationship-HYPOTHESIS layer, where combinatorics explode (N
typed claims → up to O(N^2) candidate relationships) and
false-discovery-rate dominates — this is exactly why every serious
system in this space pairs generation with a VERIFIER/RANKER: the AI
Co-Scientist's Elo tournament + reflection reviewer, FEVER's
evidence-sentence requirement, Bucur's logical super-pattern for
contradiction, time-aware evidence ranking for fact-checking. The
reconciliation the report must state plainly: volume FEEDS the verifier,
it does not compete with it. Donto's job is to make the extraction layer
as high-volume as possible (the fuel) AND to put a ranker
(novelty/plausibility/evidentiary-support/contradiction-value/verification-cost)
and ideally a Lean-certified typed gate between candidate generation and
accepted relationships. A lens or extraction pass that raises recall of
typed B-facts is always valuable; a step that emits unranked, unverified
A→C relationships at volume is the decorative/dangerous case the prior
art uniformly guards against.
Design implications:
Cite, don't claim originality: the report must explicitly position
donto against Wikidata-ranks (shipped claim status), nanopublications
(per-claim provenance envelope), AIF (typed rebut/undercut), SemMedDB
(typed extraction at scale), and the AI Co-Scientist (Elo re-ranking).
Donto's novelty is the COMBINATION held as permanent legal state —
paraconsistency + identity-as-hypothesis + bitemporality + Lean shapes —
not any one piece.
Sharpen the one-line wedge: existing systems either DETECT
contradictions then resolve/collapse them (Bucur super-pattern, KG
repair) or RANK hypotheses in-memory per session (AI Co-Scientist) —
donto is the only one that KEEPS mutually-incompatible claims as
queryable first-class state forever AND re-ranks them as evidence
arrives. Lead with that.
Wikidata's 'deprecated but kept' rank is the proof-of-concept to
cite for why preserving wrong/contested claims is operationally
necessary (it stops the add/remove churn war) — frame donto's
contradiction frontier as the principled generalization of a pattern
Wikidata already needed at billion-scale.
Adopt the nanopublication 3-graph (assertion/provenance/pub-info)
vocabulary explicitly as the per-claim envelope so donto interoperates
rather than reinvents; note donto's evidence-byte anchoring is a
stricter form of nanopub provenance. The 2025 4th-graph
'knowledge-provenance' extension is direct evidence the field is
converging toward donto's native model — cite it as validation.
SemMedDB's Dec-2024 deprecation is a concrete, dated market gap: a
130M-predication typed-claim resource just went unmaintained. Position
donto's high-volume opencode extraction as the live successor to that
decomposition layer (and a domain-neutral one, not
biomedical-only).
Map the lifecycle to named prior art so each stage is credible:
extract→SemMedDB/SciClaim;
hold-incompatible→Wikidata-ranks/paraconsistent-KG; typed argument
edges→AIF rebut/undercut; verify-against-evidence→FEVER;
generate+rank+re-rank→AI Co-Scientist Elo tournament; A→C
discovery→Swanson ABC. Donto = the persistent substrate that unifies all
six stages that today live in six separate systems.
For the jsonresume→jobs flagship: frame matching as FEVER-style
evidence-anchored claim verification (skill-claim from a resume bullet
SUPPORTS/REFUTES a job requirement) plus Swanson-ABC adjacency discovery
(resume→skill→adjacent-skill→hidden-job). This grounds the matching
thesis in two named research lineages.
Lean-4 shape certification has NO direct analogue in any cited
system — it is donto's most genuinely differentiated component (others
use SHACL/ShEx at best). Give it weight as the verifier/typing
guarantee, but don't oversell: today it certifies shapes, it is not yet
the FDR-controlling relationship verifier the report's thesis
needs.
Honest caveats:
No single cited system combines all four donto pillars, but that
cuts both ways: it is genuine whitespace AND a sign that holding
contradictions forever may be operationally costly enough that mature
projects (Wikidata) chose a lighter 3-value rank instead. The report
should not assume 'nobody did it' equals 'nobody should' — argue the use
case, don't assume the gap is pure opportunity.
Donto's contradiction machinery is barely exercised today (2,426
argument edges, 77 identity edges, ~4.7% evidence coverage on 39.5M
statements). Cited systems have orders more traction in their niches:
Wikidata 1.65B statements with 73% provenance, SemMedDB 130M
predications, FEVER 185K verified claims, AIFdb 160K claims. The prior
art is more proven than donto's own contradiction layer — be honest that
donto's differentiators are largely latent/aspirational right now.
The AI Co-Scientist already ships the exact generate→rank→re-rank
loop the reframe centers on, was Nature-validated (May 2026), and is
backed by DeepMind. Donto's claim to differentiation is the PERSISTENT
PARACONSISTENT SUBSTRATE underneath (it re-ranks in-memory per session,
donto would re-rank over durable bitemporal state) — that is a real
distinction but a narrower one than 'nobody does hypothesis ranking.'
Frame precisely.
Lean-4 shape certification has no analogue and is differentiated,
but today it certifies shapes, not the false-discovery-controlling
relationship verifier the volume-vs-noise thesis actually requires. Do
not let 'Lean overlay' stand in for a verifier that does not yet
exist.
SemMedDB's deprecation is a real gap but also a caution: an
NLM-backed, 130M-predication, decade-running resource was shut down —
typed-extraction-at-scale has a sustainability/maintenance problem, not
just a technical one. The opportunity is real but the graveyard is
instructive.
Nanopublication network scale (>10M, ~2021 figure) is the best
public number found; live registry totals were not retrievable, so treat
it as a floor, not current. Wikidata 'deprecated' semantics are
'known-erroneous-but-kept,' which is contradiction-TOLERANT but not
contradiction-PRESERVING-as-equal-legal-state — don't overstate the
equivalence to donto's frontier.
cross-domain-application-survey
There is a real, large, mostly-unsolved market for a substrate that
holds source-grounded typed claims, keeps contradictory claims as legal
state, treats identity as query-time hypothesis, and generates+ranks
relationship hypotheses. The clearest existing proof is biomedical
literature-based discovery (LBD): Swanson's ABC model is exactly
"generate A-C relationship hypotheses from a dense graph of A-B and B-C
claims," and the canonical fuel is SemMedDB — ~98M semantic predications
(subject-predicate-object) auto-extracted by SemRep from ~29M PubMed
abstracts. SemMedDB demonstrates donto's volume thesis (you need the
intermediate B-facts to exist before any A-C hypothesis can be found)
AND its core gap (SemMedDB stores predications but has weak first-class
machinery for contradiction, provenance-as-byte, and
identity-as-hypothesis; conflicting predications are noise, not
preserved legal state). Every domain surveyed splits the same way:
volume is unambiguously good at the typed-extraction/recall layer (more
claims = denser graph = more candidate relationships) and dangerous at
the hypothesis layer (combinatorial blowup of candidate A-C links
demands a ranker/verifier or the false-discovery rate buries the
signal).
The strongest "this genuinely needs all four properties" domains are:
(1) LBD/drug-repurposing (SemMedDB, DRKG, Hetionet) — contradictions
between studies are scientifically real, identity-as-hypothesis matters
because gene/drug synonymy is messy, evidence-anchoring to the
PMID/sentence is the whole game; (6) financial-crime/AML
beneficial-ownership — OpenSanctions (2M+ entities, 337 sources), OCCRP
Aleph + FollowTheMoney schema, Open Ownership; entity resolution IS
identity-as-hypothesis (is this "John Smith" the sanctioned one?), false
positives are the #1 industry pain, every match must be explainable to a
regulator, and ownership changes over time (bitemporal); (4)
OSINT/investigative journalism — leaked datasets contradict each other
and official records, confidence tiers (confirmed/probable/possible) are
already standard practice, "why do we believe X" must survive
legal/editorial review; (2) legal case-law — citation graphs (Caselaw
Access Project, 6.7M cases) with TYPED edges
(cites/distinguishes/overrules/follows) are literally a paraconsistent
argument graph where an overruled precedent stays in the record as
superseded-not-deleted (bitemporal valid-time), and "what was good law
on date T" is the killer bitemporal query.
The flagship jsonresume→jobs case is the cleanest non-genealogy proof
because matching is literally scored relationship-discovery over typed
claims: resumes and jobs both decompose into
skill/role/seniority/domain/education/trajectory claims (ESCO ~3,000
occupations + ~13,900 skills; O*NET ~900 occupations as the public typed
vocabularies); the match is an evidence-anchored edge ("you meet this
requirement BECAUSE this skill-claim came from this resume bullet"); it
is contradiction-aware (resume claims senior, tenure dates say junior;
two roles overlap in time); it is bitemporal (skills decay, the market
re-prices them, so old match-hypotheses must RE-RANK when a new job
posting or a new resume arrives); identity-as-hypothesis handles
duplicate/variant profiles and inferred-vs-stated skills; and volume
genuinely produces network effects (more resumes+jobs → richer
skill-adjacency and career-path graphs → hidden-candidate and "adjacent
role you didn't know to apply for" discovery, which is exactly Swanson
ABC applied to careers). It is monetizable today and proves donto is
domain-neutral.
The honest caveats: existing players already do a lot. SemMedDB,
Hetionet/DRKG, OpenSanctions/Aleph, Mem0 (agent memory with temporal
validity windows), ESCO/CareerBERT — none of them combine all four
properties, but each does its slice well, so donto's wedge is the
COMBINATION (paraconsistent + evidence-byte + identity-as-hypothesis +
bitemporal + re-ranking hypothesis lifecycle), not any single property.
The places where contradiction-preservation is decorative rather than
load-bearing are domains where one authoritative source dominates and
disagreement is rare/uninteresting (much of routine supply-chain
provenance, single-EHR clinical data without cross-source conflict).
Those make weaker example projects.
Players / datasets / taxonomies / standards:
SemMedDB / SemRep (NLM) — Canonical LBD fuel:
subject-predicate-object semantic predications auto-extracted from
PubMed. THE proof that volume-of-typed-claims enables Swanson ABC
discovery. Weak on contradiction-as-legal-state, byte-level evidence,
identity-as-hypothesis — donto's exact gap to fill. [~98M
predications from ~29M PubMed abstracts (2019 release)]https://pmc.ncbi.nlm.nih.gov/articles/PMC3509487/
Hetionet / DRKG (drug-repurposing KGs) —
Integrative biomedical knowledge graphs (genes, compounds, diseases,
pathways) used for repurposing via metapath/embedding link prediction.
Show the relationship-hypothesis layer; do NOT preserve inter-study
contradictions. [Hetionet ~47k nodes / ~2.25M edges; DRKG ~97k
entities / ~5.8M triples]https://www.nature.com/articles/s41598-019-42806-6
Swanson ABC model (Fish oil–Raynaud,
Migraine–Magnesium) — Foundational LBD method: A-B and B-C
claims imply candidate A-C. Directly validates the founder's volume
thesis — intermediate B-facts MUST exist for discovery; denser graph =
more candidates. Also shows the verifier need (combinatorial A-C
explosion). [unknown (foundational; 1986–present body of work)]https://arxiv.org/pdf/2506.12385
Caselaw Access Project (Harvard LIL) — Full US case
corpus + citation graph with TYPED treatment edges
(cites/follows/distinguishes/overrules). A real paraconsistent argument
graph: overruled precedent stays as superseded-not-deleted; 'what was
good law on date T' is the killer bitemporal query. [6.7M cases /
40M+ pages; full citation graph released]https://lil.law.harvard.edu/blog/2020/04/22/caselaw-access-project-citation-graph/
LAMUS legal argument-mining corpus — Sentence-level
legal argument mining (facts/issues/rules/analysis/conclusion) from US
caselaw via LLM annotation. Proves typed-claim extraction from legal
prose is tractable at scale — the lens layer for the legal domain.
[unknown (corpus built from SCOTUS + Texas appellate opinions)]https://arxiv.org/pdf/2603.08286
Retraction Watch Database — Tracks
retractions/corrections — i.e. claims whose truth-status changed over
time. Pure bitemporal use case: 'we believed paper X at tx-time T1;
retracted at T2' and downstream re-ranking of every hypothesis that
cited it. [60k+ retractions tracked; retractions spiked 2020-2024
(paper mills)]https://retractionwatch.com/
OCCRP Aleph (FollowTheMoney schema) — Investigative
KG tying people/companies/payments/leaks/public records;
cross-references entity lists against all datasets. Built on the open
FollowTheMoney (FtM) ontology — a ready-made typed-claim schema for the
finance/OSINT domain. [unknown (used by global
investigative-journalism network; large leak corpora e.g. Panama/Pandora
Papers)]https://aleph.occrp.org/
OpenSanctions — Open DB of
sanctions/PEPs/criminal-interest entities. Entity resolution IS
identity-as-hypothesis; false positives are the #1 AML pain; new
'logic-v2' matcher is explicitly explainable. FtM-schema, integrates
Open Ownership beneficial-ownership data. [2M+ entities from 337
sources (default bundles 325)]https://www.opensanctions.org/datasets/default/
SAO / SAOx patent mining + PatentsView —
Subject-Action-Object extraction from patents encodes
function-behavior-structure as typed claims; cross-domain analogy mining
= find a function in domain A solved by structure in domain B (Swanson
ABC for invention). PatentsView is the AI-cleaned USPTO corpus.
[USPTO ~8M+ granted patents; SAO extraction F1 ~89.6% (Gemini 1.5
Pro)]https://www.nature.com/articles/s41598-026-49727-1
EU Digital Product Passport (ESPR) — Mandatory
per-product provenance/material/impact claims across supply chains from
2027 (batteries/textiles/electronics first). Source-level traceability =
evidence-anchoring; conflicting supplier claims = contradiction;
supplier swaps over time = bitemporal. [DPP market USD 185.9M (2024)
→ ~USD 1.78B (2030), CAGR ~45.7%]https://data.europa.eu/en/news-events/news/eus-digital-product-passport-advancing-transparency-and-sustainability
Getty Provenance Index + Central Registry of Looted Cultural
Property — Provenance chains for artworks; contested
attribution and competing ownership claims are the core problem.
Identity-as-hypothesis (is this the same object?), contradiction (rival
claims), evidence-anchoring (archival inventory byte). [Getty index
2.3M+ records; Central Registry 25k+ objects; WJRO est. 100k+ covered
objects in US museums, ~10% researched]https://art.claimscon.org/resources/overview-of-worldwide-looted-art-and-provenance-research-databases/
OMOP CDM (OHDSI) + clinical-guideline argumentation
— Standardized observational patient data; temporal events. Guideline
conflicts under multimorbidity are real and resolvable via
argumentation. Bitemporal (diagnosis revised), contradiction
(conflicting recs), evidence (which trial supports which rec).
[OHDSI network covers ~800M+ patient records globally
(OMOP-mapped)]https://ohdsi.github.io/CommonDataModel/background.html
Mem0 (agent memory layer) — Closest direct
competitor for personal-AI-memory: hybrid vector+graph+KV store,
temporal validity windows, supersedes contradictory facts. KEY CONTRAST:
Mem0 REPLACES outdated/contradictory facts; donto's thesis is to
PRESERVE them as legal state and re-rank. [claims ~26% better than
OpenAI built-in memory on its benchmark; open-source, widely
adopted]https://mem0.ai/blog/state-of-ai-agent-memory-2026
ESCO / O*NET (skills+occupation taxonomies) —
Public typed vocabularies for the jsonresume→jobs flagship: occupations,
skills, and skill-relations
(requires/complements/evolves-into/co-occurs). The schema backbone for
decomposing resumes+jobs into typed claims. [ESCO ~3,000 occupations
+ ~13,900 skills/competences; O*NET ~900 occupations]https://esco.ec.europa.eu/en/classification
CareerBERT / TalentCLEF 2025 / ESCO matching —
State of the art in resume↔︎job matching: shared-embedding and KG-GNN
methods over ESCO/O*NET, with inferred (not just stated) skills. Proves
the matching-as-scored-relationship-discovery layer; lacks
contradiction/bitemporal/evidence-byte lifecycle. [unknown
(CareerBERT reports SOTA vs prior embedding baselines)]https://arxiv.org/pdf/2503.02056
Volume verdict: Volume HELPS, decisively, at the
typed-extraction/decomposition layer in every domain, and the founder is
right that one text's latent typed structure is vast: SemMedDB (~98M
predications from ~29M abstracts) is the existence proof that you cannot
do Swanson-ABC discovery until the intermediate B-claims have been
extracted at scale — denser graphs strictly enable more discoverable A-C
relationships. Same logic in jobs (more skill/role claims per resume →
richer skill-adjacency and career-path graphs → hidden-candidate and
adjacent-role discovery = network effects), patents (more SAO
function-claims → more cross-domain analogies), and AML (more
ownership/payment claims → more reachable shell chains). Volume becomes
NOISE at the relationship-HYPOTHESIS layer, where it is combinatorial: N
typed claims yield ~O(N^2) candidate A-C edges, and without a
ranker/verifier the false-discovery rate buries the few real findings —
this is precisely why LBD work added contextual/semantic constraints
beyond raw co-occurrence, why AML lives or dies on false-positive
suppression (OpenSanctions' explainable logic-v2 matcher), and why
scientific claim-verification (SciFact) exists as a separate stage.
Reconciliation: volume FEEDS the verifier, it does not oppose it.
Decompose maximally (high recall, cheap), then rank/verify ruthlessly
(high precision, evidence-anchored). The danger isn't extracting too
many claims; it's PROMOTING unranked claim-combinations to asserted
relationships. donto's bitemporal re-ranking turns volume into a
strength even at the hypothesis layer, because more evidence arriving
over time monotonically improves the ranking of a fixed hypothesis set
rather than multiplying it.
Design implications:
Ship the claim lifecycle as the product, not the lenses. Every
domain example must demonstrate: extract typed claims → hold
contradictions → generate candidate relationships → attach
evidence+counter-evidence → rank → RE-RANK on new evidence → explain
WHY. A lens that doesn't change the graph (new candidate edge / evidence
link / contradiction / confidence update / verification path) is
decorative and should be cut.
Adopt or mirror existing open typed schemas per domain instead of
inventing — it makes each example a same-day proof and an importable
dataset: FollowTheMoney (finance/OSINT), ESCO/O*NET (jobs), SemMedDB
predication types + UMLS (biomedical), legal treatment-edge vocabulary
(cites/distinguishes/overrules/follows) for caselaw, OMOP (clinical).
donto's value is the lifecycle on top, not the vocabulary.
Make the verifier/ranker a first-class, swappable component because
volume forces it. In LBD, AML, and jobs the candidate-relationship set
is combinatorial; the founder's volume thesis is correct at extraction
and only survives at the hypothesis layer if a ranker (novelty ×
plausibility × evidentiary-support × contradiction-value ×
verification-cost) caps the false-discovery rate. Demonstrate the ranker
explicitly on at least the LBD and jobs examples.
Lead the 10 examples with the four domains where all four properties
are load-bearing AND a public dataset exists for a same-week demo: (a)
drug-repurposing on SemMedDB/Hetionet, (b) AML beneficial-ownership on
OpenSanctions+FtM, (c) legal good-law-as-of-date on Caselaw Access
Project, (d) jsonresume→jobs on ESCO/O*NET. These are the credibility
anchors; the other six (patents-analogy,
scientific-integrity/retraction-aware, OSINT,
clinical-guideline-conflict, cultural-provenance, personal-memory) round
out domain-neutrality.
Frame the personal-AI-memory example explicitly AGAINST Mem0: Mem0
supersedes/replaces contradictory facts; donto's differentiator is
preserving them as legal bitemporal state and re-ranking beliefs.
Without that contrast the example looks like a me-too memory layer.
For every example state a falsifiable success test up front (e.g.
LBD: rediscover a held-out, later-confirmed drug-disease link from
pre-discovery literature; jobs: beat ESCO-embedding baseline on a
held-out hire/interview outcome with an evidence-anchored,
contradiction-flagged explanation; AML: surface a true
sanctioned-beneficial-owner through ≥2 shell layers with a
regulator-readable why-trail).
Show the re-ranking loop concretely with bitemporal triggers: a
retraction (Retraction Watch), a new sanctions listing, a new job
posting, or a new resume edit must each demonstrably re-rank previously
generated hypotheses. This is the single behavior no surveyed competitor
does end-to-end and should be the demo money-shot.
Honest caveats:
Each domain already has competent incumbents doing a slice well
(SemMedDB/Hetionet, OpenSanctions/Aleph, Mem0, CareerBERT, ESCO).
donto's wedge is the COMBINATION of all four properties plus the
re-ranking lifecycle, not any single property — the survey must not
overclaim novelty on properties taken individually.
Contradiction-preservation is load-bearing in only ~6 of the 10
domains. In routine supply-chain provenance and single-source clinical
data, one authoritative source usually dominates and disagreement is
rare, so paraconsistency is decorative there; those make the weakest
examples and should be framed as 'multi-source-conflict' variants or
demoted.
Identity-as-hypothesis is the hardest property to make demonstrably
better than mature entity-resolution stacks (OpenSanctions logic-v2,
dedupe pipelines). Claiming it as a differentiator requires showing
query-time, non-destructive, evidence-justified merges beating a strong
ER baseline — not just asserting the design.
The biomedical and clinical examples are credibility-rich but
regulatory/validation-heavy; an LBD hypothesis is only valuable if it
survives wet-lab/clinical scrutiny, which donto cannot provide. Position
these as hypothesis-generation-and-prioritization, never as evidence of
efficacy.
jsonresume→jobs has a cold-start/coverage reality: the JSON Resume
registry is far smaller than commercial resume corpora (LinkedIn-scale),
so the network-effects volume argument is true in principle but unproven
at the founder's current data scale; the falsifiable test should be run
on whatever real registry+jobs volume exists, not a hypothetical
millions.
Several cited extraction F1/benchmark numbers (SAO ~89.6%,
claim-verification scores) are domain-and-dataset specific and will not
transfer to donto's open-domain GLM extraction without its own
evaluation; treat them as feasibility signals, not promised
performance.
Public 'good law as of date T' and looted-art contested-claim demos
touch legal liability — outputs are interpretive (matching the
user-memory stance that no source is ground truth), so these examples
should ship as decision-support with explicit source-attribution, never
as authoritative rulings.