The Lens Engine: Research Appendix (raw findings)
Companion to the lens-engine report. Structured output of the
10-area study + 4 adversarial critiques (2026-06-01).
Critiques (adversarial)
PARTIALLY-HOLDS (confidence 0.72)
Claim: Agentic MANY-LENS decomposition
(philosophical/linguistic/temporal/causal/ethical/...) discovers
genuinely NOVEL and VALUABLE inter-entity relationships that existing
methods (KG embedding link-prediction, literature-based discovery,
analogy mining) do NOT already find.
Strongest support: The specific COMBINATION is
genuine white space, even though every component is old. (1) "Many
lenses" is faceted classification (Ranganathan/Bliss) + multi-viewpoint
ontologies; "hold contradictory perspectives in contexts" is Cyc
microtheories; "discover hidden cross-domain links" is Swanson LBD
(1986) and Hope/Kittur/Shahaf analogy mining; "agentic ontology-KG
hypothesis generation revealing hidden interdisciplinary relationships"
is SciAgents (Buehler/MIT 2024). So the discovery idea is well-trodden.
BUT none of the discovery systems persists its speculative output as
durable, contradiction-preserving, evidence-anchored legal state.
SciAgents (arxiv 2409.05556) generates hypotheses as prose and then
discards them; LBD/analogy engines rank candidates but do not retain
them indefinitely as defeasible typed argument edges
(supports/rebuts/undercuts); Cyc had the
contradiction-tolerant store (microtheories) but no agents and famously
collapsed under manual curation load. donto is a real, running
bitemporal paraconsistent quad store (~39.5M stmts; tx_time, overlays,
contradiction frontier, evidence_links, multi-aperture extraction with
the 6 named lenses all present in the live codebase at
/mnt/donto-data/workspace/donto-memory). The novel and defensible thesis
is the PIPELINE: generate astronomically many machine-proposed relations
across lenses, HOLD them without forcing consistency, evidence-anchor
each, and verify/curate (Lean overlay) the rare valuable ones. That
generate-hold-verify loop on a paraconsistent substrate is not something
the cited prior art does.
Strongest counterargument: The claim's load-bearing
words are "novel AND valuable," and the best evidence says many-lens
agentic generation buys NOVELTY at the direct expense of VALUE. Si &
Hashimoto's Ideation-Execution Gap (arXiv 2506.20803, 2025): LLM ideas
were rated significantly MORE novel than expert ideas (5.78 vs 4.91),
but after 43 experts each spent 100+ hours executing them, LLM scores
collapsed (overall -1.976, effectiveness -1.879, novelty -1.049) while
human ideas barely moved, so both groups converged to ~4.7-4.9. In other
words, the apparent novelty advantage was an evaluation artifact, not
realized value. Biomedical hypothesis-generation work (arXiv 2505.14599) finds
LLMs produce high false-positive rates; precision only comes at heavy
recall cost. This is the multiple-comparisons / spurious-correlation
deluge (Calude & Longo): cross any N lenses over M entities and the
count of "connections no human ever drew" explodes combinatorially —
almost all of which are noise, and "no one thought to draw this" is the
EXPECTED signature of a spurious link, not of a discovery. Sutton's
Bitter Lesson cuts deeper: a hand-authored taxonomy of
"philosophical/linguistic/teleological/semiotic..." lenses is exactly
the engineered human structure that scaled, end-to-end learned
representations have repeatedly outclassed; the lenses may be
scaffolding a frontier model already internalizes. And SciAgents (2024)
already shipped the headline claim ("reveals hidden interdisciplinary
relationships previously considered unrelated, surpassing human-driven
research") — so the differentiator is reduced to the substrate, which
speaks to HOLDING and CURATING claims, not to whether the discoveries
are novel-and-valuable. The substrate makes you better at storing and
triaging a firehose of mostly-false machine guesses; it does not raise
their base rate of being true or useful.
What must be true: For the claim to hold rather
than merely partially-hold, the founder must demonstrate, with
adversarial evaluation, that: (1) PRECISION/VALUE, not just rated
novelty — a measurable, defensible hit-rate of lens-intersection
relationships that survive independent verification (Lean certification,
a held-out source byte, or domain-expert/experimental confirmation), at
a rate beating a strong baseline of (a) a single frontier LLM prompted
directly for cross-domain links and (b) KG-embedding link-prediction +
LBD/analogy mining on the same corpus. Beating the bitter-lesson
baseline (a) is the crux; if a plain large model finds the same valuable
links without the lens taxonomy, the lenses are dead weight. (2) The
many-lens decomposition adds INCREMENTAL discoveries that the
single-pass baseline misses — i.e., value lives specifically at lens
INTERSECTIONS, shown by ablation (drop lenses, value drops). (3) A
working triage/curation mechanism keeps verification cost sublinear as
generated relations explode, so the spurious-deluge /
multiple-comparisons problem is controlled (FDR-style discipline, not
eyeballing) — otherwise the paraconsistent store just becomes a landfill
of unfalsifiable speculation. (4) "Valuable" is operationalized against
real downstream users (the genealogy/legal/medical consumers), not
self-rated novelty. Conditions under which it FAILS: if value tracks
rated-novelty (the ideation-execution gap reproduces); if a vanilla
frontier model matches it (bitter lesson); if false-positive rate makes
curation cost scale with generation (spurious deluge); or if the
substrate's contradiction-preservation degenerates into hoarding noise
nobody ever falsifies.
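The "FDR-style discipline" in condition (3) can be sketched with the Benjamini-Hochberg step-up procedure. This is a minimal illustration, not part of the donto codebase: it assumes each lens-intersection candidate already carries a p-value from some external test (this appendix does not say how such p-values would be produced), and the function name is hypothetical. Note that plain Benjamini-Hochberg is only guaranteed under independence or positive dependence between tests, which heavily overlapping lenses may violate.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Flag which candidates count as discoveries at target FDR q.

    Hypothetical helper: assumes one externally produced p-value per
    candidate relation, in the same order as the candidates.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Sort candidate indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Every candidate at or below that rank is kept as a discovery.
    keep = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            keep[idx] = True
    return keep
```

The point of the step-up rule is that the acceptance threshold tightens as the candidate pool grows, so enumerating more lens intersections makes each individual link harder to accept rather than easier.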
Claim: "The discovery signal is real and not fatally
drowned by combinatorial noise — there is a workable precision/ranking
story (novelty × plausibility × value) that surfaces the rare gold
rather than producing a hallucinated mess."
Strongest support: The signal is real and a working
precision/ranking pipeline already exists. Google's AI Co-Scientist (Feb
2025) produced hypotheses later confirmed in independent wet-lab work in
three biomedical areas (AML drug-repurposing inhibiting tumor viability
at clinical concentrations; an independently-validated cf-PICI
phage-tail host-range mechanism; epigenetic liver-fibrosis targets
confirmed in human hepatic organoids). Its precision mechanism IS the
claim's novelty × plausibility × value formula instantiated: self-play
debate → Elo tournament ranking (Elo empirically correlated with
correctness) → evolution → external confirmation. Separately,
unsupervised gene-disease embeddings predicted associations ~10 years
pre-publication (IFIH1–Aicardi-Goutières scored 0.925 by a 2004 model
with zero corpus co-occurrence), proving latent cross-domain
relationships no human had drawn are computationally extractable.
donto's paraconsistent, evidence-anchored, hypothesis-as-state substrate
is a genuinely under-explored and apt fit for the GENERATE-and-HOLD half
of the pipeline (production paraconsistent KGs at 39M statements are
rare; the literature is mostly theory/prototypes).
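The Elo tournament step mentioned above can be sketched with the standard Elo update applied to pairwise hypothesis "debates". This is an illustration of the general rating technique only: the K-factor of 32 and the starting rating of 1200 are conventional placeholder values, not parameters reported for AI Co-Scientist, and the debate judge that decides each winner is assumed to exist outside this code.

```python
from typing import Dict, Iterable, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def run_tournament(
    matches: Iterable[Tuple[str, str]],
    k: float = 32.0,
    start: float = 1200.0,
) -> Dict[str, float]:
    """Apply Elo updates for (winner, loser) pairs, in the order given."""
    ratings: Dict[str, float] = {}
    for winner, loser in matches:
        r_w = ratings.setdefault(winner, start)
        r_l = ratings.setdefault(loser, start)
        e_w = expected_score(r_w, r_l)
        # Zero-sum update: the winner gains exactly what the loser drops.
        delta = k * (1.0 - e_w)
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta
    return ratings
```

As the surrounding text stresses, a rating like this only prioritizes hypotheses for external confirmation; it does not certify any of them.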
Strongest counterargument: The
base-rate/combinatorial problem is the killer, and 40 years of
Literature-Based Discovery — the founder's exact idea, minus the LLMs —
is the cautionary precedent. The 2023 Bioinformatics critique of LBD
evaluation (pmc.ncbi.nlm.nih.gov/articles/PMC9945845) shows that against
large candidate sets "very few co-occurrences represent a true discovery
and the vast majority are meaningless," the true-discovery proportion is
"unknown and likely low," and Kostoff (2007, pubmed 17616484) found that
several canonical LBD "discoveries" were not in fact discoveries. After
four decades the validated trophy case is still essentially Swanson's
two hand-curated findings (fish-oil/Raynaud's, magnesium/migraine) —
found by a careful human reasoner, NOT by combinatorial enumeration. The
"many lenses → intersection" move makes this strictly WORSE, not better:
it multiplies the candidate space, and multiple-testing math is brutal
(test enough pairs and spurious "significant" hits approach certainty;
FDR can even blow up under the heavy feature dependence that lensed
decompositions create). The novelty/precision tradeoff is empirically
adverse, not neutral: in serendipity recommenders, novelty (N-nDCG) and
accuracy (nDCG) are negatively correlated, so you cannot raise novelty
without paying in precision. The "embeddings already capture
this" objection bites hard: the gene-disease result was achieved with
plain gradient-boosting on concatenated embeddings — NO symbolic lenses,
NO paraconsistency — implying the bitter lesson applies (geometry
surfaces latent structure cheaper than hand-built analytical
scaffolding). And every cited authority warns that LLM self-validation
does NOT certify truth ("even unanimous agreement among SOTA LLM critics
does not guarantee scientific accuracy"; arxiv.org/html/2504.05496v1) —
so without an external truth signal the ranking degenerates into ranking
the model's own plausibility prior, i.e., confident hallucination.
donto's Lean-4 overlay only certifies shape/consistency, not empirical
truth, so it cannot supply the missing external signal. Finally, "no one
has thought to do this" is false: LBD, conceptual blending/bisociation
(Koestler/Boden), serendipity engines, KG link prediction, and LLM
co-scientists have all worked the pieces.
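The multiple-testing arithmetic behind this counterargument can be made concrete. The sketch below assumes one particular counting scheme (every unordered entity pair examined under every lens) and independent tests; both are simplifying assumptions introduced here for illustration, and the function names are hypothetical.

```python
from math import comb


def candidate_count(entities: int, lenses: int) -> int:
    """Unordered entity pairs, each examined under every lens (assumed scheme)."""
    return comb(entities, 2) * lenses


def expected_false_positives(tests: int, alpha: float) -> float:
    """Expected spurious 'significant' hits if every tested link is truly null."""
    return tests * alpha


def prob_at_least_one_false_positive(tests: int, alpha: float) -> float:
    """Family-wise error rate, assuming the tests are independent."""
    return 1.0 - (1.0 - alpha) ** tests
```

Under this scheme, 1,000 entities and 10 lenses give 4,995,000 candidate tests; at a 0.05 significance level with no real links at all, about 249,750 of them would still pass.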
What must be true: For the claim to hold at scale,
ALL of the following must be true: (1) Ranking must be tied to an
EXTERNAL truth/value signal — wet-lab assay, market test, expert
adjudication, citation/realization outcome — not LLM self-critique
alone; the AI Co-Scientist only works because Elo PRIORITIZES and an
external experiment CONFIRMS. (2) The domain must have a cheap, fast
verifier or a high enough base rate of real relationships that a
final-stage filter can achieve usable precision; in verifier-poor or
low-base-rate domains the multiple-testing math wins and you get a
hallucinated mess. (3) "Many lenses" must be used to ADD
evidence/constraints that raise plausibility scoring precision
(context-based ABC LBD lifted precision from 27% to ~89-97% by adding
biological context), NOT merely to enumerate more candidate pairs —
i.e., lenses must be a filtering prior, not a generator of combinatorial
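volume. (4) The economics must clear: the expected cost per confirmed
hit, cost-per-candidate-evaluated × (1/precision), must be less than the
value of each rare gold found (equivalently, total evaluation cost must
be less than the total value of the gold); at low precision and high
fan-out this is the silent business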
killer. (5) The system must beat the cheap baseline — unsupervised
embedding link-prediction over the same corpus — by enough to justify
the agentic many-lens overhead; if plain embedding geometry surfaces the
same latent relations (as the gene-disease result suggests), the bitter
lesson refutes the elaborate symbolic pipeline. (6) Novelty must be
scored jointly with plausibility AND value, accepting the
empirically-real novelty/precision tradeoff, with a human or experiment
as the last-mile gate. Where these hold (grounded science with assays,
like AI Co-Scientist) the claim holds; where they fail (open-domain "all
entities × all lenses" with only LLM self-validation and no external
verifier) it is refuted by the LBD track record and multiple-testing
statistics.
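The economics condition (4) can be checked with a few lines of arithmetic. This sketch reads the condition as "expected cost per confirmed hit must be below the value of one hit", where cost per confirmed hit is cost per evaluated candidate divided by precision; that reading is an interpretation made here, and all function and parameter names are hypothetical.

```python
def cost_per_confirmed_hit(cost_per_candidate: float, precision: float) -> float:
    """Expected evaluation spend needed to surface one true relationship."""
    if not 0.0 < precision <= 1.0:
        raise ValueError("precision must be in (0, 1]")
    return cost_per_candidate / precision


def clears_economics(
    cost_per_candidate: float, precision: float, value_per_hit: float
) -> bool:
    """True when one confirmed hit is worth more than it costs to find."""
    return cost_per_confirmed_hit(cost_per_candidate, precision) < value_per_hit


def net_value(
    candidates: int,
    cost_per_candidate: float,
    precision: float,
    value_per_hit: float,
) -> float:
    """Expected value of confirmed hits minus total evaluation cost."""
    return candidates * (precision * value_per_hit - cost_per_candidate)
```

For example, at a cost of 2 per candidate and a precision of 0.001, each confirmed hit costs about 2,000 to find, so the pipeline only clears if a single hit is worth more than that. Fan-out does not change the break-even point, but it scales the loss when the condition fails.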
Claim: donto's paraconsistent + evidence-anchored +
bitemporal + identity-as-hypothesis substrate is genuinely the RIGHT
home for machine-generated relationship-hypotheses — materially better
than a vector DB + reranker — because it can hold contradictory
speculative edges, anchor them to evidence, and certify/verify the
survivors.
Strongest support: The claim is architecturally
sound and rests on a deep, legitimate intellectual lineage that the
"vector DB + reranker" frame genuinely cannot match on its stated job.
(1) The function the substrate must serve — hold many speculative,
mutually-contradictory relationship edges WITHOUT collapsing, keep each
pinned to its source, and let identity/merge be a query-time weighted
hypothesis rather than a hard key — is exactly what a vector DB +
reranker is bad at. An ANN index over embeddings stores points and
returns nearest neighbors by cosine similarity; it has no native notion
of a typed edge, no first-class "this claim attacks that claim," no
provenance-on-the-assertion, and no way to assert two contradictory
things as co-equal legal state. You would have to bolt all of that on in
a metadata store anyway — at which point you have rebuilt a graph/claim
store. KG-embedding research itself documents that embeddings learn
composition implicitly and "cannot offer logical inference paths as
support evidence," and that high-performing KGE models give divergent,
unstable triple-level predictions
(link.springer.com/chapter/10.1007/978-3-032-25156-5_11;
madoc.bib.uni-mannheim.de/66365/1/TGDK.1.1.4.pdf). (2) Every component
is independently validated, production-proven prior art that donto
correctly composes rather than invents: paraconsistent belief revision /
LFI (arxiv.org/html/2412.06117) for non-explosive contradiction;
bipolar/weighted argumentation frameworks (Dung, Toulmin/Pollock
supports-rebuts-undercuts) for typed argument edges
(dl.acm.org/doi/abs/10.1613/jair.1.12394); nanopublications, which are
LITERALLY assertion+provenance+publication-info RDF and have a 2024+
line of work using provenance to detect and explain CONTRADICTORY
research claims (ieeexplore.ieee.org/document/9582393;
link.springer.com/article/10.1007/s00799-025-00431-x); RDF-star / named
graphs / reification for evidence-on-statements in production stores
(ontotext.com/knowledgehub/fundamentals/what-is-rdf-star); Cyc
microtheories for context-relative, prima-facie-contradictory truth
(en.wikipedia.org/wiki/Cyc); and Standpoint Logic, which is
purpose-built so "multiple viewpoints [can] be integrated into the same
ontology, even when certain viewpoints may hold contradicting beliefs"
(researchgate.net/publication/357533078) — a near-exact formalization of
donto's "lens at query time." (3) The validation half is also real and
converging: POPPER / agentic sequential falsification
(openreview.net/forum?id=iTevNo8PzG) shows automated,
statistically-controlled hypothesis falsification is now feasible and
matches human experts while running about 10x faster, meaning donto's
"certify/verify the survivors" step (Lean-4 overlay + falsification) is
not fantasy. I
verified the substrate is real and at the claimed scale: 39,496,776
current statements, genuinely bitemporal (tx_time, valid_time as
Postgres ranges), context column functioning as named graphs. So as a
STORAGE-AND-CURATION substrate for contradictory, evidence-anchored,
identity-fluid claims, donto is materially better-suited than a vector
DB + reranker, and the combination is defensible.
Strongest counterargument: The claim quietly
conflates a CAPABILITY (the schema CAN hold these things) with a REALITY
(it DOES, at scale, in a way that beats the alternative on a job that
matters), and on the reality the evidence is damning. (1) The substrate
is almost empty of the very features the claim leans on. I grep-counted
the 39.5M live statements for any
supports/rebuts/undercuts/hypothesis/evidence/contradiction/sameAs
predicate: the query returned NOTHING. The store is ~96% genealogy facts
(ctx:genealogy/research-db alone is 21.8M); the argument-edges,
contradiction-frontier, and evidence-anchors are designed affordances,
not populated, exercised state. The schema itself is a plain
quad+bitemporal table (statement_id, subject, predicate, object_iri,
object_lit, context, tx_time, valid_time, flags, content_hash) — the
paraconsistent/argument machinery lives in higher layers and
conventions, not as load-bearing, query-optimized columns. The user's
own memory corroborates: "only 3 of ~80 Caroline-line kinship triples
have evidence_links, all in test contexts" and "donto-pg ex:kitty is a
junk-drawer URI." So the differentiator vs. a vector DB is, today,
largely aspirational. (2) The whole value chain is gated by a step the
substrate does NOT improve and may worsen: validation. The
combinatorial-explosion / multiple-comparisons / data-dredging
literature is unanimous that any sufficiently large cross-product of
entities × lenses yields overwhelmingly spurious "relationships"
(en.wikipedia.org/wiki/Data_dredging;
tylervigen.com/spurious-correlations), and LLM extraction is "prone to
hallucination and produce[s] hypotheses in volumes that make manual
validation impractical"
(emergentmind.com/topics/automated-hypothesis-generation). donto's
headline result — "697 facts from 'cat is red'", "483 facts from one
sentence" — is precision-blind over-extraction, exactly the failure mode
the KG-construction literature warns about (arxiv.org/pdf/2508.03438).
Holding a billion contradictory speculative edges "forever as legal
state" without collapsing is not obviously a feature; it is deferred
cost — it converts a precision problem into a curation/ranking problem
that the substrate makes no easier, while paying graph-write and storage
tax that a vector store would not. (3) The bitter lesson + "embeddings
already capture this" cut hard: cross-domain analogy / latent-connection
discovery — the actual payoff the founder wants — is already
demonstrably done by LLMs over learned representations
(arxiv.org/pdf/2302.12832; arxiv.org/pdf/2211.15268), with NO
hand-specified lens taxonomy and NO symbolic substrate. An agent can be
prompted "find a non-obvious connection between X and Y" and traverse
semantic space directly; the 10 hand-authored lenses
(entities/properties/relations/temporal/spatial/provenance/pragmatics/domain/inferential/quantitative
— which I confirmed in extraction.py) are precisely the kind of
handcrafted human structure Sutton predicts gets out-scaled. And the
most direct historical precedent — Cyc, a hand-built, microtheory-based,
contradiction-tolerant, context-relative ontology that consumed 2000+
PhD-years — is the canonical cautionary tale for this entire
architecture. Finally, the lenses as currently built do NOT do the one
thing the vision is about: I read extraction.py — each lens runs
INDEPENDENTLY and facts are deduped across passes; there is no
cross-lens INTERSECTION / relationship-discovery step at all. The
discovery engine the claim presupposes does not yet exist in the code;
the substrate is being justified by a generator that hasn't been
written.
What must be true: For the claim to hold rather
than merely partially-hold, ALL of the following must become true: (1)
The differentiating machinery must be POPULATED and load-bearing, not
just schema-possible: a non-trivial fraction of relationship edges must
actually carry typed argument links (supports/rebuts/undercuts),
evidence-anchors to source bytes, and contradiction-frontier membership
— and queries must routinely use them. (Today: ~0 such predicates in
39.5M statements.) (2) A real cross-lens RELATIONSHIP-DISCOVERY step
must exist and produce edges, not just per-lens fact extraction that is
then deduped (the current extraction.py does the latter). The
serendipity payoff lives entirely in the intersection step that isn't
built. (3) The economics of holding contradictions must beat collapsing
them: i.e., there must be queries/workflows where keeping
mutually-contradictory speculative edges alive demonstrably yields
better answers than a vector-DB-plus-metadata baseline that resolves or
scores them — and the genealogy domain (descendant-restricted records,
irreducibly conflicting witnesses, legal/native-title stakes where
preserving every source-attestation paraconsistently is the actual
requirement) is the strongest case where this is plausibly true. (4) The
validation/curation funnel must have a working ranker + falsifier
(Lean-4 shape-certification + POPPER-style sequential falsification)
that turns the over-generated mass into a small set of survivors at
acceptable precision — because without it, the substrate just stores
more noise more expensively. (5) The defensibility must rest on the
COMBINATION-at-scale + the curation funnel + a domain where
contradiction-preservation is a hard requirement, NOT on any single
component (each of which — paraconsistency, argumentation edges, nanopub
provenance, standpoint/microtheory contexts, bitemporality — is
well-established prior art the founder should cite and reuse, not claim
as net-new). The genuine white space is narrow but real: an AGENTIC,
machine-GENERATED, paraconsistent claim store with first-class
evidence-anchoring and a formal (Lean) verification overlay, operated at
scale on a domain (genealogy/native-title) whose evidentiary structure
actually demands contradiction-preservation. If that white space is
executed, the claim holds; if the project stays a high-recall
fact-dumper on an unused argument schema, it is refuted.
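What condition (1) asks for, statements that actually carry typed argument links and evidence anchors, can be sketched in a toy in-memory form. This is purely illustrative and is not donto's API or schema: only the field names statement_id, subject, predicate, object_lit and context come from the column list quoted above, while the classes, methods, evidence-link string format and sample identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class ArgumentKind(Enum):
    SUPPORTS = "supports"
    REBUTS = "rebuts"
    UNDERCUTS = "undercuts"


@dataclass(frozen=True)
class Statement:
    statement_id: str
    subject: str
    predicate: str
    object_lit: str
    context: str
    # Hypothetical field: anchors into source documents.
    evidence_links: Tuple[str, ...] = ()


@dataclass(frozen=True)
class ArgumentEdge:
    source_id: str
    target_id: str
    kind: ArgumentKind


@dataclass
class ClaimStore:
    statements: Dict[str, Statement] = field(default_factory=dict)
    edges: List[ArgumentEdge] = field(default_factory=list)

    def add(self, statement: Statement) -> None:
        # Statements with different IDs sit side by side even if they conflict.
        self.statements[statement.statement_id] = statement

    def argue(self, source_id: str, target_id: str, kind: ArgumentKind) -> None:
        if source_id not in self.statements or target_id not in self.statements:
            raise KeyError("both statements must exist before linking them")
        self.edges.append(ArgumentEdge(source_id, target_id, kind))

    def contradiction_frontier(self) -> Set[str]:
        """IDs of statements on either end of a rebuts edge."""
        frontier: Set[str] = set()
        for edge in self.edges:
            if edge.kind is ArgumentKind.REBUTS:
                frontier.update((edge.source_id, edge.target_id))
        return frontier

    def anchored_fraction(self) -> float:
        """Share of statements carrying at least one evidence link."""
        if not self.statements:
            return 0.0
        anchored = sum(1 for s in self.statements.values() if s.evidence_links)
        return anchored / len(self.statements)
```

A metric like `anchored_fraction` is one way the "populated, not just schema-possible" condition could be tracked over time, alongside a count of argument edges per statement.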
Claim: The 'lens engine' vision is fundamentally an
ADVANCE over (not a rebrand of) Swanson literature-based discovery and
KG completion — the agentic + many-deep-lenses +
hold-and-verify-on-a-paraconsistent-substrate combination is genuine
white space.
Strongest support: The
substrate-for-hold-and-verify is genuine white space: every comparable
system (Swanson LBD, LLM KG-completion, Google's AI co-scientist)
generates candidates and then ranks-and-DISCARDS, whereas donto can hold
the entire speculative, mutually-contradictory relationship frontier as
durable, evidence-anchored, Lean-certifiable legal state and re-curate
it as evidence arrives — and I verified these substrate features
(Supports/Rebuts/Undercuts edges, hypothesis_only, paraconsistent
append-only store, 6-lens sweep) actually exist in the donto-memory
source, not just the pitch.
Strongest counterargument: The vision's central
claim of novelty ('relationships no one ever thought to draw') is the
OLDEST and most thoroughly-defeated promise in the field, and the two
hardest critiques bite directly. FIRST, the bitter lesson (Sutton, https://en.wikipedia.org/wiki/Bitter_lesson):
hand-engineering a fixed taxonomy of 'human analytical lenses'
(philosophical, teleological, semiotic...) and a structured symbolic
substrate is exactly the human-knowledge-injection that scaled general
methods have repeatedly eclipsed. LLMs already implicitly encode a
'collective world model' across domains in their embedding space (https://arxiv.org/pdf/2501.00226),
so the cross-lens connections the engine laboriously materializes as
quads are arguably already latent in a frontier model — and asking the
model directly ('what surprising connection links X and Y across
economic and ecological framings?') may extract them more cheaply than
building the substrate. SECOND, and most damaging, is the
precision/value problem that has dogged LBD for THREE DECADES: 'Existing
LBD methods are prone to proposing spurious discoveries or an abundance
of low-quality ones... LBD produces more potential hypotheses than can
be manually reviewed' (https://pmc.ncbi.nlm.nih.gov/articles/PMC6694578/).
Multiplying lenses MULTIPLIES the combinatorial candidate space, making
the spurious-hypothesis flood WORSE, not better. Novel != true !=
valuable: an engine that emits a billion cross-lens 'relationships no
one thought of' has produced noise, not discovery, unless the
verify/rank step is extraordinarily good — and that verifier, not the
generator, is where all the value and all the unsolved difficulty
actually live. THIRD, the competitive frontier is already here and
validated: Google's AI co-scientist (Nature 2026, https://www.nature.com/articles/s41586-026-10644-y)
already does multi-agent generate/debate/evolve/rank hypothesis
discovery with WET-LAB-VALIDATED novel drug-repurposing and
liver-fibrosis results — it occupies the agentic-hypothesis-discovery
space today without needing a bespoke paraconsistent quad store. So 'no
one has thought to do this' is plainly false on the discovery side;
what's left unclaimed is only the substrate/persistence-and-curation
architecture, which is a narrower (and harder-to-monetize) claim than
the founder's framing.
What must be true: The vision holds as an ADVANCE
only if ALL of these hold: (1) The bottleneck is the VERIFIER/curator,
and donto's paraconsistent hold-and-anchor design is what makes
verification tractable at scale — i.e., the value must come from holding
millions of speculative edges as durable, evidence-anchored, re-curable
state (so verification improves monotonically as evidence arrives)
rather than from the lens generation itself. If the generator is the
product, it's just AI co-scientist with extra symbolic plumbing. (2)
Many DEEP lenses must beat one frontier LLM asked directly — there must
be demonstrable lift from explicit decomposition
(recall/precision/novelty) over simply prompting a top model for
cross-domain connections, otherwise the bitter lesson wins. This is
testable and currently UNTESTED in the codebase (the 6-lens sweep today
extracts facts WITHIN an entity; it does not yet generate cross-ENTITY
relationship hypotheses — the founder's actual payoff is not yet built).
(3) A cheap, high-precision ranking/certification layer must exist (the
Lean-4 overlay + argument edges + evidence anchoring) that filters the
combinatorial flood to the rare valuable few WITHOUT human review of
every candidate — the unsolved 30-year LBD problem. (4) There must be a
domain where ground-truth verification is cheap and the payoff of a
single true cross-lens link is high (drug repurposing, genealogy
bridging documents, materials), so the spurious-hypothesis flood is
survivable. (5) 'Novelty' must be operationalized as 'novel AND survives
evidence-anchored verification,' never raw count of generated edges. If
instead the pitch stays 'a million facts / a million connections,' the
claim is REFUTED by the bitter lesson and the LBD precision literature
simultaneously.
Literature-Based Discovery (LBD) is the single most direct
intellectual ancestor of the founder's "find connections nobody made"
vision, and it is far more developed than most people realize. Its
founding insight, Don R. Swanson's 1986 concept of "undiscovered public
knowledge," is precisely the founder's premise: knowledge that is
logically derivable from the union of two existing bodies of literature,
but that no single human ever assembled because no one read both
literatures. Swanson formalized this as the ABC syllogism: if literature
reports A→B (e.g., Raynaud's disease involves elevated blood
viscosity/platelet aggregation) and a separate, non-co-citing
literature reports B→C (fish oil/eicosapentaenoic acid reduces blood
viscosity), then a plausible, untested A→C link (fish oil treats
Raynaud's) exists in the "complementary but disjoint" literatures. He
published the fish-oil/Raynaud's hypothesis in 1986 and it was
clinically confirmed by a 1989 trial (DiGiacomo); his 1988 "Migraine and
magnesium: eleven neglected connections" produced 11 indirect links
supporting magnesium-deficiency→migraine, later clinically supported.
These are existence-proofs that the method generates real, non-obvious,
testable discoveries from text alone.
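The ABC syllogism described above can be sketched as a two-hop search over term associations. This is a toy illustration under stated assumptions: associations are treated as undirected pairs, the sample data is a hand-written stand-in for the Raynaud's/fish-oil example rather than real literature counts, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def abc_open_discovery(
    a_term: str,
    associations: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Map each candidate C term to the B terms that bridge it to A.

    A candidate C is any term reachable from A through some B that is
    not itself directly associated with A.
    """
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for left, right in associations:
        # Treat each reported association as undirected.
        neighbours[left].add(right)
        neighbours[right].add(left)

    direct = neighbours.get(a_term, set())
    candidates: Dict[str, List[str]] = defaultdict(list)
    for b_term in sorted(direct):
        for c_term in sorted(neighbours.get(b_term, set())):
            if c_term != a_term and c_term not in direct:
                candidates[c_term].append(b_term)
    return dict(candidates)


# Toy data standing in for two disjoint literatures.
toy_associations = [
    ("raynaud", "blood viscosity"),
    ("raynaud", "platelet aggregation"),
    ("raynaud", "vasoconstriction"),
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
]
bridges = abc_open_discovery("raynaud", toy_associations)
```

Here `bridges` maps "fish oil" to the two intermediate terms that connect it to "raynaud". Returning the bridging B terms, rather than only the C terms, keeps the explanation path that later ranking and human review depend on.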
The field splits discovery into two modes that map cleanly onto the
founder's two use-cases. Open discovery ("serendipity
mode"): start from A, find all B intermediates, rank all candidate C's —
a fan-out search for unexpected endpoints. Closed
discovery ("verification mode"): given a fixed A and C (a
hypothesis you already suspect), find and rank the B-paths that would
explain/support it. donto's "generate many speculative relationships
then verify the valuable few" is exactly open-then-closed discovery. The
hard engineering problem LBD has wrestled with for 40 years is
ranking: open discovery generates a combinatorially
overwhelming candidate list, so the entire literature is essentially a
competition over scoring functions for "which of these millions of
latent links is worth a human's attention." Classic systems (Swanson
& Smalheiser's Arrowsmith, Hristovski's BITOLA, Petric's RaJoLink,
Weeber's concept-based DAD-system) rank B/C candidates by frequency,
tf-idf, and co-occurrence association measures; LION LBD
(Pyysalo/Cambridge, 2019) added a rich menu — Jaccard, normalized PMI,
symmetric conditional probability, chi-squared, log-likelihood — over a
graph of ~27M PubMed abstracts with NER-grounded entities. The recurring
lesson, brutally relevant to donto: even with good ranking, the true
target often sits at rank 56–299 (closed) or rank 15–120,000 (open) in
LION's own evaluation — i.e., the signal is real but buried, and
precision of ranking is the entire game.
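Two of the association measures named above, Jaccard and normalized PMI, can be computed directly from document counts. This is a minimal sketch of the textbook definitions, not LION LBD's implementation; the function names and the choice to return -1 for terms that never co-occur are conventions assumed here.

```python
import math


def jaccard(count_x: int, count_y: int, count_xy: int) -> float:
    """Jaccard index from document counts: |X and Y| / |X or Y|."""
    union = count_x + count_y - count_xy
    return count_xy / union if union else 0.0


def npmi(count_x: int, count_y: int, count_xy: int, total: int) -> float:
    """Normalized pointwise mutual information, in the range [-1, 1]."""
    if count_xy == 0:
        return -1.0  # convention: terms that never co-occur
    p_x = count_x / total
    p_y = count_y / total
    p_xy = count_xy / total
    if p_xy == 1.0:
        return 1.0  # both terms in every document; avoids dividing by log(1)
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / (-math.log(p_xy))
```

Normalized PMI is near 0 when two terms co-occur about as often as chance predicts and approaches 1 when they almost always appear together, which is why it is a common choice for ranking candidate B and C terms.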
The modern shift (roughly 2018→present) moved LBD from explicit
co-occurrence to learned representations. SemMedDB/Semantic MEDLINE
(Kilicoglu, Rindflesch, NLM) replaced raw co-occurrence with ~130M
typed semantic predications (subject-predicate-object triples
like "Drug-X TREATS Disease-Y") extracted by the SemRep parser, enabling
discovery over a typed knowledge graph rather than bag-of-terms — a
direct precursor to donto's quad/predicate structure (note: NLM
deprecated SemMedDB on 31 Dec 2024). Then knowledge-graph-embedding
methods (TransE, RDF2Vec, ComplEx link prediction) and contextual
embeddings (BioBERT-based, temporal-difference embeddings) reframed open
discovery as link prediction on a literature KG — and crucially
adopted time-sliced evaluation: train on literature
before year Y, test whether the model predicts links that were actually
published after Y. This is the field's hard-won, honest evaluation
protocol and donto should adopt it directly. Embedding methods improved
recall of plausible links but lost interpretability (you get a score,
not a B-path), spawning a tension between ranked-list quality and
explainability that remains unresolved.
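The time-sliced evaluation protocol described above can be sketched as follows. This is an illustrative harness under stated assumptions: links are simple (head, tail) pairs stamped with the year they first appeared, the predictor is any callable that returns a ranked list given the training links, and the names are hypothetical. Precision is computed over the predictions actually returned, up to k.

```python
from typing import Callable, Iterable, List, Set, Tuple

Link = Tuple[str, str]
DatedLink = Tuple[str, str, int]  # (head, tail, year first published)


def time_sliced_precision_at_k(
    dated_links: Iterable[DatedLink],
    cutoff_year: int,
    predict: Callable[[Set[Link]], List[Link]],
    k: int,
) -> float:
    """Train on links published before cutoff_year, then score the top-k
    novel predictions against links first published in or after it."""
    dated = list(dated_links)
    train = {(h, t) for h, t, year in dated if year < cutoff_year}
    future = {(h, t) for h, t, year in dated if year >= cutoff_year} - train
    # Only links unknown at training time count as predictions.
    ranked = [link for link in predict(train) if link not in train][:k]
    if not ranked:
        return 0.0
    hits = sum(1 for link in ranked if link in future)
    return hits / len(ranked)
```

The key property is that the predictor never sees anything dated at or after the cutoff, so a hit means the link was genuinely anticipated rather than memorized.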
The frontier (2024-2026) is finally agentic and partially
multi-perspective — which is where the founder's vision overlaps
most and is least uniquely novel. Markus Buehler's SciAgents (MIT,
2024/2025, Advanced Materials) samples paths through a large
ontological knowledge graph (built from ~1,000 papers, 33K
nodes/49K edges) and runs a multi-agent team — an Ontologist that
defines the concepts on the path, Scientist agents that draft a
hypothesis spanning the path, and a Critic that adversarially reviews —
to surface cross-domain connections (e.g., silk ↔ energy-intensive
materials) that classical ABC could never reach because the link is a
multi-hop, multi-domain narrative rather than a single B-term. Google
DeepMind's AI Co-Scientist (Nature, 2025/2026) runs
Generation/Reflection/Ranking/Evolution/Proximity/Meta-review agents
with an Elo tournament of "scientific debates" to rank competing
hypotheses, grounded in literature + ChEMBL/UniProt, and produced
experimentally validated results (a liver-fibrosis
drug-repurposing candidate that blocked ~91% of a scarring response at
Stanford; an antimicrobial-resistance mechanism that matched years of
unpublished lab work at Imperial). These systems have already
operationalized "agents break a problem down, propose cross-domain
links, then critique/rank them" at impressive quality. What they have
NOT done is (a) decompose entities through the full spectrum of
human analytical lenses (they are scientifically/biomedically
scoped — causal/mechanistic, not
philosophical/aesthetic/semiotic/teleological), (b) treat identity as a
query-time hypothesis, or (c) persist the millions of
rejected/speculative/contradictory links as durable, evidence-anchored
legal state. They generate, rank, surface the top few, and discard the
rest. That discard is donto's white space.
Foundational works:
Undiscovered Public Knowledge — Don R. Swanson
(1986): Knowledge can be logically implied by the union of two existing
literatures yet remain unknown because no human read both. The founding
premise of 'connections nobody made' — and almost verbatim the founder's
thesis. Published in Library Quarterly; the related 'A ten-year update'
(1996) extended it. https://www.journals.uchicago.edu/doi/10.1086/601720
Fish Oil, Raynaud's Syndrome, and Undiscovered Public
Knowledge (the ABC model) — Don R. Swanson (1986): The
canonical existence-proof and the ABC syllogism: A→B and B→C in
disjoint, non-co-citing literatures imply a testable A→C. Hypothesis
(fish oil treats Raynaud's) was clinically confirmed in 1989. This is
the precise mechanism donto's cross-lens link discovery generalizes. https://muse.jhu.edu/article/403510/summary
Migraine and Magnesium: Eleven Neglected
Connections — Don R. Swanson (1988): Second validated
demonstration — 11 independent indirect links all supporting
magnesium-deficiency→migraine, later clinically supported. Shows the
method finds convergent multi-path evidence, not just single
bridges — relevant to donto scoring a link by how many lenses
independently surface it. http://abel.lis.illinois.edu/tutorial/swanson_pbm_1988.pdf
Arrowsmith System + open/closed discovery — Don R.
Swanson & Neil R. Smalheiser (1997-1999): First operational LBD
tool: two PubMed searches define literatures A and C; system computes a
ranked 'B-list' of shared intermediate terms filtered by semantic type.
Established the open-discovery (A→?) vs closed-discovery (A→?→C)
distinction that maps onto donto's generate-vs-verify split. https://en.wikipedia.org/wiki/Arrowsmith_System
The Place of Literature-Based Discovery in Contemporary
Scientific Practice — Neil R. Smalheiser (2008/2012): The
field's honest self-assessment: ABC finds only a narrow slice of
possible discoveries (single-bridge links), generates overwhelming
candidate volumes, and includes 'gap analysis' (problems nobody is
studying because they fall between disciplines) — directly naming the
combinatorics and inter-disciplinary-gap problems donto inherits. http://abel.lis.illinois.edu/tutorial/smalheiser_LBD_preprint_2008.pdf
Literature-based knowledge discovery: the state of the art
(survey) — Sebastian, Siew, Orimaye et al. (2012):
Comprehensive map of ranking methods (linking-term measures, tf-idf,
association measures) and systems (Arrowsmith, BITOLA, RaJoLink). Names
the core limitations donto must solve: knowledge bottlenecks,
terminology inconsistency, ranking reliability, scaling cost, and
experimental validation difficulty. https://arxiv.org/pdf/1203.3611
SemMedDB / Semantic MEDLINE (typed semantic
predications) — Halil Kilicoglu, Thomas Rindflesch et al. (NLM)
(2012): Moved LBD from co-occurrence to ~130M typed
subject-predicate-object predications (e.g., TREATS, CAUSES) over 37M
citations, enabling discovery over a typed knowledge graph — the closest
classical precursor to donto's predicate/quad substrate. (Deprecated by
NLM 31 Dec 2024.) https://academic.oup.com/bioinformatics/article/28/23/3158/195282
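The ABC syllogism and the open/closed split from the entries above can be sketched over a plain co-occurrence index. This is a toy illustration under stated assumptions: the term pairs are hand-written and only loosely modeled on the fish-oil example, `hypothetical_term` is a placeholder, and the function names are invented here rather than taken from Arrowsmith or any LBD tool.

```python
from collections import defaultdict


def build_index(pairs):
    """Map each term to the set of terms it co-occurs with (undirected)."""
    index = defaultdict(set)
    for x, y in pairs:
        index[x].add(y)
        index[y].add(x)
    return index


def open_discovery(index, a):
    """A -> ?: rank C terms not yet linked to A by their number of B bridges."""
    direct = index[a]
    bridges = defaultdict(set)
    for b in direct:
        for c in index[b]:
            if c != a and c not in direct:
                bridges[c].add(b)
    return sorted(bridges.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def closed_discovery(index, a, c):
    """A -> ? -> C: list the B terms shared by a suspected pair."""
    return sorted(index[a] & index[c])


# Toy co-occurrence pairs (assumed, not mined from any corpus).
index = build_index([
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("blood viscosity", "raynaud's"),
    ("platelet aggregation", "raynaud's"),
    ("blood viscosity", "hypothetical_term"),
])

ranked = open_discovery(index, "fish oil")
shared = closed_discovery(index, "fish oil", "raynaud's")
```

Ranking by the count of distinct bridges is the simplest possible scorer; it already shows why a target reached through two B terms outranks one reached through a single bridge.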
Modern AI systems:
LION LBD — Interactive neural/co-occurrence LBD
system for cancer biology over ~27M PubMed abstracts; NER-grounded
entities to 6 ontologies; ranks candidate A→B→C links with a menu of
association measures (Jaccard, normalized PMI, SCP, chi-squared,
log-likelihood); supports both open and closed discovery with drill-down
to source sentences. Adds a CNN 'hallmarks of cancer' classifier.
[Replicated 5 of Swanson's classic discoveries and 5 modern cancer
A-B-C chains; but honestly reports targets at rank 56-299 (closed) and
rank 15-120,000 (open); manual review judged ~44% (closed) / ~34% (open)
of top candidates potentially valid. Open-source at
lbd.lionproject.net.] https://pmc.ncbi.nlm.nih.gov/articles/PMC6499247/
Neural open/closed LBD + KG link-prediction for drug
repurposing — Reframes LBD as link prediction on a
literature/biomedical knowledge graph using KG embeddings (TransE,
RDF2Vec, ComplEx) and BioBERT-based contextual embeddings; introduces
time-sliced evaluation (train pre-year-Y, test post-Y published links).
Applied to COVID-19 and Alzheimer's drug repurposing on SemMedDB-derived
graphs. [Embedding methods improve recall of plausible links but
sacrifice the interpretable B-path; time-slicing produced ranked drug
candidates with subsequent literature/clinical-trial support. The
field's standard honest evaluation protocol now.] https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0232891
SciAgents — MIT (Ghafarollahi & Buehler).
Samples PATHS through a large ontological knowledge graph (~33K
nodes/49K edges from ~1,000 papers) and runs a multi-agent team —
Ontologist (defines concepts on the path), Scientist agents (draft +
refine a hypothesis spanning the path), Critic (adversarial review) — to
surface cross-domain connections classical ABC cannot reach (multi-hop,
multi-domain narratives). Closest existing analog to the founder's
agentic many-step decomposition. [Published in Advanced Materials
(2025); generated novel bio-inspired materials hypotheses (e.g.,
silk-mycelium, dandelion-pigment composites) rated more novel/feasible
than baselines; open-source (lamm-mit/SciAgentsDiscovery). Validation is
mostly in-silico/LLM-judged, not yet wet-lab.] https://arxiv.org/abs/2409.05556
Google DeepMind AI Co-Scientist — Multi-agent
(Generation, Reflection, Ranking, Evolution, Proximity, Meta-review +
Supervisor) hypothesis engine; ranks competing hypotheses via an Elo
tournament of simulated 'scientific debates' (AlphaGo-style self-play);
grounds in literature + ChEMBL/UniProt/AlphaFold; scales test-time
compute on verification. [Nature paper (2025/2026). Experimentally
validated: a liver-fibrosis drug-repurposing candidate blocked ~91% of a
scarring response at Stanford; reproduced in days an
antimicrobial-resistance gene-transfer mechanism that took an Imperial
College lab years (unpublished); Calico confirmed an
integrated-stress-response hypothesis. Strongest evidence to date that
agentic LBD yields real discoveries.] https://deepmind.google/blog/co-scientist-a-multi-agent-ai-partner-to-accelerate-research/
SKiM (Serial KinderMiner) + reproducible-pipelines
LBD — Generalized, domain-agnostic LBD system (Cowell/Blewitt)
extending ABC beyond biomedicine with simple, scalable ranking; part of
a 2025 push ('Make LBD Great Again through Reproducible Pipelines') to
fix the field's reproducibility/gold-standard/time-slicing crisis and
standardize evaluation. [SKiM recovers Swanson's discoveries and
scales to PubMed; the reproducibility movement is a reaction to LBD
results that don't replicate across implementations — a direct warning
for donto's evaluation design.] https://arxiv.org/pdf/2502.16450
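Two of the association measures named in the LION LBD entry above, Jaccard and normalized PMI, can be written directly from document counts. The formulas below are the textbook definitions, not LION's code; counting at document level and the handling of the degenerate cases are assumptions of this sketch.

```python
import math


def jaccard(n_x: int, n_y: int, n_xy: int) -> float:
    """Co-occurrence overlap: docs with both terms / docs with either term."""
    union = n_x + n_y - n_xy
    return n_xy / union if union else 0.0


def npmi(n_x: int, n_y: int, n_xy: int, n_docs: int) -> float:
    """Normalized pointwise mutual information, in [-1, 1]."""
    if n_xy == 0:
        return -1.0  # conventional limit when the terms never co-occur
    if n_xy == n_docs:
        return 0.0   # undefined (0/0) when both terms are in every doc; 0 assumed
    p_x, p_y, p_xy = n_x / n_docs, n_y / n_docs, n_xy / n_docs
    return math.log(p_xy / (p_x * p_y)) / -math.log(p_xy)


# Illustrative counts (assumed): term X in 10 docs, Y in 20, both in 5.
overlap = jaccard(10, 20, 5)
independent = npmi(50, 50, 25, 100)   # co-occur exactly as chance predicts
inseparable = npmi(10, 10, 10, 100)   # always appear together
```

Both measures are symmetric and undirected, so they rank a pair without saying anything about which way a relationship runs.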
Relevance to the lens engine: BORROW: (1) The
open/closed discovery distinction is donto's exact two-mode architecture
— open discovery = generate speculative cross-lens links; closed
discovery = the verification/curation pass on a suspected link. Name and
build both explicitly. (2) Time-sliced evaluation is the field's
hard-won, non-gameable validation protocol: train donto on the corpus
pre-year-Y, measure whether its top-ranked machine-proposed links were
later asserted/published. This is how you prove the engine works without
manual labeling, and it is essentially the only honest LBD metric. (3)
Convergent multi-path scoring (Swanson's 'eleven connections', LION's
path-accumulation functions): a cross-lens link is far more credible
when MULTIPLE independent lenses surface it — donto should rank a
hypothesis-link by how many of its lenses converge on it, not by any
single lens's confidence. (4) Typed predications over bag-of-words
(SemMedDB lesson) — donto's quad structure already has this advantage;
preserve typed argument edges. (5) Drill-down to the source byte: LION's
and Arrowsmith's usability came from letting a human see the originating
sentence — donto's evidence-anchoring is the same affordance and is
essential for trust. AVOID: (1) The ranking-precision trap — LION
honestly shows true links buried at rank 15-120,000 in open mode; raw
fan-out without a strong scorer produces an unusable haystack. donto's
many-lens combinatorics will be far worse than ABC's single-B-term
blowup, so ranking/pruning is THE make-or-break problem, not generation.
(2) Co-occurrence-only signal generates spurious links; prefer
typed/argumentative edges and require explanatory paths. (3) The
reproducibility crisis — pin corpora, seeds, and evaluation splits from
day one. (4) Don't discard the rejected candidates the way
SciAgents/Co-Scientist do — that persistence IS donto's differentiator
(see white space).
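Borrow item (3), convergent multi-path scoring, can be sketched as ranking a link by how many distinct lenses propose it before looking at any confidence value. This is a minimal sketch under assumptions: the tuple layout, the lens names, and the tie-breaking by mean confidence are invented for illustration and are not donto's scorer.

```python
from collections import defaultdict


def rank_by_convergence(proposals):
    """Rank links by number of distinct proposing lenses, then mean confidence.

    proposals: iterable of (lens, source, target, confidence) tuples.
    Returns a list of (link, lens_count, mean_confidence), best first.
    """
    lenses = defaultdict(set)
    confidences = defaultdict(list)
    for lens, source, target, confidence in proposals:
        link = (source, target)
        lenses[link].add(lens)
        confidences[link].append(confidence)
    scored = [
        (link, len(lenses[link]), sum(confidences[link]) / len(confidences[link]))
        for link in lenses
    ]
    return sorted(scored, key=lambda row: (-row[1], -row[2], row[0]))


# Hypothetical proposals: three lenses weakly agree on A-C,
# one lens is very confident about A-D.
proposals = [
    ("causal", "A", "C", 0.4),
    ("temporal", "A", "C", 0.5),
    ("ethical", "A", "C", 0.3),
    ("causal", "A", "D", 0.95),
]
ranked = rank_by_convergence(proposals)
```

The ordering only means something if the lenses are independent; lenses driven by one model with near-identical prompts would inflate the count without adding evidence.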
Already done vs white space: ALREADY DONE (the
founder should NOT assume 'no one has thought to do this'): The core
thesis — that machine-assembled connections across disjoint knowledge
can constitute genuine, validated discovery — is 40 years old and
clinically proven (Swanson 1986/1988). Open/closed discovery,
candidate-link ranking, scaling to ~27-37M documents, typed-predication
knowledge graphs, KG-embedding link prediction with time-sliced
evaluation, and now multi-agent path-sampling + adversarial critique +
tournament ranking (SciAgents, Co-Scientist) are all built and, in the
agentic case, producing wet-lab-validated discoveries in 2025-2026.
'Agents decompose a problem, propose cross-domain links, and
critique/rank them' is effectively the state of the art, not white
space. GENUINE WHITE SPACE for donto: (1) MANY-LENS DECOMPOSITION as the
link-generation substrate. Every LBD system to date is mono-perspective
— biomedical/causal/mechanistic. None decomposes entities through the
full spectrum (philosophical, semiotic, teleological,
aesthetic, phenomenological, ethical, mereological, ecological) and then
mines link candidates at lens INTERSECTIONS. LBD finds A-B-C bridges
within one ontology; donto's bet is that the richest unmade connections
live between incommensurable lenses, which no LBD system
attempts. (2) PERSISTING the rejected/speculative/contradictory
candidates as durable, evidence-anchored, paraconsistent legal state.
Every LBD/agentic system generates millions of candidates, surfaces the
top-k, and throws the rest away. donto's hypothesis_only +
contradiction-frontier + supports/rebuts/undercuts edges let the
discarded 99.9% remain queryable forever and be re-ranked as the corpus
and lenses evolve — a standing 'latent-structure reservoir' no LBD
system maintains. (3) IDENTITY-AS-HYPOTHESIS at query time. LBD assumes
entity grounding/NER is settled before discovery; donto lets the merge
itself be a weighted, lens-dependent hypothesis — which is where many
false LBD links actually come from (spurious co-occurrence of
ambiguously-grounded entities). (4) Lean-4 certification of the rare
valuable link — formal shape/rule verification of a discovered
relationship has no analog in LBD, which validates only
empirically/clinically.
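White-space item (2) amounts to an append-only store in which candidates are re-scored but never dropped. The sketch below is hypothetical throughout: `Candidate`, `HypothesisStore`, and their fields are invented for illustration and do not reflect donto's actual schema; only the `hypothesis_only` status and the supports/rebuts/undercuts edge types come from the text.

```python
from dataclasses import dataclass, field

ARGUMENT_TYPES = {"supports", "rebuts", "undercuts"}


@dataclass
class Candidate:
    subject: str
    predicate: str
    obj: str
    lens: str
    evidence: list = field(default_factory=list)  # anchors into source text
    score: float = 0.0
    status: str = "hypothesis_only"


class HypothesisStore:
    """Append-only: candidates are re-scored, never deleted."""

    def __init__(self):
        self.candidates = []  # list index doubles as candidate id
        self.arguments = []   # (from_id, argument_type, to_id)

    def propose(self, candidate: Candidate) -> int:
        self.candidates.append(candidate)
        return len(self.candidates) - 1

    def argue(self, from_id: int, argument_type: str, to_id: int) -> None:
        if argument_type not in ARGUMENT_TYPES:
            raise ValueError(f"unknown argument type: {argument_type}")
        self.arguments.append((from_id, argument_type, to_id))

    def rerank(self, scorer) -> list:
        """Re-score every held candidate and return ids, best first."""
        for cand in self.candidates:
            cand.score = scorer(cand)
        return sorted(range(len(self.candidates)),
                      key=lambda i: -self.candidates[i].score)


store = HypothesisStore()
a = store.propose(Candidate("x", "causes", "y", "causal", evidence=["doc1#s3"]))
b = store.propose(Candidate("x", "does_not_cause", "y", "temporal"))
store.argue(b, "rebuts", a)  # the contradiction is recorded, not resolved
order = store.rerank(lambda c: len(c.evidence))  # toy scorer: evidence count
```

The design choice worth noticing is the absence of a delete method: a low-ranked or rebutted candidate stays queryable and can rise again when a different scorer is passed to `rerank`.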
Hard problems:
RANKING / combinatorial explosion is the central unsolved problem:
open discovery already produces overwhelming candidate lists in
single-ontology ABC (LION buries true links at rank 15-120,000); donto's
cross-product of many lenses over all entities makes this dramatically
worse — the scorer, not the generator, decides whether the engine is
usable.
EVALUATION without ground truth: there is no gold standard for 'a
connection nobody made' (by definition). Time-slicing (predict
post-year-Y published links) is the only honest metric, but it only
validates connections that eventually got published — it cannot
score genuinely novel never-published links, which are exactly the
target. This is a deep epistemic bind.
SPURIOUS correlation vs causal/explanatory link: co-occurrence and
embedding similarity surface statistically-associated but meaningless
pairs; distinguishing a real latent relationship from coincidence at
scale remains unsolved (LION's own undirected co-occurrence edges are
flagged as a key weakness).
NOISE from extraction and entity grounding: LBD links are only as
good as NER/relation extraction; ambiguous entity resolution
manufactures false bridges. donto's identity-as-hypothesis helps, but it
moves the ambiguity into the ranking rather than removing it.
REPRODUCIBILITY: LBD results notoriously fail to replicate across
implementations (the 2025 'reproducible pipelines' movement exists
because of this); corpus, preprocessing, and split choices dominate
outcomes.
INCOMMENSURABILITY across lenses: assembling a coherent A→C claim
that spans, e.g., a phenomenological and an economic lens has no
established semantics — how do you even type a cross-lens edge,
let alone score its strength? No prior LBD work addresses inter-paradigm
linking.
VALIDATION cost / the last mile: even perfectly ranked hypotheses
require expensive human or wet-lab verification; the rate-limiting step
is curation, and the rare valuable link is hidden among many
plausible-but-wrong ones (Co-Scientist mitigates with tournaments but
still needs a lab).
INTERPRETABILITY vs accuracy tradeoff: embedding/LLM methods rank
better but lose the explicit explanatory path that makes a discovery
trustworthy and actionable; donto must deliver both a score AND a
human-readable lens-path.
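The time-slicing protocol named under EVALUATION above can be sketched as a split by first-assertion year followed by precision and recall at k. This is a minimal sketch under assumptions: the `(a, c, first_year)` record layout, the function names, and the toy years are illustrative, not taken from any LBD benchmark.

```python
def time_slice(links, cutoff_year):
    """Split links by the year they were first asserted.

    links: iterable of (a, c, first_year).
    Returns (train, future) as sets of (a, c) pairs.
    """
    train = {(a, c) for a, c, year in links if year < cutoff_year}
    future = {(a, c) for a, c, year in links if year >= cutoff_year} - train
    return train, future


def precision_recall_at_k(ranked, train, future, k):
    """Score the top-k predictions that were not already known at training time."""
    novel = [pair for pair in ranked if pair not in train][:k]
    hits = sum(1 for pair in novel if pair in future)
    precision = hits / len(novel) if novel else 0.0
    recall = hits / len(future) if future else 0.0
    return precision, recall


# Toy link history (assumed years).
links = [("a", "b", 1980), ("b", "c", 1984), ("a", "c", 1989), ("a", "d", 1995)]
train, future = time_slice(links, cutoff_year=1986)

# Hypothetical engine output, best first; ("x", "y") was never published.
ranked = [("a", "b"), ("a", "c"), ("x", "y"), ("a", "d")]
precision, recall = precision_recall_at_k(ranked, train, future, k=2)
```

In this toy run `("x", "y")` counts as a miss whether or not it is true, which is the epistemic bind stated above: the metric can only reward links that someone later published.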
bisociation-computational-creativity
The founder's intuition — "connections no one thought of, because no
one holds all the lenses at once" — is, almost word for word, Arthur
Koestler's BISOCIATION. In "The Act of Creation" (1964), Koestler argues
every creative act (the comic Haha, the scientific Aha, the artistic Ah)
shares one structure: perceiving a situation or idea simultaneously in
two self-consistent but HABITUALLY INCOMPATIBLE "matrices" / frames of
reference (M1, M2). Ordinary thought is ASSOCIATION — moving within a
single plane/matrix. Creativity is BISOCIATION — the collision/fusion of
two planes that normally never touch. The single richest precedent: the
value is not the facts inside a frame, it is the relation that springs
from intersecting two frames. This is the founder's thesis, stated in
1964.
Koestler's idea was operationalized into a real computational program
by the EU FP7 BISON project (2008-2012), summarized in Michael Berthold
(ed.), "Bisociative Knowledge Discovery: An Introduction to Concept,
Algorithms, Tools, and Applications" (Springer LNCS 7250, 2012; 32
chapters; consortium incl. Berthold/Konstanz, Nada Lavrac & Dunja
Mladenic/Jozef Stefan Institute, Werner Dubitzky, Christian Borgelt).
They formalized bisociation as discovery of bridges between
weakly-connected or disjoint "domains" inside a heterogeneous graph
called a BisoNet (Bisociative Information Network — nodes are
concepts/units from many sources, edges are evidential relations). Three
computational TYPES of bisociation were distinguished: (1) bridging
CONCEPTS (a single term/node co-occurring in two otherwise-unlinked
domains: the classic b-term); (2) bridging by STRUCTURAL SIMILARITY
(two subgraphs in different domains share an isomorphic relational
pattern, i.e. analogy); (3) bridging GRAPHS (a connecting
path/subgraph that links two domains). Crucially they tried to make
"bisociativeness" a RANKABLE score — distinguishing a genuinely
surprising cross-domain link from a trivially common one.
The concrete, working instantiation is CrossBee (Cross-Context
Bisociation Explorer; Jursic, Cestnik, Urbancic, Lavrac, ICCC 2012; http://crossbee.ijs.si). You feed it
two document sets from two domains (e.g. two non-interacting
literatures); it ranks candidate BRIDGING TERMS ("b-terms") by a
BISOCIATION SCORE computed as an ENSEMBLE of text-mining heuristics
(frequency, tf-idf, outlier-ness, appearance in both domains, etc.)
voting together, then offers side-by-side document inspection so a human
EXPERT verifies the link. This is a direct ancestor of donto's
lens-engine: machine over-generates candidate cross-context links; human
(or downstream certifier) disposes. Its intellectual root is older still
— Don Swanson's LITERATURE-BASED DISCOVERY ("Undiscovered Public
Knowledge," 1986): two literatures (Raynaud's disease and fish oil;
migraine and magnesium) never co-cited, but logically linked through a
shared bridging concept B (blood viscosity), yielding a testable A-C
hypothesis later clinically confirmed. Swanson's ABC model (A-B + B-C
therefore maybe A-C) is the canonical computational template for
"relationships no one drew because the two literatures were
isolated."
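The ensemble idea behind a bisociation score can be sketched as several heuristics each voting for their top-ranked candidate b-terms. This is a hedged illustration, not CrossBee's actual heuristic set or voting rule: the three heuristics, the top-third cutoff, the tokenizer, and the toy documents are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def bisociation_votes(domain_a, domain_b, top_fraction=1 / 3):
    """Return (term, votes) for terms present in both domains, best first."""
    docs_a = [tokenize(d) for d in domain_a]
    docs_b = [tokenize(d) for d in domain_b]
    all_docs = docs_a + docs_b
    freq_a = Counter(t for d in docs_a for t in d)
    freq_b = Counter(t for d in docs_b for t in d)
    doc_freq = Counter(t for d in all_docs for t in set(d))
    # A candidate b-term must occur in both domains.
    candidates = sorted(set(freq_a) & set(freq_b))

    def tfidf_sum(term):
        idf = math.log(len(all_docs) / doc_freq[term])
        return sum(d.count(term) for d in all_docs) * idf

    heuristics = [
        lambda t: freq_a[t] + freq_b[t],                                    # raw frequency
        tfidf_sum,                                                          # summed tf-idf
        lambda t: min(freq_a[t], freq_b[t]) / max(freq_a[t], freq_b[t]),    # domain balance
    ]
    cutoff = max(1, math.ceil(len(candidates) * top_fraction))
    votes = Counter({t: 0 for t in candidates})
    for score in heuristics:
        ranked = sorted(candidates, key=lambda t: (-score(t), t))
        for term in ranked[:cutoff]:
            votes[term] += 1  # one vote per heuristic that ranks the term highly
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy two-domain input (hand-written, not real abstracts).
domain_a = ["magnesium deficiency and the vascular tone",
            "the magnesium level and stress"]
domain_b = ["migraine and the vascular tone",
            "migraine attacks and stress and the vascular spasm"]
result = bisociation_votes(domain_a, domain_b)
```

No stopword filtering is applied here, so a word like "the" can still collect votes from the frequency and balance heuristics; that is a small-scale view of why the human inspection step exists.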
Running alongside bisociation is CONCEPTUAL BLENDING / conceptual
integration (Gilles Fauconnier & Mark Turner, "The Way We Think,"
2002). Where Koestler collides two frames, blending integrates them: two
(or more) INPUT mental spaces selectively project into a BLENDED space
via a GENERIC space, and the blend develops EMERGENT structure not
present in either input (their canonical examples: the "Buddhist monk"
riddle and "this surgeon is a butcher"). Blending has been computationally
modeled: Joseph Goguen's algebraic/category-theoretic Unified Concept
Theory; Pereira's DIVAGO (2005, optimality-principle metrics); and the
EU COINVENT project (Schorlemmer, Kutz, Confalonieri, Pease, et al.,
2014-2016) which models blends as AMALGAMS (knowledge-transfer via
colimits in category theory) and applied them to mathematical concept
invention and music. The third lens is Margaret Boden ("The Creative
Mind," 1990; "Creativity and Art," 2010): creativity comes in three
kinds — COMBINATIONAL (novel combinations of familiar ideas —
bisociation/blending live here), EXPLORATORY (finding unvisited points
inside an existing conceptual space / set of rules), and
TRANSFORMATIONAL (changing the rules of the space so
previously-impossible ideas become thinkable). The donto vision spans
all three: many lenses = many conceptual spaces; cross-lens intersection
= combinational; pushing a lens "to the utmost" = exploratory; a lens
that rewrites another's assumptions = transformational. Modern LLM work
has revived all of this: PopBlends (Petridis et al., CHI 2023 —
LLM+knowledge-base conceptual blends for design), LiveIdeaBench
(2024-25, benchmarking LLM divergent thinking on single-keyword
scientific idea generation), Nature/Sci-Reports studies showing LLMs
reach population-average creativity but not top-decile humans, and
multi-agent hypothesis engines (Google's AI Co-Scientist 2025, SciMON
ACL 2024, Sakana AI Scientist) that generate-debate-rank-evolve
cross-literature hypotheses — exactly the "agentic many-lens
generate-then-verify" loop, but without a paraconsistent substrate to
hold the rejected/contradictory candidates.
Foundational works:
The Act of Creation (bisociation) — Arthur Koestler
(1964): Creativity = bisociation: perceiving one idea in two
self-consistent but HABITUALLY INCOMPATIBLE frames/matrices (M1,M2) at
once. The payoff is the collision of frames, not the contents of one
frame. This IS the founder's thesis, stated 60 years ago — the single
most important precedent. https://en.wikipedia.org/wiki/The_Act_of_Creation
Undiscovered Public Knowledge / Raynaud's-fish-oil +
migraine-magnesium (Literature-Based Discovery, ABC model) —
Don R. Swanson (1986 (Raynaud's), 1988 (migraine), Arrowsmith 1990s):
Two non-interacting literatures (A and C) linked through a shared
bridging concept B yield a true, testable, previously-undrawn
relationship A-C. The canonical 'connection no human drew because the
two corpora were isolated' — the computational ancestor of the whole
lens-engine idea, with real clinical confirmation. https://news.uchicago.edu/story/don-r-swanson-information-science-pioneer-1924-2012
The Way We Think: Conceptual Blending and the Mind's Hidden
Complexities (conceptual integration) — Gilles Fauconnier &
Mark Turner (2002 (theory from mid-1990s)): Two input mental spaces
project selectively into a blended space (via a generic space); the
blend has EMERGENT structure absent from both inputs. Intersection of
frames doesn't just connect — it CREATES new relational structure.
Directly models 'a relationship that emerges only at the intersection of
lenses.' https://pages.ucsd.edu/~scoulson/spaces/fauconnier05.pdf
The Creative Mind: Myths and Mechanisms (three kinds of
creativity) — Margaret A. Boden (1990 / 2nd ed. 2004;
Creativity and Art 2010): Combinational (novel combos — where
bisociation/blending sit), Exploratory (new points in an existing
conceptual space), Transformational (changing the space's rules). Gives
the founder a taxonomy: each lens is a conceptual space; cross-lens
links are combinational; pushing a lens to the limit is exploratory; a
lens rewriting another's rules is transformational. https://www.themarginalian.org/2025/08/22/margaret-boden-creativity/
Bisociative Knowledge Discovery: An Introduction to Concept,
Algorithms, Tools, and Applications (BISON project, BisoNets) —
Michael R. Berthold (ed.); Dubitzky, Kötter, Lavrač, Mladenić, Borgelt
et al. (2012 (Springer LNCS 7250; EU FP7 BISON 2008-2012)): The most
direct prior art: an entire research program that computationalized
Koestler. Formalized bisociation as bridging across domains in a BisoNet
(heterogeneous evidence graph), defined three bisociation types
(bridging concept / structural-similarity graph / bridging graph), and
tried to make 'bisociativeness' a rankable score. Read this before
building — it is donto's lens-engine, minus the paraconsistent substrate
and the agents. https://link.springer.com/book/10.1007/978-3-642-31830-6
CrossBee: Cross-Context Bisociation Explorer (+ ensemble
b-term heuristics) — Matjaž Juršič, Bojan Cestnik, Tanja
Urbančič, Nada Lavrač (2012 (ICCC)): A working generate-then-verify
system: machine ranks candidate bridging terms by an ENSEMBLE of
heuristics (a bisociation score), human expert inspects/confirms.
Exactly donto's intended workflow (over-generate speculative cross-lens
links, curate the rare valuable ones) — proves the architecture and
shows where it gets stuck (too many candidates, expert is the
bottleneck). https://computationalcreativity.net/iccc2012/wp-content/uploads/2012/05/226-Jursic.pdf
Computational conceptual blending: Goguen's Unified Concept
Theory, Divago, COINVENT amalgams — Joseph Goguen; Francisco
Câmara Pereira (Divago); Schorlemmer, Kutz, Confalonieri, Pease
(COINVENT) (1999-2006 (Goguen/Divago); 2014-2016 (COINVENT FP7)): Shows
blending CAN be made algorithmic (blends = amalgams / colimits in
category theory; optimality principles as quantitative metrics) and that
the hard part is SELECTING good blends from a combinatorial explosion of
possible ones. The selection/optimality problem is precisely donto's
'verify the rare valuable hypotheses' problem. https://www.iiia.csic.es/~enric/papers/Ch1-CoInvent.pdf
Modern AI systems:
CrossBee + TextFlows bridging-term workflows — Web
tool (crossbee.ijs.si) that takes two document sets, ranks cross-domain
bridging terms via an ensemble heuristic bisociation score, and gives
side-by-side inspection for human verification. Later re-implemented in
the TextFlows platform. [Reproduced Swanson's migraine-magnesium
bridging terms; academic adoption in the cross-domain literature-mining
community; the canonical running prototype of computational bisociation.
Not at web scale, single-pair-of-domains at a time.] http://crossbee.ijs.si/
PopBlends — LLM + knowledge-base pipeline that
auto-suggests conceptual blends (e.g. pop-culture x brand) using
Fauconnier-Turner blending strategies; supports both divergent and
convergent design ideation. [CHI 2023; user study: people found ~2x
more blend suggestions with half the mental demand vs. without.
Demonstrates LLM+KB beats LLM-alone for combinational creativity.] https://savvaspetridis.github.io/papers/popblends.pdf
Google AI Co-Scientist — Multi-agent (Gemini 2.0)
hypothesis-generation system:
Generation/Reflection/Ranking/Evolution/Proximity/Meta-review agents run
self-play scientific debate + ranking tournaments to evolve novel,
literature-grounded hypotheses. [Feb 2025; reported
wet-lab-validated proposals (drug repurposing for AML, novel
antimicrobial-resistance mechanism, liver-fibrosis targets). The closest
existing thing to 'agents generate cross-domain relationships, then
rank/verify' — but hypotheses live in a transient run, not a persistent
contradiction-preserving store.] https://deepmind.google/blog/co-scientist-a-multi-agent-ai-partner-to-accelerate-research/
SciMON / LiveIdeaBench / The AI Scientist (Sakana)
— SciMON (ACL 2024) retrieves 'inspirations' from prior literature and
optimizes ideas explicitly for novelty; LiveIdeaBench (2024-25)
benchmarks LLM divergent thinking from single keywords across 22
domains; Sakana's AI Scientist runs end-to-end
idea->experiment->paper. [SciMON shows novelty-optimized
retrieval beats vanilla generation; LiveIdeaBench (40+ models, 1180
keywords) finds idea-generation poorly predicted by general-intelligence
benchmarks — a key signal that lens-diversity, not model IQ, may drive
discovery.] https://arxiv.org/abs/2412.17596
Spark / serendipitous knowledge-discovery systems & ABC
LBD tools — Modern literature-based-discovery engines (Spark,
Arrowsmith descendants, SemMedDB-driven ABC pipelines, word-embedding
bridging-term detection) that mine bridging concepts between literatures
and rank candidate A-C hypotheses. [Active biomedical-discovery
subfield; main published finding is sobering — over-generation of
candidates and lack of agreed evaluation standards are the binding
constraints, not generation capacity.] https://link.springer.com/article/10.1186/s12859-019-2989-9
LLM divergent-creativity comparative studies —
Controlled studies measuring LLM
novelty/originality/flexibility/diversity vs. humans on
divergent-thinking, problem-solving, and creative-writing tasks.
[2025: LLMs match/exceed average human creativity but top-decile
humans still beat every model; LLM outputs cluster (lower diversity) —
relevant warning that a single agent over many lenses may collapse
toward homogeneous 'connections.'] https://www.nature.com/articles/s41598-025-25157-3
Relevance to the lens engine: BORROW: (1) The
vocabulary and metrics — frame donto's payoff explicitly as bisociation
(Koestler) and conceptual blending (Fauconnier-Turner): a discovered
relationship is valuable in proportion to how HABITUALLY INCOMPATIBLE
its two source lenses/domains are. Steal CrossBee's idea of a rankable
BISOCIATION SCORE computed as an ENSEMBLE of heuristics over a
heterogeneous graph — donto already is that graph (a BisoNet by another
name). (2) Swanson's ABC bridging template is the cleanest first
product: surface A-B and B-C claims sitting in two different ctx:*
contexts/lenses that are never co-cited, propose A-C as a
hypothesis_only edge, anchor B as the bridge with evidence. (3) Boden's
taxonomy gives a roadmap and honest framing: most cross-lens output will
be combinational; treat exploratory (push one lens to its limit) and
transformational (one lens rewrites another's assumptions) as harder,
rarer, higher-value tiers. (4) COINVENT/Divago teach that the bottleneck
is SELECTION/optimality among a combinatorial explosion of blends —
design for ranking and pruning from day one, not generation. AVOID / be
warned: (a) The b-term/blend space explodes combinatorially; CrossBee,
LBD tools, and blending systems ALL hit the same wall — far more
candidate links than any human can review, and no agreed way to tell
signal from noise. Donto's edge must be that its paraconsistent
substrate can HOLD the explosion as legal hypothesis_only state forever
(where prior systems had to discard), with the Lean-4 overlay +
evidence-anchoring + argument edges (supports/rebuts/undercuts) as the
eventual VERIFY/prune mechanism — this is genuinely the missing piece in
every prior system. (b) LLM creativity clusters/homogenizes (Nature
2025); a single agent run over many lenses risks producing samey
'connections.' Force lens-diversity structurally (distinct
prompts/personas/conceptual spaces per lens, as the AI Co-Scientist's
specialized agents do) and measure diversity, not just count. (c) Resist
raw-volume framing — Koestler, Swanson and BISON all insist the win is
the RARE high-bisociativeness link across distant frames, not millions
of intra-frame facts; the founder's refined
'depth-of-decomposition-then-intersection' view is correct and should be
the headline.
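Warning (b) asks for diversity to be measured, not just counted. One simple way to do that, sketched below, is the mean pairwise Jaccard distance between the sets of links each lens proposes. The metric choice, the function names, and the example lenses are assumptions of this sketch; other diversity measures would serve equally well.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 minus overlap ratio; 0.0 means identical sets, 1.0 means disjoint."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def mean_pairwise_diversity(links_by_lens: dict) -> float:
    """Average Jaccard distance over all pairs of lenses."""
    pairs = list(combinations(links_by_lens.values(), 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical lens outputs: each lens proposes a set of (source, target) links.
homogeneous = {
    "causal": {("A", "C"), ("A", "D")},
    "ethical": {("A", "C"), ("A", "D")},
}
mixed = {
    "causal": {("A", "C"), ("A", "D")},
    "ethical": {("A", "D"), ("B", "E")},
}
collapsed_score = mean_pairwise_diversity(homogeneous)
mixed_score = mean_pairwise_diversity(mixed)
```

A score near zero across many lenses would be the homogenization signal: the lenses are distinct in name only and are surfacing the same links.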
Already done vs white space: ALREADY DONE (the
founder must not reinvent these): (1) The core idea — "creativity =
connecting two habitually-incompatible frames" — is Koestler 1964, named
bisociation; it is not new. (2) "Find relationships no human drew
because two corpora/frames are isolated" is Swanson's literature-based
discovery (1986) and was clinically validated. (3) An entire EU research
program (BISON, Berthold ed. 2012) computationalized exactly this:
BisoNets, three formal bisociation types, rankable bisociativeness, and
a working tool (CrossBee) that over-generates cross-domain bridging
links and has a human verify them. (4) "Intersection of frames yields
emergent relational structure" is conceptual blending (Fauconnier-Turner
2002), and it has been made algorithmic (Goguen, Divago, COINVENT
amalgams). (5) "Agents generate-debate-rank-evolve cross-literature
hypotheses" is the 2024-25 AI-co-scientist / SciMON / Sakana wave. So
"no one has thought to do this" is, frankly, false at the level of the
concept and even of single-pair tooling. GENUINE WHITE SPACE (where
donto is actually novel): (a) SCALE + MANY LENSES SIMULTANEOUSLY — every
prior system bisociates TWO domains/literatures at a time chosen by a
human; nobody runs the FULL spectrum of analytical lenses agentically
over ALL entities at once and harvests the combinatorial set of
cross-lens intersections. (b) A PARACONSISTENT, CONTRADICTION-PRESERVING
SUBSTRATE THAT CAN HOLD THE SPECULATION FOREVER — this is the deepest
gap. CrossBee/LBD/COINVENT/AI-Co-Scientist all generate transient
candidates that are discarded if not immediately validated; none has a
legal, queryable, permanent home for unanchored, mutually-contradictory,
hypothesis_only relationship-claims with typed argument edges. (c)
IDENTITY-AS-HYPOTHESIS + EVIDENCE-FIRST + LEAN-CERTIFICATION as the
curation layer — using formal proof to certify the rare valuable shapes
out of the speculative frontier is, as far as the literature shows,
unattempted. (d) Closing the loop: generate (agents) -> hold
(paraconsistent quad store) -> rank (bisociation score) -> certify
(Lean) -> promote, as one persistent system rather than a one-shot
pipeline. The novelty is NOT the lens idea and NOT bisociation; it is
the AGENTIC-MANY-LENS + PERSISTENT-PARACONSISTENT-HOLD + FORMAL-VERIFY
combination at substrate scale.
Hard problems:
COMBINATORIAL EXPLOSION: with L lenses over N entities the
cross-lens link space is enormous (on the order of N^2 x L^2 pairwise
candidate intersections, and more once three-way and higher
intersections count); generation is trivially cheap, so the system is
instantly drowned in candidates. Every prior system (CrossBee, LBD,
COINVENT) hits this wall.
EVALUATION HAS NO GROUND TRUTH: the LBD field's own consensus is
that there are no agreed evaluation standards and results swing with the
dataset/method; a 'good' bisociation is defined by later human/empirical
validation that you don't have at generation time. Distinguishing a
profound cross-lens link from a coincidence is the central unsolved
problem.
SIGNAL-VS-NOISE / TRIVIALITY: most cross-frame links are either
trivially true (a stopword-like bridge term co-occurs everywhere) or
spurious. CrossBee needed an ensemble of heuristics + a human just to
surface the few real ones; ranking bisociativeness reliably is unsolved
at scale.
THE OPTIMALITY/SELECTION PROBLEM (Goguen, COINVENT):
conceptual-blending theory's optimality principles (e.g. 'good form,'
'web,' 'unpacking') are notoriously hard to formalize and compute;
choosing the few good blends from the combinatorial set has no clean
algorithm.
LLM HOMOGENIZATION: a single agent over many lenses tends to produce
clustered, low-diversity outputs (Nature 2025), undercutting the whole
'connections no one thought of' premise unless lens-diversity is
enforced structurally and measured.
DEFINING 'HABITUALLY INCOMPATIBLE' / DOMAIN BOUNDARIES
OPERATIONALLY: bisociativeness depends on the two frames being genuinely
distant; but quantifying frame-distance/domain-membership in a substrate
where everything is in one graph is itself unsolved — too-distant looks
like noise, too-close is mere association.
VERIFICATION COST AND SCOPE: Lean-4 certification can validate
logical shapes/rules but cannot certify EMPIRICAL truth of a proposed
real-world relationship; bridging the gap between 'formally well-formed
hypothesis' and 'true discovery' still requires external
evidence/experiment, which doesn't scale to millions of candidates.
PROVENANCE & PARACONSISTENT BLOAT: holding every speculative,
contradictory relationship-claim forever risks an unmanageable frontier;
deciding what to garbage-collect / down-weight vs. preserve as legal
state, without losing the rare gem, is an open governance problem.
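The scale behind the combinatorial-explosion item can be made concrete with a few lines of arithmetic. This is a rough sketch: it treats each (entity, lens) combination as one "view" and counts unordered pairs of views; the figures of 1,000 entities and 12 lenses are illustrative assumptions, not donto measurements.

```python
from math import comb


def view_pairs(n_entities: int, n_lenses: int) -> int:
    """Unordered pairs of distinct (entity, lens) views."""
    return comb(n_entities * n_lenses, 2)


def cross_entity_cross_lens_pairs(n_entities: int, n_lenses: int) -> int:
    """Pairs of views that differ in BOTH entity and lens."""
    # Pick an unordered entity pair, then an ordered pair of distinct lenses.
    return comb(n_entities, 2) * n_lenses * (n_lenses - 1)


all_pairs = view_pairs(1000, 12)                       # 71,994,000
strict_pairs = cross_entity_cross_lens_pairs(1000, 12)  # 65,934,000
```

Even at this toy size the pairwise space is in the tens of millions, and it grows quadratically in both the number of entities and the number of lenses.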
analogy-structure-mapping
Analogical reasoning is the most directly relevant intellectual
tradition to the lens-engine vision, because a "lens comparison across
entities" IS structurally an analogy: it asks whether the system of
relations holding among one entity's parts also holds among
another's, independent of surface features. The field's canonical theory
is Dedre Gentner's Structure-Mapping Theory (1983): an analogy maps
relational structure from a base domain to a target,
and the quality of a mapping is governed by the systematicity
principle — people (and good algorithms) prefer to carry over
deep, interconnected systems of higher-order relations (causal,
mathematical) rather than isolated attributes or surface features. This
was operationalized in the Structure-Mapping Engine
(SME) (Falkenhainer, Forbus & Gentner, 1986/1989), a
local-to-global structural alignment algorithm that, given two
predicate-calculus representations, returns correspondences, a
structural-evaluation score, and candidate inferences — new
claims about the target imported from the base. The candidate-inference
output is exactly the "relationship no one thought to draw" the founder
wants: SME doesn't just match, it generates novel hypotheses by
projecting unmatched base structure onto the target.
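The candidate-inference idea can be illustrated with a toy mapper. This is a sketch only: it brute-forces entity correspondences and scores a mapping by the count of matched relations, which is not SME's local-to-global algorithm or its structural-evaluation score. The function names and the solar-system/atom facts are illustrative assumptions.

```python
from itertools import permutations


def entities(facts):
    """All argument names appearing in (predicate, arg1, arg2) facts."""
    return sorted({a for _, *args in facts for a in args})


def best_mapping(base, target):
    """Try every injective base->target entity mapping; keep the one that
    carries the most base relations onto existing target relations.
    Assumes the target has at least as many entities as the base."""
    b_ents, t_ents = entities(base), entities(target)
    target_set = set(target)
    best, best_score = {}, -1
    for perm in permutations(t_ents, len(b_ents)):
        m = dict(zip(b_ents, perm))
        score = sum((p, m[x], m[y]) in target_set for p, x, y in base)
        if score > best_score:
            best, best_score = m, score
    return best, best_score


def candidate_inferences(base, target, mapping):
    """Base facts projected through the mapping that the target lacks."""
    target_set = set(target)
    projected = [(p, mapping[x], mapping[y]) for p, x, y in base]
    return [f for f in projected if f not in target_set]


base = [("attracts", "sun", "planet"),
        ("more_massive", "sun", "planet"),
        ("revolves_around", "planet", "sun")]
target = [("attracts", "nucleus", "electron"),
          ("more_massive", "nucleus", "electron")]

mapping, score = best_mapping(base, target)
inferences = candidate_inferences(base, target, mapping)
```

Here the unmatched base fact is projected onto the target as a new claim about the electron and nucleus, which is the step the paragraph above identifies as the hypothesis generator.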
The second great lineage is Douglas Hofstadter's Fluid Analogies
Research Group (FARG) and its books Fluid Concepts and Creative
Analogies (1995) and Surfaces and Essences (Hofstadter
& Sander, 2013). Hofstadter's radical claim — "analogy is the core,
the fuel and fire, of all thinking" — reframes categorization,
perception, and concept-formation themselves as analogy-making. The
computational model, Copycat (Mitchell &
Hofstadter), differs philosophically from SME: rather than receiving
fixed representations and aligning them, Copycat builds its
representations fluidly via a Slipnet (a conceptual
network whose link-lengths/"conceptual slippage" change dynamically), a
Workspace (a blackboard of perceptual structures), a
Coderack of stochastic codelets
(parallel micro-agents that compete/cooperate), and a global
temperature that anneals the search and serves as a
quality proxy. The deep lesson for donto: representation is not
given, it is constructed under pressure, and the same situation
supports many rival construals — which maps cleanly onto donto's
"identity-is-a-hypothesis" and paraconsistent-frontier stance.
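The role of temperature in a stochastic codelet scheduler can be sketched with a temperature-scaled softmax over codelet urgencies. This is an assumption-laden illustration: Copycat's actual urgency-to-probability formula differs, and `choice_probabilities` is a hypothetical helper, not part of any Copycat implementation.

```python
import math


def choice_probabilities(urgencies, temperature):
    """Softmax over urgencies. High temperature flattens the distribution
    (exploration); low temperature concentrates it on the most urgent
    codelet (commitment)."""
    scaled = [u / temperature for u in urgencies]
    peak = max(scaled)                      # subtract max for stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


hot = choice_probabilities([1.0, 2.0, 4.0], temperature=100.0)
cold = choice_probabilities([1.0, 2.0, 4.0], temperature=0.1)
```

At high temperature the three codelets are chosen almost uniformly; at low temperature nearly all probability goes to the most urgent one, which is the annealing behaviour the paragraph describes.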
The third tradition is Fauconnier & Turner's Conceptual
Blending / Conceptual Integration (1990s–2002, The Way We
Think). Where structure-mapping is asymmetric (base→target),
blending is many-to-one: two-or-more input mental spaces, a generic
space of shared structure, and a blend that selectively
projects from each input and crucially generates emergent
structure (via composition, completion, elaboration) present in
neither input. This is the theoretical name for "relationships
that emerge at the intersection of lenses" — the payoff the founder
describes is essentially emergent structure in a blend. Computational
blending (Goguen's algebraic/category-theory amalgams, the COINVENT
project, Divago) exists but is brittle and hard to evaluate.
The scale story arrived with Dafna Shahaf, Aniket Kittur, Joel Chan
and Tom Hope: "Accelerating Innovation Through Analogy
Mining" (KDD 2017, Best Paper) learned purpose and
mechanism vector representations from product descriptions
(crowdsourcing + RNNs) so that analogies could be mined from messy
real-world repositories (the patent/idea corpus) — finding products with
the same purpose but different mechanism, or vice versa. SOLVENT
/ the Analogy Search Engine (Chan, Hope et al., 2018) extended
this to scientific papers, annotating
background/purpose/mechanism/findings and embedding them so cross-domain
research analogies surface that pure IR misses. This is the closest
existing relative of donto's vision at the document level — but it uses
one coarse facet schema (purpose×mechanism), not a full
spectrum of philosophical/linguistic/temporal/ethical/etc. lenses, and
it does not hold contradictory mappings as durable state.
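The "same purpose, different mechanism" query can be sketched once each item has separate purpose and mechanism vectors. This is a minimal illustration with hand-made two-dimensional vectors; the scoring rule (purpose similarity minus mechanism similarity) and all names are assumptions for this sketch, not the method of the cited papers.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def same_purpose_different_mechanism(query, corpus, top_k=3):
    """Rank items that share the query's purpose but not its mechanism."""
    scored = []
    for name, item in corpus.items():
        purpose_sim = cosine(query["purpose"], item["purpose"])
        mechanism_sim = cosine(query["mechanism"], item["mechanism"])
        scored.append((purpose_sim - mechanism_sim, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


query = {"purpose": [1.0, 0.0], "mechanism": [1.0, 0.0]}
corpus = {
    "same_idea": {"purpose": [1.0, 0.0], "mechanism": [1.0, 0.0]},
    "far_analogy": {"purpose": [1.0, 0.0], "mechanism": [0.0, 1.0]},
    "unrelated": {"purpose": [0.0, 1.0], "mechanism": [0.0, 1.0]},
}
ranked = same_purpose_different_mechanism(query, corpus)
```

A single holistic embedding would rank `same_idea` first; separating the facets is what lets the distant analogy surface. Generalizing from two facets to N lenses would mean one vector per lens and a query that names which lenses should agree and which should differ.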
The 2023–2026 LLM wave reopened everything. Webb, Holyoak & Lu
(Nature Human Behaviour 2023) reported "emergent analogical
reasoning" in GPT-3/4 (Raven's-style matrices, letter strings, story
analogies) at or above human level zero-shot. This was sharply
contested: Lewis & Mitchell (2024) and Hodel & West showed
performance collapses on counterfactual
variants (permuted/synthetic alphabets) where humans stay robust —
evidence the apparent reasoning leans on training-data similarity, not
domain-general structure mapping. Webb et al. (2024) replied that with
code-execution/tool augmentation the capacity generalizes. Newer work
splits the difference and is most useful to donto: hybrid systems like
YARN (Khojasteh et al., 2026) explicitly
re-fuse Gentner-style structural mapping with LLM-derived
multi-level abstractions, finding that pure LLM prompting fails on "far"
(low-surface-similarity) analogies and pure SME scores below random, but
LLM-abstraction-then-structural-align beats both — a direct template for
donto. "Parallelograms Strike Back" (2026) even argues LLMs now generate
better analogies than people in some settings. Net: LLMs are excellent
at proposing candidate cross-domain relations and at
abstracting messy text into mappable structure, but unreliable
at certifying whether a mapping is structurally valid and not
surface-pattern-matching — precisely the gap donto's evidence-anchoring
+ Lean-4 certification + paraconsistent hold-without-collapse could
fill.
Foundational works:
Structure-Mapping Theory (SMT) — Dedre Gentner
(1983): An analogy maps a SYSTEM of relations from base to target,
ignoring surface/object features; the 'systematicity principle' says
deep interconnected higher-order (causal/mathematical) structure is
preferred over isolated attributes. This is the formal definition of
what 'comparing two entities through a lens' actually is. https://groups.psych.northwestern.edu/gentner/papers/Gentner83.pdf
The Structure-Mapping Engine (SME) — Brian
Falkenhainer, Kenneth Forbus, Dedre Gentner (1986 / 1989): A
local-to-global structural alignment algorithm that returns not just
correspondences and a match score but CANDIDATE INFERENCES — novel
claims projected from base onto target. The candidate-inference step is
the literal engine for 'a relationship no one thought to draw.' https://groups.psych.northwestern.edu/gentner/papers/FalkenhainerForbusGentner89.pdf
Copycat / Fluid Concepts and Creative Analogies
(FARG) — Douglas Hofstadter & Melanie Mitchell (1995):
Analogy as 'high-level perception': representations are CONSTRUCTED
fluidly (Slipnet + Workspace + stochastic codelets + temperature), not
given. The same situation supports rival construals chosen under
pressure — the cognitive-science analogue of donto's
identity-as-hypothesis and contradiction frontier. https://en.wikipedia.org/wiki/Fluid_Concepts_and_Creative_Analogies
Surfaces and Essences: Analogy as the Fuel and Fire of
Thinking — Douglas Hofstadter & Emmanuel Sander (2013): The
strong thesis that ALL thinking — categorization, concept formation,
perception — is analogy-making. Justifies treating every 'lens' as an
analogy-generating apparatus rather than a mere feature extractor. https://www.basicbooks.com/titles/douglas-hofstadter/surfaces-and-essences/9780465018475/
Conceptual Blending / Conceptual Integration (The Way We
Think) — Gilles Fauconnier & Mark Turner (1998 / 2002):
Two+ input mental spaces + a generic space project selectively into a
BLEND that contains EMERGENT structure present in neither input (via
composition, completion, elaboration). This is the precise theoretical
name for the founder's 'relationships that emerge at the intersection of
lenses.' https://markturner.org/blendaphor.html
Analogical mapping / multiconstraint theory (similarity,
structure, purpose) — Keith Holyoak & Paul Thagard
(ACME/ARCS) (1989 / 1995): Analogy is governed by simultaneous soft
constraints — semantic similarity, structural parallelism, and pragmatic
purpose — satisfied in parallel rather than strict isomorphism. Argues a
lens-engine needs goal/purpose weighting, not just structural match. https://onlinelibrary.wiley.com/doi/10.1207/s15516709cog1303_1
Modern AI systems:
Analogy Mining (purpose-mechanism) — Learns
'purpose' and 'mechanism' vector embeddings from product/idea
descriptions (crowdsourcing + RNN) so analogies can be mined from large
messy repositories (e.g. the patent corpus): same purpose / different
mechanism, or vice versa. [Best Paper + Best
Student Paper, KDD 2017; found analogies usable by experts; seeded a
whole research line on computational analogy at scale.] https://arxiv.org/abs/1706.05585
SOLVENT / Analogy Search Engine — Mixed-initiative
system annotating scientific papers by
background/purpose/mechanism/findings, embedding them to surface
cross-domain research analogies that pure information retrieval misses.
[Found more (and more useful) analogies than IR baselines;
annotations generalized across domains; experts rated discovered
analogies inspiring. Closest existing relative of donto's vision at
document level, but ONE facet schema, no contradiction-holding.] https://arxiv.org/abs/1812.06974
SciAgents — Multi-agent (ontologist / scientist-1 /
scientist-2 / critic) system over a 33K-node, 48K-edge ontological
knowledge graph that samples RANDOMIZED heuristic PATHS between distant
concept nodes to seed cross-domain hypotheses, then expands and
critiques them; benchmarks novelty against Semantic Scholar.
[Published in Advanced Materials (Wiley, 2025); revealed 'hidden
interdisciplinary relationships previously considered unrelated' in
bio-inspired materials. Strongest existing proof that KG-path-sampling +
agentic critique surfaces non-obvious cross-domain relations — but no
paraconsistent persistence, no formal certification, single domain
validated.] https://arxiv.org/abs/2409.05556
Emergent analogical reasoning in LLMs (Webb, Holyoak,
Lu) — Showed GPT-3/4 solve novel matrix/letter-string/story
analogies zero-shot at or above human level — the empirical basis for
using LLMs as the 'lens' analogy-proposers. [Nature Human Behaviour
2023, highly cited; GPT-4 reported even stronger. Establishes LLMs as
competent analogy PROPOSERS.] https://www.nature.com/articles/s41562-023-01659-w
Counterfactual analogy evaluation (Lewis &
Mitchell) — Tests LLM analogy on counterfactual variants
(permuted/synthetic alphabets) far from training data; GPT performance
collapses while humans stay robust — the key caution against trusting
LLM-proposed mappings as ground truth. [Widely cited rebuttal;
reframed the debate as 'proposal vs verification.' Directly motivates
donto's need to HOLD-then-CERTIFY rather than trust LLM mappings.] https://arxiv.org/abs/2402.08955
YARN (LLM-abstraction + structure mapping) — Hybrid
pipeline: LLMs decompose narratives into units and produce multi-level
abstractions (conceptual/evaluative/narrative-arc/stage), then a
structural-mapping algorithm aligns them — explicitly re-fusing Gentner
SMT with LLMs. [Beats both pure-LLM CoT (0.46 vs 0.41 MCQ) and pure
structural mapping (which scores BELOW random, 0.17) on far/low-surface
analogies. Direct architectural template for donto: LLM abstracts ->
structural aligner verifies.] https://arxiv.org/abs/2603.29997
Parallelograms Strike Back — 2026 study arguing
LLMs now generate higher-quality analogies than human participants in
several generation settings. [Evidence the proposal/generation half
of the pipeline is increasingly LLM-solvable; shifts the bottleneck to
selection/verification/curation, donto's strong suit.] https://arxiv.org/abs/2603.19066
Relevance to the lens engine: A lens-to-lens
comparison across two entities IS an analogy in Gentner's exact sense,
so this field hands donto a ready vocabulary and tooling. BORROW: (1)
The systematicity principle as a relevance filter — rank
machine-proposed relationships by the SIZE and INTERCONNECTEDNESS of the
relational system they share under a lens, not by surface attribute
overlap; this is the antidote to the combinatorial-noise problem (most
cross-lens pairs will be junk). (2) SME's candidate-inference mechanism
as the literal generator of 'relationships no one thought to draw' —
when two entities align structurally under, say, the teleological lens,
project the UNMATCHED base structure onto the target as a new
hypothesis_only edge. (3) The Hope/Shahaf/SOLVENT purpose×mechanism
schema as proof that faceting documents into structured aspects and
embedding each facet separately yields better cross-domain matches than
holistic embeddings — donto generalizes this from 2 facets to N lenses.
(4) SciAgents' randomized KG-path-sampling between distant nodes as a
concrete way to PROPOSE candidate relationships across donto's 39.5M
statements without enumerating all pairs. (5) The YARN result that
LLM-abstraction-THEN-structural-alignment beats both pure LLM and pure
SME — donto should use LLM lenses to ABSTRACT entities, then a
structural/Lean-certified aligner to VALIDATE, never trusting the LLM's
raw mapping. (6) Copycat's temperature/codelet stochasticity and
Fauconnier-Turner's emergent-structure vocabulary to name and rank the
payoff. AVOID: (a) treating LLM-proposed analogies as ground truth — the
Lewis & Mitchell counterfactual collapse shows they pattern-match;
donto must keep them as weighted, evidence-anchored, contestable claims
(its native mode). (b) Requiring strict isomorphism (classic SME
brittleness) — Holyoak-Thagard multiconstraint and donto's
paraconsistency both argue for soft, purpose-weighted,
contradiction-tolerant matching. (c) One fixed facet schema — donto's
many-lens ambition is exactly the generalization SOLVENT stopped short
of.
Already done vs white space: ALREADY DONE (the
founder should not reinvent these): (1) The core theory that
cross-domain relationship discovery = structural analogy, with a working
algorithm that emits NOVEL hypotheses (SME candidate inferences) — 40
years old. (2) Analogy MINING AT SCALE over messy real-world
repositories using learned facet embeddings — Hope/Shahaf (patents 2017)
and SOLVENT (scientific papers 2018) already demonstrated 'find
cross-domain analogies humans missed, experts find them useful.' (3)
AGENTIC, multi-agent, KG-path-sampling cross-domain hypothesis
generation that 'reveals hidden interdisciplinary relationships' —
SciAgents (2024-25) is a published, peer-reviewed instance of a large
chunk of the founder's pitch, in materials science. (4) LLMs as
competent analogy proposers AND as the abstraction layer feeding a
structural aligner (YARN 2026). So 'use AI to break entities down and
find cross-domain relationships' is, at the level of a single facet
schema in a single domain, NOT new. GENUINE WHITE SPACE: (a) The FULL
SPECTRUM of lenses — every prior system uses one or a few facets
(purpose/mechanism; ontological KG edges). Nobody has run a dozen+
heterogeneous analytical lenses (phenomenological, semiotic, ethical,
aesthetic, mereological, teleological...) over the SAME entities and
looked for relationships at lens INTERSECTIONS. The
intersection-of-many-lenses is real white space. (b) PARACONSISTENT,
PERSISTENT HOLDING of speculative/contradictory machine-proposed
mappings as durable first-class state with typed argument edges
(supports/rebuts/undercuts) — every analogy-mining system today is
one-shot retrieval; none KEEPS the rejected and the contradictory
mappings as a queryable frontier over time. (c) EVIDENCE-ANCHORING each
proposed relationship to a source byte + bitemporal provenance —
SOLVENT/SciAgents do not anchor or version their analogies. (d) FORMAL
CERTIFICATION (Lean-4) of the rare valuable mappings' structural
validity — no analogy system formally proves a mapping's shape. The
combination (many-lens + agentic-proposal + paraconsistent-hold +
evidence-anchor + certify) at substrate scale is, as far as the
literature shows, unbuilt.
Hard problems:
COMBINATORIAL EXPLOSION / signal-to-noise: N entities x M lenses x
pairwise comparison is astronomically large, and the overwhelming
majority of cross-lens 'relationships' will be spurious. SME's
systematicity score and Holyoak-Thagard multiconstraint help rank, but
no scalable, calibrated relevance filter for many-lens intersections
exists. SciAgents' random-path sampling is a heuristic dodge, not a
solution.
EVALUATION / what counts as a GOOD discovered relationship: analogy
quality is notoriously hard to measure; SOLVENT and SciAgents fall back
on expert judgment or Semantic-Scholar novelty checks. There is no
agreed automatic metric distinguishing a profound cross-domain insight
from a superficial pun, so curation cost stays high.
LLM SURFACE-PATTERN-MATCHING vs genuine structure mapping: Lewis
& Mitchell's counterfactual collapse shows LLM-proposed analogies
may be training-data echoes, not valid structural mappings. Verifying a
machine mapping is actually structurally sound (not surface) is unsolved
at scale — this is exactly the gap donto's certification layer must
close.
REPRESENTATION / the 'tractability vs flexibility' dilemma (the
SME-vs-Copycat split): SME needs clean predicate-calculus input (where
does it come from?); Copycat builds representations fluidly but doesn't
scale beyond microworlds. Getting LLM-derived abstractions that are both
rich enough to map and clean enough to align reliably (YARN's finding:
'no single abstraction works best across all settings') is open.
CAUSAL and higher-order relation transfer: every recent system
(YARN, narrative-analogy work) reports that models capture
object/attribute correspondences but mis-transfer CAUSAL and
higher-order relations — yet higher-order relations are precisely what
systematicity says make an analogy valuable. The most valuable mappings
are the hardest to get right.
ASYMMETRY and DIRECTION: structure-mapping is directional
(base->target) and blending is many-to-one with selective projection;
deciding which entity is base, which lens dominates, and what to project
(vs suppress) into an emergent 'blend' has no principled automatic
answer.
GROUNDING / hallucinated relationships: agentic LLM proposers will
confabulate plausible-sounding cross-domain links with no evidential
basis; without donto-style mandatory evidence-anchoring, the discovery
engine becomes a serendipity-shaped hallucination generator.
kg-link-prediction-completion
Knowledge graph completion (KGC), a.k.a. link prediction, is the
machine-learning field devoted to scoring the plausibility of unstated
triples (h, r, t) so that missing/latent edges can be inferred from
observed ones. It is the single most directly relevant prior art to
donto's "discover relationships no one drew" vision — it has been
predicting unstated relationships at scale for a decade. The dominant
paradigm is the geometric/algebraic EMBEDDING model: TransE (Bordes et
al., NeurIPS 2013) treats a relation as a translation h + r ≈ t in real
vector space; DistMult uses a bilinear diagonal product; ComplEx
(Trouillon et al., ICML 2016) moves to complex-valued embeddings so the
Hermitian dot product can score asymmetric relations differently by
argument order; RotatE (Sun et al., ICLR 2019) models a relation as a
rotation in complex space, letting one model express
symmetry/antisymmetry, inversion, AND composition simultaneously. These
are scored against a corrupted-negative ranking objective and evaluated
by MRR / Hits@k on benchmarks (FB15k-237, WN18RR, YAGO3-10). They are
cheap, scalable to tens of millions of edges, and genuinely surface
unstated facts — but they are SHALLOW relational pattern-matchers,
transductive (no embedding exists for an unseen entity), and they encode
only the latent geometry of co-occurrence, not meaning.
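The four scoring functions named above can be written out in a few lines. This is a plain-Python sketch over tiny hand-picked vectors, with no training loop or negative sampling; the function names are illustrative, and the distance norms (L2 for TransE, a sum of complex moduli for RotatE) are one common choice rather than the only one.

```python
import cmath
import math


def transe_score(h, r, t):
    """Negative L2 distance between h + r and t (higher = more plausible)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult_score(h, r, t):
    """Bilinear diagonal product; symmetric in h and t by construction."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    """Real part of sum(h * r * conj(t)) over complex-valued vectors."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


def rotate_score(h, phases, t):
    """Negative distance between h rotated by unit-modulus r and t."""
    return -sum(abs(hi * cmath.exp(1j * p) - ti)
                for hi, p, ti in zip(h, phases, t))


# DistMult cannot tell (h, r, t) from (t, r, h); ComplEx can.
sym_forward = distmult_score([1, 2], [3, 4], [5, 6])
sym_backward = distmult_score([5, 6], [3, 4], [1, 2])
asym_forward = complex_score([1 + 1j], [1j], [1 - 1j])
asym_backward = complex_score([1 - 1j], [1j], [1 + 1j])
```

The DistMult scores are identical in both directions, while the ComplEx scores flip sign, which is the asymmetry-by-argument-order property described above.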
A second, older and more interpretable tradition is RULE MINING: AMIE
/ AMIE+ / AMIE3 (Galárraga et al., WWW 2013 onward) mine closed Horn
rules (Datalog-style) with support/confidence under a partial-completeness
assumption; AnyBURL (Meilicke et al., IJCAI 2019, VLDBJ 2023) samples
bottom-up paths and generalizes them into rules anytime, and —
strikingly — a simple symbolic rule learner MATCHES OR BEATS most
embedding models on link prediction while producing human-readable,
evidence-bearing explanations. This matters enormously for donto: rules
are inherently auditable and map naturally onto donto's typed argument
edges and Lean-certifiable shapes. Hybrids now feed embedding-predicted
links back to enrich the graph before rule mining (Betz/Meilicke et al.,
2024).
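Support and confidence under the partial-completeness assumption (PCA) can be shown on a toy graph. This sketch handles only a single-atom rule of the form body(x, y) => head(x, y) and applies the PCA on the subject side; AMIE itself mines multi-atom rules and handles more cases. The triples and names are made up for illustration.

```python
def rule_stats(triples, body_rel, head_rel):
    """Return (support, standard confidence, PCA confidence) for the rule
    body_rel(x, y) => head_rel(x, y)."""
    body = {(s, o) for s, r, o in triples if r == body_rel}
    head = {(s, o) for s, r, o in triples if r == head_rel}
    support = len(body & head)
    # Standard confidence: every unconfirmed prediction counts against the rule.
    std_conf = support / len(body) if body else 0.0
    # PCA: only count a prediction as wrong if the graph already knows
    # SOME head_rel object for that subject.
    head_subjects = {s for s, _ in head}
    pca_body = {(s, o) for s, o in body if s in head_subjects}
    pca_conf = support / len(pca_body) if pca_body else 0.0
    return support, std_conf, pca_conf


triples = [
    ("ann", "worksIn", "paris"), ("ann", "livesIn", "paris"),
    ("bob", "worksIn", "rome"), ("bob", "livesIn", "milan"),
    ("cat", "worksIn", "oslo"),   # no livesIn fact known for cat
]
support, std_conf, pca_conf = rule_stats(triples, "worksIn", "livesIn")
```

The missing fact about `cat` lowers the standard confidence to 1/3 but is ignored by the PCA confidence (1/2), because the graph says nothing about where `cat` lives. The supporting pairs themselves are the evidence that could be attached to a proposed edge.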
The frontier moved to two places. (1) GNN / path-based and INDUCTIVE
KGC: GraIL (Teru et al., 2020) reasons over enclosing subgraphs so it
generalizes to unseen entities; NBFNet (Zhu et al., NeurIPS 2021)
reframes link prediction as a learned generalized Bellman-Ford over
paths and is a strong SOTA; and ULTRA (Galkin et al., ICLR 2024) is the
watershed — a single FOUNDATION MODEL that does zero-shot link
prediction on ANY KG with any entity/relation vocabulary, by learning
representations of the graph of relations (how relations
interact) rather than fixed per-entity embeddings, beating
graph-specific baselines across 50+ KGs. This is the closest thing to
"one model, every domain" — but it still operates purely on graph
topology, not on cross-modal or deep-semantic content. (2) LLM-AUGMENTED
KGC (2023-2026): KICGPT (Wei et al., EMNLP 2023 / 2024) couples a
structure-aware retriever with an LLM reranker to fix the long-tail
problem; KG-LLM, SAT (structure-aware alignment-tuning, 2025), DrKGC
(subgraph-retrieval-augmented LLMs, 2025), and ontology-enhanced LLM-KGC
(2025) all inject the LLM's world knowledge and natural-language
semantics as a second signal. These finally bring lexical/semantic
understanding to KGC — the bridge to donto's agentic-lens idea — but
they bolt the LLM onto a single structural task, not onto a many-lens
decomposition.
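The "generalized Bellman-Ford over paths" framing can be sketched with fixed operators. This is an illustration only: NBFNet learns its message and aggregation functions and conditions them on the query relation, whereas this sketch ignores relation labels and hard-codes multiplication along a path and max across paths. Edge weights and names are assumptions.

```python
def path_scores(edges, source, n_iters, combine=max,
                extend=lambda score, weight: score * weight):
    """Score every node relative to `source` by iterating
    score[v] = combine(score[v], extend(score[u], w)) over edges (u, rel, v, w).
    With max/product this is the best path score within n_iters hops."""
    nodes = {source}
    for u, _, v, _ in edges:
        nodes.update((u, v))
    score = {n: (1.0 if n == source else 0.0) for n in nodes}
    for _ in range(n_iters):
        updated = dict(score)            # keeps the source boundary condition
        for u, _, v, w in edges:
            updated[v] = combine(updated[v], extend(score[u], w))
        score = updated
    return score


edges = [
    ("a", "cites", "b", 0.9),
    ("b", "extends", "c", 0.5),
    ("a", "mentions", "c", 0.3),
]
one_hop = path_scores(edges, "a", n_iters=1)
two_hop = path_scores(edges, "a", n_iters=2)
```

After one iteration only the direct edge to `c` counts (0.3); after two, the stronger two-hop path through `b` wins (0.9 x 0.5 = 0.45). The winning path is itself a readable explanation for the proposed link.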
Crucially, the field is honest about what it MISSES. (a)
Degree/popularity bias: embedding KGC preferentially scores high-degree
"rich club" entities, so it amplifies what is already well-studied and
overlooks the long tail (Shomer et al., WWW 2023; biological-KG topology
study, bioRxiv 2024) — the opposite of serendipity. (b) Plausibility ≠
novelty ≠ truth: KGC ranks how much a candidate edge resembles the
existing distribution, so "best" predictions are often the most
obvious/redundant ones, and benchmarks (FB15k/WN18) are inflated by
reverse-triple leakage and binarized n-ary relations (Akrami et al.; "On
Large-scale Evaluation," 2025). (c) Calibration is poor, especially
under the realistic open-world assumption — confidence scores do not
equal probabilities of truth (Tabacof & Costabello, ICLR 2020; Safavi,
Koutra & Meij, EMNLP 2020;
"Using Model Calibration to Evaluate Link Prediction," WWW 2024;
KGE-Calibrator, EMNLP 2025). (d) Contradictions: standard KGC assumes a
single consistent truth and cannot natively HOLD a contradiction; even
uncertain-KG embeddings (UKGE, Chen et al., AAAI 2019) model confidence
but still struggle to represent negative/false links as legal state.
Every one of these gaps is something donto's paraconsistent,
evidence-first, calibration-agnostic substrate is architecturally built
to absorb.
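Point (c) can be made concrete with expected calibration error (ECE), a standard way to measure the gap between confidence and accuracy. This sketch assumes scores already lie in [0, 1] and that true/false labels are available, which is exactly what the open-world setting lacks for unobserved triples; the function name and data are illustrative.

```python
def expected_calibration_error(scores, labels, n_bins=10):
    """Weighted average of |mean confidence - accuracy| over score bins."""
    if not scores:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(scores, labels):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 goes in last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        confidence = sum(p for p, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += len(members) / len(scores) * abs(confidence - accuracy)
    return ece


# Four predicted edges, all scored 0.9, but only three are true.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
well_calibrated = expected_calibration_error([1.0, 1.0, 0.0, 0.0], [1, 1, 0, 0])
```

An ECE of 0.15 in the first case says the model's 0.9 should be read as roughly 0.75; a curation queue that trusts raw scores would over-prioritize such edges.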
Foundational works:
TransE — Translating Embeddings for Modeling
Multi-relational Data — Antoine Bordes, Nicolas Usunier,
Alberto García-Durán, Jason Weston, Oksana Yakhnenko (2013): A relation
is a translation in vector space (h + r ≈ t). Established that unstated
edges can be predicted cheaply at scale by geometry — the founding move
of embedding-based relationship discovery, and the simplest baseline to
beat. https://proceedings.neurips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html
ComplEx — Complex Embeddings for Simple Link
Prediction — Théo Trouillon, Johannes Welbl, Sebastian Riedel,
Éric Gaussier, Guillaume Bouchard (2016): Complex-valued embeddings +
Hermitian dot product let one model score asymmetric/antisymmetric
relations differently by argument order, while staying linear in
time/space. Shows the SCORING GEOMETRY must match the relation's
algebraic type — directly relevant to donto querying identity/relations
'under a lens'. https://arxiv.org/abs/1606.06357
RotatE — Knowledge Graph Embedding by Relational Rotation in
Complex Space — Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, Jian
Tang (2019): Modeling each relation as a rotation captures symmetry,
antisymmetry, inversion AND composition in one model, plus
self-adversarial negative sampling. The clearest statement that
different RELATION PATTERNS need different algebraic structure — a
single embedding space cannot represent all relationship types equally.
https://arxiv.org/abs/1902.10197
AMIE / AMIE+ / AMIE3 — symbolic Horn-rule mining over
KGs — Luis Galárraga, Christina Teflioudi, Katja Hose, Fabian
Suchanek (2013-2020): Mines human-readable logical rules with
support/confidence under a partial-completeness assumption. The
interpretable, evidence-bearing alternative to black-box embeddings —
maps onto donto's typed argument edges and Lean-certifiable shapes. https://github.com/dig-team/AMIE
AnyBURL — Anytime Bottom-Up Rule Learning for KG
Completion — Christian Meilicke, Melisachew Wudage Chekol,
Daniel Ruffinelli, Heiner Stuckenschmidt (2019; VLDBJ 2023): Samples
paths bottom-up and generalizes them to rules anytime; a SIMPLE symbolic
learner that matches/beats embeddings on link prediction with full
explanations. Proof that interpretable path-rules are competitive with
opaque vectors — the right substrate for an audit-first system. https://link.springer.com/article/10.1007/s00778-023-00800-5
NBFNet — Neural Bellman-Ford Networks (path-based GNN link
prediction) — Zhaocheng Zhu, Zuobai Zhang, Louis-Pascal
Xhonneux, Jian Tang (2021): Reframes link prediction as a learned
generalized Bellman-Ford over paths between a pair of nodes — combining
the interpretability of path-reasoning with the power of GNNs, and
generalizing inductively to unseen entities. https://proceedings.neurips.cc/paper_files/paper/2021/file/f6a673f09493afcd8b129a0bcf1cd5bc-Paper.pdf
UKGE — Embedding Uncertain Knowledge Graphs — Xuelu
Chen, Muhao Chen, Weijia Shi, Yizhou Sun, Carlo Zaniolo (2019): Embeds
confidence scores (not binary truth) and uses probabilistic soft logic
to infer confidence for unseen facts. The honest precursor to donto's
weighted-hypothesis stance — but it still cannot hold
mutually-contradictory claims as legal state, which is exactly donto's
differentiator. https://web.cs.ucla.edu/~yzsun/papers/2019_AAAI_UKG.pdf
Modern AI systems:
ULTRA (Foundation Model for KG Reasoning) — A
single pre-trained model that does ZERO-SHOT link prediction on any KG
with any entity/relation vocabulary, by learning representations of the
'graph of relations' (relation-to-relation interactions) rather than
fixed per-entity embeddings. UltraQuery extends it to inductive
logical-query answering. [Across 57 KGs, zero-shot inductive
performance often matches or beats baselines TRAINED on each specific
graph; the dominant transferable-KGC result of 2024. Open-source
(DeepGraphLearning/ULTRA), HF checkpoints.] https://arxiv.org/abs/2310.04562
KICGPT (LLM with Knowledge in Context) — Couples a
structure-aware triple-retriever with an LLM reranker; encodes KG
structure into in-context demonstrations (Knowledge Prompt) to fix the
long-tail-entity problem without extra finetuning. [EMNLP 2023; SOTA
or near-SOTA on FB15k-237 / WN18RR especially for long-tail entities;
widely cited template for LLM+retriever KGC.] https://arxiv.org/abs/2402.02389
SciAgents (Buehler lab, MIT) — Multi-agent system
over an ontological knowledge graph (~33k nodes / 49k edges from ~1,000
papers) that traverses the graph to surface hidden interdisciplinary
connections and autonomously generates+refines scientific hypotheses
with multiple specialized LLM agents. The closest existing realization
of the founder's 'agents traverse a graph to find links no human drew'
vision. [Published in Advanced Materials (2025); reported to reveal
previously-unrelated interdisciplinary links in bio-inspired materials
and generate mechanistically-grounded hypotheses. Open-source
(lamm-mit/SciAgentsDiscovery).] https://arxiv.org/abs/2409.05556
SAT / DrKGC / Ontology-Enhanced LLM-KGC (2025 LLM-KGC
wave) — Family of structure-aware LLM-KGC methods: SAT aligns
graph embeddings with NL space via contrastive multi-task tuning; DrKGC
does dynamic subgraph retrieval-augmented LLM completion across
general+biomedical domains; ontology-enhanced variants inject schema
constraints into the LLM. [SAT reports 8.7%-29.8% link-prediction
improvement over prior SOTA on four benchmarks; DrKGC strong
cross-domain (general + biomedical). Mostly 2025 arXiv/venue
papers.] https://arxiv.org/abs/2509.01166
Drug-repurposing / literature-based-discovery KGC
pipelines — Casts hypothesis generation as link prediction over
literature-derived biomedical KGs (SemMedDB, custom). The real-world
proof that KGC produces actionable NOVEL hypotheses (gene-disease,
drug-disease), the closest deployed instance of donto's
serendipity-engine payoff. [Rare-disease repurposing KGE reached
AUROC ~0.89 on known indications while proposing novel candidates;
COVID-19 repurposing via KGC (2020); active subfield with wet-lab
follow-ups.] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6937428/
Calibration tooling (Tabacof & Costabello;
KGE-Calibrator) — Methods+evaluations to turn KGC scores into
trustworthy probabilities; show calibration works under closed-world but
degrades badly under the realistic open-world assumption, and that
calibrated scores measurably improve human-AI collaborative curation.
[Tabacof & Costabello (ICLR 2020); Safavi, Koutra & Meij (EMNLP
2020); 'Using Model Calibration to
Evaluate Link Prediction' (WWW 2024); KGE-Calibrator (EMNLP 2025)
calibrates without hurting ranking. Open-source
(Yang233666/KGE-Calibrator).] https://aclanthology.org/2020.emnlp-main.667/
Relevance to the lens engine: BORROW (don't
reinvent): (1) Use KGC as the cheap, scalable FIRST PASS that proposes
candidate edges over donto's 39.5M statements — ULTRA-style
inductive/foundation models and AnyBURL-style path-rules both run at
this scale and need no per-entity training, fitting a substrate where
'identity is a hypothesis.' (2) Adopt rule mining (AMIE/AnyBURL/NBFNet
paths) as the EXPLAINABLE generator: every proposed edge arrives with a
path/rule that can be written as a donto typed argument edge
(supports/rebuts) and handed to the Lean-4 overlay for shape
certification — turning machine guesses into auditable claims. (3) Treat
calibration as a first-class output, not an afterthought: store the
model score AND a calibrated open-world probability on each
hypothesis_only edge so curation can triage. (4) Use the LLM-KGC wave
(KICGPT, SAT, DrKGC) as the bridge from structure to SEMANTICS — they
are the existence proof that natural-language meaning improves link
prediction. AVOID: (a) Don't let embedding-style scoring be the arbiter
of value — it is degree-biased ('rich club') and rewards REDUNDANT,
distribution-conforming edges, which is anti-serendipity; the rare
valuable cross-domain link will score LOW by construction. donto should
explicitly up-weight low-prior, cross-context (cross-ctx:*) candidates
rather than top-ranked ones. (b) Don't inherit the
single-consistent-truth assumption baked into every KGC loss; donto's
paraconsistent frontier is precisely the part KGC cannot do. (c) Don't
trust benchmark MRR as a proxy for discovery quality — it is inflated by
leakage and measures plausibility, not novelty or truth.
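A minimal sketch of points (3) and (a) above, under stated assumptions: each hypothesis edge carries both the raw model score and a calibrated probability, and the triage key deliberately favors low-degree, cross-context candidates over top-ranked ones. Every name here (HypothesisEdge, platt, triage_key, the Platt parameters A and B, the degree penalty, the cross-context bonus) is hypothetical and is not donto's schema or any KGC library's API. Platt scaling stands in for whatever open-world calibrator is eventually chosen.

```python
import math
from dataclasses import dataclass


@dataclass
class HypothesisEdge:
    subject: str
    predicate: str
    obj: str
    raw_score: float       # uncalibrated KGC model score
    calibrated_p: float    # calibrated probability stored alongside it
    cross_context: bool    # True when the edge spans two contexts
    endpoint_degree: int   # summed degree of both endpoints


def platt(raw_score, a, b):
    """Platt scaling: squash a raw score through a fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(-(a * raw_score + b)))


def triage_key(edge, cross_bonus=2.0):
    """Higher key = verify sooner. Penalizes hubs, rewards cross-context edges."""
    novelty = 1.0 / math.log(2.0 + edge.endpoint_degree)  # degree-bias penalty
    bonus = cross_bonus if edge.cross_context else 1.0
    return edge.calibrated_p * novelty * bonus


A, B = 1.0, -2.0  # assumed Platt parameters, fitted elsewhere on held-out data


def make_edge(subject, obj, raw, cross, degree):
    return HypothesisEdge(subject, "related_to", obj, raw,
                          platt(raw, A, B), cross, degree)


edges = [
    make_edge("hub_entity", "popular_entity", 4.0, False, 5000),
    make_edge("tail_entity", "far_entity", 2.5, True, 6),
]
ranked = sorted(edges, key=triage_key, reverse=True)
for e in ranked:
    print(e.subject, round(e.calibrated_p, 3), round(triage_key(e), 3))
```

The hub edge has the higher raw score, yet the long-tail cross-context edge sorts first. Whether a log-degree penalty is the right correction is an open design choice, not something the cited work settles.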
Already done vs white space: ALREADY DONE (the
founder should NOT believe 'no one has thought to do this'): Predicting
unstated relationships at massive scale is a solved, decade-old industry
— TransE→RotatE→ComplEx→NBFNet→ULTRA do it transductively and now
zero-shot across 50+ graphs. Generating NOVEL, actionable cross-entity
hypotheses is done in drug repurposing / literature-based discovery
(with wet-lab validation). Explainable, evidence-bearing link proposals
exist (AMIE, AnyBURL, NBFNet paths). Confidence/uncertainty on proposed
edges exists (UKGE, calibration work). And — most pointedly — AGENTS
traversing a knowledge graph to surface previously-unrelated
interdisciplinary links and auto-generate hypotheses ALREADY EXISTS in
SciAgents (Buehler, 2025) and the broader agentic-graph-discovery wave
(GraphAgents, cross-domain materials design, 2026). So the 'agents break
things down and find links no human drew' core is real and demonstrated.
GENUINE WHITE SPACE (the defensible novelty): (1) The MANY-LENS
decomposition as the GENERATIVE engine. All prior KGC predicts within a
SINGLE relation vocabulary / single ontology / single modality; nobody
systematically decomposes each entity through the full spectrum of
analytical lenses (mereological, teleological, semiotic,
phenomenological, ethical, ecological...) and then mines relationships
at the INTERSECTION of lenses. SciAgents traverses one graph; it does
not multiplex perspectives. (2) The PARACONSISTENT,
contradiction-preserving SUBSTRATE for holding millions of speculative,
mutually-incompatible machine-proposed edges forever as legal state — no
KGC system can do this; they all collapse to one truth. (3) The
EVIDENCE-FIRST byte-anchoring of every speculative edge plus a Lean-4
certification path, giving a generate-hold-verify lifecycle that the ML
field has no equivalent for (KGC outputs a ranked list, not a curated,
source-anchored, machine-checked claim store). The honest novelty is
therefore NOT 'discover unstated links' (done) but 'the AGENTIC +
MANY-LENS generation × PARACONSISTENT/EVIDENCE-FIRST holding ×
certifiable verification PIPELINE at substrate scale' — the combination,
not any single piece.
Hard problems:
Plausibility ≠ novelty ≠ truth: KGC scores reward edges that
resemble the existing distribution, so the highest-ranked predictions
are the most redundant/obvious — the genuinely valuable cross-domain
link scores LOW. No good objective exists for
'surprising-yet-true.'
Degree/popularity bias ('rich club'): embedding KGC over-scores
well-connected entities and ignores the long tail, amplifying what is
already studied — structurally anti-serendipity (Shomer et al. WWW
2023).
The combinatorial explosion: many lenses × millions of entities =
astronomically many candidate intersection-edges. Generating is cheap;
the bottleneck is RANKING/curation and the cost of false positives
flooding the substrate.
Evaluation has no ground truth for novel discovery: benchmark MRR is
inflated by reverse-triple leakage and n-ary binarization, and rewards
rediscovering known edges; measuring whether a never-before-drawn link
is correct requires expensive external (often wet-lab/expert)
validation.
Calibration under the open-world assumption: KGC confidence scores
are not probabilities of truth, and existing calibration largely fails
OWA — so triaging which speculative edges to verify is unreliable
(Tabacof & Costabello 2020; WWW 2024).
Contradiction handling: every standard KGC loss assumes one
consistent truth and cannot natively represent/hold
mutually-contradictory or false links — exactly the state donto wants to
keep, with no off-the-shelf method to score within it.
Cross-modal / deep-semantic links: embeddings encode only
co-occurrence geometry within one relation vocabulary; they cannot
represent analogical, teleological, semiotic, or phenomenological
relations, nor links that span modalities — the lenses the founder cares
about are precisely the ones current KGC cannot embed.
Inductive generalization to NEW relation TYPES: even ULTRA
generalizes to new entities/graphs but the field still struggles when
the RELATION vocabulary itself is unseen or open-ended — a many-lens
engine continuously invents new relation types.
Noise and hallucination from LLM-augmented generation: LLM-KGC and
agentic generators produce fluent but spurious edges; without strict
evidence-anchoring and certification they pollute the graph faster than
humans can curate.
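A rough arithmetic sketch of the combinatorial-explosion item above. Both inputs are illustrative assumptions (6 lenses, 1,000,000 entities), and the counting rule (unordered lens pairs times unordered entity pairs) is one simple way to define an "intersection edge", not a definition taken from the study.

```python
import math


def candidate_intersection_edges(n_lenses, n_entities):
    """Count unordered lens pairs times unordered entity pairs."""
    lens_pairs = math.comb(n_lenses, 2)
    entity_pairs = math.comb(n_entities, 2)
    return lens_pairs * entity_pairs


# Assumed sizes, for illustration only.
total = candidate_intersection_edges(n_lenses=6, n_entities=1_000_000)

# Time to merely score every candidate once at an assumed 1,000 per second.
years_at_1k_per_sec = total / 1_000 / (365 * 24 * 3600)

print(f"{total:,} candidates, about {years_at_1k_per_sec:.0f} years to score")
```

Even at these modest assumed sizes, exhaustive scoring is out of reach, which is why the bottleneck sits in ranking and bounded sampling rather than generation.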
ai-scientific-hypothesis-generation
This field is the single closest existing analogue to donto's
"many-lens relationship-discovery engine," and its 40-year arc is
essential context. The intellectual root is Don Swanson's
Literature-Based Discovery (LBD, 1986): his "undiscovered public
knowledge" thesis holds that independently-created literature fragments
can be logically related yet never connected, and his ABC model (A
relates to B, B relates to C, therefore hypothesize A-C) found the
fish-oil/Raynaud's link purely by bridging non-interacting MEDLINE
literatures — later clinically validated. This is EXACTLY the founder's
intuition ("relationships no human thought to draw because no human
holds all the literatures/lenses at once"), and it predates LLMs by 40
years. LBD's whole premise is that the value is in the
intersection/bridge term, not the facts inside either literature —
identical to the founder's "payoff is at the intersection of
lenses."
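The ABC model described above can be sketched in a few lines. The co-mention links below are toy data chosen to echo the fish-oil example, not Swanson's actual MEDLINE extraction, and abc_bridges is a hypothetical helper name.

```python
from collections import defaultdict


def abc_bridges(links, a):
    """Return {C: set of bridge terms B} for every C linked to A only via some B."""
    neighbors = defaultdict(set)
    for x, y in links:
        neighbors[x].add(y)
        neighbors[y].add(x)

    bridges = defaultdict(set)
    for b in neighbors[a]:
        for c in neighbors[b]:
            # Keep C only if no literature already links it directly to A.
            if c != a and c not in neighbors[a]:
                bridges[c].add(b)
    return dict(bridges)


# Toy co-mention links (illustrative only).
links = [
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("fish oil", "triglycerides"),
    ("blood viscosity", "Raynaud's syndrome"),
    ("platelet aggregation", "Raynaud's syndrome"),
]

candidates = abc_bridges(links, "fish oil")
print(candidates)
```

The output maps each hypothesized C term to the B terms that bridge it, so the bridge itself is kept as the explanation for the proposed A-C link.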
The second lineage is closed-loop autonomous science: Ross King's
Robot Scientist Adam (Cambridge/Aberystwyth, 2009, first machine to
autonomously discover new scientific knowledge — yeast functional
genomics) and Eve (drug screening), now Genesis. The critical lesson
here is that Adam/Eve close the loop — they generate hypotheses, design
discriminating experiments, RUN them with lab robotics, and revise. This
is the "verify/curate" half of the founder's vision made physical, and
it's the part pure-text systems lack. The third lineage is
embedding/representation-based latent-knowledge extraction: Tshitoyan et
al. (Nature 2019, "mat2vec") trained Word2vec on 3.3M materials-science
abstracts and showed the unsupervised embeddings recommended
thermoelectric materials YEARS before their actual discovery — i.e., the
"latent structure of future discoveries is already embedded in past
text." This is the strongest empirical proof that machine-readable
latent relationships exist in a corpus and can be surfaced. The fourth
lineage is knowledge-graph link prediction as hypothesis generation:
drug-repurposing KGs (DRKG, COVID-19 KGs using GNNs/ComplEx/ensemble KG
embeddings) frame a new drug-disease hypothesis literally as predicting
a missing edge, validated via AUROC/AUPRC and explanatory paths — this
is the paradigm donto's substrate is architecturally nearest to.
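As a sketch of the embedding lineage above: rank candidate terms by cosine similarity to a query vector and keep only the top few for further work. The vectors and names below are made-up toy values, not mat2vec embeddings, and pre_rank is a hypothetical helper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def pre_rank(query_vec, candidates, top_k):
    """Return the top_k candidate names, most similar to the query first."""
    scored = sorted(candidates.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]


# Toy 3-dimensional embeddings (illustrative only).
query = [1.0, 0.0, 1.0]            # stands in for a property term
candidates = {
    "material_a": [0.9, 0.1, 0.8],
    "material_b": [0.0, 1.0, 0.0],
    "material_c": [0.5, 0.5, 0.5],
}

shortlist = pre_rank(query, candidates, top_k=2)
print(shortlist)
```

A pass like this is cheap enough to run before any agentic decomposition, but it inherits the single-lens limits of embedding similarity.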
The 2024-2026 wave fuses LLM agents with all of the above. SciAgents
(Ghafarollahi & Buehler, MIT, arXiv:2409.05556, Advanced Materials
2025) is the MOST relevant single system: it builds a large ontological
knowledge graph (~33K nodes / 49K edges from ~1,000 papers), then
samples a PATH between two concepts — crucially WITH INJECTED RANDOMNESS
/ random waypoints to force non-deterministic, exploratory,
serendipitous bridges — and hands that path to a multi-agent pipeline
(Ontologist → Scientist-1 proposes hypothesis → Scientist-2 adds
mechanism/experiment → Critic evaluates → novelty checked against
Semantic Scholar). It explicitly claims to reveal "hidden
interdisciplinary relationships previously considered unrelated." This
is essentially the founder's engine for one domain, minus the
paraconsistency and the persistent contradiction-holding substrate.
Google DeepMind's AI co-scientist (Feb 2025, Gemini 2.0) is the most
mature: a Supervisor orchestrates Generation, Reflection, Ranking,
Evolution, Proximity, and Meta-review agents; hypotheses compete in an
Elo TOURNAMENT via simulated scientific debate (self-play), and
test-time compute scales the search. It produced wet-lab-validated
results: AML drug-repurposing candidates that inhibited tumor viability,
anti-fibrotic epigenetic targets in liver organoids, and independently
re-derived a then-unpublished antimicrobial-resistance mechanism (phage
capsid gene transfer). Adjacent recent systems — BioDisco (dual-mode
KG+literature evidence, iterative feedback, and a notable TEMPORAL
evaluation that tests whether a hypothesis is confirmed by post-cutoff
literature), KG-CoI / Knowledge-Grounded LLMs (arXiv:2411.02382),
TruthHypo/KnowHD, and Bayes-Entropy collaborative agents — all converge
on grounding hypotheses in graphs to fight hallucination and
ranking/refining to control quality.
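A generic sketch of path sampling with a random waypoint, in the spirit of the SciAgents description above. This is not the SciAgents implementation: the graph, the helper names and the choice of breadth-first shortest paths are all assumptions made for illustration.

```python
import random
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first shortest path in an adjacency-dict graph, or None."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def waypoint_path(graph, source, target, rng):
    """Route source -> random waypoint -> target. May revisit nodes."""
    options = sorted(n for n in graph if n not in (source, target))
    waypoint = rng.choice(options)
    first = shortest_path(graph, source, waypoint)
    second = shortest_path(graph, waypoint, target)
    if first is None or second is None:
        return None
    return first + second[1:]


# Toy undirected concept graph (illustrative only).
graph = {
    "silk": ["protein", "fiber"],
    "protein": ["silk", "folding"],
    "fiber": ["silk", "tensile strength"],
    "folding": ["protein", "energy"],
    "tensile strength": ["fiber", "energy"],
    "energy": ["folding", "tensile strength"],
}

path = waypoint_path(graph, "silk", "energy", random.Random(0))
print(" -> ".join(path))
```

Seeding the generator keeps a run reproducible, while changing the seed yields a different detour between the same two concepts.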
The honest verdict on what LLM ideation actually delivers: Si, Yang
& Hashimoto (Stanford, arXiv:2409.04109, 100+ NLP researchers) found
LLM-generated ideas were judged statistically MORE NOVEL than expert
ideas (p<0.05) but slightly less feasible — encouraging for the
founder. BUT the follow-up "Ideation-Execution Gap" (arXiv:2506.20803,
2025) had 43 experts actually EXECUTE the ideas (100+ hrs each): after
execution, LLM ideas' scores collapsed on every metric and human ideas
overtook them. The lesson directly applicable to donto: surface novelty
is cheap and machine-abundant; durable value requires
execution/verification, which is exactly why a substrate that can HOLD
speculative relationships cheaply and selectively VERIFY the rare
valuable ones is the right architecture — but the verification step is
where all the real difficulty (and value) lives.
Foundational works:
Fish Oil, Raynaud's Syndrome, and Undiscovered Public
Knowledge (Literature-Based Discovery; ABC model) — Don R.
Swanson (1986): Logically-related but non-interacting literatures hide
valid hypotheses at the bridge (B) term; value is in the
cross-literature intersection no single reader holds — the 40-year-old
direct ancestor of the founder's many-lens intersection idea. https://muse.jhu.edu/article/403510/summary
Unsupervised word embeddings capture latent knowledge from
materials science literature (mat2vec) — Tshitoyan, Dagdelen,
Weston, Dunn, Persson, Ceder, Jain et al. (2019): Unsupervised
embeddings over 3.3M abstracts recommended materials YEARS before
discovery — empirical proof that latent future-relationships are already
encoded in an existing corpus and can be surfaced without labels. https://www.nature.com/articles/s41586-019-1335-8
Knowledge-graph link prediction as hypothesis generation
(DRKG / COVID-19 drug-repurposing KGs; ComplEx, GNNs, ensemble KG
embeddings) — Multiple (DRKG team; Hsieh et al.; Ioannidis et
al.) (2020-2023): A new relationship hypothesis = a predicted missing
edge, scored (AUROC/AUPRC) and explained via supporting paths — the
paradigm donto's quad substrate is architecturally closest to;
relationships are rankable predictions, not facts. https://arxiv.org/pdf/2212.03911
Discovering Research Hypotheses Using Knowledge Graph
Embeddings — Springer / KG-embedding LBD line (2021): Frames
hypothesis discovery over a paper-derived KG as link prediction
(ComplEx) — generalizes LBD/Swanson into embedding space, the bridge
between symbolic LBD and modern neural discovery. https://link.springer.com/chapter/10.1007/978-3-030-77385-4_28
Modern AI systems:
SciAgents (MIT, Ghafarollahi & Buehler) —
Multi-agent (Ontologist, Scientist-1, Scientist-2, Critic) discovery
over a ~33K-node ontological KG; samples a PATH between two concepts
with INJECTED RANDOMNESS / random waypoints to force serendipitous
interdisciplinary bridges, then generates+mechanizes+critiques the
hypothesis and checks novelty against Semantic Scholar. The single
closest existing analogue to the founder's lens-intersection engine.
[Published in Advanced Materials (2025); open-source
(lamm-mit/SciAgentsDiscovery); claims to reveal hidden interdisciplinary
material relationships 'previously considered unrelated'; evaluated
mainly by expert/critic-agent judgment, not wet-lab at scale.] https://arxiv.org/abs/2409.05556
Google DeepMind AI co-scientist — Supervisor
orchestrates Generation, Reflection, Ranking, Evolution, Proximity,
Meta-review agents (Gemini 2.0); hypotheses compete in an Elo TOURNAMENT
via self-play scientific debate; test-time compute scales the search;
iterative evolution refines. [Wet-lab-validated: AML
drug-repurposing candidates inhibited tumor viability; anti-fibrotic
epigenetic targets in human liver organoids; independently re-derived an
unpublished antimicrobial-resistance mechanism (phage capsid gene
transfer). Enterprise pilots (Daiichi Sankyo, Bayer).] https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/
BioDisco — Multi-agent biomedical hypothesis
generation with DUAL-MODE evidence (KG + literature), iterative
feedback, and a TEMPORAL evaluation that tests whether a hypothesis is
confirmed by literature published AFTER the model's cutoff — a
retro-validation protocol highly relevant to donto's bitemporality.
[Reports evaluation on PubMedQA, GPQA, CardBioMedBench, HypoBench
and custom temporal datasets (2025 preprint; adoption unknown).] https://arxiv.org/pdf/2508.01285
KG-CoI / Knowledge-Grounded LLM hypothesis
generation — Injects KG subgraphs into a chain-of-ideas to
ground LLM hypotheses, with a KG-supported hallucination-detection step.
[Reports reduced hallucination vs ungrounded LLM ideation;
research-stage.] https://arxiv.org/pdf/2411.02382
Si/Yang/Hashimoto human study + Ideation-Execution
Gap — The two-part empirical reality check: large blinded
studies on whether LLM-generated research ideas are actually good
(ideation-only, then post-execution). [Ideation: LLM ideas judged
MORE novel than experts (p<0.05). Execution (arXiv:2506.20803, 43
experts, 100+ hrs each): LLM ideas' scores collapsed on all metrics;
humans overtook; surface novelty does not survive verification.] https://arxiv.org/abs/2409.04109
Bayes-Entropy collaborative agents / TruthHypo-KnowHD /
hypothesis-eval benchmarks (ResearchBench, IdeaBench, LiveIdeaBench,
HypoBench) — Newest wave: uncertainty/entropy-driven iterative
hypothesis optimization, truthfulness-vs-hallucination evaluation
combining literature+KG retrieval, and a benchmark ecosystem for
novelty/feasibility/truthfulness. [E.g. Bayes-Entropy reports
Shannon-entropy drop of 0.92 over 12 iterations; benchmarks expose that
most systems optimize novelty/diversity while neglecting
truthfulness.] https://arxiv.org/pdf/2508.01746
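The Elo tournament listed for the AI co-scientist can be illustrated with the standard Elo update. The source does not describe that system's exact rating procedure, so this is the textbook formula under assumptions: a round-robin schedule, K = 32, a start rating of 1200, and a hypothetical judge that simply prefers the hypothesis with more evidence anchors.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, a_won, k=32.0):
    """Return the new (rating_a, rating_b) after one pairwise comparison."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    new_a = r_a + k * (s_a - e_a)
    new_b = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b


def run_tournament(hypotheses, judge, start=1200.0):
    """Round-robin: every pair is compared once by judge(a, b) -> bool."""
    ratings = {h: start for h in hypotheses}
    for i, a in enumerate(hypotheses):
        for b in hypotheses[i + 1:]:
            a_won = judge(a, b)
            ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], a_won)
    return ratings


# Hypothetical judge: more evidence anchors wins (illustrative only).
evidence_count = {"h1": 3, "h2": 1, "h3": 2}
ratings = run_tournament(["h1", "h2", "h3"],
                         judge=lambda a, b: evidence_count[a] > evidence_count[b])
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

Swapping the judge for a debate transcript scorer or a human vote changes nothing else in the loop, which is the appeal of a rating-based gate. It remains a proxy for value, as the hard-problems list notes.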
Relevance to the lens engine: BORROW: (1) Swanson's
ABC/bridge logic and SciAgents' random-waypoint path sampling are the
proven mechanics for generating cross-domain relationships no human
posited — donto can run the same over its 39.5M-statement graph, but
with MANY analytical lenses as the typed dimensions the bridge can
traverse, not just co-occurrence. SciAgents needed only ~1,000 papers;
donto already has the substrate. (2) The disposer/loop is
non-negotiable: Adam/Eve and the AI co-scientist show value comes only
when generation is paired with a ranking + verification mechanism (Elo
tournament, critic agents, novelty-vs-Semantic-Scholar, and ultimately
experiment). donto's Lean-4 certification + evidence-anchoring +
argument edges (supports/rebuts/undercuts) ARE a disposer — wire the
agent-proposed relationships into it (mirrors donto's own
'agent-proposes / Lean-disposes' rosie-search pattern). (3) BioDisco's
temporal evaluation maps perfectly onto donto's bitemporality: hold a
speculative relationship as legal state now, and let later-ingested
evidence retro-confirm or rebut it without rewriting history. (4)
mat2vec proves a cheap embedding pass over the corpus can pre-rank which
speculative edges are worth materializing — use it as a candidate
generator before expensive agentic decomposition. AVOID / heed the
warnings: the Ideation-Execution Gap is the central caution — machine
novelty is abundant and cheap; do NOT treat 'hundreds of speculative
relationships per pass' (donto already extracts ~483 facts/pass) as the
win condition, because surface novelty evaporates under execution. The
differentiated value is the VERIFICATION funnel, not generation volume.
Also avoid SciAgents/co-scientist's reliance on a single LLM judge as
ground truth (your own memory note: 'no authority is ground truth') —
donto's paraconsistent, contradiction-preserving design is precisely the
antidote, letting you keep rival relationship-claims and their argument
edges instead of collapsing to one ranked answer. Combinatorial blow-up
(path/lens-pair explosion) is the operational risk SciAgents controls
with bounded random sampling and the co-scientist controls with Elo
pruning — donto needs an equivalent bounded-candidate + ranking gate (it
already has the pattern in its bounded-candidate /search query).
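A toy sketch of points (2) and (3) above: a speculative relationship is held as state, argument edges are appended with their own transaction time, and status is always computed "as of" a time rather than by rewriting history. The class and field names are hypothetical and do not reflect donto's actual schema; only the edge kinds (supports/rebuts/undercuts) and the hypothesis_only label come from the text.

```python
from dataclasses import dataclass, field
from typing import List

ARGUMENT_KINDS = {"supports", "rebuts", "undercuts"}


@dataclass(frozen=True)
class ArgumentEdge:
    kind: str       # one of ARGUMENT_KINDS
    evidence: str   # pointer to the anchoring source span
    tx_time: int    # when this edge was recorded


@dataclass
class Hypothesis:
    subject: str
    predicate: str
    obj: str
    tx_time: int
    arguments: List[ArgumentEdge] = field(default_factory=list)

    def attach(self, kind, evidence, tx_time):
        """Append-only: earlier edges are never edited or removed."""
        if kind not in ARGUMENT_KINDS:
            raise ValueError(f"unknown argument kind: {kind}")
        self.arguments.append(ArgumentEdge(kind, evidence, tx_time))

    def status_as_of(self, tx_time):
        """Status using only what had been recorded by tx_time."""
        if tx_time < self.tx_time:
            return "not_yet_asserted"
        visible = [a for a in self.arguments if a.tx_time <= tx_time]
        supports = any(a.kind == "supports" for a in visible)
        attacks = any(a.kind in ("rebuts", "undercuts") for a in visible)
        if supports and attacks:
            return "contested"
        if supports:
            return "supported"
        if attacks:
            return "attacked"
        return "hypothesis_only"


h = Hypothesis("entity_a", "related_to", "entity_b", tx_time=1)
h.attach("supports", "doc-17:bytes 120-180", tx_time=5)
h.attach("rebuts", "doc-42:bytes 10-64", tx_time=9)
print([h.status_as_of(t) for t in (0, 3, 6, 10)])
```

Because later evidence only adds edges, a query pinned to an earlier time still sees the hypothesis exactly as it stood then, and a contested hypothesis keeps both sides.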
Already done vs white space: ALREADY DONE (founder
should not reinvent): The core thesis that valuable relationships live
unconnected across literatures/domains and can be surfaced mechanically
— Swanson proved it in 1986 and it has been clinically validated.
'Latent future relationships already encoded in the corpus' —
Tshitoyan/mat2vec proved it (2019). Multi-agent, ontological-KG,
randomized-path, serendipitous cross-domain hypothesis generation with a
critic and a novelty check — SciAgents IS this (2024-25), for materials.
Tournament-ranked, self-debating, evolving multi-agent hypothesis
generation with REAL wet-lab validation — Google's AI co-scientist
(2025). Relationship-as-link-prediction over biomedical KGs with
explanatory paths — the entire drug-repurposing-KG field (2020-23).
Evidence-grounded, hallucination-resisting, KG+literature hypothesis
generation — KG-CoI, BioDisco, TruthHypo (2024-25).
Temporal/retro-validation of hypotheses — BioDisco. Honest evals of
whether any of it produces durable value — Si et al. +
Ideation-Execution Gap. So the components 'agentic,' 'many-perspective
decomposition,' 'cross-domain bridging,' and 'KG substrate' each
individually EXIST. GENUINE WHITE SPACE: (1) The full SPECTRUM of human
analytical lenses as first-class, typed, persistent dimensions — every
existing system uses ONE implicit lens (semantic similarity / domain
ontology / co-citation). Nobody has made philosophical, mereological,
teleological, semiotic, phenomenological, ethical, ecological etc.
lenses explicit, simultaneous, and cross-indexed so relationships emerge
at lens INTERSECTIONS. (2) A PARACONSISTENT, contradiction-PRESERVING
substrate that HOLDS millions of mutually-incompatible machine-proposed
relationships forever as legal state with typed argument edges — every
existing discovery engine collapses to a single ranked hypothesis list
and discards rivals; donto's hold-without-collapse +
identity-as-hypothesis is genuinely unexplored at scale. (3)
Domain-GENERALITY — all proven systems are narrow (yeast, materials,
biomed); a domain-agnostic engine over a general 39.5M-statement
substrate is untested. (4) Evidence-anchoring to source bytes + Lean-4
certification of the rare valuable edge as the disposer is a
verification architecture no one has assembled. The combination —
agentic many-lens decomposition + paraconsistent hold + formal/evidence
verification, domain-general, at substrate scale — is novel even though
no individual ingredient is.
Hard problems:
The ideation-execution gap: machine-generated novelty is abundant
and cheap but collapses under real execution/verification (Stanford
2025) — generation volume is NOT the win; the verification funnel is,
and it is expensive and largely unsolved at scale.
Evaluation/ground-truth problem: there is no reliable automated
metric for whether a discovered relationship is TRUE and VALUABLE vs
merely novel; LLM-as-judge and Elo tournaments are proxies that can be
gamed, and a hypothesis's worth often can't be known without
wet-lab/empirical execution.
Hallucination and spurious-bridge generation: agents fabricate
plausible-but-false relationships; KG-grounding (KG-CoI, BioDisco)
reduces but cannot eliminate it (hallucination has been argued to be
mathematically inevitable), and grounding to a contradiction-laden substrate
complicates 'grounding' itself.
Combinatorial explosion: the space of lens-pairs × entity-pairs ×
paths is astronomically larger than any one-lens system; SciAgents and
the co-scientist only tame the single-lens version via bounded random
sampling and tournament pruning — many-lens intersection search needs a
fundamentally better candidate-generation/pruning theory.
Noise vs signal at scale: most surfaced cross-domain relationships
are trivial, coincidental, or already-known; the precision problem
(finding the rare Swanson-grade bridge among millions of junk edges) is
the field's core unsolved difficulty.
Knowledge-graph/ontology construction and maintenance cost, plus the
un-anchored-relationship problem: holding speculative edges forever is
cheap to write but expensive to keep coherent, and a paraconsistent
store risks degenerating into an unqueryable contradiction soup without
strong argument-edge curation and lens-scoped query semantics.
Defining and operationalizing the 'lenses' rigorously: 'push each
lens to the utmost of human understanding' is under-specified — turning
philosophical/teleological/phenomenological analysis into reproducible,
comparable, machine-typed features (not just prose) is an open problem
with no benchmark.
foundational-faceted-ontologies
This tradition supplies the rigorous theory of the lens
itself — what an orthogonal analytical dimension is, how to
decompose an entity through several at once, and (critically for donto's
"relationship discovery" payoff) how implicit structure emerges from the
intersection of dimensions. It splits into three lineages that the
founder's vision unknowingly braids together.
(1) Upper / foundational ontologies define a small set of
top-level categories through which any entity can be viewed —
the formal backbone of "lenses." BFO (Barry Smith, Buffalo; the upper
ontology of the OBO Foundry / ~hundreds of biomedical ontologies) splits
reality into continuants (3D enduring things, with
independent/dependent/quality/role/disposition/function
sub-distinctions) vs occurrents (4D processes),
unifying 3D-ist and 4D-ist views in one frame. DOLCE (Gangemi, Guarino,
Masolo, Borgo; the LOA in Trento, ~2002) is explicitly
"cognitive/linguistic-biased" — it carves the categories underlying
natural language and common sense (endurants, perdurants,
qualities, abstracts), and its Descriptions & Situations
(D&S) extension is the single most relevant piece here: it
reifies descriptions (roles, concepts, parameters) separately
from the situations/states-of-affairs they "satisfy," so the
same facts can be re-interpreted under many
descriptions/perspectives without conflict — a near-exact formal
analogue of donto's "identity/relationship is a hypothesis queried under
a lens." SUMO (Niles & Pease, Teknowledge/Articulate, 2000) is a
large, fully axiomatized ontology with first-order reasoning
(Sigma/Vampire/E provers) and a complete manual mapping of every
WordNet synset to SUMO terms — the best example of bridging a
lexical lens to a formal one. Cyc (Lenat, 1984–) is the deepest
precedent for the paraconsistent / contextual angle: its
microtheories (Mt) scope assertions to
assumption-contexts so globally contradictory views (relativistic vs
Newtonian physics, fiction vs fact, conflicting economic theories)
coexist without exploding. Cyc is deliberately locally
consistent but globally contradiction-tolerant, which is exactly donto's
posture.
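The "locally consistent, globally contradiction-tolerant" idea can be shown with a toy context-scoped store. This is not CycL or Cyc's API; ContextStore and its methods are hypothetical, and the two contexts are just labels.

```python
from collections import defaultdict


class ContextStore:
    """Assertions are scoped to a context; conflicts are rejected only inside one."""

    def __init__(self):
        self._facts = defaultdict(dict)  # context -> {(subject, predicate): value}

    def assert_fact(self, context, subject, predicate, value):
        key = (subject, predicate)
        existing = self._facts[context].get(key)
        if existing is not None and existing != value:
            raise ValueError(f"{context} already holds {key} = {existing}")
        self._facts[context][key] = value

    def ask(self, context, subject, predicate):
        return self._facts[context].get((subject, predicate))

    def contradictions(self):
        """Keys on which at least two contexts disagree."""
        by_key = defaultdict(dict)
        for context, facts in self._facts.items():
            for key, value in facts.items():
                by_key[key][context] = value
        return {k: v for k, v in by_key.items() if len(set(v.values())) > 1}


store = ContextStore()
store.assert_fact("NewtonianMt", "simultaneity", "is_absolute", True)
store.assert_fact("RelativisticMt", "simultaneity", "is_absolute", False)

print(store.ask("NewtonianMt", "simultaneity", "is_absolute"))
print(store.contradictions())
```

Both views coexist, each query names the context it is asked under, and the cross-context disagreement is itself queryable rather than an error.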
(2) Faceted classification is the methodology of many orthogonal
lenses. S. R. Ranganathan's Colon Classification
(1933) and its PMEST fundamental categories —
Personality (the focal entity), Matter
(substance/material), Energy
(action/process/operation), Space,
Time — were the first
analytico-synthetic scheme: you analyze a
subject into facets, then synthesize a compound class number by
combining foci from independent facets with connecting symbols (the
colon). The deep claim is that a small set of orthogonal facets can
compose to express an unbounded space of compound subjects no one
enumerated in advance — precisely the combinatorial generativity
the founder wants, stated in 1933. PMEST is the historical ancestor of
(a) modern faceted search/navigation (Pollitt,
Shneiderman, Marchionini, and especially Marti Hearst's
Flamenco, Berkeley 2000s — multi-dimensional filter UIs
everywhere now), and (b) BFO/DOLCE-style category systems. The founder's
"philosophical, temporal, causal, mereological, teleological…" list is,
structurally, a much larger PMEST.
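The analyze-then-synthesize step can be sketched as a product over independent facets. The foci below are invented examples, and joining every facet with a colon is a simplification of Colon Classification notation made for readability.

```python
from itertools import product

# One list of foci per facet, in PMEST order (toy values, illustrative only).
FACETS = {
    "Personality": ["library", "hospital"],
    "Matter": ["paper", "steel"],
    "Energy": ["cataloguing", "maintenance"],
    "Space": ["India"],
    "Time": ["1933"],
}


def synthesize(facets):
    """Yield every compound subject formed by one focus from each facet."""
    names = list(facets)
    for combo in product(*(facets[name] for name in names)):
        yield ":".join(combo)


compounds = list(synthesize(FACETS))
print(len(compounds), compounds[0])
```

Eight foci across five facets already yield eight compound subjects that nobody listed by hand; the count is the product of the facet sizes, which is where both the generativity and the later explosion come from.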
(3) Frame semantics + Formal Concept Analysis give the emergence
engine. Charles Fillmore's frame semantics
(1970s–80s) and FrameNet (ICSI Berkeley, 1997–) say a
word's meaning is only graspable against a whole frame
— a structured scene with frame elements (roles): the
COMMERCIAL_TRANSACTION frame binds Buyer, Seller, Goods, Money. Frames
are reusable relational "lenses" with typed slots — the conceptual
template for any per-lens schema and for relation extraction (semantic
role labeling). Formal Concept Analysis (Rudolf Wille,
"Restructuring Lattice Theory," 1981; Ganter & Wille's
Mathematical Foundations, 1996/1999, building on Birkhoff's lattice
theory and Peirce/Port-Royal logic) is the deepest mathematical realization of
the founder's exact payoff. From a binary formal
context (objects × attributes table), a Galois
connection between extents and intents produces formal
concepts (maximal object-set/attribute-set pairs where neither
can grow), and ordering them yields a concept lattice —
a complete lattice whose nodes are emergent concepts the analyst
never named, plus a canonical basis of attribute
implications (A→B: "every object with all of A has all of B")
computable via attribute exploration. This is literally a
machine that surfaces latent concepts and rules from an object×attribute
matrix — the founder's "relationships no human thought to
draw." Its multi-relational extension, Relational Concept
Analysis (RCA) (Rouane-Hacène, Huchard, Napoli, Valtchev,
2013), iterates FCA over several object sorts linked by
relations, abstracting links into relational attributes and producing a
family of coupled lattices — i.e., discovering cross-entity
relations across multiple "kinds" at once, which is structurally what
donto's many-lens cross-entity discovery aims at.
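A brute-force sketch of the FCA construction described above, suitable only for tiny contexts. The object×attribute table is a toy example, and closing every attribute subset is the naive method, not what dedicated FCA tools do.

```python
from itertools import combinations


def extent(context, attrs):
    """All objects that have every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)


def intent(context, objs, all_attrs):
    """All attributes shared by every object in objs."""
    if not objs:
        return frozenset(all_attrs)
    return frozenset.intersection(*(frozenset(context[o]) for o in objs))


def formal_concepts(context):
    """Enumerate (extent, intent) pairs by closing every attribute subset."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(sorted(all_attrs), size):
            ext = extent(context, frozenset(subset))
            concepts.add((ext, intent(context, ext, all_attrs)))
    return concepts


def implies(context, premise, conclusion):
    """True if every object with all of premise also has all of conclusion."""
    all_attrs = frozenset().union(*context.values())
    closure = intent(context, extent(context, frozenset(premise)), all_attrs)
    return frozenset(conclusion) <= closure


# Toy formal context (illustrative only).
context = {
    "dog": {"mammal", "predator", "domestic"},
    "wolf": {"mammal", "predator"},
    "cow": {"mammal", "domestic"},
    "eagle": {"predator"},
}

concepts = formal_concepts(context)
print(len(concepts))
print(implies(context, {"domestic"}, {"mammal"}))
```

In this toy table the pair ({dog, cow}, {mammal, domestic}) appears as a concept no one named, and the implication domestic → mammal falls out of the closure.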
Foundational works:
Colon Classification & PMEST (analytico-synthetic
faceted classification) — S. R. Ranganathan (1933): Decompose
any subject into a small set of ORTHOGONAL fundamental facets —
Personality, Matter, Energy, Space, Time — then SYNTHESIZE compound
subjects by combining foci across facets. A small lens set composes to
an unbounded, never-enumerated subject space: the founder's
combinatorial generativity, stated in 1933. https://en.wikipedia.org/wiki/Colon_classification
Restructuring Lattice Theory / Formal Concept
Analysis — Rudolf Wille (with Bernhard Ganter) (1981 (Ganter
& Wille foundations 1996/1999)): From an object×attribute table, a
Galois connection yields formal concepts (extent/intent pairs) ordered
into a concept LATTICE plus a canonical basis of attribute implications.
A literal machine for surfacing EMERGENT concepts and rules implicit in
data — the closest classical analogue of donto's 'relationships no one
thought to draw.' https://en.wikipedia.org/wiki/Formal_concept_analysis
Frame Semantics & FrameNet — Charles J.
Fillmore (ICSI Berkeley team) (1976–1985 theory; FrameNet 1997–): A
word's meaning is only intelligible against a structured FRAME — a scene
with typed roles (frame elements). Frames are reusable relational lenses
with slots; the template for any per-lens schema and for relation/role
extraction (SRL). https://en.wikipedia.org/wiki/FrameNet
Cyc & microtheories (contextual, contradiction-tolerant
KB) — Douglas Lenat (MCC / Cycorp) (1984–): Scope assertions to
assumption-contexts (microtheories) so globally contradictory worldviews
coexist without explosion — locally consistent, globally
contradiction-tolerant. The deepest precedent for donto's
paraconsistent/lens-scoped stance. https://en.wikipedia.org/wiki/Cyc
DOLCE + Descriptions & Situations (D&S /
DUL) — Gangemi, Guarino, Masolo, Borgo, Oltramari (LOA Trento)
(2002 (D&S ~2004; DUL)): A cognitively-biased upper ontology whose
D&S pattern REIFIES descriptions (roles/concepts/parameters) apart
from the situations they satisfy — so the same facts can be re-read
under many descriptions/perspectives without conflict. A formal analogue
of 'identity/relationship is a hypothesis under a lens.' https://arxiv.org/pdf/2308.01597
Basic Formal Ontology (BFO) — Barry Smith et al.
(Buffalo) (~2002; ISO/IEC 21838-2 in 2021): A small realist top-level
split — continuants (objects, qualities, roles, dispositions, functions)
vs occurrents (processes) — unifying 3D and 4D views; the de facto upper
ontology of the OBO Foundry. Shows a disciplined, minimal lens-backbone
that hundreds of domain ontologies actually share. http://ontology.buffalo.edu/bfo/
Suggested Upper Merged Ontology (SUMO) + WordNet
mapping — Ian Niles & Adam Pease (Teknowledge/Articulate)
(2000–2003): A fully first-order-axiomatized upper ontology with
theorem-prover reasoning (Sigma) and a COMPLETE manual mapping of every
WordNet synset to SUMO terms — the strongest worked example of bridging
a lexical lens to a formal/logical lens. https://en.wikipedia.org/wiki/Suggested_Upper_Merged_Ontology
Relational Concept Analysis (RCA) — Rouane-Hacène,
Huchard, Napoli, Valtchev (2013): Extends FCA to MULTI-relational data:
iterates over several object sorts linked by relations, abstracts links
into relational attributes, and converges to a coupled FAMILY of
lattices that reveal cross-sort implications and connections. Closest
formal match to donto's cross-entity, multi-kind discovery. https://link.springer.com/article/10.1007/s10472-012-9329-3
Modern AI systems:
The Lattice Representation Hypothesis of LLMs — Bo
Xiong (Stanford), 2026 — proposes LLM embeddings encode FCA-style
concept LATTICES: attribute directions with thresholds act as separating
half-spaces whose intersections induce a concept lattice; concept
meet/join become geometric intersection/union. Bridges the Linear
Representation Hypothesis with FCA. [On 5 WordNet domains, LDA
recovered concept-attribute relations at 71-83% F1 (physical domains);
projection-based subsumption up to 77.1% F1; meet/join produced coherent
generalizations (e.g. 'predator' as join of dog+wolf). Research result,
not a product.] https://arxiv.org/html/2603.01227v1
Faceted search / navigation (Flamenco lineage) —
Marti Hearst's Flamenco (Berkeley, 2000s), building on
Pollitt/Shneiderman/Marchionini — turned Ranganathan's facets into
multi-dimensional filter UIs; now ubiquitous in e-commerce, library, and
enterprise search. [Dominant production paradigm for navigating
multi-dimensional item collections; validated by usability studies.
Mature, widely deployed.] https://people.ischool.berkeley.edu/~hearst/papers/hcir08.pdf
FrameNet + neural frame-semantic parsing / SRL —
FrameNet (ICSI) plus neural semantic-role-labeling models that auto-tag
frames and roles in text — operationalizing the 'frame as relational
lens' for extraction at scale. [FrameNet has 1,200+ frames / 13,000+
lexical units; SRL is a standard NLP task with strong neural baselines.
Mature resource + active research.] https://en.wikipedia.org/wiki/FrameNet
SUMO + Sigma reasoning system — Large axiomatized
upper ontology run through automated theorem provers (Vampire/E) via
Sigma; WordNet-linked for NL-to-logic. [Tens of thousands of axioms;
used for QA, word-sense disambiguation, formal reasoning. Stable,
academically active.] https://www.gabormelli.com/RKB/Suggested_Upper_Merged_Ontology_(SUMO)
Microsoft GraphRAG + LLM-driven KG construction
(2024-2025) — LLM pipelines that extract entities/relations
into knowledge graphs and use community detection for corpus-spanning QA
— the dominant 2024-25 way to auto-build relational structure from text
(NOT facet-theoretic, but the de facto competitor to a 'lens engine' for
relationship surfacing). [GraphRAG open-sourced 2024;
KG-construction reported reaching production maturity / strong ROI
across industries in 2024-25. Widely adopted.] https://aclanthology.org/2025.emnlp-main.1249.pdf
ConExp / FCA tooling & attribute exploration —
Concept Explorer and successors compute concept lattices, canonical
implication bases, and run semi-automatic attribute exploration
(expert-in-the-loop KB completion) from object×attribute contexts.
[Handles lattices up to ~millions of concepts; mature for
small/medium contexts. Niche academic tooling, not web-scale.] https://arxiv.org/abs/2411.06675
Relevance to the lens engine: BORROW (4 concrete
imports): (1) PMEST's analytico-synthetic principle as the design
contract for lenses — keep each lens ORTHOGONAL and have the value be in
the synthesis (foci combined across facets), not in any single
facet. This is the founder's intuition, already formalized in 1933;
treat it as a constraint (lenses should be as independent as possible)
rather than reinventing it. (2) FCA/RCA as the literal back-end for the
'relationship discovery' step: once agents fill many lenses, project the
cross-lens output into formal contexts (object×attribute) and
per-relation RCA contexts, then compute the concept lattice +
canonical implication basis. The emergent concepts and implications ARE
the 'relationships no one thought to draw' — and they come with a
provenance-free, deterministic derivation that pairs perfectly with
donto's evidence-anchoring and Lean-4 certification (FCA implications
are exactly the shape Lean can verify). (3) DOLCE's Descriptions &
Situations and Cyc's microtheories as prior art for donto's
'identity/relationship is a hypothesis under a lens' and
contradiction-holding — donto should cite these as the lineage it
extends, and reuse D&S's description/situation split as the modeling
pattern for 'a relationship-claim viewed under lens X.' (4) Frames
(FrameNet) as the per-lens schema template — each lens defines typed
roles to fill, making extraction structured and relation-ready. AVOID:
(a) the upper-ontology trap of forcing one universal,
globally-consistent category tree — BFO/SUMO spent two decades on
alignment wars; donto's paraconsistent, lens-relative stance is
the differentiator, so do NOT collapse lenses into a single
canonical ontology. (b) Classical FCA's brittleness — it requires
exact binary incidence and is noise-sensitive and worst-case
exponential in concepts; LLM-extracted attributes are noisy and graded,
so use fuzzy/relaxed FCA, bounded candidate generation, and donto's
hypothesis_only/contradiction-frontier to absorb noise instead of
letting it explode the lattice. (c) The FrameNet/Cyc lesson that
hand-curation does not scale — the whole bet must be that AGENTS fill
lenses cheaply; that agentic fill is the genuinely new ingredient these
classical systems lacked.
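A minimal sketch of borrow (2), assuming a toy object×attribute context in which attribute names such as `telic:inform` are hypothetical lens outputs, not donto identifiers. It enumerates formal concepts by brute-force closure over object subsets, which is exponential and only illustrates the idea; real FCA tooling uses dedicated algorithms such as NextClosure.

```python
from itertools import combinations


def common_attributes(context, objects, all_attrs):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(all_attrs)
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def objects_having(context, attrs):
    """Objects whose attribute set contains every attribute in `attrs`."""
    return frozenset(obj for obj, have in context.items() if attrs <= have)


def formal_concepts(context):
    """Return all (extent, intent) pairs closed under the two derivations."""
    all_attrs = set().union(*context.values())
    concepts = set()
    objects = sorted(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(context, subset, all_attrs)
            extent = objects_having(context, intent)
            concepts.add((extent, intent))
    return concepts


# Hypothetical cross-lens context: entity -> attributes proposed by lenses.
context = {
    "book": {"telic:inform", "agentive:written"},
    "letter": {"telic:inform", "agentive:written"},
    "lecture": {"telic:inform", "agentive:spoken"},
}

for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each printed pair is a candidate emergent concept. The pair grouping book and letter under both `telic:inform` and `agentive:written` is the kind of cluster the lattice surfaces without anyone naming it in advance.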
Already done vs white space: ALREADY DONE (the
founder should not claim these as novel): (1) The idea that a small set
of ORTHOGONAL lenses composes to an unbounded analytical space —
Ranganathan, 1933 (PMEST). (2) That implicit concepts and relationships
can be automatically derived from an object×attribute table —
FCA, Wille 1981; and across multiple related kinds — RCA, 2013. This is
the founder's core 'discovery engine' payoff, mathematically solved
decades ago for clean binary data. (3) Holding mutually-contradictory
claims in scoped contexts without explosion — Cyc microtheories, 1984+;
locally-consistent/globally-tolerant KBs are old news. (4)
Re-interpreting the same facts under many descriptions/perspectives —
DOLCE D&S, ~2004; multi-perspective and aspect-oriented ontology
development are established sub-fields. (5) Bridging a lexical lens to a
formal lens (SUMO×WordNet) and frames-as-relational-lenses (FrameNet) —
done. (6) Even the modern hint that LLM embeddings already
contain FCA-style concept lattices — Stanford 2026. GENUINE WHITE
SPACE: No prior system combines all four legs at once — (i) AGENTIC,
LLM-driven population of MANY heterogeneous human-analytical lenses
(philosophical/teleological/aesthetic/semiotic/ecological, far beyond
PMEST's five or BFO's continuant/occurrent split), at (ii) WEB-SCALE
over a (iii) PARACONSISTENT, evidence-anchored, bitemporal substrate
that can hold the resulting speculative cross-lens relations FOREVER as
legal state, with (iv) a verification layer (FCA-implication mining +
Lean-4 certification) to promote the rare valuable hypotheses. FCA/RCA
assumed clean curated contexts and tiny scale; Cyc/DOLCE assumed human
knowledge engineers; FrameNet assumed manual annotation; upper
ontologies assumed one consistent world. The novel claim that survives
scrutiny is the integration: agents as the lens-fillers,
paraconsistency as the holding-tank for cross-lens serendipity, and
FCA/Lean as the disciplined harvester. That specific assembly appears
genuinely unexplored.
Hard problems:
Combinatorial blow-up: FCA concept lattices are worst-case
exponential in context size, and cross-lens relationship candidates grow
combinatorially with #lenses × #entities — the 'discovery' space is
mostly junk, so the engine is bottlenecked on RANKING/pruning, not
generation. PMEST composition and donto's hypothesis-holding both face
this.
Noise vs exactness mismatch: classical FCA/RCA need exact binary
incidence and have no inherent noise tolerance, but LLM-extracted
attributes are noisy, graded, and hallucination-prone — naive contexts
produce garbage lattices and spurious implications. Needs fuzzy/relaxed
FCA, calibration, and source-anchoring.
Lens orthogonality is aspirational, not guaranteed: real analytical
lenses (causal, teleological, ethical) overlap and interact; PMEST
itself blurs Personality/Matter. Non-orthogonal lenses inflate the
combinatorics and make emergent 'intersections' artifacts of redundancy
rather than real discovery.
Evaluation / ground truth: there is no benchmark for 'a valuable
relationship no human thought of.' Distinguishing genuine serendipity
from coincidence, restatement, or LLM confabulation is unsolved — and
donto's no-authority-is-ground-truth stance makes automatic scoring even
harder.
Paraconsistency at scale: Cyc kept consistency LOCAL within
microtheories with heavy engineering; donto wants to hold millions of
unanchored, mutually-contradictory machine-proposed relations cheaply.
Query-time lens evaluation over a contradiction frontier of that size is
an open performance + semantics problem (which argument edges win under
which lens, computed fast).
Curation/verification throughput: the historical killer of
Cyc/FrameNet/upper-ontologies was that human curation could not keep up.
The lens engine inherits this at the back end — even if agents generate
cheaply, promoting the rare valuable hypothesis (Lean certification,
evidence review) is human-bottlenecked unless verification is itself
largely automated.
Cost and depth-control of agentic lens-filling: 'to the utmost of
human understanding' per lens per entity is unbounded compute; deciding
how deep each lens goes, and for which entities, without a payoff
signal, is an unsolved economic/scheduling problem.
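The noise-versus-exactness mismatch listed above can be made concrete with a threshold cut, one crude form of relaxation. The sketch assumes each agent-proposed attribute arrives with a confidence in [0, 1]; the names `graded` and `alpha_cut` and the thresholds are illustrative, and a full fuzzy-FCA treatment would keep the grades rather than discard them.

```python
def alpha_cut(graded, alpha):
    """Binarize a graded context: keep attributes with confidence >= alpha."""
    return {
        entity: {attr for attr, conf in scores.items() if conf >= alpha}
        for entity, scores in graded.items()
    }


# Hypothetical graded context: entity -> {attribute: agent confidence}.
graded = {
    "book": {"telic:inform": 0.95, "agentive:written": 0.90, "telic:decorate": 0.20},
    "lecture": {"telic:inform": 0.85, "agentive:spoken": 0.55},
}

strict = alpha_cut(graded, 0.6)  # drops low-confidence attributes
loose = alpha_cut(graded, 0.1)   # admits nearly everything
print(strict)
print(loose)
```

A lower threshold admits more attributes and so a larger, noisier binary context for lattice construction. The threshold is a tuning knob, not an answer to the calibration problem.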
semantic-decomposition-primitives
The unifying claim of this tradition is that meaning is not atomic —
it decomposes into a small, recurring set of deeper, comparable
components. Five major frameworks instantiate this in importantly
different ways, and together they form the most direct intellectual
ancestry for donto's "many lenses on every entity" vision.
(1) Wierzbicka's Natural Semantic Metalanguage (NSM) is the most
radical reductionist program: roughly 65 indefinable,
cross-linguistically universal "semantic primes" (I, YOU, SOMETHING,
GOOD, BAD, DO, HAPPEN, KNOW, WANT, THINK, BECAUSE, IF, NOT, BEFORE,
PART, KIND, LIKE...) plus a universal mini-grammar and ~50 "semantic
molecules" (man, water, hands). Any concept, however culture-specific,
is "explicated" as a paraphrase built only from primes, so two concepts
from different cultures become directly comparable at the prime level.
NSM is the purest expression of "break meaning to the utmost" — a finite
alphabet of thought.
(2) Pustejovsky's Generative Lexicon (GL, 1991/1995) is the single
most lens-like framework and the most architecturally relevant. Its
QUALIA STRUCTURE assigns every noun FOUR modes of explanation,
explicitly derived from Aristotle's four aitiai (via Moravcsik 1975):
FORMAL (what kind of thing it is), CONSTITUTIVE (its parts/material —
mereology), TELIC (its purpose/function — teleology), and AGENTIVE (how
it came into being — origin/causation). A noun like 'book' carries
formal=physical object, constitutive=pages/text, telic=read(x),
agentive=write(x); 'door' carries telic=pass-through, etc. GL's
generative devices — type coercion ("begin a book" coerces to "begin
reading"), co-composition ("bake a cake" vs "bake a potato"), selective
binding ("fast car" binds to the telic driving event) — and its
dot-objects/complex types (book = PHYSICAL•INFORMATION, a single entity
legitimately under two types at once) solve LOGICAL POLYSEMY without
sense enumeration. This is essentially a four-lens decomposition built
into the lexicon, and the dot-object is a near-exact precedent for "one
entity, multiple co-present aspects."
(3) Schank's Conceptual Dependency (CD, late 1960s–70s, Stanford then Yale)
decomposes all event meaning into ~11 primitive ACTs (ATRANS
abstract-transfer/give, PTRANS physical-transfer/go, MTRANS
mental-transfer/tell, MBUILD, INGEST, EXPEL, MOVE, GRASP, PROPEL,
ATTEND, SPEAK) plus conceptual cases and states, so paraphrases ("John
gave Mary a book" / "Mary took a book from John") collapse to one
canonical, language-independent representation enabling inference. CD
scaled up into scripts/plans/goals (SAM, PAM). It is the canonical
predicate-decomposition lens and the historical lesson in
over-reduction.
(4) Jackendoff's Conceptual Semantics treats meaning as a level of
THOUGHT (Conceptual Structure), built from a fixed ontology of
categories — Event, State, Thing, Place, Path, Property, Amount —
combined by functions like GO, BE, STAY, CAUSE, INCH. Crucially
Jackendoff argues decomposition is the cognitive-science method itself:
meanings are decomposed into primitives "as the semantic equivalents of
phonological features."
(5) The modern, data-driven heirs are Universal Decompositional
Semantics (UDS; White, Reisinger, Rawlins, Van Durme, 2016–2020) and
Abstract Meaning Representation (AMR; Banarescu et al. 2013). UDS is the
most directly transferable to donto: instead of discrete categories it
annotates each predicate/argument with many SCALAR, real-valued,
confidence-weighted properties across orthogonal dimensions — 18
semantic proto-role properties (volition, sentience, causation,
change-of-state, grounded in Dowty 1991), genericity, factuality,
time/duration, event aspect (telicity/dynamicity), 26 entity supersenses
— over a single graph (PredPatt). That is precisely "many independent
lenses, each a graded hypothesis, layered on one graph." AMR is the
production-scale graph meaning-representation (rooted DAG over PropBank
predicates, "who did what to whom," abstracting away syntax), now with
strong LLM parsers (Smatch ~86) and a 52-language MASSIVE-AMR corpus —
but it deliberately drops tense, number, quantifier scope, and
figurative meaning.
The throughline for the founder: every one of these is, in effect, a
fixed set of LENSES that turn an entity or predicate into deep,
comparable atoms. GL's qualia literally are four lenses; UDS's property
sheets are dozens of scalar lenses. The relationship-discovery payoff
donto wants is exactly what these atoms enable: once two entities are
decomposed into the same primitive vocabulary, latent cross-entity
relations (shared telic purpose, shared agentive origin, matching
proto-role profiles) become computable rather than guessed.
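As a sketch of how shared qualia become computable links, assume each entity carries a four-slot record in the style of GL's qualia structure. The `Qualia` class, the `shared_qualia` helper, and every entry other than the qualia for 'book' and the telic role for 'door' are illustrative assumptions, not data from GL or donto.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Qualia:
    formal: str              # what kind of thing it is
    constitutive: frozenset  # parts / material
    telic: str               # purpose / function
    agentive: str            # how it came into being


def shared_qualia(lexicon):
    """Return (entity_a, entity_b, role, value) for every quale two entities share."""
    links = []
    for a, b in combinations(sorted(lexicon), 2):
        qa, qb = lexicon[a], lexicon[b]
        for role in ("formal", "telic", "agentive"):
            if getattr(qa, role) == getattr(qb, role):
                links.append((a, b, role, getattr(qa, role)))
        for part in sorted(qa.constitutive & qb.constitutive):
            links.append((a, b, "constitutive", part))
    return links


# Toy lexicon; only 'book' and door's telic role follow the examples in the text.
lexicon = {
    "book": Qualia("physical_object", frozenset({"pages", "text"}), "read", "write"),
    "letter": Qualia("physical_object", frozenset({"paper", "text"}), "read", "write"),
    "door": Qualia("physical_object", frozenset({"panel", "hinges"}), "pass_through", "build"),
}

for link in shared_qualia(lexicon):
    print(link)
```

Exact string matching on quale values is the weakest possible comparison. It works here only because the toy vocabulary is shared, which is the commensurability requirement the primitive-based frameworks above are designed to meet.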
Foundational works:
Natural Semantic Metalanguage (NSM) — semantic primes &
universals — Anna Wierzbicka (with Cliff Goddard) (1972 (14
primes); 1996 Semantics: Primes and Universals; ~65 primes by 2002): All
meaning reduces to ~65 indefinable, cross-linguistically universal
primes + a universal mini-grammar; any concept is an 'explication'
(paraphrase) in primes, making concepts from different cultures directly
comparable at the atomic level. The purest 'break meaning to the utmost'
program. https://en.wikipedia.org/wiki/Natural_semantic_metalanguage
The Generative Lexicon + QUALIA STRUCTURE
(formal/constitutive/telic/agentive) — James Pustejovsky (1991
(Computational Linguistics 17:4); 1995 book): Every noun decomposes into
four Aristotelian 'modes of explanation' = four lenses (FORMAL/kind,
CONSTITUTIVE/parts, TELIC/purpose, AGENTIVE/origin). Generative devices
(type coercion, co-composition, selective binding) + dot-objects (book =
PHYSICAL•INFORMATION) resolve logical polysemy without sense
enumeration. The most lens-like and architecturally relevant precedent.
https://aclanthology.org/J91-4003.pdf
Aristotle's four aitiai (causes) via Moravcsik's reading —
the philosophical root of qualia — Aristotle; J.M.E. Moravcsik
(1975) (c.350 BCE; 1975): Material, formal, efficient (agentive), and
final (telic) cause = four irreducible 'modes of explanation' for any
thing. Pustejovsky's qualia are an explicit modern operationalization.
This is the deepest ancestor of 'analyze every entity through purpose,
origin, parts, kind.' https://en.wikipedia.org/wiki/Four_causes
Conceptual Dependency (CD) — primitive ACTs
(ATRANS/PTRANS/MTRANS...) — Roger Schank (1969–1977 (Stanford
then Yale); scripts/plans/goals with Abelson 1977): All event meaning
reduces to ~11 primitive ACTs + cases + states, giving a canonical,
language-independent representation so paraphrases collapse to one form
and support inference. The canonical predicate-decomposition lens — and
the cautionary tale on over-reduction/coverage. https://en.wikipedia.org/wiki/Conceptual_dependency_theory
Conceptual Semantics — Lexical Conceptual Structure
(Event/State/Thing/Place/Path/Property; GO/BE/CAUSE) — Ray
Jackendoff (1983 Semantics and Cognition; 1990 Semantic Structures;
2002): Meaning is a level of THOUGHT built from a fixed ontology of
categories and functions; decomposition into primitives ('semantic
equivalents of phonological features') is the very method of cognitive
science. Grounds the claim that lensing entities into primitive
components is scientifically principled, not arbitrary. https://en.wikipedia.org/wiki/Conceptual_semantics
Thematic Proto-Roles and Argument Selection (Proto-Agent /
Proto-Patient) — David Dowty (1991 (Language 67:3, 547–619)):
Thematic roles are not discrete labels but CLUSTERS of independent
entailment properties (volition, sentience, causation,
change-of-state...). The direct conceptual seed of UDS's scalar,
multi-property decomposition — i.e. role meaning is itself a multi-lens,
graded profile. https://www.cs.rochester.edu/u/james/Papers/Dowty.1991.pdf
Frame Semantics / FrameNet — frames as situation-lenses with
frame elements — Charles J. Fillmore (FrameNet at ICSI
Berkeley) (1976–1985; FrameNet from 1997): A word's meaning is only
understood relative to a structured background frame (a stereotyped
situation) it evokes, whose slots (frame elements) are filled by
participants. Complements decomposition with a situational/relational
lens; ~1200 frames + frame-to-frame relations are a ready-made lens
library. https://en.wikipedia.org/wiki/FrameNet
Universal Decompositional Semantics (UDS) + Decomp
toolkit — Aaron Steven White, Drew Reisinger, Kyle Rawlins,
Benjamin Van Durme, Elias Stengel-Eskin (2016 (EMNLP); 2019–2020 UDS1.0
+ Decomp): One semantic graph (PredPatt) annotated with MANY orthogonal,
real-valued, confidence-weighted property dimensions (18 proto-roles,
genericity, factuality, time, event aspect, 26 entity supersenses). The
closest existing realization of 'layer many graded lenses, each a
hypothesis, on a single graph.' https://arxiv.org/abs/1909.13851
Abstract Meaning Representation (AMR) for
Sembanking — Laura Banarescu, Claire Bonial, Nathan Schneider,
Martha Palmer et al. (2013 (LAW VII)): Whole-sentence meaning as a
rooted directed acyclic graph over PropBank predicates ('who did what to
whom'), abstracting away syntax so paraphrases share a representation.
The production-scale, parseable graph format — but deliberately omits
tense/number/scope/figurative meaning. https://people.cs.georgetown.edu/nschneid/p/amr.pdf
Undiscovered Public Knowledge / Literature-Based Discovery
(the ABC model) — Don R. Swanson (later Neil Smalheiser) (1986
(Raynaud–fish-oil); 1991 update): If literature A relates to B and B to
C but no one has connected A–C, the A–C relation is latent, discoverable
public knowledge. The canonical prior art for donto's exact payoff —
'relationships between entities no human thought to draw' — done over
decomposed concept terms, not full lenses. https://www.journals.uchicago.edu/doi/10.1086/601720
Lexical Decomposition: For and Against (the Fodor
critique) — Jerry Fodor & Ernie Lepore vs.
decompositionalists (Pustejovsky, Hale & Keyser) (1998 (and
ongoing)): Atomists argue most lexical meaning is
primitive/undecomposable and decomposition smuggles in unverifiable
structure. The essential counter-argument the founder must answer: do
machine-proposed primitive decompositions carry real, falsifiable
content, or just rename the problem? https://www.cs.ox.ac.uk/files/240/lexdecomp.pdf
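Swanson's ABC model from the list above reduces to a two-hop query over a co-occurrence graph. The sketch below assumes a symmetric dictionary of terms that are co-mentioned in some literature; the function name and the toy graph are illustrative, with the intermediate terms taken from the Raynaud and fish-oil case.

```python
def abc_candidates(cooccurs):
    """Map each unlinked (A, C) pair to the set of B terms that bridge it."""
    candidates = {}
    for a, direct in cooccurs.items():
        for b in direct:
            for c in cooccurs.get(b, ()):
                # C must differ from A and must not already be linked to A.
                if c != a and c not in direct:
                    candidates.setdefault((a, c), set()).add(b)
    return candidates


# Toy symmetric co-occurrence graph (illustrative, not a real corpus).
cooccurs = {
    "raynaud": {"blood_viscosity", "platelet_aggregation"},
    "fish_oil": {"blood_viscosity", "platelet_aggregation"},
    "blood_viscosity": {"raynaud", "fish_oil"},
    "platelet_aggregation": {"raynaud", "fish_oil"},
}

# Rank candidate pairs by how many independent B terms support them.
ranked = sorted(abc_candidates(cooccurs).items(), key=lambda kv: (-len(kv[1]), kv[0]))
for pair, bridges in ranked:
    print(pair, sorted(bridges))
```

Counting bridging terms is only the simplest ranking signal. On a real corpus the candidate set grows quickly, which is why ranking and pruning, not generation, dominate the cost.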
Modern AI systems:
Decomp toolkit + UDS1.0 dataset (decomp.io) — Open
Python toolkit + dataset that stores a sentence as one semantic graph
with many scalar, confidence-weighted decompositional property layers
(proto-roles, genericity, factuality, time, event aspect, entity type);
queryable. The most concrete 'many graded lenses on one graph' system in
existence. [UDS1.0 unifies 5 decompositional annotation sets;
standard benchmark for decompositional semantic parsing (Stengel-Eskin
et al. 2020). Research-grade, modest adoption; not LLM-native.] https://github.com/decompositional-semantics-initiative/decomp
AMR parsers (LeakDistill / self-knowledge-distillation) +
MASSIVE-AMR — State-of-the-art sentence-to-graph meaning
parsers and a 52-language, ~84k-graph multilingual corpus; the mature
graph meaning-representation pipeline. [Smatch ~84.6–86.1 on
standard AMR benchmarks; GPT-4 zero-shot ~100% structural validity but
lower accuracy; large multilingual corpus. Active 2024–2025 survey
activity.] https://arxiv.org/html/2505.03229v1
LLM ontology-guided KG construction (EDC, ODKE+,
OntoKG-style, schema-grounded extraction) — LLM pipelines that
extract triples/entities and canonicalize them against an ontology; some
explicitly route extractions into 'intrinsic/rigid-sortal' vs
'relational/mixin cross-cutting' modules — a partial echo of multi-facet
decomposition. [Production maturity reported across
finance/health/manufacturing in 2024–2025; surveys note most systems
still LACK evidence-grounding/corroboration of triples — exactly donto's
evidence-first niche.] https://arxiv.org/pdf/2510.20345
LLM hypothesis-generation / scientific-discovery agents
(multi-agent debate, graph-bridging, RAG-grounded) — Agentic
systems that generate, critique, and rank novel hypotheses, sometimes
bridging concepts across causal graphs or noninteractive literatures —
the modern, agentic re-implementation of Swanson's LBD. [Active
2024–2025 surveys (HKUST-KnowComp; NAACL/EMNLP 2025); biomedical
drug-combination multi-agent results in iScience 2025. Evaluation of
genuine novelty remains the open problem.] https://arxiv.org/html/2504.05496v1
Structured-Representation + LLM studies (SR-LLM; 'Role of
Semantic Representations in the LLM Era') — Work probing
whether feeding explicit symbolic/decompositional structures (AMR, etc.)
into LLMs helps or hurts reasoning. [Mixed/negative: naive injection
of structured representations into prompts can DEGRADE LLM reasoning — a
direct caution for how donto should surface lenses to/from agents.] https://arxiv.org/html/2502.14352v1
Relevance to the lens engine: BORROW, concretely:
(1) GL's qualia are a ready-made, defensible STARTING lens-set — donto's
6 apertures could be extended with formal/constitutive/telic/agentive,
which are entity-level (your current 6 are text-extraction-level) and
yield exactly the cross-entity links you want (shared telic purpose,
shared agentive origin, part-of overlap). (2) GL's DOT-OBJECT (book =
PHYSICAL•INFORMATION) is a near-perfect formal precedent for donto's
'identity is a hypothesis' / one entity legitimately under multiple
co-present aspects — cite it; it gives your design philosophical
pedigree. (3) UDS is your closest sibling and the single best model to
imitate: store each lens as SCALAR, CONFIDENCE-WEIGHTED, ORTHOGONAL
properties on a graph rather than discrete labels — that is exactly what
a paraconsistent, hypothesis-weighted substrate wants, and it makes
'relationship at the intersection of lenses' a vector-similarity /
shared-profile query. (4) NSM is the right vocabulary-design discipline:
a small, comparable atom set is what makes two entities from different
domains ALIGNABLE at all; without a shared decompositional alphabet,
cross-entity discovery degrades to surface string matching. (5)
Swanson/LBD is your proof-of-concept and your evaluation template (ABC
model; closed vs open discovery; replicate a known discovery to
validate). (6) FrameNet's ~1200 frames are a free situational-lens
library. AVOID: (a) Schank's mistake — a fixed, too-small primitive set
that loses coverage (CD covered only a fraction of real-event corpora);
keep lenses OPEN/extensible, not a closed alphabet. (b) AMR's deliberate
amnesia — it drops tense, number, scope, figurative meaning; donto must
NOT collapse those, since temporal/modal/figurative differences are
often where the novel relation hides (and your bitemporal +
paraconsistent design is built precisely to keep them). (c) Feeding raw
symbolic structure into LLMs naively (SR-LLM result: it can hurt) — let
agents produce decompositions but mediate the structure carefully. (d)
The Fodor trap — make each lens's output FALSIFIABLE and
evidence-anchored (your byte-level evidence + Lean certification is the
right answer to 'is this decomposition real content or
relabeling?').
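Borrow (3) can be sketched as a similarity query over UDS-style property sheets. The sketch assumes each lens property is stored as a (value, confidence) pair and weights every shared dimension by the product of the two confidences; the property names, values, and weighting scheme are illustrative assumptions, not the Decomp toolkit's API.

```python
import math


def weighted_cosine(p, q):
    """Confidence-weighted cosine similarity over the properties both profiles share."""
    shared = p.keys() & q.keys()
    num = norm_p = norm_q = 0.0
    for key in shared:
        (vp, cp), (vq, cq) = p[key], q[key]
        w = cp * cq  # trust a dimension only as far as both annotations do
        num += w * vp * vq
        norm_p += w * vp * vp
        norm_q += w * vq * vq
    denom = math.sqrt(norm_p * norm_q)
    return num / denom if denom else 0.0


# Hypothetical profiles: property -> (scalar value, confidence in [0, 1]).
surgeon = {"volition": (1.2, 0.9), "causation": (1.0, 0.8), "change_of_state": (-0.5, 0.6)}
sculptor = {"volition": (1.1, 0.8), "causation": (0.9, 0.7), "change_of_state": (-0.4, 0.5)}
glacier = {"volition": (-1.3, 0.9), "causation": (0.8, 0.6), "change_of_state": (0.2, 0.4)}

print(weighted_cosine(surgeon, sculptor))
print(weighted_cosine(surgeon, glacier))
```

Restricting the comparison to shared dimensions means two entities analysed under disjoint lenses score zero, which makes the commensurability requirement explicit rather than hiding it.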
Already done vs white space: ALREADY DONE (do not
reinvent): The four-lens-per-entity idea (GL qualia, 1991), the
many-graded-lenses-on-one-graph idea (UDS), the small-universal-atom
idea (NSM/CD/Jackendoff), the situation/frame lens (FrameNet), the
scalar-multi-property role decomposition (Dowty→UDS), the whole-sentence
graph (AMR), and — critically — the 'discover relationships no human
drew across decomposed concepts' idea (Swanson's literature-based
discovery, 1986, and the entire 2024–2025 LLM-hypothesis-generation
field). The founder's belief that 'no one has thought to do this' is
FALSE at the level of any single component;
relationship-discovery-via-decomposition is a 40-year-old research
program. GENUINE WHITE SPACE (the real novelty is the COMBINATION at
scale, not any piece): (1) No prior system runs the FULL SPECTRUM of
human analytical lenses
(philosophical+ethical+aesthetic+economic+ecological+semiotic+phenomenological,
far beyond linguistic) — every framework above is linguistic/lexical,
narrow by design; an agentic engine that applies dozens of heterogeneous
interpretive lenses is genuinely unattempted. (2) No prior decomposition
substrate is PARACONSISTENT and contradiction-preserving — UDS/AMR/GL
assume one correct analysis; donto can hold mutually contradictory
lens-outputs as legal state forever, which is exactly right for
'speculative machine-proposed relations.' (3) No prior LBD/KG system is
simultaneously evidence-first-to-the-byte AND formally certifiable
(Lean) — this directly answers the Fodor critique and the surveys' #1
complaint (extracted triples lack corroboration). (4) Agentic generation
of lenses at 39M-statement scale with HOLD-then-VERIFY is new: classic
LBD/KG curate eagerly; donto's 'generate speculative, hold without
collapsing, certify the rare valuable few' is an unexplored operating
model. The honest pitch: the lenses, the atoms, and the discovery goal
are all prior art; the AGENTIC + MANY-HETEROGENEOUS-LENS +
PARACONSISTENT-EVIDENCE-FIRST-CERTIFIABLE substrate, at this scale, is
the defensible novelty.
Hard problems:
Combinatorial explosion at the intersection: N entities × M lenses ×
pairwise relations is astronomically large; almost all candidate
cross-lens relations are spurious. The hard part is not GENERATING
relations (trivial) but RANKING/pruning them — Swanson's open-discovery
suffers the same 'too many B-terms' problem.
Evaluation of novelty vs. nonsense: there is no accepted metric for
'a valuable relationship no human thought of.' Novel-and-true,
novel-and-false, and trivially-true are hard to separate automatically;
LLM hypothesis-generation surveys flag this as the central unsolved
issue.
The Fodor/atomism objection made operational: does a
machine-proposed primitive decomposition carry real falsifiable content,
or just relabel the entity? Without grounding, lenses produce confident
pseudo-structure (the documented LLM failure mode of copying surface
tokens as 'concepts').
Lens vocabulary commensurability: cross-entity discovery only works
if decompositions share an alphabet (NSM's whole point). Heterogeneous
lenses (ethical vs economic vs phenomenological) may not produce
comparable atoms, so their 'intersection' may be ill-defined.
Coverage vs. fixed primitives (Schank's lesson): too-small a
primitive set loses coverage; too-open a set loses comparability.
Finding the right granularity per lens is unsolved.
Noise and confidence calibration: scalar/weighted decompositions
(UDS-style) need well-calibrated confidence or the paraconsistent store
fills with low-quality contradictions; agent-generated weights are
typically miscalibrated.
LLMs degrade on injected structured representations (SR-LLM
finding): naively round-tripping rich symbolic lens-structure through
agents can reduce reasoning quality, so the human/agent interface to the
lens graph is itself a research problem.
Scaling certification: Lean-style verification can certify
shapes/rules but cannot adjudicate the empirical truth of a discovered
relationship; deciding WHICH of millions of held hypotheses to spend
verification/curation effort on is an open resource-allocation
problem.
Subjective/interpretive lenses (aesthetic, ethical,
phenomenological) have no ground truth — they are inherently
perspectival, so 'to the utmost of human understanding' may have no
convergent target, only a distribution of defensible readings.
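The combinatorial explosion in the first hard problem can be put in numbers under one counting convention: unordered entity pairs multiplied by ordered lens pairs. The entity counts below are hypothetical, and the lens count of six is used only as an example.

```python
from math import comb


def candidate_relations(n_entities, n_lenses):
    """Unordered entity pairs times ordered lens pairs (one counting convention)."""
    return comb(n_entities, 2) * n_lenses ** 2


for n in (1_000, 1_000_000):
    print(n, candidate_relations(n, 6))
```

With six lenses, a thousand entities already give about 1.8 × 10^7 candidates and a million entities about 1.8 × 10^13, before any typed relation labels are considered.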
network-science-of-discovery
The network-science / science-of-science tradition gives the most
rigorous answer to the donto founder's central, unstated question: not
"how do I generate connections?" but "which generated connections are
VALUABLE?" Its core, empirically-validated finding is that value lives
in a specific place — at the BRIDGES between otherwise-disconnected
clusters, and in the ATYPICAL recombination of distant elements — but
only when that novelty is anchored in convention. This is the field's
deepest result and it directly contradicts a naive reading of the
founder's intuition. Volume of connections is worthless;
positionally-improbable connections are everything.
The lineage runs through three nested layers. (1) Network structure:
Granovetter's "strength of weak ties" (1973) showed that novel
information flows across bridges (weak, non-redundant ties), not within
dense clusters where everyone already knows the same things. Burt
formalized this as STRUCTURAL HOLES — a gap between two clusters with
non-redundant information — and his "Structural Holes and Good Ideas"
(AJS 2004, the Raytheon study) demonstrated empirically that managers
whose networks SPAN holes have a "vision advantage": their ideas are
disproportionately rated as valuable, less likely to be dismissed.
Burt's line "the creative spark on which serendipity depends is to see
bridges where others see holes" is almost a literal mission statement
for a lens-intersection engine. (2) Combinatorics of discovery:
Weitzman's "Recombinant Growth" (1998) and Arthur's "The Nature of
Technology" (2009) model innovation as recombination of existing
components, with the supply of ideas effectively unbounded — the binding
constraint is the R&D/evaluation effort to test combinations, not
the combinations themselves. Kauffman's "adjacent possible" and the TAP
equation (Cortês, Steel, Kauffman et al.) formalize how the space of
possible combinations explodes (a long plateau then a hockey-stick) as
each new object opens new adjacent recombinations. (3) Empirical scoring
of novelty value: Uzzi, Mukherjee, Stringer & Jones, "Atypical
Combinations and Scientific Impact" (Science 2013, 17.9M papers) is the
keystone. They measure a paper's combinations by z-scoring every pair of
co-referenced journals against a degree-preserving randomized null (how
surprising is this pairing vs chance), then take the paper's MEDIAN
conventionality and its 10th-percentile TAIL novelty. The hit finding:
the highest-impact papers are NOT the most novel — they sit in the
high-conventionality / high-tail-novelty quadrant. A bedrock of
convention with a sharp intrusion of one atypical combination is 2x more
likely to be a hit. Pure novelty underperforms.
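The Uzzi et al. scoring step can be sketched in a few lines, assuming the null-model counts for each pair have already been produced elsewhere (the degree-preserving randomization itself is not implemented here). Function names and sample numbers are illustrative; the nearest-rank percentile is one of several possible conventions.

```python
import math
from statistics import mean, median, pstdev


def pair_z_score(observed, null_counts):
    """z-score of an observed pair frequency against counts from randomized nulls."""
    mu = mean(null_counts)
    sigma = pstdev(null_counts)
    return 0.0 if sigma == 0 else (observed - mu) / sigma


def paper_profile(pair_z_scores):
    """Median z (conventionality) and 10th-percentile z (tail novelty) for one paper."""
    ordered = sorted(pair_z_scores)
    tail_index = max(0, math.ceil(0.10 * len(ordered)) - 1)  # nearest-rank percentile
    return {"conventionality": median(ordered), "tail_novelty": ordered[tail_index]}


# Illustrative numbers only: one pair seen 10 times vs. three null-model draws.
print(pair_z_score(10, [2, 4, 6]))
# A paper whose pairs are mostly conventional (high z) with one atypical pair (low z).
print(paper_profile([5.0, 3.0, -2.0, 4.0, 6.0]))
```

A low 10th-percentile z marks a sharp atypical pairing, while a high median marks a conventional scaffold. The high-conventionality, high-tail-novelty quadrant described above corresponds to a paper with both.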
The science-of-science tradition also quantifies the OPPOSITE problem
the founder will hit. Foster, Rzhetsky & Evans, "Tradition and
Innovation in Scientists' Research Strategies" (ASR 2015), mapped
millions of biomedical claims as a network of chemical relationships and
showed scientists overwhelmingly play it safe (extending known nodes)
because the reward premium for risky bridging strategies, though real
(higher expected impact), is insufficient to compensate for the higher
chance of being ignored. Wang, Veugelers & Stephan, "Bias Against
Novelty in Science" (Research Policy 2017), showed the most novel papers
are SYSTEMATICALLY undervalued in short windows, suffer delayed
recognition, and are cited mainly in "foreign" fields — precisely
because no single evaluator holds all the lenses. This is the strongest
external validation of the founder's thesis: there is a real, measurable
surplus of value in cross-lens bridges that human, discipline-bounded
evaluation leaves on the table. The Funk/Owen-Smith CD-index and the
Park et al. (Nature 2023) "disruption is declining" work give an
alternative, network-based way to score whether a connection
CONSOLIDATES or DISRUPTS its neighborhood.
Crucially for donto's paraconsistent design, Lin, Evans & Wu, "New
Directions in Science Emerge from Disconnection and Discord" (arXiv
2103.03398), show that DISAGREEMENT/contradiction
between clusters, not just disconnection, is the strongest predictor of
where new scientific directions emerge. Bridges that span a structural
hole AND carry discord are disproportionately generative. This is the
empirical warrant for holding contradictions as legal state rather than
collapsing them: a contradiction frontier IS a map of where novel
directions are most likely.
The throughline for a discovery-scoring engine: a discovered
relationship should be scored not by plausibility alone but by (a) the
network DISTANCE/improbability of the entities it bridges
(structural-hole span, z-score atypicality), (b) the CONVENTIONALITY of
its surrounding scaffold (Uzzi: anchor the leap in known ground), and
(c) the presence of unresolved DISCORD across the bridge. Score for
surprise-given-grounding, not for either alone.
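A minimal sketch of the "surprise-given-grounding" idea, assuming the three signals have already been computed and normalised to [0, 1]. The `Candidate` fields, the multiplicative form, and the `discord_weight` parameter are illustrative assumptions, not a formula taken from the cited papers.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A machine-proposed edge with three hypothetical signals in [0, 1]."""
    name: str
    bridge_distance: float           # (a) improbability of the bridged entities
    scaffold_conventionality: float  # (b) how conventional the surrounding context is
    discord: float                   # (c) unresolved disagreement across the bridge


def surprise_given_grounding(c: Candidate, discord_weight: float = 0.5) -> float:
    """Multiplicative score: near zero if either surprise or grounding is missing."""
    base = c.bridge_distance * c.scaffold_conventionality
    # Discord boosts an already grounded surprise; it cannot rescue a zero base.
    return base * (1.0 + discord_weight * c.discord)


candidates = [
    Candidate("anchored leap", 0.90, 0.80, 0.5),
    Candidate("pure novelty", 0.95, 0.10, 0.5),
    Candidate("safe extension", 0.10, 0.95, 0.0),
]
ranked = sorted(candidates, key=surprise_given_grounding, reverse=True)
for c in ranked:
    print(f"{c.name}: {surprise_given_grounding(c):.3f}")
```

The product (rather than a weighted sum) is one way to encode "not for either alone": a candidate that is maximally distant but has no conventional scaffold cannot outrank an anchored leap.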
Foundational works:
The Strength of Weak Ties — Mark Granovetter
(1973): Novel information flows across weak, bridging ties between
clusters — not within dense clusters where knowledge is already
redundant. The bridge, not the hub, carries novelty. First formal claim
that valuable connections are positional. https://www.cs.cmu.edu/~jure/pub/papers/granovetter73ties.pdf
Structural Holes / Brokerage; 'Structural Holes and Good
Ideas' (AJS 2004, Raytheon study) — Ronald S. Burt (1992 /
2004): A 'structural hole' is a gap between clusters with non-redundant
information; people whose networks SPAN holes have a 'vision advantage'
— their ideas are disproportionately judged valuable, less often
dismissed. 'The creative spark on which serendipity depends is to see
bridges where others see holes.' This is the empirical core: value is at
the bridge, and it's measurable. http://www.ronaldsburt.com/research/files/SHGI.pdf
Atypical Combinations and Scientific Impact (Science, 17.9M
papers) — Uzzi, Mukherjee, Stringer, Jones (2013): The keystone
scoring method. z-score every co-referenced journal pair vs a
degree-preserving randomized null; take a paper's MEDIAN conventionality
+ 10th-percentile TAIL novelty. Hits are 2x more likely in the
HIGH-conventionality + HIGH-tail-novelty quadrant. Pure novelty
underperforms — anchor the atypical leap in conventional ground. The
exact recipe for scoring a discovered relationship for VALUE not just
plausibility. https://www.science.org/doi/10.1126/science.1240474
Tradition and Innovation in Scientists' Research Strategies
(ASR) — Foster, Rzhetsky, Evans (2015): Mapped millions of
biomedical claims as a network of chemical relations; scientists
overwhelmingly choose conservative (extend-known-node) strategies. Risky
bridging strategies have higher expected impact but the premium is too
small to offset the elevated chance of being ignored — so the search
space is systematically under-explored. The market gap donto could
exploit: machines bear the risk humans rationally avoid. https://arxiv.org/abs/1302.6906
Recombinant Growth + The Nature of Technology (combinatorial
models of innovation) — Martin Weitzman; W. Brian Arthur (1998
/ 2009): Innovation = recombination of existing components. The supply
of possible combinations is effectively unbounded; the binding
constraint is the R&D/EVALUATION effort to test them, not generating
them. Tells the founder: generation is cheap (true for LLM lenses too),
so the engine's whole value is the evaluation/triage filter, not the
combinatorial firehose. https://mattsclancy.com/wp-content/uploads/2023/01/Recombinant-Growth.pdf
The 'adjacent possible' & the TAP equation —
Stuart Kauffman; Cortês, Steel, Herriot et al. (2000 / 2022): Each
realized combination opens new ADJACENT combinations; the space grows as
a long plateau then an explosive hockey-stick. Explains why a many-lens
decomposition keeps yielding new bridges as it runs (each new
entity/lens-fact expands the adjacent possible) — but also why
uncontrolled expansion combinatorially explodes and must be bounded. https://arxiv.org/abs/2204.14115
New Directions in Science Emerge from Disconnection and
Discord — Chen, Ding, Evans et al. (2021): New scientific
directions emerge most where clusters are both DISCONNECTED (structural
hole) AND in DISCORD (contradictory). Disagreement is generative, not
noise. Direct empirical warrant for donto's paraconsistent,
contradiction-preserving design: the contradiction frontier is a map of
where novel relationships are most likely to pay off. https://arxiv.org/pdf/2103.03398
Bias Against Novelty in Science — Wang, Veugelers,
Stephan (2017): The most novel (highest-atypicality) papers are
systematically undervalued in short windows, show delayed recognition,
and are cited mostly in 'foreign' fields. Because no single
discipline-bounded evaluator holds all lenses, real cross-lens value is
left on the table — the strongest external validation of the founder's
'no one holds all the lenses at once' intuition. https://www.nber.org/papers/w22180
Conceptual Blending / Combinational Creativity —
Fauconnier & Turner; Margaret Boden; Arthur Koestler (bisociation)
(2002 / 1990 / 1964): The cognitive-science account of HOW novelty
arises from combining distant mental spaces (blending), and Boden's
taxonomy (combinational / exploratory / transformational). Gives a
vocabulary for what a lens-intersection actually produces and why
cross-domain blends feel creative — the micro-mechanism beneath the
macro network finding. https://en.wikipedia.org/wiki/Conceptual_blending
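The Uzzi entry above describes the scoring recipe in words; the sketch below shows its shape on toy data. It is a simplification under stated assumptions: the null model here only shuffles journal labels across reference slots (preserving each paper's reference count and each journal's total citations), whereas the original study uses a richer citation-network randomisation. All journal names and the helper names are hypothetical.

```python
import random
import statistics
from collections import Counter
from itertools import combinations


def pair_counts(papers):
    """Count how many papers co-reference each unordered journal pair."""
    counts = Counter()
    for refs in papers:
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts


def shuffled_null(papers, rng):
    """Reassign journals to reference slots, keeping list lengths and journal totals."""
    pool = [j for refs in papers for j in refs]
    rng.shuffle(pool)
    out, i = [], 0
    for refs in papers:
        out.append(pool[i:i + len(refs)])
        i += len(refs)
    return out


def pair_zscores(papers, n_null=200, seed=0):
    """z-score of each observed pair against the shuffled null."""
    rng = random.Random(seed)
    observed = pair_counts(papers)
    samples = [pair_counts(shuffled_null(papers, rng)) for _ in range(n_null)]
    z = {}
    for pair, obs in observed.items():
        null_vals = [s[pair] for s in samples]  # Counter returns 0 if absent
        mu = statistics.fmean(null_vals)
        sigma = statistics.pstdev(null_vals)
        z[pair] = (obs - mu) / sigma if sigma > 0 else 0.0
    return z


def paper_profile(refs, z):
    """Return (median z = conventionality, approx. 10th-percentile z = tail novelty)."""
    vals = sorted(z[p] for p in combinations(sorted(set(refs)), 2))
    return statistics.median(vals), vals[int(0.1 * (len(vals) - 1))]


# Toy corpus: two fields that never cite each other, plus one bridging paper.
field_a = ["A1", "A2", "A3", "A4"]
field_b = ["B1", "B2", "B3"]
bridge = field_a + ["B1"]
papers = [list(field_a) for _ in range(10)] + [list(field_b) for _ in range(10)] + [bridge]

z = pair_zscores(papers)
median_z, tail_z = paper_profile(bridge, z)
print(f"conventionality (median z) = {median_z:.2f}, tail novelty (low z) = {tail_z:.2f}")
```

In this toy corpus the bridging paper has a positive median (mostly conventional pairs) and a negative tail (one atypical pairing), which is the high-conventionality, high-tail-novelty pattern the entry describes. Lower z means more atypical.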
Modern AI systems:
SciAgents (MIT, Buehler lab) — Multi-agent system
that builds a large ontological knowledge graph (~33K nodes / 49K edges
from ~1000 papers), then samples RANDOM (not shortest) paths between two
distant/random concept nodes to seed a hypothesis; specialized agents
(Ontologist, Scientist_1/2, Critic) expand the path into a structured
proposal (hypothesis, mechanism, novelty, unexpected properties) and
score novelty/feasibility against Semantic Scholar. This is the closest
existing system to donto's lens-intersection vision. [Published in
Advanced Materials (2024/2025); generated genuinely cross-domain
bioinspired-materials hypotheses (e.g. silk + structural-coloration).
Random-path sampling was explicitly chosen to 'infuse the path with a
richer array of concepts', the authors' stated rationale for bridging
distant nodes rather than following the shortest path.] https://arxiv.org/abs/2409.05556
Accelerating science with human-aware AI (Sourati &
Evans) — Builds a hypergraph of materials, properties, and the
researchers who study them; 'human-aware' random walks model not just
what's logically possible but what's COGNITIVELY REACHABLE by the human
expert crowd — then deliberately 'avoids the crowd' to surface valuable
'alien' hypotheses far from human reach. The single most relevant
value-scoring idea for donto. [Nature Human Behaviour 2023.
Human-aware models improved prediction of which discoveries will
actually be made by UP TO 400% over content-only models, especially in
sparse literature; the inverse mode generates promising hypotheses
'unlikely to be imagined until the distant future.'] https://pubmed.ncbi.nlm.nih.gov/37443269/
Literature-Based Discovery (Swanson ABC) and modern KG
link-prediction descendants — Swanson's 1986 fish-oil/Raynaud's
discovery: A-B and B-C links in disjoint literatures imply an untested
A-C. Modern versions do temporal link-prediction / graph-embedding over
biomedical KGs (e.g. AGATHA, SemMedDB-based systems, the
active-curriculum temporal-graph LBD work) to propose A-C edges that
bridge disconnected literatures. [Swanson's original hypotheses
(fish oil/Raynaud's, magnesium/migraine) were later clinically
validated. Modern LBD is an active field; link-prediction over
biomedical KGs is the canonical 'find the missing bridge edge'
formulation that donto's hypothesis_only edges resemble.] https://link.springer.com/article/10.1007/s10462-024-10885-1
Mat2vec — unsupervised word embeddings capture latent
knowledge (Tshitoyan et al.) — Word2vec over 3.3M
materials-science abstracts; the embedding geometry encoded undiscovered
structure — recommending thermoelectric materials YEARS before they were
reported in the literature. Proof that latent cross-document
relationships exist and are extractable without supervision. [Nature
2019. Predicted several materials later experimentally confirmed as
thermoelectrics; demonstrated relationships 'lay dormant' in the
literature, recoverable by geometry — the empirical existence proof for
donto's latent-structure thesis.] https://www.nature.com/articles/s41586-019-1335-8
Analogy Mining / cross-domain bridging (Hope, Chan, Kittur,
Shahaf) — Learns purpose-vs-mechanism representations from
patents/product descriptions so a problem in one domain can be matched
to a structurally-analogous solution in a DISTANT domain —
operationalizing structural-hole brokerage as a retrieval problem.
[KDD 2017 Best Paper; follow-up PNAS 2019 'Scaling up analogical
innovation with crowds and AI' showed analogies surfaced by the system
led humans to generate more creative solutions. Directly validates the
value of bridging distant domains.] https://arxiv.org/abs/1706.05585
AI co-scientist (Google DeepMind) / Robin / agentic
discovery survey — General multi-agent systems (generate /
debate / rank / evolve hypotheses) for end-to-end hypothesis generation,
design, and analysis. Tournament-style ranking among competing
hypotheses is the relevant pattern for triaging donto's many
machine-proposed relationships. [Co-Scientist (Gemini 2.0) reported
novel hypotheses in drug repurposing/AMR later validated in wet-lab;
Robin reportedly cut a discovery cycle from ~900 human-hours to under 2.
Caveat: a 2026 critique ('Agentic AI Scientists Are Not Built For
Autonomous Scientific Discovery') argues current agents over-produce
plausible-but-unvalidated hypotheses.] https://deepmind.google/blog/co-scientist-a-multi-agent-ai-partner-to-accelerate-research/
Hypothesis-evaluation benchmarks (TruthHypo/TruthEval,
ScholarEval, ProjectionBench) — 2024-2025 systems that grade
generated hypotheses on truthfulness, soundness (have analogous methods
worked before?), novelty, and contribution — combining literature
retrieval + KG retrieval to filter the firehose. [Active 2025
research; consistent finding that LLMs generate MORE NOVEL but LESS
VALID hypotheses than humans — the exact triage problem donto must
solve. Directly relevant to building a value/validity filter atop a
high-volume generator.] https://www.ijcai.org/proceedings/2025/0873.pdf
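The Swanson ABC pattern in the list above (A-B and B-C links imply an untested A-C) can be sketched in a few lines. This is a toy illustration, not how AGATHA or any SemMedDB-based system is implemented; the function name and the edge list are assumptions, with terms loosely modelled on the fish-oil/Raynaud's example.

```python
from collections import defaultdict


def abc_candidates(edges):
    """Return {(a, c): {shared b terms}} for every unlinked pair with a common neighbour.

    `edges` is an iterable of undirected (x, y) links, e.g. literature co-occurrences.
    Keys are alphabetically sorted tuples.
    """
    nbrs = defaultdict(set)
    for x, y in edges:
        nbrs[x].add(y)
        nbrs[y].add(x)
    found = defaultdict(set)
    for b, linked in nbrs.items():
        ordered = sorted(linked)
        for i, a in enumerate(ordered):
            for c in ordered[i + 1:]:
                if c not in nbrs[a]:      # A and C are not yet directly linked
                    found[(a, c)].add(b)  # B is the bridging term
    return dict(found)


edges = [
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("blood viscosity", "Raynaud's syndrome"),
    ("platelet aggregation", "Raynaud's syndrome"),
]
found = abc_candidates(edges)
key = tuple(sorted(("fish oil", "Raynaud's syndrome")))
print(key, "via", sorted(found[key]))
```

Ranking candidates by the number of independent B-terms is a common first-pass filter. Note that this untyped version also proposes links between the intermediate terms themselves; a real system would type the nodes and restrict which pairs count as A-C.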
Relevance to the lens engine: This area is donto's
scoring layer — it tells the engine how to RANK the relationships its
many-lens decomposition proposes. BORROW: (1) Uzzi's exact recipe — for
any discovered relationship, compute a z-score atypicality against a
degree-preserving randomized null over the entity graph, then favor
relationships that pair HIGH conventional scaffolding with a
HIGH-novelty TAIL (a single surprising bridge anchored in known ground),
not maximal novelty. This converts 'plausible' into 'valuable.' (2)
Burt's structural-hole span — score a proposed edge by how many
non-redundant clusters it connects and how large the hole it bridges;
brokerage betweenness over the entity graph is a directly computable
value signal, and donto already has the quad graph to compute it. (3)
Sourati-Evans 'avoid the crowd' — model which relationships are already
cognitively reachable (densely co-occurring, low surprise) and
DOWN-weight them; up-weight the 'alien' bridges far from existing
co-mention, which is where the unrecovered surplus value sits. (4)
Chen/Ding/Evans discord+disconnection — donto's contradiction frontier
is not a bug to resolve but a PRIORITY MAP: rank candidate relationships
highest where they bridge disconnected clusters that also carry
argument-edge discord (supports/rebuts). donto's paraconsistent
substrate is uniquely able to hold and exploit this signal where a
consistency-enforcing store would have destroyed it. (5)
Weitzman/Arthur/Kauffman combinatorics — internalize that generation is
cheap and unbounded; the engine's entire moat is the triage filter, and
the adjacent-possible explosion means you MUST bound exploration (sample
paths, cap fan-out) or drown. AVOID: (a) optimizing for raw novelty or
raw volume: Uzzi shows pure novelty underperforms, and
Wang/Veugelers/Stephan show it is penalized on short horizons; (b)
shortest-path / nearest-neighbor relationship discovery: SciAgents
deliberately samples random paths between distant nodes to enrich the
concepts a hypothesis draws on; (c) treating LLM-rated plausibility as
value:
the 2025 benchmarks show LLMs over-produce plausible-invalid hypotheses,
so plausibility must be a gate, never the ranking. Net: donto should
ship a 'brokerage + atypicality + discord' composite score as the lens
it applies at query time to triage machine-proposed hypothesis_only
edges, and use Lean-4 certification only on the thin top slice that
survives.
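As one concrete reading of item (2) above, the sketch below computes Burt's effective size using Borgatti's simplification for unweighted, undirected graphs (n - 2t/n, where n is the number of alters and t the number of ties among them), and scores a proposed edge by how much it raises an endpoint's effective size. The adjacency-dict representation and the `brokerage_gain` helper are assumptions for illustration; nothing here reflects donto's actual schema.

```python
def effective_size(adj, ego):
    """Effective size of ego's network: n - 2t/n (unweighted, undirected)."""
    alters = adj.get(ego, set()) - {ego}
    n = len(alters)
    if n == 0:
        return 0.0
    # Count each tie between two alters once (a < b avoids double counting).
    t = sum(1 for a in alters for b in adj.get(a, set()) if b in alters and a < b)
    return n - 2.0 * t / n


def brokerage_gain(adj, u, v):
    """Increase in u's effective size if the edge (u, v) were added."""
    trial = {k: set(vs) for k, vs in adj.items()}
    trial.setdefault(u, set()).add(v)
    trial.setdefault(v, set()).add(u)
    return effective_size(trial, u) - effective_size(adj, u)


# Two small clusters {a1, a2} and {b1, b2}, both attached to a broker x.
adj = {
    "x": {"a1", "a2", "b1", "b2"},
    "a1": {"a2", "x"},
    "a2": {"a1", "x"},
    "b1": {"b2", "x"},
    "b2": {"b1", "x"},
}
print(effective_size(adj, "x"))         # broker spanning both clusters
print(effective_size(adj, "a1"))        # member embedded in one cluster
print(brokerage_gain(adj, "a1", "b1"))  # value of a new cross-cluster edge
```

Effective size is cheap to compute from local neighbourhoods, which is what makes it usable as a pre-filter before any LLM or Lean step. It inherits the coverage-bias caveat: a missing edge in an extracted graph is not necessarily a real hole.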
Already done vs white space: ALREADY DONE (the
founder should not reinvent): (1) The CORE THESIS that valuable
connections live at bridges/atypical combinations is not a hunch — it is
one of the most replicated results in social science
(Granovetter→Burt→Uzzi→Foster/Evans, across millions of papers). (2) The
exact MATH to score a connection's value-improbability already exists
and is open (Uzzi z-score atypicality, Burt brokerage/effective-size,
CD/disruption index, Novelpy package). (3) MANY-LENS / cross-domain
GRAPH TRAVERSAL to generate bridging hypotheses is an established
research category with working systems: SciAgents (random-path graph
reasoning), Sourati-Evans
(human-aware walks), analogy mining, LBD link-prediction, AI
co-scientist all do 'find the bridge no one drew.' (4) The empirical
proof that latent cross-document relationships exist and are recoverable
(mat2vec) is settled. GENUINE WHITE SPACE — the defensible combination:
(a) PERSISTENT, PARACONSISTENT HOLDING of speculative relationships as
first-class legal state. Every system above generates hypotheses
transiently and either validates-or-discards them; NONE holds a durable,
contradiction-preserving, evidence-anchored frontier of millions of
unresolved machine-proposed edges that can be re-queried, re-scored, and
accreted over time as new lenses/entities arrive. donto's bitemporal
contradiction store turns one-shot generation into a compounding asset.
(b) SCALE + GENERALITY: the discovery systems are domain-locked
(materials, biomedicine); donto is a general 39.5M-statement substrate,
so it can compute brokerage/atypicality across domains that have never
been jointly indexed — exactly the foreign-field surplus
Wang/Veugelers/Stephan showed is undervalued. (c) EVIDENCE-ANCHORING +
LEAN CERTIFICATION of the survivors: no discovery system byte-anchors
every claim AND offers a formal certification overlay, which is
precisely what closes the 'plausible-but-invalid' gap the 2025
benchmarks expose. (d) IDENTITY-AS-HYPOTHESIS: discovery in these
systems assumes fixed entities; donto's queryable-merge-under-a-lens
means the SAME substrate can discover relationships under different
identity resolutions — a genuinely unexplored degree of freedom. So 'no
one has thought of the many-lens bridge idea' is FALSE; 'no one has
built a persistent, paraconsistent, evidence-first, cross-domain
substrate that holds and compounds the firehose and then certifies the
survivors' is essentially TRUE and is the real moat.
Hard problems:
The plausible-vs-valuable gap: LLM/agentic generators reliably
produce MORE NOVEL but LESS VALID outputs (2025 benchmarks).
Plausibility scoring is necessary but never sufficient; value requires
the network/atypicality signal AND grounding, and even then most
candidates are false. donto's filter, not its generator, is the whole
ballgame.
Combinatorial explosion / triage at scale: the Weitzman and Kauffman
models imply the candidate space grows super-linearly (TAP hockey-stick).
Across 39.5M statements × many lenses, the number of proposable bridges
is astronomically larger than anything you can verify. You need cheap,
computable pre-filters (brokerage, z-score) before any LLM/Lean touch,
and a principled exploration budget.
Defining the right null model for atypicality: Uzzi's z-score
depends entirely on a degree-preserving randomized null. Over a
heterogeneous, bitemporal, paraconsistent quad graph (not a clean
co-citation network), what is the correct null? A wrong null makes every
cross-context edge look 'atypical' and floods the frontier with junk
surprise.
The novelty-impact paradox / delayed recognition:
Wang-Veugelers-Stephan show the most valuable novel connections are
precisely the ones that look worthless on short horizons and only pay
off later in foreign fields. Any greedy value score will systematically
discard the highest-value bridges. You need scoring that tolerates
delayed/foreign validation — hard to operationalize without ground-truth
feedback loops.
Evaluation without ground truth: there is no oracle for 'is this
discovered relationship true/valuable?' Science-of-science uses future
citations as a noisy proxy; donto has no equivalent. Bootstrapping a
value signal without circular reliance on the same LLMs that generated
the candidates is unsolved.
Distinguishing generative discord from mere error: Chen/Ding/Evans
show contradiction is generative — but most contradictions in an
auto-extracted substrate are extraction noise, coreference failures, or
stale facts, not productive scientific discord. Separating the valuable
contradiction frontier from the garbage contradiction frontier is an
open, donto-specific problem.
Brokerage is computed over the WRONG graph if extraction is biased:
structural-hole and weak-tie measures assume the absence of an edge
means genuine disconnection. In an LLM-extracted graph, a missing edge
often just means the extractor didn't read those two sources together —
so a 'structural hole' may be an artifact of coverage, not a real bridge
opportunity. Coverage bias contaminates the core value signal.
Combinatorial creativity does not equal correctness:
conceptual-blending/bisociation explains why cross-lens blends FEEL
novel, but the cognitive-science tradition has no account of which
blends are true. Borrowing the generative mechanism without a validity
gate reproduces the hallucination problem at scale.
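The "principled exploration budget" in the combinatorial-explosion item above can be made concrete with a bounded expansion: cap the fan-out per node and stop at a global budget. This is a minimal sketch under assumed names (`bounded_expand`, `fan_out_cap`, `budget`); it says nothing about which neighbours are worth visiting, only how to keep the frontier finite.

```python
import random


def bounded_expand(adj, start, depth, fan_out_cap, budget, seed=0):
    """Explore outward from `start`, visiting at most `fan_out_cap` new
    neighbours per node and at most `budget` nodes overall."""
    rng = random.Random(seed)
    seen = {start}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            fresh = sorted(n for n in adj.get(node, ()) if n not in seen)
            for n in rng.sample(fresh, min(fan_out_cap, len(fresh))):
                if len(seen) >= budget:
                    return seen  # global budget exhausted
                seen.add(n)
                next_frontier.append(n)
        frontier = next_frontier
    return seen


# Worst case for explosion: a complete graph on 50 nodes.
dense = {i: {j for j in range(50) if j != i} for i in range(50)}
capped = bounded_expand(dense, start=0, depth=3, fan_out_cap=2, budget=100)
budgeted = bounded_expand(dense, start=0, depth=3, fan_out_cap=2, budget=10)
print(len(capped), len(budgeted))
```

With a fan-out cap of 2 and depth 3, the visited set grows as 1 + 2 + 4 + 8 regardless of how dense the graph is; the budget is a second, independent ceiling. Replacing the uniform `rng.sample` with a sample weighted by a cheap pre-filter score would be the natural next step.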
multi-perspective-agentic-reasoning
The founder's vision — decompose any entity through the full spectrum
of human analytical lenses (philosophical, temporal, causal,
mereological, teleological, ethical, semiotic, etc.) and harvest the
RELATIONSHIPS that emerge at the INTERSECTION of lenses — sits at the
confluence of a deep philosophical lineage and a very active 2023-2026
AI research front. The intellectual root is PERSPECTIVISM (Nietzsche:
knowledge is irreducibly perspectival, and crucially his
methodological perspectivism — "the more affects we allow to
speak about a thing, the more complete will be our concept of it";
Ortega y Gasset; Wittgenstein's aspect-seeing). The engineering root is
Minsky's "Society of Mind" (1986): intelligence as the emergent product
of many simple, specialized, non-intelligent agents. The discovery root
is Don Swanson's Literature-Based Discovery (1986, fish-oil/Raynaud's):
valuable knowledge already exists latently as UNCONNECTED public facts
across disciplinary silos (the A-B-C model), and the payoff is
connecting them — which is almost exactly the founder's "relationships
no human thought to draw because no human holds all the lenses."
Conceptual Blending Theory (Fauconnier & Turner) supplies the
cognitive mechanism for why cross-frame combination is generative rather
than merely additive.
The modern AI realization is the multi-agent / multi-perspective LLM
literature. Du, Li, Tenenbaum & Mordatch's "multiagent debate"
(2023, ICML 2024) showed multiple LLM instances proposing and critiquing
over rounds improves factuality and math/strategic reasoning —
explicitly framed as a "society of minds." Tree-of-Thoughts (Yao et al.
2023) and Graph-of-Thoughts (Besta et al. 2023/AAAI 2024) generalize
single-chain reasoning to branched/graph search with self-evaluation,
lookahead, backtracking, and — in GoT — synergistic recombination of
intermediate thoughts, the structural analog of intersecting
lenses. Solo-Performance-Prompting (Wang et al., NAACL 2024) is the most
direct precursor to the founder's "many lenses on one object": a single
LLM dynamically identifies and simulates multiple task-relevant PERSONAS
("cognitive synergy"), and critically finds that DYNAMICALLY-IDENTIFIED,
fine-grained personas ("Film Expert") beat fixed generic ones ("Expert")
— e.g. 79% vs 38% on Codenames — though synergy only EMERGES at
GPT-4-level capability. Mixture-of-Agents (Wang et al. 2024) layers
proposer LLMs whose outputs are aggregated, beating GPT-4-Omni on
AlpacaEval. CAMEL (Li et al., NeurIPS 2023) and AutoGen operationalize
role-based agent ensembles as infrastructure.
The single most important system for this vision is Google DeepMind's
AI co-scientist (Gottweis et al., arXiv 2502.18864, Feb 2025; Nature 2026).
It is a near-literal instantiation of the
generate→hold-many→curate-the-valuable pipeline the founder describes,
with named specialized agents: a Generation agent proposes hypotheses; a
PROXIMITY agent clusters them specifically so the system does not
collapse into a single line of thinking (the anti-redundancy
mechanism); a Reflection agent acts as virtual peer reviewer scoring
novelty/correctness/rigor; a Ranking agent runs an Elo "idea tournament"
of simulated debates; an Evolution agent recombines and refines top
hypotheses; a meta-review agent feeds back. It produced
experimentally validated novel findings (AML drug-repurposing
candidates with in-vitro tumor inhibition; novel epigenetic
liver-fibrosis targets validated in human organoids; in-silico
rediscovery of an unpublished gene-transfer mechanism). This is concrete
evidence that agentic multi-perspective generation-plus-curation yields
genuinely novel, valuable relationships — not just redundancy.
On the founder's central empirical question — does diverse
decomposition produce EMERGENT INSIGHT or just REDUNDANCY? — the
literature gives a sharp, honest answer: BOTH, and which one you get is
a design problem, not a guarantee. The rigorous theory is the
bias-variance-DIVERSITY decomposition (Wood, Mu, Brown et al., JMLR
2023): an ensemble's expected error = average bias + average variance −
DIVERSITY, where diversity is precisely member DISAGREEMENT. Diversity
is provably valuable, BUT only when members are individually competent
(if "experts disagree very frequently they are individually poor
estimators"). The cautionary 2024-2026 evidence is strong: "Talk Isn't
Always Cheap" (2509.05396) shows debate frequently DEGRADES accuracy —
agents flip from correct to incorrect under social/peer pressure
(conformity dominates truth-seeking), weak agents contaminate strong
ones, and accuracy can fall over rounds. "Representational Collapse in
Multi-Agent LLM Committees" (2604.03809) measured that 3 same-model
agents under different role prompts had mean cosine similarity 0.888 and
effective rank 2.17/3 — i.e. nominal "diversity" via persona prompts can
be largely ILLUSORY. The "tyranny of the majority" / echo-chamber effect
is documented repeatedly. The constructive response is diversity-AWARE
design: diversity-aware message retention (2603.20640), structured
disagreement analysis for uncertainty (DiscoUQ 2603.20975), and the
co-scientist's Proximity-agent clustering — all aimed at PRESERVING
genuine divergence instead of letting it collapse.
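For squared loss and a simple averaging ensemble, the "error = average member error minus diversity" identity reduces to the classical ambiguity decomposition, which can be checked numerically. The sketch below covers only that special case (the JMLR result is stated for a broader family of losses), and the lens predictions are made-up numbers.

```python
from statistics import fmean


def ambiguity_decomposition(preds, target):
    """Return (ensemble_error, avg_member_error, diversity) under squared loss."""
    ensemble = fmean(preds)
    ensemble_error = (ensemble - target) ** 2
    avg_member_error = fmean((p - target) ** 2 for p in preds)
    diversity = fmean((p - ensemble) ** 2 for p in preds)  # member disagreement
    return ensemble_error, avg_member_error, diversity


# Three lenses that disagree, versus three clones of one lens.
diverse = ambiguity_decomposition([2.0, 4.0, 9.0], target=5.0)
clones = ambiguity_decomposition([4.0, 4.0, 4.0], target=5.0)

for label, (err, avg, div) in (("diverse", diverse), ("clones", clones)):
    print(f"{label}: ensemble={err:.3f} avg_member={avg:.3f} diversity={div:.3f}")
```

The clone case shows the redundancy failure mode in miniature: with zero disagreement, the ensemble is exactly as wrong as its members. The diverse case also shows the competence caveat, since its members are individually much worse and only the subtraction rescues the ensemble.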
Foundational works:
Perspectivism (esp. Nietzsche's methodological
perspectivism) — Friedrich Nietzsche (also Ortega y Gasset,
Karl Jaspers; Wittgenstein on aspect-seeing) (1887, On the Genealogy of
Morality III §12): Knowledge is irreducibly perspectival; and crucially
the PRESCRIPTIVE/methodological claim that engaging MORE affects and
viewpoints on a thing yields a more complete concept of it — the
explicit philosophical warrant for 'the more lenses, the fuller the
understanding.' https://en.wikipedia.org/wiki/Perspectivism
The Society of Mind — Marvin Minsky (1986):
Intelligence is an emergent property of a large collection of simple,
specialized, individually non-intelligent agents — the canonical
blueprint for decomposing cognition into many narrow lenses/agents whose
interaction produces understanding. https://en.wikipedia.org/wiki/Society_of_Mind
Literature-Based Discovery & 'undiscovered public
knowledge' (the A-B-C model; fish-oil/Raynaud's) — Don R.
Swanson (1986): Valuable relationships already exist LATENTLY as facts
that are individually published but never CONNECTED across disciplinary
silos; discovery = joining A-B and B-C literatures to surface an unseen
A-C link. This is precisely the founder's 'relationships no one thought
to draw.' https://pmc.ncbi.nlm.nih.gov/articles/PMC7924697/
Conceptual Blending (Conceptual Integration) Theory
— Gilles Fauconnier & Mark Turner (2002 (The Way We Think)): Novel
meaning arises when two distinct mental 'input spaces' (frames/lenses)
are selectively projected into a blended space with emergent structure
not present in either input — the cognitive mechanism for why combining
lenses is generative, not merely additive. https://arxiv.org/pdf/2505.10948
Bias-Variance-Diversity decomposition (A Unified Theory of
Diversity in Ensemble Learning) — Danny Wood, Tingting Mu,
Gavin Brown et al. (2023, JMLR): Formal proof that ensemble error = avg
bias + avg variance − DIVERSITY, where diversity IS member disagreement;
diversity is provably beneficial but only when members are individually
competent. The mathematical answer to 'diverse insight vs. redundancy.'
https://jmlr.org/papers/volume24/23-0041/23-0041.pdf
Modern AI systems:
Multiagent Debate ('society of minds') — Multiple
LLM instances independently propose answers, then read each other's
reasoning and revise over several rounds toward consensus; improves
factuality and math/strategic reasoning. The seminal modern
multi-perspective-reasoning result. [Du/Li/Tenenbaum/Mordatch, ICML
2024; widely replicated; significant gains on GSM8K/strategic reasoning
and reduced hallucination vs single-pass.] https://arxiv.org/abs/2305.14325
Tree of Thoughts (ToT) / Graph of Thoughts (GoT) —
Generalize chain-of-thought to branched (tree) or arbitrary-graph search
over reasoning units with self-evaluation, lookahead, backtracking; GoT
adds synergistic RECOMBINATION and feedback of intermediate 'thoughts' —
the structural analog of combining outputs across lenses. [ToT (Yao
et al. 2023) Game-of-24 success 4%→74%; GoT (Besta et al., AAAI 2024)
improves quality and cuts cost vs ToT on sorting/set tasks.] https://arxiv.org/pdf/2401.14295
Solo Performance Prompting (SPP) — multi-persona
self-collaboration / 'cognitive synergy' — A SINGLE LLM
dynamically identifies and simulates multiple task-relevant personas
that collaborate — the closest existing precursor to 'many analytical
lenses on one object then combine.' [Wang et al., NAACL 2024.
Dynamic fine-grained personas beat fixed generic ones (Codenames 79% vs
38%); reduces hallucination; BUT synergy EMERGES only at GPT-4-level
capability (absent in GPT-3.5/Llama2-13b).] https://aclanthology.org/2024.naacl-long.15/
AI co-scientist (Gemini-based multi-agent) —
Generate→debate→evolve hypothesis engine with named specialized agents:
Generation, Proximity (clusters to PREVENT collapse to one line of
thought), Reflection (peer-review for novelty/rigor), Ranking (Elo 'idea
tournament'), Evolution (recombine/refine), Meta-review. The most
complete instantiation of generate-many / hold-many /
curate-the-valuable. [DeepMind, arXiv Feb 2025, Nature 2026.
EXPERIMENTALLY VALIDATED novel results: AML drug-repurposing (in-vitro
tumor inhibition), novel epigenetic liver-fibrosis targets (validated in
human organoids), in-silico rediscovery of an unpublished gene-transfer
mechanism.] https://arxiv.org/abs/2502.18864
Mixture-of-Agents (MoA) — Layered architecture
where multiple proposer LLMs' outputs are fed to aggregator LLMs that
synthesize a better answer; exploits 'collaborativeness' — aggregation
improves output even when individual auxiliary responses are weaker.
[Wang et al. 2024 (ICLR 2025). Open-source MoA 65.1% vs GPT-4-Omni
57.5% on AlpacaEval 2.0.] https://arxiv.org/abs/2406.04692
CAMEL / AutoGen (role-based agent infrastructure) —
Frameworks for orchestrating ensembles of role/persona-specialized
agents that converse autonomously (CAMEL: inception prompting +
role-playing; AutoGen: programmable multi-agent conversation with
critics/supervisors/tools) — the plumbing for instantiating N lenses as
agents. [CAMEL (Li et al., NeurIPS 2023) and Microsoft AutoGen are
widely adopted open-source multi-agent libraries underpinning much of
the 2024-2026 multi-agent work.] https://arxiv.org/abs/2303.17760
Multi-agent KG construction (CooperKGC / KARMA / multi-view
RE) — Teams of specialized agents (NER, relation extraction,
event extraction; entity-/concept-/mention-view inference) that
collaboratively build knowledge graphs and reconcile/enrich triples with
conflict resolution — directly relevant to lens-decomposition feeding a
substrate. [CooperKGC (arXiv 2312.03022) shows varied-expertise
agents improve KGC; KARMA (2025) adds multi-agent enrichment with
conflict resolution; multi-view RE improves relation extraction
F1.] https://arxiv.org/pdf/2312.03022
Debate/committee FAILURE-MODE & collapse
studies — Empirical work documenting when multi-perspective
debate HURTS: conformity (correct→incorrect flips under peer pressure),
weak-agent contamination, accuracy decay over rounds, and
'representational collapse' where same-model persona agents are
near-identical in embedding space. ['Talk Isn't Always Cheap'
(2509.05396): debate degrades accuracy in many configs.
'Representational Collapse' (2604.03809): 3 same-model agents cosine
0.888, effective rank 2.17/3; persona 'diversity' often illusory.] https://arxiv.org/html/2509.05396v2
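The Elo "idea tournament" pattern mentioned in the co-scientist entry above can be sketched with the standard Elo update. This shows only the rating arithmetic: the `judge` callback stands in for whatever pairwise comparison is used (in the cited system, a simulated debate), and the starting rating, K-factor, and hidden-quality judge below are illustrative assumptions.

```python
from itertools import combinations


def expected_score(r_a, r_b):
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_tournament(items, judge, k=32.0, start=1200.0, rounds=1):
    """Round-robin tournament; `judge(a, b)` must return the winner, a or b."""
    ratings = {item: start for item in items}
    for _ in range(rounds):
        for a, b in combinations(items, 2):
            score_a = 1.0 if judge(a, b) == a else 0.0
            exp_a = expected_score(ratings[a], ratings[b])
            ratings[a] += k * (score_a - exp_a)
            ratings[b] += k * ((1.0 - score_a) - (1.0 - exp_a))
    return ratings


# Hypothetical judge: a hidden quality score decides every comparison.
quality = {"h1": 3, "h2": 2, "h3": 1}
ratings = elo_tournament(list(quality), lambda a, b: a if quality[a] >= quality[b] else b)
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
print(leaderboard)
```

Because each update is zero-sum, the total rating mass is conserved, so ratings stay comparable as new hypotheses join at the starting value. A persistent store could keep the ratings and match history alongside each candidate edge rather than discarding them at the end of a run.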
Relevance to the lens engine: BORROW: (1) The
co-scientist topology is the proven recipe for donto's lens engine — a
Generation phase (run N lens-agents over an entity/text) feeding a
PROXIMITY/clustering step (essential: without it you get redundancy
collapse, not emergent relationships), then Reflection/Ranking via an
Elo idea-tournament to curate the rare valuable cross-lens
relationships, then Evolution to recombine survivors. donto's
paraconsistent substrate is the ideal place to HOLD the
generated-but-unranked tournament population that co-scientist keeps
only in-memory. (2) SPP's strongest, most actionable lesson: DYNAMIC,
task-specific lenses beat a fixed generic list — so rather than
hard-coding the same 6/N philosophical lenses every time, let an agent
pick the fine-grained lenses an entity actually rewards (a treaty
rewards 'legal/temporal/diplomatic-game-theory'; a poem rewards
'prosodic/semiotic/phenomenological'). (3) GoT's
recombination-of-thoughts is the literal mechanism for 'relationships at
the intersection of lenses' — model lens-outputs as graph vertices and
explicitly generate edges BETWEEN them; do not just concatenate per-lens
fact lists. (4) Swanson LBD + conceptual blending are the right framing
for the PAYOFF metric: a valuable output is an A-C relationship surfaced
because lens-A and lens-C share a B-term — instrument for that, not for
raw fact count. AVOID / GUARD AGAINST: (a) Redundancy/representational
collapse: in the cited study, 3 same-model agents under different
persona prompts reached an effective rank of only 2.17/3 (mean cosine
0.888); donto must measure semantic diversity of
lens outputs (effective rank / pairwise distance) and discount
near-duplicates, or genuinely vary models/temperature/tools per lens.
(b) Conformity & 'tyranny of the majority' — do NOT make lenses
debate to consensus; donto's paraconsistent design is a STRENGTH here
precisely because it can preserve minority/contradictory lens-claims as
legal state (hypothesis_only, supports/rebuts/undercuts edges) instead
of collapsing them — this is donto's genuine differentiator over every
debate-to-consensus system. (c) Weak-agent contamination — a low-quality
lens degrades the pool; gate lens-outputs by an individual-competence
check (bias-variance-diversity theory: diversity only helps among
competent members). (d) Cost/emergence floor — SPP shows synergy only
emerges at frontier capability; budget for strong models on the
generation lenses or you'll get redundancy, not insight.
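Guard (a) above asks for a measured, not assumed, diversity of lens outputs. A minimal sketch using mean pairwise cosine similarity and a greedy near-duplicate filter follows; the embedding vectors are hand-written stand-ins (in practice they would come from some embedding model), and the 0.95 threshold is an arbitrary assumption. Effective rank would need a singular-value decomposition and is omitted here.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pairwise_cosine(vectors):
    """Average similarity over all pairs; values near 1.0 signal collapse."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def drop_near_duplicates(named_vectors, threshold=0.95):
    """Keep a lens output only if it is below `threshold` similarity to all kept ones."""
    kept = []
    for name, vec in named_vectors:
        if all(cosine(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((name, vec))
    return [name for name, _ in kept]


lens_outputs = [
    ("legal", [1.0, 0.0, 0.0]),
    ("temporal", [0.99, 0.10, 0.0]),  # nearly the same direction as "legal"
    ("semiotic", [0.0, 0.0, 1.0]),
]
similarity = mean_pairwise_cosine([vec for _, vec in lens_outputs])
survivors = drop_near_duplicates(lens_outputs)
print(f"mean pairwise cosine = {similarity:.3f}; kept = {survivors}")
```

The greedy filter is order-dependent (the first of two near-duplicates wins), so in practice the inputs would be sorted by whatever competence or quality signal is available before filtering.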
Already done vs white space: ALREADY DONE (the
founder should NOT assume 'no one has thought of this'): The core loop —
run many specialized perspectives/agents over a problem, hold a
population of candidate hypotheses, debate/rank/evolve them, and surface
validated novel ones — is fully built and peer-reviewed in the AI
co-scientist (Nature 2026), with WET-LAB-validated novel discoveries.
'Many personas/lenses on one object then combine' is done at the prompt
level by SPP (NAACL 2024) and at the architecture level by MoA, CAMEL,
AutoGen, multiagent debate, and ToT/GoT. The conceptual claim that
latent cross-silo relationships are the prize is 40 years old (Swanson
LBD) and being actively LLM-ified (Elicit, SKiM, the 2024 MDPI LBD work,
hypothesis-generation surveys arXiv 2504.05496). 'Many critical lenses
over one text' is standard literary pedagogy and is being studied for
LLMs (arXiv 2507.11582). Multi-view/multi-agent KG construction with
conflict resolution (CooperKGC, KARMA) overlaps donto's extraction
layer. GENUINE WHITE SPACE (donto's defensible novelty is the
COMBINATION, not any single piece): (1) PERSISTENCE & SCALE — every
system above generates-and-discards within a single session/query; NONE
durably HOLDS the full speculative cross-lens relationship population as
queryable, bitemporal, evidence-anchored legal state across millions of
entities. donto can keep the 99% of machine-proposed relationships that
co-scientist throws away, forever, for later re-evaluation as
lenses/evidence improve. (2) PARACONSISTENT CO-EXISTENCE — every
debate/committee system is consensus-seeking and thus actively destroys
the minority and contradictory readings; donto's
contradiction-preserving substrate with typed argument edges
(supports/rebuts/undercuts) and identity-as-hypothesis is, as far as the
literature shows, UNIQUE as a place to let mutually-contradictory
cross-lens relationship-claims coexist without collapse. (3)
CROSS-ENTITY × CROSS-LENS at substrate scale — the systems above run
many lenses over ONE object; the founder's distinctive move is
harvesting relationships at the intersection of lenses ACROSS millions
of entities simultaneously (a global LBD over a 39M-statement graph).
That global, always-on, lens-indexed serendipity surface does not exist
in the literature. (4) FORMAL CERTIFICATION of the curated survivors —
pairing speculative generation with a Lean-4 overlay that can CERTIFY
the rare valuable relationship's shape/rule is genuinely unexplored
(co-scientist validates in wet labs / Elo, not by formal proof).
Hard problems:
EVALUATION / GROUND TRUTH: a 'relationship no human ever thought of'
has no label set; you cannot measure precision/recall on serendipity.
The field's only proxies are Elo idea-tournaments (co-scientist),
human-rated helpfulness (cross-domain analogy work, median 4/5), and
downstream wet-lab/empirical validation — all expensive, slow, and not
applicable to most of donto's domains (genealogy, etc.).
DIVERSITY-VS-REDUNDANCY (the founder's own central worry):
same-model-under-N-prompts collapses to ~rank-2 'diversity' (cosine
0.888, effective rank 2.17/3); naive N-lens decomposition will produce
mostly redundant facts unless semantic diversity is actively measured
and enforced (model/tool/temperature variation per lens), per
bias-variance-diversity theory.
COMBINATORIAL EXPLOSION at the intersection: relationships at the
intersection of L lenses over E entities scale ~L^2 × E^2 candidate
pairs; finding the rare valuable few is a needle-in-haystack
ranking/pruning problem — co-scientist needs heavy test-time compute and
an Elo tournament just for one problem, let alone a 39M-statement global
sweep.
NOISE & PLAUSIBLE-NONSENSE: LLMs readily generate confident,
fluent, FALSE cross-domain connections (hallucinated analogies/links);
without per-lens competence gating and evidence-anchoring, the substrate
fills with seductive noise that is costly to refute.
CONFORMITY / COLLAPSE in any debate-to-consensus step: agents flip
correct→incorrect under peer pressure and the majority tyrannizes
minorities — so the very mechanism used to 'combine' lenses can destroy
the divergent signal that creates value (the reason donto should hold,
not collapse).
EMERGENCE CAPABILITY FLOOR & COST: cognitive synergy only
appears at frontier-model capability (SPP: GPT-4 yes, GPT-3.5/Llama2
no); running N strong-model lenses over millions of entities is
economically heavy, and weaker lenses actively contaminate the pool
rather than diversify it.
CURATION / TRUST: deciding WHICH of millions of held speculative
relationships to promote toward 'verified' is an open human-in-the-loop
+ provenance + argument-evaluation problem; paraconsistency keeps
everything alive but defers (does not solve) the question of what to
actually believe.
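To make the combinatorial-explosion item concrete, the arithmetic can be sketched as follows. The function name and the example sizes are illustrative assumptions. The count treats every (entity, lens) view as one node and counts unordered pairs of views, which matches L^2 × E^2 up to a constant factor.

```python
def candidate_pairs(num_lenses: int, num_entities: int) -> int:
    """Unordered pairs of (entity, lens) views: n * (n - 1) / 2 with n = L * E."""
    views = num_lenses * num_entities
    return views * (views - 1) // 2


# Six lenses over increasingly large entity sets (sizes are illustrative).
for entities in (1_000, 1_000_000):
    print(entities, candidate_pairs(6, entities))
```

Even at six lenses the count grows quadratically in the number of entities, which is why exhaustive scoring is not an option and some sampling or pruning policy is needed before any ranking step.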
serendipity-novelty-evaluation
This field exists to answer the donto founder's make-or-break
question directly: when a machine proposes a vast number of novel
relationships, how do you tell a profound connection from pareidolia?
Three research traditions converge on it, and all three have already
discovered the same hard truth.
(1) Computational serendipity in recommender systems
is the most mature. The field's consensus decomposition (Kotkov, Wang
& Veijalainen 2016 survey; Murakami 2008; Ge, Delgado-Battenfeld
& Jannach 2010; Adamopoulos & Tuzhilin 2014) is that serendipity
= relevant AND novel AND unexpected/surprising, where each component is
operationalized separately. The standard trick for
unexpectedness is the "primitive prediction model"
(Murakami/Ge): a recommendation is unexpected iff it would NOT have been
produced by an obvious baseline: R_unexp = R \ PM(u). Serendipity score
SRDP then multiplies unexpectedness by usefulness (relevance/rating).
Adamopoulos & Tuzhilin formalize unexpectedness as distance from
a set of expectations E (items the user/system already takes for
granted), explicitly separating it from novelty (unknown) and diversity
(intra-list dissimilarity). The crucial, sobering lesson from this
tradition (Kotkov et al., "The Dark Matter of Serendipity," CHIIR 2024):
serendipity is fundamentally a subjective, experienced event,
yet nearly all systems measure only afforded/observable serendipity
via objective proxies, so the metrics are biased and most genuinely
serendipitous hits are invisible to them. There is no clean offline
ground truth for "valuable surprise."
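A small sketch of the primitive-prediction-model trick described above, under one common reading of SRDP (mean usefulness over the unexpected items). The item names, the usefulness values, and the function names are hypothetical.

```python
def unexpected_items(recommended, primitive):
    """Items the system recommended that the obvious baseline did NOT produce."""
    primitive_set = set(primitive)
    return [item for item in recommended if item not in primitive_set]


def srdp(recommended, primitive, usefulness):
    """Mean usefulness over the unexpected items (0.0 if there are none).
    `usefulness` maps item -> score in [0, 1]; missing items count as 0."""
    unexp = unexpected_items(recommended, primitive)
    if not unexp:
        return 0.0
    return sum(usefulness.get(item, 0.0) for item in unexp) / len(unexp)


recommended = ["link_a", "link_b", "link_c", "link_d"]  # output of the full system
primitive = ["link_a", "link_b"]                        # output of the cheap baseline
usefulness = {"link_a": 1.0, "link_c": 1.0, "link_d": 0.0}

print(unexpected_items(recommended, primitive))
print(srdp(recommended, primitive, usefulness))
```

Note that `link_a` is useful but contributes nothing, because the baseline already produced it; only what survives the subtraction is scored.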
(2) Surprise as a formal quantity. Itti &
Baldi's Bayesian Surprise (NIPS 2006 / Vision Research 2009) is the
canonical operational definition: surprise = KL divergence between an
observer's PRIOR and POSTERIOR beliefs after seeing data,
D_KL(posterior‖prior). It is provably distinct from Shannon
information/rarity (a rare-but-belief-irrelevant event has high Shannon
surprisal but zero Bayesian surprise). Empirically it is "the strongest
known attractor of human attention" (~72–84% of gaze shifts go to
above-average-surprise locations). This has been ported to recommenders
(Hasan & Bunescu, "Topic-Level Bayesian Surprise and Serendipity for Recommender Systems," RecSys
2023) by tracking KL divergence between a user's prior and posterior
topic distributions. Bayesian surprise is the most principled,
substrate-friendly metric available for the lens engine: it is exactly
"how much does this relationship change the model's beliefs."
(3) Literature-Based Discovery (LBD) — Swanson's
1986 Raynaud's/fish-oil and migraine/magnesium discoveries via the ABC
model (A relates to B, B relates to C, A↔︎C unknown → hypothesize A–C) —
is the closest historical analog to "relationships no human drew because
no one held all the lenses." Critically, LBD has spent roughly 40 years
grappling with exactly the founder's evaluation problem and has NOT
solved it. Two evaluation regimes exist, both flawed:
replication (rediscover Swanson's 2–3 known cases —
cherry-picked, no statistical power) and time-slicing
(Yetisgen-Yildiz & Pratt 2009: pick cutoff year t, treat post-t
co-occurrences of A–C absent before t as "discoveries," compute
precision/recall/F/AUC/MAP/MRR). Sebastian/Moreau (Bioinformatics 2023,
"addressing the subpar evaluation methodology") shows time-slicing is
"too noisy": the gold standard is dominated by meaningless
co-occurrences (Ebolavirus + Professional Burnout), the true-discovery
fraction is "unknown and likely low," so the metric rewards
co-occurrence prediction, not insight. There is no agreed
benchmark, no shared task, no formal definition of "a discovery."
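The ABC model and the time-slicing protocol can be sketched in a few lines. This is a toy illustration, not any published LBD system: relations are undirected pairs, the edge lists are invented for the example (`term_x` and `term_y` are placeholders), and the function names are assumptions.

```python
from collections import defaultdict


def abc_candidates(edges):
    """Return {(a, c): set of shared B-terms} for pairs a, c that share at
    least one neighbour B but are not themselves directly related."""
    neighbors = defaultdict(set)
    for x, y in edges:
        neighbors[x].add(y)
        neighbors[y].add(x)
    known = {frozenset(edge) for edge in edges}
    candidates = defaultdict(set)
    for b, linked in neighbors.items():
        ordered = sorted(linked)
        for i, a in enumerate(ordered):
            for c in ordered[i + 1:]:
                if frozenset((a, c)) not in known:
                    candidates[(a, c)].add(b)
    return dict(candidates)


def time_sliced_precision_recall(pre_edges, post_edges):
    """Predict A-C links from pre-cutoff edges; the gold standard is every
    post-cutoff link that was absent before the cutoff."""
    predicted = {frozenset(pair) for pair in abc_candidates(pre_edges)}
    pre_known = {frozenset(edge) for edge in pre_edges}
    gold = {frozenset(edge) for edge in post_edges} - pre_known
    hits = predicted & gold
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall


pre = [("fish oil", "blood viscosity"), ("blood viscosity", "Raynaud's"),
       ("term_x", "term_y")]
post = [("fish oil", "Raynaud's"), ("term_x", "Raynaud's")]

print(abc_candidates(pre))
print(time_sliced_precision_recall(pre, post))
```

The gold set here is built purely from later co-occurrence, so a meaningless post-cutoff pair would count as a "discovery" exactly like a real one. That is the noise problem the critique above describes.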
The unifying finding across all three traditions, plus computational
creativity (Boden's new-surprising-valuable; Ritchie's
novelty/quality/typicality; Lamb et al.'s 2019 survey of evaluation
methods — CAT/Amabile, Colton's tripod, Jordanous's SPECS/components)
and modern LLM-idea studies (Si, Yang & Hashimoto 2024 — 100+
reviewers found LLM ideas MORE novel but LESS feasible/valid; TruthHypo
2025 — explicit novelty↔︎validity tradeoff, high hallucination):
novelty is cheap and mechanizable; value is expensive and
resists automation. Generation is solved; discrimination is
not. At scale this collides with the statistics of multiple comparisons
/ false discovery rate: an engine that proposes millions of cross-lens
links is running millions of implicit hypothesis tests, so the EXPECTED
number of spurious-but-surprising connections is enormous (apophenia by
construction). Without FDR control, calibration, or downstream
validation, "a connection no human ever drew" and "a connection no human
ever drew because it's noise" are indistinguishable.
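For scale: one million tests of true-null links at a 0.05 significance level yield 50,000 expected false positives. A minimal sketch of the Benjamini-Hochberg step-up procedure follows, assuming each candidate link can be given a p-value by some statistical test (which test is an open design question; the p-values below are made up for illustration).

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of accepted discoveries under BH at FDR level q.
    Sort p-values ascending, find the largest rank k with
    p_(k) <= q * k / m, and accept the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Illustrative p-values for ten candidate links.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

print(benjamini_hochberg(p_values, q=0.05))
print([i for i, p in enumerate(p_values) if p <= 0.05])  # naive per-test cutoff
```

The naive per-test cutoff accepts five links while BH accepts two; `q` plays the role of the explicit false-discovery budget.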
Foundational works:
Bayesian Surprise (D_KL posterior‖prior) — Laurent
Itti & Pierre Baldi (2006/2009): The only axiomatically-consistent
definition of surprise: how much an observation moves beliefs (KL
divergence prior→posterior), provably distinct from Shannon rarity. A
belief-relative, model-internal surprise metric — exactly what a lens
engine needs to score 'this relationship changes what the substrate
believed.' http://ilab.usc.edu/publications/doc/Itti_Baldi06nips.pdf
Literature-Based Discovery & the ABC model —
Don R. Swanson (1986-1988): Hidden A–C relationships found via shared
intermediary B that no single human saw because the literatures were
disjoint (Raynaud–fish oil). The direct historical precedent for
cross-lens relationship discovery, and a roughly 40-year warning that
evaluation, not generation, is the bottleneck. https://en.wikipedia.org/wiki/Literature-based_discovery
A Survey of Serendipity in Recommender Systems —
Denis Kotkov, Shuaiqiang Wang, Jari Veijalainen (2016): Canonical
decomposition: serendipity = relevant AND novel AND unexpected; each
component formalized separately. Establishes that you must measure
relevance/novelty/unexpectedness/value as distinct axes, not collapse
them into one 'interestingness' score. https://www.sciencedirect.com/science/article/abs/pii/S0950705116302763
Unexpectedness via a 'primitive prediction model' +
serendipity score SRDP — Tomoko Murakami et al.; Mouzhi Ge,
Carla Delgado-Battenfeld, Dietmar Jannach (2008 / 2010): Operational
unexpectedness = items NOT produced by an obvious baseline (R_unexp =
R \ PM(u)); SRDP = unexpectedness × usefulness. Directly portable: score a
donto relationship as surprising iff a cheap heuristic (co-mention,
embedding similarity, single-lens inference) would NOT have produced it.
https://link.springer.com/chapter/10.1007/978-3-540-78197-4_5
On Unexpectedness in Recommender Systems
(distance-from-expectations) — Panagiotis Adamopoulos &
Alexander Tuzhilin (2014): Formalizes unexpectedness as distance from a
set of expectations E (the already-taken-for-granted), cleanly
separating it from novelty and diversity, and combines it with utility.
Gives the lens engine a principled 'expectation set' to measure surprise
against. https://dl.acm.org/doi/pdf/10.1145/2559952
Serendipitous Information Retrieval (mechanisms to engineer
serendipity) — Elaine G. Toms (2000): Four mechanisms to
provoke serendipity: blind chance (random node), the Pasteur 'prepared
mind' (user/context profile), anomalies via deliberately POOR
similarity, and reasoning by analogy. The 'poor similarity' and
'analogy' mechanisms are the design DNA of a cross-lens discovery
engine. https://www.ercim.eu/publication/ws-proceedings/DelNoe01/3_Toms.pdf
Creativity = new, surprising, valuable;
combinational/exploratory/transformational — Margaret Boden
(formalized by Geraint Wiggins; Graeme Ritchie's criteria) (1990-2007):
Boden's three criteria are the evaluation rubric the whole field reuses;
combinational creativity (novel combination of familiar ideas) is
precisely 'relationships at the intersection of lenses.' Ritchie
reframes value as novelty × quality × typicality — surprise alone is not
enough. https://en.wikipedia.org/wiki/Computational_creativity
Time-sliced evaluation of LBD (gold-standard via future
co-occurrence) — Meliha Yetisgen-Yildiz & Wanda Pratt
(2009): The standard retrospective-rediscovery protocol: cutoff year t,
post-t-but-not-pre-t links are 'discoveries,' score with
precision/recall/F. The best objective evaluation method that exists —
and the paper that exposes why it is still inadequate (most gold links
are noise). https://faculty.washington.edu/melihay/publications/LBDChapter2009.pdf
False Discovery Rate / multiple-comparisons control
— Yoav Benjamini & Yosef Hochberg (1995): When you test millions of
relationships, the expected count of spurious 'significant' findings is
huge; BH-style FDR control bounds the expected proportion of false
positives among accepted discoveries. The mathematically inescapable
governor on any high-volume relationship-discovery engine. https://en.wikipedia.org/wiki/False_discovery_rate
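The Bayesian surprise entry in the list above can be made concrete with a discrete example. This is a sketch under simple assumptions: beliefs are a finite distribution over hypotheses, an observation is summarized by its likelihood under each hypothesis, and the function names are illustrative. It also shows the distinction from Shannon surprisal: a rare observation that is equally likely under every hypothesis has high surprisal but zero Bayesian surprise.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def posterior_from(prior, likelihood):
    """Bayes' rule over a finite hypothesis set."""
    unnormalized = [pr * li for pr, li in zip(prior, likelihood)]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]


def bayesian_surprise(prior, likelihood):
    """D_KL(posterior || prior): how far the observation moved the beliefs."""
    return kl_divergence(posterior_from(prior, likelihood), prior)


def shannon_surprisal(prior, likelihood):
    """-log P(observation): rarity of the observation under current beliefs."""
    return -math.log(sum(pr * li for pr, li in zip(prior, likelihood)))


prior = [0.5, 0.5]
rare_but_irrelevant = [0.01, 0.01]  # unlikely under both hypotheses
belief_shifting = [0.9, 0.1]        # far more likely under the first hypothesis

for name, lik in (("rare", rare_but_irrelevant), ("shifting", belief_shifting)):
    print(name, shannon_surprisal(prior, lik), bayesian_surprise(prior, lik))
```

For a substrate, the hard part is choosing what the hypothesis set and the prior are; the KL computation itself is the easy step.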
Modern AI systems:
SciAgents — Multi-agent
(Ontologist/Scientist/Critic) system that samples RANDOM paths between
distant concepts in a 33K-node ontological knowledge graph and reasons
over the path to propose hypotheses; explicitly uses random (not
shortest) paths to maximize cross-domain surprise; a 'novelty assistant'
agent scores novelty/feasibility via Semantic Scholar lookups (e.g.
novelty 8, feasibility 7). [Ghafarollahi & Buehler, MIT; arXiv Sep 2024, published
Advanced Materials Dec 2024. Generated novel bio-inspired materials
hypotheses. NOTE: authors admit NO systematic ranking across hypotheses
is implemented — filtering 'remains future work.' The single most
architecturally-aligned system to the donto vision.] https://pmc.ncbi.nlm.nih.gov/articles/PMC12138853/
Robin (FutureHouse) — End-to-end multi-agent
discovery: literature synthesis → hypothesis generation → experimental
data analysis → refinement, with downstream wet-lab validation as the
evaluation gate. [Identified ripasudil as a novel candidate for dry
AMD; experimentally validated in patient-derived RPE cells; ~2.5 months
end-to-end. Demonstrates the only fully convincing answer to the
precision problem: downstream empirical validation, not a metric.] https://www.futurehouse.org/research-announcements/demonstrating-end-to-end-scientific-discovery-with-robin-a-multi-agent-system
The AI Scientist (Sakana) — Fully automated
research loop (ideate → experiment → write → review) including an
automated LLM REVIEWER that scores generated papers. [Automated
reviewer hit ~69% balanced accuracy / F1 exceeding NeurIPS-2021
inter-human agreement — but independent eval (arXiv 2502.14297) found
mixed/overstated quality. Evidence that LLM-as-judge can approximate
human filtering for value, with caveats.] https://arxiv.org/abs/2408.06292
Si–Yang–Hashimoto LLM research-ideation study —
Large-scale blind human evaluation (100+ NLP researchers) comparing
LLM-generated vs human research ideas on
novelty/excitement/feasibility/effectiveness. [LLM ideas judged
statistically MORE novel (p<0.05) but slightly LESS feasible; found
LLM self-evaluation fails and generation lacks diversity. The cleanest
empirical proof of the novelty-cheap / value-hard asymmetry.] https://arxiv.org/abs/2409.04109
TruthHypo / hallucination-aware hypothesis
evaluation — Benchmark separating novelty from
truthfulness/grounding in LLM-generated scientific hypotheses,
validating against PubMed. [Documents an explicit novelty↔︎validity
tradeoff: more creative outputs correlate with higher hallucination; a
substantial fraction of generated hypotheses are invalidated against the
literature. Quantifies the 'most generated links are noise' problem for
LLMs.] https://arxiv.org/pdf/2505.14599
Topic-Level Bayesian Surprise for Recommenders —
Applies Itti-Baldi Bayesian surprise (KL divergence between prior and
posterior topic distributions) as the serendipity signal in a
recommender, balanced against a relevance term. [RecSys-era work
showing Bayesian surprise outperforms pure-novelty and pure-relevance
baselines at finding unexpected-yet-valuable items. Demonstrates the
surprise-vs-relevance balancing the lens engine will need.] https://arxiv.org/pdf/2308.06368
Conformal abstention / LLM-as-judge + HITL triage —
Methods to filter generated outputs by confidence: conformal-prediction
abstention with theoretical hallucination-rate guarantees;
self-consistency as a confidence proxy; human-on-the-loop triage of only
high-risk/high-value items. [Conformal abstention (arXiv 2405.01563)
gives provable bounds on accepted-hallucination rate;
KG-validation-with-HITL (IPM 2025) shows hybrid pipelines where
discarded links become negative training examples. The practical toolkit
for 'find the rare gold.'] https://arxiv.org/pdf/2405.01563
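The random-path idea in the SciAgents entry above can be illustrated with a randomized depth-first search. This is a stand-in technique chosen for brevity, not the authors' sampling procedure, and it does not sample uniformly over paths. The toy graph and the function name are assumptions.

```python
import random


def random_simple_path(graph, source, target, rng):
    """Return a simple path from source to target found by depth-first
    search with randomly shuffled neighbour order, or None if none exists.
    `graph` maps node -> set of neighbouring nodes."""
    def shuffled_neighbors(node):
        neighbors = sorted(graph[node])  # sort first so a seed is reproducible
        rng.shuffle(neighbors)
        return iter(neighbors)

    path = [source]
    visited = {source}
    stack = [shuffled_neighbors(source)]
    while stack:
        if path[-1] == target:
            return path
        nxt = next((n for n in stack[-1] if n not in visited), None)
        if nxt is None:  # dead end: backtrack one step
            stack.pop()
            path.pop()
            continue
        visited.add(nxt)
        path.append(nxt)
        stack.append(shuffled_neighbors(nxt))
    return None


# Toy concept graph (hypothetical nodes and edges).
graph = {
    "silk": {"protein", "fiber"},
    "protein": {"silk", "folding"},
    "fiber": {"silk", "tension"},
    "folding": {"protein", "music"},
    "tension": {"fiber", "music"},
    "music": {"folding", "tension"},
    "isolated": set(),
}

print(random_simple_path(graph, "silk", "music", random.Random(0)))
```

Different seeds can route through different intermediate concepts, which is the point: the intermediate nodes are the raw material handed to the reasoning agents, so a shortest-path search would keep returning the same obvious bridge.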
Relevance to the lens engine: BORROW: (1) The
decomposition discipline — never score a relationship with one number.
Carry relevance, novelty (unknown-ness), unexpectedness (distance from
an expectation set E), and value as SEPARATE axes
(Kotkov/Adamopoulos-Tuzhilin). donto's bitemporal + evidence-first
design already lets you compute novelty cheaply (is this triple absent
from the substrate?) and unexpectedness via the 'primitive prediction
model' trick (Murakami/Ge): a cross-lens link is surprising iff a cheap
single-lens baseline would NOT have produced it — flag exactly the links
that survive that subtraction. (2) Bayesian surprise (Itti-Baldi) is the
ideal native scorer: D_KL between the substrate's belief BEFORE and
AFTER admitting a hypothesis edge measures 'how much does this
relationship change what donto believes' — belief-relative, not mere
rarity, and it composes naturally with paraconsistency (a
contradiction-inducing edge is maximally surprising). (3) SciAgents is
the proof-of-concept of your exact mechanism — random paths between
distant nodes to manufacture cross-domain surprise — so adopt its agent
topology (Ontologist→Scientist→Critic) but FIX its admitted gap: it has
no cross-hypothesis ranking/filtering. (4) Robin is the north star for
the verification end: the only unambiguous serendipity metric is
downstream validation; design donto's 'rare valuable' curation tier
around external grounding (the evidence-anchor-to-source-byte and Lean-4
certification overlays are precisely the right substrate for this). (5)
Toms' 'poor similarity' and 'analogy' mechanisms and Boden's
combinational creativity legitimize the intersection-of-lenses thesis
intellectually. AVOID: (1) Treating volume as the goal — every tradition
shows novelty is cheap and the bottleneck is value-discrimination; a
million unanchored links is the disease, not the cure. Hold them as
hypothesis_only (donto already does this) but never surface them
un-triaged. (2) Believing offline metrics certify value — time-sliced
LBD evaluation rewards co-occurrence prediction, not insight; the 'dark
matter' critique shows objective serendipity proxies miss most real
serendipity. Treat any automatic 'interestingness' score as a triage
filter, never a verdict. (3) Ignoring multiple comparisons — at your
scale FDR is not optional; an engine proposing millions of links
manufactures apophenia by construction. Make the false-discovery budget
an explicit, tunable parameter and recycle rejected links as negatives
(HITL-KG pattern). (4) LLM self-evaluation as the final gate — Si et al.
and TruthHypo show models over-rate their own novel-but-invalid
outputs.
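A minimal sketch of the "separate axes, triage not verdict" discipline from BORROW (1) and AVOID (2). All names, thresholds, and the representation of a link as a tuple are hypothetical; relevance and grounding are assumed to come from external scorers that are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkScores:
    relevance: float   # in [0, 1], from some relevance model (assumed)
    novel: bool        # the link is absent from the substrate
    unexpected: bool   # a cheap single-lens baseline did NOT produce it
    grounding: float   # in [0, 1], strength of anchored evidence (assumed)


def score_link(link, substrate, baseline_links, relevance, grounding):
    """Keep the axes separate instead of collapsing them into one number."""
    return LinkScores(
        relevance=relevance,
        novel=link not in substrate,
        unexpected=link not in baseline_links,
        grounding=grounding,
    )


def triage(scores, min_relevance=0.5, min_grounding=0.5):
    """Return 'review' or 'hold'. Deliberately never returns 'verified':
    an automatic score only decides what a curator looks at next."""
    if (scores.novel and scores.unexpected
            and scores.relevance >= min_relevance
            and scores.grounding >= min_grounding):
        return "review"
    return "hold"


substrate = {("treaty_1", "signed_in", "city_a")}
baseline_links = {("treaty_1", "mentions", "poem_2")}  # e.g. plain co-mention

link = ("treaty_1", "echoes_metre_of", "poem_2")
scores = score_link(link, substrate, baseline_links, relevance=0.7, grounding=0.6)
print(scores, triage(scores))
```

Links that land in "hold" stay in the substrate as hypothesis_only; nothing is deleted by the triage step.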
Already done vs white space: ALREADY DONE (do not
reinvent): (a) The conceptual decomposition of serendipity into
relevance/novelty/unexpectedness/value, with formal metrics for each
(Kotkov, Murakami, Ge, Adamopoulos-Tuzhilin). (b) A principled,
belief-relative surprise metric (Itti-Baldi Bayesian surprise) and its
recommender port. (c) The exact generative mechanism the founder
describes — sampling random paths between distant concepts in a
knowledge graph to surface 'connections no one drew' — is implemented
and published (SciAgents). (d) Retrospective time-sliced evaluation of
discovery systems (Yetisgen-Yildiz & Pratt) and the
multiple-comparisons/FDR machinery. (e) End-to-end agentic discovery
with real wet-lab validation (Robin) and large human studies of AI-idea
novelty (Si et al.). So 'agents that decompose and propose cross-domain
links and score their novelty' is NOT white space; it is a crowded,
~40-year lineage (LBD) plus a 2024-2026 agentic wave. GENUINE WHITE
SPACE — the defensible combination: (1) MANY LENSES SIMULTANEOUSLY AS
THE GENERATIVE SUBSTRATE. Every prior system uses ONE representation (a
citation graph, one ontology, topic vectors). Nobody systematically
decomposes each entity through the full spectrum of analytical lenses
(mereological, teleological, semiotic, phenomenological, ethical,
ecological...) and then mines the INTERSECTIONS across lenses for
relationships. The lens-cross-product as the search space is novel. (2)
A PARACONSISTENT, CONTRADICTION-PRESERVING HOLDING TANK. Every prior
system must commit, prune, or collapse contradictory hypotheses; donto
can legally hold mutually-contradictory machine-proposed relationships
forever as hypothesis_only with typed supports/rebuts/undercuts argument
edges and a contradiction frontier. This dissolves the field's worst
constraint: you no longer must decide value at generation time — you can
accumulate speculative links and let evidence/curation arrive
asynchronously. No serendipity, LBD, or creativity system has this. (3)
IDENTITY-AS-HYPOTHESIS + EVIDENCE-ANCHORING + LEAN-CERTIFICATION as the
verification pipeline that the entire field is MISSING (it is the
unsolved 'value' problem). The white space is not generating
relationships — it is the principled architecture for HOLDING millions
speculatively and VERIFYING the rare valuable ones with byte-level
evidence and machine-checkable proof. That triage/verification layer
over a many-lens generator is unexplored.
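The holding-tank idea in (2) can be sketched as a small in-memory data structure. This is an illustration of the concept only: the class, field, and method names are hypothetical and do not describe donto's actual schema or API.

```python
from dataclasses import dataclass, field

EDGE_TYPES = ("supports", "rebuts", "undercuts")


@dataclass
class Claim:
    claim_id: str
    text: str
    lens: str
    status: str = "hypothesis_only"               # held, not believed
    evidence: list = field(default_factory=list)  # anchors to source material


class HoldingTank:
    """Holds claims without forcing consistency between them."""

    def __init__(self):
        self.claims = {}
        self.edges = []  # (source_id, edge_type, target_id)

    def hold(self, claim):
        self.claims[claim.claim_id] = claim

    def link(self, source_id, edge_type, target_id):
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        for claim_id in (source_id, target_id):
            if claim_id not in self.claims:
                raise KeyError(claim_id)
        self.edges.append((source_id, edge_type, target_id))

    def contested_pairs(self):
        """Pairs joined by a 'rebuts' edge; both sides remain held."""
        return [(s, t) for s, e, t in self.edges if e == "rebuts"]

    def review_queue(self):
        """Claims with at least one evidence anchor, for asynchronous curation."""
        return [c.claim_id for c in self.claims.values() if c.evidence]


tank = HoldingTank()
tank.hold(Claim("c1", "X caused Y", lens="causal"))
tank.hold(Claim("c2", "X merely preceded Y", lens="temporal",
                evidence=["doc-17, bytes 120-180"]))
tank.link("c2", "rebuts", "c1")

print(tank.contested_pairs(), tank.review_queue())
```

Adding the rebuttal removes neither claim. Value is decided later, when evidence or a curator arrives, which is the asynchronous property the paragraph above describes.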
Hard problems:
No ground truth for 'valuable surprise.' Value is subjective,
experienced, and context/time-dependent (Kotkov 'Dark Matter'); offline
proxies measure afforded not experienced serendipity. You cannot certify
profundity with a metric — only triage with one.
The precision/pareidolia problem at scale. A many-lens cross-product
generates a combinatorial explosion of candidate links;
multiple-comparisons statistics guarantee a huge expected count of
spurious-but-surprising connections. Distinguishing a profound link from
apophenia requires FDR control AND external grounding that automatic
surprise scores cannot provide.
Novelty↔︎validity tradeoff. Maximizing surprise/novelty mechanically
increases hallucination and invalidity (TruthHypo; Si et al. found LLM
ideas more novel but less feasible). The most surprising link is
disproportionately likely to be wrong.
Surprise vs. relevance/value disentanglement. Bayesian surprise
rewards ANY belief-shifting observation, including errors and noise — a
maximally surprising edge may be maximally wrong. You must combine
surprise with a value/grounding term, and there is no consensus
weighting.
Evaluation methodology itself is unsolved. LBD's two regimes
(replication = no statistical power; time-slicing = rewards
co-occurrence prediction not insight) are both inadequate; there is no
shared benchmark, no formal definition of 'a discovery,' no community
standard (Sebastian/Moreau 2023).
LLM self-evaluation is unreliable. Models over-rate their own
novel-but-invalid outputs (Si et al.; AI Scientist critiques), so the
cheap automated critic cannot be the final gate; some human or empirical
validation is unavoidable, which caps throughput.
Defining and bounding the 'expectation set' E. Unexpectedness needs
a reference model of what is already expected (Adamopoulos-Tuzhilin);
for an open many-lens substrate this set is enormous and ill-defined,
making 'unexpected' hard to compute consistently.
The combinatorics of lenses. The number of entity×lens×entity×lens
intersections grows explosively; you need a principled
sampling/prioritization strategy (SciAgents uses random paths but admits
no ranking) or the engine drowns in its own output before any
triage.