# The Lens Engine — Research Appendix (raw findings)
_Companion to the lens-engine report. Structured output of the 10-area study + 4 adversarial critiques (2026-06-01)._

---

## Critiques (adversarial)

### PARTIALLY-HOLDS (confidence 0.72)
**Claim:** Agentic MANY-LENS decomposition (philosophical/linguistic/temporal/causal/ethical/...) discovers genuinely NOVEL and VALUABLE inter-entity relationships that existing methods (KG embedding link-prediction, literature-based discovery, analogy mining) do NOT already find.

- **Strongest support:** The specific COMBINATION is genuine white space, even though every component is old. "Many lenses" is faceted classification (Ranganathan/Bliss) plus multi-viewpoint ontologies; "hold contradictory perspectives in contexts" is Cyc microtheories; "discover hidden cross-domain links" is Swanson LBD (1986) and Hope/Kittur/Shahaf analogy mining; and "agentic ontology-KG hypothesis generation revealing hidden interdisciplinary relationships" is SciAgents (Buehler/MIT 2024). So the discovery idea is well-trodden. BUT none of the discovery systems persists its speculative output as durable, contradiction-preserving, evidence-anchored legal state. SciAgents (arxiv 2409.05556) generates hypotheses, writes them up as prose, and discards them; LBD/analogy engines rank candidates but do not hold them indefinitely as defeasible typed argument edges (supports/rebuts/undercuts); Cyc had the contradiction-tolerant store (microtheories) but no agents, and famously collapsed under its manual curation load. donto is a real, running bitemporal paraconsistent quad store (~39.5M stmts; tx_time, overlays, contradiction frontier, evidence_links, and multi-aperture extraction with the 6 named lenses, all present in the live codebase at /mnt/donto-data/workspace/donto-memory). The novel and defensible thesis is the PIPELINE: generate astronomically many machine-proposed relations across lenses, HOLD them without forcing consistency, evidence-anchor each one, and verify/curate (Lean overlay) the rare valuable ones. That generate-hold-verify loop on a paraconsistent substrate is not something the cited prior art does.
- **Strongest counterargument:** The claim's load-bearing words are "novel AND valuable," and the best evidence says many-lens agentic generation buys NOVELTY at the direct expense of VALUE. Si & Hashimoto's Ideation-Execution Gap (arXiv 2506.20803, 2025): LLM ideas were rated significantly MORE novel than expert ideas (5.78 vs 4.91), but after 43 experts each spent 100+ hours executing them, LLM scores collapsed (overall -1.976, effectiveness -1.879, novelty -1.049) while human ideas barely moved, with both converging to ~4.7-4.9; i.e., the apparent novelty advantage was an evaluation artifact, not realized value. Biomedical hypothesis-generation work (arXiv 2505.14599) finds that LLMs produce high false-positive rates and that precision comes only at a heavy recall cost. This is the multiple-comparisons / spurious-correlation deluge (Calude & Longo): cross any N lenses over M entities and the count of "connections no human ever drew" explodes combinatorially, and almost all of them are noise; "no one thought to draw this" is the EXPECTED signature of a spurious link, not of a discovery. Sutton's Bitter Lesson cuts deeper: a hand-authored taxonomy of "philosophical/linguistic/teleological/semiotic..." lenses is exactly the engineered human structure that scaled, end-to-end learned representations have repeatedly outclassed; the lenses may be scaffolding that a frontier model already internalizes. And SciAgents (2024) already shipped the headline claim ("reveals hidden interdisciplinary relationships previously considered unrelated, surpassing human-driven research"), so the differentiator is reduced to the substrate, which speaks to HOLDING and CURATING claims, not to whether the discoveries are novel-and-valuable. The substrate makes you better at storing and triaging a firehose of mostly-false machine guesses; it does not raise their base rate of being true or useful.
- **What must be true:** For the claim to hold rather than merely partially-hold, the founder must demonstrate, with adversarial evaluation, that: (1) PRECISION/VALUE, not just rated novelty: a measurable, defensible hit-rate of lens-intersection relationships that survive independent verification (Lean certification, a held-out source byte, or domain-expert/experimental confirmation), at a rate beating a strong baseline of (a) a single frontier LLM prompted directly for cross-domain links and (b) KG-embedding link-prediction + LBD/analogy mining on the same corpus. Beating the bitter-lesson baseline (a) is the crux; if a plain large model finds the same valuable links without the lens taxonomy, the lenses are dead weight. (2) The many-lens decomposition adds INCREMENTAL discoveries that the single-pass baseline misses; i.e., value lives specifically at lens INTERSECTIONS, shown by ablation (drop lenses, value drops). (3) A working triage/curation mechanism keeps verification cost sublinear as generated relations explode, so the spurious-deluge / multiple-comparisons problem is controlled (FDR-style discipline, not eyeballing); otherwise the paraconsistent store just becomes a landfill of unfalsifiable speculation. (4) "Valuable" is operationalized against real downstream users (the genealogy/legal/medical consumers), not self-rated novelty. Conditions under which it FAILS: if the ideation-execution gap reproduces (rated novelty does not survive execution as realized value); if a vanilla frontier model matches it (bitter lesson); if the false-positive rate makes curation cost scale with generation (spurious deluge); or if the substrate's contradiction-preservation degenerates into hoarding noise nobody ever falsifies.
- **Evidence:** https://arxiv.org/abs/2506.20803; https://arxiv.org/html/2506.20803v1; https://arxiv.org/abs/2409.05556; https://en.wikipedia.org/wiki/Bitter_lesson; https://www.di.ens.fr/users/longo/files/BigData-Calude-LongoAug21.pdf; https://arxiv.org/html/2505.14599v1; https://www.kdd.org/kdd2017/papers/view/accelerating-innovation-through-analogy-mining; https://arxiv.org/pdf/1812.06974
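
Point (3) above calls for FDR-style discipline rather than eyeballing. A minimal sketch of the Benjamini-Hochberg step-up procedure, assuming each candidate relation has already been given a p-value by some external verification test (that test, and the function name below, are hypothetical and not part of donto):

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected at false-discovery-rate level q.

    Benjamini-Hochberg step-up: sort p-values ascending, find the largest
    rank k with p_(k) <= (k / m) * q, and reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff = rank  # keep scanning: step-up uses the LARGEST passing rank
    return sorted(order[:cutoff])


# Toy p-values for four machine-proposed relations (illustrative only).
p = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(p, q=0.05))  # [0]
# Uncorrected testing at alpha = 0.05 would have accepted indices 0, 1 and 2.
```

BH controls the FDR under independence or positive dependence. Lensed decompositions can induce heavy dependence between tests; the standard conservative fallback there is the Benjamini-Yekutieli variant, which divides q by the harmonic sum 1 + 1/2 + ... + 1/m.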

### PARTIALLY-HOLDS (confidence 0.72)
**Claim:** "The discovery signal is real and not fatally drowned by combinatorial noise — there is a workable precision/ranking story (novelty × plausibility × value) that surfaces the rare gold rather than producing a hallucinated mess."

- **Strongest support:** The signal is real and a working precision/ranking pipeline already exists. Google's AI Co-Scientist (Feb 2025) produced hypotheses later confirmed in independent wet-lab work in three biomedical areas (AML drug-repurposing candidates inhibiting tumor viability at clinical concentrations; an independently validated cf-PICI phage-tail host-range mechanism; epigenetic liver-fibrosis targets confirmed in human hepatic organoids). Its precision mechanism IS the claim's novelty × plausibility × value formula instantiated: self-play debate → Elo tournament ranking (Elo empirically correlated with correctness) → evolution → external confirmation. Separately, unsupervised gene-disease embeddings predicted associations ~10 years before publication (IFIH1–Aicardi-Goutières scored 0.925 by a 2004 model with zero corpus co-occurrence), showing that latent cross-domain relationships no human had drawn can be extracted computationally. donto's paraconsistent, evidence-anchored, hypothesis-as-state substrate is a genuinely under-explored and apt fit for the GENERATE-and-HOLD half of the pipeline (production paraconsistent KGs at 39M statements are rare; the literature is mostly theory/prototypes).
- **Strongest counterargument:** The base-rate/combinatorial problem is the killer, and 40 years of Literature-Based Discovery (the founder's exact idea, minus the LLMs) is the cautionary precedent. The 2023 Bioinformatics critique of LBD evaluation (pmc.ncbi.nlm.nih.gov/articles/PMC9945845) shows that against large candidate sets "very few co-occurrences represent a true discovery and the vast majority are meaningless," the true-discovery proportion is "unknown and likely low," and Kostoff (2007, pubmed 17616484) found that several canonical LBD "discoveries" were not in fact discoveries. After four decades the validated trophy case is still essentially Swanson's two hand-curated findings (fish-oil/Raynaud's, magnesium/migraine), found by a careful human reasoner, NOT by combinatorial enumeration. The "many lenses → intersection" move makes this strictly WORSE, not better: it multiplies the candidate space, and multiple-testing math is brutal (test enough pairs and spurious "significant" hits approach certainty; FDR can even blow up under the heavy feature dependence that lensed decompositions create). The novelty/precision tradeoff is empirically adverse, not neutral: in serendipity recommenders, novelty (N-nDCG) and accuracy (nDCG) are directly negatively correlated, so you cannot crank novelty without paying in precision. The "embeddings already capture this" objection bites hard: the gene-disease result was achieved with plain gradient-boosting on concatenated embeddings, with NO symbolic lenses and NO paraconsistency, implying the bitter lesson applies (geometry surfaces latent structure more cheaply than hand-built analytical scaffolding).
And every cited authority warns that LLM self-validation does NOT certify truth ("even unanimous agreement among SOTA LLM critics does not guarantee scientific accuracy"; arxiv.org/html/2504.05496v1) — so without an external truth signal the ranking degenerates into ranking the model's own plausibility prior, i.e., confident hallucination. donto's Lean-4 overlay only certifies shape/consistency, not empirical truth, so it cannot supply the missing external signal. Finally, "no one has thought to do this" is false: LBD, conceptual blending/bisociation (Koestler/Boden), serendipity engines, KG link prediction, and LLM co-scientists have all worked the pieces.
- **What must be true:** For the claim to hold at scale, ALL of the following must be true: (1) Ranking must be tied to an EXTERNAL truth/value signal (wet-lab assay, market test, expert adjudication, citation/realization outcome), not LLM self-critique alone; the AI Co-Scientist only works because Elo PRIORITIZES and an external experiment CONFIRMS. (2) The domain must have a cheap, fast verifier or a high enough base rate of real relationships that a final-stage filter can achieve usable precision; in verifier-poor or low-base-rate domains the multiple-testing math wins and you get a hallucinated mess. (3) "Many lenses" must be used to ADD evidence/constraints that raise the precision of plausibility scoring (context-based ABC LBD lifted precision from 27% to ~89-97% by adding biological context), NOT merely to enumerate more candidate pairs; i.e., lenses must be a filtering prior, not a generator of combinatorial volume. (4) The economics must clear: the expected evaluation cost per true hit (cost-per-candidate-evaluated ÷ precision) must be less than the value of that hit; equivalently, cost-per-candidate × candidates must be less than candidates × precision × value-per-hit. At low precision and high fan-out this is the silent business killer. (5) The system must beat the cheap baseline (unsupervised embedding link-prediction over the same corpus) by enough to justify the agentic many-lens overhead; if plain embedding geometry surfaces the same latent relations (as the gene-disease result suggests), the bitter lesson refutes the elaborate symbolic pipeline. (6) Novelty must be scored jointly with plausibility AND value, accepting the empirically real novelty/precision tradeoff, with a human or experiment as the last-mile gate. Where these hold (grounded science with assays, like AI Co-Scientist) the claim holds; where they fail (open-domain "all entities × all lenses" with only LLM self-validation and no external verifier) it is refuted by the LBD track record and multiple-testing statistics.
- **Evidence:** https://pmc.ncbi.nlm.nih.gov/articles/PMC9945845/; https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/; https://pmc.ncbi.nlm.nih.gov/articles/PMC11957060/; https://arxiv.org/html/2504.05496v1; https://pmc.ncbi.nlm.nih.gov/articles/PMC6481912/; https://pubmed.ncbi.nlm.nih.gov/17616484/; https://en.wikipedia.org/wiki/Bitter_lesson; https://encyclopedia.arabpsychology.com/data-snooping/
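
The break-even condition in point (4) and the "spurious hits approach certainty" argument are both short pieces of arithmetic. A small sketch with made-up illustrative numbers; the function names are hypothetical, and the false-positive calculation assumes independent tests under the null:

```python
import math


def cost_per_true_hit(cost_per_candidate, precision):
    """Expected evaluation spend needed to surface one true relation."""
    return math.inf if precision <= 0 else cost_per_candidate / precision


def funnel_clears(cost_per_candidate, candidates, precision, value_per_hit):
    """True when the expected value of the hits exceeds total evaluation cost.

    `candidates` cancels out: the funnel clears iff
    precision * value_per_hit > cost_per_candidate, so extra fan-out only
    helps if precision does not fall as fan-out grows.
    """
    total_cost = cost_per_candidate * candidates
    expected_value = candidates * precision * value_per_hit
    return expected_value > total_cost


def p_any_false_positive(num_tests, alpha=0.05):
    """Chance of at least one spurious 'significant' hit among independent null tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


# Illustrative numbers only (units arbitrary).
print(cost_per_true_hit(2.0, 0.001))                 # about 2000 per true hit
print(funnel_clears(2.0, 1_000_000, 0.001, 1000.0))  # False
print(funnel_clears(2.0, 1_000_000, 0.01, 1000.0))   # True
print(round(p_any_false_positive(100), 3))           # 0.994
```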

### PARTIALLY-HOLDS (confidence 0.72)
**Claim:** donto's paraconsistent + evidence-anchored + bitemporal + identity-as-hypothesis substrate is genuinely the RIGHT home for machine-generated relationship-hypotheses — materially better than a vector DB + reranker — because it can hold contradictory speculative edges, anchor them to evidence, and certify/verify the survivors.

- **Strongest support:** The claim is architecturally sound and rests on a deep, legitimate intellectual lineage that the "vector DB + reranker" frame genuinely cannot match on its stated job. (1) The function the substrate must serve — hold many speculative, mutually-contradictory relationship edges WITHOUT collapsing, keep each pinned to its source, and let identity/merge be a query-time weighted hypothesis rather than a hard key — is exactly what a vector DB + reranker is bad at. An ANN index over embeddings stores points and returns nearest neighbors by cosine similarity; it has no native notion of a typed edge, no first-class "this claim attacks that claim," no provenance-on-the-assertion, and no way to assert two contradictory things as co-equal legal state. You would have to bolt all of that on in a metadata store anyway — at which point you have rebuilt a graph/claim store. KG-embedding research itself documents that embeddings learn composition implicitly and "cannot offer logical inference paths as support evidence," and that high-performing KGE models give divergent, unstable triple-level predictions (link.springer.com/chapter/10.1007/978-3-032-25156-5_11; madoc.bib.uni-mannheim.de/66365/1/TGDK.1.1.4.pdf). 
(2) Every component is independently validated, production-proven prior art that donto correctly composes rather than invents: paraconsistent belief revision / LFI (arxiv.org/html/2412.06117) for non-explosive contradiction; bipolar/weighted argumentation frameworks (Dung, Toulmin/Pollock supports-rebuts-undercuts) for typed argument edges (dl.acm.org/doi/abs/10.1613/jair.1.12394); nanopublications, which are LITERALLY assertion+provenance+publication-info RDF and have a 2024+ line of work using provenance to detect and explain CONTRADICTORY research claims (ieeexplore.ieee.org/document/9582393; link.springer.com/article/10.1007/s00799-025-00431-x); RDF-star / named graphs / reification for evidence-on-statements in production stores (ontotext.com/knowledgehub/fundamentals/what-is-rdf-star); Cyc microtheories for context-relative, prima-facie-contradictory truth (en.wikipedia.org/wiki/Cyc); and Standpoint Logic, which is purpose-built so "multiple viewpoints [can] be integrated into the same ontology, even when certain viewpoints may hold contradicting beliefs" (researchgate.net/publication/357533078), a near-exact formalization of donto's "lens at query time." (3) The validation half is also real and converging: POPPER / agentic sequential falsification (openreview.net/forum?id=iTevNo8PzG) shows that automated, statistically controlled hypothesis falsification is now feasible, performing comparably to human experts in roughly a tenth of the time; so donto's "certify/verify the survivors" step (Lean-4 overlay + falsification) is not fantasy. I verified the substrate is real and at the claimed scale: 39,496,776 current statements, genuinely bitemporal (tx_time, valid_time as Postgres ranges), with the context column functioning as named graphs. So as a STORAGE-AND-CURATION substrate for contradictory, evidence-anchored, identity-fluid claims, donto is materially better-suited than a vector DB + reranker, and the combination is defensible.
- **Strongest counterargument:** The claim quietly conflates a CAPABILITY (the schema CAN hold these things) with a REALITY (it DOES, at scale, in a way that beats the alternative on a job that matters), and on the reality the evidence is damning. (1) The substrate is almost empty of the very features the claim leans on. I grep-counted the 39.5M live statements for any supports/rebuts/undercuts/hypothesis/evidence/contradiction/sameAs predicate: the query returned NOTHING. The store is ~96% genealogy facts (ctx:genealogy/research-db alone is 21.8M); the argument-edges, contradiction-frontier, and evidence-anchors are designed affordances, not populated, exercised state. The schema itself is a plain quad+bitemporal table (statement_id, subject, predicate, object_iri, object_lit, context, tx_time, valid_time, flags, content_hash) — the paraconsistent/argument machinery lives in higher layers and conventions, not as load-bearing, query-optimized columns. The user's own memory corroborates: "only 3 of ~80 Caroline-line kinship triples have evidence_links, all in test contexts" and "donto-pg ex:kitty is a junk-drawer URI." So the differentiator vs. a vector DB is, today, largely aspirational. (2) The whole value chain is gated by a step the substrate does NOT improve and may worsen: validation. The combinatorial-explosion / multiple-comparisons / data-dredging literature is unanimous that any sufficiently large cross-product of entities × lenses yields overwhelmingly spurious "relationships" (en.wikipedia.org/wiki/Data_dredging; tylervigen.com/spurious-correlations), and LLM extraction is "prone to hallucination and produce[s] hypotheses in volumes that make manual validation impractical" (emergentmind.com/topics/automated-hypothesis-generation). donto's headline result — "697 facts from 'cat is red'", "483 facts from one sentence" — is precision-blind over-extraction, exactly the failure mode the KG-construction literature warns about (arxiv.org/pdf/2508.03438). 
Holding a billion contradictory speculative edges "forever as legal state" without collapsing is not obviously a feature; it is deferred cost — it converts a precision problem into a curation/ranking problem that the substrate makes no easier, while paying graph-write and storage tax that a vector store would not. (3) The bitter lesson + "embeddings already capture this" cut hard: cross-domain analogy / latent-connection discovery — the actual payoff the founder wants — is already demonstrably done by LLMs over learned representations (arxiv.org/pdf/2302.12832; arxiv.org/pdf/2211.15268), with NO hand-specified lens taxonomy and NO symbolic substrate. An agent can be prompted "find a non-obvious connection between X and Y" and traverse semantic space directly; the 10 hand-authored lenses (entities/properties/relations/temporal/spatial/provenance/pragmatics/domain/inferential/quantitative — which I confirmed in extraction.py) are precisely the kind of handcrafted human structure Sutton predicts gets out-scaled. And the most direct historical precedent — Cyc, a hand-built, microtheory-based, contradiction-tolerant, context-relative ontology that consumed 2000+ PhD-years — is the canonical cautionary tale for this entire architecture. Finally, the lenses as currently built do NOT do the one thing the vision is about: I read extraction.py — each lens runs INDEPENDENTLY and facts are deduped across passes; there is no cross-lens INTERSECTION / relationship-discovery step at all. The discovery engine the claim presupposes does not yet exist in the code; the substrate is being justified by a generator that hasn't been written.
- **What must be true:** For the claim to hold rather than merely partially-hold, ALL of the following must become true: (1) The differentiating machinery must be POPULATED and load-bearing, not just schema-possible: a non-trivial fraction of relationship edges must actually carry typed argument links (supports/rebuts/undercuts), evidence-anchors to source bytes, and contradiction-frontier membership — and queries must routinely use them. (Today: ~0 such predicates in 39.5M statements.) (2) A real cross-lens RELATIONSHIP-DISCOVERY step must exist and produce edges, not just per-lens fact extraction that is then deduped (the current extraction.py does the latter). The serendipity payoff lives entirely in the intersection step that isn't built. (3) The economics of holding contradictions must beat collapsing them: i.e., there must be queries/workflows where keeping mutually-contradictory speculative edges alive demonstrably yields better answers than a vector-DB-plus-metadata baseline that resolves or scores them — and the genealogy domain (descendant-restricted records, irreducibly conflicting witnesses, legal/native-title stakes where preserving every source-attestation paraconsistently is the actual requirement) is the strongest case where this is plausibly true. (4) The validation/curation funnel must have a working ranker + falsifier (Lean-4 shape-certification + POPPER-style sequential falsification) that turns the over-generated mass into a small set of survivors at acceptable precision — because without it, the substrate just stores more noise more expensively. (5) The defensibility must rest on the COMBINATION-at-scale + the curation funnel + a domain where contradiction-preservation is a hard requirement, NOT on any single component (each of which — paraconsistency, argumentation edges, nanopub provenance, standpoint/microtheory contexts, bitemporality — is well-established prior art the founder should cite and reuse, not claim as net-new). 
The genuine white space is narrow but real: an AGENTIC, machine-GENERATED, paraconsistent claim store with first-class evidence-anchoring and a formal (Lean) verification overlay, operated at scale on a domain (genealogy/native-title) whose evidentiary structure actually demands contradiction-preservation. If that white space is executed, the claim holds; if the project stays a high-recall fact-dumper on an unused argument schema, it is refuted.
- **Evidence:** https://arxiv.org/html/2412.06117; https://dl.acm.org/doi/abs/10.1613/jair.1.12394; https://ieeexplore.ieee.org/document/9582393/; https://link.springer.com/article/10.1007/s00799-025-00431-x; https://www.ontotext.com/knowledgehub/fundamentals/what-is-rdf-star/; https://en.wikipedia.org/wiki/Cyc; https://www.researchgate.net/publication/357533078_Standpoint_Logic_Multi-Perspective_Knowledge_Representation; https://pmc.ncbi.nlm.nih.gov/articles/PMC5771422/
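
The capability-versus-reality distinction above is easier to see with a concrete shape. A minimal in-memory sketch that assumes nothing about donto's actual schema or API: every class, method, and identifier below (`ClaimStore`, `contradiction_frontier`, `ex:birthYear`, and the rule that a "contradiction" means two different values for a predicate assumed single-valued) is hypothetical. It shows the three affordances the critique says must be populated and load-bearing: contradictory claims held side by side, typed supports/rebuts/undercuts edges, and evidence anchors that queries actually use.

```python
from collections import defaultdict
from dataclasses import dataclass

ARG_KINDS = {"supports", "rebuts", "undercuts"}


@dataclass(frozen=True)
class Statement:
    statement_id: int
    subject: str
    predicate: str
    obj: str
    context: str          # plays the role of a named graph / lens
    evidence: tuple = ()  # hypothetical source anchors, e.g. "doc.pdf#p3"


@dataclass(frozen=True)
class ArgEdge:
    kind: str    # one of ARG_KINDS
    source: int  # statement_id of the supporting/attacking claim
    target: int  # statement_id it bears on


class ClaimStore:
    """Append-only, contradiction-tolerant claim store (toy, in-memory)."""

    def __init__(self):
        self.statements = {}
        self.edges = []

    def assert_statement(self, stmt):
        # No consistency check: a claim contradicting an existing one is
        # stored alongside it rather than rejected or overwritten.
        if stmt.statement_id in self.statements:
            raise ValueError(f"duplicate statement_id {stmt.statement_id}")
        self.statements[stmt.statement_id] = stmt

    def add_edge(self, kind, source, target):
        if kind not in ARG_KINDS:
            raise ValueError(f"unknown argument kind: {kind}")
        self.edges.append(ArgEdge(kind, source, target))

    def edges_into(self, target, kind=None):
        return [e for e in self.edges
                if e.target == target and (kind is None or e.kind == kind)]

    def in_context(self, context):
        # Query-time "lens": restrict the view without deleting anything.
        return [s for s in self.statements.values() if s.context == context]

    def contradiction_frontier(self, single_valued):
        """Pairs of held statements disagreeing on a predicate assumed single-valued."""
        groups = defaultdict(list)
        for s in self.statements.values():
            if s.predicate in single_valued:
                groups[(s.subject, s.predicate)].append(s)
        pairs = []
        for group in groups.values():
            for i, a in enumerate(group):
                for b in group[i + 1:]:
                    if a.obj != b.obj:
                        pairs.append((a.statement_id, b.statement_id))
        return pairs

    def unanchored(self):
        # Claims with no evidence anchor: candidates for curation or down-weighting.
        return [s.statement_id for s in self.statements.values() if not s.evidence]


store = ClaimStore()
store.assert_statement(Statement(1, "ex:person42", "ex:birthYear", "1851",
                                 "ctx:census", ("census-1881.pdf#p3",)))
store.assert_statement(Statement(2, "ex:person42", "ex:birthYear", "1853",
                                 "ctx:baptism-register"))
store.add_edge("rebuts", 2, 1)
print(store.contradiction_frontier({"ex:birthYear"}))  # [(1, 2)]
print(store.unanchored())                              # [2]
```

None of this is hard to build; the critique's point is that such a store only earns its keep once a non-trivial fraction of real edges carry arguments and anchors, and queries like `unanchored()` or `contradiction_frontier()` are routinely run against them.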

### PARTIALLY-HOLDS (confidence 0.6)
**Claim:** The 'lens engine' vision is fundamentally an ADVANCE over (not a rebrand of) Swanson literature-based discovery and KG completion — the agentic + many-deep-lenses + hold-and-verify-on-a-paraconsistent-substrate combination is genuine white space.

- **Strongest support:** The substrate-for-hold-and-verify is genuine white space: every comparable system (Swanson LBD, LLM KG-completion, Google's AI co-scientist) generates candidates and then ranks-and-DISCARDS, whereas donto can hold the entire speculative, mutually-contradictory relationship frontier as durable, evidence-anchored, Lean-certifiable legal state and re-curate it as evidence arrives — and I verified these substrate features (Supports/Rebuts/Undercuts edges, hypothesis_only, paraconsistent append-only store, 6-lens sweep) actually exist in the donto-memory source, not just the pitch.
- **Strongest counterargument:** The vision's central claim of novelty ('relationships no one ever thought to draw') is the OLDEST and most thoroughly-defeated promise in the field, and the two hardest critiques bite directly. FIRST, the bitter lesson (Sutton, https://en.wikipedia.org/wiki/Bitter_lesson): hand-engineering a fixed taxonomy of 'human analytical lenses' (philosophical, teleological, semiotic...) and a structured symbolic substrate is exactly the human-knowledge-injection that scaled general methods have repeatedly eclipsed. LLMs already implicitly encode a 'collective world model' across domains in their embedding space (https://arxiv.org/pdf/2501.00226), so the cross-lens connections the engine laboriously materializes as quads are arguably already latent in a frontier model — and asking the model directly ('what surprising connection links X and Y across economic and ecological framings?') may extract them more cheaply than building the substrate. SECOND, and most damaging, is the precision/value problem that has dogged LBD for THREE DECADES: 'Existing LBD methods are prone to proposing spurious discoveries or an abundance of low-quality ones... LBD produces more potential hypotheses than can be manually reviewed' (https://pmc.ncbi.nlm.nih.gov/articles/PMC6694578/). Multiplying lenses MULTIPLIES the combinatorial candidate space, making the spurious-hypothesis flood WORSE, not better. Novel != true != valuable: an engine that emits a billion cross-lens 'relationships no one thought of' has produced noise, not discovery, unless the verify/rank step is extraordinarily good — and that verifier, not the generator, is where all the value and all the unsolved difficulty actually live. 
THIRD, the competitive frontier is already here and validated: Google's AI co-scientist (Nature 2026, https://www.nature.com/articles/s41586-026-10644-y) already does multi-agent generate/debate/evolve/rank hypothesis discovery with WET-LAB-VALIDATED novel drug-repurposing and liver-fibrosis results — it occupies the agentic-hypothesis-discovery space today without needing a bespoke paraconsistent quad store. So 'no one has thought to do this' is plainly false on the discovery side; what's left unclaimed is only the substrate/persistence-and-curation architecture, which is a narrower (and harder-to-monetize) claim than the founder's framing.
- **What must be true:** The vision holds as an ADVANCE only if ALL of these hold: (1) The bottleneck is the VERIFIER/curator, and donto's paraconsistent hold-and-anchor design is what makes verification tractable at scale — i.e., the value must come from holding millions of speculative edges as durable, evidence-anchored, re-curable state (so verification improves monotonically as evidence arrives) rather than from the lens generation itself. If the generator is the product, it's just AI co-scientist with extra symbolic plumbing. (2) Many DEEP lenses must beat one frontier LLM asked directly — there must be demonstrable lift from explicit decomposition (recall/precision/novelty) over simply prompting a top model for cross-domain connections, otherwise the bitter lesson wins. This is testable and currently UNTESTED in the codebase (the 6-lens sweep today extracts facts WITHIN an entity; it does not yet generate cross-ENTITY relationship hypotheses — the founder's actual payoff is not yet built). (3) A cheap, high-precision ranking/certification layer must exist (the Lean-4 overlay + argument edges + evidence anchoring) that filters the combinatorial flood to the rare valuable few WITHOUT human review of every candidate — the unsolved 30-year LBD problem. (4) There must be a domain where ground-truth verification is cheap and the payoff of a single true cross-lens link is high (drug repurposing, genealogy bridging documents, materials), so the spurious-hypothesis flood is survivable. (5) 'Novelty' must be operationalized as 'novel AND survives evidence-anchored verification,' never raw count of generated edges. If instead the pitch stays 'a million facts / a million connections,' the claim is REFUTED by the bitter lesson and the LBD precision literature simultaneously.
- **Evidence:** https://news.uchicago.edu/story/don-r-swanson-information-science-pioneer-1924-2012; https://pmc.ncbi.nlm.nih.gov/articles/PMC6694578/; https://www.nature.com/articles/s41586-026-10644-y; https://deepmind.google/blog/co-scientist-a-multi-agent-ai-partner-to-accelerate-research/; https://en.wikipedia.org/wiki/Bitter_lesson; https://arxiv.org/pdf/2501.00226; https://en.wikipedia.org/wiki/Conceptual_blending; https://www.lisedunetwork.com/ranganathans-pmest-the-foundation-of-faceted-classification/
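
Point (2), lift from explicit decomposition over a single frontier-model pass, is the cheapest of these conditions to test. A minimal sketch of a leave-one-lens-out ablation under hypothetical assumptions: a relation counts as "surfaced" only when at least two lenses independently propose it, a single-pass baseline has been run on the same corpus, and a set of externally verified relations exists to score against. Function names and data are illustrative.

```python
from collections import Counter


def surfaced(lens_outputs, min_lenses=2):
    """Relations proposed independently by at least `min_lenses` lenses."""
    counts = Counter()
    for relations in lens_outputs.values():
        counts.update(set(relations))
    return {rel for rel, n in counts.items() if n >= min_lenses}


def ablation_report(lens_outputs, baseline, verified, min_lenses=2):
    """Compare lens-intersection hits against a single-pass baseline and
    count how many verified hits disappear when each lens is dropped."""
    full_hits = surfaced(lens_outputs, min_lenses) & verified
    drop_per_lens = {}
    for lens in lens_outputs:
        reduced = {name: rels for name, rels in lens_outputs.items() if name != lens}
        kept = surfaced(reduced, min_lenses) & verified
        drop_per_lens[lens] = len(full_hits) - len(kept)
    return {
        "hits_with_all_lenses": len(full_hits),
        "hits_missed_by_baseline": len(full_hits - baseline),
        "drop_per_lens": drop_per_lens,
    }


# Toy data: relation IDs are placeholders; `baseline` stands in for one
# frontier model prompted directly, `verified` for externally confirmed links.
lens_outputs = {
    "temporal": {"r1", "r2", "r3"},
    "causal": {"r1", "r2", "r4"},
    "ethical": {"r2", "r5"},
}
report = ablation_report(lens_outputs, baseline={"r2"}, verified={"r1", "r2", "r5"})
print(report)
# {'hits_with_all_lenses': 2, 'hits_missed_by_baseline': 1,
#  'drop_per_lens': {'temporal': 1, 'causal': 1, 'ethical': 0}}
```

If `drop_per_lens` is near zero across the board and `hits_missed_by_baseline` is zero, the lenses are adding volume rather than value, which is the bitter-lesson outcome the critiques warn about.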

---

## Area findings

### literature-based-discovery

Literature-Based Discovery (LBD) is the single most direct intellectual ancestor of the founder's "find connections nobody made" vision, and it is far more developed than most people realize. Its founding insight, Don R. Swanson's 1986 concept of "undiscovered public knowledge," is precisely the founder's premise: knowledge that is logically derivable from the union of two existing bodies of literature, but that no single human ever assembled because no one read both literatures. Swanson formalized this as the ABC syllogism: if literature reports A→B (e.g., Raynaud's disease involves elevated blood viscosity/platelet aggregation) and a *separate, non-co-citing* literature reports B→C (fish oil/eicosapentaenoic acid reduces blood viscosity), then a plausible, untested A→C link (fish oil treats Raynaud's) exists in the "complementary but disjoint" literatures. He published the fish-oil/Raynaud's hypothesis in 1986 and it was clinically confirmed by a 1989 trial (DiGiacomo); his 1988 "Migraine and magnesium: eleven neglected connections" produced 11 indirect links supporting magnesium-deficiency→migraine, later clinically supported. These are existence-proofs that the method generates real, non-obvious, testable discoveries from text alone.
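
The ABC syllogism reduces to a two-hop search over a co-occurrence graph. A minimal sketch, assuming each document has already been reduced to a set of normalized terms (real systems add NER grounding and semantic-type filtering, which this omits) and using "never co-occurs with A" as a crude stand-in for Swanson's "disjoint, non-co-citing" condition. The function name and toy corpus are hypothetical:

```python
from collections import defaultdict


def abc_open_discovery(docs, a):
    """Swanson-style ABC over term co-occurrence.

    docs: list of sets of terms, one set per abstract.
    Returns {C: {B, ...}} for every term C that shares at least one
    intermediate B with A but never appears in the same document as A.
    """
    cooccurs = defaultdict(set)
    for terms in docs:
        for t in terms:
            cooccurs[t] |= terms - {t}
    b_terms = cooccurs[a]
    candidates = defaultdict(set)
    for b in b_terms:
        for c in cooccurs[b]:
            if c != a and c not in b_terms:  # C must not co-occur with A directly
                candidates[c].add(b)
    return dict(candidates)


# Toy abstracts loosely echoing the Raynaud's / fish-oil example (not Swanson's data).
docs = [
    {"raynaud", "blood viscosity"},
    {"raynaud", "platelet aggregation"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"fish oil", "triglycerides"},
]
result = abc_open_discovery(docs, "raynaud")
print({c: sorted(bs) for c, bs in result.items()})
# {'fish oil': ['blood viscosity', 'platelet aggregation']}
```

The size of each B-set is a rough analogue of Swanson's convergent multi-path evidence: a C reached through several independent intermediates is a stronger candidate than one reached through a single bridge.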

The field splits discovery into two modes that map cleanly onto the founder's two use-cases. **Open discovery** ("serendipity mode"): start from A, find all B intermediates, rank all candidate C's — a fan-out search for unexpected endpoints. **Closed discovery** ("verification mode"): given a fixed A and C (a hypothesis you already suspect), find and rank the B-paths that would explain/support it. donto's "generate many speculative relationships then verify the valuable few" is exactly open-then-closed discovery. The hard engineering problem LBD has wrestled with for 40 years is *ranking*: open discovery generates a combinatorially overwhelming candidate list, so the entire literature is essentially a competition over scoring functions for "which of these millions of latent links is worth a human's attention." Classic systems (Swanson & Smalheiser's Arrowsmith, Hristovski's BITOLA, Petric's RaJoLink, Weeber's concept-based DAD-system) rank B/C candidates by frequency, tf-idf, and co-occurrence association measures; LION LBD (Pyysalo/Cambridge, 2019) added a rich menu — Jaccard, normalized PMI, symmetric conditional probability, chi-squared, log-likelihood — over a graph of ~27M PubMed abstracts with NER-grounded entities. The recurring lesson, brutally relevant to donto: even with good ranking, the true target often sits at rank 56–299 (closed) or rank 15–120,000 (open) in LION's own evaluation — i.e., the signal is real but buried, and *precision of ranking is the entire game*.
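
A minimal sketch of closed discovery ranked by one of the association measures named above, normalized PMI computed from document co-occurrence counts. Scoring a B-path by the weaker of its two links is an illustrative assumption, not LION LBD's documented method; function names and the toy corpus are hypothetical:

```python
import math


def npmi(docs, x, y):
    """Normalized pointwise mutual information of terms x and y over documents.

    Ranges from -1 (never co-occur) through 0 (independent) to 1 (always together).
    """
    n = len(docs)
    n_x = sum(1 for d in docs if x in d)
    n_y = sum(1 for d in docs if y in d)
    n_xy = sum(1 for d in docs if x in d and y in d)
    if n_xy == 0:
        return -1.0
    p_xy = n_xy / n
    if p_xy == 1.0:
        return 1.0  # every document contains both; the formula is 0/0 here
    pmi = math.log(p_xy / ((n_x / n) * (n_y / n)))
    return pmi / -math.log(p_xy)


def closed_discovery(docs, a, c, vocabulary):
    """Rank candidate B terms for a fixed A and C.

    A B-path is scored by its weaker link, min(NPMI(A,B), NPMI(B,C));
    terms that never co-occur with A or with C are dropped.
    """
    scored = []
    for b in vocabulary - {a, c}:
        score = min(npmi(docs, a, b), npmi(docs, b, c))
        if score > -1.0:
            scored.append((score, b))
    return [b for _, b in sorted(scored, reverse=True)]


docs = [
    {"A", "b1"}, {"A", "b1"}, {"b1", "C"}, {"b1", "C"},
    {"A", "b2"}, {"b2", "C"}, {"b2", "x"}, {"b2", "y"},
    {"A", "z"}, {"noise"},
]
vocabulary = set().union(*docs)
print(closed_discovery(docs, "A", "C", vocabulary))  # ['b1', 'b2']
```

Here `z` links only to A and `x`/`y` only to C's side, so they are dropped; `b2` survives but ranks below `b1` because its co-occurrence with A is below chance.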

The modern shift (roughly 2018→present) moved LBD from explicit co-occurrence to learned representations. SemMedDB/Semantic MEDLINE (Kilicoglu, Rindflesch, NLM) replaced raw co-occurrence with ~130M *typed* semantic predications (subject-predicate-object triples like "Drug-X TREATS Disease-Y") extracted by the SemRep parser, enabling discovery over a typed knowledge graph rather than bag-of-terms — a direct precursor to donto's quad/predicate structure (note: NLM deprecated SemMedDB on 31 Dec 2024). Then knowledge-graph-embedding methods (TransE, RDF2Vec, complex link prediction) and contextual embeddings (BioBERT-based, temporal-difference embeddings) reframed open discovery as *link prediction on a literature KG* — and crucially adopted **time-sliced evaluation**: train on literature before year Y, test whether the model predicts links that were actually published after Y. This is the field's hard-won, honest evaluation protocol and donto should adopt it directly. Embedding methods improved recall of plausible links but lost interpretability (you get a score, not a B-path), spawning a tension between ranked-list quality and explainability that remains unresolved.
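
The time-sliced protocol is simple to state in code. A minimal sketch, assuming links are available as `(x, y, year_first_reported)` tuples and that some link-prediction model (not shown, and hypothetical here) produces a ranked list of candidate pairs using only the pre-cutoff data. Function names are illustrative:

```python
def time_sliced_split(links, cutoff_year):
    """Split (x, y, year_first_reported) links into pre-cutoff training pairs
    and genuinely new post-cutoff pairs (pairs already known are excluded)."""
    train, later = set(), set()
    for x, y, year in links:
        pair = frozenset((x, y))
        if year < cutoff_year:
            train.add(pair)
        else:
            later.add(pair)
    return train, later - train


def precision_at_k(ranked_pairs, future, k):
    """Fraction of the top-k predicted pairs that were actually reported later."""
    top = [frozenset(p) for p in ranked_pairs[:k]]
    return sum(p in future for p in top) / len(top) if top else 0.0


def recall_at_k(ranked_pairs, future, k):
    """Fraction of the genuinely new links that appear in the top k."""
    top = {frozenset(p) for p in ranked_pairs[:k]}
    return len(top & future) / len(future) if future else 0.0


links = [("a", "b", 1990), ("b", "c", 1992), ("a", "c", 2001),
         ("c", "d", 2003), ("a", "b", 2005)]
train, future = time_sliced_split(links, 2000)
# A model would be fit on `train` only; these ranked predictions are hypothetical.
ranked = [("a", "c"), ("b", "d"), ("c", "d")]
print(precision_at_k(ranked, future, 2))  # 0.5
print(recall_at_k(ranked, future, 3))     # 1.0
```

The important detail is `later - train`: a pair already in the literature before the cutoff that merely reappears afterwards must not count as a successful prediction.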

The frontier (2024-2026) is finally *agentic and partially multi-perspective* — which is where the founder's vision overlaps most and is least uniquely novel. Markus Buehler's SciAgents (MIT, 2024/2025, Advanced Materials) samples *paths through a large ontological knowledge graph* (built from ~1,000 papers, 33K nodes/49K edges) and runs a multi-agent team — an Ontologist that defines the concepts on the path, Scientist agents that draft a hypothesis spanning the path, and a Critic that adversarially reviews — to surface cross-domain connections (e.g., silk ↔ energy-intensive materials) that classical ABC could never reach because the link is a multi-hop, multi-domain narrative rather than a single B-term. Google DeepMind's AI Co-Scientist (Nature, 2025/2026) runs Generation/Reflection/Ranking/Evolution/Proximity/Meta-review agents with an Elo tournament of "scientific debates" to rank competing hypotheses, grounded in literature + ChEMBL/UniProt, and produced *experimentally validated* results (a liver-fibrosis drug-repurposing candidate that blocked ~91% of a scarring response at Stanford; an antimicrobial-resistance mechanism that matched years of unpublished lab work at Imperial). These systems have already operationalized "agents break a problem down, propose cross-domain links, then critique/rank them" at impressive quality. What they have NOT done is (a) decompose entities through the *full spectrum of human analytical lenses* (they are scientifically/biomedically scoped — causal/mechanistic, not philosophical/aesthetic/semiotic/teleological), (b) treat identity as a query-time hypothesis, or (c) *persist* the millions of rejected/speculative/contradictory links as durable, evidence-anchored legal state. They generate, rank, surface the top few, and discard the rest. That discard is donto's white space.
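
The Elo tournament used by the AI Co-Scientist is a standard rating scheme applied to pairwise "debates." A generic sketch, not Google's implementation: `judge` is a hypothetical stand-in for an LLM or expert pairwise comparison, and the K-factor, starting rating, and round count are arbitrary defaults.

```python
import itertools
import random


def elo_tournament(hypotheses, judge, rounds=3, k_factor=32.0, initial=1200.0, seed=0):
    """Round-robin Elo ranking of hypotheses by pairwise comparison.

    judge(h1, h2) returns True if h1 wins the comparison.
    Returns (hypotheses sorted best-first, final ratings).
    """
    rng = random.Random(seed)
    ratings = {h: initial for h in hypotheses}
    pairs = list(itertools.combinations(hypotheses, 2))
    for _ in range(rounds):
        rng.shuffle(pairs)  # match order affects Elo, so randomize it per round
        for h1, h2 in pairs:
            expected_h1 = 1.0 / (1.0 + 10 ** ((ratings[h2] - ratings[h1]) / 400.0))
            actual_h1 = 1.0 if judge(h1, h2) else 0.0
            delta = k_factor * (actual_h1 - expected_h1)
            ratings[h1] += delta  # zero-sum update: what h1 gains, h2 loses
            ratings[h2] -= delta
    ranking = sorted(hypotheses, key=lambda h: ratings[h], reverse=True)
    return ranking, ratings


# Toy judge with a hidden quality score per hypothesis (illustrative only).
quality = {"h1": 0.9, "h2": 0.5, "h3": 0.1}
ranking, ratings = elo_tournament(["h3", "h1", "h2"], lambda a, b: quality[a] > quality[b])
print(ranking)  # ['h1', 'h2', 'h3']
```

Elo ranks whatever the judge prefers. When the judge is an LLM critic, the ranking reflects that critic's plausibility prior, which is why the critiques above insist on an external truth signal after ranking.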

**Foundational works:**

- **Undiscovered Public Knowledge** — Don R. Swanson (1986): Knowledge can be logically implied by the union of two existing literatures yet remain unknown because no human read both. The founding premise of 'connections nobody made' — and almost verbatim the founder's thesis. Published in Library Quarterly; the related 'A ten-year update' (1996) extended it. https://www.journals.uchicago.edu/doi/10.1086/601720
- **Fish Oil, Raynaud's Syndrome, and Undiscovered Public Knowledge (the ABC model)** — Don R. Swanson (1986): The canonical existence-proof and the ABC syllogism: A→B and B→C in disjoint, non-co-citing literatures imply a testable A→C. Hypothesis (fish oil treats Raynaud's) was clinically confirmed in 1989. This is the precise mechanism donto's cross-lens link discovery generalizes. https://muse.jhu.edu/article/403510/summary
- **Migraine and Magnesium: Eleven Neglected Connections** — Don R. Swanson (1988): Second validated demonstration — 11 independent indirect links all supporting magnesium-deficiency→migraine, later clinically supported. Shows the method finds *convergent multi-path* evidence, not just single bridges — relevant to donto scoring a link by how many lenses independently surface it. http://abel.lis.illinois.edu/tutorial/swanson_pbm_1988.pdf
- **Arrowsmith System + open/closed discovery** — Don R. Swanson & Neil R. Smalheiser (1997-1999): First operational LBD tool: two PubMed searches define literatures A and C; system computes a ranked 'B-list' of shared intermediate terms filtered by semantic type. Established the open-discovery (A→?) vs closed-discovery (A→?→C) distinction that maps onto donto's generate-vs-verify split. https://en.wikipedia.org/wiki/Arrowsmith_System
- **The Place of Literature-Based Discovery in Contemporary Scientific Practice** — Neil R. Smalheiser (2008/2012): The field's honest self-assessment: ABC finds only a narrow slice of possible discoveries (single-bridge links), generates overwhelming candidate volumes, and includes 'gap analysis' (problems nobody is studying because they fall between disciplines) — directly naming the combinatorics and inter-disciplinary-gap problems donto inherits. http://abel.lis.illinois.edu/tutorial/smalheiser_LBD_preprint_2008.pdf
- **Literature-based knowledge discovery: the state of the art (survey)** — Sebastian, Siew, Orimaye et al. (2012): Comprehensive map of ranking methods (linking-term measures, tf-idf, association measures) and systems (Arrowsmith, BITOLA, RaJoLink). Names the core limitations donto must solve: knowledge bottlenecks, terminology inconsistency, ranking reliability, scaling cost, and experimental validation difficulty. https://arxiv.org/pdf/1203.3611
- **SemMedDB / Semantic MEDLINE (typed semantic predications)** — Halil Kilicoglu, Thomas Rindflesch et al. (NLM) (2012): Moved LBD from co-occurrence to ~130M typed subject-predicate-object predications (e.g., TREATS, CAUSES) over 37M citations, enabling discovery over a typed knowledge graph — the closest classical precursor to donto's predicate/quad substrate. (Deprecated by NLM 31 Dec 2024.) https://academic.oup.com/bioinformatics/article/28/23/3158/195282
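
The ABC template and the open/closed split can be sketched over a plain co-occurrence map. This is a toy under stated assumptions: the function names are hypothetical (not Arrowsmith's interface), the graph is invented for illustration (it is not Swanson's actual set of eleven migraine-magnesium bridges), and ranking by the number of distinct B-terms is just one simple way to reward convergent multi-path evidence.

```python
from collections import defaultdict

# Hypothetical helpers over a toy co-occurrence map: term -> set of co-occurring terms.


def abc_open_discovery(a: str, cooccur: dict[str, set[str]], top_n: int = 5) -> list[tuple[str, list[str]]]:
    """Open discovery (A -> ?): B-terms are A's direct co-occurrents; a C-term qualifies
    only if it never co-occurs with A directly. Rank C by how many distinct B-terms bridge to it."""
    direct = cooccur.get(a, set())
    bridges: dict[str, list[str]] = defaultdict(list)
    for b in sorted(direct):
        for c in sorted(cooccur.get(b, set())):
            if c != a and c not in direct:
                bridges[c].append(b)
    return sorted(bridges.items(), key=lambda kv: (-len(kv[1]), kv[0]))[:top_n]


def abc_closed_discovery(a: str, c: str, cooccur: dict[str, set[str]]) -> list[str]:
    """Closed discovery (A -> ? -> C): the B-list of intermediates shared by A and C."""
    return sorted(cooccur.get(a, set()) & cooccur.get(c, set()))


cooccur = {
    "migraine": {"spreading_depression", "vasospasm", "platelet_aggregation"},
    "magnesium": {"spreading_depression", "vasospasm", "platelet_aggregation"},
    "spreading_depression": {"migraine", "magnesium"},
    "vasospasm": {"migraine", "magnesium", "calcium_blockers"},
    "platelet_aggregation": {"migraine", "magnesium", "aspirin"},
    "calcium_blockers": {"vasospasm"},
    "aspirin": {"platelet_aggregation"},
}
print(abc_open_discovery("migraine", cooccur))  # magnesium ranks first, bridged by 3 B-terms
print(abc_closed_discovery("migraine", "magnesium", cooccur))
# ['platelet_aggregation', 'spreading_depression', 'vasospasm']
```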

**Modern AI systems:**

- **LION LBD** — Interactive neural/co-occurrence LBD system for cancer biology over ~27M PubMed abstracts; NER-grounded entities to 6 ontologies; ranks candidate A→B→C links with a menu of association measures (Jaccard, normalized PMI, SCP, chi-squared, log-likelihood); supports both open and closed discovery with drill-down to source sentences. Adds a CNN 'hallmarks of cancer' classifier. _[Replicated 5 of Swanson's classic discoveries and 5 modern cancer A-B-C chains; but honestly reports targets at rank 56-299 (closed) and rank 15-120,000 (open); manual review judged ~44% (closed) / ~34% (open) of top candidates potentially valid. Open-source at lbd.lionproject.net.]_ https://pmc.ncbi.nlm.nih.gov/articles/PMC6499247/
- **Neural open/closed LBD + KG link-prediction for drug repurposing** — Reframes LBD as link prediction on a literature/biomedical knowledge graph using KG embeddings (TransE, RDF2Vec, ComplEx) and BioBERT-based contextual embeddings; introduces time-sliced evaluation (train pre-year-Y, test post-Y published links). Applied to COVID-19 and Alzheimer's drug repurposing on SemMedDB-derived graphs. _[Embedding methods improve recall of plausible links but sacrifice the interpretable B-path; time-slicing produced ranked drug candidates with subsequent literature/clinical-trial support. The field's standard honest evaluation protocol now.]_ https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0232891
- **SciAgents** — MIT (Ghafarollahi & Buehler). Samples PATHS through a large ontological knowledge graph (~33K nodes/49K edges from ~1,000 papers) and runs a multi-agent team — Ontologist (defines concepts on the path), Scientist agents (draft + refine a hypothesis spanning the path), Critic (adversarial review) — to surface cross-domain connections classical ABC cannot reach (multi-hop, multi-domain narratives). Closest existing analog to the founder's agentic many-step decomposition. _[Published in Advanced Materials (2025); generated novel bio-inspired materials hypotheses (e.g., silk-mycelium, dandelion-pigment composites) rated more novel/feasible than baselines; open-source (lamm-mit/SciAgentsDiscovery). Validation is mostly in-silico/LLM-judged, not yet wet-lab.]_ https://arxiv.org/abs/2409.05556
- **Google DeepMind AI Co-Scientist** — Multi-agent (Generation, Reflection, Ranking, Evolution, Proximity, Meta-review + Supervisor) hypothesis engine; ranks competing hypotheses via an Elo tournament of simulated 'scientific debates' (AlphaGo-style self-play); grounds in literature + ChEMBL/UniProt/AlphaFold; scales test-time compute on verification. _[Nature paper (2025/2026). Experimentally validated: a liver-fibrosis drug-repurposing candidate blocked ~91% of a scarring response at Stanford; reproduced in days an antimicrobial-resistance gene-transfer mechanism that took an Imperial College lab years (unpublished); Calico confirmed an integrated-stress-response hypothesis. Strongest evidence to date that agentic LBD yields real discoveries.]_ https://deepmind.google/blog/co-scientist-a-multi-agent-ai-partner-to-accelerate-research/
- **SKiM (Serial KinderMiner) + reproducible-pipelines LBD** — Generalized, domain-agnostic LBD system (Cowell/Blewitt) extending ABC beyond biomedicine with simple, scalable ranking; part of a 2025 push ('Make LBD Great Again through Reproducible Pipelines') to fix the field's reproducibility/gold-standard/time-slicing crisis and standardize evaluation. _[SKiM recovers Swanson's discoveries and scales to PubMed; the reproducibility movement is a reaction to LBD results that don't replicate across implementations — a direct warning for donto's evaluation design.]_ https://arxiv.org/pdf/2502.16450

**Relevance to the lens engine:** BORROW: (1) The open/closed discovery distinction is donto's exact two-mode architecture — open discovery = generate speculative cross-lens links; closed discovery = the verification/curation pass on a suspected link. Name and build both explicitly. (2) Time-sliced evaluation is the field's hard-won, non-gameable validation protocol: train donto on the corpus pre-year-Y, measure whether its top-ranked machine-proposed links were later asserted/published. This is how you prove the engine works without manual labeling, and it is essentially the only honest LBD metric. (3) Convergent multi-path scoring (Swanson's 'eleven connections', LION's path-accumulation functions): a cross-lens link is far more credible when MULTIPLE independent lenses surface it — donto should rank a hypothesis-link by how many of its lenses converge on it, not by any single lens's confidence. (4) Typed predications over bag-of-words (SemMedDB lesson) — donto's quad structure already has this advantage; preserve typed argument edges. (5) Drill-down to the source byte: LION's and Arrowsmith's usability came from letting a human see the originating sentence — donto's evidence-anchoring is the same affordance and is essential for trust. AVOID: (1) The ranking-precision trap — LION honestly shows true links buried at rank 15-120,000 in open mode; raw fan-out without a strong scorer produces an unusable haystack. donto's many-lens combinatorics will be far worse than ABC's single-B-term blowup, so ranking/pruning is THE make-or-break problem, not generation. (2) Co-occurrence-only signal generates spurious links; prefer typed/argumentative edges and require explanatory paths. (3) The reproducibility crisis — pin corpora, seeds, and evaluation splits from day one. (4) Don't discard the rejected candidates the way SciAgents/Co-Scientist do — that persistence IS donto's differentiator (see white space).
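
Convergent multi-lens scoring (BORROW item 3) can be sketched as a vote count over lens proposals. The `Proposal` record and `rank_by_convergence` are hypothetical (not donto's schema). The sketch also assumes lenses are independent, so two highly correlated lenses would double-count the same signal.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """One lens's proposal of a link between two entities (hypothetical record shape)."""
    lens: str
    subject: str
    obj: str
    confidence: float  # the proposing lens's own score in [0, 1]


def rank_by_convergence(proposals: list[Proposal]) -> list[tuple[tuple[str, ...], int, float]]:
    """Rank candidate links primarily by the number of DISTINCT lenses proposing them,
    using the best single-lens confidence only as a tie-breaker."""
    per_link: dict[tuple[str, ...], dict[str, float]] = defaultdict(dict)
    for p in proposals:
        key = tuple(sorted((p.subject, p.obj)))  # treat links as undirected
        # keep one (max) confidence per lens so a lens cannot vote twice for the same link
        per_link[key][p.lens] = max(per_link[key].get(p.lens, 0.0), p.confidence)
    rows = [(key, len(lenses), max(lenses.values())) for key, lenses in per_link.items()]
    rows.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return rows


proposals = [
    Proposal("causal", "fish_oil", "raynaud", 0.4),
    Proposal("temporal", "raynaud", "fish_oil", 0.3),
    Proposal("ethical", "fish_oil", "raynaud", 0.2),
    Proposal("linguistic", "silk", "energy", 0.95),
    Proposal("linguistic", "silk", "energy", 0.9),
]
for row in rank_by_convergence(proposals):
    print(row)
```

Ranking on the lens count first and confidence second is the design choice the paragraph argues for: a link that three lenses reach with modest confidence outranks one that a single lens proposes twice with high confidence.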

**Already done vs white space:** ALREADY DONE (the founder should NOT assume 'no one has thought to do this'): The core thesis — that machine-assembled connections across disjoint knowledge can constitute genuine, validated discovery — is 40 years old and clinically proven (Swanson 1986/1988). Open/closed discovery, candidate-link ranking, scaling to ~27-37M documents, typed-predication knowledge graphs, KG-embedding link prediction with time-sliced evaluation, and now multi-agent path-sampling + adversarial critique + tournament ranking (SciAgents, Co-Scientist) are all built and, in the agentic case, producing wet-lab-validated discoveries in 2025-2026. 'Agents decompose a problem, propose cross-domain links, and critique/rank them' is effectively the state of the art, not white space. GENUINE WHITE SPACE for donto: (1) MANY-LENS DECOMPOSITION as the link-generation substrate. Every LBD system to date is mono-perspective — biomedical/causal/mechanistic. None decomposes entities through the *full spectrum* (philosophical, semiotic, teleological, aesthetic, phenomenological, ethical, mereological, ecological) and then mines link candidates at lens INTERSECTIONS. LBD finds A-B-C bridges within one ontology; donto's bet is that the richest unmade connections live *between incommensurable lenses*, which no LBD system attempts. (2) PERSISTING the rejected/speculative/contradictory candidates as durable, evidence-anchored, paraconsistent legal state. Every LBD/agentic system generates millions of candidates, surfaces the top-k, and throws the rest away. donto's hypothesis_only + contradiction-frontier + supports/rebuts/undercuts edges let the discarded 99.9% remain queryable forever and be re-ranked as the corpus and lenses evolve — a standing 'latent-structure reservoir' no LBD system maintains. (3) IDENTITY-AS-HYPOTHESIS at query time. 
LBD assumes entity grounding/NER is settled before discovery; donto lets the merge itself be a weighted, lens-dependent hypothesis — which is where many false LBD links actually come from (spurious co-occurrence of ambiguously-grounded entities). (4) Lean-4 certification of the rare valuable link — formal shape/rule verification of a discovered relationship has no analog in LBD, which validates only empirically/clinically.

**Hard problems:**
- RANKING / combinatorial explosion is the central unsolved problem: open discovery already produces overwhelming candidate lists in single-ontology ABC (LION buries true links at rank 15-120,000); donto's cross-product of many lenses over all entities makes this dramatically worse — the scorer, not the generator, decides whether the engine is usable.
- EVALUATION without ground truth: there is no gold standard for 'a connection nobody made' (by definition). Time-slicing (predict post-year-Y published links) is the only honest metric, but it only validates connections that *eventually got published* — it cannot score genuinely novel never-published links, which are exactly the target. This is a deep epistemic bind.
- SPURIOUS correlation vs causal/explanatory link: co-occurrence and embedding similarity surface statistically-associated but meaningless pairs; distinguishing a real latent relationship from coincidence at scale remains unsolved (LION's own undirected co-occurrence edges flagged as a key weakness).
- NOISE from extraction and entity grounding: LBD links are only as good as NER/relation extraction; ambiguous entity resolution manufactures false bridges. donto's identity-as-hypothesis helps but pushes the ambiguity into the ranking, not away.
- REPRODUCIBILITY: LBD results notoriously fail to replicate across implementations (the 2025 'reproducible pipelines' movement exists because of this); corpus, preprocessing, and split choices dominate outcomes.
- INCOMMENSURABILITY across lenses: assembling a coherent A→C claim that spans, e.g., a phenomenological and an economic lens has no established semantics — how do you even *type* a cross-lens edge, let alone score its strength? No prior LBD work addresses inter-paradigm linking.
- VALIDATION cost / the last mile: even perfectly ranked hypotheses require expensive human or wet-lab verification; the rate-limiting step is curation, and the rare valuable link is hidden among many plausible-but-wrong ones (Co-Scientist mitigates with tournaments but still needs a lab).
- INTERPRETABILITY vs accuracy tradeoff: embedding/LLM methods rank better but lose the explicit explanatory path that makes a discovery trustworthy and actionable; donto must deliver both a score AND a human-readable lens-path.


### bisociation-computational-creativity

The founder's intuition ("connections no one thought of, because no one holds all the lenses at once") is, almost word for word, Arthur Koestler's BISOCIATION. In "The Act of Creation" (1964), Koestler argues that every creative act (the comic Haha, the scientific Aha, the artistic Ah) shares one structure: perceiving a situation or idea simultaneously in two self-consistent but HABITUALLY INCOMPATIBLE "matrices" / frames of reference (M1, M2). Ordinary thought is ASSOCIATION: moving within a single plane/matrix. Creativity is BISOCIATION: the collision/fusion of two planes that normally never touch. This is the single richest precedent: the value lies not in the facts inside a frame but in the relation that springs from intersecting two frames. It is the founder's thesis, stated in 1964.

Koestler's idea was operationalized into a real computational program by the EU FP7 BISON project (2008-2012), summarized in Michael Berthold (ed.), "Bisociative Knowledge Discovery: An Introduction to Concept, Algorithms, Tools, and Applications" (Springer LNCS 7250, 2012; 32 chapters; consortium incl. Berthold/Konstanz, Nada Lavrač & Dunja Mladenić/Jožef Stefan Institute, Werner Dubitzky, Christian Borgelt). They formalized bisociation as the discovery of bridges between weakly connected or disjoint "domains" inside a heterogeneous graph called a BisoNet (Bisociative Information Network: nodes are concepts/units from many sources, edges are evidential relations). Three computational TYPES of bisociation were distinguished: (1) bridging CONCEPTS (a single term/node that occurs in two otherwise-unlinked domains, i.e. the classic b-term); (2) bridging GRAPHS (a connecting path/subgraph that links two domains); (3) bridging by STRUCTURAL SIMILARITY (two subgraphs in different domains share an isomorphic relational pattern, i.e. analogy). Crucially, they tried to make "bisociativeness" a RANKABLE score that distinguishes a genuinely surprising cross-domain link from a trivially common one.
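
Type (1), bridging concepts, is the easiest to sketch. A minimal version, assuming a bipartite document-term BisoNet in which each document belongs to exactly one domain; the function name is hypothetical and the ubiquity cutoff `max_doc_share` is an illustrative heuristic for dropping stopword-like terms, not part of BISON's definition.

```python
from collections import defaultdict

# Hypothetical helper; docs maps doc_id -> (domain, set of terms). Toy data only.


def bridging_concepts(docs: dict[str, tuple[str, set[str]]], d1: str, d2: str,
                      max_doc_share: float = 0.5) -> dict[str, tuple[int, int]]:
    """Return term -> (#docs in d1, #docs in d2) for terms found in BOTH domains,
    dropping terms that occur in more than max_doc_share of all documents."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for domain, terms in docs.values():
        for term in terms:
            counts[term][domain] += 1
    n_docs = len(docs)
    out: dict[str, tuple[int, int]] = {}
    for term, per_domain in counts.items():
        in1, in2 = per_domain.get(d1, 0), per_domain.get(d2, 0)
        too_common = sum(per_domain.values()) / n_docs > max_doc_share
        if in1 and in2 and not too_common:
            out[term] = (in1, in2)
    return out


docs = {
    "m1": ("migraine", {"serotonin", "spreading_depression", "study"}),
    "m2": ("migraine", {"vasospasm", "study"}),
    "g1": ("magnesium", {"spreading_depression", "study"}),
    "g2": ("magnesium", {"vasospasm", "diet", "study"}),
    "o1": ("other", {"study"}),
    "o2": ("other", {"diet"}),
}
print(bridging_concepts(docs, "migraine", "magnesium"))
```

Here "study" appears in both domains but in five of six documents, so it is dropped as trivially common; "spreading_depression" and "vasospasm" survive as candidate bridges.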

The concrete, working instantiation is CrossBee (Cross-Context Bisociation Explorer; Juršič, Cestnik, Urbančič, Lavrač, ICCC 2012; http://crossbee.ijs.si). You feed it two document sets from two domains (e.g. two non-interacting literatures); it ranks candidate BRIDGING TERMS ("b-terms") by a BISOCIATION SCORE computed as an ENSEMBLE of text-mining heuristics (frequency, tf-idf, outlier-ness, appearance in both domains, etc.) voting together, then offers side-by-side document inspection so that a human EXPERT can verify the link. This is a direct ancestor of donto's lens engine: the machine over-generates candidate cross-context links, and a human (or downstream certifier) disposes of them. Its intellectual root is older still: Don Swanson's LITERATURE-BASED DISCOVERY ("Undiscovered Public Knowledge," 1986). Pairs of literatures (Raynaud's disease and fish oil; migraine and magnesium) were never co-cited but were logically linked through shared bridging concepts B (e.g., blood viscosity for fish oil and Raynaud's), yielding testable A-C hypotheses that were later clinically supported. Swanson's ABC model (A-B + B-C, therefore maybe A-C) is the canonical computational template for "relationships no one drew because the two literatures were isolated."
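
CrossBee's core idea, several weak heuristics voting on each candidate b-term, can be sketched with a Borda count. The three heuristics, the Borda rule, the function name, and the toy counts are illustrative assumptions; CrossBee's published ensemble is built from its own heuristics and voting scheme, which are not reproduced here.

```python
import math

# Illustrative heuristics and Borda voting; NOT CrossBee's actual ensemble.


def borda_ensemble(stats: dict[str, tuple[int, int]], n_docs: int) -> list[tuple[str, int]]:
    """stats maps term -> (#docs in domain 1, #docs in domain 2); both counts must be > 0.
    Each heuristic ranks all terms; a term earns (len(terms) - 1 - rank) points per heuristic."""
    heuristics = {
        "balance": lambda f1, f2: min(f1, f2) / max(f1, f2),   # evenly present in both domains
        "rarity": lambda f1, f2: math.log(n_docs / (f1 + f2)),  # idf-like surprise
        "support": lambda f1, f2: float(f1 + f2),               # enough evidence to inspect
    }
    terms = sorted(stats)
    votes = {t: 0 for t in terms}
    for score in heuristics.values():
        ordered = sorted(terms, key=lambda t: score(*stats[t]), reverse=True)
        for rank, term in enumerate(ordered):
            votes[term] += len(terms) - 1 - rank
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


stats = {"spreading_depression": (4, 3), "vasospasm": (2, 2), "inflammation": (30, 1)}
print(borda_ensemble(stats, n_docs=100))
# [('vasospasm', 4), ('spreading_depression', 3), ('inflammation', 2)]
```

The heuristics deliberately pull in different directions (rarity favours scarce terms, support favours well-attested ones), so the ensemble rewards terms that do reasonably well on all of them: "inflammation" wins on support alone but ranks last overall.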

Running alongside bisociation is CONCEPTUAL BLENDING / conceptual integration (Gilles Fauconnier & Mark Turner, "The Way We Think," 2002). Where Koestler collides two frames, blending integrates them: two (or more) INPUT mental spaces selectively project into a BLENDED space via a GENERIC space, and the blend develops EMERGENT structure not present in either input (their canonical example: "the Buddhist monk" riddle; "this surgeon is a butcher"). Blending has been computationally modeled: Joseph Goguen's algebraic/category-theoretic Unified Concept Theory; Pereira's DIVAGO (2005, optimality-principle metrics); and the EU COINVENT project (Schorlemmer, Kutz, Confalonieri, Pease, et al., 2014-2016) which models blends as AMALGAMS (knowledge-transfer via colimits in category theory) and applied them to mathematical concept invention and music. The third lens is Margaret Boden ("The Creative Mind," 1990; "Creativity and Art," 2010): creativity comes in three kinds — COMBINATIONAL (novel combinations of familiar ideas — bisociation/blending live here), EXPLORATORY (finding unvisited points inside an existing conceptual space / set of rules), and TRANSFORMATIONAL (changing the rules of the space so previously-impossible ideas become thinkable). The donto vision spans all three: many lenses = many conceptual spaces; cross-lens intersection = combinational; pushing a lens "to the utmost" = exploratory; a lens that rewrites another's assumptions = transformational. 
Modern LLM work has revived all of this: PopBlends (Petridis et al., CHI 2023 — LLM+knowledge-base conceptual blends for design), LiveIdeaBench (2024-25, benchmarking LLM divergent thinking on single-keyword scientific idea generation), Nature/Sci-Reports studies showing LLMs reach population-average creativity but not top-decile humans, and multi-agent hypothesis engines (Google's AI Co-Scientist 2025, SciMON ACL 2024, Sakana AI Scientist) that generate-debate-rank-evolve cross-literature hypotheses — exactly the "agentic many-lens generate-then-verify" loop, but without a paraconsistent substrate to hold the rejected/contradictory candidates.

**Foundational works:**

- **The Act of Creation (bisociation)** — Arthur Koestler (1964): Creativity = bisociation: perceiving one idea in two self-consistent but HABITUALLY INCOMPATIBLE frames/matrices (M1,M2) at once. The payoff is the collision of frames, not the contents of one frame. This IS the founder's thesis, stated 60 years ago — the single most important precedent. https://en.wikipedia.org/wiki/The_Act_of_Creation
- **Undiscovered Public Knowledge / Raynaud's-fish-oil + migraine-magnesium (Literature-Based Discovery, ABC model)** — Don R. Swanson (1986 (Raynaud's), 1988 (migraine), Arrowsmith 1990s): Two non-interacting literatures (A and C) linked through a shared bridging concept B yield a true, testable, previously-undrawn relationship A-C. The canonical 'connection no human drew because the two corpora were isolated' — the computational ancestor of the whole lens-engine idea, with real clinical confirmation. https://news.uchicago.edu/story/don-r-swanson-information-science-pioneer-1924-2012
- **The Way We Think: Conceptual Blending and the Mind's Hidden Complexities (conceptual integration)** — Gilles Fauconnier & Mark Turner (2002 (theory from mid-1990s)): Two input mental spaces project selectively into a blended space (via a generic space); the blend has EMERGENT structure absent from both inputs. Intersection of frames doesn't just connect — it CREATES new relational structure. Directly models 'a relationship that emerges only at the intersection of lenses.' https://pages.ucsd.edu/~scoulson/spaces/fauconnier05.pdf
- **The Creative Mind: Myths and Mechanisms (three kinds of creativity)** — Margaret A. Boden (1990 / 2nd ed. 2004; Creativity and Art 2010): Combinational (novel combos — where bisociation/blending sit), Exploratory (new points in an existing conceptual space), Transformational (changing the space's rules). Gives the founder a taxonomy: each lens is a conceptual space; cross-lens links are combinational; pushing a lens to the limit is exploratory; a lens rewriting another's rules is transformational. https://www.themarginalian.org/2025/08/22/margaret-boden-creativity/
- **Bisociative Knowledge Discovery: An Introduction to Concept, Algorithms, Tools, and Applications (BISON project, BisoNets)** — Michael R. Berthold (ed.); Dubitzky, Kötter, Lavrač, Mladenić, Borgelt et al. (2012 (Springer LNCS 7250; EU FP7 BISON 2008-2012)): The most direct prior art: an entire research program that computationalized Koestler. Formalized bisociation as bridging across domains in a BisoNet (heterogeneous evidence graph), defined three bisociation types (bridging concept / structural-similarity graph / bridging graph), and tried to make 'bisociativeness' a rankable score. Read this before building — it is donto's lens-engine, minus the paraconsistent substrate and the agents. https://link.springer.com/book/10.1007/978-3-642-31830-6
- **CrossBee: Cross-Context Bisociation Explorer (+ ensemble b-term heuristics)** — Matjaž Juršič, Bojan Cestnik, Tanja Urbančič, Nada Lavrač (2012 (ICCC)): A working generate-then-verify system: machine ranks candidate bridging terms by an ENSEMBLE of heuristics (a bisociation score), human expert inspects/confirms. Exactly donto's intended workflow (over-generate speculative cross-lens links, curate the rare valuable ones) — proves the architecture and shows where it gets stuck (too many candidates, expert is the bottleneck). https://computationalcreativity.net/iccc2012/wp-content/uploads/2012/05/226-Jursic.pdf
- **Computational conceptual blending: Goguen's Unified Concept Theory, Divago, COINVENT amalgams** — Joseph Goguen; Francisco Câmara Pereira (Divago); Schorlemmer, Kutz, Confalonieri, Pease (COINVENT) (1999-2006 (Goguen/Divago); 2014-2016 (COINVENT FP7)): Shows blending CAN be made algorithmic (blends = amalgams / colimits in category theory; optimality principles as quantitative metrics) and that the hard part is SELECTING good blends from a combinatorial explosion of possible ones. The selection/optimality problem is precisely donto's 'verify the rare valuable hypotheses' problem. https://www.iiia.csic.es/~enric/papers/Ch1-CoInvent.pdf

**Modern AI systems:**

- **CrossBee + TextFlows bridging-term workflows** — Web tool (crossbee.ijs.si) that takes two document sets, ranks cross-domain bridging terms via an ensemble heuristic bisociation score, and gives side-by-side inspection for human verification. Later re-implemented in the TextFlows platform. _[Reproduced Swanson's migraine-magnesium bridging terms; academic adoption in the cross-domain literature-mining community; the canonical running prototype of computational bisociation. Not at web scale, single-pair-of-domains at a time.]_ http://crossbee.ijs.si/
- **PopBlends** — LLM + knowledge-base pipeline that auto-suggests conceptual blends (e.g. pop-culture x brand) using Fauconnier-Turner blending strategies; supports both divergent and convergent design ideation. _[CHI 2023; user study: people found ~2x more blend suggestions with half the mental demand vs. without. Demonstrates LLM+KB beats LLM-alone for combinational creativity.]_ https://savvaspetridis.github.io/papers/popblends.pdf
- **Google AI Co-Scientist** — Multi-agent (Gemini 2.0) hypothesis-generation system: Generation/Reflection/Ranking/Evolution/Proximity/Meta-review agents run self-play scientific debate + ranking tournaments to evolve novel, literature-grounded hypotheses. _[Feb 2025; reported wet-lab-validated proposals (drug repurposing for AML, novel antimicrobial-resistance mechanism, liver-fibrosis targets). The closest existing thing to 'agents generate cross-domain relationships, then rank/verify' — but hypotheses live in a transient run, not a persistent contradiction-preserving store.]_ https://deepmind.google/blog/co-scientist-a-multi-agent-ai-partner-to-accelerate-research/
- **SciMON / LiveIdeaBench / The AI Scientist (Sakana)** — SciMON (ACL 2024) retrieves 'inspirations' from prior literature and optimizes ideas explicitly for novelty; LiveIdeaBench (2024-25) benchmarks LLM divergent thinking from single keywords across 22 domains; Sakana's AI Scientist runs end-to-end idea->experiment->paper. _[SciMON shows novelty-optimized retrieval beats vanilla generation; LiveIdeaBench (40+ models, 1180 keywords) finds idea-generation poorly predicted by general-intelligence benchmarks — a key signal that lens-diversity, not model IQ, may drive discovery.]_ https://arxiv.org/abs/2412.17596
- **Spark / serendipitous knowledge-discovery systems & ABC LBD tools** — Modern literature-based-discovery engines (Spark, Arrowsmith descendants, SemMedDB-driven ABC pipelines, word-embedding bridging-term detection) that mine bridging concepts between literatures and rank candidate A-C hypotheses. _[Active biomedical-discovery subfield; main published finding is sobering — over-generation of candidates and lack of agreed evaluation standards are the binding constraints, not generation capacity.]_ https://link.springer.com/article/10.1186/s12859-019-2989-9
- **LLM divergent-creativity comparative studies** — Controlled studies measuring LLM novelty/originality/flexibility/diversity vs. humans on divergent-thinking, problem-solving, and creative-writing tasks. _[2025: LLMs match/exceed average human creativity but top-decile humans still beat every model; LLM outputs cluster (lower diversity) — relevant warning that a single agent over many lenses may collapse toward homogeneous 'connections.']_ https://www.nature.com/articles/s41598-025-25157-3

**Relevance to the lens engine:** BORROW: (1) The vocabulary and metrics: frame donto's payoff explicitly as bisociation (Koestler) and conceptual blending (Fauconnier-Turner), so that a discovered relationship is valuable in proportion to how HABITUALLY INCOMPATIBLE its two source lenses/domains are. Steal CrossBee's idea of a rankable BISOCIATION SCORE computed as an ENSEMBLE of heuristics over a heterogeneous graph; donto already is that graph (a BisoNet by another name). (2) Swanson's ABC bridging template is the cleanest first product: surface A-B and B-C claims sitting in two different ctx:* contexts/lenses that are never co-cited, propose A-C as a hypothesis_only edge, and anchor B as the bridge with evidence. (3) Boden's taxonomy gives a roadmap and honest framing: most cross-lens output will be combinational; treat exploratory (push one lens to its limit) and transformational (one lens rewrites another's assumptions) as harder, rarer, higher-value tiers. (4) COINVENT/Divago teach that the bottleneck is SELECTION/optimality among a combinatorial explosion of blends, so design for ranking and pruning from day one, not for generation. AVOID / be warned: (a) The b-term/blend space explodes combinatorially; CrossBee, LBD tools, and blending systems ALL hit the same wall: far more candidate links than any human can review, and no agreed way to tell signal from noise. Donto's edge must be that its paraconsistent substrate can HOLD the explosion as legal hypothesis_only state forever (where prior systems had to discard), with the Lean-4 overlay + evidence-anchoring + argument edges (supports/rebuts/undercuts) as the eventual VERIFY/prune mechanism; this is genuinely the missing piece in every prior system. (b) LLM creativity clusters/homogenizes (Scientific Reports 2025); a single agent run over many lenses risks producing samey 'connections.'
Force lens-diversity structurally (distinct prompts/personas/conceptual spaces per lens, as the AI Co-Scientist's specialized agents do) and measure diversity, not just count. (c) Resist raw-volume framing — Koestler, Swanson and BISON all insist the win is the RARE high-bisociativeness link across distant frames, not millions of intra-frame facts; the founder's refined 'depth-of-decomposition-then-intersection' view is correct and should be the headline.

**Already done vs white space:** ALREADY DONE (the founder must not reinvent these): (1) The core idea — "creativity = connecting two habitually-incompatible frames" — is Koestler 1964, named bisociation; it is not new. (2) "Find relationships no human drew because two corpora/frames are isolated" is Swanson's literature-based discovery (1986) and was clinically validated. (3) An entire EU research program (BISON, Berthold ed. 2012) computationalized exactly this: BisoNets, three formal bisociation types, rankable bisociativeness, and a working tool (CrossBee) that over-generates cross-domain bridging links and has a human verify them. (4) "Intersection of frames yields emergent relational structure" is conceptual blending (Fauconnier-Turner 2002), and it has been made algorithmic (Goguen, Divago, COINVENT amalgams). (5) "Agents generate-debate-rank-evolve cross-literature hypotheses" is the 2024-25 AI-co-scientist / SciMON / Sakana wave. So "no one has thought to do this" is, frankly, false at the level of the concept and even of single-pair tooling. GENUINE WHITE SPACE (where donto is actually novel): (a) SCALE + MANY LENSES SIMULTANEOUSLY — every prior system bisociates TWO domains/literatures at a time chosen by a human; nobody runs the FULL spectrum of analytical lenses agentically over ALL entities at once and harvests the combinatorial set of cross-lens intersections. (b) A PARACONSISTENT, CONTRADICTION-PRESERVING SUBSTRATE THAT CAN HOLD THE SPECULATION FOREVER — this is the deepest gap. CrossBee/LBD/COINVENT/AI-Co-Scientist all generate transient candidates that are discarded if not immediately validated; none has a legal, queryable, permanent home for unanchored, mutually-contradictory, hypothesis_only relationship-claims with typed argument edges. 
(c) IDENTITY-AS-HYPOTHESIS + EVIDENCE-FIRST + LEAN-CERTIFICATION as the curation layer — using formal proof to certify the rare valuable shapes out of the speculative frontier is, as far as the literature shows, unattempted. (d) Closing the loop: generate (agents) -> hold (paraconsistent quad store) -> rank (bisociation score) -> certify (Lean) -> promote, as one persistent system rather than a one-shot pipeline. The novelty is NOT the lens idea and NOT bisociation; it is the AGENTIC-MANY-LENS + PERSISTENT-PARACONSISTENT-HOLD + FORMAL-VERIFY combination at substrate scale.

**Hard problems:**
- COMBINATORIAL EXPLOSION: with L lenses over N entities the cross-lens link space is enormous (on the order of N^2 x L^2 candidate intersections); generation is trivially cheap, so the system is instantly drowned in candidates. Every prior system (CrossBee, LBD, COINVENT) hits this wall.
- EVALUATION HAS NO GROUND TRUTH: the LBD field's own consensus is that there are no agreed evaluation standards and results swing with the dataset/method; a 'good' bisociation is defined by later human/empirical validation that you don't have at generation time. Distinguishing a profound cross-lens link from a coincidence is the central unsolved problem.
- SIGNAL-VS-NOISE / TRIVIALITY: most cross-frame links are either trivially true (a stopword-like bridge term co-occurs everywhere) or spurious. CrossBee needed an ensemble of heuristics + a human just to surface the few real ones; ranking bisociativeness reliably is unsolved at scale.
- THE OPTIMALITY/SELECTION PROBLEM (Goguen, COINVENT): conceptual-blending theory's optimality principles (e.g. 'good form,' 'web,' 'unpacking') are notoriously hard to formalize and compute; choosing the few good blends from the combinatorial set has no clean algorithm.
- LLM HOMOGENIZATION: a single agent over many lenses tends to produce clustered, low-diversity outputs (Nature 2025), undercutting the whole 'connections no one thought of' premise unless lens-diversity is enforced structurally and measured.
- DEFINING 'HABITUALLY INCOMPATIBLE' / DOMAIN BOUNDARIES OPERATIONALLY: bisociativeness depends on the two frames being genuinely distant; but quantifying frame-distance/domain-membership in a substrate where everything is in one graph is itself unsolved — too-distant looks like noise, too-close is mere association.
- VERIFICATION COST AND SCOPE: Lean-4 certification can validate logical shapes/rules but cannot certify EMPIRICAL truth of a proposed real-world relationship; bridging the gap between 'formally well-formed hypothesis' and 'true discovery' still requires external evidence/experiment, which doesn't scale to millions of candidates.
- PROVENANCE & PARACONSISTENT BLOAT: holding every speculative, contradictory relationship-claim forever risks an unmanageable frontier; deciding what to garbage-collect / down-weight vs. preserve as legal state, without losing the rare gem, is an open governance problem.
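A quick count shows how fast the combinatorial space grows. The count assumes one candidate per unordered entity pair and per choice of lens on each side. That gives C(N,2)·L² candidates, roughly N²L²/2. The entity count used below is hypothetical; the 39.5M figure elsewhere counts statements, not entities.

```python
from math import comb

def cross_lens_candidates(n_entities: int, n_lenses: int) -> int:
    """Count (entity pair, lens on one side, lens on the other) combinations.

    Entity pairs are unordered; the lens assigned to each side is ordered,
    so one pair can be compared under (temporal, ethical) and (ethical, temporal).
    """
    return comb(n_entities, 2) * n_lenses ** 2

# Hypothetical scale: one million entities under six lenses.
print(f"{cross_lens_candidates(1_000_000, 6):,}")
```

That is about 1.8 x 10^13 candidates. Suppose scoring took a hypothetical one microsecond per candidate. Exhaustive scoring would then take about 1.8 x 10^7 seconds, roughly 208 days, of serial compute. This is why sampling and ranking, not enumeration, have to do the work.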


### analogy-structure-mapping

Analogical reasoning is the most directly relevant intellectual tradition to the lens-engine vision, because a "lens comparison across entities" IS structurally an analogy: it asks whether the *system of relations* holding among one entity's parts also holds among another's, independent of surface features. The field's canonical theory is Dedre Gentner's Structure-Mapping Theory (1983): an analogy maps relational structure from a *base* domain to a *target*, and the quality of a mapping is governed by the **systematicity principle** — people (and good algorithms) prefer to carry over deep, interconnected *systems* of higher-order relations (causal, mathematical) rather than isolated attributes or surface features. This was operationalized in the **Structure-Mapping Engine (SME)** (Falkenhainer, Forbus & Gentner, 1986/1989), a local-to-global structural alignment algorithm that, given two predicate-calculus representations, returns correspondences, a structural-evaluation score, and *candidate inferences* — new claims about the target imported from the base. The candidate-inference output is exactly the "relationship no one thought to draw" the founder wants: SME doesn't just match, it *generates novel hypotheses* by projecting unmatched base structure onto the target.
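The following heavily simplified sketch illustrates the candidate-inference idea; it is not SME itself. The simplifications are:
- Facts are flat binary relations.
- The aligner brute-forces one-to-one entity correspondences and keeps the one that matches the most relations. Real SME builds match hypotheses local-to-global, handles higher-order relations, and scores systematicity.
- Candidate inferences are the unmatched base relations whose arguments are all mapped, rewritten into the target's vocabulary.

The sketch is only suitable for toy sizes, and the solar-system/atom facts are illustrative.

```python
from itertools import combinations, permutations

def entities(facts):
    """All entity names appearing as arguments in (predicate, arg1, arg2) facts."""
    return sorted({arg for _, *args in facts for arg in args})

def injective_mappings(base_ents, target_ents):
    """Every one-to-one correspondence of maximal size (brute force, toy sizes only)."""
    k = min(len(base_ents), len(target_ents))
    for subset in combinations(base_ents, k):
        for image in permutations(target_ents, k):
            yield dict(zip(subset, image))

def project(fact, mapping):
    """Rewrite a base fact into target vocabulary under an entity mapping."""
    pred, *args = fact
    return (pred, *(mapping[a] for a in args))

def mappable(fact, mapping):
    return all(a in mapping for a in fact[1:])

def align(base, target):
    target_set = set(target)
    best, best_score = {}, -1
    for m in injective_mappings(entities(base), entities(target)):
        score = sum(1 for f in base if mappable(f, m) and project(f, m) in target_set)
        if score > best_score:
            best, best_score = m, score
    # Candidate inferences: mapped base structure with no counterpart in the target.
    inferences = [project(f, best) for f in base
                  if mappable(f, best) and project(f, best) not in target_set]
    return best, best_score, inferences

base = [("attracts", "sun", "planet"),
        ("more_massive", "sun", "planet"),
        ("revolves_around", "planet", "sun")]
target = [("attracts", "nucleus", "electron"),
          ("more_massive", "nucleus", "electron")]
mapping, score, inferences = align(base, target)
print(mapping, score, inferences)
```

In this example, the sketch maps sun to nucleus and planet to electron. It then proposes `revolves_around(electron, nucleus)`, a relation the target never stated.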

The second great lineage is Douglas Hofstadter's Fluid Analogies Research Group (FARG) and its books *Fluid Concepts and Creative Analogies* (1995) and *Surfaces and Essences* (Hofstadter & Sander, 2013). Hofstadter's radical claim — "analogy is the core, the fuel and fire, of all thinking" — reframes categorization, perception, and concept-formation themselves as analogy-making. The computational model, **Copycat** (Mitchell & Hofstadter), differs philosophically from SME: rather than receiving fixed representations and aligning them, Copycat *builds* its representations fluidly via a **Slipnet** (a conceptual network whose link-lengths/"conceptual slippage" change dynamically), a **Workspace** (a blackboard of perceptual structures), a **Coderack** of stochastic **codelets** (parallel micro-agents that compete/cooperate), and a global **temperature** that anneals the search and serves as a quality proxy. The deep lesson for donto: *representation is not given, it is constructed under pressure*, and the same situation supports many rival construals — which maps cleanly onto donto's "identity-is-a-hypothesis" and paraconsistent-frontier stance.
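The following tiny sketch illustrates the temperature idea. Temperature is computed from how strong the current workspace structures are, and it controls how greedily the system picks among rival construals. Copycat's actual formulas differ. Here a Boltzmann (softmax) rule is swapped in, and the construal strengths are invented.

```python
import math
import random

def temperature(strengths):
    """Toy temperature in [0, 100]: high when current structures are weak."""
    return 100.0 * (1.0 - sum(strengths) / len(strengths))

def choose(construals, temp, rng):
    """Pick one rival construal: cold -> near-greedy, hot -> near-uniform.

    A Boltzmann (softmax) rule stands in for Copycat's own
    temperature-adjusted probabilities.
    """
    t = max(temp, 1e-6) / 100.0
    best = max(strength for _, strength in construals)
    weights = [math.exp((strength - best) / t) for _, strength in construals]
    return rng.choices([name for name, _ in construals], weights=weights, k=1)[0]

# Rival answers to Copycat's "abc -> abd; ijk -> ?" with made-up strengths.
construals = [("ijl", 0.9), ("ijd", 0.3)]
rng = random.Random(0)
cold = temperature([0.99, 0.98])   # coherent workspace
hot = temperature([0.1, 0.05])     # incoherent workspace
print(choose(construals, cold, rng), choose(construals, hot, rng))
```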

The third tradition is Fauconnier & Turner's **Conceptual Blending / Conceptual Integration** (1990s–2002, *The Way We Think*). Structure-mapping is asymmetric (base→target); blending, by contrast, is many-to-one. It involves two or more input mental spaces, a generic space of shared structure, and a *blend*. The blend selectively projects from each input and, crucially, generates **emergent structure** (via composition, completion, elaboration) present in *neither* input. This is the theoretical name for "relationships that emerge at the intersection of lenses": the payoff the founder describes is essentially emergent structure in a blend. Computational blending (Goguen's algebraic/category-theory amalgams, the COINVENT project, Divago) exists but is brittle and hard to evaluate.
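The following toy rendering follows the spirit of Fauconnier and Turner's well-known boat-race example. In that example, a modern boat and a nineteenth-century clipper sailing the same route are blended into a "race" that neither voyage contained. This is an illustrative assumption, not Divago or any COINVENT algorithm. Inputs are plain dicts, and all numbers are invented.

```python
def generic_space(a, b):
    """Shared structure: keys both inputs hold with the same value."""
    return {k: v for k, v in a.items() if k in b and b[k] == v}

def blend(a, b, day):
    """Toy blend of two voyages on the same route, compressed onto elapsed days.

    Selective projection keeps the shared route, each vessel and its position,
    and drops the calendar year. The race (a leader and a margin) is
    emergent: it exists in neither input.
    """
    pos_a, pos_b = a["miles_by_day"][day], b["miles_by_day"][day]
    if pos_a > pos_b:
        leader = a["vessel"]
    elif pos_b > pos_a:
        leader = b["vessel"]
    else:
        leader = None
    return {
        **generic_space(a, b),
        "day": day,
        "positions": {a["vessel"]: pos_a, b["vessel"]: pos_b},
        "race_leader": leader,
        "lead_miles": abs(pos_a - pos_b),
    }

# Hypothetical voyages 140 years apart on the same route (numbers are made up).
clipper = {"vessel": "clipper", "route": "SF->Boston", "year": 1853,
           "miles_by_day": {10: 2100, 20: 4300}}
catamaran = {"vessel": "catamaran", "route": "SF->Boston", "year": 1993,
             "miles_by_day": {10: 2400, 20: 4100}}
print(blend(clipper, catamaran, 10)["race_leader"],
      blend(clipper, catamaran, 20)["race_leader"])
```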

The scale story arrived with Dafna Shahaf, Aniket Kittur, Joel Chan and Tom Hope: **"Accelerating Innovation Through Analogy Mining"** (KDD 2017, Best Paper) learned *purpose* and *mechanism* vector representations from product descriptions (crowdsourcing + RNNs) so that analogies could be mined from messy real-world repositories (the patent/idea corpus) — finding products with the same purpose but different mechanism, or vice versa. **SOLVENT / the Analogy Search Engine** (Chan, Hope et al., 2018) extended this to scientific papers, annotating background/purpose/mechanism/findings and embedding them so cross-domain research analogies surface that pure IR misses. This is the closest existing relative of donto's vision at the document level — but it uses *one* coarse facet schema (purpose×mechanism), not a full spectrum of philosophical/linguistic/temporal/ethical/etc. lenses, and it does not hold contradictory mappings as durable state.
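The following sketch shows one simple way to express "same purpose, different mechanism" as a faceted query. It also shows how the same scoring generalizes from two facets to N lenses by adding weights. Toy 2-D vectors stand in for the learned purpose and mechanism embeddings. The KDD 2017 paper's actual retrieval procedure is not reproduced.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def facet_score(query, item, weights):
    """Weighted sum of per-facet similarities; a negative weight rewards difference."""
    return sum(w * cosine(query[f], item[f]) for f, w in weights.items())

def find_analogies(query, corpus, weights, k=3):
    return sorted(corpus, key=lambda it: facet_score(query, it, weights), reverse=True)[:k]

# "Same purpose, different mechanism" as facet weights.
far_analogy = {"purpose": 1.0, "mechanism": -1.0}
query = {"name": "q", "purpose": [1.0, 0.0], "mechanism": [1.0, 0.0]}
corpus = [
    {"name": "near-duplicate", "purpose": [1.0, 0.0], "mechanism": [1.0, 0.0]},
    {"name": "far-analogy", "purpose": [0.9, 0.1], "mechanism": [0.0, 1.0]},
    {"name": "unrelated", "purpose": [0.0, 1.0], "mechanism": [0.0, 1.0]},
]
print([it["name"] for it in find_analogies(query, corpus, far_analogy)])
```

A many-lens version would add entries such as `"teleological": 1.0` or `"temporal": -0.5` to the weights dict. This assumes each entity carries one embedding per lens.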

The 2023–2026 LLM wave reopened everything. Webb, Holyoak & Lu (*Nature Human Behaviour* 2023) reported "emergent analogical reasoning" in GPT-3/4 (Raven's-style matrices, letter strings, story analogies) at or above human level zero-shot. This was sharply contested: Lewis & Mitchell (2024) and Hodel & West showed performance *collapses* on **counterfactual** variants (permuted/synthetic alphabets) where humans stay robust — evidence the apparent reasoning leans on training-data similarity, not domain-general structure mapping. Webb et al. (2024) replied that with code-execution/tool augmentation the capacity generalizes. Newer work splits the difference and is most useful to donto: hybrid systems like **YARN** (Khojasteh et al., 2026) explicitly *re-fuse* Gentner-style structural mapping with LLM-derived multi-level abstractions, finding that pure LLM prompting fails on "far" (low-surface-similarity) analogies and pure SME scores below random, but LLM-abstraction-then-structural-align beats both — a direct template for donto. "Parallelograms Strike Back" (2026) even argues LLMs now generate better analogies than people in some settings. Net: LLMs are excellent at *proposing* candidate cross-domain relations and at *abstracting* messy text into mappable structure, but unreliable at *certifying* whether a mapping is structurally valid and not surface-pattern-matching — precisely the gap donto's evidence-anchoring + Lean-4 certification + paraconsistent hold-without-collapse could fill.

**Foundational works:**

- **Structure-Mapping Theory (SMT)** — Dedre Gentner (1983): An analogy maps a SYSTEM of relations from base to target, ignoring surface/object features; the 'systematicity principle' says deep interconnected higher-order (causal/mathematical) structure is preferred over isolated attributes. This is the formal definition of what 'comparing two entities through a lens' actually is. https://groups.psych.northwestern.edu/gentner/papers/Gentner83.pdf
- **The Structure-Mapping Engine (SME)** — Brian Falkenhainer, Kenneth Forbus, Dedre Gentner (1986 / 1989): A local-to-global structural alignment algorithm that returns not just correspondences and a match score but CANDIDATE INFERENCES — novel claims projected from base onto target. The candidate-inference step is the literal engine for 'a relationship no one thought to draw.' https://groups.psych.northwestern.edu/gentner/papers/FalkenhainerForbusGentner89.pdf
- **Copycat / Fluid Concepts and Creative Analogies (FARG)** — Douglas Hofstadter & Melanie Mitchell (1995): Analogy as 'high-level perception': representations are CONSTRUCTED fluidly (Slipnet + Workspace + stochastic codelets + temperature), not given. The same situation supports rival construals chosen under pressure — the cognitive-science analogue of donto's identity-as-hypothesis and contradiction frontier. https://en.wikipedia.org/wiki/Fluid_Concepts_and_Creative_Analogies
- **Surfaces and Essences: Analogy as the Fuel and Fire of Thinking** — Douglas Hofstadter & Emmanuel Sander (2013): The strong thesis that ALL thinking — categorization, concept formation, perception — is analogy-making. Justifies treating every 'lens' as an analogy-generating apparatus rather than a mere feature extractor. https://www.basicbooks.com/titles/douglas-hofstadter/surfaces-and-essences/9780465018475/
- **Conceptual Blending / Conceptual Integration (The Way We Think)** — Gilles Fauconnier & Mark Turner (1998 / 2002): Two+ input mental spaces + a generic space project selectively into a BLEND that contains EMERGENT structure present in neither input (via composition, completion, elaboration). This is the precise theoretical name for the founder's 'relationships that emerge at the intersection of lenses.' https://markturner.org/blendaphor.html
- **Analogical mapping / multiconstraint theory (similarity, structure, purpose)** — Keith Holyoak & Paul Thagard (ACME/ARCS) (1989 / 1995): Analogy is governed by simultaneous soft constraints — semantic similarity, structural parallelism, and pragmatic purpose — satisfied in parallel rather than strict isomorphism. Argues a lens-engine needs goal/purpose weighting, not just structural match. https://onlinelibrary.wiley.com/doi/10.1207/s15516709cog1303_1
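Holyoak and Thagard's ACME implements the multiconstraint idea as a connectionist network that settles. The following sketch is a simplified version built on these assumptions:
- Units are entity-correspondence hypotheses only; there are no predicate units.
- The weights and decay are illustrative.
- The update follows an interactive-activation style rule. Positive net input pushes a unit toward +1, negative input pushes it toward -1, and activation decays each step.

Semantic and pragmatic support enter through the `external` dict.

```python
from itertools import combinations, product

def acme_settle(base, target, external=None, steps=100,
                excit=0.1, inhib=-0.2, decay=0.1):
    """Simplified ACME-style constraint network over entity correspondences.

    Units are (base_entity, target_entity) hypotheses. Structural parallelism
    adds excitatory links; one-to-one competition adds inhibitory ones;
    `external` carries semantic/pragmatic support per unit.
    """
    b_ents = sorted({x for _, *args in base for x in args})
    t_ents = sorted({x for _, *args in target for x in args})
    units = list(product(b_ents, t_ents))
    weights = {u: {} for u in units}

    def link(u, v, w):
        if u != v:
            weights[u][v] = weights[u].get(v, 0.0) + w
            weights[v][u] = weights[v].get(u, 0.0) + w

    # Structural constraint: same predicate, aligned argument positions support each other.
    for (p, b1, b2), (q, t1, t2) in product(base, target):
        if p == q:
            link((b1, t1), (b2, t2), excit)
    # One-to-one constraint: hypotheses sharing an element compete.
    for u, v in combinations(units, 2):
        if u[0] == v[0] or u[1] == v[1]:
            link(u, v, inhib)

    external = external or {}
    act = {u: 0.01 for u in units}
    for _ in range(steps):
        new = {}
        for u in units:
            net = external.get(u, 0.0) + sum(w * act[v] for v, w in weights[u].items())
            a = act[u] * (1 - decay)
            a += net * (1 - act[u]) if net > 0 else net * (act[u] + 1)
            new[u] = max(-1.0, min(1.0, a))
        act = new
    return act

base = [("attracts", "sun", "planet"), ("more_massive", "sun", "planet"),
        ("revolves_around", "planet", "sun")]
target = [("attracts", "nucleus", "electron"), ("more_massive", "nucleus", "electron")]
act = acme_settle(base, target)
print({u: round(a, 2) for u, a in act.items()})
```

The consistent correspondences (sun/nucleus, planet/electron) settle high, and their competitors settle negative. A pragmatic preference could be added with, for example, `external={("sun", "nucleus"): 0.05}`.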

**Modern AI systems:**

- **Analogy Mining (purpose-mechanism)** — Learns 'purpose' and 'mechanism' vector embeddings from product/idea descriptions (crowdsourcing + RNN) so analogies can be mined from large messy repositories (e.g. the patent corpus): same purpose / different mechanism, or vice versa. KDD 2017 Best Paper. _[Best Paper + Best Student Paper KDD 2017; found analogies usable by experts; seeded a whole research line on computational analogy at scale.]_ https://arxiv.org/abs/1706.05585
- **SOLVENT / Analogy Search Engine** — Mixed-initiative system annotating scientific papers by background/purpose/mechanism/findings, embedding them to surface cross-domain research analogies that pure information retrieval misses. _[Found more (and more useful) analogies than IR baselines; annotations generalized across domains; experts rated discovered analogies inspiring. Closest existing relative of donto's vision at document level — but ONE facet schema, no contradiction-holding.]_ https://arxiv.org/abs/1812.06974
- **SciAgents** — Multi-agent (ontologist / scientist-1 / scientist-2 / critic) system over a 33K-node, 48K-edge ontological knowledge graph that samples RANDOMIZED heuristic PATHS between distant concept nodes to seed cross-domain hypotheses, then expands and critiques them; benchmarks novelty against Semantic Scholar. _[Published in Advanced Materials (Wiley, 2025); revealed 'hidden interdisciplinary relationships previously considered unrelated' in bio-inspired materials. Strongest existing proof that KG-path-sampling + agentic critique surfaces non-obvious cross-domain relations — but no paraconsistent persistence, no formal certification, single domain validated.]_ https://arxiv.org/abs/2409.05556
- **Emergent analogical reasoning in LLMs (Webb, Holyoak, Lu)** — Showed GPT-3/4 solve novel matrix/letter-string/story analogies zero-shot at or above human level — the empirical basis for using LLMs as the 'lens' analogy-proposers. _[Nature Human Behaviour 2023, highly cited; GPT-4 reported even stronger. Establishes LLMs as competent analogy PROPOSERS.]_ https://www.nature.com/articles/s41562-023-01659-w
- **Counterfactual analogy evaluation (Lewis & Mitchell)** — Tests LLM analogy on counterfactual variants (permuted/synthetic alphabets) far from training data; GPT performance collapses while humans stay robust — the key caution against trusting LLM-proposed mappings as ground truth. _[Widely cited rebuttal; reframed the debate as 'proposal vs verification.' Directly motivates donto's need to HOLD-then-CERTIFY rather than trust LLM mappings.]_ https://arxiv.org/abs/2402.08955
- **YARN (LLM-abstraction + structure mapping)** — Hybrid pipeline: LLMs decompose narratives into units and produce multi-level abstractions (conceptual/evaluative/narrative-arc/stage), then a structural-mapping algorithm aligns them — explicitly re-fusing Gentner SMT with LLMs. _[Beats both pure-LLM CoT (0.46 vs 0.41 MCQ) and pure structural mapping (which scores BELOW random, 0.17) on far/low-surface analogies. Direct architectural template for donto: LLM abstracts -> structural aligner verifies.]_ https://arxiv.org/abs/2603.29997
- **Parallelograms Strike Back** — 2026 study arguing LLMs now generate higher-quality analogies than human participants in several generation settings. _[Evidence the proposal/generation half of the pipeline is increasingly LLM-solvable; shifts the bottleneck to selection/verification/curation — donto's strong suit.]_ https://arxiv.org/abs/2603.19066
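The following sketch shows one way to propose candidate links without enumerating all pairs, by sampling randomized paths between two distant concepts. SciAgents' own sampler is not reproduced here. The sketch assumes a waypoint scheme: pick a random intermediate node and join two breadth-first shortest paths. Repeated calls therefore yield varied routes between the same endpoints. The toy graph is made up.

```python
import random
from collections import deque

def bfs_path(graph, start, goal):
    """Shortest path in an unweighted adjacency-dict graph, or None."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

def sample_path(graph, source, target, rng):
    """Route source -> random waypoint -> target; the result may revisit nodes."""
    waypoint = rng.choice([n for n in graph if n not in (source, target)])
    first = bfs_path(graph, source, waypoint)
    second = bfs_path(graph, waypoint, target)
    if first is None or second is None:
        return None
    return first + second[1:]

graph = {
    "silk": ["protein", "toughness"],
    "protein": ["silk", "folding", "music"],
    "toughness": ["silk", "fracture"],
    "folding": ["protein", "origami"],
    "origami": ["folding", "fracture"],
    "fracture": ["toughness", "origami"],
    "music": ["protein"],
}
rng = random.Random(7)
for _ in range(3):
    print(sample_path(graph, "silk", "origami", rng))
```

Each sampled path would then be handed to the proposer and critic agents as seed context.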

**Relevance to the lens engine:** A lens-to-lens comparison across two entities IS an analogy in Gentner's exact sense, so this field hands donto a ready vocabulary and tooling. BORROW: (1) The systematicity principle as a relevance filter — rank machine-proposed relationships by the SIZE and INTERCONNECTEDNESS of the relational system they share under a lens, not by surface attribute overlap; this is the antidote to the combinatorial-noise problem (most cross-lens pairs will be junk). (2) SME's candidate-inference mechanism as the literal generator of 'relationships no one thought to draw' — when two entities align structurally under, say, the teleological lens, project the UNMATCHED base structure onto the target as a new hypothesis_only edge. (3) The Hope/Shahaf/SOLVENT purpose×mechanism schema as proof that faceting documents into structured aspects and embedding each facet separately yields better cross-domain matches than holistic embeddings — donto generalizes this from 2 facets to N lenses. (4) SciAgents' randomized KG-path-sampling between distant nodes as a concrete way to PROPOSE candidate relationships across donto's 39.5M statements without enumerating all pairs. (5) The YARN result that LLM-abstraction-THEN-structural-alignment beats both pure LLM and pure SME — donto should use LLM lenses to ABSTRACT entities, then a structural/Lean-certified aligner to VALIDATE, never trusting the LLM's raw mapping. (6) Copycat's temperature/codelet stochasticity and Fauconnier-Turner's emergent-structure vocabulary to name and rank the payoff. AVOID: (a) treating LLM-proposed analogies as ground truth — the Lewis & Mitchell counterfactual collapse shows they pattern-match; donto must keep them as weighted, evidence-anchored, contestable claims (its native mode). (b) Requiring strict isomorphism (classic SME brittleness) — Holyoak-Thagard multiconstraint and donto's paraconsistency both argue for soft, purpose-weighted, contradiction-tolerant matching. 
(c) One fixed facet schema — donto's many-lens ambition is exactly the generalization SOLVENT stopped short of.
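The following sketch gives a crude proxy for item (1). It ranks a candidate mapping by the size of its largest connected system of matched facts, not by their raw count. The proxy ignores higher-order relations, which SME's structural evaluation rewards explicitly. The facts are made up.

```python
from collections import defaultdict

def largest_system(matched):
    """Size of the largest group of matched facts linked through shared arguments.

    A crude systematicity proxy: one connected relational system outranks
    the same number of isolated matches.
    """
    by_arg = defaultdict(list)
    for i, (_, *args) in enumerate(matched):
        for a in args:
            by_arg[a].append(i)
    seen, best = set(), 0
    for start in range(len(matched)):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            i = stack.pop()
            size += 1
            for a in matched[i][1:]:
                for j in by_arg[a]:
                    if j not in seen:
                        seen.add(j)
                        stack.append(j)
        best = max(best, size)
    return best

system = [("attracts", "sun", "planet"), ("more_massive", "sun", "planet"),
          ("revolves_around", "planet", "sun")]
surface = [("yellow", "sun"), ("round", "ball"), ("hot", "stove")]
print(largest_system(system), largest_system(surface))
```

Both candidates match three facts, but only the first forms a single interconnected system, so it ranks higher.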

**Already done vs white space:** ALREADY DONE (the founder should not reinvent these): (1) The core theory that cross-domain relationship discovery = structural analogy, with a working algorithm that emits NOVEL hypotheses (SME candidate inferences) — 40 years old. (2) Analogy MINING AT SCALE over messy real-world repositories using learned facet embeddings — Hope/Shahaf (patents 2017) and SOLVENT (scientific papers 2018) already demonstrated 'find cross-domain analogies humans missed, experts find them useful.' (3) AGENTIC, multi-agent, KG-path-sampling cross-domain hypothesis generation that 'reveals hidden interdisciplinary relationships' — SciAgents (2024-25) is a published, peer-reviewed instance of a large chunk of the founder's pitch, in materials science. (4) LLMs as competent analogy proposers AND as the abstraction layer feeding a structural aligner (YARN 2026). So 'use AI to break entities down and find cross-domain relationships' is, at the level of a single facet schema in a single domain, NOT new. GENUINE WHITE SPACE: (a) The FULL SPECTRUM of lenses — every prior system uses one or a few facets (purpose/mechanism; ontological KG edges). Nobody has run a dozen+ heterogeneous analytical lenses (phenomenological, semiotic, ethical, aesthetic, mereological, teleological...) over the SAME entities and looked for relationships at lens INTERSECTIONS. The intersection-of-many-lenses is real white space. (b) PARACONSISTENT, PERSISTENT HOLDING of speculative/contradictory machine-proposed mappings as durable first-class state with typed argument edges (supports/rebuts/undercuts) — every analogy-mining system today is one-shot retrieval; none KEEPS the rejected and the contradictory mappings as a queryable frontier over time. (c) EVIDENCE-ANCHORING each proposed relationship to a source byte + bitemporal provenance — SOLVENT/SciAgents do not anchor or version their analogies. 
(d) FORMAL CERTIFICATION (Lean-4) of the rare valuable mappings' structural validity — no analogy system formally proves a mapping's shape. The combination (many-lens + agentic-proposal + paraconsistent-hold + evidence-anchor + certify) at substrate scale is, as far as the literature shows, unbuilt.

**Hard problems:**
- COMBINATORIAL EXPLOSION / signal-to-noise: N entities x M lenses x pairwise comparison is astronomically large, and the overwhelming majority of cross-lens 'relationships' will be spurious. SME's systematicity score and Holyoak-Thagard multiconstraint help rank, but no scalable, calibrated relevance filter for many-lens intersections exists. SciAgents' random-path sampling is a heuristic dodge, not a solution.
- EVALUATION / what counts as a GOOD discovered relationship: analogy quality is notoriously hard to measure; SOLVENT and SciAgents fall back on expert judgment or Semantic-Scholar novelty checks. There is no agreed automatic metric distinguishing a profound cross-domain insight from a superficial pun, so curation cost stays high.
- LLM SURFACE-PATTERN-MATCHING vs genuine structure mapping: Lewis & Mitchell's counterfactual collapse shows LLM-proposed analogies may be training-data echoes, not valid structural mappings. Verifying a machine mapping is actually structurally sound (not surface) is unsolved at scale — this is exactly the gap donto's certification layer must close.
- REPRESENTATION / the 'tractability vs flexibility' dilemma (the SME-vs-Copycat split): SME needs clean predicate-calculus input (where does it come from?); Copycat builds representations fluidly but doesn't scale beyond microworlds. Getting LLM-derived abstractions that are both rich enough to map and clean enough to align reliably (YARN's finding: 'no single abstraction works best across all settings') is open.
- CAUSAL and higher-order relation transfer: every recent system (YARN, narrative-analogy work) reports that models capture object/attribute correspondences but mis-transfer CAUSAL and higher-order relations — yet higher-order relations are precisely what systematicity says make an analogy valuable. The most valuable mappings are the hardest to get right.
- ASYMMETRY and DIRECTION: structure-mapping is directional (base->target) and blending is many-to-one with selective projection; deciding which entity is base, which lens dominates, and what to project (vs suppress) into an emergent 'blend' has no principled automatic answer.
- GROUNDING / hallucinated relationships: agentic LLM proposers will confabulate plausible-sounding cross-domain links with no evidential basis; without donto-style mandatory evidence-anchoring, the discovery engine becomes a serendipity-shaped hallucination generator.


### kg-link-prediction-completion

Knowledge graph completion (KGC), a.k.a. link prediction, is the machine-learning field devoted to scoring the plausibility of unstated triples (h, r, t) so that missing/latent edges can be inferred from observed ones. It is the single most directly relevant prior art to donto's "discover relationships no one drew" vision — it has been predicting unstated relationships at scale for a decade. The dominant paradigm is the geometric/algebraic EMBEDDING model: TransE (Bordes et al., NeurIPS 2013) treats a relation as a translation h + r ≈ t in real vector space; DistMult uses a bilinear diagonal product; ComplEx (Trouillon et al., ICML 2016) moves to complex-valued embeddings so the Hermitian dot product can score asymmetric relations differently by argument order; RotatE (Sun et al., ICLR 2019) models a relation as a rotation in complex space, letting one model express symmetry/antisymmetry, inversion, AND composition simultaneously. These are scored against a corrupted-negative ranking objective and evaluated by MRR / Hits@k on benchmarks (FB15k-237, WN18RR, YAGO3-10). They are cheap, scalable to tens of millions of edges, and genuinely surface unstated facts — but they are SHALLOW relational pattern-matchers, transductive (no embedding exists for an unseen entity), and they encode only the latent geometry of co-occurrence, not meaning.
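The following sketch writes out the four scoring functions named above for small vectors, so their algebra is visible. It has no training loop or negative sampling, and the vectors are hand-picked. TransE uses the L2 distance here; the original paper allows L1 or L2. The usage lines show the property the paragraph turns on: DistMult cannot distinguish h->t from t->h, while ComplEx can.

```python
import cmath
import math

def transe(h, r, t):
    """TransE: relation as translation; higher (less negative) is more plausible."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def distmult(h, r, t):
    """DistMult: trilinear product with a diagonal relation; symmetric in h and t."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

def complex_score(h, r, t):
    """ComplEx: real part of the Hermitian product; h->t can differ from t->h."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real

def rotate(h, phases, t):
    """RotatE: relation as element-wise rotation by unit-modulus e^(i*theta)."""
    return -math.sqrt(sum(abs(hi * cmath.exp(1j * th) - ti) ** 2
                          for hi, th, ti in zip(h, phases, t)))

h, r, t = [1 + 0j], [1j], [1j]
print(complex_score(h, r, t), complex_score(t, r, h))   # differs by argument order
print(rotate([1 + 0j], [math.pi / 2], [1j]))             # ~0: t is h rotated by 90 degrees
```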

A second, older and more interpretable tradition is RULE MINING: AMIE / AMIE+ / AMIE3 (Galárraga et al., WWW 2013 onward) mine closed Horn rules ("Datalog") with support/confidence under a partial-completeness assumption; AnyBURL (Meilicke et al., IJCAI 2019, VLDBJ 2023) samples bottom-up paths and generalizes them into rules anytime, and — strikingly — a simple symbolic rule learner MATCHES OR BEATS most embedding models on link prediction while producing human-readable, evidence-bearing explanations. This matters enormously for donto: rules are inherently auditable and map naturally onto donto's typed argument edges and Lean-certifiable shapes. Hybrids now feed embedding-predicted links back to enrich the graph before rule mining (Betz/Meilicke et al., 2024).
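The following minimal sketch computes two quality measures AMIE reports for a length-2 path rule: standard confidence and PCA confidence. Under the partial-completeness assumption, a body prediction counts as a counterexample only if the KG already knows some head fact for that subject. The rule and triples are made up. AMIE's search over rule space and its pruning are not shown.

```python
from collections import defaultdict

def rule_stats(kg, r1, r2, head):
    """Support, standard confidence and PCA confidence of
    r1(x, z) AND r2(z, y) => head(x, y) over (s, p, o) triples."""
    facts = set(kg)
    pairs = defaultdict(list)                 # relation -> [(subject, object)]
    for s, p, o in kg:
        pairs[p].append((s, o))
    r2_from = defaultdict(set)                # z -> {y : r2(z, y)}
    for z, y in pairs[r2]:
        r2_from[z].add(y)
    body = {(x, y) for x, z in pairs[r1] for y in r2_from[z]}
    support = sum((x, head, y) in facts for x, y in body)
    std_conf = support / len(body) if body else 0.0
    # PCA: only count body pairs whose subject already has some known head fact.
    head_subjects = {x for x, _ in pairs[head]}
    pca_body = sum(x in head_subjects for x, _ in body)
    pca_conf = support / pca_body if pca_body else 0.0
    return support, std_conf, pca_conf

kg = [
    ("ann", "marriedTo", "bob"), ("bob", "livesIn", "paris"), ("ann", "livesIn", "paris"),
    ("cat", "marriedTo", "dan"), ("dan", "livesIn", "rome"), ("cat", "livesIn", "oslo"),
    ("eve", "marriedTo", "fay"), ("fay", "livesIn", "lima"),   # eve's residence unknown
]
print(rule_stats(kg, "marriedTo", "livesIn", "livesIn"))
```

Under a closed-world reading, eve's missing residence counts against the rule (1/3). Under the partial-completeness assumption it is simply unknown, so confidence rises to 1/2.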

The frontier moved to two places. (1) GNN / path-based and INDUCTIVE KGC: GraIL (Teru et al., 2020) reasons over enclosing subgraphs so it generalizes to unseen entities; NBFNet (Zhu et al., NeurIPS 2021) reframes link prediction as a learned generalized Bellman-Ford over paths and is a strong SOTA; and ULTRA (Galkin et al., ICLR 2024) is the watershed — a single FOUNDATION MODEL that does zero-shot link prediction on ANY KG with any entity/relation vocabulary, by learning representations of the *graph of relations* (how relations interact) rather than fixed per-entity embeddings, beating graph-specific baselines across 50+ KGs. This is the closest thing to "one model, every domain" — but it still operates purely on graph topology, not on cross-modal or deep-semantic content. (2) LLM-AUGMENTED KGC (2023-2026): KICGPT (Wei et al., EMNLP 2023 / 2024) couples a structure-aware retriever with an LLM reranker to fix the long-tail problem; KG-LLM, SAT (structure-aware alignment-tuning, 2025), DrKGC (subgraph-retrieval-augmented LLMs, 2025), and ontology-enhanced LLM-KGC (2025) all inject the LLM's world knowledge and natural-language semantics as a second signal. These finally bring lexical/semantic understanding to KGC — the bridge to donto's agentic-lens idea — but they bolt the LLM onto a single structural task, not onto a many-lens decomposition.
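NBFNet's framing treats classic path heuristics as instances of a generalized Bellman-Ford iteration. The following sketch implements the non-learned "most reliable path" instance, which takes the max over paths of the product of edge weights. NBFNet keeps this iteration shape but learns the boundary condition, the message, and the aggregation. The relation weights here are illustrative assumptions.

```python
def path_scores(edges, source, rel_weight, num_layers):
    """Generalized Bellman-Ford in the (max, x) semiring.

    Returns, for every node reachable within `num_layers` hops, the best
    product of relation weights over paths from `source`.
    """
    score = {source: 1.0}                    # boundary condition: indicator on source
    for _ in range(num_layers):
        new = dict(score)
        for h, r, t in edges:
            if h in score:                    # message along edge h -r-> t
                cand = score[h] * rel_weight[r]
                if cand > new.get(t, 0.0):    # aggregate with max
                    new[t] = cand
        score = new
    return score

edges = [("a", "r1", "b"), ("b", "r2", "c"), ("a", "r3", "c")]
weights = {"r1": 0.9, "r2": 0.5, "r3": 0.3}
print(path_scores(edges, "a", weights, 1))
print(path_scores(edges, "a", weights, 2))
```

With one layer, c is reached only through the direct r3 edge. With two layers, the longer a -> b -> c path wins, because 0.9 x 0.5 > 0.3.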

Crucially, the field is honest about what it MISSES. (a) Degree/popularity bias: embedding KGC preferentially scores high-degree "rich club" entities, so it amplifies what is already well-studied and overlooks the long tail (Shomer et al., WWW 2023; biological-KG topology study, bioRxiv 2024) — the opposite of serendipity. (b) Plausibility ≠ novelty ≠ truth: KGC ranks how much a candidate edge *resembles the existing distribution*, so "best" predictions are often the most obvious/redundant ones, and benchmarks (FB15k/WN18) are inflated by reverse-triple leakage and binarized n-ary relations (Akrami et al.; "On Large-scale Evaluation," 2025). (c) Calibration is poor, especially under the realistic open-world assumption — confidence scores do not equal probabilities of truth (Tabacof & Costabello, EMNLP 2020; "Using Model Calibration to Evaluate Link Prediction," WWW 2024; KGE-Calibrator, EMNLP 2025). (d) Contradictions: standard KGC assumes a single consistent truth and cannot natively HOLD a contradiction; even uncertain-KG embeddings (UKGE, Chen et al., AAAI 2019) model confidence but still struggle to represent negative/false links as legal state. Every one of these gaps is something donto's paraconsistent, evidence-first, calibration-agnostic substrate is architecturally built to absorb.

**Foundational works:**

- **TransE — Translating Embeddings for Modeling Multi-relational Data** — Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, Oksana Yakhnenko (2013): A relation is a translation in vector space (h + r ≈ t). Established that unstated edges can be predicted cheaply at scale by geometry — the founding move of embedding-based relationship discovery, and the simplest baseline to beat. https://proceedings.neurips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html
- **ComplEx — Complex Embeddings for Simple Link Prediction** — Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, Guillaume Bouchard (2016): Complex-valued embeddings + Hermitian dot product let one model score asymmetric/antisymmetric relations differently by argument order, while staying linear in time/space. Shows the SCORING GEOMETRY must match the relation's algebraic type — directly relevant to donto querying identity/relations 'under a lens'. https://arxiv.org/abs/1606.06357
- **RotatE — Knowledge Graph Embedding by Relational Rotation in Complex Space** — Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, Jian Tang (2019): Modeling each relation as a rotation captures symmetry, antisymmetry, inversion AND composition in one model, plus self-adversarial negative sampling. The clearest statement that different RELATION PATTERNS need different algebraic structure — a single embedding space cannot represent all relationship types equally. https://arxiv.org/abs/1902.10197
- **AMIE / AMIE+ / AMIE3 — symbolic Horn-rule mining over KGs** — Luis Galárraga, Christina Teflioudi, Katja Hose, Fabian Suchanek (2013-2020): Mines human-readable logical rules with support/confidence under a partial-completeness assumption. The interpretable, evidence-bearing alternative to black-box embeddings — maps onto donto's typed argument edges and Lean-certifiable shapes. https://github.com/dig-team/AMIE
- **AnyBURL — Anytime Bottom-Up Rule Learning for KG Completion** — Christian Meilicke, Melisachew Wudage Chekol, Daniel Ruffinelli, Heiner Stuckenschmidt (2019; VLDBJ 2023): Samples paths bottom-up and generalizes them to rules anytime; a SIMPLE symbolic learner that matches/beats embeddings on link prediction with full explanations. Proof that interpretable path-rules are competitive with opaque vectors, and the right substrate for an audit-first system. https://link.springer.com/article/10.1007/s00778-023-00800-5
- **NBFNet — Neural Bellman-Ford Networks (path-based GNN link prediction)** — Zhaocheng Zhu, Zuobai Zhang, Louis-Pascal Xhonneux, Jian Tang (2021): Reframes link prediction as a learned generalized Bellman-Ford over paths between a pair of nodes — combining the interpretability of path-reasoning with the power of GNNs, and generalizing inductively to unseen entities. https://proceedings.neurips.cc/paper_files/paper/2021/file/f6a673f09493afcd8b129a0bcf1cd5bc-Paper.pdf
- **UKGE — Embedding Uncertain Knowledge Graphs** — Xuelu Chen, Muhao Chen, Weijia Shi, Yizhou Sun, Carlo Zaniolo (2019): Embeds confidence scores (not binary truth) and uses probabilistic soft logic to infer confidence for unseen facts. The honest precursor to donto's weighted-hypothesis stance — but it still cannot hold mutually-contradictory claims as legal state, which is exactly donto's differentiator. https://web.cs.ucla.edu/~yzsun/papers/2019_AAAI_UKG.pdf

**Modern AI systems:**

- **ULTRA (Foundation Model for KG Reasoning)** — A single pre-trained model that does ZERO-SHOT link prediction on any KG with any entity/relation vocabulary, by learning representations of the 'graph of relations' (relation-to-relation interactions) rather than fixed per-entity embeddings. UltraQuery extends it to inductive logical-query answering. _[Across 57 KGs, zero-shot inductive performance often matches or beats baselines TRAINED on each specific graph; the dominant transferable-KGC result of 2024. Open-source (DeepGraphLearning/ULTRA), HF checkpoints.]_ https://arxiv.org/abs/2310.04562
- **KICGPT (LLM with Knowledge in Context)** — Couples a structure-aware triple-retriever with an LLM reranker; encodes KG structure into in-context demonstrations (Knowledge Prompt) to fix the long-tail-entity problem without extra finetuning. _[EMNLP 2023; SOTA or near-SOTA on FB15k-237 / WN18RR especially for long-tail entities; widely cited template for LLM+retriever KGC.]_ https://arxiv.org/abs/2402.02389
- **SciAgents (Buehler lab, MIT)** — Multi-agent system over an ontological knowledge graph (~33k nodes / 49k edges from ~1,000 papers) that traverses the graph to surface hidden interdisciplinary connections and autonomously generates+refines scientific hypotheses with multiple specialized LLM agents. The closest existing realization of the founder's 'agents traverse a graph to find links no human drew' vision. _[Published in Advanced Materials (2025); reported to reveal previously-unrelated interdisciplinary links in bio-inspired materials and generate mechanistically-grounded hypotheses. Open-source (lamm-mit/SciAgentsDiscovery).]_ https://arxiv.org/abs/2409.05556
- **SAT / DrKGC / Ontology-Enhanced LLM-KGC (2025 LLM-KGC wave)** — Family of structure-aware LLM-KGC methods: SAT aligns graph embeddings with NL space via contrastive multi-task tuning; DrKGC does dynamic subgraph retrieval-augmented LLM completion across general+biomedical domains; ontology-enhanced variants inject schema constraints into the LLM. _[SAT reports 8.7%-29.8% link-prediction improvement over prior SOTA on four benchmarks; DrKGC strong cross-domain (general + biomedical). Mostly 2025 arXiv/venue papers.]_ https://arxiv.org/abs/2509.01166
- **Drug-repurposing / literature-based-discovery KGC pipelines** — Casts hypothesis generation as link prediction over literature-derived biomedical KGs (SemMedDB, custom). The real-world proof that KGC produces actionable NOVEL hypotheses (gene-disease, drug-disease), the closest deployed instance of donto's serendipity-engine payoff. _[Rare-disease repurposing KGE reached AUROC ~0.89 on known indications while proposing novel candidates; COVID-19 repurposing via KGC (2020); active subfield with wet-lab follow-ups.]_ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6937428/
- **Calibration tooling (Tabacof & Costabello; KGE-Calibrator)** — Methods+evaluations to turn KGC scores into trustworthy probabilities; show calibration works under closed-world but degrades badly under the realistic open-world assumption, and that calibrated scores measurably improve human-AI collaborative curation. _[Tabacof & Costabello (EMNLP 2020); 'Using Model Calibration to Evaluate Link Prediction' (WWW 2024); KGE-Calibrator (EMNLP 2025) calibrates without hurting ranking. Open-source (Yang233666/KGE-Calibrator).]_ https://aclanthology.org/2020.emnlp-main.667/
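The following minimal sketch illustrates Platt scaling, a standard calibration method in this literature. It fits a sigmoid σ(a·s + b) that maps raw KGC scores s to probabilities on held-out labeled triples. The plain gradient-descent fit, learning rate, and epoch count are illustrative choices, and Platt's smoothed targets are omitted. Under the open-world assumption, the negative labels would typically be synthetic corruptions, which is where calibration is reported to degrade.

```python
import math

def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit p(true | s) = sigmoid(a*s + b) by gradient descent on log-loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))

# Hypothetical held-out raw scores with true/false labels.
raw = [-3, -2, -1, 0, 1, 2, 3]
labels = [0, 0, 0, 1, 0, 1, 1]
calibrate = fit_platt(raw, labels)
print([round(calibrate(s), 2) for s in raw])
```

The calibrated value is monotone in the raw score, so it does not change the ranking. It only makes the number on each `hypothesis_only` edge interpretable as a probability for triage.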

**Relevance to the lens engine:** BORROW (don't reinvent): (1) Use KGC as the cheap, scalable FIRST PASS that proposes candidate edges over donto's 39.5M statements — ULTRA-style inductive/foundation models and AnyBURL-style path-rules both run at this scale and need no per-entity training, fitting a substrate where 'identity is a hypothesis.' (2) Adopt rule mining (AMIE/AnyBURL/NBFNet paths) as the EXPLAINABLE generator: every proposed edge arrives with a path/rule that can be written as a donto typed argument edge (supports/rebuts) and handed to the Lean-4 overlay for shape certification — turning machine guesses into auditable claims. (3) Treat calibration as a first-class output, not an afterthought: store the model score AND a calibrated open-world probability on each hypothesis_only edge so curation can triage. (4) Use the LLM-KGC wave (KICGPT, SAT, DrKGC) as the bridge from structure to SEMANTICS — they are the existence proof that natural-language meaning improves link prediction. AVOID: (a) Don't let embedding-style scoring be the arbiter of value — it is degree-biased ('rich club') and rewards REDUNDANT, distribution-conforming edges, which is anti-serendipity; the rare valuable cross-domain link will score LOW by construction. donto should explicitly up-weight low-prior, cross-context (cross-ctx:*) candidates rather than top-ranked ones. (b) Don't inherit the single-consistent-truth assumption baked into every KGC loss; donto's paraconsistent frontier is precisely the part KGC cannot do. (c) Don't trust benchmark MRR as a proxy for discovery quality — it is inflated by leakage and measures plausibility, not novelty or truth.
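The following hypothetical re-ranking heuristic illustrates item (a): damp hub-to-hub pairs and promote cross-context candidates. The exponent `alpha`, the bonus, and the `is_cross_ctx` flag are assumptions to tune, not a method from the cited work. The flag stands in for donto's cross-ctx:* contexts. The heuristic leaves each raw model score in place and changes only the triage order.

```python
def rerank(candidates, degree, alpha=0.5, cross_ctx_bonus=0.5):
    """Re-order (head, tail, model_score, is_cross_ctx) candidates toward serendipity.

    Dividing by a degree prior damps 'rich club' hub pairs; a flat bonus
    promotes candidates that bridge contexts.
    """
    def adjusted(c):
        head, tail, score, is_cross_ctx = c
        prior = ((1 + degree.get(head, 0)) * (1 + degree.get(tail, 0))) ** alpha
        return score / prior + (cross_ctx_bonus if is_cross_ctx else 0.0)
    return sorted(candidates, key=adjusted, reverse=True)

degree = {"hub1": 400, "hub2": 350, "rareA": 2, "rareB": 1, "rareC": 3}
candidates = [
    ("hub1", "hub2", 0.95, False),   # top raw score, but a hub-to-hub pair
    ("rareA", "rareB", 0.40, False),
    ("rareC", "hub1", 0.30, True),   # crosses contexts
]
print([c[:2] for c in rerank(candidates, degree)])
```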

**Already done vs white space:** ALREADY DONE (the founder should NOT believe 'no one has thought to do this'): Predicting unstated relationships at massive scale is a solved, decade-old industry. TransE→ComplEx→RotatE do it transductively; NBFNet and ULTRA do it inductively, with ULTRA transferring zero-shot across 50+ graphs. Generating NOVEL, actionable cross-entity hypotheses is done in drug repurposing and literature-based discovery (with wet-lab validation). Explainable, evidence-bearing link proposals exist (AMIE, AnyBURL, NBFNet paths). Confidence/uncertainty on proposed edges exists (UKGE, calibration work). Most pointedly, AGENTS traversing a knowledge graph to surface previously-unrelated interdisciplinary links and auto-generate hypotheses ALREADY EXIST in SciAgents (Ghafarollahi & Buehler, 2024-25) and the broader agentic-graph-discovery wave (GraphAgents, cross-domain materials design, 2026). So the 'agents break things down and find links no human drew' core is real and demonstrated. GENUINE WHITE SPACE (the defensible novelty): (1) The MANY-LENS decomposition as the GENERATIVE engine. All prior KGC predicts within a SINGLE relation vocabulary, ontology, and modality; nobody systematically decomposes each entity through the full spectrum of analytical lenses (mereological, teleological, semiotic, phenomenological, ethical, ecological...) and then mines relationships at the INTERSECTION of lenses. SciAgents traverses one graph; it does not multiplex perspectives. (2) The PARACONSISTENT, contradiction-preserving SUBSTRATE for holding millions of speculative, mutually-incompatible machine-proposed edges forever as legal state; no KGC system can do this, because they all collapse to one truth. (3) The EVIDENCE-FIRST byte-anchoring of every speculative edge plus a Lean-4 certification path, giving a generate-hold-verify lifecycle that the ML field has no equivalent for (KGC outputs a ranked list, not a curated, source-anchored, machine-checked claim store).
The honest novelty is therefore NOT 'discover unstated links' (done) but 'the AGENTIC + MANY-LENS generation × PARACONSISTENT/EVIDENCE-FIRST holding × certifiable verification PIPELINE at substrate scale' — the combination, not any single piece.

**Hard problems:**
- Plausibility ≠ novelty ≠ truth: KGC scores reward edges that resemble the existing distribution, so the highest-ranked predictions are the most redundant/obvious — the genuinely valuable cross-domain link scores LOW. No good objective exists for 'surprising-yet-true.'
- Degree/popularity bias ('rich club'): embedding KGC over-scores well-connected entities and ignores the long tail, amplifying what is already studied — structurally anti-serendipity (Shomer et al. WWW 2023).
- The combinatorial explosion: many lenses × millions of entities = astronomically many candidate intersection-edges. Generating is cheap; the bottleneck is RANKING/curation and the cost of false positives flooding the substrate.
- Evaluation has no ground truth for novel discovery: benchmark MRR is inflated by reverse-triple leakage and n-ary binarization, and rewards rediscovering known edges; measuring whether a never-before-drawn link is correct requires expensive external (often wet-lab/expert) validation.
- Calibration under the open-world assumption: KGC confidence scores are not probabilities of truth, and existing calibration largely fails OWA — so triaging which speculative edges to verify is unreliable (Tabacof & Costabello 2020; WWW 2024).
- Contradiction handling: every standard KGC loss assumes one consistent truth and cannot natively represent/hold mutually-contradictory or false links — exactly the state donto wants to keep, with no off-the-shelf method to score within it.
- Cross-modal / deep-semantic links: embeddings encode only co-occurrence geometry within one relation vocabulary; they cannot represent analogical, teleological, semiotic, or phenomenological relations, nor links that span modalities — the lenses the founder cares about are precisely the ones current KGC cannot embed.
- Inductive generalization to NEW relation TYPES: even ULTRA, which transfers zero-shot to new graphs with unseen relation vocabularies, needs each relation to have observed edges to reason over; a many-lens engine continuously invents new relation types that have no instances yet, and truly open-ended relation vocabularies remain largely unsolved.
- Noise and hallucination from LLM-augmented generation: LLM-KGC and agentic generators produce fluent but spurious edges; without strict evidence-anchoring and certification they pollute the graph faster than humans can curate.
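A rough diagnostic for the reverse-triple leakage mentioned above: count test triples (h, r, t) whose reversed pair (t, ?, h) already appears in training, since a model can score those by learning inverse relations rather than anything new. A minimal sketch assuming triples are plain `(head, relation, tail)` tuples and toy data; it does not detect n-ary binarization artifacts.

```python
def inverse_leakage(train, test):
    """Return (fraction, leaked): test triples (h, r, t) such that some (t, r', h) is in train."""
    reversed_pairs = {(t, h) for h, _, t in train}  # (tail, head) of every training triple
    leaked = [(h, r, t) for h, r, t in test if (h, t) in reversed_pairs]
    fraction = len(leaked) / len(test) if test else 0.0
    return fraction, leaked


train = [("a", "hypernym", "b"), ("c", "part_of", "d")]
test = [("b", "hyponym", "a"), ("d", "near", "e")]
print(inverse_leakage(train, test))
```

A high fraction means benchmark MRR partly measures memorized inverses, not the discovery of unstated links.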


### ai-scientific-hypothesis-generation

This field is the single closest existing analogue to donto's "many-lens relationship-discovery engine," and its 60-year arc is essential context. The intellectual root is Don Swanson's Literature-Based Discovery (LBD, 1986): his "undiscovered public knowledge" thesis holds that independently-created literature fragments can be logically related yet never connected, and his ABC model (A relates to B, B relates to C, therefore hypothesize A-C) found the fish-oil/Raynaud's link purely by bridging non-interacting MEDLINE literatures — later clinically validated. This is EXACTLY the founder's intuition ("relationships no human thought to draw because no human holds all the literatures/lenses at once"), and it predates LLMs by 40 years. LBD's whole premise is that the value is in the intersection/bridge term, not the facts inside either literature — identical to the founder's "payoff is at the intersection of lenses."
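Swanson's ABC logic in its "open discovery" form (start from A, find C terms reachable only through intermediate B terms) reduces to a two-hop walk over a co-occurrence graph. A minimal sketch on a toy graph: the term names echo the fish-oil example, but the edges are hand-written for illustration rather than mined from MEDLINE, and real LBD systems add term filtering and statistical weighting.

```python
from collections import Counter, defaultdict


def build_cooccurrence(pairs):
    """Symmetric co-occurrence graph: term -> set of terms seen with it."""
    graph = defaultdict(set)
    for x, y in pairs:
        graph[x].add(y)
        graph[y].add(x)
    return graph


def abc_open_discovery(graph, a):
    """Rank C terms linked to A only through B terms, by number of distinct bridges."""
    direct = graph.get(a, set())
    bridge_count = Counter()
    bridges = defaultdict(set)
    for b in direct:
        for c in graph.get(b, set()):
            if c != a and c not in direct:  # C must not already co-occur with A
                bridge_count[c] += 1
                bridges[c].add(b)
    return [(c, n, sorted(bridges[c])) for c, n in bridge_count.most_common()]


graph = build_cooccurrence([
    ("fish_oil", "blood_viscosity"),
    ("fish_oil", "platelet_aggregation"),
    ("raynauds_syndrome", "blood_viscosity"),
    ("raynauds_syndrome", "platelet_aggregation"),
    ("platelet_aggregation", "aspirin"),
])
for row in abc_open_discovery(graph, "fish_oil"):
    print(row)
```

The C term with the most independent bridges ranks first; that multi-bridge signal is what separates a Swanson-style hypothesis from a single coincidental hop.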

The second lineage is closed-loop autonomous science: Ross King's Robot Scientist Adam (Cambridge/Aberystwyth, 2009, first machine to autonomously discover new scientific knowledge — yeast functional genomics) and Eve (drug screening), now Genesis. The critical lesson here is that Adam/Eve close the loop — they generate hypotheses, design discriminating experiments, RUN them with lab robotics, and revise. This is the "verify/curate" half of the founder's vision made physical, and it's the part pure-text systems lack. The third lineage is embedding/representation-based latent-knowledge extraction: Tshitoyan et al. (Nature 2019, "mat2vec") trained Word2vec on 3.3M materials-science abstracts and showed the unsupervised embeddings recommended thermoelectric materials YEARS before their actual discovery — i.e., the "latent structure of future discoveries is already embedded in past text." This is the strongest empirical proof that machine-readable latent relationships exist in a corpus and can be surfaced. The fourth lineage is knowledge-graph link prediction as hypothesis generation: drug-repurposing KGs (DRKG, COVID-19 KGs using GNNs/ComplEx/ensemble KG embeddings) frame a new drug-disease hypothesis literally as predicting a missing edge, validated via AUROC/AUPRC and explanatory paths — this is the paradigm donto's substrate is architecturally nearest to.
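The mat2vec-style ranking described above amounts to scoring candidate terms by cosine similarity to a target concept's embedding and skipping candidates the corpus already links to that concept. A minimal sketch with invented 3-dimensional vectors and placeholder material names; real word2vec embeddings have hundreds of dimensions and come from training on the corpus.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def rank_candidates(embeddings, target, candidates, already_linked):
    """Rank not-yet-linked candidates by cosine similarity to the target concept."""
    t = embeddings[target]
    scored = [(c, cosine(embeddings[c], t)) for c in candidates if c not in already_linked]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


embeddings = {  # invented vectors, for illustration only
    "thermoelectric": [1.0, 0.2, 0.0],
    "material_x": [0.9, 0.3, 0.1],
    "material_y": [0.8, 0.1, 0.2],
    "material_z": [0.0, 0.1, 1.0],
}
ranked = rank_candidates(
    embeddings, "thermoelectric",
    ["material_x", "material_y", "material_z"],
    already_linked={"material_x"},  # e.g. already co-mentioned with the target
)
print(ranked)
```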

The 2024-2026 wave fuses LLM agents with all of the above. SciAgents (Ghafarollahi & Buehler, MIT, arXiv:2409.05556, Advanced Materials 2025) is the MOST relevant single system: it builds a large ontological knowledge graph (~33K nodes / 49K edges from ~1,000 papers), then samples a PATH between two concepts — crucially WITH INJECTED RANDOMNESS / random waypoints to force non-deterministic, exploratory, serendipitous bridges — and hands that path to a multi-agent pipeline (Ontologist → Scientist-1 proposes hypothesis → Scientist-2 adds mechanism/experiment → Critic evaluates → novelty checked against Semantic Scholar). It explicitly claims to reveal "hidden interdisciplinary relationships previously considered unrelated." This is essentially the founder's engine for one domain, minus the paraconsistency and the persistent contradiction-holding substrate. Google DeepMind's AI co-scientist (Feb 2025, Gemini 2.0) is the most mature: a Supervisor orchestrates Generation, Reflection, Ranking, Evolution, Proximity, and Meta-review agents; hypotheses compete in an Elo TOURNAMENT via simulated scientific debate (self-play), and test-time compute scales the search. It produced wet-lab-validated results: AML drug-repurposing candidates that inhibited tumor viability, anti-fibrotic epigenetic targets in liver organoids, and independently re-derived a then-unpublished antimicrobial-resistance mechanism (phage capsid gene transfer). Adjacent recent systems — BioDisco (dual-mode KG+literature evidence, iterative feedback, and a notable TEMPORAL evaluation that tests whether a hypothesis is confirmed by post-cutoff literature), KG-CoI / Knowledge-Grounded LLMs (arXiv:2411.02382), TruthHypo/KnowHD, and Bayes-Entropy collaborative agents — all converge on grounding hypotheses in graphs to fight hallucination and ranking/refining to control quality.
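The random-waypoint idea attributed to SciAgents above can be sketched as stitching shortest paths through randomly chosen intermediate nodes, so repeated runs between the same two concepts yield different bridges. This is an illustrative reimplementation of the idea under an assumed adjacency-dict graph, not code from the SciAgents repository; the toy concept graph is invented, and stitched legs may revisit nodes.

```python
import random
from collections import deque


def shortest_path(graph, src, dst):
    """Breadth-first search; returns a node list from src to dst, or None if unreachable."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nb in graph.get(node, ()):
            if nb not in prev:
                prev[nb] = node
                queue.append(nb)
    if dst not in prev:
        return None
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def random_waypoint_path(graph, src, dst, n_waypoints=1, seed=None):
    """Stitch shortest paths src -> w1 -> ... -> wk -> dst through randomly sampled waypoints."""
    rng = random.Random(seed)
    pool = [n for n in graph if n not in (src, dst)]
    waypoints = rng.sample(pool, min(n_waypoints, len(pool)))
    stops = [src, *waypoints, dst]
    full = [src]
    for a, b in zip(stops, stops[1:]):
        leg = shortest_path(graph, a, b)
        if leg is None:
            return None  # a stop is disconnected from the next one
        full.extend(leg[1:])  # drop the repeated junction node
    return full


graph = {  # invented, undirected concept graph
    "silk": ["protein", "fiber"],
    "protein": ["silk", "folding"],
    "fiber": ["silk", "composite"],
    "folding": ["protein", "music"],
    "composite": ["fiber", "resonance"],
    "resonance": ["composite", "music"],
    "music": ["folding", "resonance"],
}
print(random_waypoint_path(graph, "silk", "music", n_waypoints=1, seed=0))
```

With `n_waypoints=0` this collapses to the deterministic shortest path; each added waypoint trades path length for exploration.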

The honest verdict on what LLM ideation actually delivers: Si, Yang & Hashimoto (Stanford, arXiv:2409.04109, 100+ NLP researchers) found LLM-generated ideas were judged statistically MORE NOVEL than expert ideas (p<0.05) but slightly less feasible — encouraging for the founder. BUT the follow-up "Ideation-Execution Gap" (arXiv:2506.20803, 2025) had 43 experts actually EXECUTE the ideas (100+ hrs each): after execution, LLM ideas' scores collapsed on every metric and human ideas overtook them. The lesson directly applicable to donto: surface novelty is cheap and machine-abundant; durable value requires execution/verification, which is exactly why a substrate that can HOLD speculative relationships cheaply and selectively VERIFY the rare valuable ones is the right architecture — but the verification step is where all the real difficulty (and value) lives.

**Foundational works:**

- **Fish Oil, Raynaud's Syndrome, and Undiscovered Public Knowledge (Literature-Based Discovery; ABC model)** — Don R. Swanson (1986): Logically-related but non-interacting literatures hide valid hypotheses at the bridge (B) term; value is in the cross-literature intersection no single reader holds — the 40-year-old direct ancestor of the founder's many-lens intersection idea. https://muse.jhu.edu/article/403510/summary
- **The Automation of Science / Robot Scientist Adam (and Eve, Genesis)** — Ross D. King et al. (2009): First machine to autonomously generate AND experimentally test hypotheses in a closed loop — proves the 'verify the rare valuable hypothesis' half; pure generation without a disposing/verification loop is not discovery. https://www.cam.ac.uk/research/news/robot-scientist-becomes-first-machine-to-discover-new-scientific-knowledge
- **Unsupervised word embeddings capture latent knowledge from materials science literature (mat2vec)** — Tshitoyan, Dagdelen, Weston, Dunn, Persson, Ceder, Jain et al. (2019): Unsupervised embeddings over 3.3M abstracts recommended materials YEARS before discovery — empirical proof that latent future-relationships are already encoded in an existing corpus and can be surfaced without labels. https://www.nature.com/articles/s41586-019-1335-8
- **Knowledge-graph link prediction as hypothesis generation (DRKG / COVID-19 drug-repurposing KGs; ComplEx, GNNs, ensemble KG embeddings)** — Multiple (DRKG team; Hsieh et al.; Ioannidis et al.) (2020-2023): A new relationship hypothesis = a predicted missing edge, scored (AUROC/AUPRC) and explained via supporting paths — the paradigm donto's quad substrate is architecturally closest to; relationships are rankable predictions, not facts. https://arxiv.org/pdf/2212.03911
- **Discovering Research Hypotheses Using Knowledge Graph Embeddings** — Springer / KG-embedding LBD line (2021): Frames hypothesis discovery over a paper-derived KG as link prediction (ComplEx) — generalizes LBD/Swanson into embedding space, the bridge between symbolic LBD and modern neural discovery. https://link.springer.com/chapter/10.1007/978-3-030-77385-4_28
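The "hypothesis = predicted missing edge" framing above rests on a scoring function. For ComplEx it is the real part of a trilinear product with the tail conjugated, Re(Σ h_i r_i conj(t_i)), which lets asymmetric relations score (h, r, t) and (t, r, h) differently. A minimal sketch with hand-picked, untrained 1-dimensional embeddings; ranking candidate tails by this score is how a missing edge becomes a rankable hypothesis.

```python
def complex_score(h, r, t):
    """ComplEx score: Re(sum_i h_i * r_i * conj(t_i)) over complex embedding vectors."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


def rank_tails(h, r, entities):
    """Rank candidate tail entities for the query (h, r, ?) by descending score."""
    return sorted(entities, key=lambda name: complex_score(h, r, entities[name]), reverse=True)


# Hand-picked embeddings, purely illustrative.
drug = [1 + 1j]
treats = [1j]
entities = {"disease_x": [1 + 1j], "disease_y": [2 + 0j]}

print(rank_tails(drug, treats, entities))
print(complex_score(drug, treats, [2 + 0j]), complex_score([2 + 0j], treats, drug))
```

The second line shows the asymmetry: swapping head and tail flips the sign of the score under this relation embedding.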

**Modern AI systems:**

- **SciAgents (MIT, Ghafarollahi & Buehler)** — Multi-agent (Ontologist, Scientist-1, Scientist-2, Critic) discovery over a ~33K-node ontological KG; samples a PATH between two concepts with INJECTED RANDOMNESS / random waypoints to force serendipitous interdisciplinary bridges, then generates+mechanizes+critiques the hypothesis and checks novelty against Semantic Scholar. The single closest existing analogue to the founder's lens-intersection engine. _[Published in Advanced Materials (2025); open-source (lamm-mit/SciAgentsDiscovery); claims to reveal hidden interdisciplinary material relationships 'previously considered unrelated'; evaluated mainly by expert/critic-agent judgment, not wet-lab at scale.]_ https://arxiv.org/abs/2409.05556
- **Google DeepMind AI co-scientist** — Supervisor orchestrates Generation, Reflection, Ranking, Evolution, Proximity, Meta-review agents (Gemini 2.0); hypotheses compete in an Elo TOURNAMENT via self-play scientific debate; test-time compute scales the search; iterative evolution refines. _[Wet-lab-validated: AML drug-repurposing candidates inhibited tumor viability; anti-fibrotic epigenetic targets in human liver organoids; independently re-derived an unpublished antimicrobial-resistance mechanism (phage capsid gene transfer). Enterprise pilots (Daiichi Sankyo, Bayer).]_ https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/
- **BioDisco** — Multi-agent biomedical hypothesis generation with DUAL-MODE evidence (KG + literature), iterative feedback, and a TEMPORAL evaluation that tests whether a hypothesis is confirmed by literature published AFTER the model's cutoff — a retro-validation protocol highly relevant to donto's bitemporality. _[Reports evaluation on PubMedQA, GPQA, CardBioMedBench, HypoBench and custom temporal datasets (2025 preprint; adoption unknown).]_ https://arxiv.org/pdf/2508.01285
- **KG-CoI / Knowledge-Grounded LLM hypothesis generation** — Injects KG subgraphs into a chain-of-ideas to ground LLM hypotheses, with a KG-supported hallucination-detection step. _[Reports reduced hallucination vs ungrounded LLM ideation; research-stage.]_ https://arxiv.org/pdf/2411.02382
- **Si/Yang/Hashimoto human study + Ideation-Execution Gap** — The two-part empirical reality check: large blinded studies on whether LLM-generated research ideas are actually good (ideation-only, then post-execution). _[Ideation: LLM ideas judged MORE novel than experts (p<0.05). Execution (arXiv:2506.20803, 43 experts, 100+ hrs each): LLM ideas' scores collapsed on all metrics; humans overtook — surface-novelty does not survive verification.]_ https://arxiv.org/abs/2409.04109
- **Bayes-Entropy collaborative agents / TruthHypo-KnowHD / hypothesis-eval benchmarks (ResearchBench, IdeaBench, LiveIdeaBench, HypoBench)** — Newest wave: uncertainty/entropy-driven iterative hypothesis optimization, truthfulness-vs-hallucination evaluation combining literature+KG retrieval, and a benchmark ecosystem for novelty/feasibility/truthfulness. _[E.g. Bayes-Entropy reports Shannon-entropy drop of 0.92 over 12 iterations; benchmarks expose that most systems optimize novelty/diversity while neglecting truthfulness.]_ https://arxiv.org/pdf/2508.01746
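The Elo tournament described for the AI co-scientist rests on the standard Elo update: each pairwise "debate" outcome moves the two ratings by K times the gap between actual and expected score. A minimal sketch; the starting rating of 1200, K = 32, and the match results are illustrative assumptions, and the judge that decides each match (an LLM debate in the co-scientist) is out of scope.

```python
def expected_score(r_a, r_b):
    """Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, score_a, k=32.0):
    """Return updated (r_a, r_b); score_a is 1 for an A win, 0 for a loss, 0.5 for a draw."""
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta  # zero-sum: B loses what A gains


def run_tournament(ratings, matches, k=32.0):
    """Apply (winner, loser) results in order; return hypothesis ids best-first."""
    for winner, loser in matches:
        ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser], 1.0, k)
    return sorted(ratings, key=ratings.get, reverse=True)


ratings = {"h1": 1200.0, "h2": 1200.0, "h3": 1200.0}
print(run_tournament(ratings, [("h1", "h2"), ("h1", "h3"), ("h2", "h3")]))
```

Pruning then amounts to discarding hypotheses below a rating threshold, which bounds how many survive to expensive verification.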

**Relevance to the lens engine:** BORROW: (1) Swanson's ABC/bridge logic and SciAgents' random-waypoint path sampling are the proven mechanics for generating cross-domain relationships no human posited — donto can run the same over its 39.5M-statement graph, but with MANY analytical lenses as the typed dimensions the bridge can traverse, not just co-occurrence. SciAgents needed only ~1,000 papers; donto already has the substrate. (2) The disposer/loop is non-negotiable: Adam/Eve and the AI co-scientist show value comes only when generation is paired with a ranking + verification mechanism (Elo tournament, critic agents, novelty-vs-Semantic-Scholar, and ultimately experiment). donto's Lean-4 certification + evidence-anchoring + argument edges (supports/rebuts/undercuts) ARE a disposer — wire the agent-proposed relationships into it (mirrors donto's own 'agent-proposes / Lean-disposes' rosie-search pattern). (3) BioDisco's temporal evaluation maps perfectly onto donto's bitemporality: hold a speculative relationship as legal state now, and let later-ingested evidence retro-confirm or rebut it without rewriting history. (4) mat2vec proves a cheap embedding pass over the corpus can pre-rank which speculative edges are worth materializing — use it as a candidate generator before expensive agentic decomposition. AVOID / heed the warnings: the Ideation-Execution Gap is the central caution — machine novelty is abundant and cheap; do NOT treat 'hundreds of speculative relationships per pass' (donto already extracts ~483 facts/pass) as the win condition, because surface novelty evaporates under execution. The differentiated value is the VERIFICATION funnel, not generation volume. 
Also avoid SciAgents/co-scientist's reliance on a single LLM judge as ground truth (your own memory note: 'no authority is ground truth') — donto's paraconsistent, contradiction-preserving design is precisely the antidote, letting you keep rival relationship-claims and their argument edges instead of collapsing to one ranked answer. Combinatorial blow-up (path/lens-pair explosion) is the operational risk SciAgents controls with bounded random sampling and the co-scientist controls with Elo pruning — donto needs an equivalent bounded-candidate + ranking gate (it already has the pattern in its bounded-candidate /search query).
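The "hold now, retro-confirm later" pattern from point (3) can be sketched as claims plus typed argument edges that are only ever appended, with a claim's status read "as of" a transaction time. The schema below (`Claim`, `ArgumentEdge`, `HypothesisStore.status_as_of`) is hypothetical and is not donto's API; it models only the transaction-time axis, not valid time, and the evidence ids are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    claim_id: str
    subject: str
    predicate: str
    obj: str
    tx_time: int  # when the store recorded the claim


@dataclass(frozen=True)
class ArgumentEdge:
    source_id: str  # id of the evidence or claim doing the arguing
    target_id: str  # id of the claim being argued about
    kind: str       # "supports", "rebuts", or "undercuts"
    tx_time: int


@dataclass
class HypothesisStore:
    claims: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def hold(self, claim: Claim) -> None:
        self.claims.append(claim)  # append-only: nothing is ever overwritten

    def argue(self, edge: ArgumentEdge) -> None:
        self.edges.append(edge)

    def status_as_of(self, claim_id: str, tx: int) -> tuple:
        """(supports, attacks) recorded for claim_id at or before transaction time tx."""
        seen = [e for e in self.edges if e.target_id == claim_id and e.tx_time <= tx]
        supports = sum(1 for e in seen if e.kind == "supports")
        attacks = sum(1 for e in seen if e.kind in ("rebuts", "undercuts"))
        return supports, attacks


store = HypothesisStore()
store.hold(Claim("c1", "compound_a", "inhibits", "pathway_b", tx_time=1))
store.argue(ArgumentEdge("evidence_1", "c1", "supports", tx_time=5))
store.argue(ArgumentEdge("evidence_2", "c1", "rebuts", tx_time=8))
print(store.status_as_of("c1", 3), store.status_as_of("c1", 6), store.status_as_of("c1", 9))
```

Because later evidence arrives as new edges, a query as of an earlier time still reproduces what was believed then, which is the property a BioDisco-style retro-validation needs.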

**Already done vs white space:** ALREADY DONE (founder should not reinvent): The core thesis that valuable relationships live unconnected across literatures/domains and can be surfaced mechanically — Swanson proved it in 1986 and it has been clinically validated. 'Latent future relationships already encoded in the corpus' — Tshitoyan/mat2vec proved it (2019). Multi-agent, ontological-KG, randomized-path, serendipitous cross-domain hypothesis generation with a critic and a novelty check — SciAgents IS this (2024-25), for materials. Tournament-ranked, self-debating, evolving multi-agent hypothesis generation with REAL wet-lab validation — Google's AI co-scientist (2025). Relationship-as-link-prediction over biomedical KGs with explanatory paths — the entire drug-repurposing-KG field (2020-23). Evidence-grounded, hallucination-resisting, KG+literature hypothesis generation — KG-CoI, BioDisco, TruthHypo (2024-25). Temporal/retro-validation of hypotheses — BioDisco. Honest evals of whether any of it produces durable value — Si et al. + Ideation-Execution Gap. So the components 'agentic,' 'many-perspective decomposition,' 'cross-domain bridging,' and 'KG substrate' each individually EXIST. GENUINE WHITE SPACE: (1) The full SPECTRUM of human analytical lenses as first-class, typed, persistent dimensions — every existing system uses ONE implicit lens (semantic similarity / domain ontology / co-citation). Nobody has made philosophical, mereological, teleological, semiotic, phenomenological, ethical, ecological etc. lenses explicit, simultaneous, and cross-indexed so relationships emerge at lens INTERSECTIONS. (2) A PARACONSISTENT, contradiction-PRESERVING substrate that HOLDS millions of mutually-incompatible machine-proposed relationships forever as legal state with typed argument edges — every existing discovery engine collapses to a single ranked hypothesis list and discards rivals; donto's hold-without-collapse + identity-as-hypothesis is genuinely unexplored at scale. 
(3) Domain-GENERALITY — all proven systems are narrow (yeast, materials, biomed); a domain-agnostic engine over a general 39.5M-statement substrate is untested. (4) Evidence-anchoring to source bytes + Lean-4 certification of the rare valuable edge as the disposer is a verification architecture no one has assembled. The combination — agentic many-lens decomposition + paraconsistent hold + formal/evidence verification, domain-general, at substrate scale — is novel even though no individual ingredient is.

**Hard problems:**
- The ideation-execution gap: machine-generated novelty is abundant and cheap but collapses under real execution/verification (Stanford 2025) — generation volume is NOT the win; the verification funnel is, and it is expensive and largely unsolved at scale.
- Evaluation/ground-truth problem: there is no reliable automated metric for whether a discovered relationship is TRUE and VALUABLE vs merely novel; LLM-as-judge and Elo tournaments are proxies that can be gamed, and a hypothesis's worth often can't be known without wet-lab/empirical execution.
- Hallucination and spurious-bridge generation: agents fabricate plausible-but-false relationships; KG-grounding (KG-CoI, BioDisco) reduces but cannot eliminate it (some work argues hallucination is mathematically inevitable), and grounding to a contradiction-laden substrate complicates 'grounding' itself.
- Combinatorial explosion: the space of lens-pairs × entity-pairs × paths is astronomically larger than any one-lens system's; SciAgents and the co-scientist tame only the single-lens version, via bounded random sampling and tournament pruning, so many-lens intersection search needs a fundamentally better candidate-generation/pruning theory.
- Noise vs signal at scale: most surfaced cross-domain relationships are trivial, coincidental, or already-known; the precision problem (finding the rare Swanson-grade bridge among millions of junk edges) is the field's core unsolved difficulty.
- Knowledge-graph/ontology construction and maintenance cost, plus the un-anchored-relationship problem: holding speculative edges forever is cheap to write but expensive to keep coherent, and a paraconsistent store risks degenerating into an unqueryable contradiction soup without strong argument-edge curation and lens-scoped query semantics.
- Defining and operationalizing the 'lenses' rigorously: 'push each lens to the utmost of human understanding' is under-specified — turning philosophical/teleological/phenomenological analysis into reproducible, comparable, machine-typed features (not just prose) is an open problem with no benchmark.


### foundational-faceted-ontologies

This tradition supplies the *rigorous theory of the lens itself* — what an orthogonal analytical dimension is, how to decompose an entity through several at once, and (critically for donto's "relationship discovery" payoff) how implicit structure emerges from the intersection of dimensions. It splits into three lineages that the founder's vision unknowingly braids together.

(1) Upper / foundational ontologies define a small set of *top-level categories* through which any entity can be viewed: the formal backbone of "lenses." BFO (Barry Smith, Buffalo; the upper ontology of the OBO Foundry, shared by hundreds of biomedical ontologies) splits reality into **continuants** (3D enduring things, with independent/dependent/quality/role/disposition/function sub-distinctions) vs **occurrents** (4D processes), unifying 3D-ist and 4D-ist views in one frame. DOLCE (Gangemi, Guarino, Masolo, Borgo; the LOA in Trento, ~2002) is explicitly "cognitive/linguistic-biased": it carves the categories *underlying natural language and common sense* (endurants, perdurants, qualities, abstracts). Its **Descriptions & Situations (D&S)** extension is the single most relevant piece here: it reifies *descriptions* (roles, concepts, parameters) separately from the *situations/states-of-affairs* they "satisfy," so the *same* facts can be re-interpreted under many descriptions/perspectives without conflict, a near-exact formal analogue of donto's "identity/relationship is a hypothesis queried under a lens." SUMO (Niles & Pease, Teknowledge/Articulate, 2000) is a large, fully axiomatized ontology with first-order reasoning (Sigma/Vampire/E provers) and a complete manual mapping of *every WordNet synset* to SUMO terms, the best example of bridging a lexical lens to a formal one. Cyc (Lenat, 1984–) is the deepest precedent for the *paraconsistent / contextual* angle: its **microtheories (Mt)** scope assertions to assumption-contexts so globally contradictory views (relativistic vs Newtonian physics, fiction vs fact, conflicting economic theories) coexist without exploding. Cyc is deliberately *locally* consistent but *globally* contradiction-tolerant, exactly donto's posture.
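The microtheory idea above (locally consistent, globally contradiction-tolerant) can be sketched as a store that checks consistency only within a context and the more general contexts it inherits from. This is a toy illustration of the principle, not Cyc's machinery: `ContextStore` and its methods are hypothetical, the parent graph is assumed acyclic, and consistency is checked only at assertion time.

```python
class ContextStore:
    """Toy context-scoped store: consistent inside each context, contradictions allowed across contexts."""

    def __init__(self):
        self._facts = {}    # context -> {proposition: truth value}
        self._parents = {}  # context -> more general contexts it inherits from (cf. Cyc's genlMt)

    def add_context(self, ctx, parents=()):
        self._facts.setdefault(ctx, {})
        self._parents[ctx] = list(parents)

    def visible(self, ctx):
        """Facts visible in ctx: those inherited from parents plus its own (assumes no cycles)."""
        out = {}
        for parent in self._parents.get(ctx, []):
            out.update(self.visible(parent))
        out.update(self._facts.get(ctx, {}))
        return out

    def assert_fact(self, ctx, prop, value):
        """Reject only contradictions with what is already visible in the SAME context."""
        current = self.visible(ctx)
        if prop in current and current[prop] != value:
            raise ValueError(f"{prop!r} contradicts context {ctx!r}")
        self._facts.setdefault(ctx, {})[prop] = value


kb = ContextStore()
kb.add_context("physics_common")
kb.add_context("newtonian", parents=["physics_common"])
kb.add_context("relativistic", parents=["physics_common"])
kb.assert_fact("physics_common", "energy_is_conserved", True)
kb.assert_fact("newtonian", "time_is_absolute", True)
kb.assert_fact("relativistic", "time_is_absolute", False)  # allowed: a different context
print(kb.visible("newtonian"), kb.visible("relativistic"))
```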

(2) Faceted classification is the *methodology of many orthogonal lenses*. S. R. Ranganathan's **Colon Classification** (1933) and its **PMEST** fundamental categories — **P**ersonality (the focal entity), **M**atter (substance/material), **E**nergy (action/process/operation), **S**pace, **T**ime — were the first **analytico-synthetic** scheme: you *analyze* a subject into facets, then *synthesize* a compound class number by combining foci from independent facets with connecting symbols (the colon). The deep claim is that a *small set of orthogonal facets can compose to express an unbounded space of compound subjects no one enumerated in advance* — precisely the combinatorial generativity the founder wants, stated in 1933. PMEST is the historical ancestor of (a) modern **faceted search/navigation** (Pollitt, Shneiderman, Marchionini, and especially **Marti Hearst's Flamenco**, Berkeley 2000s — multi-dimensional filter UIs everywhere now), and (b) BFO/DOLCE-style category systems. The founder's "philosophical, temporal, causal, mereological, teleological…" list is, structurally, a much larger PMEST.
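The analytico-synthetic claim above, that a few orthogonal facets compose into many unenumerated compound subjects, is easy to see computationally: letting each PMEST facet be absent or filled by one focus gives the product of (foci + 1) over facets, minus the empty combination. A minimal sketch with invented foci; it writes facets as `Facet=focus` joined by a generic " : " rather than Colon Classification's real notation, which uses its own class numbers and a distinct connecting symbol per facet.

```python
from itertools import product

PMEST = ["Personality", "Matter", "Energy", "Space", "Time"]


def synthesize(facets):
    """Every compound subject using at most one focus per facet, cited in PMEST order."""
    options = [[None, *facets.get(f, [])] for f in PMEST]
    subjects = []
    for combo in product(*options):
        parts = [f"{f}={focus}" for f, focus in zip(PMEST, combo) if focus is not None]
        if parts:  # skip the all-absent combination
            subjects.append(" : ".join(parts))
    return subjects


foci = {  # invented foci for illustration
    "Personality": ["rice", "wheat"],
    "Energy": ["harvesting", "storage"],
    "Space": ["India"],
    "Time": ["1930s"],
}
subjects = synthesize(foci)
print(len(subjects))  # (2+1) * 1 * (2+1) * (1+1) * (1+1) - 1
print(subjects[:3])
```

Six foci spread over four facets already yield 35 compound subjects; with dozens of lenses the same product is what makes the candidate space explode.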

(3) Frame semantics + Formal Concept Analysis give the *emergence engine*. Charles Fillmore's **frame semantics** (1970s–80s) and **FrameNet** (ICSI Berkeley, 1997–) say a word's meaning is only graspable against a whole **frame** — a structured scene with **frame elements** (roles): the COMMERCIAL_TRANSACTION frame binds Buyer, Seller, Goods, Money. Frames are reusable relational "lenses" with typed slots — the conceptual template for any per-lens schema and for relation extraction (semantic role labeling). **Formal Concept Analysis** (Rudolf Wille, "Restructuring Lattice Theory," 1981; Ganter & Wille's *Mathematical Foundations*, 1996/1999, on Birkhoff lattice theory and Peirce/Port-Royal logic) is the deepest mathematical realization of the founder's exact payoff. From a binary **formal context** (objects × attributes table), a **Galois connection** between extents and intents produces **formal concepts** (maximal object-set/attribute-set pairs where neither can grow), and ordering them yields a **concept lattice** — a complete lattice whose nodes are *emergent concepts the analyst never named*, plus a canonical basis of **attribute implications** (A→B: "every object with all of A has all of B") computable via attribute exploration. **This is literally a machine that surfaces latent concepts and rules from an object×attribute matrix** — the founder's "relationships no human thought to draw." Its multi-relational extension, **Relational Concept Analysis (RCA)** (Rouane-Hacène, Huchard, Napoli, Valtchev, 2013), iterates FCA over *several* object sorts linked by relations, abstracting links into relational attributes and producing a *family* of coupled lattices — i.e., discovering cross-entity relations across multiple "kinds" at once, which is structurally what donto's many-lens cross-entity discovery aims at.
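To make the FCA machinery above concrete: every concept intent is an intersection of some family of object intents (the empty family giving all attributes), so a small context's concepts can be enumerated by repeated intersection, and an implication A→B holds exactly when B is contained in the closure A''. A minimal, exponential-time sketch on an invented context; practical tooling uses algorithms such as Ganter's NextClosure.

```python
def extent_of(ctx, attrs):
    """A': the objects that have every attribute in attrs."""
    return frozenset(g for g, ms in ctx.items() if set(attrs) <= ms)


def intent_of(ctx, objs):
    """B': the attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = frozenset().union(*ctx.values())
    for g in objs:
        shared &= ctx[g]
    return shared


def concepts(ctx):
    """All formal concepts as (extent, intent) pairs; exponential, fine only for small contexts."""
    intents = {frozenset().union(*ctx.values())}  # empty family of objects -> all attributes
    for ms in ctx.values():
        intents |= {i & ms for i in intents}
    return [(extent_of(ctx, i), i) for i in intents]


def implication_holds(ctx, premise, conclusion):
    """A -> B holds iff B is contained in the closure A''."""
    return set(conclusion) <= intent_of(ctx, extent_of(ctx, premise))


ctx = {  # invented object x attribute context
    "dog": {"mammal", "pet", "four_legs"},
    "cat": {"mammal", "pet", "four_legs"},
    "whale": {"mammal", "aquatic"},
    "goldfish": {"pet", "aquatic"},
}
for extent, intent in sorted(concepts(ctx), key=lambda c: len(c[1])):
    print(sorted(extent), sorted(intent))
print(implication_holds(ctx, {"four_legs"}, {"mammal", "pet"}))
```

No one named the concept "pets" or "mammals" in the table; both fall out of the lattice, as does the rule four_legs → {mammal, pet}.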

**Foundational works:**

- **Colon Classification & PMEST (analytico-synthetic faceted classification)** — S. R. Ranganathan (1933): Decompose any subject into a small set of ORTHOGONAL fundamental facets — Personality, Matter, Energy, Space, Time — then SYNTHESIZE compound subjects by combining foci across facets. A small lens set composes to an unbounded, never-enumerated subject space: the founder's combinatorial generativity, stated in 1933. https://en.wikipedia.org/wiki/Colon_classification
- **Restructuring Lattice Theory / Formal Concept Analysis** — Rudolf Wille (with Bernhard Ganter) (1981; Ganter & Wille foundations 1996/1999): From an object×attribute table, a Galois connection yields formal concepts (extent/intent pairs) ordered into a concept LATTICE plus a canonical basis of attribute implications. A literal machine for surfacing EMERGENT concepts and rules implicit in data, and the closest classical analogue of donto's 'relationships no one thought to draw.' https://en.wikipedia.org/wiki/Formal_concept_analysis
- **Frame Semantics & FrameNet** — Charles J. Fillmore (ICSI Berkeley team) (1976–1985 theory; FrameNet 1997–): A word's meaning is only intelligible against a structured FRAME — a scene with typed roles (frame elements). Frames are reusable relational lenses with slots; the template for any per-lens schema and for relation/role extraction (SRL). https://en.wikipedia.org/wiki/FrameNet
- **Cyc & microtheories (contextual, contradiction-tolerant KB)** — Douglas Lenat (MCC / Cycorp) (1984–): Scope assertions to assumption-contexts (microtheories) so globally contradictory worldviews coexist without explosion — locally consistent, globally contradiction-tolerant. The deepest precedent for donto's paraconsistent/lens-scoped stance. https://en.wikipedia.org/wiki/Cyc
- **DOLCE + Descriptions & Situations (D&S / DUL)** — Gangemi, Guarino, Masolo, Borgo, Oltramari (LOA Trento) (2002; D&S ~2004; DUL): A cognitively-biased upper ontology whose D&S pattern REIFIES descriptions (roles/concepts/parameters) apart from the situations they satisfy, so the same facts can be re-read under many descriptions/perspectives without conflict. A formal analogue of 'identity/relationship is a hypothesis under a lens.' https://arxiv.org/pdf/2308.01597
- **Basic Formal Ontology (BFO)** — Barry Smith et al. (Buffalo) (~2002; ISO/IEC 21838-2:2021): A small realist top-level split between continuants (objects, qualities, roles, dispositions, functions) and occurrents (processes) that unifies 3D and 4D views; the de facto upper ontology of the OBO Foundry. Shows a disciplined, minimal lens-backbone that hundreds of domain ontologies actually share. http://ontology.buffalo.edu/bfo/
- **Suggested Upper Merged Ontology (SUMO) + WordNet mapping** — Ian Niles & Adam Pease (Teknowledge/Articulate) (2000–2003): A fully first-order-axiomatized upper ontology with theorem-prover reasoning (Sigma) and a COMPLETE manual mapping of every WordNet synset to SUMO terms — the strongest worked example of bridging a lexical lens to a formal/logical lens. https://en.wikipedia.org/wiki/Suggested_Upper_Merged_Ontology
- **Relational Concept Analysis (RCA)** — Rouane-Hacène, Huchard, Napoli, Valtchev (2013): Extends FCA to MULTI-relational data: iterates over several object sorts linked by relations, abstracts links into relational attributes, and converges to a coupled FAMILY of lattices that reveal cross-sort implications and connections. Closest formal match to donto's cross-entity, multi-kind discovery. https://link.springer.com/article/10.1007/s10472-012-9329-3

**Modern AI systems:**

- **The Lattice Representation Hypothesis of LLMs** — Bo Xiong (Stanford), 2026 — proposes LLM embeddings encode FCA-style concept LATTICES: attribute directions with thresholds act as separating half-spaces whose intersections induce a concept lattice; concept meet/join become geometric intersection/union. Bridges the Linear Representation Hypothesis with FCA. _[On 5 WordNet domains, LDA recovered concept-attribute relations at 71-83% F1 (physical domains); projection-based subsumption up to 77.1% F1; meet/join produced coherent generalizations (e.g. 'predator' as join of dog+wolf). Research result, not a product.]_ https://arxiv.org/html/2603.01227v1
- **Faceted search / navigation (Flamenco lineage)** — Marti Hearst's Flamenco (Berkeley, 2000s), building on Pollitt/Shneiderman/Marchionini — turned Ranganathan's facets into multi-dimensional filter UIs; now ubiquitous in e-commerce, library, and enterprise search. _[Dominant production paradigm for navigating multi-dimensional item collections; validated by usability studies. Mature, widely deployed.]_ https://people.ischool.berkeley.edu/~hearst/papers/hcir08.pdf
- **FrameNet + neural frame-semantic parsing / SRL** — FrameNet (ICSI) plus neural semantic-role-labeling models that auto-tag frames and roles in text — operationalizing the 'frame as relational lens' for extraction at scale. _[FrameNet has 1,200+ frames / 13,000+ lexical units; SRL is a standard NLP task with strong neural baselines. Mature resource + active research.]_ https://en.wikipedia.org/wiki/FrameNet
- **SUMO + Sigma reasoning system** — Large axiomatized upper ontology run through automated theorem provers (Vampire/E) via Sigma; WordNet-linked for NL-to-logic. _[Tens of thousands of axioms; used for QA, word-sense disambiguation, formal reasoning. Stable, academically active.]_ https://www.gabormelli.com/RKB/Suggested_Upper_Merged_Ontology_(SUMO)
- **Microsoft GraphRAG + LLM-driven KG construction (2024-2025)** — LLM pipelines that extract entities/relations into knowledge graphs and use community detection for corpus-spanning QA — the dominant 2024-25 way to auto-build relational structure from text (NOT facet-theoretic, but the de facto competitor to a 'lens engine' for relationship surfacing). _[GraphRAG open-sourced 2024; KG-construction reported reaching production maturity / strong ROI across industries in 2024-25. Widely adopted.]_ https://aclanthology.org/2025.emnlp-main.1249.pdf
- **ConExp / FCA tooling & attribute exploration** — Concept Explorer and successors compute concept lattices, canonical implication bases, and run semi-automatic attribute exploration (expert-in-the-loop KB completion) from object×attribute contexts. _[Handles lattices up to ~millions of concepts; mature for small/medium contexts. Niche academic tooling, not web-scale.]_ https://arxiv.org/abs/2411.06675

**Relevance to the lens engine:** BORROW (4 concrete imports): (1) PMEST's analytico-synthetic principle as the design contract for lenses — keep each lens ORTHOGONAL and have the value be in the *synthesis* (foci combined across facets), not in any single facet. This is the founder's intuition, already formalized in 1933; treat it as a constraint (lenses should be as independent as possible) rather than reinventing it. (2) FCA/RCA as the literal back-end for the 'relationship discovery' step: once agents fill many lenses, project the cross-lens output into formal contexts (object×attribute) and per-relation RCA contexts, then *compute* the concept lattice + canonical implication basis. The emergent concepts and implications ARE the 'relationships no one thought to draw', and they come with a deterministic, reproducible derivation (the lattice is a pure function of the context, so every concept traces back to specific incidence cells). That pairs naturally with donto's evidence-anchoring and Lean-4 certification, since an FCA implication is a simple attribute-set rule (every object with attributes A also has attributes B), a shape well suited to machine-checked proof. (3) DOLCE's Descriptions & Situations and Cyc's microtheories as prior art for donto's 'identity/relationship is a hypothesis under a lens' and contradiction-holding — donto should cite these as the lineage it extends, and reuse D&S's description/situation split as the modeling pattern for 'a relationship-claim viewed under lens X.' (4) Frames (FrameNet) as the per-lens schema template — each lens defines typed roles to fill, making extraction structured and relation-ready. AVOID: (a) the upper-ontology trap of forcing one universal, globally-consistent category tree — BFO/SUMO spent two decades on alignment wars; donto's paraconsistent, lens-relative stance is *the* differentiator, so do NOT collapse lenses into a single canonical ontology. 
(b) Classical FCA's brittleness — it requires *exact binary* incidence, is noise-sensitive, and the number of concepts is worst-case exponential in the size of the context; LLM-extracted attributes are noisy and graded, so use fuzzy/relaxed FCA, bounded candidate generation, and donto's hypothesis_only/contradiction-frontier to absorb noise instead of letting it explode the lattice. (c) The FrameNet/Cyc lesson that hand-curation does not scale — the whole bet must be that AGENTS fill lenses cheaply; that agentic fill is the genuinely new ingredient these classical systems lacked.
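The "compute the concept lattice" step in (2) can be made concrete with a minimal sketch. It assumes a toy object×attribute context (the entities and attributes are invented for illustration, not donto data) and enumerates every formal concept by closing each subset of objects under the two derivation operators. The brute force is exponential in the number of objects, which is exactly the blow-up flagged in (b); real tooling uses algorithms such as NextClosure.

```python
from itertools import combinations

# Hypothetical toy context (object -> attributes it has); invented for illustration.
CONTEXT = {
    "dog": {"animal", "predator", "domesticated"},
    "wolf": {"animal", "predator", "wild"},
    "sheep": {"animal", "domesticated"},
    "oak": {"plant", "wild"},
}


def intent(objects, context):
    """Derivation A': attributes shared by every object in A (all attributes if A is empty)."""
    if not objects:
        return set().union(*context.values())
    return set.intersection(*(context[o] for o in objects))


def extent(attrs, context):
    """Derivation B': objects that have every attribute in B."""
    return {o for o, has in context.items() if attrs <= has}


def formal_concepts(context):
    """All pairs (A, B) with A' = B and B' = A, found by closing every object subset.

    Brute force over 2^|objects| subsets: fine for a toy context, hopeless at scale.
    """
    objects = sorted(context)
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            b = intent(set(subset), context)
            a = extent(b, context)
            concepts.add((frozenset(a), frozenset(b)))
    return concepts
```

On this context `formal_concepts(CONTEXT)` yields nine concepts. They include ({dog, wolf}, {animal, predator}) and the cross-kind concept ({wolf, oak}, {wild}), which groups an animal with a plant even though no single input row says so.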

**Already done vs white space:** ALREADY DONE (the founder should not claim these as novel): (1) The idea that a small set of ORTHOGONAL lenses composes to an unbounded analytical space — Ranganathan, 1933 (PMEST). (2) That implicit concepts and relationships can be *automatically derived* from an object×attribute table — FCA, Wille 1981; and across multiple related kinds — RCA, 2013. This is the founder's core 'discovery engine' payoff, mathematically solved decades ago for clean binary data. (3) Holding mutually-contradictory claims in scoped contexts without explosion — Cyc microtheories, 1984+; locally-consistent/globally-tolerant KBs are old news. (4) Re-interpreting the same facts under many descriptions/perspectives — DOLCE D&S, ~2004; multi-perspective and aspect-oriented ontology development are established sub-fields. (5) Bridging a lexical lens to a formal lens (SUMO×WordNet) and frames-as-relational-lenses (FrameNet) — done. (6) Even the modern hint that LLM embeddings *already contain* FCA-style concept lattices — Stanford 2026. GENUINE WHITE SPACE: No prior system combines all four legs at once — (i) AGENTIC, LLM-driven population of MANY heterogeneous human-analytical lenses (philosophical/teleological/aesthetic/semiotic/ecological, far beyond PMEST's five or BFO's continuant/occurrent split), at (ii) WEB-SCALE over a (iii) PARACONSISTENT, evidence-anchored, bitemporal substrate that can hold the resulting speculative cross-lens relations FOREVER as legal state, with (iv) a verification layer (FCA-implication mining + Lean-4 certification) to promote the rare valuable hypotheses. FCA/RCA assumed clean curated contexts and tiny scale; Cyc/DOLCE assumed human knowledge engineers; FrameNet assumed manual annotation; upper ontologies assumed one consistent world. 
The novel claim that survives scrutiny is the *integration*: agents as the lens-fillers, paraconsistency as the holding-tank for cross-lens serendipity, and FCA/Lean as the disciplined harvester. That specific assembly appears genuinely unexplored.

**Hard problems:**
- Combinatorial blow-up: FCA concept lattices are worst-case exponential in context size, and cross-lens relationship candidates grow combinatorially with #lenses × #entities — the 'discovery' space is mostly junk, so the engine is bottlenecked on RANKING/pruning, not generation. PMEST composition and donto's hypothesis-holding both face this.
- Noise vs exactness mismatch: classical FCA/RCA need exact binary incidence and have no inherent noise tolerance, but LLM-extracted attributes are noisy, graded, and hallucination-prone — naive contexts produce garbage lattices and spurious implications. Needs fuzzy/relaxed FCA, calibration, and source-anchoring.
- Lens orthogonality is aspirational, not guaranteed: real analytical lenses (causal, teleological, ethical) overlap and interact; PMEST itself blurs Personality/Matter. Non-orthogonal lenses inflate the combinatorics and make emergent 'intersections' artifacts of redundancy rather than real discovery.
- Evaluation / ground truth: there is no benchmark for 'a valuable relationship no human thought of.' Distinguishing genuine serendipity from coincidence, restatement, or LLM confabulation is unsolved — and donto's no-authority-is-ground-truth stance makes automatic scoring even harder.
- Paraconsistency at scale: Cyc kept consistency LOCAL within microtheories with heavy engineering; donto wants to hold millions of unanchored, mutually-contradictory machine-proposed relations cheaply. Query-time lens evaluation over a contradiction frontier of that size is an open performance + semantics problem (which argument edges win under which lens, computed fast).
- Curation/verification throughput: the historical killer of Cyc/FrameNet/upper-ontologies was that human curation could not keep up. The lens engine inherits this at the back end — even if agents generate cheaply, promoting the rare valuable hypothesis (Lean certification, evidence review) is human-bottlenecked unless verification is itself largely automated.
- Cost and depth-control of agentic lens-filling: 'to the utmost of human understanding' per lens per entity is unbounded compute; deciding how deep each lens goes, and for which entities, without a payoff signal, is an unsolved economic/scheduling problem.
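One hedged way to handle the noise-vs-exactness problem is to threshold graded confidences into binary cells. Instead of demanding exact implications, the sketch then keeps a rule x → y only if it has enough support and confidence (association rules, a standard relaxation of exact FCA implications). The attribute names, the data, the 0.6 threshold and the support/confidence cut-offs are all illustrative assumptions.

```python
# Hypothetical graded context: entity -> {attribute: extractor confidence in [0, 1]}.
GRADED = {
    "e1": {"telic:transport": 0.90, "constitutive:metal": 0.80},
    "e2": {"telic:transport": 0.70, "constitutive:metal": 0.90},
    "e3": {"telic:transport": 0.95, "constitutive:metal": 0.40},
    "e4": {"telic:transport": 0.20, "constitutive:metal": 0.85},
    "e5": {"telic:transport": 0.80, "constitutive:metal": 0.75},
    "e6": {"telic:transport": 0.90, "constitutive:metal": 0.10},
}


def binarize(graded, threshold=0.6):
    """Keep an incidence cell only if its confidence clears the (assumed) threshold."""
    return {e: {a for a, p in attrs.items() if p >= threshold} for e, attrs in graded.items()}


def partial_implications(context, min_support=2, min_confidence=0.7):
    """Single-attribute rules x -> y kept by support/confidence instead of exactness.

    support    = number of objects having both x and y
    confidence = support / number of objects having x
    With min_confidence=1.0 only counterexample-free rules survive, as in exact FCA.
    """
    attrs = sorted(set().union(*context.values()))
    rules = []
    for x in attrs:
        has_x = [e for e, s in context.items() if x in s]
        for y in attrs:
            if y == x or not has_x:
                continue
            support = sum(1 for e in has_x if y in context[e])
            confidence = support / len(has_x)
            if support >= min_support and confidence >= min_confidence:
                rules.append((x, y, support, round(confidence, 2)))
    return rules
```

On this data the relaxed miner keeps `constitutive:metal → telic:transport` (support 3, confidence 0.75) and rejects the reverse direction (confidence 0.6). Exact FCA would keep neither rule, because one noisy row breaks each of them.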


### semantic-decomposition-primitives

The unifying claim of this tradition is that meaning is not atomic — it decomposes into a small, recurring set of deeper, comparable components. Five major frameworks instantiate this in importantly different ways, and together they form the most direct intellectual ancestry for donto's "many lenses on every entity" vision.

(1) Wierzbicka's Natural Semantic Metalanguage (NSM) is the most radical reductionist program: roughly 65 indefinable, cross-linguistically universal "semantic primes" (I, YOU, SOMETHING, GOOD, BAD, DO, HAPPEN, KNOW, WANT, THINK, BECAUSE, IF, NOT, BEFORE, PART, KIND, LIKE...) plus a universal mini-grammar and ~50 "semantic molecules" (man, water, hands). Any concept, however culture-specific, is "explicated" as a paraphrase built only from primes, so two concepts from different cultures become directly comparable at the prime level. NSM is the purest expression of "break meaning to the utmost" — a finite alphabet of thought.

(2) Pustejovsky's Generative Lexicon (GL, 1991/1995) is the single most lens-like framework and the most architecturally relevant. Its QUALIA STRUCTURE assigns every noun FOUR modes of explanation, explicitly derived from Aristotle's four aitiai (via Moravcsik 1975): FORMAL (what kind of thing it is), CONSTITUTIVE (its parts/material — mereology), TELIC (its purpose/function — teleology), and AGENTIVE (how it came into being — origin/causation). A noun like 'book' carries formal=physical object, constitutive=pages/text, telic=read(x), agentive=write(x); 'door' carries telic=pass-through, etc. GL's generative devices — type coercion ("begin a book" coerces to "begin reading"), co-composition ("bake a cake" vs "bake a potato"), selective binding ("fast car" binds to the telic driving event) — and its dot-objects/complex types (book = PHYSICAL•INFORMATION, a single entity legitimately under two types at once) solve LOGICAL POLYSEMY without sense enumeration. This is essentially a four-lens decomposition built into the lexicon, and the dot-object is a near-exact precedent for "one entity, multiple co-present aspects."
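The "four lenses in the lexicon" reading can be illustrated with a small data-structure sketch. It stores a qualia structure per entity and reports which roles two entities share. Role values are simplified string labels chosen for illustration (GL itself uses typed predicates such as telic = read(x)), and none of GL's generative devices are modelled.

```python
from dataclasses import dataclass, field

ROLES = ("formal", "constitutive", "telic", "agentive")


@dataclass(frozen=True)
class Qualia:
    """The four qualia roles, simplified to sets of string labels."""
    formal: frozenset = field(default_factory=frozenset)        # what kind of thing it is
    constitutive: frozenset = field(default_factory=frozenset)  # parts / material
    telic: frozenset = field(default_factory=frozenset)         # purpose / function
    agentive: frozenset = field(default_factory=frozenset)      # how it came into being


def shared_roles(a: Qualia, b: Qualia) -> dict:
    """Per qualia role, the labels two entities have in common (empty roles omitted)."""
    out = {}
    for role in ROLES:
        common = getattr(a, role) & getattr(b, role)
        if common:
            out[role] = set(common)
    return out


# Hypothetical entries: different kinds (vehicle vs structure), same purpose.
ferry = Qualia(formal=frozenset({"vehicle"}),
               constitutive=frozenset({"hull", "engine"}),
               telic=frozenset({"cross_water"}),
               agentive=frozenset({"shipbuilding"}))
bridge = Qualia(formal=frozenset({"structure"}),
                constitutive=frozenset({"deck", "pylon"}),
                telic=frozenset({"cross_water"}),
                agentive=frozenset({"civil_engineering"}))
```

Here `shared_roles(ferry, bridge)` returns only the telic overlap (`cross_water`). That is the kind of cross-kind link (shared purpose, different form and origin) that a per-entity lens decomposition surfaces.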

(3) Schank's Conceptual Dependency (CD, late 1960s–70s, Yale) decomposes all event meaning into ~11 primitive ACTs (ATRANS abstract-transfer/give, PTRANS physical-transfer/go, MTRANS mental-transfer/tell, MBUILD, INGEST, EXPEL, MOVE, GRASP, PROPEL, ATTEND, SPEAK) plus conceptual cases and states, so paraphrases ("John gave Mary a book" / "Mary received a book from John") collapse to one canonical, language-independent representation enabling inference. CD scaled up into scripts/plans/goals (SAM, PAM). It is the canonical predicate-decomposition lens and the historical lesson in over-reduction.
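CD's paraphrase collapse can be shown with a toy sketch. It assumes a hand-written two-verb lexicon (hypothetical; real CD analyzers relied on much richer conceptual parsing) in which both "give" and "receive" clauses map onto one ATRANS structure. With "receive", the CD actor is the giver rather than the grammatical subject, which is why the two sentences end up identical.

```python
from typing import NamedTuple


class ATrans(NamedTuple):
    """CD primitive ATRANS: transfer of an abstract relationship such as possession."""
    actor: str
    obj: str
    source: str
    recipient: str


def to_cd(verb: str, subject: str, obj: str, other: str) -> ATrans:
    """Map a simplified clause (verb, subject, object, other party) to ATRANS.

    Hypothetical two-verb lexicon, for illustration only:
      give(subject, obj, to=other)      -> subject is actor and source
      receive(subject, obj, from=other) -> other is actor and source; subject is recipient
    """
    if verb == "give":
        return ATrans(actor=subject, obj=obj, source=subject, recipient=other)
    if verb == "receive":
        return ATrans(actor=other, obj=obj, source=other, recipient=subject)
    raise ValueError(f"no CD entry for verb {verb!r}")


# "John gave Mary a book" and "Mary received a book from John" share one representation.
assert to_cd("give", "John", "book", "Mary") == to_cd("receive", "Mary", "book", "John")
```

The over-reduction lesson is also visible here. Every verb needs its own hand-built entry, and anything outside the lexicon simply fails.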

(4) Jackendoff's Conceptual Semantics treats meaning as a level of THOUGHT (Conceptual Structure), built from a fixed ontology of categories — Event, State, Thing, Place, Path, Property, Amount — combined by functions like GO, BE, STAY, CAUSE, INCH. Crucially Jackendoff argues decomposition is the cognitive-science method itself: meanings are decomposed into primitives "as the semantic equivalents of phonological features."

(5) The modern, data-driven heirs are Universal Decompositional Semantics (UDS; White, Reisinger, Rawlins, Van Durme, 2016–2020) and Abstract Meaning Representation (AMR; Banarescu et al. 2013). UDS is the most directly transferable to donto: instead of discrete categories it annotates each predicate/argument with many SCALAR, real-valued, confidence-weighted properties across orthogonal dimensions — 18 semantic proto-role properties (volition, sentience, causation, change-of-state, grounded in Dowty 1991), genericity, factuality, time/duration, event aspect (telicity/dynamicity), 26 entity supersenses — over a single graph (PredPatt). That is precisely "many independent lenses, each a graded hypothesis, layered on one graph." AMR is the production-scale graph meaning-representation (rooted DAG over PropBank predicates, "who did what to whom," abstracting away syntax), now with strong LLM parsers (Smatch ~86) and a 52-language MASSIVE-AMR corpus — but it deliberately drops tense, number, quantifier scope, and figurative meaning.
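"Many graded lenses on one graph" can be turned into a comparison operation, as a minimal sketch shows. Each argument carries scalar property scores with a confidence, and two arguments are compared by cosine similarity over the properties both have. The property names are a small hypothetical subset in the spirit of Dowty/UDS proto-role properties, the numbers are invented, and folding confidence in as a multiplicative weight is an assumption. None of this is the Decomp toolkit's API.

```python
import math

# value in [-1, 1] (how strongly the property holds), confidence in [0, 1]
Profile = dict[str, tuple[float, float]]


def weighted_cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity over shared properties, each value scaled by its confidence."""
    shared = sorted(a.keys() & b.keys())
    if not shared:
        return 0.0
    va = [a[k][0] * a[k][1] for k in shared]
    vb = [b[k][0] * b[k][1] for k in shared]
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(y * y for y in vb))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical argument profiles (invented values).
surgeon = {"volition": (0.9, 0.95), "sentience": (1.0, 1.0), "changed_state": (-0.7, 0.8)}
sculptor = {"volition": (0.8, 0.9), "sentience": (1.0, 1.0), "changed_state": (-0.5, 0.7)}
marble = {"volition": (-1.0, 1.0), "sentience": (-1.0, 1.0), "changed_state": (0.9, 0.9)}
```

The two agent-like arguments come out highly similar (about 0.99), while the patient-like `marble` scores negative against `surgeon`. This is the "matching proto-role profile" relation computed rather than guessed.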

The throughline for the founder: every one of these is, in effect, a fixed set of LENSES that turn an entity or predicate into deep, comparable atoms. GL's qualia literally are four lenses; UDS's property sheets are dozens of scalar lenses. The relationship-discovery payoff donto wants is exactly what these atoms enable: once two entities are decomposed into the same primitive vocabulary, latent cross-entity relations (shared telic purpose, shared agentive origin, matching proto-role profiles) become computable rather than guessed.

**Foundational works:**

- **Natural Semantic Metalanguage (NSM) — semantic primes & universals** — Anna Wierzbicka (with Cliff Goddard) (1972 (14 primes); 1996 Semantics: Primes and Universals; ~65 primes by 2002): All meaning reduces to ~65 indefinable, cross-linguistically universal primes + a universal mini-grammar; any concept is an 'explication' (paraphrase) in primes, making concepts from different cultures directly comparable at the atomic level. The purest 'break meaning to the utmost' program. https://en.wikipedia.org/wiki/Natural_semantic_metalanguage
- **The Generative Lexicon + QUALIA STRUCTURE (formal/constitutive/telic/agentive)** — James Pustejovsky (1991 (Computational Linguistics 17:4); 1995 book): Every noun decomposes into four Aristotelian 'modes of explanation' = four lenses (FORMAL/kind, CONSTITUTIVE/parts, TELIC/purpose, AGENTIVE/origin). Generative devices (type coercion, co-composition, selective binding) + dot-objects (book = PHYSICAL•INFORMATION) resolve logical polysemy without sense enumeration. The most lens-like and architecturally relevant precedent. https://aclanthology.org/J91-4003.pdf
- **Aristotle's four aitiai (causes) via Moravcsik's reading — the philosophical root of qualia** — Aristotle; J.M.E. Moravcsik (1975) (c.350 BCE; 1975): Material, formal, efficient (agentive), and final (telic) cause = four irreducible 'modes of explanation' for any thing. Pustejovsky's qualia are an explicit modern operationalization. This is the deepest ancestor of 'analyze every entity through purpose, origin, parts, kind.' https://en.wikipedia.org/wiki/Four_causes
- **Conceptual Dependency (CD) — primitive ACTs (ATRANS/PTRANS/MTRANS...)** — Roger Schank (1969–1977 (Stanford then Yale); scripts/plans/goals with Abelson 1977): All event meaning reduces to ~11 primitive ACTs + cases + states, giving a canonical, language-independent representation so paraphrases collapse to one form and support inference. The canonical predicate-decomposition lens — and the cautionary tale on over-reduction/coverage. https://en.wikipedia.org/wiki/Conceptual_dependency_theory
- **Conceptual Semantics — Lexical Conceptual Structure (Event/State/Thing/Place/Path/Property; GO/BE/CAUSE)** — Ray Jackendoff (1983 Semantics and Cognition; 1990 Semantic Structures; 2002): Meaning is a level of THOUGHT built from a fixed ontology of categories and functions; decomposition into primitives ('semantic equivalents of phonological features') is the very method of cognitive science. Grounds the claim that lensing entities into primitive components is scientifically principled, not arbitrary. https://en.wikipedia.org/wiki/Conceptual_semantics
- **Thematic Proto-Roles and Argument Selection (Proto-Agent / Proto-Patient)** — David Dowty (1991 (Language 67:3, 547–619)): Thematic roles are not discrete labels but CLUSTERS of independent entailment properties (volition, sentience, causation, change-of-state...). The direct conceptual seed of UDS's scalar, multi-property decomposition — i.e. role meaning is itself a multi-lens, graded profile. https://www.cs.rochester.edu/u/james/Papers/Dowty.1991.pdf
- **Frame Semantics / FrameNet — frames as situation-lenses with frame elements** — Charles J. Fillmore (FrameNet at ICSI Berkeley) (1976–1985; FrameNet from 1997): A word's meaning is only understood relative to a structured background frame (a stereotyped situation) it evokes, whose slots (frame elements) are filled by participants. Complements decomposition with a situational/relational lens; ~1200 frames + frame-to-frame relations are a ready-made lens library. https://en.wikipedia.org/wiki/FrameNet
- **Universal Decompositional Semantics (UDS) + Decomp toolkit** — Aaron Steven White, Drew Reisinger, Kyle Rawlins, Benjamin Van Durme, Elias Stengel-Eskin (2016 (EMNLP); 2019–2020 UDS1.0 + Decomp): One semantic graph (PredPatt) annotated with MANY orthogonal, real-valued, confidence-weighted property dimensions (18 proto-roles, genericity, factuality, time, event aspect, 26 entity supersenses). The closest existing realization of 'layer many graded lenses, each a hypothesis, on a single graph.' https://arxiv.org/abs/1909.13851
- **Abstract Meaning Representation (AMR) for Sembanking** — Laura Banarescu, Claire Bonial, Nathan Schneider, Martha Palmer et al. (2013 (LAW VII)): Whole-sentence meaning as a rooted directed acyclic graph over PropBank predicates ('who did what to whom'), abstracting away syntax so paraphrases share a representation. The production-scale, parseable graph format — but deliberately omits tense/number/scope/figurative meaning. https://people.cs.georgetown.edu/nschneid/p/amr.pdf
- **Undiscovered Public Knowledge / Literature-Based Discovery (the ABC model)** — Don R. Swanson (later Neil Smalheiser) (1986 (Raynaud–fish-oil); 1991 update): If literature A relates to B and B to C but no one has connected A–C, the A–C relation is latent, discoverable public knowledge. The canonical prior art for donto's exact payoff — 'relationships between entities no human thought to draw' — done over decomposed concept terms, not full lenses. https://www.journals.uchicago.edu/doi/10.1086/601720
- **The Fodor critique of lexical decomposition (surveyed in Stephen Pulman's 'Lexical Decomposition: For and Against')** — Jerry Fodor & Ernie Lepore vs. decompositionalists (Pustejovsky, Hale & Keyser) (1998 (and ongoing)): Atomists argue most lexical meaning is primitive/undecomposable and that decomposition smuggles in unverifiable structure; the linked paper reviews the debate. The essential counter-argument the founder must answer: do machine-proposed primitive decompositions carry real, falsifiable content, or just rename the problem? https://www.cs.ox.ac.uk/files/240/lexdecomp.pdf
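Swanson's ABC model reduces to a two-hop query over a co-occurrence graph. The minimal open-discovery sketch below assumes a hypothetical term co-occurrence map (the toy edges loosely echo the fish-oil/Raynaud case but are not Swanson's data). From a start term A it collects direct neighbours B, then ranks terms C that are reachable through some B but never co-occur with A directly, by the number of distinct B terms linking them.

```python
from collections import Counter

# Hypothetical undirected co-occurrence graph: term -> terms it co-occurs with.
COOCCUR = {
    "fish oil": {"blood viscosity", "platelet aggregation"},
    "blood viscosity": {"fish oil", "raynaud"},
    "platelet aggregation": {"fish oil", "raynaud", "aspirin"},
    "raynaud": {"blood viscosity", "platelet aggregation"},
    "aspirin": {"platelet aggregation"},
}


def open_discovery(a: str, graph: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank C terms linked to A only indirectly, by the number of bridging B terms."""
    direct = graph.get(a, set())
    counts = Counter()
    for b in direct:
        for c in graph.get(b, set()):
            if c != a and c not in direct:
                counts[c] += 1
    return counts.most_common()
```

`open_discovery("fish oil", COOCCUR)` ranks `raynaud` (two bridging terms) above `aspirin` (one). On a real corpus this list explodes, which is the "too many B-terms" problem noted under the hard problems below.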

**Modern AI systems:**

- **Decomp toolkit + UDS1.0 dataset (decomp.io)** — Open Python toolkit + dataset that stores a sentence as one semantic graph with many scalar, confidence-weighted decompositional property layers (proto-roles, genericity, factuality, time, event aspect, entity type); queryable. The most concrete 'many graded lenses on one graph' system in existence. _[UDS1.0 unifies 5 decompositional annotation sets; standard benchmark for decompositional semantic parsing (Stengel-Eskin et al. 2020). Research-grade, modest adoption; not LLM-native.]_ https://github.com/decompositional-semantics-initiative/decomp
- **AMR parsers (LeakDistill / self-knowledge-distillation) + MASSIVE-AMR** — State-of-the-art sentence-to-graph meaning parsers and a 52-language, ~84k-graph multilingual corpus; the mature graph meaning-representation pipeline. _[Smatch ~84.6–86.1 on standard AMR benchmarks; GPT-4 zero-shot ~100% structural validity but lower accuracy; large multilingual corpus. Active 2024–2025 survey activity.]_ https://arxiv.org/html/2505.03229v1
- **LLM ontology-guided KG construction (EDC, ODKE+, OntoKG-style, schema-grounded extraction)** — LLM pipelines that extract triples/entities and canonicalize them against an ontology; some explicitly route extractions into 'intrinsic/rigid-sortal' vs 'relational/mixin cross-cutting' modules — a partial echo of multi-facet decomposition. _[Production maturity reported across finance/health/manufacturing in 2024–2025; surveys note most systems still LACK evidence-grounding/corroboration of triples — exactly donto's evidence-first niche.]_ https://arxiv.org/pdf/2510.20345
- **LLM hypothesis-generation / scientific-discovery agents (multi-agent debate, graph-bridging, RAG-grounded)** — Agentic systems that generate, critique, and rank novel hypotheses, sometimes bridging concepts across causal graphs or noninteractive literatures — the modern, agentic re-implementation of Swanson's LBD. _[Active 2024–2025 surveys (HKUST-KnowComp; NAACL/EMNLP 2025); biomedical drug-combination multi-agent results in iScience 2025. Evaluation of genuine novelty remains the open problem.]_ https://arxiv.org/html/2504.05496v1
- **Structured-Representation + LLM studies (SR-LLM; 'Role of Semantic Representations in the LLM Era')** — Work probing whether feeding explicit symbolic/decompositional structures (AMR, etc.) into LLMs helps or hurts reasoning. _[Mixed/negative: naive injection of structured representations into prompts can DEGRADE LLM reasoning — a direct caution for how donto should surface lenses to/from agents.]_ https://arxiv.org/html/2502.14352v1

**Relevance to the lens engine:** BORROW, concretely: (1) GL's qualia are a ready-made, defensible STARTING lens-set — donto's 6 apertures could be extended with formal/constitutive/telic/agentive, which are entity-level (your current 6 are text-extraction-level) and yield exactly the cross-entity links you want (shared telic purpose, shared agentive origin, part-of overlap). (2) GL's DOT-OBJECT (book = PHYSICAL•INFORMATION) is a near-perfect formal precedent for donto's 'identity is a hypothesis' / one entity legitimately under multiple co-present aspects — cite it; it gives your design philosophical pedigree. (3) UDS is your closest sibling and the single best model to imitate: store each lens as SCALAR, CONFIDENCE-WEIGHTED, ORTHOGONAL properties on a graph rather than discrete labels — that is exactly what a paraconsistent, hypothesis-weighted substrate wants, and it makes 'relationship at the intersection of lenses' a vector-similarity / shared-profile query. (4) NSM is the right vocabulary-design discipline: a small, comparable atom set is what makes two entities from different domains ALIGNABLE at all; without a shared decompositional alphabet, cross-entity discovery degrades to surface string matching. (5) Swanson/LBD is your proof-of-concept and your evaluation template (ABC model; closed vs open discovery; replicate a known discovery to validate). (6) FrameNet's ~1200 frames are a free situational-lens library. AVOID: (a) Schank's mistake — a fixed, too-small primitive set that loses coverage (CD covered only a fraction of real-event corpora); keep lenses OPEN/extensible, not a closed alphabet. (b) AMR's deliberate amnesia — it drops tense, number, scope, figurative meaning; donto must NOT collapse those, since temporal/modal/figurative differences are often where the novel relation hides (and your bitemporal + paraconsistent design is built precisely to keep them). 
(c) Feeding raw symbolic structure into LLMs naively (SR-LLM result: it can hurt) — let agents produce decompositions but mediate the structure carefully. (d) The Fodor trap — make each lens's output FALSIFIABLE and evidence-anchored (your byte-level evidence + Lean certification is the right answer to 'is this decomposition real content or relabeling?').

**Already done vs white space:** ALREADY DONE (do not reinvent): The four-lens-per-entity idea (GL qualia, 1991), the many-graded-lenses-on-one-graph idea (UDS), the small-universal-atom idea (NSM/CD/Jackendoff), the situation/frame lens (FrameNet), the scalar-multi-property role decomposition (Dowty→UDS), the whole-sentence graph (AMR), and — critically — the 'discover relationships no human drew across decomposed concepts' idea (Swanson's literature-based discovery, 1986, and the entire 2024–2025 LLM-hypothesis-generation field). The founder's belief that 'no one has thought to do this' is FALSE at the level of any single component; relationship-discovery-via-decomposition is a 40-year-old research program. GENUINE WHITE SPACE (the real novelty is the COMBINATION at scale, not any piece): (1) No prior system runs the FULL SPECTRUM of human analytical lenses (philosophical+ethical+aesthetic+economic+ecological+semiotic+phenomenological, far beyond linguistic) — every framework above is linguistic/lexical, narrow by design; an agentic engine that applies dozens of heterogeneous interpretive lenses is genuinely unattempted. (2) No prior decomposition substrate is PARACONSISTENT and contradiction-preserving — UDS/AMR/GL assume one correct analysis; donto can hold mutually contradictory lens-outputs as legal state forever, which is exactly right for 'speculative machine-proposed relations.' (3) No prior LBD/KG system is simultaneously evidence-first-to-the-byte AND formally certifiable (Lean) — this directly answers the Fodor critique and the surveys' #1 complaint (extracted triples lack corroboration). (4) Agentic generation of lenses at 39M-statement scale with HOLD-then-VERIFY is new: classic LBD/KG curate eagerly; donto's 'generate speculative, hold without collapsing, certify the rare valuable few' is an unexplored operating model. 
The honest pitch: the lenses, the atoms, and the discovery goal are all prior art; the AGENTIC + MANY-HETEROGENEOUS-LENS + PARACONSISTENT-EVIDENCE-FIRST-CERTIFIABLE substrate, at this scale, is the defensible novelty.

**Hard problems:**
- Combinatorial explosion at the intersection: N entities × M lenses × pairwise relations is astronomically large; almost all candidate cross-lens relations are spurious. The hard part is not GENERATING relations (trivial) but RANKING/pruning them — Swanson's open-discovery suffers the same 'too many B-terms' problem.
- Evaluation of novelty vs. nonsense: there is no accepted metric for 'a valuable relationship no human thought of.' Novel-and-true, novel-and-false, and trivially-true are hard to separate automatically; LLM hypothesis-generation surveys flag this as the central unsolved issue.
- The Fodor/atomism objection made operational: does a machine-proposed primitive decomposition carry real falsifiable content, or just relabel the entity? Without grounding, lenses produce confident pseudo-structure (the documented LLM failure mode of copying surface tokens as 'concepts').
- Lens vocabulary commensurability: cross-entity discovery only works if decompositions share an alphabet (NSM's whole point). Heterogeneous lenses (ethical vs economic vs phenomenological) may not produce comparable atoms, so their 'intersection' may be ill-defined.
- Coverage vs. fixed primitives (Schank's lesson): too-small a primitive set loses coverage; too-open a set loses comparability. Finding the right granularity per lens is unsolved.
- Noise and confidence calibration: scalar/weighted decompositions (UDS-style) need well-calibrated confidence or the paraconsistent store fills with low-quality contradictions; agent-generated weights are typically miscalibrated.
- LLMs degrade on injected structured representations (SR-LLM finding): naively round-tripping rich symbolic lens-structure through agents can reduce reasoning quality, so the human/agent interface to the lens graph is itself a research problem.
- Scaling certification: Lean-style verification can certify shapes/rules but cannot adjudicate the empirical truth of a discovered relationship; deciding WHICH of millions of held hypotheses to spend verification/curation effort on is an open resource-allocation problem.
- Subjective/interpretive lenses (aesthetic, ethical, phenomenological) have no ground truth — they are inherently perspectival, so 'to the utmost of human understanding' may have no convergent target, only a distribution of defensible readings.
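A back-of-the-envelope count shows why ranking, not generation, is the bottleneck. Assume, purely for illustration, N = 10^6 entities, M = 20 lenses, and at most one candidate relation per lens per unordered entity pair:

$$
|\mathcal{C}| = M\binom{N}{2} = 20\cdot\frac{10^{6}\,(10^{6}-1)}{2} \approx 20\cdot 5\times10^{11} = 10^{13}
$$

Letting a relation pair lens i on one entity with lens j on the other multiplies this by up to another factor of M, before any multi-hop or n-ary relations are considered.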


### network-science-of-discovery

The network-science / science-of-science tradition gives the most rigorous answer to the donto founder's central, unstated question: not "how do I generate connections?" but "which generated connections are VALUABLE?" Its core, empirically-validated finding is that value lives in a specific place — at the BRIDGES between otherwise-disconnected clusters, and in the ATYPICAL recombination of distant elements — but only when that novelty is anchored in convention. This is the field's deepest result and it directly contradicts a naive reading of the founder's intuition. Volume of connections is worthless; positionally-improbable connections are everything.

The lineage runs through three nested layers. (1) Network structure: Granovetter's "strength of weak ties" (1973) showed that novel information flows across bridges (weak, non-redundant ties), not within dense clusters where everyone already knows the same things. Burt formalized this as STRUCTURAL HOLES — a gap between two clusters with non-redundant information — and his "Structural Holes and Good Ideas" (AJS 2004, the Raytheon study) demonstrated empirically that managers whose networks SPAN holes have a "vision advantage": their ideas are disproportionately rated as valuable, less likely to be dismissed. Burt's line "the creative spark on which serendipity depends is to see bridges where others see holes" is almost a literal mission statement for a lens-intersection engine. (2) Combinatorics of discovery: Weitzman's "Recombinant Growth" (1998) and Arthur's "The Nature of Technology" (2009) model innovation as recombination of existing components, with the supply of ideas effectively unbounded — the binding constraint is the R&D/evaluation effort to test combinations, not the combinations themselves. Kauffman's "adjacent possible" and the TAP equation (Cortês, Steel, Kauffman et al.) formalize how the space of possible combinations explodes (a long plateau then a hockey-stick) as each new object opens new adjacent recombinations. (3) Empirical scoring of novelty value: Uzzi, Mukherjee, Stringer & Jones, "Atypical Combinations and Scientific Impact" (Science 2013, 17.9M papers) is the keystone. They measure a paper's combinations by z-scoring every pair of co-referenced journals against a degree-preserving randomized null (how surprising is this pairing vs chance), then take the paper's MEDIAN conventionality and its 10th-percentile TAIL novelty. The hit finding: the highest-impact papers are NOT the most novel — they sit in the high-conventionality / high-tail-novelty quadrant. 
A bedrock of convention with a sharp intrusion of one atypical combination is 2x more likely to be a hit. Pure novelty underperforms.
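The Uzzi et al. scoring recipe can be sketched under simplifying assumptions. Papers are lists of cited journals. The null model permutes journal labels across all citation slots, which preserves each paper's reference count and each journal's total citation count. The original study used a more careful Monte Carlo link-switching null that also controls for citation timing, so treat this permutation as an approximation. Each journal pair gets z = (observed − null mean) / null sd, and a paper is summarized by the median z of its pairs (conventionality) and the 10th-percentile z (tail novelty, where low z means atypical). The corpus is invented.

```python
import math
import random
import statistics
from collections import Counter
from itertools import combinations


def pair_counts(papers):
    """How many papers co-cite each unordered pair of distinct journals."""
    counts = Counter()
    for refs in papers:
        counts.update(combinations(sorted(set(refs)), 2))
    return counts


def shuffled(papers, rng):
    """Null corpus: permute journal labels across all citation slots.

    Keeps every paper's reference-list length and every journal's total citation count.
    """
    slots = [j for refs in papers for j in refs]
    rng.shuffle(slots)
    out, i = [], 0
    for refs in papers:
        out.append(slots[i:i + len(refs)])
        i += len(refs)
    return out


def pair_z_scores(papers, n_null=200, seed=0):
    """z = (observed - null mean) / null sd for every observed journal pair."""
    rng = random.Random(seed)
    observed = pair_counts(papers)
    nulls = [pair_counts(shuffled(papers, rng)) for _ in range(n_null)]
    z = {}
    for pair, obs in observed.items():
        samples = [null[pair] for null in nulls]
        mu, sd = statistics.mean(samples), statistics.pstdev(samples)
        z[pair] = (obs - mu) / sd if sd > 0 else 0.0
    return z


def percentile(values, q):
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    return ordered[max(0, math.ceil(q / 100 * len(ordered)) - 1)]


def score_paper(refs, z):
    """(conventionality, tail novelty) = (median z, 10th-percentile z) of the paper's pairs.

    Needs at least two distinct journals in refs.
    """
    zs = [z[pair] for pair in combinations(sorted(set(refs)), 2)]
    return statistics.median(zs), percentile(zs, 10)


# Hypothetical corpus: A-B and C-D are habitual pairings; one paper adds D to an A-B list.
PAPERS = [["A", "B"]] * 8 + [["C", "D"]] * 8 + [["A", "B", "D"]]
```

In this toy corpus the habitual pair (A, B) gets a positive z and the one-off pair (A, D) a negative z. The mixed paper's tail score therefore picks up its atypical combination, while its median reflects how much conventional scaffold surrounds it.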

The science-of-science tradition also quantifies the OPPOSITE problem the founder will hit. Foster, Rzhetsky & Evans, "Tradition and Innovation in Scientists' Research Strategies" (ASR 2015), mapped millions of biomedical claims as a network of chemical relationships and showed scientists overwhelmingly play it safe (extending known nodes) because the reward premium for risky bridging strategies, though real (higher expected impact), is insufficient to compensate for the higher chance of being ignored. Wang, Veugelers & Stephan, "Bias Against Novelty in Science" (Research Policy 2017), showed the most novel papers are SYSTEMATICALLY undervalued in short windows, suffer delayed recognition, and are cited mainly in "foreign" fields — precisely because no single evaluator holds all the lenses. This is the strongest external validation of the founder's thesis: there is a real, measurable surplus of value in cross-lens bridges that human, discipline-bounded evaluation leaves on the table. The Funk/Owen-Smith CD-index and the Park et al. (Nature 2023) "disruption is declining" work give an alternative, network-based way to score whether a connection CONSOLIDATES or DISRUPTS its neighborhood.
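The CD index mentioned above has a compact definition (the unweighted form of Funk & Owen-Smith, also used in the Park et al. analysis). Consider every later paper that cites the focal paper or any of its references. It contributes +1 if it cites the focal paper but none of its references (disruptive), −1 if it cites both (consolidating), and 0 if it cites only the references. The index is the mean contribution, in [−1, 1]. The sketch below assumes citations are given as a mapping from each later paper to the set of papers it cites, uses invented toy data, and omits the fixed citation window that published analyses apply.

```python
def cd_index(focal: str, references: set[str], citing: dict[str, set[str]]) -> float:
    """Unweighted CD index of `focal`.

    references: papers cited by the focal paper
    citing:     later paper -> set of papers it cites
    Each paper in the focal neighbourhood adds -2*f*b + f, where
    f = cites the focal paper, b = cites at least one of its references.
    """
    total, n = 0, 0
    for paper, cites in citing.items():
        if paper == focal:
            continue
        f = focal in cites
        b = bool(cites & references)
        if not (f or b):
            continue  # outside the focal paper's citation neighbourhood
        n += 1
        total += -2 * f * b + f
    return total / n if n else 0.0


# Hypothetical citation data for focal paper "F" with references R1, R2.
CITING = {
    "p1": {"F"},        # focal only: +1
    "p2": {"F", "R1"},  # focal and a reference: -1
    "p3": {"R2"},       # reference only: 0
    "p4": {"X"},        # unrelated: excluded
    "p5": {"F"},        # focal only: +1
}
```

Here `cd_index("F", {"R1", "R2"}, CITING)` is (1 − 1 + 0 + 1) / 4 = 0.25, a mildly disruptive score.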

Crucially for donto's paraconsistent design, Lin, Evans & Wu, "New Directions in Science Emerge from Disconnection and Discord" (arXiv 2103.03398), shows that DISAGREEMENT/contradiction between clusters, not just disconnection, is a strong predictor of where new scientific directions emerge. Bridges that span a structural hole AND carry discord are disproportionately generative. This is the empirical warrant for holding contradictions as legal state rather than collapsing them: a contradiction frontier IS a map of where novel directions are most likely.
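The structural-hole half of "span a hole AND carry discord" is commonly measured with Burt's network constraint, where lower constraint means more brokerage. The minimal sketch below covers an unweighted, undirected graph with equal tie investment (p_ij = 1/deg(i)). The two-cluster graph is invented, and pairing the score with a discord flag from donto's contradiction frontier would be a design choice, not something the sketch implements.

```python
def burt_constraint(adj: dict[str, set[str]], i: str) -> float:
    """Burt's constraint c_i = sum_j (p_ij + sum_q p_iq * p_qj)^2 over neighbours j of i.

    q ranges over i's other neighbours; p_uv = 1 / deg(u) if u-v is an edge, else 0.
    """
    def p(u: str, v: str) -> float:
        return 1 / len(adj[u]) if v in adj[u] else 0.0

    total = 0.0
    for j in adj[i]:
        indirect = sum(p(i, q) * p(q, j) for q in adj[i] if q != j)
        total += (p(i, j) + indirect) ** 2
    return total


# Hypothetical graph: two triangles joined only through broker "k".
EDGES = [("x1", "x2"), ("x1", "x3"), ("x2", "x3"),
         ("y1", "y2"), ("y1", "y3"), ("y2", "y3"),
         ("k", "x1"), ("k", "y1")]
ADJ: dict[str, set[str]] = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)
```

The broker `k` scores 0.5, while `x2`, embedded in a closed triangle, scores about 1.0. Constraint therefore separates bridge positions from redundant cluster members.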

The throughline for a discovery-scoring engine: a discovered relationship should be scored not by plausibility alone but by (a) the network DISTANCE/improbability of the entities it bridges (structural-hole span, z-score atypicality), (b) the CONVENTIONALITY of its surrounding scaffold (Uzzi: anchor the leap in known ground), and (c) the presence of unresolved DISCORD across the bridge. Score for surprise-given-grounding, not for either alone.
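
One way to encode "surprise-given-grounding" is a multiplicative score, so that a bridge with high span but no scaffold (or vice versa) cannot rank high on one signal alone, with discord acting as a boost rather than a gate. This Python sketch is hypothetical: the field names, the [0, 1] normalization, and the `discord_weight` of 0.5 are assumptions for illustration, not a published formula.

```python
from dataclasses import dataclass


@dataclass
class CandidateBridge:
    """Hypothetical per-edge signals, each pre-normalized to [0, 1]."""
    span: float             # (a) improbability / structural-hole span of the bridge
    conventionality: float  # (b) how well-trodden the surrounding scaffold is
    discord: float          # (c) share of unresolved rebut/undercut edges across it


def surprise_given_grounding(c: CandidateBridge, discord_weight: float = 0.5) -> float:
    """Multiply span by conventionality so neither signal alone scores high,
    then boost (never gate) by unresolved discord."""
    for v in (c.span, c.conventionality, c.discord):
        if not 0.0 <= v <= 1.0:
            raise ValueError("signals must be normalized to [0, 1]")
    return c.span * c.conventionality * (1.0 + discord_weight * c.discord)
```

A product is the simplest form with the "not either alone" property; a geometric mean or a learned model would be reasonable alternatives once labeled outcomes exist.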

**Foundational works:**

- **The Strength of Weak Ties** — Mark Granovetter (1973): Novel information flows across weak, bridging ties between clusters — not within dense clusters where knowledge is already redundant. The bridge, not the hub, carries novelty. First formal claim that valuable connections are positional. https://www.cs.cmu.edu/~jure/pub/papers/granovetter73ties.pdf
- **Structural Holes / Brokerage; 'Structural Holes and Good Ideas' (AJS 2004, Raytheon study)** — Ronald S. Burt (1992 / 2004): A 'structural hole' is a gap between clusters with non-redundant information; people whose networks SPAN holes have a 'vision advantage' — their ideas are disproportionately judged valuable, less often dismissed. 'The creative spark on which serendipity depends is to see bridges where others see holes.' This is the empirical core: value is at the bridge, and it's measurable. http://www.ronaldsburt.com/research/files/SHGI.pdf
- **Atypical Combinations and Scientific Impact (Science, 17.9M papers)** — Uzzi, Mukherjee, Stringer, Jones (2013): The keystone scoring method. z-score every co-referenced journal pair vs a degree-preserving randomized null; take a paper's MEDIAN conventionality + 10th-percentile TAIL novelty. Hits are 2x more likely in the HIGH-conventionality + HIGH-tail-novelty quadrant. Pure novelty underperforms — anchor the atypical leap in conventional ground. The exact recipe for scoring a discovered relationship for VALUE not just plausibility. https://www.science.org/doi/10.1126/science.1240474
- **Tradition and Innovation in Scientists' Research Strategies (ASR)** — Foster, Rzhetsky, Evans (2015): Mapped millions of biomedical claims as a network of chemical relations; scientists overwhelmingly choose conservative (extend-known-node) strategies. Risky bridging strategies have higher expected impact but the premium is too small to offset the elevated chance of being ignored — so the search space is systematically under-explored. The market gap donto could exploit: machines bear the risk humans rationally avoid. https://arxiv.org/abs/1302.6906
- **Recombinant Growth + The Nature of Technology (combinatorial models of innovation)** — Martin Weitzman; W. Brian Arthur (1998 / 2009): Innovation = recombination of existing components. The supply of possible combinations is effectively unbounded; the binding constraint is the R&D/EVALUATION effort to test them, not generating them. Tells the founder: generation is cheap (true for LLM lenses too), so the engine's whole value is the evaluation/triage filter, not the combinatorial firehose. https://mattsclancy.com/wp-content/uploads/2023/01/Recombinant-Growth.pdf
- **The 'adjacent possible' & the TAP equation** — Stuart Kauffman; Cortês, Steel, Herriot et al. (2000 / 2022): Each realized combination opens new ADJACENT combinations; the space grows as a long plateau then an explosive hockey-stick. Explains why a many-lens decomposition keeps yielding new bridges as it runs (each new entity/lens-fact expands the adjacent possible) — but also why uncontrolled expansion combinatorially explodes and must be bounded. https://arxiv.org/abs/2204.14115
- **New Directions in Science Emerge from Disconnection and Discord** — Chen, Ding, Evans et al. (2021): New scientific directions emerge most where clusters are both DISCONNECTED (structural hole) AND in DISCORD (contradictory). Disagreement is generative, not noise. Direct empirical warrant for donto's paraconsistent, contradiction-preserving design: the contradiction frontier is a map of where novel relationships are most likely to pay off. https://arxiv.org/pdf/2103.03398
- **Bias Against Novelty in Science** — Wang, Veugelers, Stephan (2017): The most novel (highest-atypicality) papers are systematically undervalued in short windows, show delayed recognition, and are cited mostly in 'foreign' fields. Because no single discipline-bounded evaluator holds all lenses, real cross-lens value is left on the table — the strongest external validation of the founder's 'no one holds all the lenses at once' intuition. https://www.nber.org/papers/w22180
- **Conceptual Blending / Combinational Creativity** — Fauconnier & Turner; Margaret Boden; Arthur Koestler (bisociation) (2002 / 1990 / 1964): The cognitive-science account of HOW novelty arises from combining distant mental spaces (blending), and Boden's taxonomy (combinational / exploratory / transformational). Gives a vocabulary for what a lens-intersection actually produces and why cross-domain blends feel creative — the micro-mechanism beneath the macro network finding. https://en.wikipedia.org/wiki/Conceptual_blending
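
Burt's brokerage measures from the list above are cheap to compute. This minimal Python sketch computes effective size (how many of a node's contacts are non-redundant), assuming an undirected, unweighted entity graph stored as adjacency sets; it uses Borgatti's simplification n - 2t/n for that case. A value close to the node's degree means it spans structural holes. If a library is preferred, networkx ships `effective_size` and `constraint`.

```python
from itertools import combinations


def effective_size(graph, ego):
    """Burt's effective size for an unweighted, undirected graph.

    Simplified form n - 2t/n, where n is the ego's number of contacts and
    t the number of ties among those contacts.
    graph: dict mapping node -> set of neighbor nodes (symmetric)."""
    contacts = graph[ego]
    n = len(contacts)
    if n == 0:
        return 0.0
    # Count ties between pairs of the ego's contacts (redundancy).
    t = sum(1 for a, b in combinations(contacts, 2) if b in graph[a])
    return n - 2 * t / n
```

In a star the ego's three contacts are fully non-redundant (effective size 3); closing one triangle among them lowers it to 3 - 2/3.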

**Modern AI systems:**

- **SciAgents (MIT, Buehler lab)** — Multi-agent system that builds a large ontological knowledge graph (~33K nodes / 49K edges from ~1000 papers), then samples RANDOM (not shortest) paths between two distant/random concept nodes to seed a hypothesis; specialized agents (Ontologist, Scientist_1/2, Critic) expand the path into a structured proposal (hypothesis, mechanism, novelty, unexpected properties) and score novelty/feasibility against Semantic Scholar. This is the closest existing system to donto's lens-intersection vision. _[Published in Advanced Materials (2024/2025); generated genuinely cross-domain bioinspired-materials hypotheses (e.g. silk + structural-coloration). Random-path sampling explicitly chosen to 'infuse the path with a richer array of concepts' — empirical confirmation that bridging distant nodes beats shortest-path.]_ https://arxiv.org/abs/2409.05556
- **Accelerating science with human-aware AI (Sourati & Evans)** — Builds a hypergraph of materials, properties, and the researchers who study them; 'human-aware' random walks model not just what's logically possible but what's COGNITIVELY REACHABLE by the human expert crowd — then deliberately 'avoids the crowd' to surface valuable 'alien' hypotheses far from human reach. The single most relevant value-scoring idea for donto. _[Nature Human Behaviour 2023. Human-aware models improved prediction of which discoveries will actually be made by UP TO 400% over content-only models, especially in sparse literature; the inverse mode generates promising hypotheses 'unlikely to be imagined until the distant future.']_ https://pubmed.ncbi.nlm.nih.gov/37443269/
- **Literature-Based Discovery (Swanson ABC) and modern KG link-prediction descendants** — Swanson's 1986 fish-oil/Raynaud's discovery: A-B and B-C links in disjoint literatures imply an untested A-C. Modern versions do temporal link-prediction / graph-embedding over biomedical KGs (e.g. AGATHA, SemMedDB-based systems, the active-curriculum temporal-graph LBD work) to propose A-C edges that bridge disconnected literatures. _[Swanson's original hypotheses (fish oil/Raynaud's, magnesium/migraine) were later clinically validated. Modern LBD is an active field; link-prediction over biomedical KGs is the canonical 'find the missing bridge edge' formulation that donto's hypothesis_only edges resemble.]_ https://link.springer.com/article/10.1007/s10462-024-10885-1
- **Mat2vec — unsupervised word embeddings capture latent knowledge (Tshitoyan et al.)** — Word2vec over 3.3M materials-science abstracts; the embedding geometry encoded undiscovered structure — recommending thermoelectric materials YEARS before they were reported in the literature. Proof that latent cross-document relationships exist and are extractable without supervision. _[Nature 2019. Predicted several materials later experimentally confirmed as thermoelectrics; demonstrated relationships 'lay dormant' in the literature, recoverable by geometry — the empirical existence proof for donto's latent-structure thesis.]_ https://www.nature.com/articles/s41586-019-1335-8
- **Analogy Mining / cross-domain bridging (Hope, Chan, Kittur, Shahaf)** — Learns purpose-vs-mechanism representations from patents/product descriptions so a problem in one domain can be matched to a structurally-analogous solution in a DISTANT domain — operationalizing structural-hole brokerage as a retrieval problem. _[KDD 2017 Best Paper; follow-up PNAS 2019 'Scaling up analogical innovation with crowds and AI' showed analogies surfaced by the system led humans to generate more creative solutions. Directly validates the value of bridging distant domains.]_ https://arxiv.org/abs/1706.05585
- **AI co-scientist (Google DeepMind) / Robin / agentic discovery survey** — General multi-agent systems (generate / debate / rank / evolve hypotheses) for end-to-end hypothesis generation, design, and analysis. Tournament-style ranking among competing hypotheses is the relevant pattern for triaging donto's many machine-proposed relationships. _[Co-Scientist (Gemini 2.0) reported novel hypotheses in drug repurposing/AMR later validated in wet-lab; Robin reportedly cut a discovery cycle from ~900 human-hours to under 2. Caveat: a 2026 critique ('Agentic AI Scientists Are Not Built For Autonomous Scientific Discovery') argues current agents over-produce plausible-but-unvalidated hypotheses.]_ https://deepmind.google/blog/co-scientist-a-multi-agent-ai-partner-to-accelerate-research/
- **Hypothesis-evaluation benchmarks (TruthHypo/TruthEval, ScholarEval, ProjectionBench)** — 2024-2025 systems that grade generated hypotheses on truthfulness, soundness (have analogous methods worked before?), novelty, and contribution — combining literature retrieval + KG retrieval to filter the firehose. _[Active 2025 research; consistent finding that LLMs generate MORE NOVEL but LESS VALID hypotheses than humans — the exact triage problem donto must solve. Directly relevant to building a value/validity filter atop a high-volume generator.]_ https://www.ijcai.org/proceedings/2025/0873.pdf
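
SciAgents' preference for random over shortest paths is easy to emulate. This minimal Python sketch assumes an adjacency-dict concept graph and uses restart-based self-avoiding random walks; the walk strategy, `max_len`, and `tries` are illustrative choices, not SciAgents' actual sampler. The length cap is one simple way to bound exploration.

```python
import random


def sample_random_path(graph, source, target, max_len=10, tries=100, seed=None):
    """Sample a random simple path from source to target.

    Runs repeated self-avoiding random walks, each capped at max_len steps.
    Returns a list of nodes, or None if no walk reached the target.
    graph: dict mapping node -> iterable of neighbor nodes."""
    rng = random.Random(seed)
    for _ in range(tries):
        path, seen = [source], {source}
        while path[-1] != target and len(path) <= max_len:
            options = sorted(n for n in graph[path[-1]] if n not in seen)
            if not options:
                break  # dead end: restart with a fresh walk
            nxt = rng.choice(options)
            path.append(nxt)
            seen.add(nxt)
        if path[-1] == target:
            return path
    return None
```

Sampling several such paths between the same pair of entities gives different intermediate concepts to seed hypotheses, which a single shortest path cannot.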

**Relevance to the lens engine:** This area is donto's scoring layer: it tells the engine how to RANK the relationships its many-lens decomposition proposes. BORROW: (1) Uzzi's exact recipe. For any discovered relationship, compute a z-score atypicality against a degree-preserving randomized null over the entity graph, then favor relationships that pair HIGH conventional scaffolding with a HIGH-novelty TAIL (a single surprising bridge anchored in known ground), not maximal novelty. This converts 'plausible' into 'valuable.' (2) Burt's structural-hole span. Score a proposed edge by how many non-redundant clusters it connects and how large a hole it bridges; brokerage betweenness over the entity graph is a directly computable value signal, and donto already has the quad graph to compute it. (3) Sourati-Evans 'avoid the crowd'. Model which relationships are already cognitively reachable (densely co-occurring, low surprise) and DOWN-weight them; up-weight the 'alien' bridges far from existing co-mention, which is where the unrecovered surplus value sits. (4) Chen/Ding/Evans discord+disconnection. donto's contradiction frontier is not a bug to resolve but a PRIORITY MAP: rank candidate relationships highest where they bridge disconnected clusters that also carry argument-edge discord (supports/rebuts). donto's paraconsistent substrate is uniquely able to hold and exploit this signal, where a consistency-enforcing store would destroy it. (5) Weitzman/Arthur/Kauffman combinatorics. Internalize that generation is cheap and unbounded; the engine's entire moat is the triage filter, and the adjacent-possible explosion means you MUST bound exploration (sample paths, cap fan-out) or drown. AVOID: (a) optimizing for raw novelty or raw volume, since Wang/Veugelers/Stephan and Uzzi both show that pure novelty is low-value and even penalized; (b) shortest-path / nearest-neighbor relationship discovery, since SciAgents deliberately sampled random, distant paths instead of shortest ones to get richer, more creative hypotheses; (c) treating LLM-rated plausibility as value, since the 2025 benchmarks show LLMs over-produce plausible-but-invalid hypotheses, so plausibility must be a gate, never the ranking. Net: donto should ship a 'brokerage + atypicality + discord' composite score as the lens it applies at query time to triage machine-proposed hypothesis_only edges, and use Lean-4 certification only on the thin top slice that survives.
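
The "plausibility is a gate, never the ranking" rule translates directly into a two-stage filter. This minimal Python sketch assumes each candidate edge already carries a plausibility score and a separate value score (for example a brokerage + atypicality + discord composite); the tuple layout, `plausibility_floor`, and `top_k` are hypothetical parameters.

```python
import heapq


def triage(candidates, plausibility_floor=0.5, top_k=3):
    """Gate on plausibility, then rank the survivors by value.

    candidates: iterable of (edge_id, plausibility, value) tuples.
    Plausibility only filters; it never contributes to the ordering.
    Returns the edge ids of the top_k survivors, e.g. for certification."""
    gated = [c for c in candidates if c[1] >= plausibility_floor]
    return [edge for edge, _, _ in heapq.nlargest(top_k, gated, key=lambda c: c[2])]
```

In the test below, the most plausible edge loses to more valuable ones, and the most valuable edge is dropped because it fails the plausibility gate.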

**Already done vs white space:** ALREADY DONE (the founder should not reinvent): (1) The CORE THESIS that valuable connections live at bridges/atypical combinations is not a hunch; it is one of the most replicated results in social science (Granovetter→Burt→Uzzi→Foster/Evans, across millions of papers). (2) The exact MATH to score a connection's value-improbability already exists and is open (Uzzi z-score atypicality, Burt brokerage/effective size, CD/disruption index, the Novelpy package). (3) MANY-LENS / cross-domain GRAPH TRAVERSAL to generate bridging hypotheses is an established research category: SciAgents (random-path graph reasoning), Sourati-Evans (human-aware walks), analogy mining, LBD link-prediction, and AI co-scientist all do 'find the bridge no one drew.' (4) The empirical proof that latent cross-document relationships exist and are recoverable (mat2vec) is well established. GENUINE WHITE SPACE, the defensible combination: (a) PERSISTENT, PARACONSISTENT HOLDING of speculative relationships as first-class legal state. Every system above generates hypotheses transiently and either validates or discards them; NONE holds a durable, contradiction-preserving, evidence-anchored frontier of millions of unresolved machine-proposed edges that can be re-queried, re-scored, and accreted over time as new lenses/entities arrive. donto's bitemporal contradiction store turns one-shot generation into a compounding asset. (b) SCALE + GENERALITY: the discovery systems are domain-locked (materials, biomedicine); donto is a general 39.5M-statement substrate, so it can compute brokerage/atypicality across domains that have never been jointly indexed, which is exactly the foreign-field surplus Wang/Veugelers/Stephan showed is undervalued. (c) EVIDENCE-ANCHORING + LEAN CERTIFICATION of the survivors: no discovery system byte-anchors every claim AND offers a formal certification overlay, and that pairing is precisely what closes the 'plausible-but-invalid' gap the 2025 benchmarks expose. (d) IDENTITY-AS-HYPOTHESIS: discovery in these systems assumes fixed entities; donto's queryable merge-under-a-lens means the SAME substrate can discover relationships under different identity resolutions, a genuinely unexplored degree of freedom. So 'no one has thought of the many-lens bridge idea' is FALSE; 'no one has built a persistent, paraconsistent, evidence-first, cross-domain substrate that holds and compounds the firehose and then certifies the survivors' is essentially TRUE, and that is the real moat.

**Hard problems:**
- The plausible-vs-valuable gap: LLM/agentic generators reliably produce MORE NOVEL but LESS VALID outputs (2025 benchmarks). Plausibility scoring is necessary but never sufficient; value requires the network/atypicality signal AND grounding, and even then most candidates are false. donto's filter, not its generator, is the whole ballgame.
- Combinatorial explosion / triage at scale: the Weitzman/Kauffman models predict that the candidate space grows super-linearly (the TAP hockey-stick). Across 39.5M statements × many lenses, the number of proposable bridges is astronomically larger than anything you can verify. You need cheap, computable pre-filters (brokerage, z-score) before any LLM/Lean touch, and a principled exploration budget.
- Defining the right null model for atypicality: Uzzi's z-score depends entirely on a degree-preserving randomized null. Over a heterogeneous, bitemporal, paraconsistent quad graph (not a clean co-citation network), what is the correct null? A wrong null makes every cross-context edge look 'atypical' and floods the frontier with junk surprise.
- The novelty-impact paradox / delayed recognition: Wang-Veugelers-Stephan show that the most valuable novel connections are precisely the ones that look worthless on short horizons and pay off only later, in foreign fields. Any greedy, short-horizon value score will therefore tend to discard the highest-value bridges. You need scoring that tolerates delayed/foreign validation, which is hard to operationalize without ground-truth feedback loops.
- Evaluation without ground truth: there is no oracle for 'is this discovered relationship true/valuable?' Science-of-science uses future citations as a noisy proxy; donto has no equivalent. Bootstrapping a value signal without circular reliance on the same LLMs that generated the candidates is unsolved.
- Distinguishing generative discord from mere error: Chen/Ding/Evans show contradiction is generative — but most contradictions in an auto-extracted substrate are extraction noise, coreference failures, or stale facts, not productive scientific discord. Separating the valuable contradiction frontier from the garbage contradiction frontier is an open, donto-specific problem.
- Brokerage is computed over the WRONG graph if extraction is biased: structural-hole and weak-tie measures assume the absence of an edge means genuine disconnection. In an LLM-extracted graph, a missing edge often just means the extractor didn't read those two sources together — so a 'structural hole' may be an artifact of coverage, not a real bridge opportunity. Coverage bias contaminates the core value signal.
- Combinatorial creativity does not equal correctness: conceptual-blending/bisociation explains why cross-lens blends FEEL novel, but the cognitive-science tradition has no account of which blends are true. Borrowing the generative mechanism without a validity gate reproduces the hallucination problem at scale.
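
The plateau-then-hockey-stick behaviour behind the combinatorial-explosion problem can be reproduced with a toy TAP-style recurrence. This Python sketch assumes the form M_{t+1} = M_t + P Σ_{i=2}^{M_t} α_i C(M_t, i) with the assumed simplification α_i = α^(i-1) and no extinction term (check the TAP papers for the exact variants), and treats M as continuous through the closed form of that sum, ((1 + α)^M - 1 - αM) / α. Parameter defaults are arbitrary.

```python
import math


def tap_trajectory(m0=3.0, p=1.0, alpha=0.01, steps=500, cap=1e12):
    """Iterate a toy TAP-style recurrence until it passes `cap`.

    M_{t+1} = M_t + P * sum_{i=2}^{M_t} alpha**(i-1) * C(M_t, i), using the
    closed form ((1 + alpha)**M - 1 - alpha*M) / alpha for that sum."""
    traj = [m0]
    for _ in range(steps):
        m = traj[-1]
        if m > cap:
            break  # explosion reached: stop before the numbers overflow further
        try:
            new_combos = ((1 + alpha) ** m - 1 - alpha * m) / alpha
        except OverflowError:
            new_combos = math.inf
        traj.append(m + p * new_combos)
    return traj
```

With the defaults, M creeps up by about 0.03 per step at first and then passes the cap within roughly a hundred steps, which is why an exploration budget has to be fixed in advance rather than tuned after the blow-up starts.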


### multi-perspective-agentic-reasoning

The founder's vision — decompose any entity through the full spectrum of human analytical lenses (philosophical, temporal, causal, mereological, teleological, ethical, semiotic, etc.) and harvest the RELATIONSHIPS that emerge at the INTERSECTION of lenses — sits at the confluence of a deep philosophical lineage and a very active 2023-2026 AI research front. The intellectual root is PERSPECTIVISM (Nietzsche: knowledge is irreducibly perspectival, and crucially his *methodological* perspectivism — "the more affects we allow to speak about a thing, the more complete will be our concept of it"; Ortega y Gasset; Wittgenstein's aspect-seeing). The engineering root is Minsky's "Society of Mind" (1986): intelligence as the emergent product of many simple, specialized, non-intelligent agents. The discovery root is Don Swanson's Literature-Based Discovery (1986, fish-oil/Raynaud's): valuable knowledge already exists latently as UNCONNECTED public facts across disciplinary silos (the A-B-C model), and the payoff is connecting them — which is almost exactly the founder's "relationships no human thought to draw because no human holds all the lenses." Conceptual Blending Theory (Fauconnier & Turner) supplies the cognitive mechanism for why cross-frame combination is generative rather than merely additive.

The modern AI realization is the multi-agent / multi-perspective LLM literature. Du, Li, Tenenbaum & Mordatch's "multiagent debate" (2023, ICML 2024) showed multiple LLM instances proposing and critiquing over rounds improves factuality and math/strategic reasoning — explicitly framed as a "society of minds." Tree-of-Thoughts (Yao et al. 2023) and Graph-of-Thoughts (Besta et al. 2023/AAAI 2024) generalize single-chain reasoning to branched/graph search with self-evaluation, lookahead, backtracking, and — in GoT — *synergistic recombination of intermediate thoughts*, the structural analog of intersecting lenses. Solo-Performance-Prompting (Wang et al., NAACL 2024) is the most direct precursor to the founder's "many lenses on one object": a single LLM dynamically identifies and simulates multiple task-relevant PERSONAS ("cognitive synergy"), and critically finds that DYNAMICALLY-IDENTIFIED, fine-grained personas ("Film Expert") beat fixed generic ones ("Expert") — e.g. 79% vs 38% on Codenames — though synergy only EMERGES at GPT-4-level capability. Mixture-of-Agents (Wang et al. 2024) layers proposer LLMs whose outputs are aggregated, beating GPT-4-Omni on AlpacaEval. CAMEL (Li et al., NeurIPS 2023) and AutoGen operationalize role-based agent ensembles as infrastructure.

The single most important system for this vision is Google DeepMind's AI co-scientist (Gottweis et al., arXiv 2502.18864, Feb 2025; Nature 2026). It is a near-literal instantiation of the generate→hold-many→curate-the-valuable pipeline the founder describes, with named specialized agents: a Generation agent proposes hypotheses; a PROXIMITY agent clusters them *specifically so the system does not collapse into a single line of thinking* (the anti-redundancy mechanism); a Reflection agent acts as a virtual peer reviewer scoring novelty/correctness/rigor; a Ranking agent runs an Elo "idea tournament" of simulated debates; an Evolution agent recombines and refines top hypotheses; a Meta-review agent synthesizes the reviews and feeds them back to the other agents. It produced *experimentally validated* novel findings (AML drug-repurposing candidates with in-vitro tumor inhibition; novel epigenetic liver-fibrosis targets validated in human organoids; in-silico rediscovery of an unpublished gene-transfer mechanism). This is concrete evidence that agentic multi-perspective generation-plus-curation yields genuinely novel, valuable relationships, not just redundancy.
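
The Ranking agent's Elo "idea tournament" reduces to repeated pairwise rating updates. This minimal Python sketch assumes a round-robin schedule and a caller-supplied `judge` callable standing in for the simulated debates; the K-factor of 32 and starting rating of 1200 are conventional chess defaults, not values taken from the co-scientist paper.

```python
import itertools


def elo_update(r_a, r_b, score_a, k=32.0):
    """Standard Elo update; score_a is 1 (A wins), 0.5 (draw) or 0 (B wins)."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    delta = k * (score_a - expected_a)
    return r_a + delta, r_b - delta


def run_tournament(hypotheses, judge, rounds=3, start=1200.0):
    """Round-robin Elo tournament; returns hypotheses sorted best-first.

    judge(a, b) -> 1, 0.5 or 0 stands in for a pairwise debate or
    comparison (an LLM call in practice; any callable here)."""
    ratings = {h: start for h in hypotheses}
    for _ in range(rounds):
        for a, b in itertools.combinations(hypotheses, 2):
            ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], judge(a, b))
    return sorted(ratings, key=ratings.get, reverse=True)
```

Elo only needs pairwise judgments, which suits LLM judges that compare two candidates more reliably than they assign absolute scores.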

On the founder's central empirical question — does diverse decomposition produce EMERGENT INSIGHT or just REDUNDANCY? — the literature gives a sharp, honest answer: BOTH, and which one you get is a design problem, not a guarantee. The rigorous theory is the bias-variance-DIVERSITY decomposition (Wood, Mu, Brown et al., JMLR 2023): an ensemble's expected error = average bias + average variance − DIVERSITY, where diversity is precisely member DISAGREEMENT. Diversity is provably valuable, BUT only when members are individually competent (if "experts disagree very frequently they are individually poor estimators"). The cautionary 2024-2026 evidence is strong: "Talk Isn't Always Cheap" (2509.05396) shows debate frequently DEGRADES accuracy — agents flip from correct to incorrect under social/peer pressure (conformity dominates truth-seeking), weak agents contaminate strong ones, and accuracy can fall over rounds. "Representational Collapse in Multi-Agent LLM Committees" (2604.03809) measured that 3 same-model agents under different role prompts had mean cosine similarity 0.888 and effective rank 2.17/3 — i.e. nominal "diversity" via persona prompts can be largely ILLUSORY. The "tyranny of the majority" / echo-chamber effect is documented repeatedly. The constructive response is diversity-AWARE design: diversity-aware message retention (2603.20640), structured disagreement analysis for uncertainty (DiscoUQ 2603.20975), and the co-scientist's Proximity-agent clustering — all aimed at PRESERVING genuine divergence instead of letting it collapse.
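
The squared-loss special case of this decomposition (Krogh and Vedelsby's ambiguity decomposition, which the Wood et al. framework generalizes) can be checked numerically for a single target: the error of the averaged ensemble equals the average member error minus the diversity, i.e. the spread of members around the ensemble mean. Minimal Python sketch; the function name is illustrative.

```python
def ambiguity_decomposition(predictions, target):
    """Return (ensemble_error, avg_member_error, diversity) under squared loss.

    For a uniformly weighted average ensemble the identity
    ensemble_error == avg_member_error - diversity holds exactly."""
    m = len(predictions)
    mean = sum(predictions) / m
    ensemble_error = (mean - target) ** 2
    avg_member_error = sum((p - target) ** 2 for p in predictions) / m
    diversity = sum((p - mean) ** 2 for p in predictions) / m
    return ensemble_error, avg_member_error, diversity
```

The competence caveat is visible here: predictions of 10 and 20 for a target of 0 are diverse (diversity 25), yet the ensemble error is still 225 because each member is individually poor.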

**Foundational works:**

- **Perspectivism (esp. Nietzsche's methodological perspectivism)** — Friedrich Nietzsche (also Ortega y Gasset, Karl Jaspers; Wittgenstein on aspect-seeing) (1887 (On the Genealogy of Morality III §12)): Knowledge is irreducibly perspectival; and crucially the PRESCRIPTIVE/methodological claim that engaging MORE affects and viewpoints on a thing yields a more complete concept of it — the explicit philosophical warrant for 'the more lenses, the fuller the understanding.' https://en.wikipedia.org/wiki/Perspectivism
- **The Society of Mind** — Marvin Minsky (1986): Intelligence is an emergent property of a large collection of simple, specialized, individually non-intelligent agents — the canonical blueprint for decomposing cognition into many narrow lenses/agents whose interaction produces understanding. https://en.wikipedia.org/wiki/Society_of_Mind
- **Literature-Based Discovery & 'undiscovered public knowledge' (the A-B-C model; fish-oil/Raynaud's)** — Don R. Swanson (1986): Valuable relationships already exist LATENTLY as facts that are individually published but never CONNECTED across disciplinary silos; discovery = joining A-B and B-C literatures to surface an unseen A-C link. This is precisely the founder's 'relationships no one thought to draw.' https://pmc.ncbi.nlm.nih.gov/articles/PMC7924697/
- **Conceptual Blending (Conceptual Integration) Theory** — Gilles Fauconnier & Mark Turner (2002 (The Way We Think)): Novel meaning arises when two distinct mental 'input spaces' (frames/lenses) are selectively projected into a blended space with emergent structure not present in either input — the cognitive mechanism for why combining lenses is generative, not merely additive. https://arxiv.org/pdf/2505.10948
- **Bias-Variance-Diversity decomposition (A Unified Theory of Diversity in Ensemble Learning)** — Danny Wood, Tingting Mu, Gavin Brown et al. (2023 (JMLR)): Formal proof that ensemble error = avg bias + avg variance − DIVERSITY, where diversity IS member disagreement; diversity is provably beneficial but only when members are individually competent. The mathematical answer to 'diverse insight vs. redundancy.' https://jmlr.org/papers/volume24/23-0041/23-0041.pdf
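
Swanson's A-B-C model from the list above reduces to a two-hop search that skips pairs already linked directly. This minimal Python sketch works over an undirected term co-occurrence graph; the toy edges are illustrative, and real LBD systems add frequency filtering, semantic typing, and ranking on top.

```python
from collections import defaultdict


def abc_candidates(edges, a):
    """Swanson-style open discovery starting from term `a`.

    edges: iterable of (x, y) co-occurrence links between terms.
    Returns {c: sorted shared B-terms} for every term C reachable from A
    through an intermediate B but never linked to A directly."""
    nbrs = defaultdict(set)
    for x, y in edges:
        nbrs[x].add(y)
        nbrs[y].add(x)
    found = defaultdict(set)
    for b in nbrs[a]:
        for c in nbrs[b]:
            if c != a and c not in nbrs[a]:
                found[c].add(b)
    return {c: sorted(bs) for c, bs in found.items()}
```

Counting the shared B-terms per candidate C is the simplest payoff signal: an A-C link supported by several independent intermediates is more interesting than one supported by a single term.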

**Modern AI systems:**

- **Multiagent Debate ('society of minds')** — Multiple LLM instances independently propose answers, then read each other's reasoning and revise over several rounds toward consensus; improves factuality and math/strategic reasoning. The seminal modern multi-perspective-reasoning result. _[Du/Li/Tenenbaum/Mordatch, ICML 2024; widely replicated; significant gains on GSM8K/strategic reasoning and reduced hallucination vs single-pass.]_ https://arxiv.org/abs/2305.14325
- **Tree of Thoughts (ToT) / Graph of Thoughts (GoT)** — Generalize chain-of-thought to branched (tree) or arbitrary-graph search over reasoning units with self-evaluation, lookahead, backtracking; GoT adds synergistic RECOMBINATION and feedback of intermediate 'thoughts' — the structural analog of combining outputs across lenses. _[ToT (Yao et al. 2023) Game-of-24 success 4%→74%; GoT (Besta et al., AAAI 2024) improves quality and cuts cost vs ToT on sorting/set tasks.]_ https://arxiv.org/pdf/2401.14295
- **Solo Performance Prompting (SPP) — multi-persona self-collaboration / 'cognitive synergy'** — A SINGLE LLM dynamically identifies and simulates multiple task-relevant personas that collaborate — the closest existing precursor to 'many analytical lenses on one object then combine.' _[Wang et al., NAACL 2024. Dynamic fine-grained personas beat fixed generic ones (Codenames 79% vs 38%); reduces hallucination; BUT synergy EMERGES only at GPT-4-level capability (absent in GPT-3.5/Llama2-13b).]_ https://aclanthology.org/2024.naacl-long.15/
- **AI co-scientist (Gemini-based multi-agent)** — Generate→debate→evolve hypothesis engine with named specialized agents: Generation, Proximity (clusters to PREVENT collapse to one line of thought), Reflection (peer-review for novelty/rigor), Ranking (Elo 'idea tournament'), Evolution (recombine/refine), Meta-review. The most complete instantiation of generate-many / hold-many / curate-the-valuable. _[DeepMind, arXiv Feb 2025, Nature 2026. EXPERIMENTALLY VALIDATED novel results: AML drug-repurposing (in-vitro tumor inhibition), novel epigenetic liver-fibrosis targets (validated in human organoids), in-silico rediscovery of an unpublished gene-transfer mechanism.]_ https://arxiv.org/abs/2502.18864
- **Mixture-of-Agents (MoA)** — Layered architecture where multiple proposer LLMs' outputs are fed to aggregator LLMs that synthesize a better answer; exploits 'collaborativeness' — aggregation improves output even when individual auxiliary responses are weaker. _[Wang et al. 2024 (ICLR 2025). Open-source MoA 65.1% vs GPT-4-Omni 57.5% on AlpacaEval 2.0.]_ https://arxiv.org/abs/2406.04692
- **CAMEL / AutoGen (role-based agent infrastructure)** — Frameworks for orchestrating ensembles of role/persona-specialized agents that converse autonomously (CAMEL: inception prompting + role-playing; AutoGen: programmable multi-agent conversation with critics/supervisors/tools) — the plumbing for instantiating N lenses as agents. _[CAMEL (Li et al., NeurIPS 2023) and Microsoft AutoGen are widely adopted open-source multi-agent libraries underpinning much of the 2024-2026 multi-agent work.]_ https://arxiv.org/abs/2303.17760
- **Multi-agent KG construction (CooperKGC / KARMA / multi-view RE)** — Teams of specialized agents (NER, relation extraction, event extraction; entity-/concept-/mention-view inference) that collaboratively build knowledge graphs and reconcile/enrich triples with conflict resolution — directly relevant to lens-decomposition feeding a substrate. _[CooperKGC (arXiv 2312.03022) shows varied-expertise agents improve KGC; KARMA (2025) adds multi-agent enrichment with conflict resolution; multi-view RE improves relation extraction F1.]_ https://arxiv.org/pdf/2312.03022
- **Debate/committee FAILURE-MODE & collapse studies** — Empirical work documenting when multi-perspective debate HURTS: conformity (correct→incorrect flips under peer pressure), weak-agent contamination, accuracy decay over rounds, and 'representational collapse' where same-model persona agents are near-identical in embedding space. _['Talk Isn't Always Cheap' (2509.05396): debate degrades accuracy in many configs. 'Representational Collapse' (2604.03809): 3 same-model agents cosine 0.888, effective rank 2.17/3 — persona 'diversity' often illusory.]_ https://arxiv.org/html/2509.05396v2

**Relevance to the lens engine:** BORROW: (1) The co-scientist topology is the best-validated template for donto's lens engine: a Generation phase (run N lens-agents over an entity/text) feeding a PROXIMITY/clustering step (essential: without it you get redundancy collapse, not emergent relationships), then Reflection/Ranking via an Elo idea-tournament to curate the rare valuable cross-lens relationships, then Evolution to recombine survivors. donto's paraconsistent substrate is the ideal place to HOLD the generated-but-unranked tournament population that co-scientist keeps only in memory. (2) SPP's strongest, most actionable lesson: DYNAMIC, task-specific lenses beat a fixed generic list. Rather than hard-coding the same 6 (or N) philosophical lenses every time, let an agent pick the fine-grained lenses an entity actually rewards (a treaty rewards 'legal/temporal/diplomatic-game-theory'; a poem rewards 'prosodic/semiotic/phenomenological'). (3) GoT's recombination-of-thoughts is the literal mechanism for 'relationships at the intersection of lenses': model lens-outputs as graph vertices and explicitly generate edges BETWEEN them; do not just concatenate per-lens fact lists. (4) Swanson LBD + conceptual blending are the right framing for the PAYOFF metric: a valuable output is an A-C relationship surfaced because lens-A and lens-C share a B-term. Instrument for that, not for raw fact count. AVOID / GUARD AGAINST: (a) Redundancy/representational collapse. Three same-model agents under different persona prompts showed mean cosine similarity 0.888 and effective rank 2.17 of 3, so persona 'diversity' can be largely illusory; donto must measure the semantic diversity of lens outputs (effective rank / pairwise distance) and discount near-duplicates, or genuinely vary models/temperature/tools per lens. (b) Conformity & 'tyranny of the majority'. Do NOT make lenses debate to consensus; donto's paraconsistent design is a STRENGTH here precisely because it can preserve minority/contradictory lens-claims as legal state (hypothesis_only, supports/rebuts/undercuts edges) instead of collapsing them, and this is donto's genuine differentiator over every debate-to-consensus system. (c) Weak-agent contamination. A low-quality lens degrades the pool; gate lens-outputs by an individual-competence check (bias-variance-diversity theory: diversity only helps among competent members). (d) Cost/emergence floor. SPP shows synergy only emerges at frontier capability; budget for strong models on the generation lenses or you'll get redundancy, not insight.
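
Guard (a) asks donto to measure lens-output diversity and discount near-duplicates. This minimal Python sketch assumes each lens output has already been embedded as a vector by some model (not shown); it uses mean pairwise cosine similarity as the collapse signal and a greedy similarity threshold for deduplication. The 0.9 threshold is an arbitrary placeholder, and effective rank would be a complementary measure.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all pairs; values near 1 signal collapse."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs) if pairs else 0.0


def dedupe(outputs, threshold=0.9):
    """Greedily keep an output only if its cosine similarity to every output
    already kept is below `threshold`.

    outputs: list of (label, embedding_vector) pairs, in priority order."""
    kept = []
    for label, vec in outputs:
        if all(cosine(vec, k) < threshold for _, k in kept):
            kept.append((label, vec))
    return [label for label, _ in kept]
```

Because the dedup is greedy, input order acts as a priority: put outputs from the lens you trust most first.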

**Already done vs white space:** ALREADY DONE (the founder should NOT assume 'no one has thought of this'): The core loop — run many specialized perspectives/agents over a problem, hold a population of candidate hypotheses, debate/rank/evolve them, and surface validated novel ones — is fully built and peer-reviewed in the AI co-scientist (Nature 2026), with WET-LAB-validated novel discoveries. 'Many personas/lenses on one object then combine' is done at the prompt level by SPP (NAACL 2024) and at the architecture level by MoA, CAMEL, AutoGen, multiagent debate, and ToT/GoT. The conceptual claim that latent cross-silo relationships are the prize is 40 years old (Swanson LBD) and being actively LLM-ified (Elicit, SKiM, the 2024 MDPI LBD work, hypothesis-generation surveys arXiv 2504.05496). 'Many critical lenses over one text' is standard literary pedagogy and is being studied for LLMs (arXiv 2507.11582). Multi-view/multi-agent KG construction with conflict resolution (CooperKGC, KARMA) overlaps donto's extraction layer. GENUINE WHITE SPACE (donto's defensible novelty is the COMBINATION, not any single piece): (1) PERSISTENCE & SCALE — every system above generates-and-discards within a single session/query; NONE durably HOLDS the full speculative cross-lens relationship population as queryable, bitemporal, evidence-anchored legal state across millions of entities. donto can keep the 99% of machine-proposed relationships that co-scientist throws away, forever, for later re-evaluation as lenses/evidence improve. (2) PARACONSISTENT CO-EXISTENCE — every debate/committee system is consensus-seeking and thus actively destroys the minority and contradictory readings; donto's contradiction-preserving substrate with typed argument edges (supports/rebuts/undercuts) and identity-as-hypothesis is, as far as the literature shows, UNIQUE as a place to let mutually-contradictory cross-lens relationship-claims coexist without collapse. 
(3) CROSS-ENTITY × CROSS-LENS at substrate scale — the systems above run many lenses over ONE object; the founder's distinctive move is harvesting relationships at the intersection of lenses ACROSS millions of entities simultaneously (a global LBD over a 39M-statement graph). That global, always-on, lens-indexed serendipity surface does not exist in the literature. (4) FORMAL CERTIFICATION of the curated survivors — pairing speculative generation with a Lean-4 overlay that can CERTIFY the rare valuable relationship's shape/rule is genuinely unexplored (co-scientist validates in wet labs / Elo, not by formal proof).

**Hard problems:**
- EVALUATION / GROUND TRUTH: a 'relationship no human ever thought of' has no label set; you cannot measure precision/recall on serendipity. The field's only proxies are Elo idea-tournaments (co-scientist), human-rated helpfulness (cross-domain analogy work, median 4/5), and downstream wet-lab/empirical validation — all expensive, slow, and not applicable to most of donto's domains (genealogy, etc.).
- DIVERSITY-VS-REDUNDANCY (the founder's own central worry): same-model-under-N-prompts collapses to ~rank-2 'diversity' (cosine 0.888, effective rank 2.17/3); naive N-lens decomposition will produce mostly redundant facts unless semantic diversity is actively measured and enforced (model/tool/temperature variation per lens), per bias-variance-diversity theory.
- COMBINATORIAL EXPLOSION at the intersection: relationships at the intersection of L lenses over E entities scale as ~L^2 × E^2 candidate pairs, so finding the rare valuable few is a needle-in-a-haystack ranking/pruning problem. Co-scientist already needs heavy test-time compute and an Elo tournament for a single problem; a global sweep over a 39M-statement graph is far larger still.
- NOISE & PLAUSIBLE-NONSENSE: LLMs readily generate confident, fluent, FALSE cross-domain connections (hallucinated analogies/links); without per-lens competence gating and evidence-anchoring, the substrate fills with seductive noise that is costly to refute.
- CONFORMITY / COLLAPSE in any debate-to-consensus step: agents flip correct→incorrect under peer pressure and the majority tyrannizes minorities — so the very mechanism used to 'combine' lenses can destroy the divergent signal that creates value (the reason donto should hold, not collapse).
- EMERGENCE CAPABILITY FLOOR & COST: cognitive synergy only appears at frontier-model capability (SPP: GPT-4 yes, GPT-3.5/Llama2 no); running N strong-model lenses over millions of entities is economically heavy, and weaker lenses actively contaminate the pool rather than diversify it.
- CURATION / TRUST: deciding WHICH of millions of held speculative relationships to promote toward 'verified' is an open human-in-the-loop + provenance + argument-evaluation problem; paraconsistency keeps everything alive but defers (does not solve) the question of what to actually believe.
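To make the ~L^2 × E^2 growth concrete, the arithmetic can be sketched as follows. The counting convention is an assumption: a candidate is one (entity, lens) reading paired with another (entity, lens) reading of a different entity, so each unordered entity pair contributes L^2 lens combinations. The entity and lens counts in the usage note are illustrative, not donto's actual sizes.

```python
from math import comb

def cross_lens_candidates(num_entities, num_lenses):
    """Candidate pairs of (entity, lens) readings across DIFFERENT entities: C(E, 2) * L^2."""
    return comb(num_entities, 2) * num_lenses ** 2

def evaluation_days(num_candidates, evals_per_day):
    """Days needed to score every candidate once at a fixed throughput."""
    return num_candidates / evals_per_day
```

With a hypothetical 1,000,000 entities and 6 lenses this gives 17,999,982,000,000 candidate pairs; at one million evaluations per day that is roughly 18 million days of scoring. Doubling the lens count exactly quadruples the candidates under this convention, which is why pruning and sampling, not exhaustive scoring, must be part of the design.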


### serendipity-novelty-evaluation

This field exists to answer the donto founder's make-or-break question directly: when a machine proposes a vast number of novel relationships, how do you tell a profound connection from pareidolia? Three research traditions converge on it, and all three have already discovered the same hard truth.

(1) **Computational serendipity in recommender systems** is the most mature. The field's consensus decomposition (Kotkov, Wang & Veijalainen 2016 survey; Murakami 2008; Ge, Delgado-Battenfeld & Jannach 2010; Adamopoulos & Tuzhilin 2014) is that serendipity = relevant AND novel AND unexpected/surprising, with each component operationalized separately. The standard trick for *unexpectedness* is the "primitive prediction model" (Murakami/Ge): a recommendation is unexpected iff an obvious baseline would NOT have produced it, i.e. R_unexp = R \ PM(u), where R is the recommendation list and PM(u) is the primitive model's list for user u. The serendipity score SRDP then multiplies unexpectedness by usefulness (relevance/rating). Adamopoulos & Tuzhilin formalize unexpectedness as *distance from a set of expectations E* (items the user/system already takes for granted), explicitly separating it from novelty (unknown) and diversity (intra-list dissimilarity). The crucial, sobering lesson from this tradition (Kotkov et al., "The Dark Matter of Serendipity," CHIIR 2024): serendipity is fundamentally a *subjective, experienced* event, yet almost all systems measure only *afforded/observable* serendipity via objective proxies, so the metrics are biased and most genuinely serendipitous hits are invisible to them. There is no clean offline ground truth for "valuable surprise."
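The primitive-prediction-model trick and SRDP can be sketched in a few lines. In donto terms, `recommended` would be the links the cross-lens engine proposes and `primitive` the links a cheap baseline (co-mention, embedding similarity, single-lens inference) would have proposed anyway. Normalization is an assumption here: this sketch averages unexpected × usefulness over the whole list R, while some formulations divide by |R_unexp| instead. Function names are hypothetical.

```python
def unexpected_items(recommended, primitive):
    """R_unexp = R \\ PM(u): recommendations the obvious baseline did NOT produce."""
    baseline = set(primitive)
    return [item for item in recommended if item not in baseline]

def srdp(recommended, primitive, usefulness):
    """Mean over R of [item is unexpected] * usefulness(item), with usefulness in [0, 1]."""
    if not recommended:
        return 0.0
    unexp = unexpected_items(recommended, primitive)
    return sum(usefulness(item) for item in unexp) / len(recommended)
```

For example, if the engine proposes links `a, b, c, d`, the co-mention baseline already produces `a` and `c`, and only `b` is judged useful, then `srdp(["a", "b", "c", "d"], ["a", "c"], {"b": 1.0, "d": 0.0}.get)` returns 0.25. A link the baseline would have found contributes nothing, however useful it is.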

(2) **Surprise as a formal quantity.** Itti & Baldi's Bayesian Surprise (NIPS 2006 / Vision Research 2009) is the canonical operational definition: surprise = KL divergence between an observer's PRIOR and POSTERIOR beliefs after seeing data, D_KL(posterior‖prior). It is provably distinct from Shannon information/rarity (a rare-but-belief-irrelevant event has high Shannon surprisal but zero Bayesian surprise). Empirically it is "the strongest known attractor of human attention" (~72–84% of gaze shifts go to above-average-surprise locations). This has been ported to recommenders (Kim et al., "Topic-Level Bayesian Surprise and Serendipity," RecSys 2023) by tracking KL divergence between a user's prior and posterior topic distributions. Bayesian surprise is the most principled, substrate-friendly metric available for the lens engine: it is exactly "how much does this relationship change the model's beliefs."
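A minimal sketch of the D_KL(posterior‖prior) computation over a discrete belief, in the spirit of the topic-level variant. It rests on a loudly stated simplification: the belief is represented by normalized pseudo-counts (the mean of a Dirichlet), and "admitting evidence" adds observed counts. Itti & Baldi take the KL divergence over the full distribution of models, not over a point estimate, so treat this as an approximation. The mapping of topics to donto beliefs is hypothetical.

```python
import math

def kl_divergence(q, p):
    """D_KL(q || p) in bits for two discrete distributions on the same support (p must be > 0 where q > 0)."""
    return sum(qi * math.log2(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def normalize(counts):
    total = sum(counts)
    return [c / total for c in counts]

def bayesian_surprise(prior_counts, observed_counts):
    """KL(posterior || prior) after adding observed counts to positive prior pseudo-counts."""
    prior = normalize(prior_counts)
    posterior = normalize([a + o for a, o in zip(prior_counts, observed_counts)])
    return kl_divergence(posterior, prior)
```

Evidence that matches what the model already believed (prior counts `[90, 10]`, observations `[9, 1]`) leaves the belief unchanged and scores zero surprise. The same number of observations concentrated on the low-prior topic (`[0, 10]`) moves the belief and scores far higher than observations concentrated on the expected topic (`[10, 0]`). That asymmetry is what separates belief-relative surprise from raw rarity.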

(3) **Literature-Based Discovery (LBD)** — Swanson's 1986 Raynaud's/fish-oil and migraine/magnesium discoveries via the ABC model (A relates to B, B relates to C, A↔C unknown → hypothesize A–C) — is the closest historical analog to "relationships no human drew because no one held all the lenses." Critically, LBD has spent 30 years grappling with exactly the founder's evaluation problem and has NOT solved it. Two evaluation regimes exist, both flawed: *replication* (rediscover Swanson's 2–3 known cases — cherry-picked, no statistical power) and *time-slicing* (Yetisgen-Yildiz & Pratt 2009: pick cutoff year t, treat post-t co-occurrences of A–C absent before t as "discoveries," compute precision/recall/F/AUC/MAP/MRR). Sebastian/Moreau (Bioinformatics 2023, "addressing the subpar evaluation methodology") shows time-slicing is "too noisy": the gold standard is dominated by meaningless co-occurrences (Ebolavirus + Professional Burnout), the true-discovery fraction is "unknown and likely low," so the metric rewards co-occurrence *prediction*, not insight. There is no agreed benchmark, no shared task, no formal definition of "a discovery."
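Swanson-style open discovery and the time-sliced protocol fit in one small sketch. It assumes the literature has been reduced to undirected term co-occurrence pairs and ranks each C term by its number of shared B terms, a deliberately crude score. The data in the usage note is a toy set invented for illustration, not Swanson's corpus. Note how the evaluation rewards predicting a future co-occurrence, which is exactly the weakness the Sebastian/Moreau critique identifies.

```python
from collections import defaultdict

def build_graph(pairs):
    """Undirected co-occurrence graph: term -> set of co-occurring terms."""
    graph = defaultdict(set)
    for x, y in pairs:
        graph[x].add(y)
        graph[y].add(x)
    return graph

def abc_candidates(graph, a):
    """Open discovery: C terms reachable from A via some B but never linked to A directly.

    Returns {C: number of shared B terms}, a crude ranking signal.
    """
    scores = defaultdict(int)
    for b in graph[a]:
        for c in graph[b]:
            if c != a and c not in graph[a]:
                scores[c] += 1
    return dict(scores)

def time_sliced_eval(pre_pairs, post_pairs, a):
    """Score candidates from the pre-cutoff graph against A-C links first seen after the cutoff."""
    pre = build_graph(pre_pairs)
    candidates = set(abc_candidates(pre, a))
    post = build_graph(post_pairs)
    gold = {c for c in post[a] if c not in pre[a]}
    hits = candidates & gold
    precision = len(hits) / len(candidates) if candidates else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall
```

On a toy pre-cutoff set linking `raynaud` and `fish oil` through `blood viscosity` and `platelet aggregation` (plus an irrelevant `raynaud`–`cold`–`winter` chain), `abc_candidates` returns `{"fish oil": 2, "winter": 1}`. If the only post-cutoff link is `raynaud`–`fish oil`, precision is 0.5 and recall is 1.0: the metric cannot tell whether `winter` is noise or a discovery nobody has published yet.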

The unifying finding across all three traditions, plus computational creativity (Boden's new-surprising-valuable; Ritchie's novelty/quality/typicality; Lamb et al.'s 2019 survey of evaluation methods — CAT/Amabile, Colton's tripod, Jordanous's SPECS/components) and modern LLM-idea studies (Si, Yang & Hashimoto 2024 — 100+ reviewers found LLM ideas MORE novel but LESS feasible/valid; TruthHypo 2025 — explicit novelty↔validity tradeoff, high hallucination): **novelty is cheap and mechanizable; value is expensive and resists automation.** Generation is solved; discrimination is not. At scale this collides with the statistics of multiple comparisons / false discovery rate: an engine that proposes millions of cross-lens links is running millions of implicit hypothesis tests, so the EXPECTED number of spurious-but-surprising connections is enormous (apophenia by construction). Without FDR control, calibration, or downstream validation, "a connection no human ever drew" and "a connection no human ever drew because it's noise" are indistinguishable.

**Foundational works:**

- **Bayesian Surprise (D_KL posterior‖prior)** — Laurent Itti & Pierre Baldi (2006/2009): The only axiomatically-consistent definition of surprise: how much an observation moves beliefs (KL divergence prior→posterior), provably distinct from Shannon rarity. A belief-relative, model-internal surprise metric — exactly what a lens engine needs to score 'this relationship changes what the substrate believed.' http://ilab.usc.edu/publications/doc/Itti_Baldi06nips.pdf
- **Literature-Based Discovery & the ABC model** — Don R. Swanson (1986-1988): Hidden A–C relationships found via shared intermediary B that no single human saw because the literatures were disjoint (Raynaud–fish oil). The direct historical precedent for cross-lens relationship discovery — and a 30-year warning that evaluation, not generation, is the bottleneck. https://en.wikipedia.org/wiki/Literature-based_discovery
- **A Survey of Serendipity in Recommender Systems** — Denis Kotkov, Shuaiqiang Wang, Jari Veijalainen (2016): Canonical decomposition: serendipity = relevant AND novel AND unexpected; each component formalized separately. Establishes that you must measure relevance/novelty/unexpectedness/value as distinct axes, not collapse them into one 'interestingness' score. https://www.sciencedirect.com/science/article/abs/pii/S0950705116302763
- **Unexpectedness via a 'primitive prediction model' + serendipity score SRDP** — Tomoko Murakami et al.; Mouzhi Ge, Carla Delgado-Battenfeld, Dietmar Jannach (2008 / 2010): Operational unexpectedness = items NOT produced by an obvious baseline (R_unexp = R \ PM(u)); SRDP = unexpectedness × usefulness. Directly portable: score a donto relationship as surprising iff a cheap heuristic (co-mention, embedding similarity, single-lens inference) would NOT have produced it. https://link.springer.com/chapter/10.1007/978-3-540-78197-4_5
- **On Unexpectedness in Recommender Systems (distance-from-expectations)** — Panagiotis Adamopoulos & Alexander Tuzhilin (2014): Formalizes unexpectedness as distance from a set of expectations E (the already-taken-for-granted), cleanly separating it from novelty and diversity, and combines it with utility. Gives the lens engine a principled 'expectation set' to measure surprise against. https://dl.acm.org/doi/pdf/10.1145/2559952
- **Serendipitous Information Retrieval (mechanisms to engineer serendipity)** — Elaine G. Toms (2000): Four mechanisms to provoke serendipity: blind chance (random node), the Pasteur 'prepared mind' (user/context profile), anomalies via deliberately POOR similarity, and reasoning by analogy. The 'poor similarity' and 'analogy' mechanisms are the design DNA of a cross-lens discovery engine. https://www.ercim.eu/publication/ws-proceedings/DelNoe01/3_Toms.pdf
- **Creativity = new, surprising, valuable; combinational/exploratory/transformational** — Margaret Boden (formalized by Geraint Wiggins; Graeme Ritchie's criteria) (1990-2007): Boden's three criteria are the evaluation rubric the whole field reuses; combinational creativity (novel combination of familiar ideas) is precisely 'relationships at the intersection of lenses.' Ritchie reframes value as novelty × quality × typicality — surprise alone is not enough. https://en.wikipedia.org/wiki/Computational_creativity
- **Time-sliced evaluation of LBD (gold-standard via future co-occurrence)** — Meliha Yetisgen-Yildiz & Wanda Pratt (2009): The standard retrospective-rediscovery protocol: cutoff year t, post-t-but-not-pre-t links are 'discoveries,' score with precision/recall/F. The best objective evaluation method that exists — and the paper that exposes why it is still inadequate (most gold links are noise). https://faculty.washington.edu/melihay/publications/LBDChapter2009.pdf
- **False Discovery Rate / multiple-comparisons control** — Yoav Benjamini & Yosef Hochberg (1995): When you test millions of relationships, the expected count of spurious 'significant' findings is huge; BH-style FDR control bounds the expected proportion of false positives among accepted discoveries. The mathematically inescapable governor on any high-volume relationship-discovery engine. https://en.wikipedia.org/wiki/False_discovery_rate
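The multiple-comparisons arithmetic and the Benjamini–Hochberg step-up procedure can be sketched as follows. The key assumption, not made in the source, is that each candidate link can be given a p-value against some null model (for example, a permutation of co-occurrences). BH controls FDR when those p-values are independent or positively dependent. Function names are hypothetical.

```python
def expected_false_positives(num_tests, alpha):
    """Expected spurious 'significant' results if every tested link is null, each at level alpha."""
    return num_tests * alpha

def benjamini_hochberg(p_values, q):
    """Indices of hypotheses rejected by the BH step-up procedure at FDR level q.

    Sort p-values ascending, find the largest rank k with p_(k) <= (k / m) * q,
    and reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k = rank
    return sorted(order[:k])
```

Testing one million null links at α = 0.05 yields about 50,000 "significant" ones by chance alone. With p-values `[0.01, 0.045, 0.025, 0.005, 0.5]` and q = 0.05, BH accepts the three smallest (indices 0, 2, 3) and rejects the rest. The FDR level q is the explicit, tunable false-discovery budget.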

**Modern AI systems:**

- **SciAgents** — Multi-agent (Ontologist/Scientist/Critic) system that samples RANDOM paths between distant concepts in a 33K-node ontological knowledge graph and reasons over the path to propose hypotheses; explicitly uses random (not shortest) paths to maximize cross-domain surprise; a 'novelty assistant' agent scores novelty and feasibility via Semantic Scholar lookups (e.g. novelty 8, feasibility 7). _[Ghafarollahi & Buehler, MIT; arXiv Sep 2024, published Advanced Materials Dec 2024. Generated novel bio-inspired materials hypotheses. NOTE: the authors admit NO systematic ranking across hypotheses is implemented; filtering 'remains future work.' The single most architecturally aligned system to the donto vision.]_ https://pmc.ncbi.nlm.nih.gov/articles/PMC12138853/
- **Robin (FutureHouse)** — End-to-end multi-agent discovery: literature synthesis → hypothesis generation → experimental data analysis → refinement, with downstream wet-lab validation as the evaluation gate. _[Identified ripasudil as a novel candidate for dry AMD; experimentally validated in patient-derived RPE cells; ~2.5 months end-to-end. Demonstrates the only fully convincing answer to the precision problem: downstream empirical validation, not a metric.]_ https://www.futurehouse.org/research-announcements/demonstrating-end-to-end-scientific-discovery-with-robin-a-multi-agent-system
- **The AI Scientist (Sakana)** — Fully automated research loop (ideate → experiment → write → review) including an automated LLM REVIEWER that scores generated papers. _[The automated reviewer reached ~69% balanced accuracy, with an F1 exceeding NeurIPS-2021 inter-human agreement, but an independent evaluation (arXiv 2502.14297) found mixed/overstated quality. Evidence that LLM-as-judge can approximate human filtering for value, with caveats.]_ https://arxiv.org/abs/2408.06292
- **Si–Yang–Hashimoto LLM research-ideation study** — Large-scale blind human evaluation (100+ NLP researchers) comparing LLM-generated vs human research ideas on novelty/excitement/feasibility/effectiveness. _[LLM ideas judged statistically MORE novel (p<0.05) but slightly LESS feasible; found LLM self-evaluation fails and generation lacks diversity. The cleanest empirical proof of the novelty-cheap / value-hard asymmetry.]_ https://arxiv.org/abs/2409.04109
- **TruthHypo / hallucination-aware hypothesis evaluation** — Benchmark separating novelty from truthfulness/grounding in LLM-generated scientific hypotheses, validating against PubMed. _[Documents an explicit novelty↔validity tradeoff: more creative outputs correlate with higher hallucination; a substantial fraction of generated hypotheses are invalidated against the literature. Quantifies the 'most generated links are noise' problem for LLMs.]_ https://arxiv.org/pdf/2505.14599
- **Topic-Level Bayesian Surprise for Recommenders** — Applies Itti-Baldi Bayesian surprise (KL divergence between prior and posterior topic distributions) as the serendipity signal in a recommender, balanced against a relevance term. _[Kim et al., RecSys 2023: shows Bayesian surprise outperforms pure-novelty and pure-relevance baselines at finding unexpected-yet-valuable items. Demonstrates the surprise-vs-relevance balancing the lens engine will need.]_ https://arxiv.org/pdf/2308.06368
- **Conformal abstention / LLM-as-judge + HITL triage** — Methods to filter generated outputs by confidence: conformal-prediction abstention with theoretical hallucination-rate guarantees; self-consistency as a confidence proxy; human-on-the-loop triage of only high-risk/high-value items. _[Conformal abstention (arXiv 2405.01563) gives provable bounds on accepted-hallucination rate; KG-validation-with-HITL (IPM 2025) shows hybrid pipelines where discarded links become negative training examples. The practical toolkit for 'find the rare gold.']_ https://arxiv.org/pdf/2405.01563
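The "accept only above a calibrated threshold" idea can be sketched with a generic conformal-risk-control calibration. This is not the exact procedure of arXiv 2405.01563. Assumptions: a held-out calibration set of machine-proposed links has been human-labelled correct/wrong, new links are exchangeable with it, and each link carries a confidence score (e.g. self-consistency). The loss "accepted AND wrong" can only fall as the threshold rises, which is the monotonicity the method needs; under those assumptions the expected accepted-and-wrong rate is bounded by `alpha`.

```python
import math

def conformal_threshold(cal_scores, cal_wrong, alpha, loss_bound=1.0):
    """Smallest acceptance threshold tau whose conformal-risk-control bound is <= alpha.

    Loss for calibration item i at threshold tau is 1 if the item is accepted
    (score >= tau) AND labelled wrong, else 0.
    """
    n = len(cal_scores)
    for tau in sorted(set(cal_scores)):  # ascending: a lower tau accepts more links
        accepted_wrong = sum(1 for s, w in zip(cal_scores, cal_wrong) if s >= tau and w)
        risk = accepted_wrong / n
        if (n / (n + 1)) * risk + loss_bound / (n + 1) <= alpha:
            return tau
    return math.inf  # no finite threshold meets the bound: abstain on everything

def accept(score, tau):
    """Surface a new link only if its score clears the calibrated threshold."""
    return score >= tau
```

The finite-sample term `loss_bound / (n + 1)` is why small calibration sets force abstention: with 19 labelled links, no threshold can certify a target below 0.05. Everything below the threshold stays held as hypothesis rather than deleted, and the human labels double as the negative examples the HITL pattern recycles.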

**Relevance to the lens engine:** BORROW: (1) The decomposition discipline — never score a relationship with one number. Carry relevance, novelty (unknown-ness), unexpectedness (distance from an expectation set E), and value as SEPARATE axes (Kotkov/Adamopoulos-Tuzhilin). donto's bitemporal + evidence-first design already lets you compute novelty cheaply (is this triple absent from the substrate?) and unexpectedness via the 'primitive prediction model' trick (Murakami/Ge): a cross-lens link is surprising iff a cheap single-lens baseline would NOT have produced it — flag exactly the links that survive that subtraction. (2) Bayesian surprise (Itti-Baldi) is the ideal native scorer: D_KL between the substrate's belief BEFORE and AFTER admitting a hypothesis edge measures 'how much does this relationship change what donto believes' — belief-relative, not mere rarity, and it composes naturally with paraconsistency (a contradiction-inducing edge is maximally surprising). (3) SciAgents is the proof-of-concept of your exact mechanism — random paths between distant nodes to manufacture cross-domain surprise — so adopt its agent topology (Ontologist→Scientist→Critic) but FIX its admitted gap: it has no cross-hypothesis ranking/filtering. (4) Robin is the north star for the verification end: the only unambiguous serendipity metric is downstream validation; design donto's 'rare valuable' curation tier around external grounding (the evidence-anchor-to-source-byte and Lean-4 certification overlays are precisely the right substrate for this). (5) Toms' 'poor similarity' and 'analogy' mechanisms and Boden's combinational creativity legitimize the intersection-of-lenses thesis intellectually. AVOID: (1) Treating volume as the goal — every tradition shows novelty is cheap and the bottleneck is value-discrimination; a million unanchored links is the disease, not the cure. Hold them as hypothesis_only (donto already does this) but never surface them un-triaged. 
(2) Believing offline metrics certify value — time-sliced LBD evaluation rewards co-occurrence prediction, not insight; the 'dark matter' critique shows objective serendipity proxies miss most real serendipity. Treat any automatic 'interestingness' score as a triage filter, never a verdict. (3) Ignoring multiple comparisons — at your scale FDR is not optional; an engine proposing millions of links manufactures apophenia by construction. Make the false-discovery budget an explicit, tunable parameter and recycle rejected links as negatives (HITL-KG pattern). (4) LLM self-evaluation as the final gate — Si et al. and TruthHypo show models over-rate their own novel-but-invalid outputs.
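The "never one number" discipline and the hold-but-triage rule can be sketched as a record type. Every name here (`CandidateLink`, its fields, the `"hypothesis_only"` and `"rejected"` status strings, `triage`, `reject`) is hypothetical and is not donto's actual schema. The point is structural: relevance, novelty, and unexpectedness stay separate axes, surfacing requires all of them plus at least one evidence anchor, and nothing that fails triage is deleted.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CandidateLink:
    """One machine-proposed relationship; the scoring axes are kept separate on purpose."""
    subject: str
    predicate: str
    obj: str
    relevance: float        # how related to what a curator cares about, in [0, 1]
    novel: bool             # absent from the substrate before this proposal
    unexpectedness: float   # distance from the expectation set E
    evidence_ids: List[str] = field(default_factory=list)  # anchors to source evidence
    status: str = "hypothesis_only"

def triage(links, min_relevance, min_unexpectedness):
    """Split links into (surfaced, held); held links keep hypothesis status and are never dropped."""
    surfaced, held = [], []
    for link in links:
        passes = (
            link.novel
            and link.relevance >= min_relevance
            and link.unexpectedness >= min_unexpectedness
            and bool(link.evidence_ids)
        )
        (surfaced if passes else held).append(link)
    return surfaced, held

def reject(link, negatives):
    """After human review, mark a link rejected and recycle it as a negative example."""
    link.status = "rejected"
    negatives.append(link)
```

A link that is novel and surprising but has no evidence anchor stays held; so does a well-evidenced link the substrate already contained. Thresholds act as a triage filter only, never as a verdict on value.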

**Already done vs white space:** ALREADY DONE (do not reinvent): (a) The conceptual decomposition of serendipity into relevance/novelty/unexpectedness/value, with formal metrics for each (Kotkov, Murakami, Ge, Adamopoulos-Tuzhilin). (b) A principled, belief-relative surprise metric (Itti-Baldi Bayesian surprise) and its recommender port. (c) The exact generative mechanism the founder describes — sampling random paths between distant concepts in a knowledge graph to surface 'connections no one drew' — is implemented and published (SciAgents). (d) Retrospective time-sliced evaluation of discovery systems (Yetisgen-Yildiz & Pratt) and the multiple-comparisons/FDR machinery. (e) End-to-end agentic discovery with real wet-lab validation (Robin) and large human studies of AI-idea novelty (Si et al.). So 'agents that decompose and propose cross-domain links and score their novelty' is NOT white space; it is a crowded, ~40-year lineage (LBD) plus a 2024-2026 agentic wave. GENUINE WHITE SPACE — the defensible combination: (1) MANY LENSES SIMULTANEOUSLY AS THE GENERATIVE SUBSTRATE. Every prior system uses ONE representation (a citation graph, one ontology, topic vectors). Nobody systematically decomposes each entity through the full spectrum of analytical lenses (mereological, teleological, semiotic, phenomenological, ethical, ecological...) and then mines the INTERSECTIONS across lenses for relationships. The lens-cross-product as the search space is novel. (2) A PARACONSISTENT, CONTRADICTION-PRESERVING HOLDING TANK. Every prior system must commit, prune, or collapse contradictory hypotheses; donto can legally hold mutually-contradictory machine-proposed relationships forever as hypothesis_only with typed supports/rebuts/undercuts argument edges and a contradiction frontier. This dissolves the field's worst constraint: you no longer must decide value at generation time — you can accumulate speculative links and let evidence/curation arrive asynchronously. 
No serendipity, LBD, or creativity system has this. (3) IDENTITY-AS-HYPOTHESIS + EVIDENCE-ANCHORING + LEAN-CERTIFICATION as the verification pipeline that the entire field is MISSING (it is the unsolved 'value' problem). The white space is not generating relationships — it is the principled architecture for HOLDING millions speculatively and VERIFYING the rare valuable ones with byte-level evidence and machine-checkable proof. That triage/verification layer over a many-lens generator is unexplored.

**Hard problems:**
- No ground truth for 'valuable surprise.' Value is subjective, experienced, and context/time-dependent (Kotkov 'Dark Matter'); offline proxies measure afforded not experienced serendipity. You cannot certify profundity with a metric — only triage with one.
- The precision/pareidolia problem at scale. A many-lens cross-product generates a combinatorial explosion of candidate links; multiple-comparisons statistics guarantee a huge expected count of spurious-but-surprising connections. Distinguishing a profound link from apophenia requires FDR control AND external grounding that automatic surprise scores cannot provide.
- Novelty↔validity tradeoff. Maximizing surprise/novelty mechanically increases hallucination and invalidity (TruthHypo; Si et al. found LLM ideas more novel but less feasible). The most surprising link is disproportionately likely to be wrong.
- Surprise vs. relevance/value disentanglement. Bayesian surprise rewards ANY belief-shifting observation, including errors and noise — a maximally surprising edge may be maximally wrong. You must combine surprise with a value/grounding term, and there is no consensus weighting.
- Evaluation methodology itself is unsolved. LBD's two regimes (replication = no statistical power; time-slicing = rewards co-occurrence prediction not insight) are both inadequate; there is no shared benchmark, no formal definition of 'a discovery,' no community standard (Sebastian/Moreau 2023).
- LLM self-evaluation is unreliable. Models over-rate their own novel-but-invalid outputs (Si et al.; AI Scientist critiques), so the cheap automated critic cannot be the final gate; some human or empirical validation is unavoidable, which caps throughput.
- Defining and bounding the 'expectation set' E. Unexpectedness needs a reference model of what is already expected (Adamopoulos-Tuzhilin); for an open many-lens substrate this set is enormous and ill-defined, making 'unexpected' hard to compute consistently.
- The combinatorics of lenses. The number of entity×lens×entity×lens intersections grows explosively; you need a principled sampling/prioritization strategy (SciAgents uses random paths but admits no ranking) or the engine drowns in its own output before any triage.
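Unexpectedness as distance from an expectation set E, in the Adamopoulos-Tuzhilin sense, can be sketched once both the candidate and the expectations are embedded as vectors. Two choices are assumptions: cosine distance as the metric, and distance to the NEAREST expectation as the set distance (a mean over E is a common alternative). How to bound E for an open many-lens substrate is exactly the open problem above; this sketch simply takes whatever finite E it is given. Names are hypothetical.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; 0 for identical directions, 1 for orthogonal vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def unexpectedness(candidate, expectation_set):
    """Distance from the candidate to the nearest already-expected item in E."""
    if not expectation_set:
        raise ValueError("expectation set E must be non-empty")
    return min(cosine_distance(candidate, e) for e in expectation_set)
```

A candidate that coincides with any expected item scores 0 regardless of how far it lies from the rest of E. That is why the size and composition of E change the result so sharply, and why computing 'unexpected' consistently depends on fixing E first.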