# The Lens Engine: Discovery at the Intersection of Many Apertures

*A research report on agentic polyperspectival decomposition, paraconsistent holding, and the architecture of machine serendipity — grounded in a 10-area literature study and four adversarial critiques.*

---

## 1. The vision, crystallized

Strip the founder's idea down to its load-bearing claim and it is not about volume at all. It is not "a million facts about a thing." The premise is **depth of decomposition through many lenses, in service of relationship discovery**:

> Use AI agents to break any entity, text, or event down through the *full spectrum of human analytical lenses* — philosophical, linguistic, temporal, causal, mereological (part/whole), teleological (purpose), ethical, aesthetic, economic, ecological, semiotic, phenomenological, social, material — each pushed toward the utmost of human understanding. The payoff is **not** the facts inside any one lens. It is the **relationships between entities that emerge at the *intersection* of lenses** — connections no human ever drew, because no human holds all the lenses at once over all the entities.

Two things make this precise enough to engineer. First, the *unit of value is a relationship*, not a fact — specifically a relationship that becomes visible only when entity A's decomposition under lens X is laid alongside entity B's decomposition under lens Y. Second, the *generative mechanism is the cross-product*: many entities × many deep lenses, mined at the seams. The fact-extraction is just scaffolding; the gold is in the joins.

I will call this system **the lens engine**, and the broader research program **polyperspectival decomposition**. Both names matter. "Lens" keeps faith with the founder's own metaphor and with a 90-year tradition (Ranganathan's facets, Pustejovsky's qualia, Nietzsche's "affects") of treating an analytical viewpoint as an aperture that reveals one structure and suppresses others. "Polyperspectival" insists on the plurality and on the fact that the perspectives are *incommensurable by design* — that is where the value lives. A third name worth holding in reserve is **bisociative substrate**, because, as we will see, Koestler already named the core move in 1964 and donto's paraconsistent store is the first place that move can be *persisted* rather than performed once and discarded.

The honest reframing the founder has already half-made is the correct one: this is a *serendipity / latent-structure / discovery engine*, and its success metric is **novel-and-verified-and-valuable relationships per unit of curation cost**, never fact count. Everything below is in service of that single metric — and of being brutally clear about where it has been pursued before, where it breaks, and what is genuinely unclaimed.

---

## 2. You are standing on giants

The founder's instinct that "no one has thought to do this" is, at the level of the *idea*, false — and that is the most useful gift this report can offer, because it means the hard-won lessons of seven distinct research traditions are available for free. The components are old. Let me map the lineage honestly, tradition by tradition, naming what each *already achieved*.

### 2.1 Literature-Based Discovery — the direct 40-year-old ancestor

The single most direct ancestor is **Don R. Swanson's Literature-Based Discovery (LBD)**. His 1986 concept of *undiscovered public knowledge* is almost verbatim the founder's thesis: knowledge logically implied by the union of two existing literatures, unknown only because no one read both. Swanson formalized it as the **ABC model** — if literature reports A→B (Raynaud's disease ↔ elevated blood viscosity) and a *separate, non-co-citing* literature reports B→C (fish oil ↓ blood viscosity), then a testable A→C link (fish oil treats Raynaud's) is latent in the disjoint corpora. He published the fish-oil/Raynaud's hypothesis in 1986; a 1989 trial confirmed it. His 1988 "Migraine and Magnesium: Eleven Neglected Connections" found *convergent multi-path* support — eleven independent indirect links — for magnesium-deficiency→migraine, later clinically supported.

These are existence proofs that the method finds real, non-obvious, testable discoveries from text alone. The field then split discovery into the two modes that map exactly onto the founder's use-cases: **open discovery** (start from A, fan out to find unexpected C's — "serendipity mode") and **closed discovery** (given a suspected A and C, find the B-paths that explain it — "verification mode"). Swanson & Smalheiser's **Arrowsmith**, Hristovski's BITOLA, Petric's RaJoLink, and Cambridge's **LION LBD** (2019, ~27M PubMed abstracts) built ranking machinery — Jaccard, normalized PMI, symmetric conditional probability, chi-squared, log-likelihood. **SemMedDB / Semantic MEDLINE** (Kilicoglu & Rindflesch, NLM) replaced bag-of-words with ~130M *typed* predications (Drug-X TREATS Disease-Y), a direct precursor to donto's quad structure (NLM deprecated it 31 Dec 2024).

What LBD learned, and donto must inherit: **ranking is the entire game.** LION honestly reports that true targets often sit at rank 56–299 (closed) or rank 15–120,000 (open) — the signal is real but buried. The field's only non-gameable evaluation is **time-slicing** (train on pre-year-Y literature, test whether top-ranked links were actually published after Y). And the field is in a *reproducibility crisis*: the 2025 "Make LBD Great Again Through Reproducible Pipelines" movement (SKiM and others) exists precisely because results swing wildly with corpus, preprocessing, and split choices. Crucially, a 2023 *Bioinformatics* critique (Sebastian/Moreau) showed time-slicing's gold standard is "dominated by meaningless co-occurrences (Ebolavirus + Professional Burnout)" — the true-discovery fraction is "unknown and likely low." After four decades, the validated trophy case is still essentially Swanson's two hand-curated findings, *found by a careful human reasoner, not by combinatorial enumeration.* That is a warning written in the founder's own handwriting.

### 2.2 Bisociation and conceptual blending — the cognitive mechanism, already computationalized

The founder's "connections no one thought of, because no one holds all the lenses at once" is, almost word for word, **Arthur Koestler's bisociation** (*The Act of Creation*, 1964): every creative act — the comic Haha, the scientific Aha, the artistic Ah — is perceiving one situation in two self-consistent but *habitually incompatible* matrices (M1, M2) at once. Ordinary thought is *association* (within one plane); creativity is *bisociation* (the collision of two). "The value is the relation that springs from intersecting two frames" is the founder's thesis, stated sixty years ago.

And it was *operationalized*. The EU FP7 **BISON** project (2008–2012; Berthold, Lavrač, Mladenić, Borgelt; Springer LNCS 7250, 2012) built **BisoNets** (heterogeneous evidence graphs) and three formal bisociation types — bridging *concepts*, bridging by structural *similarity* (analogy), bridging by *graphs* (connecting subgraphs) — and tried to make "bisociativeness" a rankable score. Its running tool, **CrossBee** (Juršič, Cestnik, Urbančič, Lavrač, ICCC 2012, crossbee.ijs.si), takes two document sets and ranks candidate bridging b-terms by an *ensemble of heuristics*, then lets a human verify. That is donto's intended workflow — over-generate cross-context links, curate the rare valuable ones — already built, and already stuck on the same wall: too many candidates, the expert is the bottleneck.

Running alongside is **Fauconnier & Turner's Conceptual Blending** (*The Way We Think*, 2002): two input mental spaces project selectively into a *blend* via a generic space, and the blend develops **emergent structure** present in neither input. This is the precise theoretical name for "a relationship that emerges only at the intersection of lenses." It too was made algorithmic — Goguen's category-theoretic Unified Concept Theory, Pereira's **Divago**, and the EU **COINVENT** project (Schorlemmer, Kutz, Confalonieri, Pease) modeling blends as *amalgams* / colimits. Their hard-won lesson: the bottleneck is **selection/optimality among a combinatorial explosion of possible blends**, not generation. And **Margaret Boden's** taxonomy (*The Creative Mind*, 1990) gives the founder a roadmap: cross-lens links are mostly *combinational* creativity; pushing a lens to its limit is *exploratory*; a lens that rewrites another's assumptions is *transformational*.

### 2.3 Analogy and structure-mapping — a lens comparison *is* an analogy

Comparing two entities under a lens is, formally, an analogy. **Dedre Gentner's Structure-Mapping Theory** (1983) says an analogy maps a *system of relations* from base to target, governed by the **systematicity principle**: prefer deep, interconnected higher-order (causal/mathematical) structure over surface attributes. The **Structure-Mapping Engine** (Falkenhainer, Forbus & Gentner, 1989) returns not just correspondences and a score but **candidate inferences** — *novel claims projected from base onto target.* That candidate-inference step is the literal generator of "a relationship no one thought to draw."

**Douglas Hofstadter's** FARG tradition (*Fluid Concepts and Creative Analogies*, 1995; *Surfaces and Essences*, 2013) makes the radical claim that *all* thinking is analogy-making, and that **representation is constructed under pressure, not given** — Copycat's Slipnet/Workspace/codelets/temperature is the cognitive analogue of donto's identity-as-hypothesis. **Holyoak & Thagard's** multiconstraint theory (ACME/ARCS) adds that analogy is governed by *soft, purpose-weighted* constraints, not strict isomorphism.

This tradition scaled. **Hope, Chan, Kittur & Shahaf's** "Accelerating Innovation Through Analogy Mining" (KDD 2017, Best Paper) learned *purpose* and *mechanism* embeddings so analogies could be mined from messy patent corpora; **SOLVENT** (2018) extended it to scientific papers (background/purpose/mechanism/findings). This is the closest existing relative of donto's vision at the document level — but it uses *one coarse facet schema* (purpose × mechanism), not a full spectrum, and holds no contradictions. The 2023–2026 LLM wave reopened it: Webb, Holyoak & Lu (*Nature Human Behaviour* 2023) found "emergent analogical reasoning" in GPT-4; Lewis & Mitchell (2024) showed it *collapses* on counterfactual (permuted-alphabet) variants where humans stay robust — evidence that LLM analogies are often training-data echoes, not structural mapping. **YARN** (2026) splits the difference with the template donto should steal: *LLM-abstraction-then-structural-alignment* beats both pure LLM (0.46 vs 0.41) and pure SME (which scores *below random*, 0.17) on far analogies.

### 2.4 Faceted classification and Formal Concept Analysis — the lens idea, and the emergence engine

**S. R. Ranganathan's Colon Classification** (1933) and its **PMEST** facets — Personality, Matter, Energy, Space, Time — were the first *analytico-synthetic* scheme: decompose a subject into orthogonal facets, then *synthesize* compound subjects no one enumerated in advance. The deep claim — *a small set of orthogonal facets composes to an unbounded space* — is the founder's combinatorial generativity, stated in 1933. PMEST is the historical ancestor of modern faceted search (Marti Hearst's **Flamenco**).

The emergence engine is **Rudolf Wille's Formal Concept Analysis** (1981; Ganter & Wille 1999). From a binary object × attribute table, a Galois connection yields **formal concepts** (maximal extent/intent pairs) ordered into a **concept lattice** — emergent concepts the analyst never named — plus a canonical basis of **attribute implications** (A→B). *This is literally a machine that surfaces latent concepts and rules from a matrix.* Its multi-relational extension, **Relational Concept Analysis** (Rouane-Hacène, Huchard, Napoli, Valtchev, 2013), iterates over several object sorts linked by relations — structurally what cross-entity, cross-lens discovery aims at. Bo Xiong's 2026 **Lattice Representation Hypothesis** even argues LLM embeddings *already encode* FCA-style concept lattices (71–83% F1 recovering concept-attribute relations). The brutal caveat: classical FCA needs *exact binary* incidence, is noise-sensitive, and is worst-case exponential.

### 2.5 Semantic decomposition — meaning is not atomic

The deepest formal warrant that "decompose every entity through fixed lenses" is principled rather than arbitrary comes from semantic-decomposition primitives. **Wierzbicka's Natural Semantic Metalanguage** reduces all meaning to ~65 universal *primes* — so two concepts from different cultures become comparable at the prime level (the discipline of a *shared decompositional alphabet*, without which cross-entity discovery degrades to string matching). **Pustejovsky's Generative Lexicon** (1991/1995) is the most lens-like precedent: its **qualia structure** assigns every noun four Aristotelian *modes of explanation* — **formal** (kind), **constitutive** (parts — mereology), **telic** (purpose — teleology), **agentive** (origin — causation). A *book* is formal=physical-object, constitutive=pages/text, telic=read(x), agentive=write(x). And its **dot-object** (book = PHYSICAL•INFORMATION — one entity legitimately under two co-present types) is a near-exact precedent for donto's "identity is a hypothesis." Schank's Conceptual Dependency, Jackendoff's Conceptual Semantics, Dowty's proto-roles, and the modern, graded **Universal Decompositional Semantics** (White et al., 2016–2020 — many *scalar, confidence-weighted, orthogonal* property layers on one graph) all teach the same throughline: *once two entities are decomposed into a shared primitive vocabulary, latent cross-entity relations (shared telic purpose, matching proto-role profiles) become computable rather than guessed.* The standing objection — **Fodor & Lepore's atomism** ("decomposition smuggles in unverifiable structure") — is the one the founder must answer: *does a machine-proposed decomposition carry falsifiable content, or just rename the problem?*

### 2.6 The science of *value* — structural holes, atypical combinations, discord

This is the tradition the founder most needs and least expects, because it answers the question *which* generated connections are valuable — and it contradicts a naive reading of the vision. **Granovetter's "Strength of Weak Ties"** (1973): novel information flows across *bridges*, not within dense clusters. **Burt's structural holes** (1992; "Structural Holes and Good Ideas," AJS 2004, the Raytheon study): people whose networks span holes have a measurable *vision advantage*; "the creative spark on which serendipity depends is to see bridges where others see holes." **Uzzi, Mukherjee, Stringer & Jones, "Atypical Combinations and Scientific Impact"** (*Science* 2013, 17.9M papers) is the keystone scoring result: the highest-impact papers are **not** the most novel — they sit in the *high-conventionality + high-tail-novelty* quadrant. **A bedrock of convention with one sharp atypical intrusion is 2× more likely to be a hit. Pure novelty underperforms.**

**Foster, Rzhetsky & Evans** (ASR 2015) showed scientists rationally *avoid* risky bridges (the impact premium doesn't offset the chance of being ignored) — *the market gap a machine can exploit, bearing risk humans won't.* **Wang, Veugelers & Stephan** (2017) showed the most novel papers are systematically undervalued short-term and cited mostly in "foreign" fields — the strongest external validation of "no single evaluator holds all the lenses." And **Chen, Ding & Evans** ("New Directions Emerge from Disconnection and Discord," 2021) found new directions emerge where clusters are both *disconnected* and *in discord* — **direct empirical warrant that a contradiction frontier is a map of where novelty pays off**, not noise to be resolved. **Sourati & Evans** (*Nature Human Behaviour* 2023) built "human-aware" random walks that *avoid the crowd* to surface "alien" hypotheses — improving prediction of real discoveries by up to 400%.

### 2.7 Modern agentic discovery — the headline is already shipped

Finally, the systems that already do most of the founder's pitch. **SciAgents** (Ghafarollahi & Buehler, MIT; *Advanced Materials* 2025) samples *random* (not shortest) paths through a 33K-node ontological KG and runs an Ontologist → Scientist → Critic team to surface "hidden interdisciplinary relationships previously considered unrelated" — the single most architecturally aligned prior system. **Google DeepMind's AI Co-Scientist** (*Nature* 2026) runs Generation / Reflection / Ranking / Evolution / **Proximity** (clusters specifically *to prevent collapse into one line of thought*) / Meta-review agents with an Elo tournament of simulated debates, and produced **wet-lab-validated** results: an AML / liver-fibrosis drug-repurposing candidate (~91% of a scarring response blocked at Stanford), and an independently re-derived antimicrobial-resistance mechanism that had taken an Imperial lab years. **Robin** (FutureHouse) ran end-to-end to a validated dry-AMD candidate (ripasudil) in ~2.5 months. **mat2vec** (Tshitoyan et al., *Nature* 2019) proved latent future relationships are *already encoded* in a corpus — word2vec over 3.3M abstracts recommended thermoelectric materials years before discovery, with no symbolic lenses at all.

Here is the lineage in one table:

| Tradition | Key figures / systems | What it already achieved | What it did *not* do |
|---|---|---|---|
| Literature-Based Discovery | Swanson 1986; Arrowsmith; LION; SKiM | Cross-literature A→C discovery, clinically validated; open/closed modes; time-sliced eval | Many lenses; persist rejected candidates; identity-as-hypothesis |
| Bisociation / Blending | Koestler 1964; BISON/CrossBee; Fauconnier-Turner; Boden; COINVENT | Named the core move; built rankable cross-domain bridging; emergent-structure theory | Scale; full lens spectrum; paraconsistent hold |
| Analogy / structure-mapping | Gentner SME; Hofstadter; Hope/Shahaf/SOLVENT; YARN | Candidate-inference generation; facet-embedding mining at scale; LLM-abstract-then-align | >1 facet schema; contradiction-holding; evidence anchoring |
| Faceted classification + FCA | Ranganathan 1933; Wille 1981; RCA 2013 | Orthogonal-facet generativity; *automatic* emergent concepts + implications from a matrix | Noise tolerance; web scale; agentic fill |
| Semantic decomposition | Wierzbicka NSM; Pustejovsky GL/qualia; UDS | Per-entity lens decomposition (4 qualia); graded multi-property layers; dot-objects | Full spectrum beyond linguistics; paraconsistency |
| Science of value | Granovetter; Burt; Uzzi; Foster/Evans; Chen/Ding/Evans | *Computable* value signals: brokerage, z-score atypicality, discord-as-generative | A substrate to apply them to machine-proposed edges |
| Agentic discovery | SciAgents; AI Co-Scientist; Robin; mat2vec | Generate-debate-rank-evolve with **wet-lab-validated** novel discoveries | Persist the firehose; hold contradictions; formal certification |

The gift is plain: **the discovery idea, the bisociation move, the facet idea, the value math, and even the agentic generate-rank-evolve loop are all prior art.** "No one has thought of this" is false at the level of every component. Which is exactly why the genuine novelty must be located precisely.

---

## 3. What is genuinely new here (the white space)

The four adversarial critiques converge — all at confidence ~0.6–0.72, all verdict **partially-holds** — on the same answer. The novelty is **not** any single ingredient and **not** the discovery itself. It is a specific, defensible *combination*:

> **Agentic, many-deep-lens generation × paraconsistent, evidence-anchored, bitemporal HOLDING of the entire speculative (and mutually-contradictory) relationship frontier × a formal/empirical VERIFICATION ladder that promotes the rare valuable survivors — operated at substrate scale.**

Let me be exact about each leg, and honest about what the critiques refuted.

**The defensible core: generate-HOLD-verify, where every prior system can only generate-and-discard.** This is the strongest, most consistently affirmed point across all four critiques. SciAgents generates and discards into prose. LBD/analogy engines rank candidates and throw away the rest. AI Co-Scientist keeps its tournament population only in memory for one run. Cyc *had* the contradiction-tolerant store (microtheories) but no agents and famously collapsed under manual curation. **donto is the first place where the 99.9% of machine-proposed relationships that everyone else discards can persist** — as `hypothesis_only` edges, with typed argument edges (`supports`/`rebuts`/`undercuts`), evidence-anchored to source bytes, on a bitemporal store, re-rankable forever as the corpus and lenses evolve. The science-of-value tradition supplies the killer argument *for* this: Chen/Ding/Evans showed **discord is generative** — so a consistency-enforcing store would *destroy the very signal that predicts new directions.* donto's contradiction frontier is not a tolerated defect; it is a *priority map*. This is genuine white space, and it is the moat.

**The qualified leg: paraconsistency + identity-as-hypothesis is the *right home*, but today the moat is mostly schema, not populated reality.** Critique #3 is the sharpest and most useful. It verified the substrate is real and at scale (39,496,776 current statements, genuinely bitemporal), and that the design beats a "vector DB + reranker" *in principle* for the stated job — an ANN index has no native typed edge, no first-class "this claim attacks that claim," no provenance-on-the-assertion, no way to assert two contradictory things as co-equal legal state. The lineage is legitimate and should be *cited, not claimed*: paraconsistent belief revision / LFI; bipolar/weighted argumentation (Dung; Toulmin/Pollock supports-rebuts-undercuts); **nanopublications** (assertion + provenance + publication-info, with a 2024+ line using provenance to detect contradictory claims); RDF-star / named graphs; Cyc microtheories; and most precisely **Standpoint Logic**, purpose-built so "multiple viewpoints [can] be integrated into the same ontology, even when certain viewpoints may hold contradicting beliefs." *But* the critique grep-counted the live store and found **zero** `supports`/`rebuts`/`undercuts`/`hypothesis`/`contradiction`/`sameAs` predicates among 39.5M statements; the store is ~96% genealogy facts; the user's own memory confirms "only 3 of ~80 Caroline-line kinship triples have evidence_links, all in test contexts." **The differentiator is, today, an affordance, not an exercised reality.** Worse: the critique *read `extraction.py`* and found that the six apertures run *independently and are deduped* — **there is no cross-lens intersection / relationship-discovery step in the code at all.** The discovery engine the vision presupposes does not yet exist; the substrate is being justified by a generator that hasn't been written. This is the most important correction in the entire report.

**What the critiques refuted or seriously qualified — integrate these or the vision dies:**

1. **Novel ≠ valuable (the ideation-execution gap).** Si & Hashimoto (arXiv 2506.20803): LLM ideas rated *more novel* than experts (5.78 vs 4.91), but after 43 experts spent 100+ hours each *executing* them, LLM scores **collapsed** (overall −1.976, novelty −1.049) while human ideas barely moved. **The apparent novelty advantage was an evaluation artifact.** Volume of "connections no one drew" is the *expected signature of spurious links*, not of discovery (Calude & Longo on big-data spurious correlation). The verifier, not the generator, is where all value and all unsolved difficulty live.

2. **The bitter lesson (Sutton).** A hand-authored taxonomy of "philosophical / teleological / semiotic…" lenses is exactly the engineered human structure that scaled end-to-end learning has repeatedly out-classed. mat2vec found latent thermoelectrics with *plain gradient-boosting on embeddings* — no lenses, no symbolic store. A frontier LLM prompted "find a non-obvious connection between X and Y across economic and ecological framings" may extract the same links more cheaply. **If a vanilla frontier model matches the lens engine, the lenses are dead weight.** This is the crux baseline the pilot *must* beat.

3. **The precision/noise deluge is made *worse* by many lenses, not better.** Multiplying lenses multiplies the candidate cross-product; multiple-comparisons / false-discovery-rate math (Benjamini-Hochberg) guarantees the expected count of spurious-but-surprising "significant" hits approaches certainty. The novelty/precision tradeoff is *empirically adverse* (serendipity recommenders: novelty and accuracy directly negatively correlated). LLM self-validation does **not** certify truth ("even unanimous agreement among SOTA LLM critics does not guarantee scientific accuracy"). The Lean-4 overlay certifies *shape/consistency*, not empirical truth — it cannot supply the missing external signal.

So the honest verdict, integrated: **the discovery idea is old and the generation half is increasingly commoditized; the defensible white space is narrow but real — an agentic, machine-generated, paraconsistent claim store with first-class evidence-anchoring and a formal verification overlay, operated at scale on a domain whose evidentiary structure *demands* contradiction-preservation.** The genealogy / native-title domain (descendant-restricted records, irreducibly conflicting witnesses, legal stakes where every source-attestation must be preserved paraconsistently, and where the user's own standing rule is "no authority is ground truth") is the *strongest possible* first home, because there contradiction-holding is not a clever architecture choice — it is the actual requirement.

---

## 4. The architecture of a lens engine

Here is a concrete design that takes every correction above seriously.

### 4.1 A taxonomy of lenses — organized, not listed

The semantic-decomposition tradition (Pustejovsky's qualia, NSM's primes, UDS's scalar layers) and faceted classification (PMEST) teach two non-negotiable design contracts: lenses should be **as orthogonal as possible** (value lives in *synthesis across* facets, not within one), and each lens should emit **structured, typed, graded, falsifiable descriptors**, not prose. Organize the spectrum into five families, anchored where possible to an existing formal lens so each is more than a prompt:

| Family | Lenses | Anchored to | Emits (typed descriptor) |
|---|---|---|---|
| **Constitutive** (what it *is*, is *made of*, is *for*, *came from*) | formal/kind, mereological (parts), telic (purpose), agentive (origin), material | Pustejovsky qualia; BFO continuant/occurrent | qualia 4-tuple; part-of lattice; function frame |
| **Dynamical** (how it changes, causes, persists) | temporal, causal, processual, modal/counterfactual | UDS event aspect (telicity/dynamicity); Jackendoff GO/CAUSE; bitemporal | event graph; causal DAG fragment; valid-time interval |
| **Semiotic-linguistic** (how it means, signs, frames) | linguistic, semiotic, rhetorical/pragmatic, narrative | FrameNet frames + roles; AMR; NSM primes | filled frame; predicate-argument graph; sign relation |
| **Normative-evaluative** (worth, ought, beauty) | ethical, aesthetic, teleological-value, legal | argumentation (Toulmin/Pollock); standpoint logic | value-claim + warrant; defeasible norm edge |
| **Situated** (its place in larger systems) | social, economic, ecological, political, phenomenological | structural holes / network position; D&S descriptions | role-in-network; flow/exchange edge; experiential frame |

Two design rules from the critiques. First — **dynamic, fine-grained lenses beat a fixed generic list.** Solo-Performance-Prompting (Wang et al., NAACL 2024) found dynamically-identified personas crush fixed ones (Codenames 79% vs 38%); a treaty rewards "legal/temporal/diplomatic-game-theory," a poem rewards "prosodic/semiotic/phenomenological." Let an agent *select* the lenses an entity actually rewards, drawn from this organized palette, rather than always running all of them. Second — **subjective lenses (aesthetic, ethical, phenomenological) have no convergent ground truth**; emit them as *distributions of defensible readings* tagged perspectival, never as facts. Per the bias-variance-diversity theory (Wood, Mu, Brown, JMLR 2023): diversity helps *only among individually competent members*, so gate each lens-output by a competence check and budget frontier-model capability — cognitive synergy *emerges only at GPT-4-level* (SPP).

### 4.2 Agents apply lenses → deep typed descriptors

Each selected lens runs as a specialized agent (CAMEL/AutoGen-style infrastructure) producing **scalar, confidence-weighted, orthogonal** descriptors on the entity's node — the UDS pattern, which is *exactly* what a paraconsistent, hypothesis-weighted substrate wants. To defeat **representational collapse** ("Representational Collapse in Multi-Agent LLM Committees": same model under N persona prompts gives cosine 0.888, effective rank 2.17/3 — illusory diversity), *structurally* vary models, temperature, and tools per lens, and **measure** semantic diversity (effective rank / pairwise distance), discounting near-duplicates. Heed YARN: use the LLM to *abstract*, then a structural step to *validate* — never trust the raw LLM mapping (Lewis-Mitchell counterfactual collapse). Heed Fodor: every descriptor must be **evidence-anchored to a source byte** so the question "real content or relabeling?" has an answer.

### 4.3 Generating cross-lens × cross-entity relationship hypotheses — *the heart*

This is the step that does not yet exist in the code and must be built. It is *not* per-lens fact extraction then dedup. It is: given the descriptor-decorated nodes, generate edges *between* nodes that become visible only at lens seams. Concrete generators, each with prior-art backing:

- **Shared-qualia bridging** (Pustejovsky + Swanson ABC): two entities sharing a deep *telic* or *agentive* structure across different surface domains → propose a relation, with the shared quale as the bridge B. (Silk and an energy-intensive material share a constitutive quale; a treaty and an immune response share a *telic* "boundary-defense" structure.)
- **Structure-mapped candidate inference** (Gentner SME): when two entities align structurally under a lens, *project the unmatched base structure onto the target* as a new `hypothesis_only` edge — SME's candidate-inference mechanism, the literal "relationship no one drew."
- **FCA/RCA lattice mining**: project the cross-lens descriptors into formal contexts (object × attribute) and per-relation RCA contexts; *compute* the concept lattice + canonical implication basis. The emergent implications **are** discoveries — and they come with a deterministic derivation that pairs perfectly with Lean certification (FCA implications are exactly the shape Lean can verify). Use *fuzzy/relaxed* FCA because LLM attributes are graded and noisy.
- **Bounded random-path sampling** (SciAgents; Sourati-Evans): do *not* enumerate all N²×L² pairs. Sample random (not shortest) paths between *distant* nodes — distant beats nearest for creativity — bounded by an exploration budget, *avoiding the crowd* (down-weighting densely co-mentioned, cognitively-reachable pairs).
- **Graph-of-Thoughts recombination** (Besta et al.): model lens-outputs as graph vertices and explicitly generate *edges between them*; do not concatenate per-lens fact lists.

### 4.4 Scoring — novelty × plausibility × value, and the war on combinatorics

Never score a relationship with one number (Kotkov's decomposition). Carry four separate axes, computed cheaply *before* any expensive agent or Lean touch:

1. **Novelty** — is this triple absent from the substrate? (Trivial given bitemporality.)
2. **Unexpectedness** — Murakami/Ge's *primitive prediction model*: a cross-lens link is surprising **iff a cheap single-lens baseline would NOT have produced it** (Runexp = R \ PM(u)). This *is* the ablation that proves the lenses earn their keep.
3. **Value** — the science-of-value composite, all computable on donto's quad graph: **Burt brokerage / structural-hole span** (how many non-redundant clusters does it bridge), **Uzzi z-score atypicality** against a degree-preserving randomized null (favoring *high-conventional-scaffold + high-tail-novelty*, not max novelty), and **discord** (does it bridge clusters carrying `supports`/`rebuts` argument-edge disagreement — Chen/Ding/Evans).
4. **Bayesian surprise** (Itti-Baldi) — D_KL between the substrate's belief *before* and *after* admitting the edge: "how much does this change what donto believes," provably distinct from mere rarity, and composing naturally with paraconsistency (a contradiction-inducing edge is maximally surprising). Balance against value, since surprise alone rewards errors.

Critically: **plausibility is a gate, never the ranking** (LLMs over-produce plausible-invalid hypotheses). And **multiple-comparisons control is not optional** — make the false-discovery-rate budget an explicit, tunable parameter; an engine proposing millions of links *manufactures apophenia by construction.* Recycle rejected links as negatives (the HITL-KG pattern).

### 4.5 The verification ladder

A funnel, narrowing, with cost rising at each rung — so cheap filters must do the heavy lifting:

1. **Evidence anchoring** — every edge pinned to a source byte (mandatory; the answer to Fodor and to "serendipity-shaped hallucination").
2. **Argumentation** — typed `supports`/`rebuts`/`undercuts` edges accumulate; bipolar argumentation frameworks adjudicate which claim wins *under a given lens* at query time (identity-as-hypothesis).
3. **Formal certification** — Lean-4 certifies the *shape/rule* of the rare valuable edge (FCA implications, structural-mapping isomorphisms). Be honest about its limit: it certifies form, **not empirical truth.**
4. **Empirical / external grounding** — the only unambiguous metric (Robin). Time-sliced retro-validation; expert adjudication; in the genealogy domain, a *bridging document* (one marriage cert anchors three generations).
5. **Human review** — only the thin top slice that survives 1–4.

### 4.6 Why donto is the natural home

Paraconsistency lets the engine *hold the firehose without collapsing* — converting "decide value at generation time" (everyone else's binding constraint) into "accumulate speculative links and let evidence arrive asynchronously." Identity-as-hypothesis lets the *same* substrate discover relationships under *different* identity resolutions — a genuinely unexplored degree of freedom, and exactly where many false LBD links come from (spurious co-occurrence of ambiguously-grounded entities). Bitemporality lets a speculative edge be held as legal state *now* and retro-confirmed or rebutted by later-ingested evidence *without rewriting history* — BioDisco's temporal evaluation, native. And the contradiction frontier *is* the Chen/Ding/Evans priority map. No vector DB + reranker can do this without being rebuilt into a graph/claim store.

---

## 5. The hard problems (brutally honest)

1. **Combinatorial explosion.** N entities × L lenses × pairwise relations is astronomically large, and almost all candidates are junk. Generation is trivially cheap (Weitzman/Arthur/Kauffman: the supply of combinations is effectively unbounded). The *entire moat is the triage filter.* SciAgents and Co-Scientist only tame the *single-lens* version (bounded random sampling, Elo pruning); many-lens intersection needs a fundamentally better candidate-generation theory, or it drowns before any triage.

2. **Novel ≠ true ≠ valuable.** The deepest correction. The ideation-execution gap (Si-Hashimoto) and TruthHypo's explicit novelty↔validity tradeoff (more creative ⇒ more hallucinated) show *the most surprising link is disproportionately likely to be wrong.* "No one thought to draw this" is the expected signature of a spurious link. Pareidolia by construction.

3. **Evaluation has no ground truth.** By definition, "a connection nobody made" has no label set. Time-slicing only validates connections that *eventually got published* (and its gold standard is mostly meaningless co-occurrence — Sebastian/Moreau). Value is subjective, experienced, and *delayed* — Wang/Veugelers/Stephan show the highest-value bridges look worthless short-term, so any greedy score discards them. There is no formal definition of "a discovery," no shared benchmark.

4. **The bitter lesson.** The single most dangerous critique. If a frontier LLM, prompted directly, surfaces the same valuable cross-domain links without the lens taxonomy and without the symbolic store, the elaborate pipeline is refuted. mat2vec is the existence proof that embeddings already capture latent structure cheaply. **The lenses must demonstrably *add* discoveries the single-pass baseline misses (ablation: drop lenses, value drops) — or they are scaffolding a model already internalizes.**

5. **The profound-sounding-but-vacuous failure mode.** LLMs fluently generate confident, beautiful, false cross-domain connections. Without per-lens competence gating, mandatory evidence-anchoring, FDR control, and an external verifier, the engine is a *serendipity-shaped hallucination generator*, and the paraconsistent store becomes "a landfill of unfalsifiable speculation."

6. **Cyc is the cautionary ghost.** A hand-built, microtheory-based, contradiction-tolerant, context-relative ontology that consumed 2000+ PhD-years and collapsed under curation load. The lens engine's bet is that *agents* — not knowledge engineers — fill the lenses cheaply, and that verification is *largely automated*. If either fails, donto becomes Cyc with extra steps.

The thread tying all six: **generation is solved; discrimination is not.** The founder's vision succeeds or fails entirely on the verifier.

---

## 6. How to build it on donto, concretely

A phased plan that builds the *missing* discovery step first, beats the bitter-lesson baseline before scaling, and proves itself with retrospective time-slicing.

**Phase 0 — Make the moat real (close the schema-vs-reality gap).** Before any new generation, *populate and exercise* the differentiating machinery on a bounded corpus: write `supports`/`rebuts`/`undercuts` argument edges, `hypothesis_only` flags, and `evidence_links` for a few thousand relationship triples so they are *load-bearing, query-optimized state*, not test-context curiosities. Critique #3's grep returning zero must become a non-trivial fraction. *This is the precondition for everything.*

**Phase 1 — Build the cross-lens intersection step.** Extend the 6 apertures from independent per-entity extraction to a genuine **relationship-hypothesis generator** (§4.3): start with shared-qualia bridging + SME candidate-inference + bounded random-path sampling over one bounded corpus. Emit edges as `hypothesis_only`, evidence-anchored. This is the engine the vision is actually about and which the code does not yet contain.

**Phase 2 — The scorer + FDR gate.** Implement the four-axis score (§4.4) as cheap pre-filters: novelty (substrate lookup), unexpectedness (single-lens primitive-prediction subtraction — *the ablation*), Uzzi z-score + Burt brokerage + discord over the quad graph, Bayesian surprise. Make the false-discovery budget an explicit knob. Recycle rejections as negatives.

**Phase 3 — The verification ladder.** Wire survivors into argumentation adjudication → Lean-4 shape certification (FCA implications are Lean-shaped) → human/empirical review of the thin top slice.

**The falsifiable pilot.** Pick a domain with *cheap ground truth* — the strongest is the genealogy / native-title corpus (bridging documents are checkable; contradiction-preservation is a hard requirement; "no authority is ground truth" is already the user's posture). Then run **retrospective time-slicing**, the field's only non-gameable metric:

1. Freeze the corpus at year/version Y. Run the N-lens engine. Record top-ranked `hypothesis_only` relationship edges.
2. Ingest post-Y evidence (later-discovered records, certs, profiles). Measure how many top-ranked machine-proposed edges are *confirmed* by post-Y evidence (rediscovery of known-but-later connections).
3. **Three baselines it must beat, or the vision is qualified:** (a) a frontier LLM prompted directly for cross-domain links over the same corpus — *the bitter-lesson crux*; (b) KG-embedding link prediction (ULTRA/AnyBURL) + LBD/analogy mining on the same graph; (c) the lens engine with lenses ablated (drop lenses → value must drop, proving value lives at *intersections*).

Report **precision/value** (edges surviving independent verification), not rated novelty. Pre-register the FDR budget and splits (the LBD reproducibility lesson). If the lens engine beats (a)–(c) on verified-valuable edges at a controlled false-discovery rate, the white space is real and executed. If it ties (a), the lenses are dead weight and the honest pivot is to the *substrate-and-curation* product — narrower, but still genuine white space.

---

## 7. The visionary horizon

Suppose Phase 0–3 succeed and the pilot beats all three baselines. What then?

The near horizon is modest and concrete: **a standing discovery engine for one domain** — a genealogy substrate that, as each new record arrives, *re-ranks its entire dormant frontier of speculative kinship and identity hypotheses*, surfacing the bridging documents a human researcher would have found only by holding every source at once. Because contradictions are held, not collapsed, it does the one thing the field's own ethics demand: it preserves every source-attestation paraconsistently and lets the *lens* (genealogical vs legal vs DNA) decide which reading wins at query time. That alone is a real product in a domain where the user already lives.

The middle horizon generalizes the operating model. Every prior discovery system is *one-shot*: generate, rank, discard. donto's bet turns one-shot generation into a **compounding asset.** Each new lens, each new entity, each new piece of evidence *re-illuminates the entire held frontier* — the adjacent-possible explosion (Kauffman's TAP equation) working *for* you rather than against you, because the speculation persists. A relationship proposed in 2026 and held as `hypothesis_only` can be confirmed by evidence ingested in 2031, retro-dated correctly by bitemporality, certified by a Lean proof whose shape was unprovable until a 2029 lens was added. **This is the first architecture where machine serendipity *accumulates*.**

The far horizon — earned, not promised — is the one the founder is reaching for: a **discovery engine for latent structure across all human knowledge.** The science-of-value tradition proved (Wang/Veugelers/Stephan) there is a *measurable surplus of value* in cross-lens bridges that human, discipline-bounded evaluation leaves on the table — precisely because no human holds all the lenses. Foster/Evans proved scientists *rationally avoid* the risky bridges. A machine can bear the risk humans won't, hold the firehose humans can't, and re-curate forever where humans curate once. If — *if* — the verifier is good enough to keep the false-discovery rate bounded, the engine becomes a standing map of where the next interdisciplinary directions are most likely to pay off: a contradiction frontier that is, literally, Burt's structural holes drawn across all of recorded thought.

But earn the vision by saying the quiet part: the bottleneck is not imagination, and it never was. It is the verifier. The founder's deepest, most counterintuitive asset is *not* the many lenses — those are increasingly cheap, possibly free in a frontier model. It is the **paraconsistent substrate that lets verification be lazy, asynchronous, and compounding** instead of eager and one-shot. That is the genuinely new thing. Build the verifier, prove it with time-slicing, and the lenses will have somewhere worthy to point.

---

## 8. Reading map

Grouped, annotated, with URLs. The starred (★) works are the ones to read *first*.

**The direct ancestor — Literature-Based Discovery**
- ★ Swanson, "Fish Oil, Raynaud's Syndrome, and Undiscovered Public Knowledge" (1986) — the ABC model and the founding existence proof. https://muse.jhu.edu/article/403510/summary
- Swanson, "Migraine and Magnesium: Eleven Neglected Connections" (1988) — convergent multi-path discovery. http://abel.lis.illinois.edu/tutorial/swanson_pbm_1988.pdf
- Smalheiser, "The Place of LBD in Contemporary Scientific Practice" (2008) — the field's honest self-assessment. http://abel.lis.illinois.edu/tutorial/smalheiser_LBD_preprint_2008.pdf
- LION LBD (2019) — ranking honesty: true links buried at rank 15–120,000. https://pmc.ncbi.nlm.nih.gov/articles/PMC6499247/
- ★ Sebastian/Moreau, the 2023 critique of LBD evaluation — *why time-slicing is "too noisy."* https://pmc.ncbi.nlm.nih.gov/articles/PMC9945845/

**The cognitive mechanism — bisociation, blending, creativity**
- ★ Koestler, *The Act of Creation* (1964) — bisociation; the founder's thesis, sixty years early. https://en.wikipedia.org/wiki/The_Act_of_Creation
- Berthold (ed.), *Bisociative Knowledge Discovery* (2012) — BisoNets; read before building. https://link.springer.com/book/10.1007/978-3-642-31830-6
- CrossBee (Juršič et al., 2012) — the running generate-then-verify prototype. https://computationalcreativity.net/iccc2012/wp-content/uploads/2012/05/226-Jursic.pdf
- Fauconnier & Turner, *The Way We Think* (2002) — emergent structure at the blend. https://markturner.org/blendaphor.html
- Boden — combinational / exploratory / transformational creativity. https://en.wikipedia.org/wiki/Computational_creativity

**Analogy — structure-mapping and its modern fusion**
- ★ Gentner, Structure-Mapping Theory (1983) — what a lens-comparison *is*. https://groups.psych.northwestern.edu/gentner/papers/Gentner83.pdf
- Falkenhainer, Forbus & Gentner, the SME (1989) — *candidate inferences* = the literal discovery step. https://groups.psych.northwestern.edu/gentner/papers/FalkenhainerForbusGentner89.pdf
- Hofstadter & Sander, *Surfaces and Essences* (2013) — analogy as all thought; representation under pressure. https://www.basicbooks.com/titles/douglas-hofstadter/surfaces-and-essences/9780465018475/
- Hope/Shahaf, "Analogy Mining" (KDD 2017) — facet embeddings at scale. https://arxiv.org/abs/1706.05585
- SOLVENT (2018) — closest document-level relative; one facet schema. https://arxiv.org/abs/1812.06974
- Lewis & Mitchell (2024) — LLM analogy collapses on counterfactuals (proposal ≠ verification). https://arxiv.org/abs/2402.08955

**Faceted classification and Formal Concept Analysis — the emergence engine**
- ★ Ranganathan, Colon Classification / PMEST (1933) — orthogonal-facet generativity. https://en.wikipedia.org/wiki/Colon_classification
- ★ Wille / Ganter, Formal Concept Analysis (1981/1999) — a machine for emergent concepts + implications. https://en.wikipedia.org/wiki/Formal_concept_analysis
- Rouane-Hacène et al., Relational Concept Analysis (2013) — multi-kind cross-entity discovery. https://link.springer.com/article/10.1007/s10472-012-9329-3
- Xiong, "The Lattice Representation Hypothesis of LLMs" (2026) — embeddings already encode lattices. https://arxiv.org/html/2603.01227v1

**Semantic decomposition — the lens-per-entity precedent**
- ★ Pustejovsky, *The Generative Lexicon* / qualia (1991) — four lenses + the dot-object (identity-as-hypothesis precedent). https://aclanthology.org/J91-4003.pdf
- Wierzbicka & Goddard, NSM primes — the shared decompositional alphabet. https://en.wikipedia.org/wiki/Natural_semantic_metalanguage
- White et al., Universal Decompositional Semantics (UDS) — many graded, weighted lenses on one graph; *imitate this.* https://arxiv.org/abs/1909.13851
- Fodor & Lepore, "Lexical Decomposition: For and Against" — the objection you must answer. https://www.cs.ox.ac.uk/files/240/lexdecomp.pdf

**The science of value — which connections are worth it**
- ★ Burt, "Structural Holes and Good Ideas" (AJS 2004) — value is at the bridge, and it's measurable. http://www.ronaldsburt.com/research/files/SHGI.pdf
- ★ Uzzi et al., "Atypical Combinations and Scientific Impact" (*Science* 2013) — the keystone scoring recipe; pure novelty underperforms. https://www.science.org/doi/10.1126/science.1240474
- Foster, Rzhetsky & Evans (ASR 2015) — the market gap a machine can exploit. https://arxiv.org/abs/1302.6906
- Chen, Ding & Evans, "Disconnection and Discord" (2021) — warrant for the contradiction frontier. https://arxiv.org/pdf/2103.03398
- Sourati & Evans, "human-aware AI" (*Nature Human Behaviour* 2023) — *avoid the crowd*; +400% prediction. https://pubmed.ncbi.nlm.nih.gov/37443269/
- Granovetter, "The Strength of Weak Ties" (1973) — the origin. https://www.cs.cmu.edu/~jure/pub/papers/granovetter73ties.pdf

**Modern agentic discovery — the shipped state of the art**
- ★ SciAgents (Ghafarollahi & Buehler, 2024/25) — random-path KG sampling + Ontologist/Scientist/Critic; the closest analog. https://arxiv.org/abs/2409.05556
- ★ Google DeepMind AI Co-Scientist (*Nature* 2026) — generate/reflect/rank/evolve + Elo + **wet-lab validation**; the Proximity agent prevents collapse. https://www.nature.com/articles/s41586-026-10644-y
- Robin (FutureHouse) — end-to-end to a validated candidate; downstream validation as the only honest gate. https://www.futurehouse.org/research-announcements/demonstrating-end-to-end-scientific-discovery-with-robin-a-multi-agent-system
- Tshitoyan et al., mat2vec (*Nature* 2019) — latent discoveries are *already in the corpus*, no lenses. https://www.nature.com/articles/s41586-019-1335-8

**KG link prediction — the cheap first-pass generator**
- ULTRA (Galkin et al., ICLR 2024) — zero-shot link prediction on any KG. https://arxiv.org/abs/2310.04562
- AnyBURL (Meilicke et al.) — explainable path-rules that match embeddings; Lean-friendly. https://link.springer.com/article/10.1007/s00778-023-00800-5

**Multi-perspective reasoning — and its failure modes**
- ★ Solo-Performance-Prompting (Wang et al., NAACL 2024) — dynamic fine-grained personas beat fixed; synergy needs frontier capability. https://aclanthology.org/2024.naacl-long.15/
- Wood, Mu, Brown, bias-variance-diversity (JMLR 2023) — diversity helps *only among competent members*. https://jmlr.org/papers/volume24/23-0041/23-0041.pdf
- "Representational Collapse in Multi-Agent LLM Committees" — persona diversity is often illusory (cosine 0.888). https://arxiv.org/html/2604.03809
- "Talk Isn't Always Cheap" — debate frequently *degrades* accuracy via conformity. https://arxiv.org/html/2509.05396v2

**Serendipity and surprise — how to score the rare gold**
- ★ Itti & Baldi, Bayesian Surprise (2006) — D_KL(posterior‖prior); the native scorer. http://ilab.usc.edu/publications/doc/Itti_Baldi06nips.pdf
- Kotkov, Wang & Veijalainen, serendipity survey (2016) — never collapse to one number. https://www.sciencedirect.com/science/article/abs/pii/S0950705116302763
- Murakami / Ge, primitive-prediction-model unexpectedness — the ablation that proves lenses earn their keep. https://link.springer.com/chapter/10.1007/978-3-540-78197-4_5
- Benjamini & Hochberg, False Discovery Rate (1995) — the inescapable governor at scale. https://en.wikipedia.org/wiki/False_discovery_rate

**The contradiction-holding substrate — cite, don't reinvent**
- Standpoint Logic — multi-perspective KR with contradicting viewpoints in one ontology. https://www.researchgate.net/publication/357533078_Standpoint_Logic_Multi-Perspective_Knowledge_Representation
- Cyc & microtheories — the deepest precedent *and* the cautionary tale. https://en.wikipedia.org/wiki/Cyc
- DOLCE + Descriptions & Situations — "the same facts re-read under many descriptions." https://arxiv.org/pdf/2308.01597
- Nanopublications — assertion + provenance; provenance-driven contradiction detection. https://link.springer.com/article/10.1007/s00799-025-00431-x
- POPPER — agentic sequential falsification; "certify the survivors" is not fantasy. https://openreview.net/forum?id=iTevNo8PzG

**The essential warnings — read these *with* the vision, not after**
- ★ Si & Hashimoto, the Ideation-Execution Gap (2025) — novelty collapses under execution; *the* correction. https://arxiv.org/abs/2506.20803
- ★ Sutton, "The Bitter Lesson" — the baseline the lens engine must beat. https://en.wikipedia.org/wiki/Bitter_lesson
- Calude & Longo, on spurious correlation in big data — "no one drew this" is the signature of *noise*. https://www.di.ens.fr/users/longo/files/BigData-Calude-LongoAug21.pdf
- TruthHypo — the explicit novelty↔validity tradeoff (more creative ⇒ more hallucinated). https://arxiv.org/pdf/2505.14599

---

*Final word, in the founder's interest and not in flattery: the lens idea is old, the bisociation move is old, the discovery dream is forty years old, and the agentic generate-rank-evolve loop has already produced wet-lab-validated discoveries this year. None of that is the moat — and believing it is the moat is the fastest way to build Cyc again. The genuinely unclaimed ground is narrow, real, and yours for the taking: an agentic, machine-generated, paraconsistent, evidence-anchored claim store that holds the entire speculative frontier without collapsing and re-curates it forever — first proven on a domain (genealogy / native-title) whose evidentiary structure actually demands contradiction-preservation. Build the verifier, build the missing cross-lens intersection step, beat the bitter-lesson baseline with retrospective time-slicing, and you will have made the first place where machine serendipity compounds. That is worth doing.*