How Do You Know the Story Is True?

MindCap’s whole pitch is that it shows you the story your curiosity has been telling. “You’ve been on a journey about emergence for three weeks and didn’t notice.” That’s a lovely thing for software to say.

It’s also a terrifying thing for software to say, because how would you ever know if it’s wrong?

I spent today not writing code. I sat with a 60-question ontology document — the formal model of what MindCap thinks the world is made of: Visits, Topics, Patches, Arcs, Themes — and worked through the decisions I’d been avoiding. Somewhere in the middle I hit the question that actually matters, and it wasn’t one of the sixty. It was: what stops this from becoming beautiful, unfalsifiable nonsense?

The Seduction of the Narrative Layer

The ontology has three layers. Layer 1 is what I literally captured: you visited this page, for this long, scrolled this far. That’s data. It’s boring and it’s true. Layer 2 sits in between, and it comes up again further down.

Layer 3 is the story: Arc, Theme, Question, InsightEvent. “An Arc is a thread of inquiry.” “An InsightEvent is a moment where your understanding reorganized.” These are the entities that make the product mean something — and every one of them is an interpretation, not an observation.

Here’s the trap. I can write an algorithm that looks at three weeks of browsing and emits: “You’ve been pursuing a question about decentralized systems.” It will sound insightful. It will sound insightful whether or not it’s true, because narratives are like that — we’re pattern-completing animals, and a plausible story about ourselves is almost impossible to reject. If I show you a story about your own curiosity, you will tend to believe it. That’s not validation. That’s a horoscope.

I wrote down, as the thing I’m most worried about getting wrong: the narrative layer being unfalsifiable. A system that tells you compelling stories you can’t check is not a self-knowledge tool. It’s a flattery engine.

You Can’t Validate a Vibe

The usual instinct is to make the detection smarter — better clustering, a bigger model, more signals. But that’s solving the wrong problem. A more sophisticated story-generator is still a story-generator. Sophistication isn’t truth.

The actual fix is structural, and it’s almost embarrassingly simple: the user has to be able to disagree, and the disagreement has to be recorded.

If MindCap proposes “you’ve been on an Arc about emergence” and you say “no, that was me planning a course,” that “no” is the most valuable signal in the whole system. It’s the only thing in the architecture that can tell me the model was wrong, as opposed to merely unconfirmed. So the design requirement I landed on, for every Layer 3 entity, is one question: can the user contradict this, and do we capture the contradiction?

That reframed a whole cluster of design decisions I’d been treating as UI questions. They’re not UI questions. They’re the falsification mechanism. Here’s the shape of it: the boring true data at the bottom, the gorgeous interpretation up top, and the one arrow that keeps the whole thing honest.

Figure: The system proposes a story upward from true data. The only thing that makes the story checkable is the pink arrow: your “no”, captured as the model’s error signal.

- Who identifies an Arc? Not the system alone (unfalsifiable) and not the user alone (defeats the point). The system proposes; the user accepts, rejects, modifies, or names. The acceptance or rejection is the ground truth I otherwise don’t have.
- In-progress or finished? The system can guess from recency. But only you know whether you resolved a question or abandoned it, and which one it is matters enormously. So the system infers, and you can override. The override is data.
- Competing interpretations? I used to think the system should pick the best story. Now I think it should hold several in parallel, each tagged with how it was computed and how confident it is, and let you point at the one that resonates. A narrative product that insists on a single true story is lying about how stories work.
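
As a rough sketch, those three decisions could be read as a data model. This is not MindCap’s actual schema: every name below (ArcProposal, Interpretation, Verdict, Status, respond) and the method tags are hypothetical, chosen only to show a proposal that holds several interpretations, records the user’s reaction to each, and lets a user override beat the system’s inferred status.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Verdict(Enum):
    ACCEPTED = "accepted"
    REJECTED = "rejected"
    MODIFIED = "modified"
    NAMED = "named"


class Status(Enum):
    IN_PROGRESS = "in_progress"
    RESOLVED = "resolved"
    ABANDONED = "abandoned"


@dataclass
class Interpretation:
    label: str                          # the proposed story
    method: str                         # how it was computed (hypothetical tag)
    confidence: float                   # the system's own estimate, 0.0 to 1.0
    verdict: Optional[Verdict] = None   # None means unconfirmed, not wrong
    user_note: str = ""                 # the user's own words, if any


@dataclass
class ArcProposal:
    interpretations: List[Interpretation] = field(default_factory=list)
    inferred_status: Status = Status.IN_PROGRESS   # the system's guess from recency
    user_status: Optional[Status] = None           # the override, when the user gives one

    def respond(self, index: int, verdict: Verdict, note: str = "") -> None:
        """Record the user's reaction to one interpretation."""
        chosen = self.interpretations[index]
        chosen.verdict = verdict
        chosen.user_note = note

    @property
    def status(self) -> Status:
        """The user's override wins over the system's inference."""
        if self.user_status is not None:
            return self.user_status
        return self.inferred_status


# Two competing stories held in parallel (method tags are made up).
arc = ArcProposal(interpretations=[
    Interpretation("an Arc about emergence", method="topic-clustering", confidence=0.7),
    Interpretation("planning a course", method="session-timing", confidence=0.4),
])
arc.respond(0, Verdict.REJECTED, note="no, that was me planning a course")
arc.respond(1, Verdict.ACCEPTED)
arc.user_status = Status.ABANDONED   # only the user knows this
```

The detail that matters is that verdict starts as None. An unanswered proposal stays unconfirmed, and only an explicit rejection counts as the model being wrong.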

Every one of those “let the user correct it” loops is doing double duty: it’s good UX, and it’s the experiment. User rejection rate is the model’s error rate. I can’t measure whether a story is “true” in the abstract — but I can measure how often you tell the system it got you wrong.
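
The measurement itself is small enough to sketch. The function below is an illustration under assumptions, not the product’s metric: verdicts are plain strings, a proposal the user never answered is None, and only an outright “rejected” counts as an error. Whether a “modified” verdict should count as a partial error is left open.

```python
from typing import Iterable, Optional


def rejection_rate(verdicts: Iterable[Optional[str]]) -> Optional[float]:
    """Share of answered proposals that the user rejected.

    None entries are proposals the user never responded to. They are
    unconfirmed, not wrong, so they stay out of the denominator.
    """
    answered = [v for v in verdicts if v is not None]
    if not answered:
        return None  # no feedback yet: the rate is undefined, not zero
    rejected = sum(1 for v in answered if v == "rejected")
    return rejected / len(answered)


# Hypothetical week of proposals: 6 answered, 2 of them rejected.
week = ["accepted", "rejected", None, "accepted", "rejected", None, "accepted", "accepted"]
rate = rejection_rate(week)
```

Here two of the six answered proposals were rejected, so the rate is one in three. Given how easy a plausible story is to accept, that number is probably best read as a floor on the real error rate rather than the rate itself.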

The Other Discipline: Don’t Model What You Can’t Yet Test

There’s a sibling worry to unfalsifiability, and it showed up everywhere today: over-modeling before the data exists.

I have, at this point, an elaborate vocabulary for things I have never actually measured. Patch (a bounded engagement with a coherent topic cluster). Scent (the pull from one topic toward the next). Chase versus Wander. These are borrowed from information foraging theory and they’re gorgeous. They are also, right now, hypotheses wearing the costume of a schema.

The honest tell is that I haven’t shipped the extension that would collect the behavioral data — the per-visit timing, the navigation graph, the engagement signals — that any of these entities actually require. I’ve been designing the museum before the excavation. So nearly every Layer 2 and Layer 3 answer in the document now carries the same tag: gated on live data we don’t have yet. Not “decided,” not “built” — specified, and waiting to be tested.
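
One way to keep that tag from being decorative is to make it checkable. The sketch below is hypothetical throughout: EntitySpec, Maturity, the signal names, and the promotion rule are illustrative inventions, not anything in the ontology document. The idea is only that an entity lists the signals it depends on and cannot leave “specified” while any of them are uncollected.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set, Tuple


class Maturity(Enum):
    SPECIFIED = "specified"   # written down, gated on live data
    VALIDATED = "validated"   # checked against real behavior


@dataclass(frozen=True)
class EntitySpec:
    name: str
    requires: Tuple[str, ...]             # signals this entity depends on
    maturity: Maturity = Maturity.SPECIFIED


def missing_signals(spec: EntitySpec, collected: Set[str]) -> List[str]:
    """Signals the entity needs that nobody has collected yet."""
    return sorted(set(spec.requires) - collected)


def may_promote(spec: EntitySpec, collected: Set[str]) -> bool:
    """An entity may move past SPECIFIED only when nothing is missing."""
    return not missing_signals(spec, collected)


# Signal names are made up, standing in for per-visit timing,
# the navigation graph, and engagement signals.
patch = EntitySpec("Patch", requires=("visit_timing", "navigation_graph", "engagement"))
collected_so_far: Set[str] = set()   # the extension has not shipped yet
```

With nothing collected, may_promote(patch, collected_so_far) is False and missing_signals lists all three requirements, which is the schema admitting it is still a hypothesis.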

This is the same lesson the keyword-extraction work taught me last week, one floor up. There, the discipline was “decide with numbers, not with the urge to rewrite.” Here it’s “model with humility, and build the disagreement into the model so the numbers can eventually arrive.”

A Small Thing That Wasn’t Small

One more, because it’s a good cautionary tale. For weeks I’d been using the word “federated” to describe topics. Today I tried to answer “how do private topics differ from federated ones” and couldn’t, because I realized I’d been using one word for two completely different ideas:

- Comparing your topic graph to an established external map (is your “emergence” the same as Wikipedia’s “emergence”?): available now, single-user, no privacy issue.
- Pooling your topics with other users’ graphs to find kindred curiosity: a Year-2, multi-user, privacy-laden thing.

These are orthogonal. A topic can be anchored to an external authority and still be completely private. One sloppy word had been hiding a real design fork. The ontology bug wasn’t in the model — it was in my vocabulary. Untangling it into two independent properties took five minutes once I saw it, and the only reason I saw it was that the formalism forced me to.
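
In schema terms, the untangling amounts to replacing one overloaded flag with two independent fields. The sketch below assumes names that are not in the ontology document (Topic, external_anchor, pooled) and uses a made-up anchor string. It is only meant to show that all four combinations are representable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Topic:
    label: str
    external_anchor: Optional[str] = None   # reference into an external map, if any
    pooled: bool = False                    # shared into other users' graphs?

    @property
    def is_anchored(self) -> bool:
        return self.external_anchor is not None

    @property
    def is_private(self) -> bool:
        return not self.pooled


# Anchored to an external authority and still completely private.
emergence = Topic("emergence", external_anchor="Wikipedia: Emergence")

# Unanchored but pooled: the other axis, independent of the first.
hunch = Topic("that thing about ant colonies", pooled=True)
```

Neither property implies anything about the other, which is exactly what the single word “federated” had been hiding.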

That, honestly, is the case for doing this at all. Ontology work is slow and it doesn’t ship. But it’s the discipline that makes you say precisely what you mean — and it caught a flattery engine before I built one.

Where I Landed

| Worry | The structural answer |
| --- | --- |
| Narrative layer is unfalsifiable | Every Layer 3 entity must be contradictable; user disagreement is captured as the error signal |
| System tells compelling-but-wrong stories | System proposes, user refines: no single forced “true” story; parallel interpretations with confidence |
| Over-modeling before data | Layer 2/3 entities are specified and gated, not built; nothing claims to be validated until live data exists |
| “Resolved” vs “abandoned” | Only the user knows; the system infers, the user overrides, the override is data |

None of this makes the stories true. It makes them checkable — which, for a tool whose entire job is helping you see yourself clearly, is the only honesty that counts.

What I’m Reading

Still circling Pirolli’s Information Foraging Theory — it’s the source of the Patch/Scent vocabulary I’m being so careful with. But the book on my mind today was older and from a different shelf: Popper, on falsifiability. A theory that can’t be wrong isn’t a theory. I’d never thought to apply that to a product, but a feature that can’t be wrong isn’t an insight — it’s a horoscope with better typography. The whole job, it turns out, is making sure the user can tell me I’m wrong.