MindCap: Teaching Software to Understand Rabbit Holes
Some development sessions are spent writing code. Others are spent reading through conversation logs and documentation, re-learning a system you built last week.
I'm nine sessions deep into building MindCap, and today I spent most of my time doing something that doesn't look like progress: reading. Reading my own architecture documentation (we're on version 7.1 now). Reading session logs. Reading through decisions I made three days ago and trying to remember why they seemed so obvious at the time.
The Rabbit Hole as a Feature
Most productivity tools treat wandering attention as a problem to solve. Block this site. Limit that app. Shame yourself into focus.
I think that's backwards. Your rabbit holes are maps of genuine curiosity. When you start researching Kubernetes and end up reading about submarines, that's not a failure of attention—it's a signal about how your mind works. MindCap doesn't judge that journey. It maps it.
I've defined four dimensions for measuring a rabbit hole: depth (how far you went), spread (how wide you wandered), coherence (were the topics related?), and branching (how many tangents spawned tangents). These combine into four rabbit hole types:
- coherent_deep_dive — Focused, sustained exploration of a single topic
- focused_exploration — Deliberate investigation across related areas
- wandering_journey — Loosely connected topics, following genuine curiosity
- tangent_hopper — Rapid shifts between unrelated interests
None of these are "bad." They're just different modes of thinking. The interesting question isn't "how do I stop doing this?" but "when do I do this, and what does it reveal?"
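To make that a little more concrete, here is a rough sketch of how those four dimensions could feed a classifier. The metric shape, the 0-to-1 scaling, and every threshold below are illustrative assumptions on my part, not MindCap's actual scoring code.

```typescript
// Illustrative sketch only: the RabbitHoleMetrics shape, the 0-to-1 scaling,
// and the thresholds are assumptions, not MindCap's real scoring logic.

type RabbitHoleType =
  | "coherent_deep_dive"
  | "focused_exploration"
  | "wandering_journey"
  | "tangent_hopper";

interface RabbitHoleMetrics {
  depth: number;      // how far down a single thread the session went (0..1)
  spread: number;     // how many distinct topic areas it touched (0..1)
  coherence: number;  // how related those topics were to each other (0..1)
  branching: number;  // how often tangents spawned further tangents (0..1)
}

function classifyRabbitHole(m: RabbitHoleMetrics): RabbitHoleType {
  // Coherence splits the focused modes from the wandering ones; depth vs.
  // spread and the amount of branching pick the type within each half.
  if (m.coherence >= 0.7) {
    return m.depth >= m.spread ? "coherent_deep_dive" : "focused_exploration";
  }
  return m.branching >= 0.6 ? "tangent_hopper" : "wandering_journey";
}

// A session that stayed on one topic and went deep:
classifyRabbitHole({ depth: 0.9, spread: 0.2, coherence: 0.85, branching: 0.1 });
// -> "coherent_deep_dive"
```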
The Keyword Problem
One challenge has consumed multiple sessions: how do you extract meaningful topics from page titles and URLs?
It sounds simple. "React Hooks Tutorial" should become [react, hooks, tutorial]. But the edge cases multiply fast:
- "roadmap.sh" shouldn't filter out "roadmap" as a domain variant—that's a legitimate tech resource
- "machine learning" shouldn't split into two separate keywords
- "learning" by itself isn't meaningful, but "machine learning" is a compound term
- "tutorial" alone is noise, but "react tutorial" tells you something
My solution evolved over multiple sessions into a layered system (a rough sketch follows the list):
- TECH_KEYWORDS — Programming languages and tools that are never filtered
- CONTENT_TYPE_WORDS — Words like "tutorial" and "guide" that are valid when adjacent to tech keywords
- COMPOUND_EXCEPTIONS — "learning" is valid after "machine"
- KNOWN_COMPOUND_TERMS — Direct lookup for established multi-word phrases
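Here is roughly what that layering looks like in code. The set contents, the helper name, and the tokenization are placeholders I'm inventing for illustration; the real lists are much longer, and the real pipeline also handles URLs.

```typescript
// Sketch of the layered filter. The list contents are tiny placeholders and
// the tokenization is deliberately naive; this is the shape of the idea,
// not the real implementation.

const TECH_KEYWORDS = new Set(["react", "hooks", "kubernetes", "python"]);
const CONTENT_TYPE_WORDS = new Set(["tutorial", "guide", "docs"]);
const COMPOUND_EXCEPTIONS: Record<string, string[]> = { learning: ["machine", "deep"] };
const KNOWN_COMPOUND_TERMS = new Set(["machine learning", "deep learning"]);

function extractKeywords(title: string): string[] {
  const tokens = title.toLowerCase().split(/[^a-z0-9.]+/).filter(Boolean);
  const keywords: string[] = [];

  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i];
    const prev = tokens[i - 1];
    const next = tokens[i + 1];
    const pair = prev ? `${prev} ${token}` : "";

    // Known compound terms win outright ("machine learning").
    if (pair && KNOWN_COMPOUND_TERMS.has(pair)) {
      if (keywords[keywords.length - 1] === prev) keywords.pop(); // drop the lone half-word
      keywords.push(pair);
      continue;
    }
    // Tech keywords are never filtered.
    if (TECH_KEYWORDS.has(token)) {
      keywords.push(token);
      continue;
    }
    // Content-type words only count when adjacent to a tech keyword.
    if (
      CONTENT_TYPE_WORDS.has(token) &&
      ((prev !== undefined && TECH_KEYWORDS.has(prev)) ||
        (next !== undefined && TECH_KEYWORDS.has(next)))
    ) {
      keywords.push(token);
      continue;
    }
    // Compound exceptions: "learning" is valid when it follows "machine".
    if (prev !== undefined && COMPOUND_EXCEPTIONS[token]?.includes(prev)) {
      keywords.push(pair);
      continue;
    }
    // Everything else is treated as noise.
  }
  return keywords;
}

extractKeywords("React Hooks Tutorial");          // -> ["react", "hooks", "tutorial"]
extractKeywords("Machine Learning Crash Course"); // -> ["machine learning"]
```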
The first version of keyword extraction took an afternoon. Getting it to handle real-world edge cases has taken weeks. That gap—between "it works" and "it works correctly"—is where most of the time goes.
The Documentation Overhead
Something that doesn't get talked about enough: the cognitive overhead of building alone.
Nine sessions in, I'm maintaining multiple documentation files: architecture.md, data-schemas.md, mechanisms.md, decisions.md, project-state.md, plus session logs tracking every major decision and its rationale. The architecture doc alone is on version 7.1.
Why? Because AI-assisted development generates code faster than a human can keep the reasoning behind it in their head.
When I'm working with Claude Code, we can implement a feature in an hour that would have taken me a day to write alone. But the next morning, when I come back to continue, I've lost access to all the reasoning that made those decisions feel obvious. The conversation history IS the institutional memory—and when context compacts or a session ends, that memory becomes documentation or it becomes nothing.
So some days are spent entirely on reading, not writing. Re-learning my own darned system. It doesn't feel productive, but it's the only way I've found to keep the whole thing coherent in my head.
What's Next
The overhead is real, but so is the progress. Each reading session makes the next building session faster. The time sunk into documentation pays dividends.
Immediate next steps: finish validating the keyword extraction logic, test the extension with live browsing, build out the dashboard UI. Longer term: authentication, account linking for YouTube and Spotify watch history, maybe someday a mobile companion.
MindCap is a research project as much as a product right now. I'm not trying to ship fast. I'm trying to build something that works the way browsing works—which turns out to be messy and nonlinear and full of edge cases.
MindCap is teaching me as much about how I work as it will eventually teach users about how they browse. I think that's fitting.