Progress on MindCap today. The feature I've been wrestling with—a Topic Registry—is coming together.
MindCap is a privacy-focused browser behavior analysis tool I'm building to help users understand their online attention patterns. Today's work takes it from "tracking raw data" to "surfacing meaningful insights."
The Problem: Browser History Is Just a List of URLs
Your browser history tells you where you went: a timestamped list of URLs stretching back weeks or months. It's useless for understanding what you were interested in or how that interest evolved.
I wanted to answer questions like:
- What topics am I spending the most time on?
- Is my interest in "machine learning" growing or fading?
- Which topics keep pulling me back week after week?
Browser history can't answer any of these. It's raw data without meaning. So I built a Topic Registry.
The Solution: Topic Registry
The Topic Registry transforms raw browsing sessions into aggregated topic records. Instead of "you visited 47 URLs today," you get "you spent an hour on Kubernetes, spread across 12 sessions, and your interest is trending upward."
Here's what a topic record looks like:
topic: "kubernetes"
total_time_ms: 3600000 # 1 hour cumulative
session_count: 12 # appeared in 12 sessions
engagement_level: "invested" # 30-90 min total = invested
weekly_time_ms: [45min, 30min, 15min, 0, 0, ...] # 12-week trend
related_topics: ["docker", "devops", "aws"]
Each topic tracks cumulative time, session count, engagement level, a 12-week time series for trend detection, and related topics that often appear in the same sessions. It's a complete picture of your relationship with a topic.
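In code, that record might be sketched as a simple dataclass. Field names mirror the example above; this is an illustration, not MindCap's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class TopicRecord:
    # Hypothetical sketch of the record shown above.
    topic: str
    total_time_ms: int = 0
    session_count: int = 0
    engagement_level: str = "quick_peek"
    # 12-week trend, one bucket per week
    weekly_time_ms: list = field(default_factory=lambda: [0] * 12)
    related_topics: list = field(default_factory=list)

record = TopicRecord(topic="kubernetes", total_time_ms=3_600_000, session_count=12)
```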
Engagement Levels: Making Numbers Human
Raw milliseconds are meaningless to humans. "You spent 847,293 ms on Python" tells me nothing. So I classify time into engagement levels:
| Level | Time Invested | Meaning |
|---|---|---|
| quick_peek | < 2 min | Barely touched |
| exploring | 2-10 min | Getting familiar |
| engaged | 10-30 min | Actively learning |
| invested | 30-90 min | Significant commitment |
| deep | 90+ min | Core interest area |
Now I can say "You're engaged with Kubernetes" or "Python is a deep interest for you." That actually means something.
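The table maps directly onto a small classifier. A minimal sketch, with thresholds taken from the table and a function name of my own invention:

```python
def engagement_level(total_time_ms: int) -> str:
    # Thresholds from the engagement table, converted from minutes to ms.
    minutes = total_time_ms / 60_000
    if minutes < 2:
        return "quick_peek"
    if minutes < 10:
        return "exploring"
    if minutes < 30:
        return "engaged"
    if minutes < 90:
        return "invested"
    return "deep"
```

So 847,293 ms (about 14 minutes) classifies as "engaged", and a cumulative hour lands in "invested".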
Trend Detection: Growing or Fading?
The weekly_time_ms array is where the magic happens. By storing 12 weeks of time data per topic, I can detect whether interest is growing or fading.
The algorithm is simple: compare recent weeks to historical average.
def classify_trend(recent: float, older: float) -> str:
    # Trending: recent activity 50%+ higher than historical
    if recent > older * 1.5:
        return "trending"
    # Fading: recent activity 50%+ lower than historical
    if recent < older * 0.5:
        return "fading"
    return "stable"
This surfaces insights like "Your interest in Rust is growing" or "You haven't looked at Go in 3 weeks." Seeing your attention patterns visualized can be motivating—or concerning, depending on what you've been browsing.
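One way to derive `recent` and `older` from the 12-week array, assuming index 0 is the most recent week (as in the example record); the two-week recent window is an illustrative choice, not MindCap's actual split:

```python
def weekly_averages(weekly_time_ms, recent_weeks=2):
    # Split the 12-week array into a recent window and a historical window,
    # then average each. Index 0 is assumed to be the most recent week.
    recent = sum(weekly_time_ms[:recent_weeks]) / recent_weeks
    older = sum(weekly_time_ms[recent_weeks:]) / (len(weekly_time_ms) - recent_weeks)
    return recent, older
```

The two averages then feed the 1.5x / 0.5x thresholds above.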
Implementation Details
A few technical decisions worth noting:
Rolling Averages for Stability
One bad browsing session shouldn't destroy your engagement average. If you spend 3 hours rage-reading about a topic you hate, that shouldn't permanently mark it as a "deep interest."
Solution: weighted rolling averages.
new_avg = (old_avg * 0.8) + (new_score * 0.2)
New data contributes 20%, historical data keeps 80%. This creates smooth trends rather than noisy spikes. A single outlier session won't wreck your averages.
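As a function, with the 0.8/0.2 weights from the formula above:

```python
def update_rolling_avg(old_avg: float, new_score: float, alpha: float = 0.2) -> float:
    # New data contributes alpha (20%); history keeps the rest (80%).
    return old_avg * (1 - alpha) + new_score * alpha
```

A single 60-minute session against a 10-minute average lands at 20, not 60: the outlier nudges the trend instead of defining it.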
Time Distribution Across Keywords
When a session has multiple keywords, how do you attribute time? A 30-minute session about "building REST APIs in Python using FastAPI"—does Python get all 30 minutes? Does FastAPI?
I went with equal distribution: that session gives 10 minutes each to python, fastapi, and async. It's a simplification—you could weight by keyword frequency—but it's good enough for v1.
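A sketch of that equal split (helper name is mine):

```python
def distribute_time(session_ms: int, keywords: list) -> dict:
    # Equal distribution: every keyword gets the same share of session time.
    share = session_ms // len(keywords)
    return {kw: share for kw in keywords}

# A 30-minute session with three keywords: 10 minutes each.
shares = distribute_time(30 * 60_000, ["python", "fastapi", "async"])
```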
Background Processing
Topic updates happen as a background task after sync. Users shouldn't wait for analytics to compute:
background_tasks.add_task(
update_topics_from_session,
user_id,
session_data,
visits_data
)
The extension feels snappy because it is. All the heavy lifting happens after the user has moved on.
What's Next: Pattern Detection
The Topic Registry is the foundation for the next piece I'll build: Pattern Detection. Patterns use topic data to surface behavioral insights that raw metrics can't capture.
Here's what I'm planning:
- recurring_interest — Topics that appear across 3+ sessions over 2+ weeks. These are your genuine interests, not one-off curiosities.
- time_sink — High time but low engagement. You're scrolling without reading. The topic is consuming attention without delivering value.
- rabbit_hole_trigger — Topics that lead to unfocused browsing sessions. "Every time I start reading about productivity, I end up watching YouTube for 2 hours."
- circling_interest — Repeated quick peeks without diving deep. You keep glancing at something but never committing. Maybe it's time to either dive in or let it go.
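None of these rules exist yet, but a recurring_interest check might look something like this; the 3-session / 2-week thresholds come from the bullet above, everything else is hypothetical:

```python
def is_recurring_interest(sessions_by_week: dict) -> bool:
    # Hypothetical rule for the planned recurring_interest pattern:
    # 3+ sessions spread across 2+ distinct weeks.
    # sessions_by_week maps week index -> session count for one topic.
    total_sessions = sum(sessions_by_week.values())
    active_weeks = sum(1 for n in sessions_by_week.values() if n > 0)
    return total_sessions >= 3 and active_weeks >= 2
```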
Pattern detection is where MindCap stops being a fancy analytics dashboard and starts being genuinely useful for behavior change.
Key Takeaways from Today's Work:
- Aggregate, don't just log. Raw data is overwhelming. Aggregated insights are actionable.
- Use rolling averages for stability. Protect against noise in the data.
- Classify into human-readable levels. "Engaged" means more than "847,293 ms".
- Track trends over time. Weekly arrays enable "growing" vs "fading" detection.
- Process in background. Don't make users wait for analytics.
MindCap is a personal project exploring how AI can help users understand their digital attention habits. All data stays local-first with privacy-preserving sync. Your browsing history never leaves your device—only the insights do.
More updates as Pattern Detection takes shape.