👗

Vintage Vestige: Teaching a Search Engine to Speak Fashion

Feb 15, 2026 · 9 min read

The Metropolitan Museum of Art has thousands of historical garments in its collection — meticulously cataloged, beautifully photographed, and almost impossible to discover unless you already know what you’re looking for. A Victorian bustle dress doesn’t show up when someone searches “dark academia aesthetic,” even though it’s exactly what they mean.

Vintage Vestige is a semantic search engine that bridges that gap — connecting historical fashion with the way people actually talk about style today. And building it gave me a chance to apply some hard-won lessons from MindCap in a completely different domain.


The Translation Problem

The Met’s collection API provides structured data: title, date, medium, culture. A typical record reads “Dress, ca. 1875, silk, American.” That’s accurate metadata, but it tells you nothing about silhouette, vibe, or how the piece connects to contemporary aesthetics. If you embed that raw catalog text and run semantic search on it, you get results that are technically correct and creatively flat.
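For context, here is roughly what pulling one record looks like against the Met's public collection API. The object ID is arbitrary and the printed values vary per record; this is only a sketch of how thin the raw fields are.

```python
# Fetch a single object record from the Met's public collection API.
# The object ID below is arbitrary; field values differ from record to record.
import requests

MET_OBJECT_URL = "https://collectionapi.metmuseum.org/public/collection/v1/objects/{id}"

record = requests.get(MET_OBJECT_URL.format(id=80000)).json()
print(record["title"], record["objectDate"], record["medium"], record["culture"])
# -> something like: "Dress", "ca. 1875", "silk", "American"
# Accurate catalog metadata, but nothing about silhouette, vibe, or modern aesthetics.
```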

This is a translation problem. The information exists — it’s in the image, in the construction details, in the historical context — but the catalog language doesn’t encode it in a way that maps to how anyone actually searches for fashion inspiration.

If that sounds familiar, it should. MindCap had the inverse version of the same problem: URLs and page titles carried some signal about what a page contained, but not enough to classify it reliably. The answer in both cases turned out to be the same — enrich the input before you try to make sense of it.


The Enrichment Pipeline

The core of Vintage Vestige is an AI enrichment pipeline powered by Claude. Each item — its metadata and its image — gets analyzed through Claude’s vision capabilities and enriched with structured fields: era, decade, garment type, material, colors, fit, pattern, style tags, and a natural-language description that blends historical context with modern vocabulary.
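A minimal sketch of what one enrichment call might look like, assuming the Anthropic Python SDK; the model string, helper name, and exact response handling here are illustrative rather than the project's actual code.

```python
# One enrichment call: send the garment image plus its catalog record to Claude
# and ask for structured fields. Model name and JSON handling are placeholders.
import base64
import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def enrich_garment(image_bytes: bytes, catalog_text: str) -> dict:
    """Return structured style fields for a single museum garment record."""
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model name
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64",
                            "media_type": "image/jpeg",
                            "data": base64.b64encode(image_bytes).decode()}},
                {"type": "text",
                 "text": (f"Catalog record: {catalog_text}\n"
                          "Return JSON with era, decade, garment_type, material, "
                          "colors, fit, pattern, style_tags, and a description that "
                          "blends historical context with modern style vocabulary.")},
            ],
        }],
    )
    return json.loads(response.content[0].text)
```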

A corset from 1780 gets tagged with “Rococo,” “corseted fitted,” and “coquette.” An 1860s wool coat picks up “Victorian,” “structured tailored,” and “dark academia.” The enriched text gets embedded and stored in a vector database, so modern search queries land on historically relevant results.

The baseline search quality score against raw museum descriptions was 0.656. Not broken — but not the kind of relevance that makes a search engine feel like it understands you.


Prompt Engineering as Data Engineering

This is where MindCap’s lessons paid off directly. One of the clearest takeaways from building MindCap’s keyword extraction system was that the real leverage is in what you feed the model, not the model architecture itself. Swapping embedding models or tweaking similarity thresholds produced marginal gains. Improving the input text moved the needle dramatically.

I went into Vintage Vestige expecting prompt iteration to be a significant part of the work, and it was. Early enrichment runs produced repetitive outputs — nearly every item got tagged “dark academia” regardless of era or garment type. The model was pattern-matching on the most common aesthetic label it had seen, not analyzing each piece individually.

The fix wasn’t more data. It was better prompt design: constraining the model to ground every tag in the specific garment’s era, construction, and visible details instead of reaching for the most common aesthetic label.
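The revised instructions took roughly this shape. This is an illustrative sketch of the constraints, not the production prompt.

```python
# Illustrative enrichment prompt: the shape of the constraints, not the exact text used.
ENRICHMENT_PROMPT = """
You are analyzing ONE historical garment from a museum collection.

Rules:
- Base every field on what is visible in the image and stated in the metadata below.
- Describe the silhouette, construction, and materials of THIS specific piece.
- Choose style tags that fit this garment's era and details; do not default to a
  generic aesthetic label unless the garment clearly supports it.
- Return JSON with: era, decade, garment_type, material, colors, fit, pattern,
  style_tags, description.

Metadata: {metadata}
"""
```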

With MindCap, I’d spent a week debugging keyword extraction before realizing the extraction pipeline’s input was too shallow — only reading 500 characters instead of the full article. Same principle here: garbage in, garbage out, and “garbage” includes “technically correct but insufficiently rich.”


Two Ways to Search

Vintage Vestige supports both text and image search, because fashion discovery works both ways.

Text search uses sentence-transformer embeddings (all-MiniLM-L6-v2, 384 dimensions) over the enriched descriptions. You type “flowy romantic Edwardian dress” and the system finds garments whose enriched descriptions are semantically close to that query — even if no museum record ever used the word “romantic.”
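Here is a minimal sketch of the text path, assuming sentence-transformers and the Qdrant Python client; the collection name and payload field are made up for illustration.

```python
# Text search: embed the query with the same model used for the enriched
# descriptions, then look up nearest neighbors in the text collection.
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient

text_model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings
qdrant = QdrantClient(url="http://localhost:6333")

query = "flowy romantic Edwardian dress"
hits = qdrant.search(
    collection_name="garments_text",                  # hypothetical collection name
    query_vector=text_model.encode(query).tolist(),
    limit=10,
)
for hit in hits:
    print(hit.score, hit.payload.get("title"))        # "title" is an assumed payload field
```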

Image search uses CLIP embeddings (512 dimensions) for visual similarity. Upload a reference image — a Pinterest mood board, a screenshot from a period drama, a vintage photo — and find garments with similar silhouettes, textures, or composition. No words needed.
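A matching sketch for the image path. The post doesn't name the exact CLIP checkpoint, so this uses sentence-transformers' clip-ViT-B-32 wrapper, which produces 512-dimensional vectors; the collection name is again hypothetical.

```python
# Image search: embed a reference image with CLIP and query the image collection.
from PIL import Image
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient

clip_model = SentenceTransformer("clip-ViT-B-32")     # 512-dim CLIP embeddings
qdrant = QdrantClient(url="http://localhost:6333")

reference = Image.open("moodboard.jpg")               # any reference image
hits = qdrant.search(
    collection_name="garments_image",                 # hypothetical collection name
    query_vector=clip_model.encode(reference).tolist(),
    limit=10,
)
```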

The two modes serve fundamentally different discovery patterns: searching with language versus searching with intuition. Sometimes you can articulate what you want. Sometimes you just have a vibe and a reference image.


The Stack

| Layer | Technology | Role |
| --- | --- | --- |
| Storage | PostgreSQL | Product metadata and enriched records |
| Vector DB | Qdrant | Semantic and image similarity search |
| Text embeddings | all-MiniLM-L6-v2 (384-dim) | Enriched description embeddings |
| Image embeddings | CLIP (512-dim) | Visual similarity search |
| AI enrichment | Claude API (vision + text) | Multimodal garment analysis |
| Frontend | Next.js | Search interface and results display |

Keeping two separate embedding spaces in Qdrant — one for text, one for images — means the system can handle both search modes without compromising either. Text embeddings are optimized for semantic meaning in language. CLIP embeddings are optimized for visual features. Trying to cram both into one space would degrade both.
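In practice that means two Qdrant collections configured with different vector sizes. A rough sketch of the setup, with hypothetical collection names:

```python
# Two collections, two vector sizes: one sized for the text model, one for CLIP.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

qdrant = QdrantClient(url="http://localhost:6333")

qdrant.create_collection(
    collection_name="garments_text",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
qdrant.create_collection(
    collection_name="garments_image",
    vectors_config=VectorParams(size=512, distance=Distance.COSINE),
)
```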


What Carried Over from MindCap

Building MindCap first shaped how I approached this project in ways I didn’t fully anticipate.

Input quality over model quality. MindCap’s keyword extraction was hitting 49% accuracy — and the fix wasn’t a better model, it was feeding the extractor more than 500 characters of article text. Vintage Vestige’s baseline of 0.656 told the same story: the embedding model was fine, the text it was encoding was too thin. In both cases, the intervention that actually moved the numbers was upstream of the model.

Iterate on prompts like you iterate on code. MindCap’s intent detection went through multiple rounds of refinement — broadening keyword lists, expanding compound term handling, adding behavioral signals. The enrichment prompts for Vintage Vestige followed the same pattern: test, read the outputs, diagnose what’s repetitive or shallow, adjust, rerun. Prompt engineering isn’t a one-shot activity. It’s a development loop.

Validate with real data early. The MindCap validation notebook — 56,000 rows of real browsing history — caught a structural problem that unit tests never would have surfaced. I carried that discipline into Vintage Vestige: test enrichment quality against actual search queries, not just spot-check a handful of records and call it good. The numbers are how you know where you stand.


Different Domain, Same Shape

MindCap and Vintage Vestige look nothing alike on the surface. One is a privacy-focused browser extension that tracks browsing behavior. The other is a fashion search engine that connects museum archives to modern aesthetics. But the core engineering problem is the same: raw data that contains signal people care about, trapped in a format that doesn’t surface it.

MindCap extracts behavioral signal from URLs and page content that the browser treats as disposable. Vintage Vestige extracts style signal from catalog records that the museum treats as archival. Both require a translation layer — something that reads the raw data and produces a richer representation that downstream systems (pattern detection, semantic search) can actually work with.

Recognizing that shape early made the architecture decisions straightforward. I didn’t waste time trying to make raw catalog text work. I knew from MindCap that the enrichment step wasn’t optional — it was the whole point.


What’s Next

The enrichment pipeline and dual-mode search are working. Next steps are expanding the collection beyond the Met — the Victoria and Albert Museum and the Smithsonian both have accessible APIs — and building out the frontend to support more expressive browsing: filtering by era, exploring by color palette, and combining text and image queries.

There’s also a quality feedback loop to build. Right now, enrichment quality is measured by search relevance scores against test queries. I want to close that loop — track which results users actually click, and use that signal to improve both the enrichment prompts and the embedding pipeline over time.

Museum collections are among the internet’s great underused resources. The artifacts are stunning, the data is public, and the only thing missing is a search layer that speaks the same language as the people who’d love to discover them.

Vintage Vestige is a semantic search engine for historical fashion collections. Built with PostgreSQL, Qdrant, CLIP, Claude API, and Next.js.