May 19, 2026 · 3 min read · desktop, consumer, productivity

Screenshot archive with local full-text search

A desktop background process that watches your screenshots folder, OCRs every new image, and lets you search past screenshots by their text content — no cloud upload required.

The idea

A tray app that runs silently in the background, watches your screenshots folder, and passes each new image through Tesseract OCR when it arrives. Extracted text is indexed in a local SQLite database with FTS5. A minimal search window — triggered by a global hotkey — lets you search the full text of every screenshot you have ever taken and jump straight to the file.

Why build this

Screenshots are a terrible way to capture information. You take them because it's fast, then they pile up in a folder named by timestamp. Six months later you can't find the one with the error message or the UI mockup you wanted to reference. Continuous-capture tools like Windows Recall and Rewind for macOS index screen activity around the clock, but depending on the product they require specific hardware, send data to a server, or record everything rather than just explicit captures. A focused tool that only processes files you intentionally save to a folder, runs locally, and stores nothing remotely is a narrower but more trustworthy solution.

The technical ingredients are all mature and freely available. Tesseract is stable, is packaged by every major Linux distribution, and is installable on macOS with a single Homebrew command. SQLite comes with Python's standard library through the sqlite3 module, and most builds of it include the FTS5 extension. There is no infrastructure to stand up.

Stack sketch

  • Language: Python, distributed as a single pipx-installable package with a screenshots entry point
  • Folder watcher: watchdog library for cross-platform file system events (macOS FSEvents, Linux inotify, Windows ReadDirectoryChangesW)
  • OCR: pytesseract wrapping the system Tesseract binary; optionally route images through a local Ollama llava model for better accuracy on dark-mode UI or small fonts
  • Storage: SQLite with FTS5 via Python's built-in sqlite3 — one row per image with path, creation timestamp, OCR text, and a SHA-256 hash to skip re-processing on re-saves
  • Search UI: tkinter popup window (stdlib, zero extra install); results show filename, a text snippet, and creation date; clicking a result opens the file in the default image viewer
  • Tray icon: pystray with a right-click menu for pause, force reindex, and settings
  • Hotkey: keyboard library for a configurable global shortcut defined in config.toml
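
The storage and search bullets above can be sketched in a few lines. This is a minimal illustration, not a settled design: the table names (images, ocr_text), the function names, and the bracket markers used for highlighting are all assumptions, and it presumes the sqlite3 module on your machine was built with FTS5 enabled.

```python
import hashlib
import sqlite3


def open_index(db_path=":memory:"):
    """Open (or create) the index. Table names here are illustrative."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " id INTEGER PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " created_at TEXT NOT NULL,"
        " sha256 TEXT NOT NULL UNIQUE)"
    )
    # Fails with sqlite3.OperationalError if FTS5 is not compiled in.
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS ocr_text USING fts5(body)")
    return conn


def add_image(conn, path, created_at, image_bytes, text):
    """Index one screenshot; return False if identical bytes were already seen."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    seen = conn.execute(
        "SELECT 1 FROM images WHERE sha256 = ?", (digest,)
    ).fetchone()
    if seen:
        return False
    cur = conn.execute(
        "INSERT INTO images (path, created_at, sha256) VALUES (?, ?, ?)",
        (path, created_at, digest),
    )
    # Share the rowid so the FTS row can be joined back to its metadata.
    conn.execute(
        "INSERT INTO ocr_text (rowid, body) VALUES (?, ?)", (cur.lastrowid, text)
    )
    conn.commit()
    return True


def search(conn, query, limit=20):
    """Return (path, created_at, snippet) tuples, best match first."""
    # The query string uses FTS5 syntax; raw user input may need quoting.
    return conn.execute(
        "SELECT images.path, images.created_at,"
        " snippet(ocr_text, 0, '[', ']', '...', 8)"
        " FROM ocr_text JOIN images ON images.id = ocr_text.rowid"
        " WHERE ocr_text MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```

In the real tool the text argument would come from the OCR step and image_bytes from the file on disk. Hashing the bytes rather than the path means a re-saved or renamed copy of the same capture is skipped.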

Scope for v1

  • Watch one configurable folder; process JPEG and PNG files only
  • OCR on file creation; skip files under 5 KB (likely blank captures or tiny icons)
  • SQLite FTS5 index; snippet highlighting on matched terms
  • reindex CLI subcommand to process an existing folder of historical screenshots
  • Tray icon with a pause toggle and an indicator icon while OCR is in progress
  • Deliberately out: thumbnail previews in search results, multi-folder watching, semantic embedding search, automatic screenshot triggering, Windows installer or macOS .app bundle, any network component
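
The v1 file rules (JPEG and PNG only, nothing under 5 KB) reduce to one small predicate that the watcher and the reindex subcommand could share. A rough sketch, where the function name should_process and the constant names are assumptions, and "5 KB" is read as 5 × 1024 bytes:

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg"}
MIN_BYTES = 5 * 1024  # assumed reading of "5 KB"


def should_process(path):
    """Return True if the file is a JPEG or PNG of at least MIN_BYTES."""
    p = Path(path)
    if p.suffix.lower() not in ALLOWED_SUFFIXES:
        return False
    try:
        return p.stat().st_size >= MIN_BYTES
    except OSError:
        # The file vanished or is unreadable; treat it as not eligible.
        return False
```

Checking the extension first avoids a stat call for files that would be rejected anyway. A file that is still being written when the event fires may report a smaller size, so the real watcher would probably need a short delay or retry before calling this.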

Where it could go

The first expansion worth adding is thumbnail rendering in search results. Seeing a small preview alongside the matching snippet makes it far easier to pick the right screenshot when multiple images share similar text — "error" appears in a lot of screenshots. Storing a downsized JPEG as a BLOB in SQLite keeps everything in one file without needing a separate image server.

The second direction is semantic search layered on top of FTS5. Each OCR result can be embedded with a local nomic-embed-text model (available through Ollama), stored in a sqlite-vec virtual table, and queried by cosine similarity. This handles the case where you remember the meaning of a screenshot but not its exact words: "the chart that showed user drop-off" rather than "retention funnel 43%". FTS5 and vector search can run in parallel, with their results merged by rank.
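
One common way to merge two ranked lists is reciprocal rank fusion, sketched below. The post does not commit to a specific merging method, so treat this as one option among several; the function name merge_by_rank and the constant k=60 (a conventional default for this technique) are assumptions.

```python
def merge_by_rank(fts_paths, vector_paths, k=60):
    """Fuse two best-first lists of file paths with reciprocal rank fusion.

    Each path scores 1 / (k + position) per list it appears in, so a file
    ranked well by both searches beats one ranked well by only one.
    """
    scores = {}
    for ranked in (fts_paths, vector_paths):
        for position, path in enumerate(ranked, start=1):
            scores[path] = scores.get(path, 0.0) + 1.0 / (k + position)
    # Highest score first; ties broken by path for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))
```

Because the fusion only looks at positions, it does not need the FTS5 rank values and cosine similarities to be on comparable scales.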

Watch out for

Tesseract accuracy degrades on screenshots with dark-mode UI, small font sizes, or text on non-white backgrounds. These are common in developer tools, which is exactly the audience most likely to want this. Test against a sample of your own screenshots before shipping Tesseract as the default. If results are poor on the majority of captures, wire in Ollama with llava as a fallback: it tends to cope better with dark UI, and a latency of roughly 1–3 s per image (hardware permitting) is acceptable for a background indexer.
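
To put a number on "poor", one rough approach is to hand-type the text of a few sample screenshots and measure how many of those words the OCR output recovers. The sketch below is a hypothetical scoring helper, not part of Tesseract or pytesseract; the name word_recall and the lowercase alphanumeric tokenisation are assumptions.

```python
import re


def word_recall(expected, ocr_output):
    """Fraction of distinct expected words that appear in the OCR output."""

    def tokenize(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    want = set(tokenize(expected))
    if not want:
        return 1.0  # nothing to find, so nothing was missed
    got = set(tokenize(ocr_output))
    return len(want & got) / len(want)
```

Averaging this score over a few dozen of your own captures, split by light and dark themes, gives a simple basis for deciding whether the fallback is worth wiring in.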