The idea
A self-hosted bookmark service with one twist: the moment you save a URL, the page is archived as a self-contained snapshot (HTML + assets + screenshot) to your own storage. When the site eventually goes down or hides behind a paywall, your copy still works. The reading view shows the live page when available and falls back to the local snapshot automatically.
Why build this
Link rot is universal — studies put the half-life of a typical web page at 5–7 years, and "cool URIs don't change" remains a fantasy. Pinboard added Wayback fallbacks years ago, but everything else (Pocket, Raindrop, Linkding) treats archiving as either a paid add-on or a manual export. With single-file HTML capture tools (SingleFile, monolith) now mature, a personal archive that actually outlives the open web is a weekend project rather than an infrastructure undertaking.
Stack sketch
- Frontend: SvelteKit — the bookmark form, list, and reading view. Tailwind for styling.
- Backend: A Go service handling the archive worker queue (high concurrency, cheap memory).
- Capture engine: monolith (Rust CLI) for self-contained HTML, plus a headless Chromium screenshot via chromedp.
- Storage: SQLite for metadata, plain filesystem (or S3-compatible storage like MinIO) for snapshots.
- Browser extension: A Manifest V3 extension with a single "Save here" button — no popup form, instant capture.
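As a sketch of the capture step: the stack above names chromedp, but for a dependency-free illustration this shells out to the Chromium CLI instead (the `--screenshot` and `--window-size` flags are standard headless-Chromium options; `monolithCmd` and `screenshotCmd` are names introduced here, and flag spellings are worth verifying against your installed versions).

```go
package main

import (
	"fmt"
	"os/exec"
)

// monolithCmd builds the command that saves url as a single
// self-contained HTML file at outPath (monolith's -o flag).
func monolithCmd(url, outPath string) *exec.Cmd {
	return exec.Command("monolith", url, "-o", outPath)
}

// screenshotCmd builds a headless-Chromium screenshot command at
// the 1280×720 size used for list-view thumbnails.
func screenshotCmd(url, outPath string) *exec.Cmd {
	return exec.Command("chromium",
		"--headless", "--disable-gpu",
		"--screenshot="+outPath,
		"--window-size=1280,720",
		url)
}

func main() {
	// Print the commands instead of running them, so the sketch
	// works without the binaries installed.
	fmt.Println(monolithCmd("https://example.com", "snap.html").Args)
	fmt.Println(screenshotCmd("https://example.com", "snap.png").Args)
}
```

The worker would run both commands per job and record exit codes in SQLite, retrying transient failures.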
Scope for v1
In:

- Save URL via web form, browser extension, or POST to an HTTP endpoint.
- Background worker captures HTML snapshot + 1280×720 screenshot.
- List view with title, snapshot thumbnail, source domain, save date.
- Reading view: prefer live page, fall back to local snapshot on HTTP error or DNS failure.
- Full-text search across captured content (SQLite FTS5).
Out:

- Tagging, folders, sharing — pure save-and-find for v1.
- PDF rendering of snapshots.
- Mobile app — the extension plus a responsive web UI is enough.
Where it could go
The most interesting next branch is change tracking: re-capture saved pages on a schedule, diff the HTML, and surface "this article was edited" or "this page disappeared" alerts. That turns a passive archive into an active research tool — useful for tracking corporate policy pages, government sites, or competitor product pages.
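A first cut at the change tracking described above only needs a stable fingerprint per capture. This sketch normalizes whitespace before hashing so re-renders that only shuffle formatting don't raise false alerts; the normalization is deliberately naive, and `fingerprint` and `changed` are names introduced here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// fingerprint hashes a captured HTML body after collapsing runs of
// whitespace. Real captures would also want to strip timestamps,
// CSRF tokens, and ad markup before hashing.
func fingerprint(html string) string {
	normalized := strings.Join(strings.Fields(html), " ")
	sum := sha256.Sum256([]byte(normalized))
	return hex.EncodeToString(sum[:])
}

// changed compares a fresh capture against the stored fingerprint.
func changed(stored string, freshHTML string) bool {
	return fingerprint(freshHTML) != stored
}

func main() {
	old := fingerprint("<p>Policy: refunds within 30 days.</p>")
	fmt.Println(changed(old, "<p>Policy:  refunds within 30 days.</p>")) // false: whitespace only
	fmt.Println(changed(old, "<p>Policy: refunds within 14 days.</p>"))  // true: real edit
}
```

A scheduled re-capture that finds `changed(...) == true` would then run a proper HTML diff to produce the "this article was edited" alert, while a capture failure maps to "this page disappeared".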
Beyond that: a small federated layer where multiple personal instances share archive coverage of public URLs ("someone else already archived this; want their copy?") would make the network of personal Waybacks more useful than any single one — without requiring a centralized service like archive.org.
Watch out for
Two real risks. First, capturing logged-in pages requires shipping browser cookies through to the headless capture, which is a security minefield — start with public-only capture and add authenticated capture only behind a clear opt-in. Second, storage grows fast: a single rich page can be 5–20 MB with all assets inlined, so a heavy user will hit hundreds of GB within a year. Plan for tiered storage (local SSD for recent, S3/cold for old) before you have a problem, not after.