The idea
A self-hosted bookmark service with one twist: the moment you save a URL, the page is archived as a self-contained snapshot (HTML + assets + screenshot) to your own storage. When the site eventually goes down or hides behind a paywall, your copy still works. The reading view shows the live page when available and falls back to the local snapshot automatically.
Why build this
Link rot is universal — studies put the half-life of a typical web page at 5–7 years, and "cool URIs don't change" remains a fantasy. Pinboard added Wayback fallbacks years ago, but everything else (Pocket, Raindrop, Linkding) treats archiving as either a paid add-on or a manual export. With single-file HTML capture tools (SingleFile, monolith) now mature, a personal archive that actually outlives the open web is a weekend project rather than an infrastructure undertaking.
Stack sketch
- Frontend: SvelteKit — the bookmark form, list, and reading view. Tailwind for styling.
- Backend: A Go service handling the archive worker queue (high concurrency, cheap memory).
- Capture engine: monolith (Rust CLI) for self-contained HTML, plus a headless Chromium screenshot via chromedp.
- Storage: SQLite for metadata, plain filesystem (or S3-compatible storage like MinIO) for snapshots.
- Browser extension: A Manifest V3 extension with a single "Save here" button — no popup form, instant capture.
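As a sketch of the capture step: the stack above names chromedp, but for a dependency-free illustration this shells out to the Chromium CLI instead (the `--screenshot` and `--window-size` flags are standard headless-Chromium options; `monolithCmd` and `screenshotCmd` are names introduced here, and flag spellings are worth verifying against your installed versions).

```go
package main

import (
	"fmt"
	"os/exec"
)

// monolithCmd builds the command that saves url as a single
// self-contained HTML file at outPath (monolith's -o flag).
func monolithCmd(url, outPath string) *exec.Cmd {
	return exec.Command("monolith", url, "-o", outPath)
}

// screenshotCmd builds a headless-Chromium screenshot command at
// the 1280×720 size used for list-view thumbnails.
func screenshotCmd(url, outPath string) *exec.Cmd {
	return exec.Command("chromium",
		"--headless", "--disable-gpu",
		"--screenshot="+outPath,
		"--window-size=1280,720",
		url)
}

func main() {
	// Print the commands instead of running them, so the sketch
	// works without the binaries installed.
	fmt.Println(monolithCmd("https://example.com", "snap.html").Args)
	fmt.Println(screenshotCmd("https://example.com", "snap.png").Args)
}
```

The worker would run both commands per job and record exit codes in SQLite, retrying transient failures.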
Scope for v1
In:

- Save URL via web form, browser extension, or POST to an HTTP endpoint.
- Background worker captures HTML snapshot + 1280×720 screenshot.
- List view with title, snapshot thumbnail, source domain, save date.
- Reading view: prefer live page, fall back to local snapshot on HTTP error or DNS failure.
- Full-text search across captured content (SQLite FTS5).
Out:

- Tagging, folders, sharing — pure save-and-find for v1.
- PDF rendering of snapshots.
- Mobile app — the extension plus a responsive web UI is enough.
Where it could go
The most interesting next branch is change tracking: re-capture saved pages on a schedule, diff the HTML, and surface "this article was edited" or "this page disappeared" alerts. That turns a passive archive into an active research tool — useful for tracking corporate policy pages, government sites, or competitor product pages.
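A first cut at the change tracking described above only needs a stable fingerprint per capture. This sketch normalizes whitespace before hashing so re-renders that only shuffle formatting don't raise false alerts; the normalization is deliberately naive, and `fingerprint` and `changed` are names introduced here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// fingerprint hashes a captured HTML body after collapsing runs of
// whitespace. Real captures would also want to strip timestamps,
// CSRF tokens, and ad markup before hashing.
func fingerprint(html string) string {
	normalized := strings.Join(strings.Fields(html), " ")
	sum := sha256.Sum256([]byte(normalized))
	return hex.EncodeToString(sum[:])
}

// changed compares a fresh capture against the stored fingerprint.
func changed(stored string, freshHTML string) bool {
	return fingerprint(freshHTML) != stored
}

func main() {
	old := fingerprint("<p>Policy: refunds within 30 days.</p>")
	fmt.Println(changed(old, "<p>Policy:  refunds within 30 days.</p>")) // false: whitespace only
	fmt.Println(changed(old, "<p>Policy: refunds within 14 days.</p>"))  // true: real edit
}
```

A scheduled re-capture that finds `changed(...) == true` would then run a proper HTML diff to produce the "this article was edited" alert, while a capture failure maps to "this page disappeared".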
Beyond that: a small federated layer where multiple personal instances share archive coverage of public URLs ("someone else already archived this; want their copy?") would make the network of personal Waybacks more useful than any single one — without requiring a centralized service like archive.org.
Watch out for
Two real risks. First, capturing logged-in pages requires shipping browser cookies through to the headless capture, which is a security minefield — start with public-only capture and add authenticated capture only behind a clear opt-in. Second, storage grows fast: a single rich page can be 5–20 MB with all assets inlined, so a heavy user will hit hundreds of GB within a year. Plan for tiered storage (local SSD for recent, S3/cold for old) before you have a problem, not after.