Self-hosted on-call rotation scheduler

A lightweight web app for small engineering teams to manage on-call rotations, send automated shift reminders, and log incidents without a PagerDuty subscription.

The idea

A small web app where you define your team, set a rotation interval, and get an auto-generated schedule for the next 90 days. The incoming on-call engineer gets a reminder 24 hours before their shift starts. When an incident fires (from a monitoring webhook, a manual trigger in the UI, or an email to a dedicated address), the app logs it and alerts the current on-call person via SMS or push notification. Everything runs in a single Docker container on your own VPS, with no per-seat pricing and no on-call vendor subscription required.

Why build this

PagerDuty and OpsGenie start at $15–20 per user per month, which is real money for a three-person startup or a small internal tools team inside a larger company. The full feature set (ML-based noise reduction, ServiceNow integrations, enterprise SSO) is overkill for teams that need exactly two things: "who's on call this week" and "how do we wake them up when something breaks." Cheaper alternatives like Squadcast and Signl4 still require vendor-managed alerting infrastructure. A self-hosted option running on a $5 VPS closes that gap with no per-seat fees; the only recurring costs are the VPS itself and whatever SMS usage the team opts into.

The technical surface is small: a schedule generator, a job that fires reminders, a webhook endpoint that triggers notifications, and a basic incident log. Nothing exotic.

Stack sketch

Backend: Python + FastAPI; APScheduler for the 24-hour shift reminder job (and, once escalation lands after v1, its timeout jobs)
Storage: SQLite via SQLAlchemy — one file, one volume mount, trivial to back up
Frontend: HTMX + Jinja2 templates; a calendar grid rendered server-side, no JS build step
Notifications: Twilio for SMS; Ntfy (self-hosted or free cloud tier) for push; smtplib for email — team members configure which channels they want
Inbound webhooks: a single POST endpoint that accepts arbitrary JSON; ships with named parsers for Grafana, Uptime Kuma, and Prometheus Alertmanager payloads
Auth: single shared team token in a signed cookie; optional per-user bcrypt passwords for teams that need accountability in the incident log
Deploy: single Docker image, SQLite in a named volume, Traefik label pattern for TLS
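
One way the "named parsers" idea could look, as a minimal sketch rather than a settled design: a registry maps a source name to a function that flattens the incoming JSON into a small incident record, with a generic fallback so that any POST still produces an incident. The registry, the `normalize` helper, and the normalized field names (`source`, `title`, `status`, `detail`) are assumptions made for illustration. The Alertmanager parser reads only the top-level `status` and the per-alert `labels` and `annotations` objects; the Grafana and Uptime Kuma parsers are left out here and should be written against those tools' documented payloads.

```python
from typing import Any, Callable, Dict

# Hypothetical normalized shape used only in this sketch.
Incident = Dict[str, str]
Parser = Callable[[Dict[str, Any]], Incident]

PARSERS: Dict[str, Parser] = {}


def parser(name: str) -> Callable[[Parser], Parser]:
    """Decorator that registers a payload parser under a source name."""
    def register(fn: Parser) -> Parser:
        PARSERS[name] = fn
        return fn
    return register


@parser("alertmanager")
def parse_alertmanager(payload: Dict[str, Any]) -> Incident:
    # Alertmanager posts a top-level "status" plus a list of "alerts",
    # each carrying "labels" and "annotations" objects.
    alerts = payload.get("alerts") or [{}]
    first = alerts[0]
    labels = first.get("labels", {})
    annotations = first.get("annotations", {})
    return {
        "source": "alertmanager",
        "title": labels.get("alertname", "unnamed alert"),
        "status": payload.get("status", "firing"),
        "detail": annotations.get("summary", ""),
    }


def parse_generic(payload: Dict[str, Any]) -> Incident:
    # Fallback for arbitrary JSON; the "title" key is an assumption.
    return {
        "source": "generic",
        "title": str(payload.get("title", "incoming webhook")),
        "status": "firing",
        "detail": "",
    }


def normalize(source: str, payload: Dict[str, Any]) -> Incident:
    """Pick the parser for `source`, or fall back to the generic one."""
    return PARSERS.get(source, parse_generic)(payload)
```

In the app, the FastAPI route would presumably take the source name from the URL path or a query parameter and store the raw payload alongside the normalized record.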

Scope for v1

Add team members: name, phone number, email address, Ntfy topic — each member enables only the channels they want
Define one rotation: weekly or biweekly interval, start date, ordered list of members; app generates the schedule forward from that
Calendar grid view showing who is on call each day for the next 90 days, with a clear "on call now" banner on the home page
Inbound webhook endpoint: any POST triggers an incident, logs the raw payload, and notifies the active on-call person
Manual "test alert" button so teams can verify delivery before going live
Incident log: timestamp, trigger source (webhook name or manual), optional resolution note added by the on-call engineer
24-hour shift reminder sent automatically before each new shift starts

Deliberately out of scope for v1: escalation policies, multiple concurrent rotations, recurring overrides, SLA tracking, mobile app, Slack integration.
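
The rotation rule in the scope list (interval, start date, ordered members) is simple enough to sketch in plain Python. The names below (`Shift`, `generate_shifts`, `on_call`, `reminder_time`) are hypothetical, the 90-day horizon is counted from the rotation start date, and the 09:00 handover time is an assumption the pitch does not specify. Datetimes are naive here; a real deployment would need to decide on a time zone.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Shift:
    member: str
    start: date  # first day of the shift
    end: date    # first day of the next shift (exclusive)


def generate_shifts(members: List[str], start: date, interval_days: int,
                    horizon_days: int = 90) -> List[Shift]:
    """Round-robin through `members`, one shift per interval, up to the horizon."""
    if not members or interval_days <= 0:
        raise ValueError("need at least one member and a positive interval")
    shifts: List[Shift] = []
    horizon_end = start + timedelta(days=horizon_days)
    shift_start, i = start, 0
    while shift_start < horizon_end:
        shift_end = shift_start + timedelta(days=interval_days)
        shifts.append(Shift(members[i % len(members)], shift_start, shift_end))
        shift_start, i = shift_end, i + 1
    return shifts


def on_call(shifts: List[Shift], day: date) -> Optional[str]:
    """Return who covers `day`, or None if it falls outside the schedule."""
    for s in shifts:
        if s.start <= day < s.end:
            return s.member
    return None


def reminder_time(shift: Shift, handover: time = time(9, 0)) -> datetime:
    """The reminder fires 24 hours before the shift's handover moment."""
    return datetime.combine(shift.start, handover) - timedelta(hours=24)
```

A weekly rotation uses `interval_days=7` and a biweekly one `interval_days=14`. The calendar grid and the "on call now" banner can both be driven by `on_call`, and each `reminder_time` result is a natural candidate for a one-off scheduled job.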

Where it could go

The most-requested follow-on for any on-call tool is escalation: if the on-call person does not acknowledge within N minutes, notify a secondary. Adding this requires a state machine on the incident — triggered → acknowledged → resolved — and a timed job that fires the escalation on timeout. It is maybe two weekends of work on top of v1, and it closes the gap with PagerDuty's core feature for small teams.
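
A minimal sketch of that state machine, assuming the three states named above and a single escalation step. The class and method names are hypothetical, and allowing an incident to be resolved without first being acknowledged is an assumption. The timed job would call something like `needs_escalation` on each open incident and notify the secondary when it returns True.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class State(Enum):
    TRIGGERED = "triggered"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


# Allowed transitions; resolving straight from TRIGGERED is an assumption.
ALLOWED = {
    State.TRIGGERED: {State.ACKNOWLEDGED, State.RESOLVED},
    State.ACKNOWLEDGED: {State.RESOLVED},
    State.RESOLVED: set(),
}


@dataclass
class Incident:
    triggered_at: datetime
    state: State = State.TRIGGERED
    escalated: bool = False  # set by the caller after notifying the secondary

    def transition(self, new_state: State) -> None:
        """Move to `new_state`, rejecting transitions the table does not allow."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot go from {self.state.value} to {new_state.value}"
            )
        self.state = new_state

    def needs_escalation(self, now: datetime, timeout: timedelta) -> bool:
        """True once an unacknowledged, not-yet-escalated incident outlives the timeout."""
        return (
            self.state is State.TRIGGERED
            and not self.escalated
            and now - self.triggered_at >= timeout
        )
```

Keeping the transition table as data makes it cheap to add a state later (a "snoozed" state, say) without touching the transition logic.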

A second direction is richer monitoring integrations. Extending the bundled parsers to more webhook shapes (Checkly, for example, alongside Grafana, Alertmanager, and Uptime Kuma) means teams drop in the URL and it works without hand-editing a JSON mapping. Pair that with a "silence" endpoint that mutes a specific alert rule during a maintenance window and the tool starts to feel like a real replacement for the lower tiers of commercial products.
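
The check behind a "silence" endpoint could be as small as the sketch below. The `Silence` record and its fields are hypothetical, and matching on an exact rule name is an assumption; a real version might match on labels instead.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Silence:
    rule: str          # alert rule name to mute (hypothetical field)
    starts: datetime
    ends: datetime     # exclusive end of the maintenance window


def is_silenced(rule: str, now: datetime, silences: Iterable[Silence]) -> bool:
    """True if any silence for this rule covers the current moment."""
    return any(s.rule == rule and s.starts <= now < s.ends for s in silences)
```

The webhook handler would still log a silenced incident but skip the notification, so the incident log stays complete during maintenance.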

Watch out for

SMS delivery through Twilio is unreliable in some regions, and carriers may filter short messages that look like automated alerts. Test with real devices on real carriers before treating SMS as the primary channel. Build Ntfy push as the default and present SMS as a fallback, so teams in areas with patchy SMS delivery are not silently unreachable during an incident.
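
The push-first, SMS-as-fallback ordering can be sketched as an ordered list of channels tried in turn. The `notify` helper and the `Sender` signature are assumptions for illustration; in the app each sender would wrap the actual Ntfy or Twilio call, which is deliberately not shown here. Note that a sender returning True only means the provider accepted the message, not that it reached the phone.

```python
from typing import Callable, List, Tuple

# A sender takes a message and returns True if the provider accepted it.
Sender = Callable[[str], bool]


def notify(message: str, channels: List[Tuple[str, Sender]]) -> str:
    """Try channels in order and return the name of the first that succeeds."""
    errors: List[str] = []
    for name, send in channels:
        try:
            if send(message):
                return name
            errors.append(f"{name}: returned failure")
        except Exception as exc:  # a failing channel must not block the next one
            errors.append(f"{name}: {exc}")
    # Surface the failure loudly rather than leaving the on-call person unreachable.
    raise RuntimeError("all channels failed: " + "; ".join(errors))
```

Raising when every channel fails gives the caller a hook to record the failure in the incident log, which is the "silently uncontactable" case worth catching.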