The idea
You dictate a rambling voice memo on your commute — half brain-dump, half to-do list — and the app turns it into structured tasks by the time you sit down at your desk. It uses OpenAI Whisper for transcription, then a small extraction prompt to pull out action items, optional deadlines, and project labels. The result lands in a local task store you can export to JSON or push to a task manager of your choice.
Why build this
Voice is faster than typing for capturing ideas in motion, but the output — an unstructured audio blob — is nearly useless for task management. Converting audio to text is a solved problem; converting that text to structured tasks is still a manual step for most people. Whisper is now fast and accurate enough to run on-device via whisper.cpp, and extraction with a small language model is cheap and reliable. The people who would use this are already recording voice memos — they just need the extraction layer.
Stack sketch
- React Native (Expo) for cross-platform iOS/Android
- react-native-whisper wrapping whisper.cpp for on-device transcription (no audio leaves the phone)
- Claude Haiku or a local Ollama model for task extraction via a structured-output prompt returning [{ task, deadline?, project? }]
- SQLite via expo-sqlite for local task storage
- Optional: n8n webhook to forward accepted tasks to Todoist, Linear, or Notion
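The extraction step in the stack above can be sketched roughly as follows. Only the [{ task, deadline?, project? }] shape comes from the sketch; the prompt wording, the ISO 8601 deadline format, and the function names buildExtractionPrompt and parseExtractedTasks are assumptions made for illustration. The parser is deliberately defensive, since a small model may wrap its JSON in extra text or return malformed entries.

```typescript
// Shape named in the stack sketch: [{ task, deadline?, project? }].
interface ExtractedTask {
  task: string;
  deadline?: string; // assumption: an ISO 8601 date string such as "2025-01-10"
  project?: string;
}

// Hypothetical prompt wording; tune it for whichever model is used.
function buildExtractionPrompt(transcript: string, today: string): string {
  return [
    "Extract action items from the voice memo transcript below.",
    "Respond with only a JSON array. Each element has:",
    '  "task": string (required)',
    '  "deadline": ISO 8601 date string (optional)',
    '  "project": string (optional)',
    "If there are no action items, respond with [].",
    `Today's date is ${today}.`,
    "",
    "Transcript:",
    transcript,
  ].join("\n");
}

// Parse the model's reply, dropping anything that does not match the shape.
function parseExtractedTasks(raw: string): ExtractedTask[] {
  // Tolerate surrounding prose or code fences by slicing out the outermost array.
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) return [];

  let data: unknown;
  try {
    data = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return [];
  }
  if (!Array.isArray(data)) return [];

  const tasks: ExtractedTask[] = [];
  for (const item of data) {
    if (typeof item !== "object" || item === null) continue;
    const rec = item as Record<string, unknown>;
    const task = rec["task"];
    if (typeof task !== "string" || task.trim() === "") continue;

    const out: ExtractedTask = { task: task.trim() };
    const deadline = rec["deadline"];
    const project = rec["project"];
    if (typeof deadline === "string" && deadline !== "") out.deadline = deadline;
    if (typeof project === "string" && project !== "") out.project = project;
    tasks.push(out);
  }
  return tasks;
}
```

Passing today's date into the prompt is one way to let the model resolve phrases like "by Friday" into a concrete deadline; whether a small model does that reliably would need testing.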
Scope for v1
- Record a voice memo in-app or import an existing audio file from the share sheet
- Transcribe on-device with the Whisper small model (works offline)
- Extract tasks with a single-shot prompt; return a JSON array of task objects
- Review screen: accept, edit, or discard each extracted task individually
- Export accepted tasks to a plain JSON file
- Out of scope: cloud sync, user accounts, native integrations with third-party task managers, background processing
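The review and export steps in this scope could be modeled with a small amount of state. The sketch below is one possible shape, not a prescribed design: the names ReviewItem, startReview, updateItem, and exportAccepted are hypothetical, and a real app would hold this state in a React component or store.

```typescript
type ReviewStatus = "pending" | "accepted" | "discarded";

interface TaskFields {
  task: string;
  deadline?: string;
  project?: string;
}

interface ReviewItem extends TaskFields {
  status: ReviewStatus;
}

// Every extracted task starts as pending until the user decides.
function startReview(tasks: TaskFields[]): ReviewItem[] {
  return tasks.map((t) => ({ ...t, status: "pending" as ReviewStatus }));
}

// Accept, edit, or discard one item; returns a new array (no mutation).
function updateItem(
  items: ReviewItem[],
  index: number,
  patch: Partial<ReviewItem>
): ReviewItem[] {
  return items.map((item, i) => (i === index ? { ...item, ...patch } : item));
}

// Serialize only accepted tasks, without the internal review status.
function exportAccepted(items: ReviewItem[]): string {
  const accepted = items
    .filter((item) => item.status === "accepted")
    .map((item) => {
      const out: TaskFields = { task: item.task };
      if (item.deadline !== undefined) out.deadline = item.deadline;
      if (item.project !== undefined) out.project = item.project;
      return out;
    });
  return JSON.stringify(accepted, null, 2);
}
```

An edit is just a patch that changes the task text (or deadline, or project) together with the status, so "edit then accept" is a single updateItem call.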
Where it could go
The most direct expansion is native task-manager integration — push accepted tasks straight into Todoist, Apple Reminders, or Linear without a copy-paste step. A share-sheet extension would let users pipe recordings from the stock Voice Memos app directly into the extraction flow without switching to a separate recorder.
A longer-term direction is passive meeting capture: run transcription on a recorded call, extract all action items with speaker attribution, and deliver each person their own list. Whisper does not label speakers on its own, so that requires a diarization pass (for example with pyannote.audio) aligned against Whisper's word-level timestamps, plus a more complex review UX, but the extraction core carries over unchanged.
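To make the speaker-attribution idea concrete, here is a minimal sketch of one common alignment approach: take word-level timestamps from the transcription, take speaker segments from a separate diarization pass, and assign each word to the segment that contains its midpoint. The Word and SpeakerSegment types and the attributeWords function are assumptions for illustration; they do not mirror the output format of any particular library.

```typescript
// Hypothetical input shapes; times are in seconds.
interface Word {
  text: string;
  start: number;
  end: number;
}

interface SpeakerSegment {
  speaker: string;
  start: number;
  end: number;
}

interface SpeakerTurn {
  speaker: string;
  text: string;
}

// Assign each word to the segment covering its midpoint, then merge
// consecutive words from the same speaker into one turn.
function attributeWords(words: Word[], segments: SpeakerSegment[]): SpeakerTurn[] {
  const turns: SpeakerTurn[] = [];
  for (const word of words) {
    const mid = (word.start + word.end) / 2;
    const segment = segments.find((s) => mid >= s.start && mid < s.end);
    const speaker = segment ? segment.speaker : "unknown";

    const last = turns[turns.length - 1];
    if (last && last.speaker === speaker) {
      last.text += " " + word.text;
    } else {
      turns.push({ speaker, text: word.text });
    }
  }
  return turns;
}
```

The resulting turns could then be fed to the same extraction prompt, one speaker at a time or with the speaker label included in the transcript.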
Watch out for
Whisper's small model makes consistent errors on proper nouns, internal project names, and fast speech, so users need a low-friction editing step before any task is committed; don't skip the accept/edit screen. Recording length also matters: on-device transcription of a five-minute memo is fine, but longer recordings can drain the battery noticeably and take thirty seconds or more to transcribe, so cap v1 at five minutes and make the limit visible before recording starts.
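The five-minute cap is simple to enforce in plain logic, independent of whichever recording library is used. The following sketch assumes the recorder reports elapsed time in milliseconds; the constant and helper names are hypothetical.

```typescript
// v1 cap from the text above: five minutes per recording.
const MAX_RECORDING_MS = 5 * 60 * 1000;

// Time left before the cap, never negative.
function remainingMs(elapsedMs: number): number {
  return Math.max(0, MAX_RECORDING_MS - Math.max(0, elapsedMs));
}

// True once the recorder should be stopped automatically.
function shouldStopRecording(elapsedMs: number): boolean {
  return elapsedMs >= MAX_RECORDING_MS;
}

// Imported files can be checked against the same limit before transcription.
function isWithinLimit(durationMs: number): boolean {
  return durationMs > 0 && durationMs <= MAX_RECORDING_MS;
}

// Format milliseconds as m:ss for display, rounding partial seconds up.
function formatClock(ms: number): string {
  const totalSeconds = Math.ceil(Math.max(0, ms) / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${seconds < 10 ? "0" : ""}${seconds}`;
}
```

Showing formatClock(MAX_RECORDING_MS) on the record button before recording starts, and formatClock(remainingMs(elapsed)) as a countdown while recording, keeps the limit visible throughout.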