Turn any workflow into a verifiable SOP.
Record your screen and narration with the desktop agent. Citestep transcribes it with Whisper, drafts a step-by-step procedure, and verifies every claim against the source recording before you approve and export to Markdown.
Capture and transcription run on the desktop agent. Processing, storage, and review live in your hosted Citestep workspace.
Four deterministic stages, all auditable.
Every artifact — frames, transcripts, draft, citations — is stored and linked, so any step traces back to the moment it came from.
- 01
Capture
An Electron shell records your screen via mss and your microphone via sounddevice. Chunks stream to a local Python sidecar as you work.
Electron · Python sidecar
- 02
Transcribe
Whisper transcribes each audio chunk on the desktop agent. Timestamped transcripts upload to your workspace alongside the matching frames.
Whisper · on the agent
- 03
Draft
A multimodal LLM reads the frames and narration, then produces a structured SOP draft with citations pointing back to the source timestamps.
BYOK LLM · cited drafts
- 04
Verify & export
A semantic citation verifier checks every claim against the transcript. Failed citations trigger a draft retry. Approve and export to clean Markdown.
Verifier loop · Markdown out
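For a sense of how the verify-and-retry stage could fit together, here is a minimal sketch. It is not Citestep's actual code: the `Claim` shape, the `draft_until_verified` function, the feedback wording, and the attempt limit are all hypothetical, and the two callables stand in for the drafting and verifying LLM calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Claim:
    """One statement in a draft SOP plus the transcript span it cites (hypothetical shape)."""
    text: str
    start_s: float  # start of the cited span, in seconds
    end_s: float    # end of the cited span, in seconds


# Stand-ins for the LLM calls; both signatures are assumptions.
Drafter = Callable[[list[str]], list[Claim]]  # feedback from the last attempt -> new draft
Verifier = Callable[[Claim], bool]            # True if the cited span supports the claim


def draft_until_verified(
    draft: Drafter, verify: Verifier, max_attempts: int = 3
) -> tuple[list[Claim], list[Claim]]:
    """Redraft until every citation passes or the attempts run out.

    Returns (claims, failed) from the last attempt; an empty `failed`
    list means every claim was supported by its citation.
    """
    feedback: list[str] = []
    claims: list[Claim] = []
    failed: list[Claim] = []
    for _ in range(max_attempts):
        claims = draft(feedback)
        failed = [c for c in claims if not verify(c)]
        if not failed:
            break
        # Tell the next drafting attempt which claims were not supported.
        feedback = [
            f"Unsupported at {c.start_s:.0f}-{c.end_s:.0f}s: {c.text}" for c in failed
        ]
    return claims, failed
```

Capping the attempts keeps a stubborn draft from looping forever, and any citations that still fail can be surfaced to the reviewer instead of being silently dropped.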
Built for honest, reviewable documentation.
No black-box agents. Every step in a generated SOP traces back to a timestamp in the recording it came from.
Side-by-side review console
Read the generated SOP next to the recorded transcript. Click a citation to jump to the moment it was said. Approve when it's right; reject and regenerate when it isn't.
Semantic citation verifier
A second LLM pass checks each draft claim against the transcript chunk it cites. Failed citations trigger an automatic retry.
Lightweight desktop capture
A small desktop agent records screen, microphone, and clicks on the operator's machine, then streams them to your hosted workspace — nothing heavy to install or maintain.
Mic selector & live progress
Pick your input device before recording. Watch chunks transcribe and the draft assemble in real time.
Plain Markdown export
Approved SOPs export to clean Markdown — check them into git, paste them into Notion, or drop them in your wiki.
One hosted workspace
Recordings, transcripts, SOPs, and their versions live together in one hosted workspace — searchable and organized, instead of scattered across screen recorders and wikis.
More than a one-shot generator.
Beyond capture-and-draft, here's what already ships in the current build — verified against the codebase, not a roadmap.
Multi-signal capture
Records screen frames, microphone narration, and OS click events in one session — so a silent click or an unspoken step is still on the record.
Provider-agnostic model routing
Each pipeline stage picks its model from whichever API keys you supply — Anthropic, OpenAI, or Google — with a free local-dev fallback. Bring your own keys; tokens bill to your provider.
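As a rough illustration of key-based routing, the sketch below picks a provider per stage from whichever keys are present. Everything in it is an assumption made for the example: the stage names, the preference order, the environment variable names, and the `local-dev` fallback label are hypothetical, not Citestep's real configuration.

```python
from typing import Mapping

# Hypothetical per-stage preference order, most preferred first.
STAGE_PREFERENCES: dict[str, list[str]] = {
    "caption": ["google", "openai", "anthropic"],
    "draft": ["anthropic", "openai", "google"],
    "verify": ["openai", "anthropic", "google"],
}

# Assumed environment variable name for each provider's API key.
KEY_VARS: dict[str, str] = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}

FALLBACK = "local-dev"  # used when no key is supplied


def pick_provider(stage: str, env: Mapping[str, str]) -> str:
    """Return the first preferred provider for `stage` that has a non-empty key."""
    for provider in STAGE_PREFERENCES[stage]:
        if env.get(KEY_VARS[provider]):
            return provider
    return FALLBACK
```

Calling `pick_provider("draft", os.environ)` would resolve against the real environment; passing a plain dict instead makes the routing easy to test.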
Searchable SOP Library
Approved procedures land in a library with fuzzy full-text search, so the right SOP is one query away.
Versioned SOPs
Every approved SOP keeps a version history per session — see how a procedure changed as the work did.
Edit with live citation checks
Fix any step in the browser. On save, every citation is re-parsed and re-grounded against the recording's captured actions.
Cost-aware processing
Empty or too-short recordings are caught before any paid model runs, so tokens aren't spent on captures that can't produce an SOP.
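A pre-flight check like the one described under Cost-aware processing might look something like this. It is a sketch under stated assumptions: the `Capture` fields, the five-second threshold, and the reason strings are hypothetical, chosen only to show the idea of rejecting a capture before any paid model call.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Capture:
    """Summary of a finished recording (hypothetical fields)."""
    duration_s: float
    transcript: str
    frame_count: int


MIN_DURATION_S = 5.0  # assumed threshold, not a documented value


def skip_reason(capture: Capture) -> Optional[str]:
    """Return why a capture should be skipped, or None if it is worth processing."""
    # Nothing on screen and nothing said: there is no material to draft from.
    if capture.frame_count == 0 and not capture.transcript.strip():
        return "empty recording"
    if capture.duration_s < MIN_DURATION_S:
        return "recording too short"
    return None
```

A silent recording that still has frames passes the check in this sketch, since an unspoken step can still be documented from what happened on screen.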
Boring, proven, and built to last.
No exotic dependencies or proprietary runtime — a standard Python, Next.js, and Postgres stack you can reason about.
Backend
- Python: 3.11+
- FastAPI: HTTP API
- SQLAlchemy: ORM
- Postgres 16: managed
Frontend
- Next.js 14: App Router
- TypeScript: strict
- Tailwind CSS: styling
- shadcn/ui: components
Capture
- Electron: desktop shell
- mss: screen frames
- sounddevice: microphone
- MediaRecorder: browser audio
AI pipeline
- faster-whisper: local transcription
- Model router: BYOK (Claude / GPT / Gemini)
- Vision captions: Gemini 2.5 Flash
- Citation verifier: draft retry loop
Record your first SOP.
From a screen recording to an approved, cited procedure in four steps.
- 1
Install the agent
Download the Citestep desktop agent and sign in to your workspace.
- 2
Record the task
Hit record and walk through the work, narrating as you go — screen, voice, and clicks are captured together.
- 3
Review the draft
Citestep drafts a cited SOP. Open it next to the recording and click any step to jump to the moment it came from.
- 4
Approve & publish
Edit anything that needs a tweak, then approve to publish it to your library and export to Markdown.