Verifiable · Cited · Cloud-hosted

Turn any workflow into a verifiable SOP.

Record your screen and narration with the desktop agent. Citestep transcribes the narration with Whisper, drafts a step-by-step procedure, and verifies every claim against the source recording before you approve it and export to Markdown.

Capture and transcription run on the desktop agent. Processing, storage, and review live in your hosted Citestep workspace.

<video src="hero-demo.mp4" autoplay loop muted />

The pipeline

Four deterministic stages, all auditable.

Every artifact — frames, transcripts, draft, citations — is stored and linked, so any step traces back to the moment it came from.

  1. Capture

    An Electron shell records your screen via mss and your microphone via sounddevice. Chunks stream to a local Python sidecar as you work.

    Electron · Python sidecar
  2. Transcribe

    Whisper transcribes each audio chunk on the desktop agent. Timestamped transcripts upload to your workspace alongside the matching frames.

    Whisper · on the agent
  3. Draft

    A multimodal LLM reads the frames and narration, then produces a structured SOP draft with citations pointing back to the source timestamps.

    BYOK LLM · cited drafts
  4. Verify & export

    A semantic citation verifier checks every claim against the transcript. Failed citations trigger a draft retry. Once the citations pass, approve the draft and export it to clean Markdown.

    Verifier loop · Markdown out
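To make "citations pointing back to the source timestamps" concrete, here is a minimal sketch of how a cited step could be resolved to the transcript chunk it came from. The `TranscriptChunk` and `Step` types, their field names, and the seconds-based timestamps are assumptions for illustration, not Citestep's actual schema.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TranscriptChunk:
    start: float  # seconds from the start of the recording
    end: float
    text: str


@dataclass(frozen=True)
class Step:
    text: str
    cited_at: float  # timestamp the draft cites for this step


def resolve_citation(step: Step, chunks: list[TranscriptChunk]) -> Optional[TranscriptChunk]:
    """Return the chunk whose [start, end) window contains the cited timestamp."""
    starts = [c.start for c in chunks]  # chunks are assumed sorted by start time
    i = bisect_right(starts, step.cited_at) - 1
    if i >= 0 and chunks[i].start <= step.cited_at < chunks[i].end:
        return chunks[i]
    return None  # the citation points outside every chunk


chunks = [
    TranscriptChunk(0.0, 5.0, "Open the billing dashboard."),
    TranscriptChunk(5.0, 12.0, "Click Export and choose CSV."),
]
step = Step("Export the report as CSV.", cited_at=7.5)
source = resolve_citation(step, chunks)
```

In this sketch a citation that falls outside every chunk resolves to `None`, which is the kind of case a verifier would flag rather than silently accept.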
What's inside

Built for honest, reviewable documentation.

No black-box agents. Every step in a generated SOP traces back to a timestamp in the recording it came from.

Side-by-side review console

Read the generated SOP next to the recorded transcript. Click a citation to jump to the moment it was said. Approve when it's right; reject and regenerate when it isn't.

Semantic citation verifier

A second LLM pass checks each draft claim against the transcript chunk it cites. Failed citations trigger an automatic retry.
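The retry behaviour can be sketched as a small loop. This is an illustration under stated assumptions, not Citestep's implementation: `draft_with_verification`, the `max_attempts` limit, and the `(claim, chunk)` tuple shape are hypothetical, and the LLM judge is replaced by a crude keyword check so the example runs without any model.

```python
from typing import Callable

Claim = tuple[str, str]  # (claim text, text of the transcript chunk it cites)


def draft_with_verification(
    draft: Callable[[int], list[Claim]],
    supports: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> tuple[list[Claim], list[Claim]]:
    """Redraft until every claim is supported or attempts run out.

    Returns (claims, failed) from the last attempt; failed is empty on success.
    """
    claims: list[Claim] = []
    failed: list[Claim] = []
    for attempt in range(max_attempts):
        claims = draft(attempt)
        failed = [(c, chunk) for c, chunk in claims if not supports(c, chunk)]
        if not failed:
            break  # every citation passed, no retry needed
    return claims, failed


# Stand-ins for the two model calls: the "drafter" improves on its second
# attempt, and the "judge" only checks that each claim word appears in the chunk.
def fake_draft(attempt: int) -> list[Claim]:
    if attempt == 0:
        return [("Click Delete", "Click Export and choose CSV")]
    return [("Click Export", "Click Export and choose CSV")]


def keyword_judge(claim: str, chunk: str) -> bool:
    return all(word.lower() in chunk.lower() for word in claim.split())


claims, failed = draft_with_verification(fake_draft, keyword_judge)
```

Passing the judge in as a callable keeps the loop independent of whichever model performs the second pass.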

Lightweight desktop capture

A small desktop agent records screen, microphone, and clicks on the operator's machine, then streams them to your hosted workspace — nothing heavy to install or maintain.

Mic selector & live progress

Pick your input device before recording. Watch chunks transcribe and the draft assemble in real time.

Plain Markdown export

Approved SOPs export to clean Markdown — check them into git, paste them into Notion, or drop them in your wiki.
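As a rough idea of what "clean Markdown" could look like, the sketch below renders a list of steps as a numbered Markdown list with a timestamp citation on each line. The `to_markdown` function and the `[mm:ss]` citation style are assumptions for illustration; the real export format may differ.

```python
def to_markdown(title: str, steps: list[tuple[str, float]]) -> str:
    """Render (step text, cited seconds) pairs as a numbered Markdown list."""
    lines = [f"# {title}", ""]
    for n, (text, cited_at) in enumerate(steps, start=1):
        minutes, seconds = divmod(int(cited_at), 60)  # fractional seconds are dropped
        lines.append(f"{n}. {text} [{minutes:02d}:{seconds:02d}]")
    return "\n".join(lines) + "\n"


doc = to_markdown(
    "Export a report",
    [("Open the dashboard.", 4.0), ("Click Export.", 67.9)],
)
```

Plain numbered lists like this diff cleanly in git and paste into most wikis without conversion.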

One hosted workspace

Recordings, transcripts, SOPs, and their versions live together in one hosted workspace — searchable and organized, instead of scattered across screen recorders and wikis.

What's built

More than a one-shot generator.

Beyond capture-and-draft, here's what already ships in the current build. Each item below is verified against the codebase, not pulled from a roadmap.

Multi-signal capture

Records screen frames, microphone narration, and OS click events in one session — so a silent click or an unspoken step is still on the record.

Provider-agnostic model routing

Each pipeline stage picks its model from whichever API keys you supply — Anthropic, OpenAI, or Google — with a free local-dev fallback. Bring your own keys; tokens bill to your provider.
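One way to picture per-stage routing is a preference list filtered by the keys that are actually present. Everything named here is an assumption for illustration: the environment variable names, the stage names, the preference order, and the `local-dev` label are hypothetical, not Citestep's configuration.

```python
# Environment variable -> provider label (variable names are assumptions).
PROVIDER_KEYS = {
    "ANTHROPIC_API_KEY": "anthropic",
    "OPENAI_API_KEY": "openai",
    "GOOGLE_API_KEY": "google",
}

# Hypothetical per-stage preference order.
STAGE_PREFERENCES = {
    "draft": ["anthropic", "openai", "google"],
    "verify": ["openai", "anthropic", "google"],
    "caption": ["google", "openai", "anthropic"],
}

LOCAL_FALLBACK = "local-dev"


def route(stage: str, env: dict[str, str]) -> str:
    """Pick the first preferred provider for a stage that has a non-empty key."""
    available = {provider for var, provider in PROVIDER_KEYS.items() if env.get(var)}
    for provider in STAGE_PREFERENCES[stage]:
        if provider in available:
            return provider
    return LOCAL_FALLBACK  # no keys supplied: fall back to the free local option


draft_provider = route("draft", {"OPENAI_API_KEY": "example-key"})
caption_provider = route("caption", {})
```

With only an OpenAI key supplied, the draft stage skips its first preference and lands on `openai`; with no keys at all, every stage falls back to `local-dev`.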

Searchable SOP Library

Approved procedures land in a library with fuzzy full-text search, so the right SOP is one query away.
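The page does not say how the fuzzy search is implemented, so the following is only a stand-in that shows the idea of typo-tolerant matching using Python's standard `difflib`. The `fuzzy_search` function, the word-level comparison, and the `0.6` threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_search(query: str, titles: list[str], threshold: float = 0.6) -> list[str]:
    """Rank titles by how closely any single word matches the query."""
    q = query.lower()
    scored = []
    for title in titles:
        # Best similarity between the query and any one word of the title.
        best = max(
            (SequenceMatcher(None, q, word).ratio() for word in title.lower().split()),
            default=0.0,
        )
        if best >= threshold:
            scored.append((best, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


library = ["Refund a customer order", "Rotate API credentials", "Onboard a new vendor"]
hits = fuzzy_search("refnd", library)
```

Here the misspelled query `refnd` still ranks the refund procedure first, which is the behaviour "one query away" implies.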

Versioned SOPs

Every approved SOP keeps a per-session version history, so you can see how a procedure changed as the work did.

Edit with live citation checks

Fix any step in the browser. On save, every citation is re-parsed and re-grounded against the recording's captured actions.
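A save-time check of this kind could look roughly like the sketch below: pull every citation marker out of the edited text, then confirm that a captured action sits near each cited moment. The `[mm:ss]` marker syntax, the `check_citations` function, and the two-second tolerance are assumptions for illustration.

```python
import re

CITATION = re.compile(r"\[(\d{2}):(\d{2})\]")  # hypothetical [mm:ss] marker


def check_citations(
    markdown: str, action_times: list[float], tolerance: float = 2.0
) -> dict[str, bool]:
    """Map each citation marker to whether a captured action lies within tolerance seconds."""
    report: dict[str, bool] = {}
    for match in CITATION.finditer(markdown):
        seconds = int(match.group(1)) * 60 + int(match.group(2))
        report[match.group(0)] = any(
            abs(seconds - action) <= tolerance for action in action_times
        )
    return report


edited = "1. Open Settings [00:04]\n2. Click Save [00:30]"
report = check_citations(edited, action_times=[3.2, 11.8])
```

In this example the first citation is grounded by the action at 3.2 seconds, while the second has no captured action nearby and would be flagged for the editor.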

Cost-aware processing

Empty or too-short recordings are caught before any paid model runs, so tokens aren't spent on captures that can't produce an SOP.
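The gating idea is simple enough to sketch: inspect cheap local facts about a recording before any paid call is made. The `Recording` fields, the `skip_reason` function, and the five-second threshold are hypothetical; the page does not state the actual criteria.

```python
from dataclasses import dataclass
from typing import Optional

MIN_DURATION_S = 5.0  # hypothetical threshold


@dataclass(frozen=True)
class Recording:
    duration_s: float
    transcript_words: int
    frame_count: int


def skip_reason(rec: Recording) -> Optional[str]:
    """Return why a recording should be skipped, or None if it may proceed."""
    if rec.frame_count == 0 and rec.transcript_words == 0:
        return "empty"  # nothing was captured at all
    if rec.duration_s < MIN_DURATION_S:
        return "too short"
    return None  # safe to hand to the paid drafting stage


verdict = skip_reason(Recording(duration_s=2.0, transcript_words=4, frame_count=6))
```

A silent recording with frames still passes the first check in this sketch, which matches the multi-signal idea that an unspoken step can still be on the record.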

The stack

Boring, proven, and built to last.

No exotic dependencies or proprietary runtime — a standard Python, Next.js, and Postgres stack you can reason about.

Backend layer

Python · 3.11+
FastAPI · HTTP API
SQLAlchemy · ORM
Postgres 16 · managed

Frontend layer

Next.js 14 · App Router
TypeScript · strict
Tailwind CSS · styling
shadcn/ui · components

Capture layer

Electron · desktop shell
mss · screen frames
sounddevice · microphone
MediaRecorder · browser audio

AI pipeline layer

faster-whisper · local transcription
Model router · BYOK: Claude / GPT / Gemini
Vision captions · Gemini 2.5 Flash
Citation verifier · draft retry loop

Get started

Record your first SOP.

From a screen recording to an approved, cited procedure in four steps.

  1. Install the agent

    Download the Citestep desktop agent and sign in to your workspace.

  2. Record the task

    Hit record and walk through the work, narrating as you go — screen, voice, and clicks are captured together.

  3. Review the draft

    Citestep drafts a cited SOP. Open it next to the recording and click any step to jump to the moment it came from.

  4. Approve & publish

    Edit anything that needs a tweak, then approve to publish it to your library and export to Markdown.