One voice for all your demos
Anyone can record a demo. DemoVoice replaces the narration with a consistent AI voice, so all your videos sound like they were made together.
✓ Validating ffmpeg and ffprobe...
✓ Extracting audio from source video...
✓ Transcribing narration with OpenAI Whisper...
✓ Splitting into 12 timed segments...
✓ Synthesizing voice with gpt-4o-mini-tts...
✓ Adjusting timing to match original...
✓ Muxing audio with video stream...
Done! Output saved to demo.demovoice.mp4
The Problem
Demo videos are great docs. But they're a mess to produce.
10 people recording demos
10 different voices, no consistency
Developers uncomfortable on camera
They just don't record, so you get no video
Great content, inconsistent delivery
Videos feel disjointed, not like a cohesive brand
The Solution
Let anyone record. DemoVoice handles the voice.
- Record your demo with any voice, or no voice at all
- DemoVoice transcribes your narration and re-voices it with a consistent AI voice
- Every video sounds like it belongs to the same brand
One voice.
Every demo.
Built for software demo creators
DemoVoice focuses on one thing: re-voicing demo videos while keeping the new narration aligned to the original timing.
Timing Preserved
AI-generated speech is automatically stretched or compressed to fit the exact timing windows of your original narration.
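The stretch-or-compress step comes down to a speed ratio between the generated clip and the original timing window. The Go sketch below is illustrative only: the names TempoFactor and AtempoChain are hypothetical, and using ffmpeg's atempo filter for the time stretch is an assumption, not something this page confirms. Each atempo stage is kept within a conservative 0.5 to 2.0 range, so larger changes are expressed as a chain of stages.

```go
package main

import (
	"fmt"
	"strings"
)

// TempoFactor returns the speed multiplier that makes a clip lasting
// `generated` seconds fit a window of `target` seconds.
// Values above 1 speed the clip up; values below 1 slow it down.
func TempoFactor(generated, target float64) (float64, error) {
	if generated <= 0 || target <= 0 {
		return 0, fmt.Errorf("durations must be positive: generated=%v target=%v", generated, target)
	}
	return generated / target, nil
}

// AtempoChain expresses factor as a comma-separated chain of ffmpeg
// atempo stages, each within [0.5, 2.0]. It returns an empty string
// for a non-positive factor, which has no meaningful tempo.
func AtempoChain(factor float64) string {
	if factor <= 0 {
		return ""
	}
	var parts []string
	for factor > 2.0 {
		parts = append(parts, "atempo=2.0")
		factor /= 2.0
	}
	for factor < 0.5 {
		parts = append(parts, "atempo=0.5")
		factor /= 0.5
	}
	parts = append(parts, fmt.Sprintf("atempo=%.4f", factor))
	return strings.Join(parts, ",")
}
```

For example, a 3.0 second clip that must fit a 2.0 second window gives a factor of 1.5, which could be passed to ffmpeg as the audio filter value atempo=1.5000.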
AI Text-to-Speech
Leverage OpenAI's gpt-4o-mini-tts with multiple voice options to create natural-sounding narration for your demos.
Highly Configurable
Fine-tune pace, emotion, timing tolerances, and segment boundaries through YAML configuration files.
CLI-First Design
Built with Go, Cobra, and Viper for fast, scriptable workflows that integrate into your existing toolchain.
Smart Retries
When generated speech doesn't fit the timing window, DemoVoice automatically rewrites the segment text and regenerates the audio until it fits.
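A retry loop of this kind might look like the following Go sketch. Everything here is hypothetical: the synth and rewrite callbacks stand in for the text-to-speech and rewriting calls, the tolerance is a fraction of the target duration, and the page does not say how DemoVoice bounds its attempts.

```go
package main

import "errors"

// Attempt records one synthesis try: the text used and the resulting
// audio duration in seconds.
type Attempt struct {
	Text     string
	Duration float64
}

// ErrNoFit is returned when no attempt lands inside the timing window.
var ErrNoFit = errors.New("no attempt fit the timing window")

func absDiff(a, b float64) float64 {
	if a > b {
		return a - b
	}
	return b - a
}

// FitWithRetries synthesizes text, checks the duration against
// target +/- tolerance*target, and rewrites the text between tries.
// On failure it returns the closest attempt together with ErrNoFit.
func FitWithRetries(
	text string,
	target, tolerance float64,
	maxAttempts int,
	synth func(text string) (float64, error),
	rewrite func(text string, got, want float64) (string, error),
) (Attempt, error) {
	var best Attempt
	haveBest := false
	for i := 0; i < maxAttempts; i++ {
		d, err := synth(text)
		if err != nil {
			return best, err
		}
		cur := Attempt{Text: text, Duration: d}
		if !haveBest || absDiff(d, target) < absDiff(best.Duration, target) {
			best, haveBest = cur, true
		}
		if absDiff(d, target) <= tolerance*target {
			return cur, nil
		}
		if i == maxAttempts-1 {
			break // no point rewriting after the final attempt
		}
		if text, err = rewrite(text, d, target); err != nil {
			return best, err
		}
	}
	return best, ErrNoFit
}
```

Returning the closest attempt alongside the error lets a caller fall back to time-stretching the best clip instead of failing the whole run.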
BYOK Security
Bring your own API keys via environment variables. Provider secrets are never stored in config files.
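Reading a key from the environment at startup could be sketched in Go as below. The function names are hypothetical, and the variable name is left to the caller. OPENAI_API_KEY is the name conventionally used by OpenAI's own tooling, but whether DemoVoice reads that exact variable is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// apiKey resolves a secret through lookup and rejects missing or blank
// values. Taking lookup as a parameter keeps the logic testable without
// touching the real process environment.
func apiKey(lookup func(string) (string, bool), name string) (string, error) {
	val, ok := lookup(name)
	val = strings.TrimSpace(val)
	if !ok || val == "" {
		return "", fmt.Errorf("%s is not set; export it before running", name)
	}
	return val, nil
}

// APIKeyFromEnv reads the named environment variable. The key is only
// held in memory and is never written back to a config file.
func APIKeyFromEnv(name string) (string, error) {
	return apiKey(os.LookupEnv, name)
}
```

A caller would invoke APIKeyFromEnv("OPENAI_API_KEY") once at startup and stop with a clear message if it returns an error.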
How it works
A streamlined pipeline for timing-preserved voice replacement.
Extract & Transcribe
DemoVoice extracts the audio track from your video and transcribes it using OpenAI Whisper, capturing the exact timing of each phrase.
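The extraction half of this step can be approximated with a plain ffmpeg call. The Go sketch below is an assumption about how such a call might be built, not DemoVoice's actual invocation: the function names are hypothetical, and mono 16 kHz PCM is simply a common input format for speech recognition.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
)

// ExtractAudioArgs builds ffmpeg arguments that pull the audio track
// out of videoPath and write it to wavPath as mono 16 kHz PCM.
func ExtractAudioArgs(videoPath, wavPath string) []string {
	return []string{
		"-y",            // overwrite the output file if it exists
		"-i", videoPath, // input video
		"-vn",           // ignore the video stream
		"-ac", "1",      // downmix to mono
		"-ar", "16000",  // resample to 16 kHz
		"-c:a", "pcm_s16le", // 16-bit PCM, suitable for a .wav file
		wavPath,
	}
}

// ExtractAudio runs ffmpeg (which must be on PATH) and includes its
// combined output in the error when the command fails.
func ExtractAudio(ctx context.Context, videoPath, wavPath string) error {
	cmd := exec.CommandContext(ctx, "ffmpeg", ExtractAudioArgs(videoPath, wavPath)...)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("ffmpeg failed: %w: %s", err, out)
	}
	return nil
}
```

Keeping argument construction separate from execution makes the command easy to log and to check without running ffmpeg.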
Segment & Synthesize
The transcription is split into timed segments. Each segment is synthesized using AI text-to-speech with your chosen voice and settings.
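One simple way to form timed segments is to start a new segment wherever the speaker pauses. The Go sketch below assumes the transcript provides word-level start and end times; the Word and Segment types, the SplitOnPauses function, and the pause threshold are all hypothetical, and DemoVoice's real segmentation rules may differ.

```go
package main

// Word is one transcribed word with its start and end time in seconds.
type Word struct {
	Text       string
	Start, End float64
}

// Segment is a run of consecutive words spoken without a long pause.
type Segment struct {
	Text       string
	Start, End float64
}

// SplitOnPauses groups words into segments, starting a new segment
// whenever the silence before a word is at least minPause seconds.
// Words are expected to be in chronological order.
func SplitOnPauses(words []Word, minPause float64) []Segment {
	var segs []Segment
	for i, w := range words {
		if i == 0 || w.Start-words[i-1].End >= minPause {
			segs = append(segs, Segment{Text: w.Text, Start: w.Start, End: w.End})
			continue
		}
		last := &segs[len(segs)-1]
		last.Text += " " + w.Text
		last.End = w.End
	}
	return segs
}
```

Each segment's Start and End then define the timing window that the synthesized audio for that segment has to fit.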
Fit & Align
Generated audio is stretched or compressed to fit the original timing window. If needed, the text is rewritten and regenerated for better fit.
Assemble & Mux
All segments are assembled with proper silence gaps and muxed with the original video stream, creating your final re-voiced demo.
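The assembly step can be pictured as laying clips on a timeline and filling the space between them with silence. The Go sketch below is an illustration under assumptions: Clip, Piece, Timeline, and MuxArgs are hypothetical names, and the ffmpeg arguments show one common way to pair an untouched video stream with a new audio track, not necessarily the one DemoVoice uses.

```go
package main

// Clip is a synthesized segment placed at Start seconds on the
// timeline and lasting Duration seconds.
type Clip struct {
	Start, Duration float64
}

// Piece is one entry of the assembled track: either silence
// (ClipIndex is -1) or the clip at ClipIndex.
type Piece struct {
	Silence   bool
	ClipIndex int
	Duration  float64
}

// Timeline interleaves clips with the silence needed to keep each clip
// at its start time and pads the end up to total seconds. A clip that
// would overlap the previous one is placed immediately after it.
func Timeline(clips []Clip, total float64) []Piece {
	var pieces []Piece
	cursor := 0.0
	for i, c := range clips {
		if gap := c.Start - cursor; gap > 0 {
			pieces = append(pieces, Piece{Silence: true, ClipIndex: -1, Duration: gap})
			cursor = c.Start
		}
		pieces = append(pieces, Piece{ClipIndex: i, Duration: c.Duration})
		cursor += c.Duration
	}
	if tail := total - cursor; tail > 0 {
		pieces = append(pieces, Piece{Silence: true, ClipIndex: -1, Duration: tail})
	}
	return pieces
}

// MuxArgs builds ffmpeg arguments that copy the video stream from
// videoPath unchanged and pair it with the audio from audioPath.
func MuxArgs(videoPath, audioPath, outPath string) []string {
	return []string{
		"-y",
		"-i", videoPath,
		"-i", audioPath,
		"-map", "0:v:0", // first video stream of the first input
		"-map", "1:a:0", // first audio stream of the second input
		"-c:v", "copy", // do not re-encode video
		"-c:a", "aac",
		outPath,
	}
}
```

Copying the video stream rather than re-encoding it keeps the picture identical to the source and makes the final step fast.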