Introduction

DemoVoice is an open-source CLI tool that re-records voice tracks in demo videos using AI text-to-speech while preserving the timing of the original narration.

Why DemoVoice?

Demo videos are a powerful form of documentation, but creating them at scale presents challenges:

  1. Inconsistent voices: When 10 team members record demos, you get 10 different voices with varying audio quality, accents, and speaking styles.
  2. Recording reluctance: Many developers are uncomfortable recording their voice, which means great demos never get made.
  3. No brand cohesion: Without a consistent voice, your video library feels disjointed rather than professional.

DemoVoice solves this by letting anyone record a demo with any voice (or no voice at all), then replacing the audio track with a consistent AI voice that preserves the original timing.

How It Works

  1. Extract & Transcribe

     Audio is extracted from the video and transcribed with word-level timestamps using Whisper.

  2. Segment & Synthesize

     Text is split into segments and synthesized with your chosen TTS voice.

  3. Fit & Align

     Each segment is time-stretched or compressed to match the original timing.

  4. Assemble & Mux

     Segments are concatenated and muxed back with the original video.
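
To make the Fit & Align step concrete, here is a minimal sketch of how a synthesized segment could be fitted to its original time slot. It is not taken from the DemoVoice source: the function name `fitFilter` is hypothetical, and using ffmpeg's `atempo` audio filter for the stretch is an assumption. The sketch computes the speed factor as synthesized duration divided by target duration, then chains `atempo` stages kept within 0.5 to 2.0, since ffmpeg's documentation suggests daisy-chaining when a single stage would exceed 2.0.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// fitFilter returns an ffmpeg audio filter string that changes the tempo of a
// synthesized clip so that it lasts exactly as long as the target slot.
// Hypothetical helper: DemoVoice's real implementation may differ.
func fitFilter(synthesized, target time.Duration) (string, error) {
	if synthesized <= 0 || target <= 0 {
		return "", fmt.Errorf("durations must be positive: synthesized=%v target=%v", synthesized, target)
	}

	// A ratio above 1 speeds the clip up (compress); below 1 slows it down (stretch).
	ratio := float64(synthesized) / float64(target)

	// Keep every atempo stage within [0.5, 2.0] by chaining stages.
	var stages []string
	for ratio > 2.0 {
		stages = append(stages, "atempo=2")
		ratio /= 2.0
	}
	for ratio < 0.5 {
		stages = append(stages, "atempo=0.5")
		ratio /= 0.5
	}
	stages = append(stages, "atempo="+strconv.FormatFloat(ratio, 'f', -1, 64))

	return strings.Join(stages, ","), nil
}

func main() {
	// A 3-second synthesized clip that must fit a 2-second slot in the original narration.
	filter, err := fitFilter(3*time.Second, 2*time.Second)
	if err != nil {
		panic(err)
	}
	fmt.Println(filter) // prints "atempo=1.5"
}
```

The resulting string could be passed to ffmpeg's `-filter:a` option. Large ratios tend to sound unnatural, so a real pipeline would probably cap the stretch or re-synthesize the segment instead; that policy is left out of this sketch.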

Requirements

  • Go 1.22 or later
  • ffmpeg and ffprobe on PATH
  • OpenAI API key (for Whisper transcription and TTS)
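
The runtime requirements above can be checked before running the tool. The following is a rough sketch of such a preflight check, not part of DemoVoice itself: the function name `missingRequirements` is hypothetical, and reading the key from the `OPENAI_API_KEY` environment variable is an assumption based on the convention used by OpenAI's SDKs. The Go version is a build-time requirement and is not checked here.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// missingRequirements reports which runtime prerequisites are absent.
// hasTool and getenv are injected so the check can be exercised without
// touching the real PATH or environment.
func missingRequirements(hasTool func(string) bool, getenv func(string) string) []string {
	var missing []string
	for _, tool := range []string{"ffmpeg", "ffprobe"} {
		if !hasTool(tool) {
			missing = append(missing, tool+" (not found on PATH)")
		}
	}
	// Assumption: the API key is supplied through this environment variable.
	if getenv("OPENAI_API_KEY") == "" {
		missing = append(missing, "OPENAI_API_KEY (environment variable not set)")
	}
	return missing
}

func main() {
	// exec.LookPath searches the directories named by PATH for an executable.
	onPath := func(name string) bool {
		_, err := exec.LookPath(name)
		return err == nil
	}

	missing := missingRequirements(onPath, os.Getenv)
	if len(missing) == 0 {
		fmt.Println("all requirements satisfied")
		return
	}
	for _, m := range missing {
		fmt.Println("missing:", m)
	}
}
```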