Configuration

DemoVoice uses YAML configuration to control voice, timing, and provider settings.

Config File Location

Configuration is stored in .demovoice/demovoice.yaml relative to your current directory. Run demovoice init to create a default config. Use --config to point at any other file, or --dir to point at another project directory.
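The lookup described above can be sketched in a few lines of Python. This is only an illustration, not DemoVoice's actual code: the function name resolve_config_path is hypothetical, and the precedence shown (an explicit --config value wins over --dir) is an assumption the text above does not confirm.

```python
from pathlib import Path
from typing import Optional

# Default location named in this section, relative to the project directory.
DEFAULT_RELATIVE_PATH = Path(".demovoice") / "demovoice.yaml"


def resolve_config_path(
    cwd: Path,
    config: Optional[str] = None,
    project_dir: Optional[str] = None,
) -> Path:
    """Hypothetical helper: pick the config file the CLI would read."""
    if config is not None:
        # Assumption: an explicit --config path is used as given.
        return Path(config)
    # Otherwise fall back to the default file inside --dir, or inside
    # the current directory when --dir is not supplied.
    base = Path(project_dir) if project_dir is not None else cwd
    return base / DEFAULT_RELATIVE_PATH


print(resolve_config_path(Path("/work")))
print(resolve_config_path(Path("/work"), project_dir="/other"))
```

With no flags, the sketch resolves to the default file under the current directory; passing project_dir mirrors what --dir is described as doing.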

Full Configuration Reference

.demovoice/demovoice.yaml
profile: default

providers:
  stt:
    provider: openai
    model: whisper-1
  tts:
    provider: openai
    model: gpt-4o-mini-tts
    voice: alloy

presets:
  - tech-demo

glossaries:
  - glossary.yaml

profiles:
  default:
    pace: original
    emotion: neutral
    voice_instructions: ""
    preserve_timing: true
    max_segment_stretch: 1.12
    max_segment_compress: 0.88
    max_tempo_delta: 0.05
    max_forced_tempo: 1.3
    min_segment_seconds: 1.8
    max_segment_seconds: 8.0
    max_phrase_seconds: 3.0
    silence_padding_ms: 350
    rewrite_max_retries: 4
    segment_concurrency: 4

Top-Level Keys

| Key | Description | Default |
| --- | --- | --- |
| profile | Active profile name; selects an entry from profiles:. | default |
| presets | Built-in presets to load before project glossaries. tech-demo is the only preset shipped today. | [tech-demo] |
| glossaries | Project glossary files, resolved relative to the project directory. | auto-detect glossary.yaml |

Provider Settings

Speech-to-Text (STT)

| Option | Description | Default |
| --- | --- | --- |
| provider | STT provider. Only openai is supported today. | openai |
| model | Transcription model. | whisper-1 |

Text-to-Speech (TTS)

| Option | Description | Default |
| --- | --- | --- |
| provider | TTS provider. Only openai is supported today. | openai |
| model | TTS model. | gpt-4o-mini-tts |
| voice | Voice identifier supported by the TTS model. | alloy |

Available Voices

Any voice supported by your TTS model. OpenAI's commonly available voices include:

alloy, ash, ballad, cedar, coral, echo, fable, marin, nova, onyx, sage, shimmer, verse

Profile Options

Each entry under profiles: defines a named profile. Profiles control pacing, delivery, and timing tolerances. Voice and model live under providers.tts, not the profile.

| Option | Description | Default |
| --- | --- | --- |
| pace | Suggested speaking pace passed to the TTS model. | original |
| emotion | Suggested emotional tone passed to the TTS model. | neutral |
| voice_instructions | Freeform instructions to steer delivery (accent, energy, style). | "" |
| preserve_timing | Fit generated speech into the original segment timing. | true |
| max_segment_stretch | Max stretch factor (1.12 = 12% slower). | 1.12 |
| max_segment_compress | Max compress factor (0.88 = 12% faster). | 0.88 |
| max_tempo_delta | Preferred tempo window before content fitting kicks in. | 0.05 |
| max_forced_tempo | Worst-case tempo fallback so speech does not become unusably fast. | 1.3 |
| min_segment_seconds | Avoid tiny TTS windows below this length. | 1.8 |
| max_segment_seconds | Soft upper bound for a single segment. | 8.0 |
| max_phrase_seconds | Initial phrase target before sentence-fragment repair. | 3.0 |
| silence_padding_ms | Silence required before a new segment is created. | 350 |
| rewrite_max_retries | How many minimal text-fit retries to attempt per segment. | 4 |
| segment_concurrency | Parallel TTS generation. | 4 |
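
To make the stretch and compress limits concrete, here is a rough Python sketch of the arithmetic. It assumes both factors are duration multipliers applied to the generated clip so that it fills the original segment; the reference above does not spell out the exact model, and the function names are hypothetical rather than part of DemoVoice.

```python
def required_factor(generated_seconds: float, original_seconds: float) -> float:
    """Duration multiplier needed to make the generated clip fill the slot.

    Values above 1.0 mean the clip must be stretched (played slower);
    values below 1.0 mean it must be compressed (played faster).
    """
    if generated_seconds <= 0:
        raise ValueError("generated_seconds must be positive")
    return original_seconds / generated_seconds


def fits_by_time_scaling(
    generated_seconds: float,
    original_seconds: float,
    max_segment_stretch: float = 1.12,
    max_segment_compress: float = 0.88,
) -> bool:
    """True when the needed multiplier stays inside the configured limits.

    Assumption: both limits act on clip duration, as described in the lead-in.
    """
    factor = required_factor(generated_seconds, original_seconds)
    return max_segment_compress <= factor <= max_segment_stretch


# A 4.6 s clip in a 5.0 s slot needs roughly a 1.087x stretch.
print(round(required_factor(4.6, 5.0), 3), fits_by_time_scaling(4.6, 5.0))
# A 6.0 s clip in the same slot needs roughly a 0.833x compression.
print(round(required_factor(6.0, 5.0), 3), fits_by_time_scaling(6.0, 5.0))
```

Under this reading, the second clip falls outside the 0.88 limit, which is presumably the kind of case where the text-fit retries counted by rewrite_max_retries come into play.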

Environment Variables

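Provider secrets are not stored in config and must be set via environment variables. Any other config key can be overridden with a DEMOVOICE_-prefixed variable: write the key path in uppercase and replace dots with underscores, so providers.tts.voice becomes DEMOVOICE_PROVIDERS_TTS_VOICE.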

# Required
export OPENAI_API_KEY=sk-your-key-here

# Optional overrides
export DEMOVOICE_PROFILE=production
export DEMOVOICE_PROVIDERS_TTS_VOICE=cedar
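
The naming rule behind these overrides can be sketched in Python. The helper env_var_name is hypothetical, and the uppercasing step is inferred from the two examples above rather than stated by the rule itself.

```python
def env_var_name(config_key: str, prefix: str = "DEMOVOICE_") -> str:
    """Map a dotted config key to its override variable name.

    Dots become underscores (per the rule above); uppercasing is an
    assumption based on the documented examples.
    """
    return prefix + config_key.replace(".", "_").upper()


print(env_var_name("profile"))              # DEMOVOICE_PROFILE
print(env_var_name("providers.tts.voice"))  # DEMOVOICE_PROVIDERS_TTS_VOICE
```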

Next step: Learn about profiles to manage multiple timing configurations.