Configuration

DemoVoice uses YAML configuration to control voice, timing, and provider settings.

Config File Location

Configuration is stored in .demovoice/demovoice.yaml relative to your current directory. Run demovoice init to create a default config. Use --config to point at any other file, or --dir to point at another project directory.
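The lookup rules above can be sketched as a small path resolver. This is only an illustration: the helper name `resolve_config_path` and the precedence of `--config` over `--dir` are assumptions, not documented DemoVoice behavior. `PurePosixPath` is used so the example behaves the same on any platform.

```python
from pathlib import PurePosixPath
from typing import Optional

# Default location from the docs, relative to the project directory.
DEFAULT_RELATIVE_PATH = PurePosixPath(".demovoice") / "demovoice.yaml"


def resolve_config_path(
    cwd: PurePosixPath,
    config: Optional[str] = None,       # value of --config, if given
    project_dir: Optional[str] = None,  # value of --dir, if given
) -> PurePosixPath:
    """Hypothetical helper: pick the config file the CLI would read."""
    if config is not None:
        # Assumption: an explicit --config wins over --dir.
        # Joining with an absolute path returns that absolute path unchanged.
        return cwd / config
    # Without --dir, the project directory is the current directory.
    base = cwd if project_dir is None else cwd / project_dir
    return base / DEFAULT_RELATIVE_PATH


print(resolve_config_path(PurePosixPath("/work")))
print(resolve_config_path(PurePosixPath("/work"), project_dir="demo"))
print(resolve_config_path(PurePosixPath("/work"), config="/etc/dv.yaml"))
```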

Full Configuration Reference

.demovoice/demovoice.yaml

profile: default

providers:
  stt:
    provider: openai
    model: whisper-1
  tts:
    provider: openai
    model: gpt-4o-mini-tts
    voice: alloy

presets:
  - tech-demo

glossaries:
  - glossary.yaml

profiles:
  default:
    pace: original
    emotion: neutral
    voice_instructions: ""
    preserve_timing: true
    max_segment_stretch: 1.12
    max_segment_compress: 0.88
    max_tempo_delta: 0.05
    max_forced_tempo: 1.3
    min_segment_seconds: 1.8
    max_segment_seconds: 8.0
    max_phrase_seconds: 3.0
    silence_padding_ms: 350
    rewrite_max_retries: 4
    segment_concurrency: 4

Top-Level Keys

Key	Description	Default
profile	Active profile name; selects an entry from `profiles:`.	default
presets	Built-in presets to load before project glossaries. `tech-demo` is the only preset shipped today.	[tech-demo]
glossaries	Project glossary files, resolved relative to the project directory.	auto-detect `glossary.yaml`

Provider Settings

Speech-to-Text (STT)

Option	Description	Default
provider	STT provider. Only `openai` is supported today.	openai
model	Transcription model.	whisper-1

Text-to-Speech (TTS)

Option	Description	Default
provider	TTS provider. Only `openai` is supported today.	openai
model	TTS model.	gpt-4o-mini-tts
voice	Voice identifier supported by the TTS model.	alloy

Available Voices

You can use any voice supported by your TTS model. OpenAI's commonly available voices include:

alloy, ash, ballad, cedar, coral, echo, fable, marin, nova, onyx, sage, shimmer, verse

Profile Options

Each entry under profiles: defines a named profile. Profiles control pacing, delivery, and timing tolerances. Voice and model live under providers.tts, not the profile.

Option	Description	Default
pace	Suggested speaking pace passed to the TTS model.	original
emotion	Suggested emotional tone passed to the TTS model.	neutral
voice_instructions	Freeform instructions to steer delivery (accent, energy, style).	""
preserve_timing	Fit generated speech into the original segment timing.	true
max_segment_stretch	Max stretch factor (1.12 = 12% slower).	1.12
max_segment_compress	Max compress factor (0.88 = 12% faster).	0.88
max_tempo_delta	Preferred tempo window before content fitting kicks in.	0.05
max_forced_tempo	Worst-case tempo fallback so speech does not become unusably fast.	1.3
min_segment_seconds	Avoid tiny TTS windows below this length.	1.8
max_segment_seconds	Soft upper bound for a single segment.	8.0
max_phrase_seconds	Initial phrase target before sentence-fragment repair.	3.0
silence_padding_ms	Silence required before a new segment is created.	350
rewrite_max_retries	How many minimal text-fit retries to attempt per segment.	4
segment_concurrency	Parallel TTS generation.	4
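To make the timing tolerances concrete, here is a minimal sketch of one plausible way they could interact when `preserve_timing` is true. DemoVoice's actual fitting algorithm is not described on this page, so the `TimingLimits` class, the `plan_fit` function, and the three outcome labels are all hypothetical. The factors are read as multipliers on duration, matching the table (1.12 = 12% slower).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TimingLimits:
    # Defaults copied from the profile options table.
    max_segment_stretch: float = 1.12
    max_segment_compress: float = 0.88
    max_tempo_delta: float = 0.05


def plan_fit(
    slot_seconds: float,
    speech_seconds: float,
    limits: TimingLimits = TimingLimits(),
) -> Tuple[float, str]:
    """Hypothetical: choose a duration factor for one segment.

    Returns (factor, outcome). A factor above 1 stretches the speech,
    a factor below 1 compresses it.
    """
    if slot_seconds <= 0 or speech_seconds <= 0:
        raise ValueError("durations must be positive")

    needed = slot_seconds / speech_seconds
    # Assumption: small deviations are absorbed by tempo alone.
    if abs(needed - 1.0) <= limits.max_tempo_delta:
        return needed, "tempo-only"

    # Clamp to the configured stretch/compress bounds.
    clamped = min(max(needed, limits.max_segment_compress),
                  limits.max_segment_stretch)
    if clamped == needed:
        return needed, "within-limits"
    # Assumption: out-of-range segments would need the text rewritten.
    return clamped, "needs-text-fit"


print(plan_fit(4.0, 4.0))
print(plan_fit(4.4, 4.0))
print(plan_fit(3.0, 4.0))
```

In this reading, a segment that cannot be fitted within the stretch and compress bounds is the case where `rewrite_max_retries` and `max_forced_tempo` would come into play; the sketch stops short of modeling either.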

Environment Variables

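Provider secrets are not stored in config and must be set via environment variables. Any other config key can be overridden with a DEMOVOICE_-prefixed variable: write the key path in uppercase and replace dots with underscores, so providers.tts.voice becomes DEMOVOICE_PROVIDERS_TTS_VOICE.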

# Required
export OPENAI_API_KEY=sk-your-key-here

# Optional overrides
export DEMOVOICE_PROFILE=production
export DEMOVOICE_PROVIDERS_TTS_VOICE=cedar
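The naming rule for overrides can be sketched as a small mapping from a config key path to its environment variable. The function names `env_var_name` and `lookup_override` are hypothetical, and the uppercasing is inferred from the examples above rather than stated outright.

```python
import os
from typing import Mapping, Optional

PREFIX = "DEMOVOICE_"


def env_var_name(key_path: str) -> str:
    """Map a dotted config key to its override variable name."""
    # Dots become underscores; uppercase is inferred from the examples.
    return PREFIX + key_path.replace(".", "_").upper()


def lookup_override(
    key_path: str,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the override value for key_path, or None if unset."""
    env = os.environ if environ is None else environ
    return env.get(env_var_name(key_path))


print(env_var_name("profile"))
print(env_var_name("providers.tts.voice"))
```

Because config keys such as `max_tempo_delta` already contain underscores, the mapping is one-way: a variable name cannot always be turned back into a unique key path, so a real loader would more likely look up known keys than parse arbitrary variables.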

Next step: Learn about profiles to manage multiple timing configurations.