Configuration
DemoVoice uses YAML configuration to control voice, timing, and provider settings.
Config File Location
Configuration is stored in .demovoice/demovoice.yaml relative to your current directory. Run demovoice init to create a default config. Use --config to point at any other file, or --dir to point at another project directory.
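The lookup described above can be sketched in a few lines of Python. This is an illustrative sketch only, not DemoVoice's actual implementation: the function name resolve_config_path is hypothetical, and the assumption that --config takes precedence over --dir when both are given is not confirmed by this page.

```python
from pathlib import Path
from typing import Optional

# Documented location of the config file inside a project directory.
CONFIG_RELATIVE_PATH = Path(".demovoice") / "demovoice.yaml"


def resolve_config_path(
    cwd: Path,
    config: Optional[str] = None,       # value of --config, if given
    project_dir: Optional[str] = None,  # value of --dir, if given
) -> Path:
    """Return the config file path a run would use (hypothetical helper)."""
    if config is not None:
        # Assumption: an explicit --config wins over --dir.
        return Path(config)
    base = Path(project_dir) if project_dir is not None else cwd
    return base / CONFIG_RELATIVE_PATH
```

With no flags the result is the default file under the current directory; with only a project directory the same relative path is resolved under that directory instead.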
Full Configuration Reference
profile: default

providers:
  stt:
    provider: openai
    model: whisper-1
  tts:
    provider: openai
    model: gpt-4o-mini-tts
    voice: alloy

presets:
  - tech-demo

glossaries:
  - glossary.yaml

profiles:
  default:
    pace: original
    emotion: neutral
    voice_instructions: ""
    preserve_timing: true
    max_segment_stretch: 1.12
    max_segment_compress: 0.88
    max_tempo_delta: 0.05
    max_forced_tempo: 1.3
    min_segment_seconds: 1.8
    max_segment_seconds: 8.0
    max_phrase_seconds: 3.0
    silence_padding_ms: 350
    rewrite_max_retries: 4
    segment_concurrency: 4

Top-Level Keys
| Key | Description | Default |
|---|---|---|
| profile | Active profile name; selects an entry from profiles:. | default |
| presets | Built-in presets to load before project glossaries. tech-demo is the only preset shipped today. | [tech-demo] |
| glossaries | Project glossary files, resolved relative to the project directory. | auto-detect glossary.yaml |
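To make the relationship between profile and profiles: concrete, here is a minimal Python sketch that selects the active profile from an already-parsed config. The function name active_profile and the error handling are assumptions for illustration; the page does not say how DemoVoice reacts to an undefined profile name.

```python
from typing import Any, Dict


def active_profile(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the settings of the profile named by the top-level `profile` key."""
    name = config.get("profile", "default")  # documented default is "default"
    profiles = config.get("profiles", {})
    if name not in profiles:
        # Assumption: an unknown profile name is treated as an error.
        raise KeyError(f"profile {name!r} is not defined under profiles:")
    return profiles[name]


# A parsed config mirroring part of the reference above.
config = {
    "profile": "default",
    "profiles": {
        "default": {"pace": "original", "emotion": "neutral", "preserve_timing": True},
    },
}

selected = active_profile(config)
```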
Provider Settings
Speech-to-Text (STT)
| Option | Description | Default |
|---|---|---|
| provider | STT provider. Only openai is supported today. | openai |
| model | Transcription model. | whisper-1 |
Text-to-Speech (TTS)
| Option | Description | Default |
|---|---|---|
| provider | TTS provider. Only openai is supported today. | openai |
| model | TTS model. | gpt-4o-mini-tts |
| voice | Voice identifier supported by the TTS model. | alloy |
Available Voices
Any voice supported by your TTS model can be used. The examples on this page use alloy and cedar; see OpenAI's text-to-speech documentation for the current list of voices.
Profile Options
Each entry under profiles: defines a named profile. Profiles control pacing, delivery, and timing tolerances. Voice and model live under providers.tts, not the profile.
| Option | Description | Default |
|---|---|---|
| pace | Suggested speaking pace passed to the TTS model. | original |
| emotion | Suggested emotional tone passed to the TTS model. | neutral |
| voice_instructions | Freeform instructions to steer delivery (accent, energy, style). | "" |
| preserve_timing | Fit generated speech into the original segment timing. | true |
| max_segment_stretch | Max stretch factor (1.12 = 12% slower). | 1.12 |
| max_segment_compress | Max compress factor (0.88 = 12% faster). | 0.88 |
| max_tempo_delta | Preferred tempo window before content fitting kicks in. | 0.05 |
| max_forced_tempo | Worst-case tempo fallback so speech does not become unusably fast. | 1.3 |
| min_segment_seconds | Avoid tiny TTS windows below this length. | 1.8 |
| max_segment_seconds | Soft upper bound for a single segment. | 8.0 |
| max_phrase_seconds | Initial phrase target before sentence-fragment repair. | 3.0 |
| silence_padding_ms | Silence required before a new segment is created. | 350 |
| rewrite_max_retries | How many minimal text-fit retries to attempt per segment. | 4 |
| segment_concurrency | Number of segments to synthesize in parallel. | 4 |
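The stretch and compress limits can be pictured as a clamp on how much a generated clip may be lengthened or shortened to fit its original slot. The Python sketch below is one plausible reading, not DemoVoice's actual algorithm: it assumes both limits are duration factors (1.12 means the clip may become 12% longer, 0.88 means 12% shorter), the function name fit_duration_factor is hypothetical, and max_tempo_delta and max_forced_tempo are deliberately not modeled.

```python
def fit_duration_factor(
    generated_seconds: float,
    target_seconds: float,
    max_stretch: float = 1.12,   # documented default of max_segment_stretch
    max_compress: float = 0.88,  # documented default of max_segment_compress
) -> float:
    """Return the duration factor to apply, clamped to the allowed window.

    A factor above 1.0 lengthens the clip, below 1.0 shortens it.
    """
    if generated_seconds <= 0:
        raise ValueError("generated_seconds must be positive")
    ideal = target_seconds / generated_seconds
    return max(max_compress, min(max_stretch, ideal))


# 5.0 s of speech for a 5.5 s slot: a factor of about 1.1 fits inside the window.
within = fit_duration_factor(5.0, 5.5)

# 6.0 s of speech for a 5.0 s slot: the ideal factor (about 0.83) is clamped to 0.88.
clamped = fit_duration_factor(6.0, 5.0)
```

Under this reading, a clamped result means the clip still overruns its slot after adjustment, which is presumably where text rewriting (rewrite_max_retries) would come in.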
Environment Variables
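Provider secrets are not stored in config and must be set via environment variables. Any other config key can be overridden with an environment variable named after its key path: add the DEMOVOICE_ prefix, uppercase the path, and replace dots with underscores (for example, providers.tts.voice becomes DEMOVOICE_PROVIDERS_TTS_VOICE).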
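The naming rule can be expressed as a small Python helper. This is a sketch for illustration: the function name env_var_name is hypothetical, and the uppercasing step is inferred from the variable names shown on this page rather than stated as a rule.

```python
ENV_PREFIX = "DEMOVOICE_"


def env_var_name(key_path: str) -> str:
    """Map a dotted config key path to its override variable name."""
    # Dots become underscores; uppercasing is inferred from the documented examples.
    return ENV_PREFIX + key_path.replace(".", "_").upper()


profile_var = env_var_name("profile")
voice_var = env_var_name("providers.tts.voice")
```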
# Required
export OPENAI_API_KEY=sk-your-key-here
# Optional overrides
export DEMOVOICE_PROFILE=production
export DEMOVOICE_PROVIDERS_TTS_VOICE=cedar

Next step: Learn about profiles to manage multiple timing configurations.