Auto sound design

Automatically mix music, voice, and SFX levels. The agent detects audio types and sets balanced volumes with fades.

Get a balanced audio mix without touching faders. Auto sound design classifies each clip as music, voice, or SFX from its name and track position, then sets a level and fades suited to that type.

What it does

Scans timeline audio tracks
Classifies each clip from its name and track metadata (the audio itself is not analyzed):
- Voice — interview, dialogue, narration
- Music — background tracks, beds
- SFX — sound effects, stingers
- Other — uncategorized audio
Sets volume targets per type
Adds fade in/out where needed
Optionally mutes embedded video audio when clean voice audio exists
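
The optional mute step can be pictured with a minimal sketch. The `Clip` type, its `kind` labels, and the function name are hypothetical and used only for illustration; this is not the tool's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    name: str
    kind: str            # assumed labels: "voice", "music", "sfx", "other", "embedded"
    muted: bool = False


def mute_embedded_if_clean_voice(clips: List[Clip]) -> List[Clip]:
    """Return copies of the clips, muting embedded video audio only when
    at least one separate voice clip exists on the timeline."""
    has_clean_voice = any(c.kind == "voice" for c in clips)
    return [
        Clip(c.name, c.kind, c.muted or (has_clean_voice and c.kind == "embedded"))
        for c in clips
    ]
```

With no separate voice clip on the timeline, the embedded audio is left untouched, since it may be the only source of speech.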

When to use it

Mixed footage — camera audio + music + voiceover
Interviews with BGM — duck music under voice
Tutorials — ensure narration is clear, music supports
Social clips — punchy music-forward, voice audible

How to use

Quick mix:

"Auto sound design"

"Balance the audio levels"

With specifics:

"Mix with voice upfront"

"Music-forward mix for TikTok"

"Auto sound design and mute any embedded video audio"

Volume targets

| Type | Target | Notes |
|------|--------|-------|
| Voice | -6 to -12 LUFS | Clear and prominent |
| Music | -20 to -24 LUFS | Supporting, doesn't mask voice |
| SFX | -12 to -18 LUFS | Audible but not jarring |
| Embedded video | Often muted | Use clean separate audio when available |
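
To make the targets concrete, the sketch below picks the midpoint of each range and computes the gain change needed to reach it. It assumes a clip's current loudness is already known and passed in by the caller; the document does not say how the tool picks a value within each range, so the midpoint rule and all function names are illustrative assumptions.

```python
# Target ranges from the table above, stored as (low, high) in LUFS.
TARGETS_LUFS = {
    "voice": (-12.0, -6.0),
    "music": (-24.0, -20.0),
    "sfx": (-18.0, -12.0),
}


def target_level(audio_type: str) -> float:
    """Midpoint of the target range (an assumed rule, not the tool's)."""
    low, high = TARGETS_LUFS[audio_type]
    return (low + high) / 2


def gain_db(current_lufs: float, audio_type: str) -> float:
    """Gain in dB that moves a clip from its current loudness to the target."""
    return target_level(audio_type) - current_lufs


def db_to_linear(db: float) -> float:
    """Convert a dB gain to a linear amplitude multiplier."""
    return 10 ** (db / 20)
```

For example, a voice clip currently at -15 LUFS would need `gain_db(-15.0, "voice")`, which is +6 dB, to reach the -9 LUFS midpoint.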

Fades

Auto-adds:

Music — 1-2 second fade in/out
SFX — Quick fade in/out
Voice — Subtle fade if abrupt starts
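
A fade can be modeled as a gain multiplier that ramps between 0 and 1 at the clip edges. The sketch below uses a linear ramp; the ramp shape and the function name are assumptions, since the document states only the approximate fade lengths.

```python
def fade_gain(t: float, clip_duration: float, fade_in: float, fade_out: float) -> float:
    """Linear fade multiplier (0.0 to 1.0) at time t seconds into a clip."""
    if t < 0 or t > clip_duration:
        return 0.0
    gain = 1.0
    if fade_in > 0:
        gain = min(gain, t / fade_in)                      # ramp up from the start
    if fade_out > 0:
        gain = min(gain, (clip_duration - t) / fade_out)   # ramp down to the end
    return max(0.0, gain)
```

For a 10-second music clip with 1.5-second fades, `fade_gain(0.75, 10.0, 1.5, 1.5)` is halfway up the ramp, and any time between 1.5 s and 8.5 s plays at full level.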

Audio type detection

Based on:

Asset names — "music," "bgm," "voice," "VO," "sfx"
Track position — Audio track 0 (video embedded) vs separate tracks
Audio characteristics — stereo, mono, branching
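
Name-based detection along these lines can be sketched as a keyword lookup with a track-position fallback. The keyword lists extend the names quoted above with a few guesses, and the function and the `"embedded"` label are hypothetical; the tool's real rules may differ.

```python
import re
from typing import Optional

# Assumed keyword lists; "music", "bgm", "voice", "vo", and "sfx" come from the text.
KEYWORDS = {
    "voice": ("voice", "vo", "narration", "dialogue", "interview"),
    "music": ("music", "bgm"),
    "sfx": ("sfx", "stinger"),
}


def classify_asset(name: str, track_index: Optional[int] = None) -> str:
    """Classify an asset by whole-word keyword match, then by track position."""
    # Split on non-alphanumerics so "vo" matches "VO_take2" but not "video".
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    for audio_type, words in KEYWORDS.items():
        if any(token in words for token in tokens):
            return audio_type
    if track_index == 0:
        return "embedded"  # audio track 0 is treated as video-embedded audio
    return "other"
```

Matching whole tokens rather than substrings avoids false positives such as "vo" inside "video", which is one reason hyphenated names like "voice-intro" tend to classify reliably.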

Ducking and sidechain

When voice and music play together, the mix relies on fixed relative levels rather than a dynamic sidechain:

Voice remains clear (-6 to -12 LUFS)
Music drops to support (-20 to -24 LUFS)
Relative levels preserve intelligibility
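
The separation those two ranges imply can be worked out directly. This small sketch computes the narrowest and widest gap between voice and music; the function name is illustrative, and a difference between two LUFS values is expressed in LU.

```python
from typing import Tuple

VOICE_RANGE = (-12.0, -6.0)   # (low, high) in LUFS, from the targets above
MUSIC_RANGE = (-24.0, -20.0)


def level_gap(louder: Tuple[float, float], quieter: Tuple[float, float]) -> Tuple[float, float]:
    """Return (smallest, largest) separation in LU between two level ranges."""
    smallest = louder[0] - quieter[1]   # quietest voice against loudest music
    largest = louder[1] - quieter[0]    # loudest voice against quietest music
    return smallest, largest
```

Here `level_gap(VOICE_RANGE, MUSIC_RANGE)` gives a separation of 8 to 18 LU, so voice always sits at least 8 LU above the music bed.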

Tips

Name assets clearly — "music-upbeat," "voice-intro" helps detection
Separate tracks — keep voice on its own track for clean detection
Preview after mix — adjust if specific moments need more/less
Manual override — fine-tune any clip after auto-mix; only affected clips are processed

Limitations

Uses filename heuristics, not audio analysis (doesn't "hear" the content)
Sets constant levels per clip (no dynamic ducking during playback)
Works on clip volumes, not track automation