Auto sound design

Automatically mix music, voice, and SFX levels. The agent detects audio types and sets balanced volumes with fades.


Get a balanced audio mix without touching faders. Auto sound design identifies music, voice, and SFX from asset names and track position, then sets appropriate clip levels and fades.

What it does

  • Scans timeline audio tracks
  • Classifies each clip by asset name and track position (not by analyzing the audio itself):
    • Voice — interview, dialogue, narration
    • Music — background tracks, beds
    • SFX — sound effects, stingers
    • Other — uncategorized audio
  • Sets volume targets per type
  • Adds fade in/out where needed
  • Optionally mutes embedded video audio when clean voice audio exists

When to use it

  • Mixed footage — camera audio, music, and voiceover on the same timeline
  • Interviews with BGM — duck music under voice
  • Tutorials — keep narration clear while music supports it
  • Social clips — punchy, music-forward mix with the voice still audible

How to use

Quick mix:

"Auto sound design"

"Balance the audio levels"

With specifics:

"Mix with voice upfront"

"Music-forward mix for TikTok"

"Auto sound design and mute any embedded video audio"

Volume targets

The agent applies:

| Type | Target | Notes |
|------|--------|-------|
| Voice | -6 to -12 LUFS | Clear and prominent |
| Music | -20 to -24 LUFS | Supporting, doesn't mask voice |
| SFX | -12 to -18 LUFS | Audible but not jarring |
| Embedded video | Often muted | Use clean separate audio when available |
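
If you check the result with a loudness meter after the mix, the table can be treated as a simple lookup. The sketch below is illustrative only: the names `TARGET_RANGES_LUFS`, `is_within_target`, and `correction_db` are hypothetical, and aiming for the midpoint of each range is an assumption, not documented agent behavior.

```python
# Hypothetical lookup built from the table above (values in LUFS).
TARGET_RANGES_LUFS = {
    "voice": (-12.0, -6.0),
    "music": (-24.0, -20.0),
    "sfx": (-18.0, -12.0),
}


def target_lufs(audio_type: str) -> float:
    """Midpoint of the documented range (assumed aim point)."""
    low, high = TARGET_RANGES_LUFS[audio_type]
    return (low + high) / 2.0


def is_within_target(audio_type: str, metered_lufs: float) -> bool:
    """True if a metered loudness falls inside the documented range."""
    low, high = TARGET_RANGES_LUFS[audio_type]
    return low <= metered_lufs <= high


def correction_db(metered_lufs: float, audio_type: str) -> float:
    """Gain change in dB that would move a metered clip to the midpoint."""
    return target_lufs(audio_type) - metered_lufs


def db_to_linear(db: float) -> float:
    """Convert a dB gain change to a linear amplitude multiplier."""
    return 10.0 ** (db / 20.0)
```

For example, a voice clip metered at -15 LUFS sits below the voice range, and `correction_db(-15.0, "voice")` suggests raising it by 6 dB to reach the assumed midpoint.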

Fades

Auto-adds:

  • Music — 1-2 second fade in and out
  • SFX — quick fade in and out
  • Voice — subtle fade when a clip starts abruptly
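
To make the effect of a fade concrete, here is a minimal sketch of a fade envelope. It assumes a linear ramp, which the source does not specify (the actual curve may differ), and `fade_gain` is a hypothetical helper, not part of the product.

```python
def fade_gain(t: float, duration: float, fade_in: float, fade_out: float) -> float:
    """Gain multiplier (0.0 to 1.0) at time t seconds into a clip.

    Assumes linear fades; fade_in and fade_out are lengths in seconds.
    """
    if t < 0.0 or t > duration:
        return 0.0  # outside the clip, nothing plays
    gain = 1.0
    if fade_in > 0.0:
        gain = min(gain, t / fade_in)  # ramp up from the clip start
    if fade_out > 0.0:
        gain = min(gain, (duration - t) / fade_out)  # ramp down to the clip end
    return gain
```

With a 10-second music clip and 2-second fades, the multiplier is 0.5 one second in, 1.0 in the middle, and 0.5 again one second before the end.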

Audio type detection

Based on:

  • Asset names — "music," "bgm," "voice," "VO," "sfx"
  • Track position — Audio track 0 (video embedded) vs separate tracks
  • Audio characteristics — stereo, mono, branching
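
The name and track-position signals above can be sketched as a small classifier. This is an illustration under stated assumptions, not the agent's actual logic: the function name `classify`, the whole-token matching rule, and treating track 0 as always embedded are all assumptions. Only the keywords themselves come from the list above.

```python
import re

# Keywords taken from the list above; the grouping is an assumption.
KEYWORDS = {
    "voice": ("voice", "vo"),
    "music": ("music", "bgm"),
    "sfx": ("sfx",),
}


def classify(asset_name: str, track_index: int) -> str:
    """Guess an audio type from the asset name and track position."""
    if track_index == 0:
        return "embedded"  # assumed: track 0 carries the video's own audio
    # Split on non-alphanumerics so "vo" matches "VO_take2" but not "video".
    tokens = re.split(r"[^a-z0-9]+", asset_name.lower())
    for audio_type, words in KEYWORDS.items():
        if any(word in tokens for word in words):
            return audio_type
    return "other"
```

Matching whole tokens rather than substrings is a design choice for this sketch: it is why descriptive names such as "music-upbeat" or "voice-intro" classify cleanly, while an ambiguous name falls through to "other".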

Ducking and sidechain

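When voice and music play together, the agent ducks the music by setting static clip levels rather than applying real-time sidechain compression: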

  • Voice remains clear (-6 to -12 LUFS)
  • Music drops to support (-20 to -24 LUFS)
  • Relative levels preserve intelligibility
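
One way to picture this kind of clip-level ducking is as a timeline overlap check: any music clip that shares time with a voice clip gets the lower music level. The sketch below assumes a simple `Clip` record with start and end times in seconds; `Clip`, `overlaps`, and `music_under_voice` are hypothetical names, and the agent's real selection logic is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    name: str
    kind: str     # e.g. "voice", "music", "sfx"
    start: float  # timeline position in seconds
    end: float


def overlaps(a: Clip, b: Clip) -> bool:
    """True if the two clips share any span of timeline time."""
    return a.start < b.end and b.start < a.end


def music_under_voice(clips: list[Clip]) -> list[str]:
    """Names of music clips that coincide with at least one voice clip."""
    voices = [c for c in clips if c.kind == "voice"]
    return [
        c.name
        for c in clips
        if c.kind == "music" and any(overlaps(c, v) for v in voices)
    ]
```

Because the level is set once per clip, a music clip that overlaps voice for only a few seconds is lowered for its whole length. Splitting the music clip at the voice boundaries is one manual workaround.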

Tips

  • Name assets clearly — "music-upbeat," "voice-intro" helps detection
  • Separate tracks — keep voice on its own track for clean detection
  • Preview after mix — adjust if specific moments need more/less
  • Manual override — fine-tune any clip after auto-mix; only affected clips are processed

Limitations

  • Uses filename heuristics, not audio analysis (doesn't "hear" the content)
  • Sets constant levels per clip (no dynamic ducking during playback)
  • Works on clip volumes, not track automation
