Dress up a talking head with B-roll and cards
Tutorial: add concept B-roll at the exact transcript moments, plus animated overlay cards (a lower-third intro and a stat callout), to a plain talking-head video.
A talking-head video where nothing on screen ever changes is a scroll-past. This tutorial takes a finished (or rough) talking-head cut and layers in the two things that make it feel produced: B-roll that shows the concepts as they're spoken, and animated overlay cards — a lower-third introducing the speaker, and a stat callout that lands the moment a number is said. About 10 minutes, one prompt plus refinements.
What you'll add
- 2–4 short stock B-roll overlays, each landing on the exact transcript moment where the concept is mentioned
- An animated lower-third card introducing the speaker in the first seconds
- A stat callout card that appears when the speaker says the key number
- Nothing else — restraint is the point
Before you start
- Have a talking-head video on the timeline. It works on a raw clip, but it shines on a cut that's already tight (run the podcast tutorial or one-prompt edit first).
- Captions help even if you don't want them visible: transcribing gives the agent the transcript it uses to place B-roll and cards. If there's no transcript yet, the agent will create one.
Step 1: One prompt
Dress this up. Add stock B-roll at the moments where I explain concrete concepts — keep me as the star, two or three overlays max. Introduce me with a clean lower-third ("Sam Rivera — Founder, Driftwood") in the first seconds, and put a stat card on screen when I say "forty percent".
What the agent actually does
- Reads the transcript. It finds the concrete, showable concepts ("the dashboard", "shipping containers", "a rainy day") and the moments they're spoken. If no transcript is cached, it transcribes first.
- Imports B-roll without placing it. search_stock_media with place: 'bin' searches the built-in Pexels library and imports matching clips into the media bin only; nothing lands on the timeline yet. The tool returns the imported media ids.
- Overlays at the transcript moments. insert_broll_from_assets takes those media ids plus the exact times and places each clip as an overlay while you keep talking underneath. Defaults: up to 4 placements, ~2.5 seconds each. The speaker's audio never cuts: B-roll here is picture, not a scene change.
- Bakes the cards. add_overlay_card creates each card as a designed, animated graphic with a transparent background, timed to the transcript: the lower-third (preset lt-clean-bar) in the opening seconds with title: "Sam Rivera", subtitle: "Founder, Driftwood", and a data-callout card starting at the exact second "forty percent" is spoken. Cards default to 4 seconds on screen. They're baked to an animated image at creation, so they play identically in preview and export.
- Checks the result. The finish sequence (critique_edit → verify → review_edit) includes lint checks for exactly the mistakes this kind of layering invites: overlapping text and text outside the frame. See How the agent checks its work.
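The tool sequence above can be pictured as a handful of payloads. This is a rough sketch, not the product's actual schema: the tool names, place: 'bin', the preset ids, the title and subtitle, and the stated defaults come from the steps above, while the ToolCall wrapper and the query, placements, mediaId, start, and duration field names are assumptions made for illustration. The stat card's own text fields are left out because the steps do not name them.

```typescript
// Illustrative only: field names other than place, preset, title, and
// subtitle are assumptions, not the documented tool schema.
type ToolCall = { tool: string; args: Record<string, unknown> };

interface ImportedClip {
  mediaId: string; // id returned by the bin import (string type assumed)
  at: number;      // second in the transcript where the concept is spoken
}

const MAX_PLACEMENTS = 4;  // stated default: up to 4 placements
const BROLL_SECONDS = 2.5; // stated default: about 2.5 s per overlay
const CARD_SECONDS = 4;    // stated default: cards stay 4 s on screen

// Step 1: import matching stock into the media bin only; nothing is placed.
function binImportCall(query: string): ToolCall {
  return { tool: "search_stock_media", args: { query, place: "bin" } };
}

// Step 2: overlay the imported clips at the spoken moments, capped at the default.
function overlayCall(clips: ImportedClip[]): ToolCall {
  const placements = clips.slice(0, MAX_PLACEMENTS).map((c) => ({
    mediaId: c.mediaId,
    start: c.at,
    duration: BROLL_SECONDS,
  }));
  return { tool: "insert_broll_from_assets", args: { placements } };
}

// Step 3: one lower-third near the start, one stat callout on the spoken number.
function cardCalls(introAt: number, statAt: number): ToolCall[] {
  return [
    {
      tool: "add_overlay_card",
      args: {
        preset: "lt-clean-bar",
        title: "Sam Rivera",
        subtitle: "Founder, Driftwood",
        start: introAt,
        duration: CARD_SECONDS,
      },
    },
    {
      tool: "add_overlay_card",
      args: { preset: "data-callout", start: statAt, duration: CARD_SECONDS },
    },
  ];
}
```

The point of the split is the ordering: the import step never touches the timeline, so placement is decided only once the media ids and the transcript times are both known.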
What the timeline looks like after
Your talking-head cut untouched on the main track; a few short video overlays and two card elements on the overlay tracks, each aligned to a spoken moment. Every element is a normal timeline clip — drag, retime, or delete like anything else.
Step 2: Refine conversationally
Swap a B-roll shot:
The clip over "shipping containers" is a port at night — find a daytime one instead.
Retime a card:
The stat card comes in half a second late — start it right on the word "forty".
Different card style:
Try the Soft Pill lower-third instead, with our brand orange (#ff5a36) as the accent.
Add a follow CTA at the end:
Add a TikTok follow card with @driftwood for the last 3 seconds.
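Retiming a card onto a spoken word, as in the stat-card refinement above, comes down to a transcript lookup. The following is a minimal sketch of that idea, assuming a word-level transcript with times in seconds; the TranscriptWord shape and the findPhraseStart helper are hypothetical and are not the agent's real internals.

```typescript
// Hypothetical word-level transcript entry; times are in seconds.
interface TranscriptWord {
  text: string;
  start: number;
  end: number;
}

// Lowercase and strip punctuation so "Forty" and "percent." still match.
function normalizeToken(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Return the start time of the first occurrence of the phrase, or null.
function findPhraseStart(words: TranscriptWord[], phrase: string): number | null {
  const target = phrase
    .split(/\s+/)
    .map(normalizeToken)
    .filter((t) => t.length > 0);
  if (target.length === 0) return null;

  for (let i = 0; i + target.length <= words.length; i++) {
    const matches = target.every((t, j) => normalizeToken(words[i + j].text) === t);
    if (matches) return words[i].start;
  }
  return null;
}
```

With a transcript in which "forty" starts at 31.2 seconds, findPhraseStart(words, "forty percent") yields 31.2, which is the start time a stat card would need to land on the word.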
Picking cards: the 23-preset library
There are 23 card presets, organized by what they do. A few worth knowing by name:
| Category | Examples |
|---|---|
| Lower thirds (11) | Clean Bar (the default intro card), Soft Pill, Dark Card, Bold Block, YouTube Subscribe |
| Data / stats (5) | Data Callout, Stat Count-Up (rolling digits — great for the money number), Pull Quote, Bar Chart, Code Card |
| Social (5) | X Post (put the receipts on screen), Reddit Post, Follow CTA (tiktok / instagram / youtube variants), Notification, Music Card |
| Misc (2) | News Ticker, Outro Card |
You can name them in plain language ("a clean lower-third", "a count-up stat card") or browse them yourself in the Cards tab of the asset panel. Full list and parameters: Animated overlay cards.
Taste: when to stop
The agent already holds itself to this, and you should too when refining:
- 1–2 cards per 30 seconds, maximum. A card should mark the most important moment on screen; go past that rate and nothing reads as important.
- One text block on screen at a time. A card over captions over a hook is noise — the lint check will flag it, but don't ask for it in the first place.
- B-roll shows nouns, not vibes. Cut to the dashboard when the speaker says "dashboard". Generic "office people typing" B-roll under an unrelated sentence reads as filler.
- The speaker stays the star. If more than a third of the video is covered by overlays, you've made a slideshow.
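The numeric rules above are easy to check by hand, and the arithmetic can be sketched in a few lines. This is an illustration under stated assumptions, not the agent's lint: the OverlayClip shape and checkTaste are hypothetical, "covered by overlays" is read as the union of all overlay time spans (B-roll and cards alike), and the one-text-block rule is approximated as "no two cards on screen at once", ignoring captions and hooks.

```typescript
// Hypothetical overlay description; times are in seconds.
interface OverlayClip {
  kind: "broll" | "card";
  start: number;
  duration: number;
}

function checkTaste(videoSeconds: number, overlays: OverlayClip[]): string[] {
  const warnings: string[] = [];
  const cards = overlays
    .filter((o) => o.kind === "card")
    .sort((a, b) => a.start - b.start);

  // Rule: at most 2 cards starting inside any 30-second window.
  const tooDense = cards.some(
    (c) => cards.filter((o) => o.start >= c.start && o.start < c.start + 30).length > 2
  );
  if (tooDense) warnings.push("more than 2 cards within 30 seconds");

  // Rule (approximation): no two cards on screen at the same time.
  for (let i = 1; i < cards.length; i++) {
    if (cards[i].start < cards[i - 1].start + cards[i - 1].duration) {
      warnings.push("two cards overlap in time");
      break;
    }
  }

  // Rule: overlays cover at most a third of the video (union of time spans).
  const spans = overlays
    .map((o) => [o.start, o.start + o.duration] as [number, number])
    .sort((a, b) => a[0] - b[0]);
  let covered = 0;
  let coveredUntil = -Infinity;
  for (const [s, e] of spans) {
    if (e <= coveredUntil) continue; // fully inside an earlier span
    covered += e - Math.max(s, coveredUntil);
    coveredUntil = e;
  }
  if (covered / videoSeconds > 1 / 3) warnings.push("overlays cover more than a third of the video");

  return warnings;
}
```

For a 60-second cut with a 4-second lower-third at 1 s, a 4-second stat card at 31 s, and two 2.5-second B-roll overlays, the covered time is 13 seconds, well under a third, and the function returns no warnings.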
Troubleshooting
The B-roll ended up appended to the end of my video instead of overlaid. That's place: 'append' behavior (the stock tool's default when used for building sequences). Say "undo that — import the stock to the bin and overlay it while I'm talking instead," which is the place: 'bin' + insert_broll_from_assets path.
Stock search returns nothing usable for a niche concept. Pexels is broad but generic — "Kubernetes control plane" won't hit. Ask for the nearest visual metaphor ("server racks", "traffic control room"), or import your own screen recording and tell the agent to use that asset as the overlay.
A card sits over the captions. Ask the agent to move it: "the lower-third overlaps the captions — move it up" — or nudge the card element in the preview. The review_edit pass usually catches this, but a manual eye never hurts.
Cards are missing when you open the project on another device. Card graphics are stored locally in your browser (in IndexedDB), not with the project, so a project opened in another browser or on another device loses them. Re-add the cards on the new device, or export from the machine you built on. This is a known limitation.