Everything you can ask for

A plain-English reference of everything the editing agent can do — what to ask for, what happens, and the defaults that matter. Organized by job.

This is the full menu of what the editing agent can do for you. You never pick operations from a list — you describe an outcome in plain English and the agent works out how to get there — but knowing what's possible helps you ask for exactly the edit you want. You can point the agent at things with anchors like @project, @clip, or @range(0:05-0:12).

Some capabilities are read-only: they look at your project, media, or render and report back without changing the timeline. Everything else makes normal timeline edits you can undo with Cmd/Ctrl+Z.

Plan and inspect

Ask for a plan first — instead of editing, the agent drafts a reviewable, numbered plan. You get a plan card with Run plan / Discard (and the option to save it as a recipe). This happens when you explicitly ask to see a plan; normally the agent just edits.
Set the storyboard — the agent writes the editorial plan for the scene: a one-line thesis, a target length, and ordered beats (hook, context, payoff…) tied to time ranges in your footage. It's shown as a card above the chat and kept across turns. See /docs/agent/storyboard.
Size up the project — read-only. A quick intelligence pass over the project: what footage you have, whether a transcript and visual analysis exist yet, the likely output format, and what the agent would do next. The agent runs this on its own before "figure it out" edits, and you can ask for it directly:

Look at what's in this project and tell me what you'd make from it — don't edit anything yet.
Analyze your footage visually — the agent has AI actually look at your media (full-video understanding first, falling back to labeling a sparse set of frames) and remembers scene summaries, the best ranges, suggested hooks, quality and caption risks, and cropping guidance for later edits. It never changes the timeline, but it does count against the free plan's AI budget. See /docs/agent/visual-analysis.
Inspect one clip — read-only. Pulls the full details of a single timeline clip when you (or the agent) need more than the project summary shows.

Get media

Find stock footage or photos — searches the built-in Pexels stock library (free, commercially usable). By default the results land one after another on the main track; you can also ask for them to just be imported into your media bin so they can be placed later as B-roll at exact moments. It fetches 3 results by default (up to 8). See /docs/media/stock.
Add music or a sound effect — searches the built-in Freesound library (all CC0). Ask for background music and it adds a music track; ask for a sound effect and it drops a short SFX at the moment you name.
Generate a voiceover — turns your script into text-to-speech narration on an audio track. 10 voices to choose from (Alloy by default), speed from half to double, and up to 4096 characters per request. See /docs/agent/text-to-speech.
Lay out your imports — takes files you've imported but haven't placed yet and appends them to the timeline: video and images onto the main track, audio onto a new audio track. You can name exactly which files.
Place B-roll overlays — overlays images or videos you've already imported as B-roll, either at beats the transcript suggests or at the exact times you give. Defaults: up to 4 placements of about 2.5 seconds each. It never fetches media by itself — ask the agent to pull stock into your bin first, then place it. See /docs/agent/b-roll-insert.

Stock, sounds, and voiceover need their provider keys configured on the server; without them, the agent skips these features rather than failing the whole run.

Build the cut

Close silent gaps — closes the dead air between adjacent clips on the main track (any gap longer than 0.3 seconds by default) by sliding later clips left. This is gap-closing only — it doesn't listen to the audio or cut pauses inside one continuous clip; for that, ask it to remove fillers instead. See /docs/agent/cut-silences.
Cut by timestamps — tell the agent which parts of a video to keep ("keep 0:00–0:35 and 0:46–0:54") and it stitches them into a clean cut on the main track. If you uploaded a separate clean audio recording, it mirrors the same cuts onto it by default — and you can also ask it to rebuild just the audio to match the video's existing trims, which recovers sync after video-only cuts.

Keep 0:00–0:35, 0:46–0:54, and 2:10–2:30, and cut my clean audio file to the same ranges.
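If it helps to picture what a keep-list request turns into, here is a minimal sketch of the idea, not the agent's actual implementation. The function names (`parse_timestamp`, `parse_keep_ranges`, `stitch`) and the dictionary keys are hypothetical and chosen only for illustration. The sketch reads the time ranges out of a request and lays the kept pieces back to back on the output timeline.

```python
import re

def parse_timestamp(text: str) -> float:
    """Convert 'M:SS' or 'H:MM:SS' into seconds."""
    seconds = 0.0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

def parse_keep_ranges(request: str) -> list[tuple[float, float]]:
    """Find every 'start-end' pair (en dash or hyphen) in a plain-English request."""
    pattern = r"(\d+(?::\d{2})+)\s*[–-]\s*(\d+(?::\d{2})+)"
    pairs = re.findall(pattern, request)
    return sorted((parse_timestamp(a), parse_timestamp(b)) for a, b in pairs)

def stitch(ranges: list[tuple[float, float]]) -> tuple[list[dict], float]:
    """Place each kept source range back to back; return placements and total length."""
    placed, cursor = [], 0.0
    for start, end in ranges:
        if end <= start:
            raise ValueError(f"empty or reversed range: {start}-{end}")
        placed.append({"source_start": start, "source_end": end, "timeline_start": cursor})
        cursor += end - start
    return placed, cursor

ranges = parse_keep_ranges("Keep 0:00–0:35, 0:46–0:54, and 2:10–2:30")
placed, total = stitch(ranges)
print(total)  # 63.0 seconds of kept footage
```

The same list of ranges could be applied to a separate clean audio file, which is why mirrored cuts stay in sync. The sketch does not check for overlapping ranges.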
Remove ums and repeated takes — cuts filler-only moments (um, uh, you know) and near-duplicate repeated takes, then ripples every track so everything stays in sync. It needs the scene transcript that captioning produces, and it works at the level of transcript segments, not individual words. You can ask it to remove only fillers or only repeats; a few hundredths of a second of padding is kept around each cut.

Caption this first, then cut the ums and the repeated takes.
Keep only the best lines — keeps the highest-signal parts of the transcript up to a target length (30 seconds by default), optionally steered by a goal you state ("focus on the pricing announcement"), and removes the rest with everything re-synced. Needs a transcript first. See /docs/agent/transcript-highlights.
Build a highlight reel — assembles the best moments using AI-ranked visual ranges when your footage has been analyzed, otherwise the highest-energy audio moments. Defaults to a 30-second reel built from roughly 4-second windows. See /docs/agent/extract-highlights.
Trim to an exact length — hard-trims the main track to the length you name, splitting whatever clip straddles the cut point.
Compress to a time budget — fits a target length while keeping as much content as possible: first closes gaps, then speeds clips up slightly (up to 1.25× by default), and only trims the tail as a last resort.

Get this under 60 seconds — speed things up slightly before you trim anything off the end.
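The order of operations in a time-budget compression (close gaps, then speed up, then trim the tail) can be sketched as simple arithmetic. This is an illustrative model only. The function names are hypothetical, and it assumes that nothing changes when the cut already fits, that gaps are closed all at once, and that one uniform speed is applied to the whole timeline. The 0.3-second gap threshold and the 1.25× ceiling are the defaults described above.

```python
def closable_gap_time(clips, min_gap=0.3):
    """Total dead air between adjacent (start, end) clips, counting only gaps over min_gap."""
    ordered = sorted(clips)
    total = 0.0
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = next_start - prev_end
        if gap > min_gap:
            total += gap
    return total

def plan_compression(length, closable_gaps, target, max_speed=1.25):
    """Plan how to reach `target` seconds: gaps first, then speed, then tail trim."""
    if length <= target:
        # Assumption: a cut that already fits is left untouched.
        return {"gaps_removed": 0.0, "speed": 1.0, "tail_trim": 0.0, "final": length}
    after_gaps = length - closable_gaps               # step 1: close gaps
    speed = min(max_speed, max(1.0, after_gaps / target))
    after_speed = after_gaps / speed                  # step 2: speed up, capped
    tail_trim = max(0.0, after_speed - target)        # step 3: last resort
    return {
        "gaps_removed": closable_gaps,
        "speed": speed,
        "tail_trim": tail_trim,
        "final": after_speed - tail_trim,
    }

clips = [(0, 10), (10.2, 20), (25, 80)]   # (start, end) in seconds
gaps = closable_gap_time(clips)           # only the 5-second gap counts
print(plan_compression(80, gaps, target=60))
```

In this example the 5-second gap and a 1.25× speed-up are enough to reach 60 seconds, so nothing is trimmed from the end.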
Cut to the beat — splits the main track at beat-like peaks in the audio's energy. Up to 12 cuts by default, spaced at least 1.2 seconds apart. It follows energy and onsets, not full BPM tracking.

Add an energetic music track, then cut the montage to the beat.
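To show why beat cuts follow energy peaks rather than a fixed tempo, here is a rough sketch of peak picking under the stated defaults (at most 12 cuts, at least 1.2 seconds apart). It assumes the audio has already been reduced to a list of energy values sampled every `hop_seconds`. That input format and the function name `pick_beat_cuts` are assumptions for illustration, not the agent's real analysis pipeline.

```python
def pick_beat_cuts(energy, hop_seconds, max_cuts=12, min_spacing=1.2):
    """Return cut times at the strongest local energy peaks, respecting spacing and count."""
    # A local peak rises above its left neighbor and does not fall below its right one.
    peaks = [
        (energy[i], i * hop_seconds)
        for i in range(1, len(energy) - 1)
        if energy[i] > energy[i - 1] and energy[i] >= energy[i + 1]
    ]
    chosen = []
    for _, time in sorted(peaks, reverse=True):  # strongest peaks first
        if len(chosen) == max_cuts:
            break
        if all(abs(time - other) >= min_spacing for other in chosen):
            chosen.append(time)
    return sorted(chosen)

energy = [0, 1.0, 0, 0.9, 0, 0.2, 0, 0.8, 0]   # one value every 0.5 s
print(pick_beat_cuts(energy, hop_seconds=0.5))  # [0.5, 3.5]
```

The peak at 1.5 seconds is strong but sits too close to the stronger one at 0.5 seconds, so it is dropped. That is the spacing rule doing its job.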
Mute the camera audio — turns off the embedded camera sound on your video clips so a separate clean audio track carries the voice. Overlay clips keep their sound unless you say otherwise.

Mute the camera audio on all clips and use my uploaded clean audio track instead.
Rearrange specific clips — structural edits on the clips you point at: delete, duplicate, split at a time, group into a compound clip, link or unlink, or move.

Frame and style

Go vertical — sets the canvas to 1080×1920 and scales every visual clip (main track and overlays) to fill or fit the frame. This is a straight resize — no face tracking or source cropping.
Reframe around the subject — a smarter vertical reframe that uses what the visual analysis found (run that first): faces are framed toward the top, screen recordings and text fit instead of cropping, products sit lower in frame. You can steer the focus — face, product, screen, center, left, or right — or let it decide. It's heuristic framing, not pixel-level tracking. See /docs/agent/smart-reframe.
Fit visuals to the canvas — scales and centers visuals to fit or fill the current canvas without changing its size.
Flatten B-roll onto the main track — moves video and image clips off the overlay tracks and lines them up one after another on the main track; text and caption overlays stay put. The agent does this before vertical social cuts so B-roll isn't left stacked as picture-in-picture.

Take the B-roll off the overlay tracks and lay everything out one after another on the main track.
Add zoom punches — short zoom-in punches for energy. By default it places 5 subtle ones (about a 12% push-in) automatically, or you can name exact moments.

Add subtle zoom punches at 3, 9, and 15 seconds.
Add transitions — fade, fade-in/out, crossfade, dip, slides, zooms, spin, pop, or flash. Ask for whole clips to animate in and out (crossfade blends neighbors), or name exact seconds to drop a brief transition anywhere — even inside one continuous clip. Transitions default to 0.35 seconds.
Apply effects and color grades — adds or removes a visual effect or grade: brightness/contrast/saturation adjustments, LUT presets, blur, glow, vignette, grain, sharpen, chromatic aberration, pixelate, posterize, invert, or chroma-key.
Adjust clip properties — sets a clip's position and size, opacity, volume (in dB), audio fades, playback speed (pitch-preserved by default), visibility, or name — and clears any conflicting keyframes so the new value actually holds.
Apply a style pack — one of Viral Cut, Clean Educator, Cinematic Doc, UGC Ad, Neon Gaming, or Warm Founder: background, clip treatment, and text styling in one pass. Only the agent can apply these. See /docs/agent/style-packs.
Apply a creator template — a full workflow in one ask: TikTok Highlight, Podcast Clip, UGC Ad, Product Demo, Cinematic Story, or Music Cutdown. It sets the canvas direction, style pack, opening hook, and caption look. The caption look only applies when captions already exist — templates never transcribe on their own. See /docs/agent/creator-templates.
Change project settings — canvas size, background color, adding/removing/muting/hiding tracks, and adding, renaming, or deleting scenes. See /docs/editing/scenes.

Text, cards and sound

Caption the video — transcribes the timeline's speech and adds styled captions as a reviewable text track (TikTok-style by default). Captions are timed to speech at the sentence-segment level with syllable-weighted pacing — not word-by-word. This is also what produces the scene transcript that filler removal and transcript highlights depend on.
Restyle the captions — swaps the caption look without retranscribing: TikTok Bold, Minimal Clean, Boxed Contrast, Editorial Highlight, Neon Pop, or Documentary Lower. See /docs/agent/caption-skins.
Add an opening hook — inserts a styled hook text overlay at the start (3 seconds by default; bold, boxed, or minimal looks). The best hooks quote your own transcript rather than stock phrases.
Add a title or text — a styled text overlay: font, size, weight, color, alignment, position, and an optional background box.
Add a shape — a graphic overlay: rectangle, ellipse, triangle, star, heart, arrow, line, speech bubble, progress bar, and more.
Add an animated overlay card — a transparent-background animated graphic: 11 lower-third designs (like Clean Bar), data callouts and stat count-ups, charts, code cards, X Post receipts, follow CTAs, a news ticker, and an outro card — plus fully custom cards the agent can design from scratch (self-contained, animated with CSS only). See /docs/editing/animated-overlay-cards.
Balance the audio mix — a creator-style mix of the audio already on your timeline: voice at 0 dB, music beds at −18 dB, sound effects at −8 dB, quarter-second fades, and it can mute embedded camera audio when a separate voice track exists. It mixes what's there — it doesn't fetch music. See /docs/agent/sound-design.
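For a feel of what the mix levels mean, the sketch below converts the default dB values into linear gain using the standard amplitude formula, gain = 10^(dB/20), and models a quarter-second linear fade-in. The dictionary, the function names, and the linear fade shape are assumptions made for illustration. The source only states the levels and the fade length.

```python
MIX_LEVELS_DB = {"voice": 0.0, "music": -18.0, "sfx": -8.0}  # defaults described above
FADE_SECONDS = 0.25

def db_to_gain(db: float) -> float:
    """Convert decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)

def fade_in_gain(seconds_into_clip: float, fade: float = FADE_SECONDS) -> float:
    """Linear ramp from 0 to 1 over the fade length (assumed shape)."""
    return max(0.0, min(1.0, seconds_into_clip / fade))

for role, db in MIX_LEVELS_DB.items():
    print(f"{role}: {db:+.0f} dB is about {db_to_gain(db):.3f}x amplitude")
```

Read this way, a music bed at −18 dB plays at roughly one eighth of the voice's amplitude, which is why it sits under speech instead of competing with it.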

Check the work

Run pass/fail checks — read-only. Simple checks against the project: is the video under a target length, does it have captions, does it have media, is the canvas portrait, is the timeline non-empty.
Get a critique — read-only. A quality review of the edit — content, canvas, captions, hook, overlay mistakes, duration, and transcript repetition — through the lens you pick: social (the default), podcast, UGC ad, product demo, story, or music.
Have AI watch the edit — read-only. Renders the current cut and has AI watch it like an editor would: readability, pacing dead spots, audio problems, the hook, and a ship/no-ship verdict with concrete fixes. Reviews work for cuts up to 2 minutes; longer timelines are skipped.

The agent runs these on its own before finishing — and a quality gate sends it back to fix things if its latest check failed. See /docs/agent/quality-checks.
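
The pass/fail checks and the quality gate can be pictured with a small sketch. Everything here is hypothetical: the `ProjectSnapshot` fields, the check names, and the choice to treat "under a target length" as less than or equal are assumptions, not the product's real data model.

```python
from dataclasses import dataclass

@dataclass
class ProjectSnapshot:
    duration: float        # seconds
    canvas_width: int
    canvas_height: int
    clip_count: int        # clips on the timeline
    media_count: int       # imported media files
    caption_count: int     # caption segments

def run_checks(project: ProjectSnapshot, max_duration: float) -> dict[str, bool]:
    """Evaluate the five simple checks; this only reads the snapshot."""
    return {
        "under_target_length": project.duration <= max_duration,  # assumed inclusive
        "has_captions": project.caption_count > 0,
        "has_media": project.media_count > 0,
        "canvas_is_portrait": project.canvas_height > project.canvas_width,
        "timeline_non_empty": project.clip_count > 0,
    }

def quality_gate(results: dict[str, bool]) -> list[str]:
    """Return the names of failed checks; an empty list means the gate passes."""
    return [name for name, passed in results.items() if not passed]

draft = ProjectSnapshot(duration=72.0, canvas_width=1080, canvas_height=1920,
                        clip_count=9, media_count=4, caption_count=0)
print(quality_gate(run_checks(draft, max_duration=60.0)))
# ['under_target_length', 'has_captions']
```

A non-empty list is the kind of result that would send the agent back to fix things before it finishes.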