Theo Browne On AI Video Editors — A Builder's Reply


Two chrome hands shaking over a glowing tweet card, magenta-cyan rim light, a CRT timeline visible behind them

The party

About a week ago, on a Monday night in May, I was at a party in California, holding a chrome-colored rocks glass full of sparkling water and trying to remember anyone's name. Theo Browne — @theo — was there. We ended up in the kind of conversation you can only really have once you've abandoned the small-talk perimeter: AI tools that touch creative work.

I'm not going to pretend I remember exactly what either of us said. I had two seltzers and zero notes. But the spirit of what Theo said to me, as best I can reconstruct it, was something like: video editors should sit down and actually edit their videos, line by line, themselves. If you really care about the work, you'll do the work. He was friendly about it. He wasn't lecturing me. He was being honest the way he is on a podcast — direct, opinionated, willing to stake the position.

I remember thinking two things at the same time. One: he's not wrong about the failure mode. Most "AI video editor" products in 2026 are, frankly, embarrassing. Two: he is — almost certainly — also not the maximalist "AI is for cowards" guy that the discourse sometimes paints AI skeptics as. This is the founder of T3 Chat. This is a public investor in Cursor, whose impact on his own velocity he's called "insane." Theo is a guy who has thought about where AI helps and where it hurts. The party take was a sharp, well-aimed version of a real concern, not a Luddite confession.

So I sat with it. I went home, re-read the public version of his take, and wrote this.

What Theo actually said in public

Here is the version of Theo's position I'm going to engage with on the merits. From May 2025, in reply to one of the many "cursor for video editing" launches:

"Why is every AI video editor made by someone who doesn't seem to get video editing?"

That's the whole tweet. (@theo, May 2025) Sixteen words. It does not say AI shouldn't edit video. It says the people building these AI video editors don't understand video editing. Those are two completely different claims.

The party version and the public version are pointing at the same thing from different angles. The party version is what you say when you've watched ten of these tools demo back-to-back and you're tired. The public version is the sharpened claim you can defend.

Both are a builder's critique. Make the thing well, or don't make the thing. I respect that critique enormously — I'd rather be told my product misses the craft than be told it's "amazing" by someone who's never opened a timeline. Theo and I have had public exchanges about this on X over the past year, and they've always made the roadmap better.

So this post is not a rebuttal. It's a builder's reply.

Mock screenshot of Theo Browne's May 2025 tweet asking why every AI video editor is made by someone who doesn't seem to get video editing

Why he's right about most AI video editors

Let me steelman the critique first, because if I don't, the rest of this post is just me dodging.

Most AI video editors in 2026 are bad. Not "they have rough edges" bad. Conceptually bad. Four mistakes at once:

1. They generate when they should edit. You upload a shoot and the tool slides toward generating new pixels, new B-roll, "AI fill" of stuff you didn't shoot. The output stops being yours. Theo's tweet calls this out in sixteen words. Casey Neistat has warned about it for a year in different language. The audience can smell it.
2. They treat editing as one button. "Make it shorter." "Make it pop." "Make it viral." Real editing has fifty decisions in it. A one-button tool collapses fifty decisions into one default.
3. They assume you don't have taste. Most AI editors are designed for someone who doesn't know what a J-cut is, doesn't know where the breath belongs. They condescend by default.
4. They were built by engineers who never sat through a four-hour edit. This is the Theo line. If the people building the tool have never spent a Saturday nudging a fade by one frame at midnight, the tool will show it. No amount of model improvement fixes a product that was wrong at the spec layer.

He's right. He's right four times.

The difference between generative and editorial AI

Here's where my reply starts.

The category Theo is correctly burning down is generative AI for video. Tools whose job is to make video — text-to-video, image-to-video, "AI fill," talking-head generators, the whole pile. There are legitimate uses for some of that. There are also a lot of slop pipelines. The criticism applies cleanly.

But there's a second category the discourse keeps smushing into the first one, and it's the one I've spent the last eighteen months building inside. I call it editorial AI.

Editorial AI doesn't make pixels. It moves them. It takes a timeline of your real footage and performs the same operations a junior editor would: trim, split, ramp, transition, label, sequence. It doesn't generate a "fake you" reading a script. It rearranges the actual you that you actually shot.

The vocabulary matters. Generative means new content. Editorial means new structure on existing content. The audience-trust collapse Theo is worried about is a generative problem. Editorial AI is a different beast — and importantly, it's a beast the editors he respects already use, they just don't call it AI. Avid bins. Scene-detection in Premiere. Auto-transcribe in Descript. Every modern editor's stack has had editorial AI in it for a decade. Nobody's mad.

What makes a new editorial AI tool worth defending isn't that it removes the human. It's that it removes the tax — the click-five-hundred-times tax — so the human can spend more time on the parts only the human can do.

Diagram showing two columns: GENERATIVE AI on the left producing slop, EDITORIAL AI on the right manipulating an existing timeline

What VibeChopper did differently

I'll be specific. Three things, and they map directly back to Theo's critique.

1. The chat is a control surface, not a generator

The chat panel in VibeChopper looks like a chat panel. It's not. It's a director's intercom.

You type, or you speak, what you want the cut to be: "Trim the first five seconds of clip three." "Split clip seven at the laugh." "Cross-dissolve into scene two." "Drop a lower-third on the founder." "Polish the timeline for dead air." The model reads your project (every frame, every transcript line, every clip you already dragged in) and issues a tool call against the existing timeline. The clip moves. The transition appears. The overlay lands on the right beat.

It doesn't write you a paragraph about what it would do. It does it. And — this is the part the engineers-who-never-edited camp miss — it does it on the timeline you already built, with the footage you already shot, in the order you already chose to keep. That's the difference between generation and editing. The chat is the steering wheel, not the engine.
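To make "a tool call against the existing timeline" concrete, here is a minimal TypeScript sketch. Every name in it (Clip, ToolCall, applyToolCall, the two tool names) is hypothetical and not taken from VibeChopper's code; the only point it illustrates is that an editorial tool call changes in and out points on footage that already exists and never introduces a new source file.

```typescript
// Hypothetical types for illustration only; not VibeChopper's real schema.
interface Clip {
  id: string;
  sourceFile: string; // footage the creator actually shot
  inSec: number;      // start offset into the source, in seconds
  outSec: number;     // end offset into the source, in seconds
}

// Two editorial operations, modeled as a discriminated union.
type ToolCall =
  | { tool: "trim_start"; clipId: string; seconds: number }
  | { tool: "split"; clipId: string; atSec: number };

// Returns a new timeline; the input timeline is left untouched.
function applyToolCall(timeline: readonly Clip[], call: ToolCall): Clip[] {
  if (!timeline.some((c) => c.id === call.clipId)) {
    throw new Error(`unknown clip: ${call.clipId}`);
  }
  const result: Clip[] = [];
  for (const clip of timeline) {
    if (clip.id !== call.clipId) {
      result.push(clip);
      continue;
    }
    if (call.tool === "trim_start") {
      const newIn = clip.inSec + call.seconds;
      if (call.seconds < 0 || newIn >= clip.outSec) {
        throw new Error("trim is out of range for this clip");
      }
      result.push({ ...clip, inSec: newIn });
    } else {
      if (call.atSec <= clip.inSec || call.atSec >= clip.outSec) {
        throw new Error("split point is outside the clip");
      }
      // Both halves still point at the same source file.
      result.push(
        { ...clip, id: `${clip.id}a`, outSec: call.atSec },
        { ...clip, id: `${clip.id}b`, inSec: call.atSec },
      );
    }
  }
  return result;
}
```

In this sketch, "trim the first five seconds of clip three" would become { tool: "trim_start", clipId: "clip3", seconds: 5 }. Whatever the real schema looks like, the set of source files after the call is the same as before it.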

2. Every tool call leaves a receipt

This is the one I'd love to drag Theo over to a screen for, because it's what the slop-tool category does not do.

Every time the AI touches the timeline in VibeChopper, it fires a tool event — a typed record of what it did, when, why, on which clip, against which frames, with which transcript snippet as evidence. The server publishes those events over a stream. The chat panel renders them as cards.

Each card has:

a status badge (running, completed, failed),
a clip identity pill with the thumbnail and time range of the clip it touched,
a three-frame strip — IN, MIDDLE, OUT — so you can see the clip the model picked,
a transcript range with the actual line that made the model pick this clip and not the next one,
a before/after timeline diff,
and a one-click button that jumps the playhead to the cut.

If you're an engineer reading this: server/projectEditorEvents.ts publishes a typed event stream (tool_event_started, tool_event_updated, tool_event_completed, timeline_snapshot_created, timeline_changed), and the client renders those events through client/src/components/editor-workstream/ — ToolCallCard, ClipFrameStrip, TranscriptRangePreview, BeforeAfterTimelineDiff, TimelineDeepLinkButton. The card you see in chat isn't a confidence-screen. It's the actual contract of what happened. If you don't trust the model, click the jump button and watch the cut. If the cut's wrong, reject the candidate; the run takes your replacement and keeps going.
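For readers who want a picture of what a typed event stream feeding a card could look like, here is a hedged sketch. The five event names and the three status values come from this post; every payload field, the CardState shape, and the reduceCards function are assumptions made up for illustration, not the contents of server/projectEditorEvents.ts.

```typescript
// Status values as listed in the post.
type CardStatus = "running" | "completed" | "failed";

// Event names are from the post; all payload fields are assumed.
type EditorEvent =
  | { type: "tool_event_started"; toolCallId: string; tool: string; clipId: string }
  | { type: "tool_event_updated"; toolCallId: string; transcriptLine: string }
  | { type: "tool_event_completed"; toolCallId: string; ok: boolean }
  | { type: "timeline_snapshot_created"; snapshotId: string }
  | { type: "timeline_changed"; snapshotId: string };

// Hypothetical state behind one tool-call card.
interface CardState {
  toolCallId: string;
  tool: string;
  clipId: string;
  status: CardStatus;
  transcriptLine?: string;
}

// Folds one event into the card map and returns a new map.
function reduceCards(
  cards: ReadonlyMap<string, CardState>,
  event: EditorEvent,
): Map<string, CardState> {
  const next = new Map(cards);
  switch (event.type) {
    case "tool_event_started":
      next.set(event.toolCallId, {
        toolCallId: event.toolCallId,
        tool: event.tool,
        clipId: event.clipId,
        status: "running",
      });
      break;
    case "tool_event_updated": {
      const card = next.get(event.toolCallId);
      if (card) next.set(event.toolCallId, { ...card, transcriptLine: event.transcriptLine });
      break;
    }
    case "tool_event_completed": {
      const card = next.get(event.toolCallId);
      if (card) {
        next.set(event.toolCallId, { ...card, status: event.ok ? "completed" : "failed" });
      }
      break;
    }
    default:
      // Timeline events would drive the before/after diff; this sketch ignores them.
      break;
  }
  return next;
}
```

The design choice worth noticing is that the card is derived from the events rather than written separately, so what the chat shows cannot drift from what the run reported.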

This is what I mean by editorial AI. The model doesn't ask you to trust it. It hands you the receipts and asks you to verify it.

The other AI video editors Theo's right to be mad at do not do this. They show you a finished cut and a "regenerate" button. You don't see what they touched, why they touched it, or what they considered and didn't pick. You get the slop and the vibe and you're supposed to nod.

We don't do that. The whole product is built around the assumption that you, the creator, are still the one making the editorial decisions — the AI is just doing the wrist work.

3. Brief-as-context — the model reads what you wrote before it cuts

Briefly. Before the AI touches your timeline, you can hand it a brief: paragraph, bullet list, voice memo, sketch of intent. For example: "This is a 30-second trailer for a wedding I shot. The bride hates the song 'A Thousand Years.' Lean on the speeches, not the dance floor. The grandma joke at minute 47 has to land."

The model carries that brief through every tool call. When it picks between two candidate clips, it picks against the brief. When you reject a candidate, the rejection updates the brief.
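The loop in that paragraph (pick against the brief, and let a rejection feed back into it) can be sketched as a toy in TypeScript. This is not how VibeChopper's model works internally; the real thing reads free-form text, while this sketch swaps in simple tag matching so the control flow is visible. Brief, Candidate, and all three functions are hypothetical names.

```typescript
// Toy stand-in for a brief: tags to lean on, tags to avoid, clips already rejected.
interface Brief {
  prefer: string[];
  avoid: string[];
  rejectedClipIds: string[];
}

interface Candidate {
  clipId: string;
  tags: string[];
}

// +1 for each preferred tag, -1 for each avoided tag.
function scoreAgainstBrief(brief: Brief, candidate: Candidate): number {
  let score = 0;
  for (const tag of candidate.tags) {
    if (brief.prefer.includes(tag)) score += 1;
    if (brief.avoid.includes(tag)) score -= 1;
  }
  return score;
}

// Picks the best-scoring candidate that has not been rejected; ties go to the earliest.
function pickCandidate(brief: Brief, candidates: readonly Candidate[]): Candidate | undefined {
  let best: Candidate | undefined;
  let bestScore = -Infinity;
  for (const candidate of candidates) {
    if (brief.rejectedClipIds.includes(candidate.clipId)) continue;
    const score = scoreAgainstBrief(brief, candidate);
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}

// A rejection returns an updated brief, so later picks in the same run respect it.
function rejectCandidate(brief: Brief, candidate: Candidate): Brief {
  return { ...brief, rejectedClipIds: [...brief.rejectedClipIds, candidate.clipId] };
}
```

With the wedding brief above, prefer might hold "speech" and "joke" while avoid holds "dance-floor". Rejecting the first pick and calling pickCandidate again with the updated brief yields the next-best clip without restarting the run.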

The reason this matters in the Theo frame is that the brief is the part of editing only the human can do. What is this cut for. Who is it for. What feeling do you want the audience to leave with. Outsource the brief, you get slop. Write the brief and let the model do the wrist work, you get an edit that's still yours.

The brief is the seat. The chat is the steering wheel. The tool-call cards are the dashboard. The footage is yours, the cut is yours, the taste is yours. The wrist is on loan.

Horizontal diagram showing chat input flowing through a planner into a tool call into a timeline edit, then a feedback loop back to the chat

Mock screenshot of a VibeChopper tool-call card with status badge, clip identity pill, frame strip, transcript line, and timeline jump button

Diagram of the editorial AI surface: chat plan, tool event, clip pill, transcript range, frame strip, before/after diff, timeline deep-link

The five places we still have work to do

I'd be a coward if I didn't list these. Theo's critique was useful precisely because it was specific.

1. The default polish is still too aggressive. Out of the box, the "polish" tool trims too tight on dead air. Experienced editors know that a breath is a beat. The brief override already works; the new default ships next release.
2. Multicam isn't there yet. The AI can pick a take, but not the right angle with the consistency a human would. For now, multicam users do the angle pick by hand.
3. Music selection is a known weak spot. The model is good at where music should rise and fall. It is not yet good at picking the right song. We default to your library; we don't generate stock pop.
4. It's still possible to use the chat to be lazy. "Just make me something good" is a prompt that exists. The brief is the antidote. We need to nudge harder toward writing a real one.
5. The receipts are powerful, but only if you read them. Some users skim the tool-call cards and approve everything. That's a UX failure on our part. We're iterating on which cards demand a verify-click.
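On the first gap, here is one way to picture "a breath is a beat" in code. This is a sketch under stated assumptions, not the polish tool's actual logic: it assumes silences have already been detected as time ranges, and Silence, deadAirCuts, and the breathSec parameter are hypothetical names.

```typescript
// A detected stretch of silence on the timeline, in seconds.
interface Silence {
  startSec: number;
  endSec: number;
}

// Returns the ranges to cut so that every long silence shrinks to breathSec,
// while pauses no longer than breathSec are left alone.
function deadAirCuts(silences: readonly Silence[], breathSec: number): Silence[] {
  if (breathSec < 0) throw new Error("breathSec must not be negative");
  const cuts: Silence[] = [];
  for (const silence of silences) {
    const length = silence.endSec - silence.startSec;
    if (length <= breathSec) continue; // short pause: this is the breath, keep it
    // Keep half the breath on each side and cut the middle.
    cuts.push({
      startSec: silence.startSec + breathSec / 2,
      endSec: silence.endSec - breathSec / 2,
    });
  }
  return cuts;
}
```

A default that trims too tight corresponds to a breathSec that is too small. A brief override would amount to passing a larger value for a cut that needs room.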

Five honest gaps. Any working editor could add ten more. I'd rather have the list public than buried.

A synthwave depiction of two figures talking at a party — chrome glasses, neon palm decor, a glowing pool reflecting on the back wall

The thing that keeps me up about this debate

There's a version of the "do it yourself or you don't care" argument that's really about something deeper than AI. It's about attention as a form of love. When you sit at a timeline for nine hours and watch the same 30-second cut forty times to nail the rhythm — that's the creator caring. The audience feels it. It's the difference between a piece of work that has a person inside it and a piece of work that doesn't.

I share that intuition completely. I don't want to live in a world flooded with content that nobody was inside of.

But — and this is what I'd push back on at the party if I could replay it — the wrist is not where the love lives. The love lives in the brief. In the order. In whose face you held on. In what you cut for the audience's sake and what you kept because you couldn't bear to lose it. In the breath before the punchline. The wrist work — dragging the razor tool five hundred times — is the labor that expresses the taste, but it isn't the taste itself.

If I could take the wrist work off a creator's shoulders and give them back three hours per video, the question isn't "do they care less now?" The question is: what do they do with those three hours? If they ship more slop, faster, Theo wins. If they spend those hours on the brief, on a second pass for pacing, on a third pass for the parts only they can hear, everyone wins.

We're building, deliberately, for the second creator. That's the bet.

A large neon-magenta sign reading WE HEARD YOU mounted above a chrome doorway, with palm-tree silhouettes and a grid floor receding into a sunset

An open invitation

Theo, if you read this, the door's open.

I would genuinely love to have you on a stream. Bring a shoot. Bring a project that matters to you. Sit at VibeChopper with me for an hour. Type the brief. Watch the tool calls. Click the receipts. Reject the cuts you don't like. Tell me where we're missing the craft, and I'll fix it on camera if I can.

You said something at the party I haven't been able to put down: that caring is the work, and the work is what you do yourself. I think you're right about the caring part and I think there's a version of "yourself" that's bigger than the wrist. I'd like to convince you in person, or be wrong in person. Either one moves the product forward.

To the rest of the readers who got this far: the chat-driven edit and the tool-call receipts are both free to try. Drop in a shoot. Write a brief. Read every receipt. If a card says it cut something you didn't want cut, reject it, pick your own clip, and watch the run keep going. That's editorial AI in one paragraph.

The slop pile is real and Theo's right to point at it. So is the other thing.

— Steve

Keep reading

Casey, Hank, and the Myth That Caring Equals Clicking — the broader version of the "do it yourself or you don't care" argument, addressed in full.
We Want to Give Creators Their Power Back — the manifesto behind the product.
Watch the AI Show Its Work — a deep look at the tool-call cards, the event stream, and the receipts.
Tell It What You Want. Watch It Cut. — the chat-as-control-surface walkthrough.

An open chrome studio door spilling magenta light into a dark hallway, a single empty chair facing a glowing CRT timeline inside
