Second-Pass AI Editing With Rubrics and Agent Scoring

The First Draft Is Not the Finish

A first AI video edit draft is useful because it moves the project out of blank-canvas mode. It can choose beats, assemble a rough timeline, follow a prompt, and give the creator something concrete to react to. But a first draft is still a proposal. In VibeChopper, the second pass is where the system asks the production questions that matter: did this segment come from the right source file, does the selected range have evidence, are the alternate candidates better, does the color intent make sense, is the audio plan plausible, has rendering completed, and did validation pass?

That is the difference between chat-shaped editing and product-shaped editing. A chat answer can sound confident because the language is polished. A timeline has to survive playback, export, ownership checks, object storage paths, frame evidence, transcript alignment, undo history, and user trust. The second-pass system exists because VibeChopper is not trying to make a clever paragraph about an edit. It is trying to make an edit that can be inspected, rendered, repaired, and shipped.

The implementation evidence for this post is server/secondPassEditHarness.ts and server/finalDraftRubric.ts, called out in the developer audit under commit 7eeeeaf. That same audit places the work inside the 2026-05-17 to 2026-05-18 platform hardening wave: AI edit runs, tool events, render verification, compositor effects, DATA remediation, upload telemetry, owned auth, passkeys, and themed platform emails. The pattern across that wave is simple. AI can be powerful, but the product has to keep score.

Second-pass review is where that score becomes visible. It is not one monolithic critic model. It is a set of smaller checks that create structured metadata: segment evidence, source-selection confidence, color normalization scores, whole-timeline color intent, music planning coverage, validation state, render verification, and final blockers. Each piece is understandable on its own. Together they decide whether the edit is ready to notify the creator.

A dark VibeChopper second-pass review console scoring an AI-generated video timeline.

Where the Second Pass Fits

The first pass starts with intent. A creator can describe an edit in natural language: cut the dead air, keep the strongest explanation, use the product shots as visual proof, add a restrained music bed, and make the ending feel resolved. The planning layer turns that into candidate timeline segments and tool calls. That plan may already be structured, validated, and traceable, but it is still upstream from a deeper review of the actual media.

The second pass receives the planned segments and reloads the project media through server storage. For each segment, it builds source evidence using the selected videoId, sourceStart, and sourceEnd. That evidence can include source-path presence, frame descriptions at the beginning, middle, and end of the selected range, denser frame descriptions inside the range, transcript lines, and nearby candidate clips. The goal is to stop treating a segment as an abstract JSON object and start treating it as a claim about real footage.

From there, VibeChopper creates a revised segment that carries more review context. The segment keeps the planning fields, but now it also has sourceEvidence, alternateCandidates, selectionConfidence, colorIntent, musicIntent, audioMix, and finishIntent. That shape matters because later tools can read a segment and understand not only what should happen on the timeline, but why the system thinks this source range deserves to be there.
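A minimal sketch of that additive shape follows. Only the field names (videoId, sourceStart, sourceEnd, sourceEvidence, alternateCandidates, selectionConfidence, colorIntent, musicIntent, audioMix, finishIntent) come from the description above. Every type, the `ReviewContext` alias, and the `reviseSegment` helper are hypothetical and exist only to illustrate the idea of extending the plan rather than replacing it.

```typescript
// Hypothetical types: only the field names come from the article.
interface PlannedSegment {
  videoId: string;
  sourceStart: number; // assumed to be seconds into the source file
  sourceEnd: number;
}

interface SourceEvidence {
  sourcePathPresent: boolean;
  frameDescriptions: string[];
  transcriptLines: string[];
  warnings: string[];
}

interface AlternateCandidate {
  videoId: string;
  sourceStart: number;
  sourceEnd: number;
  score: number;
}

// The revised segment extends the planned segment instead of replacing it.
interface RevisedSegment extends PlannedSegment {
  sourceEvidence: SourceEvidence;
  alternateCandidates: AlternateCandidate[];
  selectionConfidence: number;
  colorIntent: string;
  musicIntent: string;
  audioMix: string;
  finishIntent: string;
}

// Everything the second pass adds on top of the plan.
type ReviewContext = Omit<RevisedSegment, keyof PlannedSegment>;

// Planned fields are spread first, so review metadata is purely additive.
function reviseSegment(planned: PlannedSegment, review: ReviewContext): RevisedSegment {
  return { ...planned, ...review };
}

const revised = reviseSegment(
  { videoId: "clip-a", sourceStart: 12, sourceEnd: 18.5 },
  {
    sourceEvidence: {
      sourcePathPresent: true,
      frameDescriptions: ["speaker at desk", "product close-up"],
      transcriptLines: ["Here is how the feature works."],
      warnings: [],
    },
    alternateCandidates: [],
    selectionConfidence: 92,
    colorIntent: "keep skin tones natural",
    musicIntent: "sparse pulse",
    audioMix: "dialogue priority",
    finishIntent: "clean cut",
  },
);
```

Because the planned fields survive untouched, a later tool can still look up the timeline claim (`videoId`, `sourceStart`, `sourceEnd`) on the same object that carries the review context.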

The second pass also emits tool calls as it works. It records a clip_source_selection_agent result, an inspect_source_evidence result, and a clip_color_normalization_agent result per segment when evidence is available. After all segments are reviewed, it adds whole-timeline signals through timeline_color_agent, music_scoring_planning_agent, and refine_plan_second_pass. This makes the review visible in the same style as the rest of the AI edit run: not as hidden deliberation, but as product state.
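The ordering described above can be sketched as a small function. The six tool names are taken from the text. The `ToolCallRecord` shape and the `planToolCalls` helper are assumptions, and this sketch reads "when evidence is available" as gating all three per-segment calls, which is only one possible interpretation.

```typescript
// Tool names come from the article; the record shape is an assumption.
type PerSegmentTool =
  | "clip_source_selection_agent"
  | "inspect_source_evidence"
  | "clip_color_normalization_agent";

type TimelineTool =
  | "timeline_color_agent"
  | "music_scoring_planning_agent"
  | "refine_plan_second_pass";

interface ToolCallRecord {
  tool: PerSegmentTool | TimelineTool;
  segmentIndex?: number; // absent for whole-timeline calls
}

const PER_SEGMENT_TOOLS: PerSegmentTool[] = [
  "clip_source_selection_agent",
  "inspect_source_evidence",
  "clip_color_normalization_agent",
];

const TIMELINE_TOOLS: TimelineTool[] = [
  "timeline_color_agent",
  "music_scoring_planning_agent",
  "refine_plan_second_pass",
];

// hasEvidence[i] says whether source evidence could be built for segment i.
function planToolCalls(hasEvidence: boolean[]): ToolCallRecord[] {
  const calls: ToolCallRecord[] = [];
  hasEvidence.forEach((available, segmentIndex) => {
    if (!available) return; // assumed: no per-segment results without evidence
    for (const tool of PER_SEGMENT_TOOLS) {
      calls.push({ tool, segmentIndex });
    }
  });
  // Whole-timeline signals are added after every segment has been reviewed.
  for (const tool of TIMELINE_TOOLS) {
    calls.push({ tool });
  }
  return calls;
}

// Two of three segments have evidence: 2 * 3 per-segment calls + 3 timeline calls.
const calls = planToolCalls([true, false, true]);
```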

Workflow diagram from first AI edit plan to second-pass refinement and final draft rubric.

Source Evidence Over Vibes Alone

VibeChopper is built around editing by vibe, voice, and precision timeline context. The vibe matters. Creators often know the feeling they want before they know the exact timecode. But once the system chooses a range, the second pass has to leave the neon mood board and walk into the evidence room.

The source evidence check gives each segment a grounded review. The confidence calculation starts with a base score, then adds credit when the source path is present, when first, middle, and last frame descriptions exist, when dense frame descriptions cover the selected range, and when transcript lines overlap the range. It subtracts for warnings. The result is a bounded confidence score, not a vibe-only declaration that the clip is probably right.
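As a rough illustration of that calculation, here is a bounded score built from the same signals. The structure (base score, credits, warning penalty, clamp) follows the paragraph above, but every number in it (the base of 40, 15 points per signal, 10 points per warning, the 0 to 100 range) is an assumption chosen for the example, not a value from the VibeChopper source.

```typescript
interface EvidenceSignals {
  sourcePathPresent: boolean;
  hasFirstMiddleLastFrames: boolean;
  denseFrameCoverage: boolean;
  transcriptOverlap: boolean;
  warningCount: number;
}

// All numeric weights are illustrative assumptions.
const BASE_SCORE = 40;
const SIGNAL_CREDIT = 15;
const WARNING_PENALTY = 10;

function selectionConfidence(signals: EvidenceSignals): number {
  let score = BASE_SCORE;
  if (signals.sourcePathPresent) score += SIGNAL_CREDIT;
  if (signals.hasFirstMiddleLastFrames) score += SIGNAL_CREDIT;
  if (signals.denseFrameCoverage) score += SIGNAL_CREDIT;
  if (signals.transcriptOverlap) score += SIGNAL_CREDIT;
  score -= signals.warningCount * WARNING_PENALTY;
  // Clamp so the result is always a bounded confidence value.
  return Math.max(0, Math.min(100, score));
}

const fullEvidence: EvidenceSignals = {
  sourcePathPresent: true,
  hasFirstMiddleLastFrames: true,
  denseFrameCoverage: true,
  transcriptOverlap: true,
  warningCount: 0,
};

console.log(selectionConfidence(fullEvidence)); // 100
console.log(selectionConfidence({ ...fullEvidence, warningCount: 2 })); // 80
```

The clamp matters as much as the credits: a segment with no evidence and many warnings bottoms out at zero instead of producing a negative number that downstream averages would have to special-case.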

That confidence is paired with alternate candidates. If nearby candidates exist, the clip-selection agent records the strongest candidate score and folds it into an aggregate score. This is important because a second pass should not only ask, "is this selected clip defensible?" It should also ask, "was there another nearby piece of evidence that may have been better?" In creative tooling, that question is where many quality gains live.
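One way to fold the strongest candidate into an aggregate is a simple weighted blend. The article does not say how the fold works, so the 70/30 split, the rounding, and the names `Candidate`, `SelectionScores`, and `aggregateSelectionScore` below are all assumptions made for illustration.

```typescript
interface Candidate {
  videoId: string;
  score: number; // 0 to 100
}

interface SelectionScores {
  evidenceScore: number;
  candidateScore: number | null; // null when no nearby candidates exist
  aggregateScore: number;
}

// Assumed blend: 70% selected-clip evidence, 30% strongest nearby candidate.
function aggregateSelectionScore(evidenceScore: number, candidates: Candidate[]): SelectionScores {
  if (candidates.length === 0) {
    // Nothing to compare against, so the evidence score stands alone.
    return { evidenceScore, candidateScore: null, aggregateScore: evidenceScore };
  }
  const candidateScore = Math.max(...candidates.map((c) => c.score));
  // Integer weights avoid floating-point drift before rounding.
  const aggregateScore = Math.round((70 * evidenceScore + 30 * candidateScore) / 100);
  return { evidenceScore, candidateScore, aggregateScore };
}

const scores = aggregateSelectionScore(90, [
  { videoId: "clip-b", score: 60 },
  { videoId: "clip-c", score: 80 },
]);
console.log(scores.aggregateScore); // 87
```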

The agent result includes the segment index, selected video, selected range, selected file, evidence score, candidate score, aggregate score, related candidates, rationale, and warnings. That is not glamorous copy for a launch page. It is the connective tissue that lets the edit run explain itself. When a creator or reviewer asks why a beat landed where it did, the product can point to frame descriptions, transcript overlap, and candidate review instead of shrugging behind model confidence.

Product callout showing source file, frame descriptions, transcript lines, and alternate candidates for one timeline segment.

Specialized Agents, Not One Big Judge

A common mistake in AI product design is asking one model to be the planner, critic, colorist, sound supervisor, validator, and production manager all at once. It feels simple until you need to debug it. If the result is weak, was the source wrong, was the transcript missing, was the grade mismatched, did music fight dialogue, did validation fail, or did the render never complete? One large judgment hides the answer.

The second-pass harness breaks the review into smaller agent-style outputs. The clip source selection agent checks whether the selected source range is supported by frame and transcript evidence. The clip color normalization agent looks at visual evidence and creates a color intent for each segment, including an exposure risk of low, medium, or high. The timeline color agent summarizes a restrained whole-timeline grade so individual clip fixes do not fight each other. The music scoring planning agent checks whether the edit has a plausible music strategy while preserving dialogue priority.

These agents are deliberately practical. Color intent is not a poetic description. If frame descriptions imply dark, night, shadow, or dim footage, the intent protects shadows and lifts faces gently without noisy over-brightening. If the evidence suggests sun, bright outdoor light, or windows, the intent balances highlights and keeps skin tones natural. If a segment is B-roll or a cutaway, the intent is to match surrounding A-roll and keep contrast restrained. The agent result is readable by engineers and useful to editors.
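Those rules read naturally as a small keyword-driven function. The sketch below is an assumption-heavy illustration: the keyword lists and intent wording paraphrase the paragraph above, while the regular expressions, the rule order, the mapping from each rule to an exposure risk, and the `colorIntentFor` name are invented for the example.

```typescript
type ExposureRisk = "low" | "medium" | "high";
type SegmentRole = "a-roll" | "b-roll" | "cutaway";

interface ColorIntent {
  intent: string;
  exposureRisk: ExposureRisk;
}

// Keyword heuristics paraphrased from the article; the patterns are assumptions.
const DARK_HINTS = /\b(dark|night|shadow|dim)(s|ly)?\b/i;
const BRIGHT_HINTS = /\b(sun(ny|light|lit)?|bright(ly)?|outdoors?|windows?)\b/i;

function colorIntentFor(frameDescriptions: string[], role: SegmentRole): ColorIntent {
  const text = frameDescriptions.join(" ");
  if (DARK_HINTS.test(text)) {
    return {
      intent: "protect shadows and lift faces gently without noisy over-brightening",
      exposureRisk: "high", // assumed mapping
    };
  }
  if (BRIGHT_HINTS.test(text)) {
    return {
      intent: "balance highlights and keep skin tones natural",
      exposureRisk: "medium", // assumed mapping
    };
  }
  if (role !== "a-roll") {
    return {
      intent: "match surrounding A-roll and keep contrast restrained",
      exposureRisk: "low",
    };
  }
  // Assumed fallback for A-roll with no exposure hints.
  return { intent: "neutral balance", exposureRisk: "low" };
}

const nightIntent = colorIntentFor(["Speaker in a dim room at night"], "a-roll");
const cutawayIntent = colorIntentFor(["Close-up of the product on a desk"], "b-roll");
```

Keeping the output as a plain object with a fixed risk enum is what makes it both readable by engineers and easy to snapshot in tests.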

Music planning follows the same product logic. Hook segments can begin with a sparse pulse. Peak or escalation beats can build rhythmic support. Resolution or call-to-action beats can resolve the bed with a short tail. Source dialogue still gets priority. The point is not to automate taste into a universal formula. The point is to make the AI edit review explicit enough that the system can improve the draft without losing the creator's intent.
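The beat-to-intent mapping above is small enough to write down directly. In this sketch the role names and the three intents come from the paragraph, while the `NarrativeRole` union, the fallback intent for other beats, and the `dialoguePriority` flag are hypothetical ways to represent "source dialogue still gets priority".

```typescript
type NarrativeRole =
  | "hook"
  | "peak"
  | "escalation"
  | "resolution"
  | "call-to-action"
  | "other";

interface MusicIntent {
  intent: string;
  dialoguePriority: boolean;
}

function musicIntentFor(role: NarrativeRole): MusicIntent {
  let intent: string;
  switch (role) {
    case "hook":
      intent = "begin with a sparse pulse";
      break;
    case "peak":
    case "escalation":
      intent = "build rhythmic support";
      break;
    case "resolution":
    case "call-to-action":
      intent = "resolve the bed with a short tail";
      break;
    default:
      // Assumed fallback for beats the article does not mention.
      intent = "hold the bed steady under dialogue";
  }
  // Source dialogue keeps priority regardless of the beat.
  return { intent, dialoguePriority: true };
}

const roles: NarrativeRole[] = ["hook", "escalation", "resolution"];
const musicPlan = roles.map((role) => ({ role, ...musicIntentFor(role) }));
```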

Agent score grid for clip source selection, clip color normalization, whole-timeline color, and music planning.

The Final Draft Rubric

Once the second pass has produced metadata, VibeChopper can score the draft with a final rubric. server/finalDraftRubric.ts keeps that scoring intentionally compact. It converts second-pass evidence into category scores for exact source, candidate review, color, music, render, and validation, then combines them into an automated score and a self-review score. The draft becomes ready to notify only when the scores are high enough and the blocker list is empty.

The category weights tell you what the system values. Exact source carries 30 percent of the automated score. Candidate review carries 18 percent. Color carries 18 percent. Music carries 12 percent. Render carries 14 percent. Validation carries 8 percent. That weighting is a product statement: source truth matters most, but finish quality, audio planning, render completion, and structural validation all contribute to whether an AI edit is ready.
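To make the weighting concrete, here is a small worked example. The six percentages are the ones stated above and sum to 100. The `WEIGHTS` constant, the 0 to 100 category scale, the rounding, and the `automatedScore` function name are assumptions for illustration and are not taken from server/finalDraftRubric.ts.

```typescript
// Percent weights as stated in the article (they sum to 100).
const WEIGHTS = {
  exactSource: 30,
  candidateReview: 18,
  color: 18,
  music: 12,
  render: 14,
  validation: 8,
} as const;

type Category = keyof typeof WEIGHTS;
type CategoryScores = Record<Category, number>; // each assumed to be 0 to 100

function automatedScore(scores: CategoryScores): number {
  let weightedTotal = 0;
  for (const category of Object.keys(WEIGHTS) as Category[]) {
    weightedTotal += WEIGHTS[category] * scores[category];
  }
  // Divide by the weight total (100) and round to a whole score (assumed).
  return Math.round(weightedTotal / 100);
}

const perfect: CategoryScores = {
  exactSource: 100,
  candidateReview: 100,
  color: 100,
  music: 100,
  render: 100,
  validation: 100,
};

console.log(automatedScore(perfect)); // 100
console.log(automatedScore({ ...perfect, render: 0 })); // 86
```

The second call shows why render matters to the gate: a draft that is perfect everywhere else but has no render credit tops out at 86 under these weights, below the 90 threshold.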

Exact source starts from average selection confidence and applies a penalty for evidence warnings. Candidate review checks whether every segment has an independent clip-selection agent result. Color combines per-clip normalization coverage with a whole-timeline color agent. Music is binary at the rubric level: the music planning agent has completed or it has not. Render reaches full credit only when a rendered export has completed and render verification exists. Validation receives full credit when the planned timeline validation is okay.
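The per-category rules can be sketched as one pure function. The rules themselves follow the paragraph above, but the details are assumptions: 5 points per evidence warning, proportional coverage for candidate review, an even split between per-clip color coverage and the whole-timeline color agent, and half credit for a completed but unverified render. The `RubricInput` shape and all field names are hypothetical.

```typescript
interface RubricInput {
  selectionConfidences: number[]; // one per segment, 0 to 100
  evidenceWarningCount: number;
  segmentsWithSelectionAgent: number;
  segmentsWithColorNormalization: number;
  timelineColorAgentCompleted: boolean;
  musicPlanningCompleted: boolean;
  renderCompleted: boolean;
  renderVerified: boolean;
  validationOk: boolean;
}

const clamp = (value: number): number => Math.max(0, Math.min(100, value));

function categoryScores(input: RubricInput) {
  const segmentCount = input.selectionConfidences.length;
  const average =
    segmentCount === 0
      ? 0
      : input.selectionConfidences.reduce((a, b) => a + b, 0) / segmentCount;
  // Share of segments covered by a given agent, as a 0 to 100 value.
  const coverage = (count: number): number =>
    segmentCount === 0 ? 0 : (100 * count) / segmentCount;

  let render = 0;
  if (input.renderCompleted && input.renderVerified) render = 100;
  else if (input.renderCompleted) render = 50; // assumed partial credit

  return {
    exactSource: clamp(average - 5 * input.evidenceWarningCount), // assumed penalty
    candidateReview: clamp(coverage(input.segmentsWithSelectionAgent)),
    color: clamp(
      0.5 * coverage(input.segmentsWithColorNormalization) +
        (input.timelineColorAgentCompleted ? 50 : 0),
    ),
    music: input.musicPlanningCompleted ? 100 : 0, // binary at the rubric level
    render,
    validation: input.validationOk ? 100 : 0,
  };
}

const exampleScores = categoryScores({
  selectionConfidences: [90, 100],
  evidenceWarningCount: 1,
  segmentsWithSelectionAgent: 2,
  segmentsWithColorNormalization: 1,
  timelineColorAgentCompleted: true,
  musicPlanningCompleted: true,
  renderCompleted: true,
  renderVerified: false,
  validationOk: true,
});
```

In this example the draft earns 90 for exact source, 100 for candidate review, 75 for color (one of two clips normalized), 100 for music, 50 for render, and 100 for validation.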

The self-review score is intentionally more conservative. It starts from the automated score, then subtracts for evidence warnings, missing render completion, and validation failure. This gives the product a way to model the difference between a mechanically computed score and the stricter question a reviewer would ask before telling a creator the draft is ready. The final gate requires both automated and self-review scores to be at least 90, with no blockers.
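A sketch of the more conservative self-review pass might look like this. The three deductions (evidence warnings, missing render completion, validation failure) are the ones named above. The penalty sizes of 2, 10, and 10 points, and the names `SelfReviewInput` and `selfReviewScore`, are assumptions, since the article does not give the actual values.

```typescript
interface SelfReviewInput {
  automatedScore: number; // 0 to 100
  evidenceWarningCount: number;
  renderCompleted: boolean;
  validationOk: boolean;
}

// Penalty sizes are illustrative assumptions.
const WARNING_DEDUCTION = 2;
const MISSING_RENDER_DEDUCTION = 10;
const VALIDATION_FAILURE_DEDUCTION = 10;

function selfReviewScore(input: SelfReviewInput): number {
  let score = input.automatedScore;
  score -= WARNING_DEDUCTION * input.evidenceWarningCount;
  if (!input.renderCompleted) score -= MISSING_RENDER_DEDUCTION;
  if (!input.validationOk) score -= VALIDATION_FAILURE_DEDUCTION;
  // Only deductions are applied, so the result never exceeds the automated score.
  return Math.max(0, score);
}

const clean = selfReviewScore({
  automatedScore: 96,
  evidenceWarningCount: 1,
  renderCompleted: true,
  validationOk: true,
});
console.log(clean); // 94

const unrendered = selfReviewScore({
  automatedScore: 96,
  evidenceWarningCount: 0,
  renderCompleted: false,
  validationOk: true,
});
console.log(unrendered); // 86
```

Under these assumed penalties, the second draft clears 90 on the automated score but falls to 86 on self-review, so it would not pass the final gate.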

Final draft rubric showing weighted categories for exact source, candidate review, color, music, render, and validation.

Blockers Are Product Language

A score without blockers can become theater. It looks quantitative, but it does not tell the user or the system what to do next. The final draft rubric returns explicit blockers because a video editor needs recovery paths, not just judgment. If any segment lacks an independent clip-selection result, that is a blocker. If any segment lacks color normalization, or the whole-timeline color intent is missing, that is a blocker. If music scoring has not completed, render has not completed, render verification has not completed, or exact source confidence is below 90, those are blockers too.
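The blocker conditions listed above translate into a straightforward checklist. In this sketch the six conditions and the threshold of 90 come from the text. The `BlockerInput` shape, the field names, and the blocker message strings are hypothetical.

```typescript
interface BlockerInput {
  segmentCount: number;
  segmentsWithSelectionAgent: number;
  segmentsWithColorNormalization: number;
  timelineColorAgentCompleted: boolean;
  musicScoringCompleted: boolean;
  renderCompleted: boolean;
  renderVerificationCompleted: boolean;
  exactSourceScore: number; // 0 to 100
}

// Message wording is illustrative; only the conditions come from the article.
function collectBlockers(input: BlockerInput): string[] {
  const blockers: string[] = [];
  if (input.segmentsWithSelectionAgent < input.segmentCount) {
    blockers.push("Not every segment has an independent clip-selection result");
  }
  if (
    input.segmentsWithColorNormalization < input.segmentCount ||
    !input.timelineColorAgentCompleted
  ) {
    blockers.push("Color normalization or whole-timeline color intent is incomplete");
  }
  if (!input.musicScoringCompleted) blockers.push("Music scoring has not completed");
  if (!input.renderCompleted) blockers.push("Render has not completed");
  if (!input.renderVerificationCompleted) {
    blockers.push("Render verification has not completed");
  }
  if (input.exactSourceScore < 90) {
    blockers.push("Exact source confidence is below 90");
  }
  return blockers;
}

const readyInput: BlockerInput = {
  segmentCount: 4,
  segmentsWithSelectionAgent: 4,
  segmentsWithColorNormalization: 4,
  timelineColorAgentCompleted: true,
  musicScoringCompleted: true,
  renderCompleted: true,
  renderVerificationCompleted: true,
  exactSourceScore: 93,
};

const pendingVerification = collectBlockers({
  ...readyInput,
  renderVerificationCompleted: false,
});
```

Each entry names one stage of the production line, so a caller can route the workflow by blocker rather than by parsing a score.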

This language matters because it is actionable. A missing render verification step sends the workflow toward rendering and artifact checks. Low exact-source confidence sends it back toward evidence inspection or candidate review. Incomplete color coverage sends it toward clip normalization and whole-timeline grade planning. Instead of asking a creator to interpret an opaque AI score, the product can point at the part of the production line that still needs work.

Blockers also protect notifications. VibeChopper should not tell a creator that a draft is ready simply because an agent generated a polished plan. The readyToNotify flag is true only when the automated score is at least 90, the self-review score is at least 90, and there are no blockers. That creates a clear contract between AI editing and user communication. The system can be enthusiastic in brand voice, but the gate itself stays strict.
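That contract is compact enough to express in a few lines. The threshold of 90 and the three conditions come from the paragraph above. The `RubricResult` shape and the standalone `readyToNotify` function are an illustrative guess at how the flag could be derived, not the actual code in server/finalDraftRubric.ts.

```typescript
interface RubricResult {
  automatedScore: number;
  selfReviewScore: number;
  blockers: string[];
}

const READY_THRESHOLD = 90; // stated in the article

function readyToNotify(result: RubricResult): boolean {
  return (
    result.automatedScore >= READY_THRESHOLD &&
    result.selfReviewScore >= READY_THRESHOLD &&
    result.blockers.length === 0
  );
}

// High scores alone are not enough: a single blocker keeps the gate closed.
const highButBlocked = readyToNotify({
  automatedScore: 95,
  selfReviewScore: 92,
  blockers: ["Render verification has not completed"],
});

const clear = readyToNotify({
  automatedScore: 95,
  selfReviewScore: 92,
  blockers: [],
});
```

Here `highButBlocked` is false and `clear` is true, which is the strictness the notification path depends on.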

This is also why rubrics belong near the backend, not only in frontend presentation. The server owns access to project media, validation state, render state, and durable metadata. It can build the rubric input from second-pass metadata and render options, then return a result that downstream features can trust. The UI can display it beautifully, but the decision should come from the same side of the product that owns the evidence.

A ready-to-notify gate blocking an AI edit until render verification and rubric scores pass.

Why This Improves AI Editing

Second-pass scoring improves AI editing because it changes the system from answer generation to accountable revision. The first pass can be creative. The second pass is a production review. It checks claims against source evidence, turns subjective finish work into inspectable intent, records agent outputs as tool calls, and sends the final draft through a rubric that can say no.

That shape also makes the system easier to improve. If users dislike source choices, engineers can inspect selection confidence, transcript overlap, frame evidence, and alternate candidate scores. If the cut feels visually inconsistent, the color agent outputs and whole-timeline color score are available. If music feels too heavy, the music planning agent exposes how it treated hook, escalation, dialogue, and resolution beats. If a ready notification fired too early, the rubric result gives the first place to look.

The audit history around this work is important. VibeChopper's hardening wave did not only add one AI feature. It connected provider harnesses, edit runs, tool events, render verification, media summaries, upload telemetry, and remediation workflows. Second-pass rubrics benefit from that context because scoring is only useful when the underlying system records enough truth. A media graph, render artifact, transcript segment, or tool event can all become evidence for better review.

For creators, the benefit is simple: describe your edits, get a stronger draft, and keep the ability to inspect what happened. For developers, the lesson is equally direct: do not let AI confidence be the final interface. Turn confidence into structured evidence, specialized agent outputs, explicit blockers, and revision loops. That is how a natural-language video editor becomes a reliable editing system instead of a shiny prompt demo.

Implementation Lessons

First, keep the second-pass input close to the original plan. VibeChopper's revised segment extends the planned segment instead of replacing it with an unrelated review object. That makes it easier to connect timeline positions, source ranges, narrative roles, audio roles, and downstream tool calls. Review metadata becomes additive context, not a forked version of the truth.

Second, make each agent result boring enough to test. Source selection has scores, files, ranges, candidates, rationale, and warnings. Color normalization has a score, risk level, intent, rationale, and warnings. Timeline color and music planning have segment counts, scores, intents, and affected files. The objects are not vague model essays. They are structured enough to snapshot, validate, display, and compare over time.

Third, separate scoring from readiness. A category score explains quality. A blocker explains action. A ready flag explains product state. Mixing those together makes the system harder to reason about. In VibeChopper, the rubric can say the automated score is high but still block notification because render verification has not completed. That distinction keeps the workflow honest.

Fourth, preserve the audit trail. Every second-pass tool call includes created time, inputs, outputs, affected files, and affected timeline ranges where available. That lets an AI edit run show more than a before-and-after diff. It can show how the system inspected the source, what it warned about, and why a later result passed or failed. This is the difference between an AI assistant that acts mysterious and an AI editor that earns trust.

The Result

Second-pass AI editing is not about slowing the editor down with ceremony. It is about putting the right production checks in the path before the product claims a draft is ready. Source evidence prevents confident wrong cuts. Agent scoring breaks review into debuggable signals. Color and music intent turn finishing work into inspectable metadata. Render verification ties the plan to an output artifact. The final rubric turns all of that into readiness and blockers.

That is the VibeChopper posture: let the creator describe the edit by vibe and voice, let AI assemble and refine the plan, then make the server prove the shape before the timeline becomes a final draft. The neon part is the interface. The serious part is the contract behind it.

A reliable AI video editor is not built from one perfect prompt. It is built from loops: plan, inspect, score, revise, validate, render, verify, and notify. The second pass is the loop that makes a rough cut grow up. It gives the system the vocabulary to say, "this is strong," "this needs review," and "do not ship this yet." That vocabulary is what turns AI editing from a trick into a workflow creators can come back to.

A unified VibeChopper edit lab where evidence, agent scores, render verification, and the final timeline are connected.
