
When AI Becomes a Team

Last week I built a pipeline where Claude orchestrated a team of local LLMs — assigning tasks, reviewing outputs, deciding what was good enough to keep. Qwen generated. Claude judged. The thing that struck me wasn't the output quality. It was how human it felt. Here's the agent. Drop it in your .claude/agents/ directory.

- Profiles all local models by capability
- Decomposes tasks into routable subtasks
- Quality gates between every phase
- Parallel execution where dependencies allow
- Final integration + polish pass
claude-lmstudio-pipeline.md
PHASE 1 — MODEL CAPABILITY PROFILING
  Audit all local LLMs · Score across 10 dimensions
  code-gen / reasoning / speed / structured-output / ...
  Build a routing map before touching any task

PHASE 2 — TASK DECOMPOSITION
  Parse task → identify all output artifacts
  Map subtasks to capability dimensions
  Flag critical path items · Identify parallel work

PHASE 3 — EXECUTION WITH QUALITY GATES
  Route each subtask to its best-fit model
  Fail fast: re-prompt → escalate → flag for human
  Parallel execution where dependencies allow

PHASE 4 — INTEGRATION & POLISH
  Assemble → consistency check → polish pass
  Final audit against original requirements
  Deliver with routing summary
1. Model capability profiling. Before routing anything, the agent audits all local LLMs and scores them across 10 dimensions: code generation, reasoning, speed, structured output, and more.
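A minimal sketch of what that profiling step could produce. The model names, dimensions, and scores below are illustrative stand-ins, not benchmarks; in the agent itself the scores would come from probe prompts sent to each model.

```python
# Illustrative capability profiles: each local model scored 0-10
# on a few of the dimensions the agent audits.
DIMENSIONS = ["code-gen", "reasoning", "speed", "structured-output"]

profiles = {
    "qwen2.5-coder-7b": {"code-gen": 8, "reasoning": 5, "speed": 9, "structured-output": 7},
    "llama-3.1-8b":     {"code-gen": 6, "reasoning": 7, "speed": 8, "structured-output": 6},
}

def build_routing_map(profiles):
    """For each capability dimension, pick the highest-scoring local model."""
    return {
        dim: max(profiles, key=lambda m: profiles[m][dim])
        for dim in DIMENSIONS
    }

routing_map = build_routing_map(profiles)
print(routing_map["code-gen"])  # best-fit model for code generation
```

The point of building this map up front is that every later routing decision becomes a lookup, not a judgment call made mid-task.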
2. Task decomposition. Breaks your task into atomic subtasks, maps each to a capability dimension, identifies dependencies, and builds a routing plan before touching any model.
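One way to picture the routing plan: subtasks tagged with the capability they need and the subtasks they depend on, grouped into waves that can run in parallel. The subtask names and the `Subtask` shape are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    name: str
    dimension: str                      # which capability dimension it needs
    deps: list = field(default_factory=list)

plan = [
    Subtask("write-module", "code-gen"),
    Subtask("write-tests", "code-gen", deps=["write-module"]),
    Subtask("write-docs", "reasoning", deps=["write-module"]),
]

def parallel_waves(plan):
    """Group subtasks into waves; everything in a wave has its deps met."""
    done, waves, pending = set(), [], list(plan)
    while pending:
        wave = [t for t in pending if all(d in done for d in t.deps)]
        if not wave:
            raise ValueError("dependency cycle in plan")
        waves.append([t.name for t in wave])
        done.update(t.name for t in wave)
        pending = [t for t in pending if t.name not in done]
    return waves

print(parallel_waves(plan))  # [['write-module'], ['write-tests', 'write-docs']]
```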
3. Execution with quality gates. Each subtask runs with a precision prompt tailored to its assigned model. Failures escalate: re-prompt first, then a higher-tier model, then a human-review flag.
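The escalation ladder can be sketched as a loop over model tiers. `run_model` and `passes_gate` are stand-ins here for the real model call and Claude's judging step:

```python
def run_with_gates(subtask, ladder, run_model, passes_gate, retries=1):
    """Try each model tier in order; re-prompt, then escalate, then flag.

    ladder: model names from cheapest to highest tier.
    retries: extra re-prompts per tier before escalating.
    """
    for model in ladder:
        for _ in range(retries + 1):        # initial attempt + re-prompts
            output = run_model(model, subtask)
            if passes_gate(subtask, output):
                return {"status": "ok", "model": model, "output": output}
    # No tier passed the gate: surface it instead of failing silently.
    return {"status": "needs-human-review", "subtask": subtask}
```

Because the gate runs per subtask, a bad output is caught immediately rather than being discovered in the assembled deliverable.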
4. Integration and polish. Assembles the final deliverable, resolves inconsistencies, runs a polish pass, and delivers a routing summary showing which model handled what and why.
Fail fast: quality failures are caught at the subtask level, never at final delivery.
Escalation ladder: re-prompt → higher-tier model → human review. No silent failures.
Output contracts: every subtask defines its exact format and structure before execution.
Consistency checks: cross-subtask validation — does the code match the docs?
Transparent routing: shows all routing decisions and reasoning before executing.
Persistent memory: the capability map improves with every run and learns what works.
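A sketch of how that persistent capability map could learn: after each gated run, nudge the model's score for that dimension toward the observed pass/fail result with an exponential moving average. The file path, prior, and smoothing factor are illustrative choices, not the agent's actual values.

```python
import json
from pathlib import Path

MAP_PATH = Path("capability_map.json")
ALPHA = 0.3  # how quickly new evidence overrides old scores

def update_score(model, dimension, passed, path=MAP_PATH):
    """Blend the latest pass/fail result into the stored 0-10 score."""
    scores = json.loads(path.read_text()) if path.exists() else {}
    prev = scores.setdefault(model, {}).get(dimension, 5.0)  # neutral prior
    observed = 10.0 if passed else 0.0
    scores[model][dimension] = (1 - ALPHA) * prev + ALPHA * observed
    path.write_text(json.dumps(scores, indent=2))
    return scores[model][dimension]
```

Over repeated runs the map drifts toward each model's real strengths, so routing gets better without anyone hand-tuning it.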
Why roles matter more than raw capability

Everyone's racing to use the best model. The more interesting question is what happens when models work for each other. Claude holds the ORCHESTRATOR and JUDGE roles explicitly — it never auto-approves anything. Local models execute fast and free; Claude decides what passes. The structure of accountability is what made it feel human. Not the output quality.

Write to me
Suresh Victor | Product Architect