A snapshot of where agentic workflows actually are in early 2026, from someone running a pipeline daily rather than reading launch posts.
The two dominant paradigms
Orchestrator + specialists. One planning/routing agent (mine is called Atlas) decomposes the task and dispatches to named specialist agents with narrow roles. Forge writes code, Gauntlet writes tests, Cipher reviews, Scout researches. Deterministic routing, structured handoffs, human gates on architecture decisions. This is what I run. It’s boring and it works.
Swarm / emergent. Many peer agents with overlapping capabilities, coordinating through a shared scratchpad or message bus. Theoretically more flexible, in practice fragile — no clear owner of any decision, and the failure mode is agents politely talking past each other forever.
The orchestrator model dominates production use as of Q1. Swarms still dominate demo reels.
What’s working
- TDD-first pipelines. Tests written from the task brief before code, by a different agent than the one implementing. Catches scope drift and hallucinated APIs in the same pass.
- CEO review gates. Explicit checkpoints where the pipeline pauses for a human on a short list of decisions (architecture, style, public-facing ship). Not “human in the loop” as a slogan — a specific list, enforced by the orchestrator.
- Deterministic verification hooks. Shell scripts that check agent output against acceptance criteria before the agent can mark done. The least glamorous component and the one that moved the needle most for me.
- Isolation via worktrees. Code-writing agents work in their own git worktree so they can’t step on each other. Cheap, obvious, underused.
What’s overhyped
- Long-running autonomous loops. “Agent runs overnight and ships a feature” demos look great and mostly don’t generalize. Error compounding past roughly the 90-minute mark is brutal. Shorter loops with human gates beat longer autonomous ones on both quality and cost.
- Self-improving agent teams. The agents that rewrite their own prompts based on feedback. Intriguing research direction, not yet a reliable production technique. The failure mode is the team optimizing for its own metric rather than the real goal.
- “AGI-adjacent” framing. Most of what ships under that banner is orchestration + tool use. Calling it AGI makes evaluation harder, not easier.
What’s next (my bet)
The interesting frontier isn’t smarter agents — it’s better coordination primitives. Things like:
- Standardized task-brief formats across tools so agents from different vendors can hand off.
- Shared memory layers with structured provenance (not just a vector store).
- Economic coordination — agents paying each other for compute or specialist skills (see the agent economies piece).
The models will keep getting better on their own. The leverage is in the scaffolding.