Spec-Driven Development: GSD vs Spec Kit vs OpenSpec

11 min read
Tags: Spec-Driven Development, AI, Claude Code, GSD, Software Engineering

GSD Foundation published an article this week: "How We Built The World's Most Powerful Coding Agent." Bold title. But after reading it, and after months of using GSD, Spec Kit, and OpenSpec on real projects, I think the technical claims hold up better than the marketing ones.

The article unpacks GSD's architecture: deterministic tool calls replacing bash scripts, context pruning with invisible anchor messages, boundary maps that force interface thinking, fractal summaries for scalable memory, and a verification system that checks outcomes instead of checklists. It's a deep technical read, and it forced me to think about what actually separates these spec-driven development tools beyond star counts on GitHub.

I wrote a complete guide to Spec-Driven Development recently. This post isn't that — it's a practitioner's comparison of the three frameworks leading the SDD space right now, triggered by what GSD Foundation published, and informed by my experience shipping projects with all of them.

The Spec-Driven Development Landscape in 2026

Spec-driven development went from niche methodology to near-mainstream in under six months. GitHub shipped Spec Kit in September 2025. OpenSpec launched through Y Combinator. GSD grew from zero to 26,000+ stars in three months. AWS built SDD directly into Kiro. Martin Fowler's team published an analysis of it.

The core idea: write structured specifications before code, and let AI agents execute against those specs instead of improvising from vague prompts.

Three open-source frameworks dominate:

| Tool | Creator | GitHub stars | Core philosophy |
|---|---|---|---|
| Spec Kit | GitHub | 75.2k | Structured phases, agent-agnostic portability |
| OpenSpec | Fission AI (YC) | 28.8k | Lightweight change isolation, brownfield-first |
| GSD | TACHES | 26.7k | Execution-layer orchestration, fresh subagent contexts |

Each represents a fundamentally different answer to the same question: how should specs relate to execution?


What GSD's Architecture Gets Right

The GSD Foundation article makes a specific claim worth unpacking: most of what makes AI coding agents unreliable isn't the model's code generation — it's everything around it. State management. Context pollution. Lost continuity between sessions. Mechanical errors in git operations. Verification that checks steps instead of outcomes.

GSD's answer is a strict separation: LLM judgment for creative work, deterministic TypeScript for everything else.

The LLM/Deterministic Split

Two tools expose the entire deterministic layer: gsd_manage (18 actions for state, git, scaffolding, and context) and gsd_verify (4 actions for static verification). One tool call replaces 5-10 bash/read/edit calls.

The model never constructs a git command. Never parses markdown to figure out which task is next. Never formats frontmatter. It calls a tool, gets a result, and moves to creative work — writing code, making architectural decisions, diagnosing failures.

This is the single biggest architectural difference from Spec Kit and OpenSpec, which delegate all execution to whatever agent you connect. GSD doesn't just specify. It orchestrates.
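To make the split concrete, here is a minimal TypeScript sketch of what a deterministic action layer can look like. Everything in it is hypothetical: manage, nextTask, commitArgs, and the action names are illustrative stand-ins, not GSD's real gsd_manage interface. It only demonstrates the principle: parsing the plan and building git arguments happen in ordinary code, and the model just picks an action.

```typescript
// Hypothetical sketch of a deterministic tool layer in the spirit of gsd_manage.
// Action names and result shapes are illustrative assumptions, not GSD's API.

type ManageRequest =
  | { action: "next_task"; plan: string }
  | { action: "commit_args"; taskId: string; summary: string };

type ManageResult =
  | { ok: true; value: string | string[] | null }
  | { ok: false; error: string };

// Return the first unchecked "- [ ] ..." item in a markdown plan, or null.
function nextTask(plan: string): string | null {
  for (const line of plan.split("\n")) {
    const m = /^\s*-\s\[ \]\s+(.+)$/.exec(line);
    if (m && m[1] !== undefined) return m[1].trim();
  }
  return null;
}

// Build git argv as an array so the model never assembles a shell string.
function commitArgs(taskId: string, summary: string): string[] {
  const oneLine = summary.replace(/\s+/g, " ").trim();
  return ["git", "commit", "-m", `${taskId}: ${oneLine}`];
}

// Single entry point: the model chooses an action, TypeScript does the mechanics.
function manage(req: ManageRequest): ManageResult {
  if (req.action === "next_task") {
    return { ok: true, value: nextTask(req.plan) };
  }
  // Reject task ids that could smuggle extra arguments into the command.
  if (!/^[A-Za-z0-9-]+$/.test(req.taskId)) {
    return { ok: false, error: `invalid task id: ${req.taskId}` };
  }
  return { ok: true, value: commitArgs(req.taskId, req.summary) };
}
```

One call replaces the read-parse-format-run sequence the model would otherwise improvise, and malformed input fails in code instead of in a half-executed shell command.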


Context Pruning Solves Context Rot

This is the feature that matters most in practice. Each task gets an invisible anchor message injected into the conversation. Before every LLM call, a context hook prunes the message history back to the current task's anchor.

Task 5 doesn't see the 40 tool calls from tasks 1-4. No stale file reads. No debugging traces from solved problems. No accumulated noise.

Without this, you get context rot: the silent quality degradation that happens when models reason over polluted context windows. Most agent systems hit a wall around the third or fourth task in a sequence. The model isn't dumber. The context is poisoned with outdated information, renamed variables, and failed approaches that no longer apply.

I've experienced this firsthand. Before using GSD, long sessions with Claude Code would progressively degrade. By task 6 or 7, the agent referenced variables that had been renamed and avoided approaches that failed for reasons that no longer applied. GSD's anchor pruning eliminates this. Task 7 runs with the same context quality as task 1.
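The pruning idea fits in a few lines. This sketch assumes a flat message list in which anchors are ordinary messages carrying a hypothetical anchor field; ChatMessage and pruneToAnchor are illustrative names, not GSD's code. It keeps the system prompt and drops everything that came before the current task's anchor.

```typescript
// Minimal sketch of anchor-based context pruning over a simple message list.
// The ChatMessage shape and the `anchor` field are assumptions, not GSD's format.

type Role = "system" | "user" | "assistant" | "tool";

interface ChatMessage {
  role: Role;
  content: string;
  anchor?: { taskId: string }; // marker injected at the start of each task
}

// Keep system messages from before the anchor, plus everything from the
// current task's anchor onward. Earlier tasks' reads and tool calls are dropped.
function pruneToAnchor(history: ChatMessage[], taskId: string): ChatMessage[] {
  const idx = history.findIndex((m) => m.anchor?.taskId === taskId);
  if (idx === -1) return history; // no anchor for this task yet: leave untouched
  const prefix = history.filter((m, i) => i < idx && m.role === "system");
  return [...prefix, ...history.slice(idx)];
}
```

A context hook would call something like pruneToAnchor(history, currentTaskId) immediately before each model request, so output from earlier tasks never reaches the prompt.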

Boundary Maps and Goal-Backward Verification

Two more concepts from the article worth highlighting:

Boundary maps force interface thinking before implementation. Each slice declares what it produces and consumes with concrete function names, types, and endpoints. When slice 3 is being planned, it knows exactly what slice 1 built. No silent assumptions. No "slice 3 needs a function that slice 1 never exported."
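A boundary map is essentially a producer/consumer contract, which makes it checkable before any code exists. The sketch below assumes a hypothetical BoundaryMap shape whose entries are plain strings compared by exact match; GSD's actual format may differ. It walks slices in plan order and flags anything consumed before some earlier slice produces it.

```typescript
// Hypothetical boundary-map shape: each slice lists what it produces and consumes.
// Field names are illustrative assumptions, not GSD's schema.

interface BoundaryMap {
  slice: string;
  produces: string[]; // e.g. function names, types, endpoints
  consumes: string[];
}

// Report every consumed entry that no earlier slice produced: the
// "slice 3 needs a function that slice 1 never exported" failure.
function findMissingInterfaces(slices: BoundaryMap[]): string[] {
  const available = new Set<string>();
  const problems: string[] = [];
  for (const s of slices) {
    for (const need of s.consumes) {
      if (!available.has(need)) {
        problems.push(`${s.slice} consumes "${need}" but no earlier slice produces it`);
      }
    }
    for (const out of s.produces) available.add(out);
  }
  return problems;
}
```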

Goal-backward verification checks outcomes, not steps. GSD verifies truths (observable behaviors), artifacts (files that contain a real implementation and meet a minimum line count), and key links (import wiring between files). A stub detector scans for TODO comments, return null statements, and hardcoded empty responses. An 8-line file returning an empty object doesn't pass.
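The stub check is easy to approximate. This is a rough sketch, assuming artifacts arrive as path and content pairs; Artifact, checkArtifact, the regex patterns, and the 10-line default minimum are illustrative choices, not GSD's rules. A production checker would more likely inspect a syntax tree than scan raw text.

```typescript
// Minimal stub-detector sketch. Patterns and the line minimum are assumptions.

interface Artifact {
  path: string;
  content: string;
}

const STUB_PATTERNS: { name: string; re: RegExp }[] = [
  { name: "TODO comment", re: /\/\/\s*TODO|\/\*\s*TODO/ },
  { name: "return null", re: /\breturn\s+null\b/ },
  { name: "empty object response", re: /\breturn\s+\{\s*\}/ },
  { name: "empty array response", re: /\breturn\s+\[\s*\]/ },
];

// Returns a list of failure reasons; an empty list means the artifact passes.
function checkArtifact(a: Artifact, minLines = 10): string[] {
  const failures: string[] = [];
  const nonBlank = a.content.split("\n").filter((l) => l.trim() !== "").length;
  if (nonBlank < minLines) {
    failures.push(`${a.path}: ${nonBlank} non-blank lines (< ${minLines})`);
  }
  for (const p of STUB_PATTERNS) {
    if (p.re.test(a.content)) failures.push(`${a.path}: ${p.name}`);
  }
  return failures;
}
```

The point of the design is that "the task is checked off" never counts as evidence; only the contents of the files do.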

Spec Kit: GitHub's Enterprise Play

Spec Kit has the numbers: 75,000+ stars, GitHub's institutional backing, and support for 20+ AI coding tools. By community size, it's the most widely adopted spec-driven development framework.

The workflow is a 7-step pipeline: Constitution → Specify → Clarify → Plan → Tasks → Analyze → Implement. Each phase produces markdown artifacts that persist in your repo.
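The pipeline's rigidity comes from gating: each phase depends on the artifacts of the phases before it. A small sketch of that gating, assuming every one of the seven phases is mandatory; PHASES, nextRunnablePhase, and canRun are hypothetical names, not Spec Kit commands or code.

```typescript
// Sketch of a gated linear pipeline modeled on the seven phases listed above.
// The rule that every earlier phase must be complete is an illustrative assumption.

const PHASES = ["constitution", "specify", "clarify", "plan", "tasks", "analyze", "implement"] as const;
type Phase = (typeof PHASES)[number];

// The earliest phase whose artifact does not exist yet, or null when all are done.
function nextRunnablePhase(completed: ReadonlySet<Phase>): Phase | null {
  for (const p of PHASES) {
    if (!completed.has(p)) return p;
  }
  return null;
}

// A phase may run only when every phase before it has completed.
function canRun(phase: Phase, completed: ReadonlySet<Phase>): boolean {
  const idx = PHASES.indexOf(phase);
  return PHASES.slice(0, idx).every((p) => completed.has(p));
}
```

Under these rules even a one-line change has to pass through every earlier phase before implement becomes runnable, which is where the ceremony cost comes from.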


Where it shines:

  • Agent-agnostic portability. Write specs once, hand them to Copilot, Claude Code, Cursor, or Windsurf. No vendor lock-in.
  • The Constitution concept. Non-negotiable project principles established upfront that govern all downstream decisions. A unique governance layer no other tool offers.
  • Cross-artifact validation. /speckit.analyze catches architectural misalignments before a line of code is written.
  • Team workflows. Multiple developers can work with the same AI assistant using shared project context.

Where it struggles:

  • Ceremony overhead. A Scott Logic review found Spec Kit took 3.5+ hours vs 23 minutes with iterative prompting for the same feature. That's a steep tax for smaller changes.
  • Markdown explosion. One feature generated 2,577 lines of specification. Much of it redundant.
  • No execution orchestration. Spec Kit guides agents but doesn't manage parallelism, context isolation, or subagent spawning. It's a specification layer, not an execution engine.
  • Sequential rigidity. The 7-phase pipeline mirrors waterfall methodology. If you value agile iteration, this will feel heavy.

OpenSpec: The Lightweight Contender

OpenSpec comes from Fission AI (YC-backed), and its philosophy is direct: "Generating code is now cheap. Correctness is still expensive."

The workflow is a 4-phase state machine: Propose → Validate → Apply → Archive. Each change gets its own isolated folder with proposal, specs, design, and tasks. Completed changes archive into permanent specifications.
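Because the workflow is a state machine, its rules can be written as a transition table. This sketch models the flow under stated assumptions and is not OpenSpec's implementation: the event names, the rule that a failed validation sends a change back to Propose, the ChangeFolder fields, and archiveChange are all hypothetical. Archiving folds the change's specs into a baseline map that stands in for the permanent specifications.

```typescript
// Minimal state-machine sketch of Propose → Validate → Apply → Archive.
// Transition rules and field names are illustrative assumptions.

type State = "propose" | "validate" | "apply" | "archive";
type ChangeEvent = "submit" | "validation_passed" | "validation_failed" | "tasks_done";

const TRANSITIONS: Record<State, Partial<Record<ChangeEvent, State>>> = {
  propose: { submit: "validate" },
  validate: { validation_passed: "apply", validation_failed: "propose" },
  apply: { tasks_done: "archive" },
  archive: {},
};

// One isolated folder per change, holding its own artifacts.
interface ChangeFolder {
  name: string;
  files: { proposal: string; specs: string; design: string; tasks: string };
  state: State;
}

// Apply an event, rejecting any transition the table does not allow.
function step(change: ChangeFolder, event: ChangeEvent): ChangeFolder {
  const next = TRANSITIONS[change.state][event];
  if (next === undefined) {
    throw new Error(`cannot handle "${event}" in state "${change.state}"`);
  }
  return { ...change, state: next };
}

// Merge an archived change's specs into the permanent spec set (copied, not mutated).
function archiveChange(baseline: Map<string, string>, change: ChangeFolder): Map<string, string> {
  if (change.state !== "archive") throw new Error("change is not ready to archive");
  const merged = new Map(baseline);
  merged.set(change.name, change.files.specs);
  return merged;
}
```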


Where it shines:

  • Brownfield-first. Designed for evolving existing codebases. /opsx:onboard reverse-engineers undocumented legacy code into baseline specs. Most competitors assume greenfield.
  • Lightweight output. ~250 lines per change vs ~800 for Spec Kit. Less noise, faster iteration.
  • Change isolation. Each feature lives in its own directory until merge. No cross-feature interference.
  • Broadest agent support. 24+ AI coding tools — more than any other SDD framework.

Where it struggles:

  • No multi-agent orchestration. Like Spec Kit, it delegates execution entirely to the connected agent. No parallel subagents, no context isolation.
  • Specs are static. They don't auto-update during implementation, creating drift on longer tasks.
  • Manual archiving. Requires developer discipline to archive completed changes. Skip it and specs diverge from reality.
  • No automatic git branching. Gives you control but adds manual steps.

Side-by-Side Comparison

Architecture

| Dimension | GSD | Spec Kit | OpenSpec |
|---|---|---|---|
| Orchestration | Full execution engine (wave-based parallel, subagent isolation) | Specification layer only | Specification layer only |
| Context management | Fresh 200k-token context per task via anchor pruning | Static file-based specs | Load-on-demand from change folders |
| Git strategy | Atomic commits per task, branch-per-slice, squash merge | Auto branch per feature | No automatic branching |
| Verification | 4-tier ladder (static → command → behavioral → human) | Cross-artifact analysis | Strict mode structural checks |
| Interruption handling | Continue-here files with auto-compaction hooks | Manual session re-entry | /opsx:continue command |
| Output weight | Moderate | Heavy (~800 lines/feature) | Light (~250 lines/feature) |
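The 4-tier ladder in the Verification row reads naturally as an escalation loop: run the cheap automated tiers first, stop at the first failure, and hand whatever remains to a person. A minimal sketch, assuming a hypothetical Check interface; the tier names come from the table, but runLadder and its stop-on-first-failure policy are illustrative, not GSD's gsd_verify logic.

```typescript
// Sketch of a tiered verification ladder. Interfaces and policy are assumptions.

type Tier = "static" | "command" | "behavioral" | "human";

interface Check {
  tier: Tier;
  name: string;
  run: () => boolean; // true means the check passed
}

interface LadderResult {
  automatedPassed: boolean;
  stoppedAt: Tier | null; // first tier with a failing check, if any
  failures: string[];
  needsHuman: string[]; // human-tier checks are listed for review, never executed
}

const ORDER: Tier[] = ["static", "command", "behavioral", "human"];

function runLadder(checks: Check[]): LadderResult {
  for (const tier of ORDER) {
    const inTier = checks.filter((c) => c.tier === tier);
    if (tier === "human") {
      // Every automated tier passed; escalate the remaining checks to a person.
      return { automatedPassed: true, stoppedAt: null, failures: [], needsHuman: inTier.map((c) => c.name) };
    }
    const failures = inTier.filter((c) => !c.run()).map((c) => c.name);
    if (failures.length > 0) {
      // Stop here: later, more expensive tiers are not run.
      return { automatedPassed: false, stoppedAt: tier, failures, needsHuman: [] };
    }
  }
  return { automatedPassed: true, stoppedAt: null, failures: [], needsHuman: [] };
}
```

Ordering the tiers by cost means a failed static check never triggers a slow behavioral run or a request for human review.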

Developer Experience

| Factor | GSD | Spec Kit | OpenSpec |
|---|---|---|---|
| Setup time | < 5 min | < 5 min | < 5 min |
| Learning curve | Low for solo devs | Moderate (7 commands + review checkpoints) | Low (4-phase state machine) |
| Ceremony per feature | Low–moderate | High | Low |
| Agent support | Claude Code, OpenCode, Gemini CLI, Codex | 20+ agents | 24+ agents |
| Best for | Solo devs, small teams, greenfield | Teams, enterprise, cross-agent portability | Existing codebases, fast iteration |
| Worst for | Teams needing multi-agent flexibility | Rapid prototyping, small changes | Projects needing execution orchestration |
| License | MIT | MIT | MIT |
| Backing | Community (TACHES) | GitHub | Y Combinator |

What to Expect: The Honest Assessment

The Good

Spec-driven development works. Regardless of which tool you pick, writing specs before code produces better AI output. I shipped this portfolio site — 5 phases, 28 requirements, zero gaps — using SDD with GSD. The methodology is the real value. The tool amplifies it.

Context rot is a solved problem. If you've used Claude Code for 30+ minutes on a complex task, you've felt quality degrade. GSD's anchor pruning is the most elegant engineering solution I've seen — and it's the reason I keep coming back to it.

All three tools are free and open source. MIT-licensed across the board. No vendor lock-in at the framework level. You're only paying for the underlying model's token usage.

The Bad

Token consumption with GSD is real. The multi-agent architecture uses significantly more tokens than unstructured coding. Multiple fresh subagent contexts, parallel research agents, verification loops — it adds up. One user reported a bug fix spawning 100+ agents. Plan your budget accordingly.

Upfront time investment. SDD trades speed on small tasks for reliability on large ones. For a 10-line bug fix, running discuss → plan → execute → verify is overkill. GSD's /gsd:quick mode helps, but the overhead exists. Spec Kit's 3.5-hour benchmark for a single feature is a real data point.

Agent ecosystem fragmentation. GSD is Claude Code-centric. If your team primarily uses Cursor or Copilot, Spec Kit or OpenSpec gives you more flexibility. This is GSD's most significant limitation.

The waterfall criticism has merit. Heavy upfront specification echoes waterfall methodology. The counter-argument: AI execution speed makes the "slow planning, fast execution" trade-off worthwhile, because planning takes 10 minutes while execution that would have taken days finishes in minutes. But it's a valid concern for teams used to iterative workflows.

Known Issues to Watch

| Issue | Details |
|---|---|
| Auto-answering bug | /gsd:discuss-phase sometimes auto-answers questions since v1.22.0 |
| Subagent context gap | Subagents may not receive project CLAUDE.md |
| PII in planning files | .planning/ files added to repos can leak PII |
| Maintenance concerns | Key developer left for Anthropic |
| Validation errors | Some users report validation errors during spec generation |
| Windows support | Crashes on large phases due to path issues |

Which One Should You Use?


Choose GSD if you work primarily with Claude Code, value execution-layer orchestration, and care about context isolation. It's the most opinionated tool — and for solo developers and small teams building greenfield projects, that's a feature. The fresh subagent contexts and goal-backward verification are unmatched.

Choose Spec Kit if you need cross-agent portability, work in a team environment, or need enterprise governance. The Constitution concept and cross-artifact validation are genuinely unique. Accept the ceremony cost as the price of thoroughness.

Choose OpenSpec if you work on existing codebases, want minimal overhead, and value speed over orchestration depth. The brownfield onboarding and change isolation pattern solve real problems other tools ignore.

My setup: I use GSD for everything — NovaMX, this site, internal tools, client projects. The fresh subagent contexts and goal-backward verification align with how I think about building software. But I'm a Claude Code power user who has invested time in agents and skills. If your primary tool is Cursor or Copilot, Spec Kit or OpenSpec will serve you better today.

The Bigger Picture

The spec-driven development landscape will keep evolving. Augment Code's Intent offers living specs that auto-update during implementation; it's the only tool that does this today. AWS has SDD built into Kiro. Tessl is exploring specs-as-source-code, where the implementation is generated entirely from specs.

But the core insight from GSD Foundation's article holds regardless of tooling: reliability comes from engineering the infrastructure around the model, not from writing better prompts. Every SDD tool is a bet on that principle. The question is how much infrastructure you want between your intent and your code.

Start with the methodology. Pick the tool second. And if you want the methodology explained from scratch, read my Spec-Driven Development guide.

Check out my tools and stack for the full development setup I use alongside these frameworks.


Building with spec-driven workflows? Reach out — I'm always comparing notes with other engineers on what works.