What Is Spec-Driven Development? A 2026 Primer
Spec-driven development — SDD for short — is a methodology where every feature starts with a specification rich enough for an AI coding agent to execute against. Not a ticket. Not a one-line prompt. A structured document that captures scope, acceptance criteria, edge cases, architecture, and task breakdown.
The idea isn't new. Teams have been writing tech specs for 40 years. What's new is that AI coding agents turned specs from a documentation artifact into an execution artifact. The spec isn't for humans to read and then translate into code — it's the input to an agent that produces the code.
This post is the primer I wish someone had handed me when I first heard the term. No tool pitches, no comparisons — just the methodology. If you want to jump to tools, I've written about GSD, Spec Kit, and OpenSpec in depth.
Why SDD Exists Now
Through 2024, coding agents were good enough for a 50-line function and unreliable for anything larger. You'd prompt Claude Code or Copilot, get a plausible-looking draft, spot a subtle bug in line 27, fix it, and then chase three more down the stack. The friction was in the ambiguity — you gave the model a vague intent, the model filled the gap with plausible defaults, and those defaults sometimes matched your actual requirements and sometimes didn't.
Over the course of 2025 the models got dramatically better. Claude Sonnet 4.5, GPT-5, and Gemini 2.5 all crossed a quality threshold where the limiting factor stopped being the model and started being the context. Give a model a one-line prompt and it tends to produce mediocre code. Give the same model a 400-line spec that describes exactly what you want and it is far more likely to produce production-quality code.
SDD is the methodology that captures that shift. The bottleneck moved from code generation to requirement clarity. Specs close the gap.
The Core Loop
At its simplest, spec-driven development is a four-step loop:
1. Spec: You describe what needs to be built in structured form.
2. Plan: The spec is decomposed into ordered tasks with dependencies.
3. Execute: An AI agent implements each task against the plan.
4. Verify: The result is checked against the original spec, not just "did it compile."
The loop looks obvious written down. The reason teams historically skipped it is that steps 1 and 2 were expensive — a human had to write the spec and break down the tasks before any code got written, and humans resist that friction when they can just start coding.
AI agents flip the math. The spec becomes cheap because the agent helps write it through conversational elicitation. The task breakdown becomes cheap because the agent proposes it and you edit. Execution becomes cheap because the agent does the typing. The only expensive step is the one that should be expensive — reviewing the result against the spec.
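As a rough illustration, the loop can be sketched as a small driver. Everything here is hypothetical: the `Spec` and `Task` types and the `plan`, `execute`, and `verify` callables are placeholders for whatever agent or tool you use, not the API of any SDD tool.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Spec:
    """Hypothetical minimal spec: a scope summary plus verifiable criteria."""
    scope: str
    acceptance_criteria: list[str]


@dataclass
class Task:
    """Hypothetical unit of planned work."""
    name: str
    done: bool = False


def run_loop(
    spec: Spec,
    plan: Callable[[Spec], list[Task]],
    execute: Callable[[Task], None],
    verify: Callable[[Spec], list[str]],
) -> list[str]:
    """Run one pass of Spec -> Plan -> Execute -> Verify.

    Returns the acceptance criteria that verification reports as unmet.
    An empty list means the pass satisfied the spec.
    """
    tasks = plan(spec)      # Plan: decompose the spec into ordered tasks
    for task in tasks:      # Execute: implement each task in order
        execute(task)
        task.done = True
    return verify(spec)     # Verify: check the result against the spec itself
```

The point of the shape is that `verify` receives the spec, not the task list: a pass only counts as finished when the unmet-criteria list comes back empty, regardless of how many tasks were marked done.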
What's in a Spec
A usable SDD spec has, at minimum, four sections:
Scope. What's in, what's out. The "out of scope" list is often more valuable than the "in scope" list — it prevents the agent from inventing features that sound helpful but weren't requested.
Acceptance criteria. Concrete, verifiable statements. "The import succeeds" is not a criterion. "The import processes 10,000 rows in under 30 seconds, rejects rows with invalid emails, and reports per-row errors in the response payload" is a criterion.
Architecture notes. The high-level shape — which services, which data stores, which third-party APIs. Not line-level code, but enough that the agent doesn't have to guess which database table to write to.
Task breakdown. Ordered list of tasks with clear dependencies. Each task should take 30–90 minutes of agent time and end with a verifiable artifact (a file, a test, a migration).
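One way to make "ordered tasks with clear dependencies" concrete is to record each task's prerequisites and let a topological sort produce the execution order. This sketch uses Python's standard `graphlib` module. The task names and dependency edges are illustrative assumptions, loosely modeled on a CSV import feature, and are not the format of any particular tool.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task graph: each key maps to the set of tasks it depends on.
TASKS: dict[str, set[str]] = {
    "upload_endpoint": set(),
    "parser_worker": {"upload_endpoint"},
    "batch_insert": {"parser_worker"},
    "status_endpoint": {"batch_insert"},
    "integration_test": {"batch_insert", "status_endpoint"},
}


def execution_order(tasks: dict[str, set[str]]) -> list[str]:
    """Return tasks in an order that respects every dependency.

    Raises ValueError if the plan contains a circular dependency.
    """
    try:
        return list(TopologicalSorter(tasks).static_order())
    except CycleError as exc:
        # The detected cycle is the second element of the exception args.
        raise ValueError(f"circular dependency in plan: {exc.args[1]}") from exc
```

Catching a cycle at planning time is cheap. Discovering it halfway through execution, after an agent has already built on a task that was never runnable, is not.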
Here's a fragment of a real spec I wrote for a recent feature:
## Scope
- CSV upload endpoint accepts files up to 5MB
- Parse contacts with columns: email (required), name, phone
- Dedupe by email within the file
- Insert new contacts in batches of 500
- Return per-row status in the response
## Out of Scope
- Multi-file upload
- UI for error inspection (separate feature)
- Contact update on dedupe (creates-only)
## Acceptance Criteria
- A 10k-row file completes in under 30s on staging
- Invalid emails fail with a specific error code
- A file containing the same email twice imports 1 row for that email
- The endpoint rejects files over 5MB with 413
## Tasks
1. Upload endpoint — S3 presigned URL, 5MB limit
2. Parser worker — Papa Parse, row-level validation
3. Batch insert — 500-row batches, conflict handling
4. Status endpoint — per-row error reporting
5. Integration test — 10k-row file on staging DB
That's it. Nothing fancy. The discipline is in writing it before any code, not in making it elaborate.
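To show how criteria like these become checkable, here is a minimal Python sketch of the validate, dedupe, and batch steps. It is not the implementation behind the spec above (which names Papa Parse, a JavaScript library). The function names, the `INVALID_EMAIL` code, the simplistic email pattern, and the choice to dedupe case-insensitively are all assumptions made for illustration.

```python
import csv
import io
import re
from typing import Iterator

BATCH_SIZE = 500  # from the spec: insert new contacts in batches of 500
# Deliberately simple pattern for illustration; real validation is stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def parse_contacts(csv_text: str) -> tuple[list[dict], list[dict]]:
    """Return (contacts to insert, per-row statuses).

    Rows with an invalid email get a hypothetical INVALID_EMAIL code.
    Repeated emails within the file are reported and skipped.
    """
    contacts, statuses, seen = [], [], set()
    # start=2 because row 1 of the file is the header
    for row_num, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        email = (row.get("email") or "").strip().lower()  # assumed: case-insensitive
        if not EMAIL_RE.match(email):
            statuses.append({"row": row_num, "status": "error", "code": "INVALID_EMAIL"})
        elif email in seen:
            statuses.append({"row": row_num, "status": "duplicate"})
        else:
            seen.add(email)
            contacts.append(
                {"email": email, "name": row.get("name"), "phone": row.get("phone")}
            )
            statuses.append({"row": row_num, "status": "ok"})
    return contacts, statuses


def batches(items: list[dict], size: int = BATCH_SIZE) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Each acceptance criterion maps to an assertion: a file with a repeated email yields one contact, an invalid email yields a row-level error code, and a list of 1,201 contacts splits into batches of 500, 500, and 201.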
The Role of the Agent
In SDD, the AI agent plays four roles across the loop:
During specification, the agent elicits. It reads your initial description, asks clarifying questions, flags underspecified sections, and suggests edge cases you didn't consider. Good SDD tools lean into this — they treat the spec as a dialogue rather than a dictation.
During planning, the agent decomposes. It takes the scope and acceptance criteria and produces a task breakdown. You edit, reorder, or reject tasks. The agent isn't writing the plan alone; you're co-authoring it.
During execution, the agent implements. It reads the spec and the plan, reads the relevant code, writes the implementation, runs tests, and moves to the next task. Your job during execution is review, not typing.
During verification, the agent checks its own work against the spec. Did every acceptance criterion land? Are there stub implementations masquerading as real ones? Did any task's output get skipped?
Each of these roles is possible because the spec is the shared context. Without the spec, the agent has no ground truth to verify against. Every SDD tool — GSD, Spec Kit, OpenSpec — is a different opinion about how that shared context should be structured and how much of the loop the tool should automate.
What SDD Is Not
SDD gets confused with a few adjacent things. Worth separating them:
- It's not waterfall. Waterfall locks requirements before engineering starts. SDD loops — specs evolve, plans revise, changes create new specs. The cadence is weekly or daily, not quarterly.
- It's not TDD. Test-driven development writes tests first, then code. SDD writes specs first, then tests and code. Specs sit above tests in the hierarchy.
- It's not documentation-driven development. There, docs are written for humans to read and then translate into code. In SDD the spec is the direct input to code generation.
- It's not prompt engineering. A prompt is a one-off instruction. A spec is a persistent artifact that survives the feature, lives in the repo, and informs future changes.
The clearest test: if your "spec" is a Slack message you send to an agent and never look at again, you're prompting. If it's a file in the repo that your teammate can read six months later and understand what got built, you're doing SDD.
Why Teams Actually Adopt It
Three reasons I keep hearing from teams who've made the switch:
Review became meaningful again. When PRs are generated from one-shot prompts, reviews devolve into "does the code work" rather than "does this do what we want." With a spec attached, reviewers check the spec against the code, and the PR becomes a conversation about requirements instead of syntax.
Onboarding is faster. New engineers read the specs, not the code. The specs describe intent. The code describes current implementation. Intent decays more slowly than implementation, so spec-reading scales better across team growth.
The same spec works across tools. A spec written for Claude Code today can be handed to Cursor next quarter without rewriting. Models improve, tools change, specs persist. Teams that tied their workflow to a single agent's prompt format got stuck. Teams with specs in the repo didn't.
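Reviewing against a spec only works if the spec in the repo keeps pace with the code. A small CI check is one way to enforce that. This is a hypothetical sketch: the `src/` and `specs/` directory names are assumptions about repo layout, and the list of changed files would come from your CI system or from `git diff --name-only`.

```python
from pathlib import PurePosixPath

# Assumed repo layout; adjust to wherever code and specs actually live.
CODE_DIRS = ("src",)
SPEC_DIRS = ("specs",)


def _under(path: str, roots: tuple[str, ...]) -> bool:
    """True if `path` sits inside one of the given top-level directories."""
    parts = PurePosixPath(path).parts
    return bool(parts) and parts[0] in roots


def spec_update_missing(changed_files: list[str]) -> bool:
    """True when code changed but no spec file changed alongside it."""
    code_changed = any(_under(p, CODE_DIRS) for p in changed_files)
    spec_changed = any(_under(p, SPEC_DIRS) for p in changed_files)
    return code_changed and not spec_changed
```

For example, `spec_update_missing(["src/import/parser.py"])` returns `True`, while adding `"specs/csv-import.md"` to the same list returns `False`. A team would likely want an explicit override for trivial fixes that do not warrant a spec change.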
Common Objections
"Specs will go stale." They will, if you treat them as documentation. They won't, if you treat them as the input to every change. The habit to build: no implementation change merges without a matching spec update. Tools like OpenSpec archive specs on merge to automate half of this.
"It's slower." On small changes, yes. A one-line bug fix doesn't need a 400-line spec. SDD is for changes where the cost of getting it wrong exceeds the cost of writing the spec. That's most production work, but not all of it.
"The agent can figure it out from a good prompt." Sometimes. The bigger the feature and the more integrated the codebase, the less true this gets. Every experienced SDD practitioner I know tried the "good prompt" path first and came back.
How to Start
If you've never done SDD, pick one feature on your next sprint. Write a 200-line spec following the structure above — scope, acceptance criteria, architecture, tasks. Hand it to Claude Code, Cursor, or whichever agent you use, and run the implementation.
Three outcomes are possible:
- The implementation matches the spec. You didn't save time, but you gained reviewability and a durable artifact. Do it again next sprint.
- The implementation drifts. Your spec wasn't specific enough. Tighten the acceptance criteria and try again on a smaller feature.
- The agent asks useful clarifying questions you hadn't thought of. This is the most common outcome for new SDD adopters, and it's the one that sells the methodology.
After three or four features, you'll have an instinct for how much specification is enough. At that point, pick a tool — Spec Kit, OpenSpec, or GSD — that matches your constraints and lean into it.
The Short Version
Spec-driven development is not a product. It's a working agreement: we will describe what we're building before we build it, in enough detail that an agent can execute, and we will verify the output against that description. The tools help. The discipline does the work.
Specs are slow to write the first time. They're fast to write the tenth time. And they make AI agents produce code you can actually trust — which is the only thing that matters at scale.
Read the full methodology deep-dive here, or check out my tools and stack for the setup I pair with SDD on client projects.
Trying SDD for the first time or scaling it across a team? Reach out — happy to trade notes on what works.