Spec-Driven Development with GSD: A Complete Guide


Most developers using AI coding assistants work like this: describe what you want, get some code back, fix it, describe the next thing, fix that, repeat. It works for small stuff. But the moment your project has more than a handful of files, this approach — sometimes called "vibecoding" — falls apart. Context degrades. The AI forgets decisions you made an hour ago. You spend more time correcting than building.

Spec-Driven Development is the fix. You write structured specifications before any code gets generated, and those specs become the executable instructions that AI agents follow. Not documentation that sits alongside code — the specs are the prompts.

I used this methodology to build this entire portfolio site — 5 phases, 13 plans, 28 requirements, shipped in a single session. Here's exactly how it works.

What Spec-Driven Development Actually Is

Spec-Driven Development (SDD) is a methodology where every feature starts with a specification that defines what needs to be built, why, and how to verify it's correct. The spec is written in a structured format that both humans and AI agents can read.

The key difference from traditional software specs: these aren't Word documents that get outdated. They're living artifacts that AI agents consume directly during execution. A PLAN.md file in SDD isn't a reference doc — it's the instruction set an agent follows task by task.

Three principles make it work:

  1. Specs before code. You define requirements, success criteria, and verification steps before touching implementation.
  2. Structured format. Specs use consistent schemas (YAML frontmatter, XML task blocks) so agents parse them reliably.
  3. Goal-backward verification. Instead of asking "did we complete all tasks?", you ask "are these observable truths actually true in the codebase?"
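
To make the third principle concrete, here is a minimal Python sketch of the difference between checking tasks and checking truths. The `Truth` class, the `verify` helper, and the file contents are hypothetical illustrations, not GSD's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical snapshot of a codebase: path -> file contents.
Codebase = Dict[str, str]


@dataclass
class Truth:
    """An observable statement that must hold in the codebase."""
    statement: str
    check: Callable[[Codebase], bool]


def verify(truths: List[Truth], codebase: Codebase) -> Dict[str, bool]:
    """Evaluate every truth against the codebase, ignoring task status."""
    return {t.statement: t.check(codebase) for t in truths}


# Every task is ticked off, but the accent color never made it into the CSS.
tasks_done = {"Create globals.css": True, "Define accent color": True}
codebase = {"src/app/globals.css": ":root { --color-background: #000000; }"}

truths = [
    Truth(
        "Background color is defined",
        lambda cb: "--color-background" in cb.get("src/app/globals.css", ""),
    ),
    Truth(
        "Accent color is defined",
        lambda cb: "--color-accent" in cb.get("src/app/globals.css", ""),
    ),
]

results = verify(truths, codebase)
print(all(tasks_done.values()))  # the task checklist reports success
print(results)                   # the truth check exposes the missing accent
```

The task checklist and the truth check disagree here, and that disagreement is the gap goal-backward verification is meant to catch.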

GSD: The Framework That Makes SDD Practical

GSD (Get Shit Done) is an open-source framework by TACHES that implements Spec-Driven Development for Claude Code. It installs as a set of skills and slash commands that manage the entire lifecycle — from project definition to phase planning to execution to verification.

Install it with one command, choosing either a global or a project-local install:

npx get-shit-done-cc --global    # installs to ~/.claude/
npx get-shit-done-cc --local     # installs to ./.claude/

Once installed, you get 28+ slash commands. Here are the ones that matter most:

| Command | What It Does |
| --- | --- |
| /gsd:new-project | Interviews you, researches the domain, creates PROJECT.md + ROADMAP.md |
| /gsd:map-codebase | Analyzes an existing codebase before starting; essential for brownfield projects |
| /gsd:discuss-phase | Interviews you to lock implementation decisions into CONTEXT.md before planning |
| /gsd:plan-phase | Creates atomic task plans (PLAN.md) with verification steps |
| /gsd:execute-phase | Runs plans with fresh subagent contexts and atomic commits |
| /gsd:verify-work | Runs user acceptance testing, auto-generates fix plans on failure |
| /gsd:quick | Same guarantees for small tasks; skips heavy planning |
| /gsd:audit-milestone | Verifies the milestone achieved its definition of done |
| /gsd:complete-milestone | Archives the milestone, tags the release, captures retrospective |
| /gsd:progress | Shows current position and routes you to the next action |

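The full lifecycle runs from /gsd:new-project through a discuss, plan, execute, and verify loop for each phase, and ends with /gsd:audit-milestone and /gsd:complete-milestone.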

The Full Workflow: Step by Step

Here's the workflow I followed to build azanello.com, using the real planning artifacts from this project.

Step 1: Define the Project

Run /gsd:new-project. GSD interviews you about what you're building, then launches 4 parallel research agents that each investigate a different angle of your domain.

The research findings feed into a synthesized summary, which GSD then uses to scope requirements and build your roadmap.

Already have a codebase? Run /gsd:map-codebase first. It spawns parallel agents to analyze your existing stack, architecture, conventions, and concerns. /gsd:new-project then knows your codebase: its questions focus on what you're adding, and planning automatically loads your patterns.

The output is a PROJECT.md that captures everything:

# azanello.com — Portfolio

## What This Is
A dark, cinematic portfolio website for a Software Engineer.
Built with Next.js 16, Tailwind CSS v4, and Motion v12.

## Requirements
### Validated
- Dark cinematic design with electric blue/cyan accent color
- Interactive particle background on hero section
- Scroll-driven reveal animations
- Projects section pulling pinned repos from GitHub GraphQL API
- Contact form that sends to inbox via Resend
- Lighthouse 90+ across all categories

### Out of Scope
- 3D elements (Three.js) — particles deliver the feel without complexity
- Light mode / theme toggle — site is permanently dark
- CMS integration — content is static/GitHub-driven

Every requirement gets an ID (DSGN-01, HERO-02, PROD-04) and maps to a specific phase. Out-of-scope items are explicit — this prevents scope creep during execution.
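
Mapping requirement IDs to phases is easy to check mechanically. The sketch below is a hypothetical traceability check in Python. The phase name `02-hero` and the exact mapping are assumptions for illustration, and GSD's own bookkeeping may work differently.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical requirement IDs and phase mapping, in the DSGN-01 style used above.
requirements: Set[str] = {"DSGN-01", "DSGN-02", "HERO-02", "PROD-04"}
phase_requirements: Dict[str, List[str]] = {
    "01-foundation": ["DSGN-01", "DSGN-02"],
    "02-hero": ["HERO-02"],
}


def coverage_gaps(
    reqs: Set[str], phases: Dict[str, List[str]]
) -> Tuple[List[str], List[str]]:
    """Return (IDs mapped to no phase, IDs claimed by more than one phase)."""
    seen: Dict[str, List[str]] = {}
    for phase, ids in phases.items():
        for req_id in ids:
            seen.setdefault(req_id, []).append(phase)
    unmapped = sorted(reqs - set(seen))
    duplicated = sorted(r for r, owners in seen.items() if len(owners) > 1)
    return unmapped, duplicated


unmapped, duplicated = coverage_gaps(requirements, phase_requirements)
print("unmapped:", unmapped)
print("duplicated:", duplicated)
```

In this made-up mapping, PROD-04 has no phase yet, so it shows up as unmapped before any planning starts.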

Step 2: Build the Roadmap

GSD generates a ROADMAP.md that breaks the project into phases with clear dependencies.

Each phase maps to specific requirement IDs, has a goal (what must be true when done), and success criteria written as observable behaviors — not implementation details:

Phase 2 Success Criteria:

  1. User sees "azanello" and "Software Engineer" within 3 seconds of page load
  2. Particles respond when the user moves their mouse or taps on mobile
  3. The staggered entrance animation plays — particles first, then name, then subtitle
  4. On a mid-range Android, no dropped frames during scroll

These criteria are what the verifier agent checks against later. They're written from the user's perspective, not the developer's.

Step 3: Lock Decisions with Context

Before planning each phase, run /gsd:discuss-phase. This is the step most people skip — and it's the one that matters most. GSD interviews you about the phase scope, asking targeted questions to surface decisions that would otherwise become ambiguity during execution.

This creates a CONTEXT.md that captures design decisions, constraints, and boundaries. For my foundation phase:

## Implementation Decisions

### Color palette
- Dark background (#000–#0a0a0a) with layered elevation surfaces
- Primary accent: electric blue/cyan (#00d4ff)
- Three text color levels: primary, secondary, muted

### Claude's Discretion
- Specific font choice within geometric sans category
- Exact glow blur radius and opacity values
- Stagger delay timing between elements

The three-way split is crucial. Locked decisions are non-negotiable constraints the executor follows exactly. Claude's Discretion tells the agent what it can decide on its own, which prevents unnecessary questions during execution. Out of scope (not shown in the excerpt above) explicitly blocks the agent from adding things you didn't ask for. Without this step, the planner guesses your intent. With it, every decision is captured before a single line of code is written.
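
As a rough illustration of how that split could be consumed, the Python sketch below buckets the bullets of a CONTEXT.md excerpt by heading. The `split_decisions` function and the rule that only a heading containing "Discretion" is flexible are my assumptions, not GSD behavior.

```python
from typing import Dict, List

# Excerpt in the shape of the CONTEXT.md shown above (trimmed for brevity).
CONTEXT_MD = """\
## Implementation Decisions

### Color palette
- Primary accent: electric blue/cyan (#00d4ff)

### Claude's Discretion
- Exact glow blur radius and opacity values
- Stagger delay timing between elements
"""


def split_decisions(markdown: str) -> Dict[str, List[str]]:
    """Bucket bullet points into 'locked' or 'discretion' by their ### heading."""
    buckets: Dict[str, List[str]] = {"locked": [], "discretion": []}
    current = "locked"
    for line in markdown.splitlines():
        if line.startswith("### "):
            # Assumption: only a heading containing "Discretion" is flexible.
            current = "discretion" if "Discretion" in line else "locked"
        elif line.startswith("- "):
            buckets[current].append(line[2:].strip())
    return buckets


decisions = split_decisions(CONTEXT_MD)
print(decisions["locked"])
print(decisions["discretion"])
```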

Step 4: Plan with Executable Specs

Run /gsd:plan-phase. The planner reads your CONTEXT.md — the decisions you locked in the previous step — so every plan respects your intent from the start. No guessing, no drift.

Here's a real plan from my project (simplified):

---
phase: 01-foundation
plan: 01
type: execute
wave: 1
depends_on: []
autonomous: true
requirements:
  - DSGN-01
  - DSGN-02
must_haves:
  truths:
    - "Dark background (#000) and accent (#00d4ff) defined as CSS custom properties"
    - "Space Grotesk font renders with bold headings and neon glow"
  artifacts:
    - path: "src/app/globals.css"
      provides: "Full design token system via @theme inline"
    - path: "src/app/fonts.ts"
      exports: ["spaceGrotesk"]
  key_links:
    - from: "src/app/layout.tsx"
      to: "src/app/fonts.ts"
      pattern: "spaceGrotesk\\.variable"
---
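
The `wave` and `depends_on` fields in the frontmatter above hint at how plans can be ordered. The following Python sketch rests on an assumption of mine rather than GSD's documented algorithm: a plan with no dependencies runs in wave 1, and any other plan runs one wave after its latest dependency. Plan IDs "02" through "04" are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical plan IDs and dependencies; only plan "01" mirrors the example above.
depends_on: Dict[str, List[str]] = {
    "01": [],
    "02": [],
    "03": ["01"],
    "04": ["02", "03"],
}


def assign_waves(deps: Dict[str, List[str]]) -> Dict[str, int]:
    """Wave 1 has no dependencies; otherwise 1 + the latest dependency's wave."""
    waves: Dict[str, int] = {}

    def wave_of(plan: str, stack: Tuple[str, ...] = ()) -> int:
        if plan in stack:
            raise ValueError(f"dependency cycle at plan {plan}")
        if plan not in waves:
            parent_waves = [wave_of(p, stack + (plan,)) for p in deps[plan]]
            waves[plan] = 1 + max(parent_waves, default=0)
        return waves[plan]

    for plan in deps:
        wave_of(plan)
    return waves


waves = assign_waves(depends_on)
print(waves)
```

Plans that land in the same wave have no dependency on each other, so under this assumption they could run in parallel.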

Below the frontmatter, tasks use XML structure that agents parse directly:

<task type="auto">
  <name>Task 1: Scaffold Next.js 16 with Tailwind v4</name>
  <files>package.json, postcss.config.mjs, src/app/globals.css</files>
  <action>
    1. Run npx create-next-app@latest with TypeScript and App Router
    2. Install tailwindcss @tailwindcss/postcss motion
    3. Create postcss.config.mjs with @tailwindcss/postcss plugin
    IMPORTANT: Use `motion` not `framer-motion`. Use `@tailwindcss/postcss` not `tailwindcss`.
  </action>
  <verify>
    <automated>npm run build 2>&1 | tail -20</automated>
  </verify>
  <done>Next.js builds with Tailwind v4 and Motion v12. No v3 artifacts.</done>
</task>

Every task has three parts: what to do, how to verify it worked, and what "done" means. The verification is automated — shell commands that the agent runs to prove correctness.
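
For a sense of how regular this structure is, here is a small Python sketch that reads a trimmed copy of the task above with the standard library's `xml.etree.ElementTree`. The `parse_task` helper is hypothetical, and an LLM agent does not need a strict parser to read these blocks. One caveat if you do use one: a verify command containing `&`, as in `2>&1`, would have to be escaped as `&amp;`, so the sketch uses a simpler command.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the task above, with a verify command that needs no escaping.
TASK_XML = """
<task type="auto">
  <name>Task 1: Scaffold Next.js 16 with Tailwind v4</name>
  <files>package.json, postcss.config.mjs, src/app/globals.css</files>
  <verify>
    <automated>npm run build</automated>
  </verify>
  <done>Next.js builds with Tailwind v4 and Motion v12. No v3 artifacts.</done>
</task>
"""


def parse_task(xml_text: str) -> dict:
    """Extract the fields an executor would need from one <task> block."""
    task = ET.fromstring(xml_text.strip())
    return {
        "type": task.get("type"),
        "name": task.findtext("name"),
        "files": [f.strip() for f in task.findtext("files").split(",")],
        "verify": task.findtext("verify/automated"),
        "done": task.findtext("done"),
    }


task = parse_task(TASK_XML)
print(task["files"])
print(task["verify"])
```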

Step 5: Execute with Fresh Subagents

Run /gsd:execute-phase. This is where the architecture pays off.

GSD doesn't run all tasks in one long Claude session. It spawns a fresh subagent for each plan, giving it a clean 200K token context window. This solves the biggest problem with AI coding: context rot.

Context rot is the drop in Claude's output quality as the context window fills up. Quality is at its best at roughly 0-30% context usage. By the time you're 50+ tasks into a session, the AI is working with degraded context: forgetting earlier decisions, repeating mistakes, losing coherence.

GSD's solution: Task 50 gets the same clean context as Task 1. The executor agent reads only the PLAN.md it needs, executes the tasks, creates an atomic git commit per task, and writes a summary. Then it's done. The next plan gets a fresh agent.
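
The arithmetic behind this is easy to sketch in Python. Only the 200K window comes from the text above. The tokens-per-task figure and the three-tasks-per-plan figure are made-up numbers for illustration, and the model ignores the small summaries passed between plans.

```python
CONTEXT_WINDOW = 200_000   # tokens, the window size quoted above
TOKENS_PER_TASK = 6_000    # hypothetical average cost of one task
TASKS_PER_PLAN = 3         # hypothetical plan size


def usage_single_session(task_number: int) -> float:
    """Fraction of the window used when every task shares one session."""
    return min(1.0, task_number * TOKENS_PER_TASK / CONTEXT_WINDOW)


def usage_fresh_per_plan(task_number: int) -> float:
    """Fraction used when each plan starts with an empty context."""
    position_in_plan = (task_number - 1) % TASKS_PER_PLAN + 1
    return position_in_plan * TOKENS_PER_TASK / CONTEXT_WINDOW


for n in (1, 10, 50):
    print(n, f"{usage_single_session(n):.0%}", f"{usage_fresh_per_plan(n):.0%}")
```

Under these assumed numbers, the shared session reaches the 30% mark by task 10 and has run out of room well before task 50, while the per-plan contexts never get past a few percent.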

Each task gets its own commit:

306b99a feat: scaffold Next.js 16 with Tailwind v4
48e2f99 feat: establish design token system with Space Grotesk
52b11d0 feat: motion system with MotionProvider and PageTransition

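If something breaks, git bisect can pinpoint the commit that introduced the failure, and because each commit maps to one task, that identifies the failing task. You can then revert that single commit without losing everything else.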
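
The idea behind bisecting is a binary search over the commit history. The Python sketch below illustrates that search on the three commits listed above. It is not how git is implemented, and the scenario in which the second commit breaks the build is invented for the example.

```python
from typing import Callable, List

# Commit hashes from the log above, oldest first.
commits: List[str] = ["306b99a", "48e2f99", "52b11d0"]


def first_bad_commit(history: List[str], is_good: Callable[[str], bool]) -> str:
    """Binary search for the earliest commit where the check fails.

    Assumes the history is good up to some commit and bad from then on,
    and that at least the last commit is bad.
    """
    lo, hi = 0, len(history) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good(history[mid]):
            lo = mid + 1   # the breakage was introduced later
        else:
            hi = mid       # this commit or an earlier one is the culprit
    return history[lo]


# Hypothetical scenario: the build breaks starting with the design-token commit.
broken_from = commits.index("48e2f99")
culprit = first_bad_commit(commits, lambda c: commits.index(c) < broken_from)
print(culprit)
```

With one commit per task, the commit the search returns corresponds to exactly one task in one plan.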

Step 6: Verify Against Goals

Run /gsd:verify-work. A fresh verifier agent — separate from the one that wrote the code — checks every success criterion, every must_haves truth, every artifact, and every key link.

From my Phase 1 verification:

| Truth | Status | Evidence |
| --- | --- | --- |
| Dark background and accent color visible | VERIFIED | globals.css: --color-background: #000000, --color-accent: #00d4ff |
| Typography renders with glow effects | VERIFIED | .text-glow-accent with 3-layer neon text-shadow |
| Animations respect prefers-reduced-motion | VERIFIED | MotionProvider.tsx: <MotionConfig reducedMotion="user"> |

The verifier also runs anti-pattern detection — searching for TODO comments, empty stubs, orphaned exports, and version mismatches.
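
A scan like that can be approximated with a few regular expressions. The Python sketch below is a hypothetical stand-in: the two rules, the `scan` function, and the sample files are my own, since the verifier's actual rules aren't shown here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rules; a real verifier would likely check more than these two.
ANTI_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "todo-comment": re.compile(r"\b(TODO|FIXME)\b"),
    "empty-function": re.compile(r"\)\s*(=>\s*)?\{\s*\}"),
}


def scan(files: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Return (path, line number, rule name) for every match."""
    findings = []
    for path, text in files.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            for rule, pattern in ANTI_PATTERNS.items():
                if pattern.search(line):
                    findings.append((path, line_no, rule))
    return findings


# Made-up file contents for the demonstration.
sample = {
    "src/app/page.tsx": "export default function Page() {}\n// TODO: add hero\n",
    "src/app/fonts.ts": "export const spaceGrotesk = loadFont();\n",
}
findings = scan(sample)
for finding in findings:
    print(finding)
```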

If something fails, you don't manually debug. GSD spawns diagnostic agents that find the root cause and auto-generate fix plans — ready for immediate re-execution with /gsd:execute-phase. You describe what's wrong, the system figures out why, and hands you a plan to fix it.

My v1.0 audit: 28/28 requirements satisfied, 10/10 must-haves verified, zero gaps.

Why This Works Better Than Prompting

Three reasons:

1. Specs eliminate ambiguity. When the plan says "create src/app/globals.css with @theme inline containing --color-accent: #00d4ff," there's zero room for interpretation. The agent produces exactly what was specified. No "I assumed you meant..." moments.

2. Fresh contexts prevent degradation. A single long Claude session progressively gets worse. GSD's subagent architecture means every plan gets peak-quality AI output. This is the single biggest technical innovation — it's why GSD-built projects hold together at scale.

3. Verification catches drift. The verifier agent has never seen the code before. It reads the success criteria and checks the codebase with fresh eyes. This is the AI equivalent of "don't review your own PR" — the builder and the checker are separate agents with separate contexts.

Writing Good Specs: What I've Learned

After building multiple projects this way, here's what makes specs effective:

Be specific about files and paths. Don't say "create a font configuration." Say "create src/app/fonts.ts that exports spaceGrotesk with variable: '--font-sans'." The executor agent should never guess which file to create or where to put it.

Write success criteria as user-observable behaviors. "Particles respond to mouse movement" is testable. "Implement a performant particle system" is not. The verifier needs concrete assertions it can check.

Declare what's NOT in scope. Scope creep kills AI-assisted projects faster than anything. My PROJECT.md lists 6 out-of-scope items with explicit rationale. The executor won't randomly add a theme toggle because it knows that was deliberately excluded.

Use must_haves.key_links for integration testing. The plan's key_links section defines regex patterns that show files are actually wired together. A match for pattern: "spaceGrotesk\\.variable" in layout.tsx is evidence that the font is applied, not just imported.
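
Here is a small Python sketch of that key_links check. Both versions of layout.tsx are hypothetical, and `link_is_wired` is an illustrative helper, not part of GSD. Note that the doubled backslash in the YAML string becomes a single one after YAML unescaping, so the regex matches a literal dot.

```python
import re

# YAML "spaceGrotesk\\.variable" unescapes to the regex spaceGrotesk\.variable.
KEY_LINK_PATTERN = re.compile(r"spaceGrotesk\.variable")

# Two hypothetical versions of src/app/layout.tsx.
imported_only = 'import { spaceGrotesk } from "./fonts";\n<html lang="en">'
wired_up = (
    'import { spaceGrotesk } from "./fonts";\n'
    '<html lang="en" className={spaceGrotesk.variable}>'
)


def link_is_wired(source: str) -> bool:
    """True when the file uses the export, not merely imports it."""
    return KEY_LINK_PATTERN.search(source) is not None


print(link_is_wired(imported_only))
print(link_is_wired(wired_up))
```

The first file imports the font but never references `spaceGrotesk.variable`, so only the second one passes.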

Keep tasks small. Each plan should have 2-3 tasks. Each task should fit comfortably in 50% of a context window. If a task needs more than 10 steps, it's two tasks.
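
These size limits are simple enough to lint for. The Python sketch below encodes the two numeric rules from the paragraph above. The plan contents and the `lint_plan` helper are hypothetical.

```python
from typing import Dict, List

MAX_TASKS_PER_PLAN = 3   # "each plan should have 2-3 tasks"
MAX_STEPS_PER_TASK = 10  # "more than 10 steps, it's two tasks"


def lint_plan(tasks: Dict[str, List[str]]) -> List[str]:
    """Return warnings for a plan given as task name -> list of steps."""
    warnings = []
    if len(tasks) > MAX_TASKS_PER_PLAN:
        warnings.append(f"plan has {len(tasks)} tasks; consider splitting it")
    for name, steps in tasks.items():
        if len(steps) > MAX_STEPS_PER_TASK:
            warnings.append(f"{name}: {len(steps)} steps; split into two tasks")
    return warnings


# Hypothetical plan: one reasonable task and one that has grown too large.
plan = {
    "Scaffold project": ["create app", "install deps", "configure PostCSS"],
    "Build design system": [f"step {i}" for i in range(1, 13)],
}
plan_warnings = lint_plan(plan)
print(plan_warnings)
```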

Beyond Phases: Quick Mode and Milestones

Not everything needs the full workflow.

Quick Mode for Day-to-Day Work

Run /gsd:quick for bug fixes, small features, config changes, or one-off tasks. It uses the same planner and executor agents at the same quality — but skips research, plan checking, and verification. Quick tasks track separately in .planning/quick/ so they don't clutter your milestone phases.

/gsd:quick
> What do you want to do? "Add reading time estimate to blog post cards"

You keep the atomic commits and the fresh subagent context; you just skip the heavier steps.

The Milestone Lifecycle

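Phases live inside milestones. When all phases in a milestone are complete, GSD has a structured close-out process: /gsd:audit-milestone verifies that the milestone achieved its definition of done, and /gsd:complete-milestone closes it out.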

/gsd:complete-milestone archives all planning artifacts, tags the release in git, and writes to RETROSPECTIVE.md — capturing what worked, what was inefficient, and patterns established. These lessons feed forward into future milestones so the system gets smarter over time.

Cross-Session Memory with STATE.md

GSD tracks your exact position in STATE.md — which milestone, which phase, which plan, what decisions have been made, what blockers exist. When you close your terminal and come back tomorrow, /gsd:progress reads STATE.md and tells you exactly where you left off and what to do next. No context lost between sessions.

The Real Results

I shipped this portfolio in a single session: 5 phases, 13 plans, 28 requirements, 2,870 lines of code across 105 files. The verification audit found zero gaps. The retrospective captured 5 key lessons that feed forward into future projects.

Could I have built it without GSD? Sure. Would it have taken one session with zero rework? Absolutely not.

The methodology scales beyond portfolio sites. I use it for NovaMX, for internal tools, for any project where "it compiles" isn't a sufficient definition of done. The structure is the same regardless of project size:

| File | Role | Question It Answers |
| --- | --- | --- |
| PROJECT.md | Vision & scope | What are we building and why? |
| REQUIREMENTS.md | Traceable requirements | What must be true, mapped to phases? |
| ROADMAP.md | Phase breakdown | When does each piece get built? |
| STATE.md | Session memory | Where did we leave off? |
| CONTEXT.md | Locked decisions | How should this phase look and behave? |
| PLAN.md | Executable spec | How exactly does each task get implemented? |
| VERIFICATION.md | Goal-backward proof | Does it actually work as specified? |
| RETROSPECTIVE.md | Lessons learned | What do we carry forward to the next milestone? |

If you're already using Claude Code, read my guide on agents and skills — they're the execution layer that GSD orchestrates. Check out my tools and stack for the full development setup.

Start small. Pick your next feature and write a spec before coding. Define what "done" looks like before you start. You'll never go back to prompting blind.


Using GSD or building with spec-driven workflows? Reach out — I'm always interested in how other engineers structure their AI development process.