The next massive software failure will probably not come from a missing line of code. It will come from a missing sentence.
Right now, developers are using AI agents to write code faster than they can actually read it. I’m doing it too! The latest models can generate 1,000+ LoC in seconds. But that’s not the real problem. The problem is that speed removes friction. And friction used to be where thinking happened.
In 2022, GitHub ran a controlled experiment in which developers using Copilot finished a task faster (the figure usually quoted is “up to 55%”).1 That sounds like a dream, until you realize that speed without direction is just a faster way to reach the wrong place. In other studies, experienced developers got slower when AI increased review load, coordination cost, and rework.2 So the real question is not “Can AI write production-ready code?” It definitely can!
The question is:
“Can we keep AI speed without turning our systems into a high-throughput confusion factory?”
That is where Spec-Driven Development (SDD) shows up. The idea is simple: write a clear, structured specification (“spec”) before the code, so that the AI is pointed in the right direction and stays within well-defined constraints instead of “vibes.”
You may wonder:
Are we just reinventing 1990s Waterfall, but this time with an AI-powered engine on top?
In this post, I’ll cover 3 things:
- What SDD actually is and why vague thinking is now your biggest technical liability.
- The three levels of SDD and the specific one that quietly traps most teams.
- A practical checklist to decide which features deserve a real spec and which ones you should just build fast and move on.
By the end, you’ll know whether SDD is a real shift in how software gets built, or just another heavyweight idea that sounds smart and wastes everyone’s time.
What is SDD?
In SDD, the specification is the primary artifact.
Not the code. Not the framework. Not the Jira tickets. The spec is the source of truth and everything else is downstream.3
And no, the spec is not a vague paragraph or a wish list. A real SDD spec is a structured, testable description of behavior.
The spec defines:
- What the system must do (and what it must not do)
- Edge cases
- Business rules
- Failure modes
- An explicit definition of done (DoD)
The goal is to leave the AI coding agent as little room for interpretation as possible.
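To make “structured and testable” a bit more concrete, here is a minimal sketch of the five items above as a Java data structure. The type name `FeatureSpec`, its field names, and the `missingSections` helper are illustrative assumptions, not part of any SDD tool. The sketch assumes Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape for a spec; the names are illustrative only.
record FeatureSpec(
        List<String> mustDo,
        List<String> mustNotDo,
        List<String> edgeCases,
        List<String> businessRules,
        List<String> failureModes,
        List<String> definitionOfDone) {

    // Returns the names of sections that are still empty, so a missing
    // sentence is caught before any code is generated.
    List<String> missingSections() {
        Map<String, List<String>> sections = new LinkedHashMap<>();
        sections.put("mustDo", mustDo);
        sections.put("mustNotDo", mustNotDo);
        sections.put("edgeCases", edgeCases);
        sections.put("businessRules", businessRules);
        sections.put("failureModes", failureModes);
        sections.put("definitionOfDone", definitionOfDone);

        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : sections.entrySet()) {
            if (entry.getValue() == null || entry.getValue().isEmpty()) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }
}
```

A spec with empty `edgeCases` and `failureModes` would be flagged before it ever reaches the agent. The check says nothing about whether the sentences are good, only that nobody skipped a section.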
Why is SDD Relevant Now?
SDD is not new. What’s new is the environment it now lives in.
In the past, we used to say “code is the bottleneck.” Research in requirements engineering has been blunt for years: a big chunk of effort ends up as rework, and unclear or shifting requirements are a major driver.4
But now code is cheap, and ambiguity is expensive. AI raises the stakes because it will happily translate fuzzy intent into crisp code without asking clarifying questions. Think of an AI agent like a brilliant intern who never says “I don’t understand.” If your requirements are fuzzy, it won’t stop. It will confidently generate a clean, professional-looking solution that could be catastrophically wrong.
And there’s a second issue that feels almost philosophical: Humans transmit information slowly. We type maybe 40 words per minute on a good day. That speed forces reflection. But AI can generate 2,000 lines in minutes. It removes the pause where thinking used to happen.
When thinking disappears, ambiguity leaks straight into production. This is why some researchers now treat prompts as a form of requirements, and argue that classic requirements methods will become even more valuable in the generative era.5
Same problem → Higher speed → Bigger blast radius.
SDD vs Waterfall
Yes, SDD feels like Waterfall. You are not crazy. They both say “think before you build.” But they diverge on the most important thing: Waterfall assumes you can predict the future. SDD assumes you cannot, so it builds a fast loop where the spec, tests, and code evolve together. Modern SDD discussions emphasize tight iteration and executable checks, not month-long requirements phases.6 So yes, it looks like Waterfall if you only look at the order of steps. It behaves like something else if you look at the feedback loop.
A Matter of Intent
Let’s be honest: most bugs are not code bugs. They are argument bugs.
You did not write incorrect logic. You postponed a decision until it exploded in production.
Example: a payment gateway requirement says:
“The system must block risky transactions.”
One person interprets “risky” as “high-risk country.” Another interprets it as “amount above $1,000.” Both are reasonable. But both can be wrong in different ways.
Research shows this is not just anecdotal. Practitioners interpret conditionals in requirements inconsistently, even when they believe they are being precise.7
SDD forces that decision to happen early. When stakeholders are calm, when assumptions are visible, and when the codebase still has exactly zero lines of code.
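One way to force the decision is to write “risky” as an explicit rule instead of an adjective. The sketch below assumes the stakeholders agreed that either reading is enough to block a transaction. The `RiskPolicy` name, the placeholder country codes, and the exact threshold are illustrative assumptions.

```java
import java.math.BigDecimal;
import java.util.Set;

// Hypothetical policy object: "risky" spelled out as two explicit rules.
record RiskPolicy(Set<String> blockedCountries, BigDecimal maxAmount) {

    // A transaction is risky if EITHER rule matches. Whether this should be
    // "either" or "both" is exactly the decision the spec has to record.
    boolean isRisky(String countryCode, BigDecimal amount) {
        return blockedCountries.contains(countryCode)
                || amount.compareTo(maxAmount) > 0; // strictly above the limit
    }
}
```

Even this tiny version surfaces a question the original sentence hid: is a transaction of exactly $1,000 blocked? Here it is not, because the rule says “above.” That is a business decision, and now it is written down.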
SDD as a Pipeline
Spec-Driven Development is not about writing more documentation. It’s about removing ambiguity before the agent starts making decisions for you.
A practical workflow looks like this:
    %%{init: {'theme':'base'}}%%
    flowchart TB;
        S["Spec (requirements + constraints + edge cases)"] --> P["Plan (tasks)"]
        P --> I["Implementation (AI agent + humans)"]
        I --> T["Verification (tests, properties, gates)"]
        T -->|pass| M["Merge + Deploy"]
        T -->|fail| S
        subgraph "What changes with AI"
            S
            P
            T
        end
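The fail edge that points back to the spec is what separates this from a one-way process, so here is a minimal sketch of that loop in Java. It skips the Plan step for brevity, and every name in it (`SpecLoop`, `implement`, `verify`, `reviseSpec`, `maxRounds`) is a hypothetical stand-in for illustration, not an API from any SDD tool.

```java
import java.util.Optional;
import java.util.function.BiPredicate;
import java.util.function.Function;
import java.util.function.UnaryOperator;

final class SpecLoop {

    // Runs spec -> implementation -> verification. On failure the SPEC is
    // revised (not just the code) and the loop starts again.
    static <S, C> Optional<C> run(
            S spec,
            Function<S, C> implement,      // stand-in for the agent + humans
            BiPredicate<S, C> verify,      // stand-in for tests and gates
            UnaryOperator<S> reviseSpec,   // stand-in for the human fixing the spec
            int maxRounds) {
        S current = spec;
        for (int round = 0; round < maxRounds; round++) {
            C candidate = implement.apply(current);
            if (verify.test(current, candidate)) {
                return Optional.of(candidate); // pass: ready to merge
            }
            current = reviseSpec.apply(current); // fail: back to the spec
        }
        return Optional.empty(); // no passing candidate within the budget
    }
}
```

The `maxRounds` budget is a design choice in this sketch: a loop that can regenerate forever hides a spec that nobody can satisfy.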
Requirements
Describe what the user experiences from the outside (not how the system works).
Example:
“When a user submits a support ticket, they receive a clear and empathetic response within five seconds. The response must reference their issue and suggest a concrete next step.”
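The “within five seconds” part of that requirement is already measurable, so it can become a check. Below is a minimal sketch, assuming the reply is produced by some synchronous call. `LatencyGate` and `within` are hypothetical names made up for this example.

```java
import java.time.Duration;
import java.util.function.Supplier;

final class LatencyGate {

    // Runs the call, measures wall-clock time, and fails if it exceeded the limit.
    // Note: this measures after the fact; it does not cancel a slow call.
    static <T> T within(Duration limit, Supplier<T> call) {
        long start = System.nanoTime();
        T result = call.get();
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
        if (elapsed.compareTo(limit) > 0) {
            throw new IllegalStateException(
                    "Took " + elapsed.toMillis() + " ms, limit is " + limit.toMillis() + " ms");
        }
        return result;
    }
}
```

A call such as `LatencyGate.within(Duration.ofSeconds(5), () -> agent.reply(ticket))` would wrap the real work, where `agent.reply` is a placeholder for whatever produces the response. “Clear and empathetic” is the harder half of the requirement and does not reduce to a timer.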
Design
Define structure and boundaries. What information can the agent see? What tools can it use? What is it not allowed to do?
Example:
“The agent can read the ticket text and query the internal FAQ. It cannot invent policies, promise refunds, or escalate issues on its own.”
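Boundaries like these are easier to enforce when they live in code as a default-deny allowlist. Here is a minimal sketch; the `AgentAction` values and the `AgentBoundary` class are assumptions invented for this example, mapped from the sentence above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical catalogue of things the agent might try to do.
enum AgentAction { READ_TICKET, QUERY_FAQ, INVENT_POLICY, PROMISE_REFUND, ESCALATE }

final class AgentBoundary {

    // Default-deny: only the two actions named in the design are allowed.
    private static final Set<AgentAction> ALLOWED =
            EnumSet.of(AgentAction.READ_TICKET, AgentAction.QUERY_FAQ);

    // Throws if the action is outside the boundary the design defines.
    static void require(AgentAction action) {
        if (!ALLOWED.contains(action)) {
            throw new SecurityException("Action not permitted by the spec: " + action);
        }
    }
}
```

The allowlist covers the “can” half of the design. Anything not listed is rejected, so a newly added action stays forbidden until someone edits the design on purpose.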
Tasks
Tasks must be explicit, sequential, and boring.
Example:
1. Extract the user’s core problem in one sentence.
2. Match it against FAQ categories.
3. Select the most relevant solution.
4. Generate a response using approved tone guidelines.
No “think carefully.” No “use best judgment.” Agents do exactly what you specify. Nothing more.
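“Explicit, sequential, and boring” maps nicely onto a fixed list of named steps. The sketch below wires the four tasks into a pipeline. The step bodies are deliberately trivial stand-ins (a real system would call the agent or a FAQ service there), and `TicketPipeline`, `Step`, and the FAQ map are hypothetical names for illustration. It assumes Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

final class TicketPipeline {

    // One named, single-purpose step; each consumes the previous step's output.
    record Step(String name, UnaryOperator<String> action) {}

    // Runs the steps strictly in order and stops at the first one with no output.
    static String run(List<Step> steps, String input) {
        String current = input;
        for (Step step : steps) {
            current = step.action().apply(current);
            if (current == null || current.isBlank()) {
                throw new IllegalStateException("Step produced no output: " + step.name());
            }
        }
        return current;
    }

    // Stand-in implementations of the four tasks, keyed on a tiny FAQ map
    // (category keyword -> approved solution text).
    static List<Step> steps(Map<String, String> faq) {
        return List.of(
                new Step("extract-problem",
                        text -> text.split("[.!?]", 2)[0].trim().toLowerCase()),
                new Step("match-category",
                        problem -> faq.keySet().stream()
                                .filter(problem::contains).findFirst().orElse("")),
                new Step("select-solution", faq::get),
                new Step("generate-response",
                        solution -> "Sorry for the trouble. Next step: " + solution));
    }
}
```

When no FAQ category matches, the pipeline stops at `match-category` with a named error instead of letting a later step improvise. That is the boring behavior you want.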
Build
Now you turn the spec into checks. Does the response reference the user’s issue? Is the response under 120 words? If it fails, the build fails. That’s the whole loop. It is intentionally simple. Because when working with non-deterministic tools, simplicity is not a style choice. It’s a reliability strategy.
The Three Levels of SDD
Not all SDD looks the same. In practice, there are three levels:
Level 1: Spec-First
You write a spec to clarify your own thinking, use it to guide the agent, then move on. Perfect for MVPs and experiments. The spec is useful even if it dies tomorrow.
Level 2: Spec-Anchored
The spec is a living artifact. If the code changes, the spec must change. This is where high-performing teams tend to land because it balances flexibility with discipline. GitHub’s Spec Kit explicitly pushes the idea of specs that can drive implementation and verification, not just explanation.8 This is also where most teams quietly fail. Not because they cannot write specs. Because they cannot keep them alive.
Level 3: Spec-as-Source
You edit the spec, and tooling regenerates the implementation. This is the low-code dream. Some newer tools and platforms point in this direction by making the spec the center of gravity.9
⚠️ My take: Level 3 is dangerous right now for complex systems. When the spec becomes the source of truth, the spec becomes the codebase. That means that if you cannot express intent precisely (and natural language is notoriously ambiguous), you are moving bugs upstream and making them look like prose. At that point, you have reinvented a programming language. But fuzzier. And some of the loudest skepticism you’ll hear about SDD is basically this argument, stated more rudely.10
Reality Check
🟢 Green light
SDD excels at:
- New features
- Greenfield systems
- Modernization efforts
When the slate is clean, a clear spec is a superpower.
🔴 Red light
SDD struggles with deeply entangled legacy systems. If logic is spread across five services, three cron jobs, and one senior engineer’s memory, a “clean spec” is not a document problem. It’s an archaeology problem. AI agents will suggest elegant refactors that ignore the decade of hacks quietly keeping the business alive.
Also, a personal confession:
Sometimes I can understand code faster than I can understand a folder full of Markdown. That might be because we have decades of tooling to navigate code, and far less mature tooling to navigate living specs.
So yes, SDD needs tooling support, not moral superiority. And there’s another reason SDD is having a moment: AI-assisted development can increase duplication, churn, and architectural drift when you do not constrain the agent with explicit rules and boundaries.11, 12
Here is an example of a simple spec verification in Java:
    import java.util.HashSet;
    import java.util.Set;

    public final class SupportReplySpec {

        private SupportReplySpec() {
            // Static utility; not meant to be instantiated.
        }

        // Spec constraint: keep responses short and reference the user's issue.
        // This is not "AI evaluation." This is build gating.
        public static void validate(String ticketText, String reply) {
            if (ticketText == null || ticketText.isBlank()) {
                throw new IllegalArgumentException("Ticket text must not be empty");
            }
            if (reply == null || reply.isBlank()) {
                throw new IllegalArgumentException("Reply must not be empty");
            }

            int wordCount = reply.trim().split("\\s+").length;
            if (wordCount > 120) {
                throw new IllegalArgumentException("Reply must be <= 120 words (was " + wordCount + ")");
            }

            // Naive overlap check (good enough to catch obvious failures):
            // at least two distinct ticket words of 5+ characters must appear in the reply.
            String[] ticketWords = ticketText.toLowerCase().split("\\W+");
            String replyLower = reply.toLowerCase();
            Set<String> matched = new HashSet<>();
            for (String w : ticketWords) {
                if (w.length() >= 5 && replyLower.contains(w)) {
                    matched.add(w);
                    if (matched.size() >= 2) {
                        break;
                    }
                }
            }
            if (matched.size() < 2) {
                throw new IllegalArgumentException("Reply does not reference the user's issue enough");
            }
        }
    }
When to Use SDD
Run this three-point check:
- Risk: Does this touch money, security, or permissions? Use SDD.
- Longevity: Will someone else maintain this in a year? Use SDD.
- Exploration: Are you probing an API or testing an idea? Skip SDD. Just code.
In my opinion, the future isn’t “always spec-driven.” The future is spec-driven when being wrong is expensive.
Footnotes
1. GitHub, Research: quantifying GitHub Copilot’s impact on developer productivity and happiness (2022).
2. METR, Measuring the impact of AI on experienced developers (2025).
3. Thoughtworks, Spec-driven development (Technology Radar technique entry, 5 Nov 2025).
4. Lars-Ola Damm, Lars Lundberg, and Claes Wohlin, A Model for Software Rework Reduction through Improved Requirements Engineering Practice (Journal of Systems and Software, 2008).
5. Andreas Vogelsang, From Specifications to Prompts: On the Future of Generative LLMs in Requirements Engineering (IEEE Software column preprint, 2024).
6. Thoughtworks, Spec-driven development: Unpacking one of 2025’s key new AI-assisted engineering practices (4 Dec 2025).
7. Jannik Fischbach et al., How Do Practitioners Interpret Conditionals in Requirements? (arXiv, 2021).
8. GitHub, Spec-driven development with AI: Get started with a new open source toolkit (2 Sept 2025).
9. Tessl, Tessl launches spec-driven framework and registry (23 Sept 2025).
10. Hacker News, discussion thread “Spec-Driven Development (SDD)” capturing practitioner skepticism about SDD-as-source and ambiguity in natural language (Sept 2025).
11. GitClear, AI Copilot Code Quality: 2025 Look Back at 12 Months of Data (report, 2025).
12. Thoughtworks, Complacency with AI-generated code (Technology Radar technique, 23 Oct 2024).


