AI Architecture
11 min read · Vuong Ngo

Spec-Driven Development Tools: Where AI Project Memory Lives Is the Only Decision That Matters

The spec-driven development tool landscape grew from 6 to 13 tools in a single community repo, not because the field is converging, but because three communities are building on incompatible assumptions about where AI project memory should live.


Someone built a community comparison repo to manage the explosion of spec-driven development tools. The repo started with 6 tools. By June 2026, it listed 13. The map of the maze became part of the maze. [3]

That number crystallizes a frustration a developer named vladgur put into words in an Ask HN thread: "overwhelmed with a variety of SDD tools" and no practical way to choose between them "other than trying them all." [1] He was not wrong. The comparison repo exists because the community felt the same thing.

Here is what I think is actually happening. The explosion of spec-driven development tools is not a feature race. It is three incompatible bets about where the durable contract of the work should live. Once you see that distinction, the comparison table stops being the decision.

The question that matters is not "which SDD tool has the best features?" It is "where does the contract of the work live when this session ends?" [8]

If you have already read about why AI coding agents lose context across sessions, this post is the ecosystem follow-on: a map of what the field has built to solve that problem, sorted by the underlying architectural assumption rather than the feature list.

Three incompatible bets about where AI project memory lives: spec file, git history, and external board.

Why the Feature Table Leaves You Stuck

If you have compared SDD tools recently, you have probably landed on a feature matrix. Worktree support: yes or no. Agent count: one or many. Spec format: markdown or YAML. Custom constitution: supported or not.

Those are real features. They are also the second question.

The community spec-compare repo, the one that grew to 13 tools, says this explicitly in its own conclusions: tool selection depends on the use case, and there is no universal winner. Its scenario-based recommendations already fragment by context. Worktrees go to one tool, simplicity to another, enterprise to a third, greenfield to a fourth. [3]

MarkTechPost's "9 Best AI Tools for Spec-Driven Development in 2026" is a clean example of the format that leaves developers stuck. The roundup works as a menu. It does not help you answer which architectural bet to make first. [7]

Rick Hightower's independent analysis of GSD, Spec Kit, OpenSpec, and Taskmaster AI comes closest. It names the divergence directly: the tools "diverge" on approach, not just features. But even that analysis ultimately resolves into a feature breakdown once the divergence is identified. The underlying question stays unanswered. [6]

The frustration vladgur described is legitimate. You can read every comparison piece available and still not have an answer, because the comparison pieces are answering the wrong question. They tell you what each tool does. They do not tell you which bet to make before you evaluate what a tool does.

The spec-compare community repo grew from 6 to 13 SDD tools in 2026. Source: cameronsjo/spec-compare.

Three Spec-Driven Development Tool Models

The tools in the 2026 SDD landscape are not just competing on features. They are built on three different answers to the same underlying question: when this session ends, where does the contract of the work live?

Model A: The Spec File as Source of Truth

GitHub's Spec Kit, OpenSpec, and GSD share one core assumption: the spec document is the durable contract. The agent reads it. The developer updates it. A session restart is safe because the spec is an explicit, version-controlled file on disk. The spec is the memory. The restart is the design.

GitHub's framing of this approach is direct: SDD "inverts the usual order so the spec is the source of truth that generates code." [4] Spec Kit extends the model with a "constitution" layer for non-negotiable project principles that apply regardless of which agent is running or how the session started. [2]
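
In practice, a constitution is context that always comes first. Here is a minimal sketch of session-start assembly, assuming the constitution lives at `.specify/memory/constitution.md` (the path Spec Kit projects appear to use; treat it as an assumption and check your version) and that the feature spec is a plain markdown file. `build_session_context` is a hypothetical helper, not part of Spec Kit.

```python
from pathlib import Path

# Assumed location, for illustration only: Spec Kit's layout may differ by version.
CONSTITUTION = Path(".specify/memory/constitution.md")


def build_session_context(root: Path, spec_rel: str) -> str:
    """Assemble the text an agent reads at the start of a session.

    The constitution (if present) is placed first so its non-negotiable
    principles frame whatever feature spec follows, regardless of which
    agent reads it.
    """
    parts = []
    constitution = root / CONSTITUTION
    if constitution.exists():
        parts.append("# Constitution\n" + constitution.read_text(encoding="utf-8"))
    parts.append("# Spec\n" + (root / spec_rel).read_text(encoding="utf-8"))
    return "\n\n".join(parts)
```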

The operational picture is best captured by sermakarevich, another developer in the same HN thread: "write specs first, decompose, implement separately, restart the session after each step because requirements are materialized in specs." [1] That sentence describes the model better than any feature table. The explicit restart is not a workaround. It is the intended flow.
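
The restart-per-step flow is simple enough to sketch. This assumes a hypothetical spec convention where tasks are a markdown checklist; `next_open_task`, `mark_done`, and `run_steps` are illustrative names, not any tool's API. Each loop iteration stands in for one fresh session, and the only thing that carries over is what was written back into the spec.

```python
import re
from typing import Callable, Optional

# Hypothetical spec convention: tasks are a markdown checklist, e.g.
#   - [x] Add login form
#   - [ ] Validate password length
TASK_RE = re.compile(r"^- \[( |x)\] (.+)$", re.MULTILINE)


def next_open_task(spec_text: str) -> Optional[str]:
    """Return the first unchecked task title, or None when the spec is complete."""
    for mark, title in TASK_RE.findall(spec_text):
        if mark == " ":
            return title
    return None


def mark_done(spec_text: str, title: str) -> str:
    """Check off one task so the next session sees the updated contract."""
    return spec_text.replace(f"- [ ] {title}", f"- [x] {title}", 1)


def run_steps(spec_text: str, implement: Callable[[str], None]) -> str:
    # Each iteration models a fresh session: no task list is kept in memory,
    # the next step is always re-derived from the spec text.
    while (task := next_open_task(spec_text)) is not None:
        implement(task)  # the agent works on exactly one step
        spec_text = mark_done(spec_text, task)
    return spec_text
```

Note that `run_steps` never keeps its own copy of the plan. If the write-back to the spec were skipped, the next session would pick up the same step again.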

The cost of this model is equally explicit: the spec has to be maintained. The developer owns the update cycle. If the spec drifts from what the agent actually built, the session-restart guarantee breaks down. The model works well when one developer owns both the spec and the implementation. It requires more coordination when multiple people or multiple agents are writing to the same spec.
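
Drift is also mechanical enough to check. A minimal sketch, assuming two hypothetical conventions: checklist items in the spec carry a task ID (`- [x] T-3 ...`), and commit subjects start with that ID (`T-3: ...`). `find_drift` compares the two and reports where the spec and the code disagree.

```python
import re

# Hypothetical conventions for this sketch:
#   spec checklist lines look like "- [x] T-3 Add login form"
#   commit subjects start with the task ID, e.g. "T-3: add login form"
SPEC_TASK = re.compile(r"^- \[( |x)\] (T-\d+)\b", re.MULTILINE)
COMMIT_TASK = re.compile(r"^(T-\d+):")


def find_drift(spec_text: str, commit_subjects: list[str]) -> dict[str, list[str]]:
    """Report task IDs where the spec and the commit history disagree."""
    entries = SPEC_TASK.findall(spec_text)
    done_in_spec = {tid for mark, tid in entries if mark == "x"}
    open_in_spec = {tid for mark, tid in entries if mark == " "}
    committed = {m.group(1) for s in commit_subjects if (m := COMMIT_TASK.match(s))}
    return {
        # The spec claims the work is finished, but no commit references it.
        "done_without_commit": sorted(done_in_spec - committed),
        # Code landed, but the spec still lists the task as open.
        "committed_but_open": sorted(open_in_spec & committed),
    }
```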

Model B: Git History as Durable Record

A second community has concluded that the right answer is already in your repository. All agent state lives in markdown files and git history. No external service, no database, no subscription fee. The record is the repo. The audit trail is the commit log.

One contributor in an HN thread on agent architecture argued, in paraphrase, that this approach is simpler than MCP and more transparent than a database, because your entire workflow history becomes auditable through the same tools you already use for code review. [5]

The appeal is real: zero setup cost, no service dependency, no new infrastructure to maintain. The constraint is equally real: the model requires disciplined commit hygiene. The agent has to commit incrementally and meaningfully, not in large undifferentiated blobs. If the commits are noisy, the "audit trail" is a fiction.
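
What disciplined commit hygiene buys you is that state can be rebuilt from the log. A minimal sketch, assuming a hypothetical `T-<n>: <summary>` subject convention. The input is the text printed by `git log --reverse --pretty=format:%h%x09%s` (abbreviated hash, a tab, then the subject). Commits that name no task are collected separately, because they are the part of the trail nobody can audit.

```python
import re
from collections import defaultdict

# Hypothetical commit convention: "<task-id>: <what changed>", e.g. "T-12: add retry".
# Input: output of  git log --reverse --pretty=format:%h%x09%s
SUBJECT = re.compile(r"^(T-\d+):\s*(.+)$")


def task_history(log_text: str) -> tuple[dict[str, list[str]], list[str]]:
    """Group commits by task; return (history per task, untraceable commit hashes)."""
    history: dict[str, list[str]] = defaultdict(list)
    untraceable: list[str] = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        sha, _, subject = line.partition("\t")
        m = SUBJECT.match(subject)
        if m:
            history[m.group(1)].append(f"{sha} {m.group(2)}")
        else:
            # Commits that name no task are the "noisy" part of the audit trail.
            untraceable.append(sha)
    return dict(history), untraceable
```

The share of untraceable commits is a quick, rough test of whether the git-state model is actually working for you.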

For solo developers who already commit carefully and work in bounded, single-session chunks, this model is often sufficient. The spec-file model and the git-state model are close relatives: both assume the state the agent needs is already somewhere in the local development environment.

Model C: External Shared State / Board over MCP

A third group has concluded that the spec file and the git history are both solving the wrong level of the problem. Project memory should not live inside the agent's context or inside the repo. It should live in a queryable external store the agent connects to over a protocol.

This is the bet Taskmaster AI, Conductor, and tools like Agiflow make. The agent queries a task board or workflow engine over a protocol like MCP and receives scoped, structured state: what is open, what is blocked, what was approved, what changed since the last session. The session can end at any point. The state is already external to both the agent and the repo. [6]
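
To make "scoped, structured state" concrete, here is a minimal sketch of both sides of such a query. MCP messages are JSON-RPC 2.0, and clients invoke server tools with the `tools/call` method; everything else (the `get_scoped_state` tool name, the `Task` fields, the status values) is hypothetical and does not describe Taskmaster AI, Conductor, or Agiflow.

```python
import json
from dataclasses import dataclass


@dataclass
class Task:
    id: str
    status: str      # hypothetical statuses: "open", "blocked", "approved", "done"
    updated_at: int  # e.g. a Unix timestamp


def scoped_state(board: list[Task], since: int) -> dict[str, list[str]]:
    """What a hypothetical board tool might return to a returning agent session."""
    def ids(pred):
        return [t.id for t in board if pred(t)]

    return {
        "open": ids(lambda t: t.status == "open"),
        "blocked": ids(lambda t: t.status == "blocked"),
        "approved": ids(lambda t: t.status == "approved"),
        "changed_since_last_session": ids(lambda t: t.updated_at > since),
    }


def tool_call_request(request_id: int, since: int) -> str:
    # JSON-RPC 2.0 envelope using MCP's "tools/call" method.
    # The tool name and its arguments are hypothetical.
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "get_scoped_state", "arguments": {"since": since}},
    })
```

The agent receives a handful of IDs grouped by status rather than the project's full history, which is the point of keeping the state outside the window.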

Agiflow is one instance of this model: a project board that connects AI assistants over MCP and supplies scoped board tools, shared state, artifacts, and workflow locks. The assistant remains the agent; Agiflow supplies the persistent state layer. The board-over-MCP approach targets multi-step AI workflows, where several sessions or agents need the same current view of the work.
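
Workflow locks are what keep two agents from picking up the same task. The sketch below is a generic claim-with-lease pattern, assumed for illustration rather than taken from Agiflow: a claim succeeds unless another agent holds an unexpired lease, and the check-and-set runs under a mutex so two concurrent claims cannot both win. `TaskLocks` is a hypothetical name.

```python
import threading
import time


class TaskLocks:
    """In-memory claim table: at most one agent per task, with an expiring lease."""

    def __init__(self, lease_seconds: float = 300.0, clock=time.monotonic):
        self._lease = lease_seconds
        self._clock = clock
        self._mutex = threading.Lock()
        self._owners: dict[str, tuple[str, float]] = {}  # task -> (agent, expires_at)

    def claim(self, task_id: str, agent: str) -> bool:
        now = self._clock()
        with self._mutex:  # check-and-set must be atomic across agents
            owner = self._owners.get(task_id)
            if owner and owner[0] != agent and owner[1] > now:
                return False  # another agent holds a live lease
            # Unclaimed, expired, or re-claimed by the same agent: (re)start the lease.
            self._owners[task_id] = (agent, now + self._lease)
            return True

    def release(self, task_id: str, agent: str) -> None:
        with self._mutex:
            # Only the current owner may release; other callers are ignored.
            if self._owners.get(task_id, (None,))[0] == agent:
                del self._owners[task_id]
```

The lease matters because sessions end abruptly. An agent that dies mid-task should not hold the lock forever.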

Anthropic's guidance on context engineering provides the underlying rationale: context is a finite resource, and "durable structured state outside the window beats stuffing the prompt" with project history. [8] The board model takes that principle to its logical conclusion: nothing project-critical lives in the window at all.

The cost is the service. You need something running. That changes the math for a solo developer on a weekend project. It does not change the math for a team running multiple agents against the same codebase.

Comparison of Spec File, Git State, and External Board models across five operational dimensions. Source: synthesis from spec-compare community repo, HN threads, and Anthropic context engineering guidance.

What Each Model Actually Costs You

Classifying the models is only useful if you can match them to actual constraints. Here is what the trade-offs look like under pressure.

Session-restart safety is high across all three. That is the whole point of SDD. The difference is in recovery mechanics. Spec-file tools expect you to re-read the spec explicitly at the start of the next session. Git-state tools expect you to locate the right commit or markdown file. Board tools give the agent a protocol query. The speed and reliability of that recovery depends on how well you maintained the artifact in the first place. A stale spec and a sloppy commit log leave you in the same position.
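
The recovery difference can be expressed as one interface with three backends. A minimal sketch with hypothetical class names: every store answers the same `recover()` call at session start, and what differs is where the answer comes from.

```python
from typing import Callable, Protocol


class ContextStore(Protocol):
    def recover(self) -> str:
        """Return the state a fresh session needs to resume work."""
        ...


class SpecFileStore:
    def __init__(self, spec_text: str):
        self.spec_text = spec_text

    def recover(self) -> str:
        # Spec-file model: the agent re-reads the whole spec explicitly.
        return self.spec_text


class GitStateStore:
    def __init__(self, log_subjects: list[str], n: int = 5):
        self.log_subjects, self.n = log_subjects, n

    def recover(self) -> str:
        # Git-state model: surface the most recent history; usefulness
        # depends entirely on commit hygiene.
        return "\n".join(self.log_subjects[-self.n:])


class BoardStore:
    def __init__(self, query: Callable[[], str]):
        self.query = query  # e.g. a hypothetical wrapper around an MCP tool call

    def recover(self) -> str:
        # Board model: ask the external store for scoped, structured state.
        return self.query()


def start_session(store: ContextStore) -> str:
    return "Resume from:\n" + store.recover()
```

None of the three can return better state than what was maintained: a stale spec and a sloppy log both flow straight through `recover()`.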

Team sharing is the first real fork. Spec-file and git-state models share well when the spec is in the repo and the team commits consistently. Board models are shared by design: the board is the single source of truth for the whole team, with no coordination required beyond the protocol. If you have two or more people writing to the same codebase with AI assistance, the board model removes a whole category of "whose version of the plan is current?" problems.

Debugging transparency varies more than most comparisons acknowledge. A spec file is immediately readable: you open it in any editor and you understand the state. A git history is fully auditable but requires someone to know which commits to look at. A board is queryable, which is powerful for programmatic access, but the query log is the audit trail rather than plain human-readable history. None of these is strictly better; they are better for different inspection patterns.

Setup cost is the clearest differentiator. Spec-file and git-state models start with a file and a discipline, both of which you already have. Board models require a running service. That cost is recoverable at team scale, where the coordination benefits appear quickly. It is harder to justify for a solo developer working a bounded single-session problem.

Scale across agents is where the spec-file model hits its natural ceiling. It was designed for single-session, single-agent work: write the spec, implement, restart. Running multiple agents against the same spec file requires coordination the model does not provide. Git-state can work with enough commit discipline. Board models were built for exactly this scenario and assume multi-agent access as the default.
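
The coordination the spec-file model lacks is, at minimum, protection against lost updates: two agents read the same spec, both edit it, and the second write silently erases the first. One common fix, sketched here assuming a single shared store, is optimistic concurrency: every write names the version it read, and stale writes are rejected. `VersionedSpec` is hypothetical.

```python
class VersionedSpec:
    """A shared spec with optimistic concurrency: writes must name the version they read.

    Not thread-safe as written; a real store would make the check-and-set atomic.
    """

    def __init__(self, text: str):
        self.text, self.version = text, 0

    def read(self) -> tuple[str, int]:
        return self.text, self.version

    def write(self, new_text: str, expected_version: int) -> bool:
        if expected_version != self.version:
            return False  # someone else wrote first; the caller must re-read and merge
        self.text, self.version = new_text, self.version + 1
        return True
```

Without the version check, the second agent's write would succeed and the first agent's checkmark would disappear with no error anywhere.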

The spec-compare repo surfaces this pattern implicitly: its scenario-based recommendations fragment because different contexts genuinely favor different models. [3]

The Claude Code Tasks experience is a clean illustration of the ceiling you hit when complex state lives inside the agent tool itself, and of why "in-tool memory has a ceiling" is a practical concern rather than a theoretical one. Local task state survives sessions on one machine. It does not cross machines, is not shared across a team, and does not scale past the 3-task limit the documentation acknowledges. That is not a flaw; it is scope. But it means the in-tool model shares the structural limits of the spec-file and git-state models for anything beyond single-machine, single-agent work.

No universal winner. The right model depends on three questions:

  1. How many agents or reviewers work on the same codebase simultaneously?
  2. Do you need a queryable audit trail after the session ends?
  3. Are you willing to run a persistent external service?
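
One way to turn those three questions into a rule of thumb, sketched as a function. The branching is a heuristic summary of the trade-offs above, not a formula from any of the sources, and `suggest_model` is a hypothetical name.

```python
def suggest_model(concurrent_writers: int, need_queryable_audit: bool, will_run_service: bool) -> str:
    """Heuristic reading of the three questions: a starting point, not a verdict."""
    if will_run_service and (concurrent_writers > 1 or need_queryable_audit):
        # Shared, queryable state is worth the service cost here.
        return "external board"
    if concurrent_writers > 1 or need_queryable_audit:
        # No service: git history is the shared, auditable fallback,
        # provided commit discipline holds.
        return "git state"
    # One writer, bounded sessions, no audit requirement.
    return "spec file"
```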

Pick the model that matches your actual failure mode. Features follow from that answer.

What the Field Has Not Agreed On Yet

Thirteen tools in a comparison repo does not mean the field is converging. It means three communities are building on incompatible assumptions, and none of them is obviously wrong.

The spec-compare repo grew from 6 to 13 tools not because someone solved the problem. It grew because three different groups are each solving a different version of the problem. [3] The map expanded to cover more territory because the territory is genuinely contested.

The pattern Rick Hightower identified as "divergence" is structural, not temporary. The tools diverge because the underlying question has not been resolved: what is the right durable representation of AI project intent across sessions? [6] Until that question has a consensus answer, the tool landscape will keep fragmenting around the three models.

A consensus approach would need to satisfy four constraints at once:

  • Readable without tooling. Spec files win this. A markdown file is readable in any editor, by any reviewer, on any machine. No protocol, no running service.
  • Shareable without a service dependency. Git history wins this. Your team already has it. Every developer already knows how to use it.
  • Queryable after the fact. Board models win this. Structured state is retrievable with a query. Prose history requires human interpretation of the relevant commits.
  • Revision-controlled across team members. All three models handle this, with different amounts of friction.

No single model satisfies all four. That is why the field has not converged. Each model trades away one property to do well on the others.

The closest thing to a cross-tool convention is the Spec Kit "constitution" pattern: a file of non-negotiable project principles that any agent, in any tool, should read before starting work. [2] MCP-based task boards are a protocol-level attempt at the same handoff problem. Both are partial. Neither addresses all four constraints.

The HN thread that surfaced this fragmentation in the first place captures the situation accurately. The field does not lack tools. It lacks agreement on the question the tools are supposed to answer. [1]

That is not a failure of the tools. It is evidence that the underlying question is genuinely hard, and three reasonable answers are all partially right.


The spec-driven development tool market is not converging, because three architecturally incompatible bets each work in different contexts. Different teams are getting real value from spec-file approaches, git-state approaches, and board-over-MCP approaches, sometimes simultaneously in the same organization.

The comparison table is not useless. It is the second decision. The first decision is which context-storage model fits the way your team actually loses context, the way your sessions actually end, and the amount of infrastructure you are willing to maintain.

Once you have that answer, the feature table becomes legible. Without it, you are comparing thirteen tools on dimensions that do not map to the choice you actually need to make.

The companion post on why context is lost in the first place is the right prerequisite read: Why AI Coding Agents Lose the Plan.

If someone on your team is comparing SDD frameworks, send them this post before they pick based on the feature list.


Quick Reference

| Dimension | Spec File | Git State | External Board |
|---|---|---|---|
| Session-restart safety | High (explicit spec) | High (commit log) | High (protocol query) |
| Team sharing | Good (in-repo) | Good (discipline required) | Native |
| Debugging transparency | Readable prose | Auditable history | Structured query log |
| Setup cost | Low | Low | Higher (service required) |
| Multi-agent scale | Single-session focus | Possible with discipline | Designed for it |
| Best for | Solo, bounded sessions | Solo, audit-first teams | Teams, multi-agent, long workflows |
---

References

[1] Ask HN: AI dev tech stack / workflow thread (vladgur, sermakarevich comments) - news.ycombinator.com/item?id=48413629 (captured 2026-06-19)

[2] GitHub Spec Kit (official repository) - github.com/github/spec-kit (captured 2026-06-19)

[3] cameronsjo/spec-compare (community SDD tool comparison) - github.com/cameronsjo/spec-compare (captured 2026-06-19)

[4] Spec-driven development with AI: a new open source toolkit (GitHub Blog) - github.blog (captured 2026-06-19)

[5] HN Mission Control thread (AlexCalderAI comment on git-state model) - news.ycombinator.com/item?id=47165602 (captured 2026-06-19; wording paraphrased)

[6] Agentic Coding: GSD vs Spec Kit vs OpenSpec vs Taskmaster AI, Where SDD Tools Diverge (Rick Hightower, Medium) - medium.com/@richardhightower (captured 2026-06-19)

[7] 9 Best AI Tools for Spec-Driven Development in 2026 (MarkTechPost) - marktechpost.com (captured 2026-06-19)

[8] Effective Context Engineering for AI Agents (Anthropic) - anthropic.com/engineering/effective-context-engineering-for-ai-agents (captured 2026-06-19)
