GitHub All-Stars #5: Spec-Kit – How GitHub tames AI-coding chaos with Spec-Driven Development
Artur Skowroński
Head of Java/Kotlin Space
Published: Sep 24, 2025 | 22 min read
I’ve always claimed there’s no better way to learn than by building something from scratch… and the second-best way is reading through someone else’s code 😁.
At VirtusLab, we recently had a sobering thought - our “collection” of starred projects had grown to massive proportions, without bringing real value to us or the wider community. So we decided to change that: add a bit of regularity and become chroniclers of these open-source gems. That way, we’ll better understand them and discover the ones where we can actually contribute.
Every Wednesday, we pick one trending repository from the past week and give it attention by preparing a tutorial, article, or code review – learning from its creators in the process. We focus on whatever piques our interest: it could be a tool, a library, or anything the community deems worth publishing. One simple rule applies – it has to be a new or lesser-known project, not the big, widely recognized giants that rack up thousands of stars after a major update.
Today, we’re diving into a fresh project from GitHub engineers: github/spec-kit.
We’ve all been there. That frustrating yet strangely familiar dance with an AI assistant we call “vibe coding.” It starts with a simple prompt, followed by a series of increasingly desperate attempts to clarify what we actually want to achieve. The model generates code that looks correct and even compiles, but subtly misses the point - ignoring key architectural constraints or failing to grasp the broader project context. It’s a chaotic, unpredictable process that doesn’t scale well.
In response to this chaos, Spec-Driven Development (SDD) was born—a disciplined, engineering-first approach that doesn’t reject AI but instead seeks to harness its power in a predictable way. Rather than treating AI as a magical black box, SDD forces us back to the roots of good engineering craft: precisely defining requirements before writing a single line of code.
github/spec-kit is GitHub’s official open-source toolkit designed to put this methodology into practice. It’s not just another language model, but a carefully curated collection of templates, scripts, and a command-line interface (CLI) that works with a variety of AI agents, such as GitHub Copilot, Claude, and Gemini.
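For orientation, at the time of writing the project’s README documents bootstrapping a new project with the specify CLI roughly like this (the exact command and flags may change as the tool evolves, so check the README before copying it):

```bash
# Bootstrap a Spec-Kit project and choose which AI agent to generate command templates for.
# Command shape as documented at the time of writing; verify current flags in the README.
uvx --from git+https://github.com/github/spec-kit.git specify init my-project --ai claude
```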
In this article, we’ll break spec-kit down to its core components. We’ll explore its philosophy, architecture, and the powerful engineering patterns behind it to understand how GitHub is trying to transform chaotic “vibe coding” into a structured software development process.
To understand why spec-kit was created, we need to step back and look at the fundamental problem it solves. Traditionally, in software engineering, the ultimate source of truth has been the code. Documentation is often outdated, and the original business requirements get lost in the maze of implementation decisions. spec-kit proposes a radical shift in this perspective: the source of truth should not be the code itself, but the durable, versioned, and human-readable intent behind that code.
This project is a direct response to the limitations of spontaneous prompting. It addresses the problem of mismatched assumptions and lack of shared context, which plague development teams. By forcing requirements to be clearly defined from the very beginning, it creates a verifiable contract describing how the code should behave. This contract—the specification—becomes the “lingua franca” of the entire process, reducing ambiguity, guesswork, and errors when the AI agent moves into implementation.
This paradigm shift is the heart of Spec-Driven Development. It marks the transition from treating a specification as a one-off artifact to making it “executable”—a document that directly drives code generation, testing, and validation.
The following table synthesizes this fundamental shift, contrasting the chaotic ad-hoc approach with the structured methodology of SDD. The comparison makes it clear that SDD is not a cosmetic fix, but a profound change in workflow, the developer’s role, and the very nature of project artifacts.
| Aspect | “Vibe Coding” (Ad-Hoc Approach) | Spec-Driven Development (with Spec-Kit) |
| --- | --- | --- |
| Source of Truth | The developer’s fleeting thought; the last typed prompt. | Versioned spec.md and plan.md files. |
| Process | Unstructured, iterative trial and error. | A gated, four-phase process: Specify → Plan → Tasks → Implement. |
| Outcome | Often unpredictable, non-idiomatic code that just “looks right.” | Verifiable, consistent code that respects architectural constraints. |
| Developer’s Role | Prompt engineer, AI output debugger. | Architect, specification author, and validator of AI-generated artifacts. |
| Scalability | Hard to scale beyond small tasks; context is easily lost. | Designed for entire features and projects; context is managed and preserved. |
At first glance, one might look for spec-kit’s architecture in its Python source code. Yet the true innovation and structure of this project do not lie in complex classes or modules, but in the rigorously defined, sequential workflow it enforces. The methodology is the architecture. The CLI tool is merely the orchestrator of this process, creating a tangible, auditable trail from the high-level intent (spec.md) all the way to the concrete implementation. This mirrors the classic goal of mature software engineering disciplines - now applied to the world of AI.
This process is divided into four gated phases. The key principle: you never advance to the next stage until the current one has been fully validated by a human. This is the main control mechanism that brings order to an otherwise potentially chaotic interaction with AI.
Phase 1: Specify – Defining the "What" and the "Why"
Everything begins with capturing the essence of the problem. This phase is about framing requirements from the user’s perspective. It does not deal with the tech stack or application design. Instead, it focuses on goals, anti-goals (what we deliberately exclude), personas, user journeys, and acceptance criteria.
Artifact: The main product of this phase is the spec.md file. This is not a static document but a “living artifact” that evolves with the project.
Process: The developer runs the /specify command with a general description of the functionality. Guided by spec-kit templates, the AI agent generates a detailed specification. The engineer’s role is to verify, refine, and approve it.
Phase 2: Plan – Designing the "How"
Once the specification is validated, the focus shifts to technical considerations. In this phase, the architecture, tech stack, data models, API contracts, and non-functional requirements are defined. Here, architectural decisions are codified in a machine-readable way.
Artifacts: A set of documents is created, such as plan.md, data-model.md, and api-spec.json. These files represent the technical plan for implementation.
Process: The developer runs the /plan command. Using the approved spec.md as context, the AI agent proposes a detailed technical plan. Once again, human verification and refinement are critical.
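To make these artifacts more tangible, here is a hypothetical fragment of a plan.md for a simple task-management feature. The file names follow the description above, but the content is invented purely for illustration:

```markdown
# Implementation Plan: Task Management

## Tech Stack
- Frontend: React with Vite, styled with Tailwind CSS
- Backend: TypeScript (Node.js), PostgreSQL
- API: REST, contract captured in api-spec.json

## Data Model (details in data-model.md)
- Task: id, title, description, status (Todo | InProgress | Done), assignee, created_at

## Non-Functional Requirements
- List views must respond in under 500 ms for up to 10,000 tasks.
- All endpoints require authentication; authorization is role-based.
```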
Phase 3: Tasks – Decomposing the Plan
A high-level technical plan is too large to hand over to an AI agent for a one-shot implementation. The third phase breaks it down into small, atomic, verifiable, and executable tasks.
Artifact: The result is a tasks.md file or an equivalent task list.
Process: The /tasks command analyzes plan.md and generates a granular list of steps to structure the implementation process. Each task should be small enough that its output (the generated code) is straightforward to review.
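Continuing the same invented example, a fragment of such a tasks.md might look like this:

```markdown
# Tasks: Task Management

- [ ] T001: Create the tasks table migration and the Task model (see data-model.md).
- [ ] T002: Implement POST /tasks per api-spec.json, including input validation.
- [ ] T003: Implement GET /tasks?status= with pagination.
- [ ] T004: Add unit tests for the task service (happy path and validation errors).
- [ ] T005: Build the "New Task" form component and wire it to the API.
```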
Phase 4: Implement – AI-Guided Execution
Only now, with solid foundations in the form of the specification, plan, and task list, does actual coding begin. Yet the bulk of the code is not written by the developer.
Process: The engineer systematically feeds the AI agent with individual tasks from the list. With full context from the earlier phases, the agent generates code for each small, well-defined problem. The developer’s role is to verify the generated code, run tests, and integrate it with the rest of the project.
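In practice, the hand-off for a single task can be as simple as a prompt along these lines (a sketch; the exact wording depends on the developer and the agent in use, and the task IDs refer to the invented tasks.md above):

```text
Implement task T002 from tasks.md.
Context: the relevant user journey in spec.md, plan.md, api-spec.json (POST /tasks), constitution.md.
Constraints: follow the validation rules from the spec and add the unit tests the constitution requires.
```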
Let’s now move from the philosophy and process architecture to the concrete patterns and techniques that make spec-kit such a powerful tool. This is where the real value lies for the engineer who wants to understand how everything works under the hood.
Pattern 1: Intent as Code – Declarative programming through spec.md
spec-kit treats natural language written in Markdown files as a form of declarative programming. Instead of writing imperative code that tells the computer how to do something, the developer declares the desired outcome in spec.md. They describe user journeys and acceptance criteria, and the system (human + AI) is responsible for translating this declaration into working code.
This is analogous to the declarative prompt style from the LangExtract example a few editions ago, but applied at a much higher level of abstraction—not just to data extraction, but to the entire software development process.
Below is a hypothetical yet realistic excerpt of a spec.md file for an insurance underwriting application, written in the same spirit as the “Taskify” example from the documentation:
```markdown
1. Goals

- Enable underwriters to capture applicant data (personal details, financials, risk factors).
- Provide automated risk assessment suggestions based on predefined rules and AI scoring models.
- Allow human underwriters to review, adjust, and approve/reject applications.
- Maintain a full audit trail of all actions for compliance purposes.
- Provide a dashboard for tracking the status of applications (Pending, In Review, Approved, Rejected).

2. User Journeys

2.1. Submitting a new insurance application

1. The user (agent/broker) clicks the "New Application" button.
2. A form opens with fields: Applicant Details, Policy Type, Coverage Amount, Health/Financial Disclosures.
3. The user fills in the fields and clicks "Submit".
4. The application enters the Pending state and is visible in the dashboard.

2.2. Reviewing and scoring an application

1. The underwriter opens an application from the Pending list.
2. The system displays both applicant data and an AI-generated risk score with supporting factors.
3. The underwriter can adjust inputs, request additional documents, or override the suggested score.
4. The underwriter clicks "Approve" or "Reject".

2.3. Audit and compliance check

1. A compliance officer accesses the Audit Trail section.
2. All actions (submit, score, adjust, approve/reject) are timestamped and tied to specific users.
3. The officer exports a compliance report in PDF format.

3. Acceptance Criteria

- The application must load in under 3 seconds.
- All application data must be persisted securely in the database with encryption at rest.
- Risk scoring must complete within 2 seconds after submission.
- Each approval/rejection must generate an immutable audit log entry.
- The dashboard must refresh in real time when application statuses change.
```
Pattern 2: Architectural Guardrails – Persistent Context with constitution.md
One of the biggest challenges when working with LLMs is their limited memory and tendency toward “context drift”—forgetting the initial instructions during long conversations. spec-kit solves this problem elegantly with the memory/constitution.md file.
This file serves as a mechanism for defining stable, non-negotiable rules and constraints for the entire project. These might include rules such as “always use .NET Aspire and Postgres”, “all API endpoints must have unit tests”, or “the interface must comply with our design system”. The file is automatically attached to the context in key phases (especially Plan), acting as a set of guardrails that keep the AI agent on track.
This is a powerful architectural pattern. Instead of relying on the model to “remember” critical decisions, we externalize them into a durable, versioned document. constitution.md becomes a form of long-term memory for the AI agent—a fixed anchor that prevents it from drifting off course due to the temporary context of a single task.
Example content of constitution.md:
```markdown
# Project Constitution

## General Rules
- Language: TypeScript
- Frontend Framework: React with Vite
- Styling: Tailwind CSS
- Code Quality: Prettier and ESLint configurations from the repository must be applied.

## Testing
- Every new feature must have at least 80% unit test coverage.
- E2E tests must be written for all critical user flows.

## Architecture
- The backend must be based on a microservices architecture.
- Communication between services must use asynchronous events.
```
Pattern 3: Workflow Orchestration – The CLI and Its Slash Commands

From the developer’s perspective, the entire process is driven by a handful of slash commands, one per phase.

Specification generation

```
/specify "Create a simple task management application..."
```

Plan generation

```
/plan
```

Task generation

```
/tasks
```
Beneath this simple surface, the CLI manages the entire directory structure (.github, docs, memory, specs, templates), ensuring that the right files are created in the right places and that context flows correctly between the different phases.
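A freshly initialized project therefore ends up with a layout roughly like the sketch below (simplified; exact folder names and the per-feature numbering may differ between spec-kit versions and agents):

```text
my-project/
├── .github/                  # agent-specific command/prompt definitions (e.g., for Copilot)
├── memory/
│   └── constitution.md       # persistent project rules (Pattern 2)
├── specs/
│   └── 001-task-management/  # one folder per feature (numbering scheme is illustrative)
│       ├── spec.md
│       ├── plan.md
│       └── tasks.md
├── templates/                # prompt templates for each phase and agent
└── docs/
```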
Pattern 4: Conversation Templating – The Prompt Engineering Engine
At the heart of spec-kit’s flexibility and its support for multiple agents is the templates directory. Instead of hardcoding prompts, the project uses an advanced templating system. It includes separate template packages for different AI agents (Claude, Copilot, Gemini) and even for different shell environments (POSIX and PowerShell).
Each template file (e.g., spec-template.md) contains markers and instructions that the CLI dynamically fills with context from constitution.md, the user’s prompt, and the contents of existing files, before sending the final, precisely constructed prompt to the language model.
```markdown
(...)
- Explicitly list what is out of scope to prevent scope creep.
{{/anti_goals}}

## 3. Personas
{{#personas}}
- **{{name}}** — {{role}}
  Needs: {{needs}}
  Pain points: {{pains}}
{{/personas}}
{{^personas}}
- TBD: Define 2–3 key personas (name, role, needs, pains).
{{/personas}}

(...) // The original template is much longer - models do not read minds
```
This pattern illustrates a mature approach to prompt engineering. It treats prompts not as one-off incantations, but as versioned, reusable, and parameterizable assets. This transforms prompt engineering from an art into a repeatable engineering discipline.
Collision with Reality: Community Feedback and Practical Limitations
No analysis of this kind would be complete without a look at the practical realities of using the tool. spec-kit is not a silver bullet, and its adoption comes with certain challenges - clearly visible in GitHub community discussions.
Overkill for small tasks: Many users point out that for minor changes or bug fixes, going through the full four-phase SDD process feels bureaucratic and inefficient. The tool shines brightest when building new features or refactoring large parts of a system.
Context is king (and it’s hard): The effectiveness of spec-kit depends heavily on the developer’s ability to manage context. If the specification is unclear or the plan too vague, the AI agent will generate chaotic code or unnecessary files. The responsibility for providing precise, well-filtered context rests with the developer.
Redefining work, not eliminating it: spec-kit doesn’t reduce the amount of work - it changes it. Time spent writing implementation code drops drastically, but the time invested in planning, writing specifications, and validation increases. The balance shifts from ~80% coding to ~50% planning, 20% coding, and 30% verification. Not every team may be ready for this change.
Still experimental: GitHub itself labels spec-kit an “experiment.” The project is under active development, with frequent releases and an engaged community reporting issues and suggesting improvements. It is not yet a fully mature, polished product.
Analyzing spec-kit offers invaluable lessons about what the future of our profession might look like. It is not just a tool, but a manifesto for a new way of thinking about software creation.
From coder to architect: The engineer’s primary role shifts from writing implementation details to defining intent, architecture, and constraints. The most valuable activity becomes crafting a crystal-clear spec.md and a thoughtful constitution.md.
The power of staged refinement: spec-kit embodies a fundamental principle of solving complex problems: break the big problem into smaller ones, validate each step, and use automation (in this case, AI) to execute them. It replaces the risky “one-shot” strategy with iterative confidence-building.
Reliability engineering for AI: SDD can be seen as a form of reliability engineering applied to the AI-driven development process. Gated phases, explicit checkpoints, and persistent context are mechanisms designed to tame the nondeterministic nature of LLMs and ensure predictable, high-quality results.
Prompt engineering as a discipline: This project exemplifies how prompts should be treated as first-class engineering artifacts—versioned, templated, and managed with the same care as source code.
github/spec-kit and the Spec-Driven Development methodology are far more than just another gadget for programmers. They represent a bold step toward a mature, predictable, and scalable way of building software with the help of artificial intelligence. The project demonstrates how the raw power of language models can be combined with the discipline and rigor of traditional software engineering.
spec-kit suggests that the future of software engineering may lie less in mastering a specific programming language and more in the art of conducting precise, structured, and verifiable dialogues with AI.