The Builder Team: How AI Changes the Shape of Product Development

Most software organizations still describe themselves as cross-functional. But in practice, cross-functional often means a queue of functions.

Product defines the problem. Design shapes the experience. Frontend and backend engineering implement the system. DevOps deploys and operates it. Analytics measures what happened. QA finds what broke. Scrum, at its best, helped reduce the distance between these functions.

LLM-based development changes this. Many of the steps in the product development process can now happen in parallel, in conversation with one another, and much closer to the original user problem. Requirements can become structured specs. Specs can become prototypes. Prototypes can become tests. Tests can become release gates. Release gates can become experiments. Experiments can become the next version of the requirements.

The future team is not a traditional squad with AI tools added on top. It is also not one person with a swarm of agents. The future team is a small group of high-context builders.

My working model: the ideal AI-native product development team is 4 to 6 people. Not 4 to 6 narrow specialists. Four to six builders.

A builder is not a product manager who learned to prompt, a designer who can generate mockups, or an engineer who uses Copilot. A builder is someone who can participate across the full path from user need to production outcome. They may have a depth spike in product strategy, UX, frontend, backend, infrastructure, experimentation, or data. But they are not limited to that spike.

They can reason about the customer. They can shape a spec. They can critique an interface. They can understand the system architecture. They can work with AI coding tools. They can evaluate quality. They can look at telemetry after launch and decide what should happen next.

The role becomes broader because the work becomes more continuous.

AI coding agents and assistants are increasingly useful across planning, design, build, testing, review, deployment, and maintenance. They can inspect codebases, identify dependencies, generate scaffolds, write tests, suggest edge cases, summarize logs, and propose fixes. But the human role does not disappear. It moves up the stack toward intent, judgment, architecture, quality, safety, and accountability.

The mistake is to treat AI as a productivity layer for the same old process. Give every function an AI tool, keep the same handoffs, keep the same ceremonies, keep the same tickets, and hope velocity goes up.

Some velocity will go up. But the deeper gains come from redesigning the system of work.

DORA’s 2025 research describes AI as an amplifier. It can improve throughput and product performance, but it can also magnify weakness in delivery stability when teams lack strong engineering practices, automated testing, feedback loops, and controls. Stack Overflow’s 2025 survey shows the same tension from a developer perspective: AI adoption is widespread, but trust is not. The most common frustration is that the output is almost right, but not quite.

That is why the next development team needs to be smaller, but not looser.

It needs more judgment per person.
It needs more product context per person.
It needs more quality discipline per unit of code.
It needs fewer handoffs and stronger shared ownership.

The builder pod

In a traditional team, the operating model is organized around roles.

In a builder pod, the operating model is organized around the product bet.

A typical 4 to 6 person builder pod might look like this:

One person has a product and outcome spike. They are strongest at customer understanding, strategy, prioritization, metrics, and experiment design.
One person has an experience and prototype spike. They are strongest at interaction design, information architecture, design systems, accessibility, and rapid validation.
One or two people have full-stack system spikes. They are strongest at architecture, data models, APIs, frontend implementation, performance, and maintainability.
One person has a platform, reliability, and operations spike. They are strongest at CI/CD, infrastructure, observability, security, deployment, and cost controls.
One person, sometimes shared across pods, has an evaluation and data spike. They are strongest at test strategy, product analytics, experimentation, model evaluation, and quality gates.

These are not rigid job descriptions. Everyone can move through the whole loop: user requirements, PRD and spec generation, prototype validation, implementation, evaluations, A/B tests, early launches, and final production.

The pod is small enough to maintain shared context. It is large enough to preserve review, dissent, and operational coverage.

[Figure: a hand-drawn builder pod diagram with outcome, experience, systems, platform, and evaluation roles organized around one product bet.] The builder pod is organized around the product bet, with depth spikes feeding shared context instead of separate queues.

The PRD becomes executable

In the AI-native workflow, the PRD is no longer only a document that explains what should be built. It becomes an executable contract between the team, the AI tools, the system, and the customer outcome.

A strong AI-native PRD or spec should include:

The user problem.
The job to be done.
The non-goals.
The product behavior.
The data and event requirements.
The design constraints.
The accessibility, privacy, and security constraints.
The acceptance criteria.
The evaluation plan.
The rollout plan.
The kill criteria.

[Figure: a hand-drawn executable spec control panel surrounded by problem, constraints, acceptance, evaluation, and rollout gates.] A strong spec becomes the control surface that turns context into acceptance criteria, evaluation, and rollout decisions.

AI is highly sensitive to context. Vague requirements produce plausible software. Precise requirements produce software that can be tested, reviewed, and improved.

Spec-driven development is emerging for this reason. Thoughtworks describes it as a practice where well-crafted software requirements become prompts for AI coding agents, and in the more radical version, the spec becomes the source of truth while code becomes a generated byproduct.

Humans do not stop writing requirements. Requirements become more important. The spec becomes the steering wheel.
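
One way to make "executable" concrete is to hold the spec sections listed earlier in a structure that can report its own gaps. The sketch below is only an illustration: the class name ExecutableSpec, its field names, and the readiness rule (every section non-empty) are assumptions made for this example, not a standard format.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ExecutableSpec:
    """Hypothetical container for the spec sections named in this article."""
    user_problem: str = ""
    job_to_be_done: str = ""
    non_goals: list[str] = field(default_factory=list)
    product_behavior: str = ""
    data_and_events: list[str] = field(default_factory=list)
    design_constraints: list[str] = field(default_factory=list)
    # Accessibility, privacy, and security constraints, grouped for brevity.
    safety_constraints: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    evaluation_plan: str = ""
    rollout_plan: str = ""
    kill_criteria: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_ready(self) -> bool:
        """Assumed rule: a spec is ready only when no section is empty."""
        return not self.missing_sections()


# Example draft with only two sections filled in.
draft = ExecutableSpec(
    user_problem="New users abandon onboarding at the profile step.",
    acceptance_criteria=["Profile step can be skipped and resumed later."],
)
print(draft.is_ready())          # False
print(draft.missing_sections())  # the nine sections still to be written
```

A check like this does not judge whether the spec is good. It only stops a half-written spec from being handed to an AI tool as if it were complete.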

Scrum alone is not enough

Scrum was a major improvement over waterfall-era delivery. It pushed teams toward smaller increments, cross-functional collaboration, inspection, and adaptation. The official Scrum Guide already describes small, self-managing, cross-functional teams with no subteams or hierarchies. That remains directionally right.

The problem is not Scrum’s original intent. The problem is how many organizations operationalized it.

Backlogs became warehouses.
Story points became accounting systems.
Standups became status meetings.
Sprint planning became negotiation.
Sprint reviews became theater.
Retros became therapy for process debt that never changed.

AI-native development strains this model because the speed of work changes. A team can generate a prototype in hours. It can create multiple solution paths in parallel. It can test edge cases earlier. It can simulate implementation options before committing. It can ship behind flags sooner. It can learn from telemetry continuously.

A two-week sprint is not always the natural unit of learning anymore.

The better unit is the validated product bet.

The workflow evolves.

From ceremonies to context.
From tickets to specs.
From velocity to evidence.
From handoffs to shared ownership.
From sprint completion to production learning.

Agile values still matter. Individuals and interactions still matter. Working software still matters. Customer collaboration still matters. Responding to change still matters. AI-native development should be seen as a return to those values, not a rejection of them.

The new artifacts

If the old operating model was built around epics, stories, points, sprint rituals, and release trains, the new operating model needs different artifacts.

The builder brief. This is the one-page articulation of the product bet: the user, the problem, the outcome, the constraints, and the decision that needs to be made.
The context pack. This is the shared operating memory for humans and AI: relevant code paths, APIs, design tokens, data contracts, examples, previous decisions, analytics, guardrails, and system constraints.
The executable spec. This is the PRD that can drive generation, review, and evaluation. It includes acceptance criteria, non-goals, edge cases, observability requirements, and rollout logic.
The prototype record. This captures what was tried, what users saw, what the team learned, and what changed before production.
The evaluation matrix. This defines what must be true before the system can move forward: unit tests, integration tests, accessibility, performance, privacy, security, product behavior, and if the product uses AI directly, model quality and safety checks.
The launch ledger. This ties feature flags, A/B tests, exposure rules, metrics, incidents, decisions, and follow-up work into one record.

These artifacts are lighter than traditional process documents, but more operationally precise. They are the control plane for high-speed development.
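
The evaluation matrix in particular lends itself to a small piece of code. The sketch below assumes each gate is a named check that returns pass or fail, and that the system moves forward only when every gate passes. The gate names and the placeholder results are hypothetical; a real pod would wire these to its own test runners and scanners.

```python
from typing import Callable, Dict, List

# A gate is a zero-argument callable that returns True when the check passes.
Gate = Callable[[], bool]


def failed_gates(matrix: Dict[str, Gate]) -> List[str]:
    """Run every gate and return the names of the ones that did not pass."""
    failures = []
    for name, check in matrix.items():
        try:
            passed = check()
        except Exception:
            passed = False  # a check that crashes counts as a failed gate
        if not passed:
            failures.append(name)
    return failures


def can_move_forward(matrix: Dict[str, Gate]) -> bool:
    """Assumed rule: the release proceeds only if no gate failed."""
    return not failed_gates(matrix)


# Placeholder results stand in for real test, accessibility, and perf runs.
example_matrix = {
    "unit_tests": lambda: True,
    "integration_tests": lambda: True,
    "accessibility": lambda: False,
    "performance": lambda: True,
}
print(failed_gates(example_matrix))      # ['accessibility']
print(can_move_forward(example_matrix))  # False
```

Returning the list of failed gates, rather than a single boolean, keeps the review conversation specific: the pod sees which check blocked the release.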

What changes for product leaders

The PM role moves closer to the build. Product managers will not just write requirements and wait for implementation. They will use AI to explore concepts, generate prototypes, interrogate tradeoffs, write better specs, define evals, and inspect production behavior.

McKinsey describes this as part of an AI-enabled software product development life cycle, where PMs, engineers, and their teams spend more time on higher-value work and less on routine tasks. It also notes that AI blurs the boundaries between product management, design, and engineering by enabling product leaders to prototype and build proofs of concept with far less dependency on sequential handoffs.

This is not a demotion of product management. It is an expansion of product management.

The best product leaders will be fluent in customer needs, business strategy, system behavior, AI capability, experimentation, and operational risk.

They will not need to be the best coder in the room.

They will need to be one of the clearest thinkers in the room.

The near future

The near future of software development will not be defined by whether a company has access to AI coding tools. Access will commoditize.

The difference will be operating model. The best teams will be smaller. They will have fewer handoffs. They will rely more on high-quality context. They will treat evaluation as a first-class product discipline. They will ship earlier behind stronger controls. They will measure learning, not just output.

They will also be more accountable.

AI makes it easier to produce software, but it does not automatically make it easier to produce good software. That is the central leadership challenge.

The future belongs to teams that can combine AI speed with human judgment, product taste, engineering discipline, and operational responsibility.

In that world, the most important question is not “How many engineers do we need?”

The better question is:

“What is the smallest team of builders that can own the outcome end to end?”

For many products, I think the answer will be 4 to 6.

Builder pod template

A practical internal template can be kept simple.

The pod mission should be one product outcome, not one function. For example: improve onboarding completion, reduce customer friction in a discovery flow, improve personalization relevance, increase self-service success, reduce operational toil, or launch a new AI-assisted workflow.

The core team should be 4 to 6 people with explicit depth spikes:

Outcome builder: customer problem, strategy, metrics, prioritization, experiment design.
Experience builder: UX, prototype quality, design system, accessibility, content.
System builder: full-stack architecture, APIs, data, frontend and backend implementation.
Platform builder: reliability, CI/CD, observability, deployment, security, cost.
Evaluation builder: tests, analytics, A/B testing, quality gates, model evals where relevant.
Domain builder: domain expertise, data interpretation, policy, compliance, research, or growth, depending on the product.

The pod should operate through a weekly learning loop rather than a sprint-completion loop:

Monday: define the bet, update the builder brief, agree on evals and rollout criteria.
Tuesday to Wednesday: swarm on prototype, spec, and implementation paths with AI assistance.
Thursday: run human review, product review, technical review, evals, instrumentation, and risk checks.
Friday: launch behind flags, review telemetry, decide whether to expand, revise, or kill.

That cadence can stretch or compress, but the sequence matters: problem, spec, prototype, build, evaluate, experiment, launch, learn.
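
The Friday decision to expand, revise, or kill is easier to make honestly when the thresholds are written down on Monday. The sketch below shows one possible shape for that rule. The two telemetry fields, the threshold names, and the ordering (kill criteria are checked first) are assumptions for illustration, and the function returns a recommendation for the pod to review, not an automatic action.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    EXPAND = "expand"
    REVISE = "revise"
    KILL = "kill"


@dataclass
class Telemetry:
    # Hypothetical metrics; a real pod would choose its own.
    error_rate: float    # fraction of requests that failed behind the flag
    outcome_lift: float  # relative change in the target metric vs. control


@dataclass
class Thresholds:
    max_error_rate: float      # kill criterion agreed before launch
    min_lift_to_expand: float  # evidence needed to widen exposure


def friday_recommendation(t: Telemetry, th: Thresholds) -> Decision:
    """Suggest a next step; named humans still own the final call."""
    if t.error_rate > th.max_error_rate:
        return Decision.KILL
    if t.outcome_lift >= th.min_lift_to_expand:
        return Decision.EXPAND
    return Decision.REVISE


limits = Thresholds(max_error_rate=0.02, min_lift_to_expand=0.05)
print(friday_recommendation(Telemetry(0.01, 0.08), limits))  # Decision.EXPAND
print(friday_recommendation(Telemetry(0.01, 0.01), limits))  # Decision.REVISE
print(friday_recommendation(Telemetry(0.05, 0.20), limits))  # Decision.KILL
```

Checking the kill criterion first is a deliberate choice in this sketch: a strong lift does not excuse a launch that is breaking for users.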

[Figure: a hand-drawn weekly learning loop moving from bet to prototype, review, launch, and telemetry around an evidence gauge.] The better unit is not sprint completion. It is a learning loop that turns bets into evidence.

The pod’s standing artifacts should be:

Builder brief.
Context pack.
Executable PRD/spec.
Prototype record.
Evaluation matrix.
Launch ledger.
Decision log.

The pod’s success metrics should include:

Time from user signal to prototype.
Time from approved spec to testable build.
Percentage of generated code meaningfully changed in review.
Eval pass rate before release.
Escaped defects.
Rollback rate.
Experiment velocity.
Learning velocity.
Product outcome movement.
AI cost per shipped outcome.
Human review load.
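
Several of these metrics are simple ratios, and it helps to pin down the arithmetic before arguing about the numbers. The sketch below computes three of them from weekly counts. The record name PodWeek, its fields, and the exact definitions (for example, rollback rate as rollbacks divided by releases) are assumptions; each pod should agree on its own definitions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PodWeek:
    """Hypothetical weekly counts a pod might record in its launch ledger."""
    releases: int
    rollbacks: int
    evals_run: int
    evals_passed: int
    ai_spend_usd: float
    shipped_outcomes: int


def ratio(numerator: float, denominator: float) -> Optional[float]:
    """Divide safely; return None when there is nothing to divide by."""
    return numerator / denominator if denominator else None


def pod_metrics(week: PodWeek) -> Dict[str, Optional[float]]:
    return {
        "rollback_rate": ratio(week.rollbacks, week.releases),
        "eval_pass_rate": ratio(week.evals_passed, week.evals_run),
        "ai_cost_per_shipped_outcome": ratio(
            week.ai_spend_usd, week.shipped_outcomes
        ),
    }


week = PodWeek(
    releases=4, rollbacks=1,
    evals_run=20, evals_passed=18,
    ai_spend_usd=300.0, shipped_outcomes=2,
)
print(pod_metrics(week))
# {'rollback_rate': 0.25, 'eval_pass_rate': 0.9, 'ai_cost_per_shipped_outcome': 150.0}
```

Returning None for an empty denominator keeps a quiet week from showing up as a perfect or a catastrophic score.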

The leadership review should focus less on velocity and more on evidence. What did the pod learn? What changed in the product? What changed in user behavior? What got safer, faster, simpler, or more useful?

Operating principles

Use AI everywhere, but do not trust it everywhere. AI should be embedded across discovery, specification, design, coding, testing, documentation, and launch, but human judgment remains accountable for correctness, safety, strategy, and tradeoffs.
Make the spec stronger than the prompt. The quality of AI-assisted development depends on the quality of the context. A strong PRD, acceptance criteria, non-goals, constraints, and evaluation plan matter more than clever prompting.
Treat prototypes as thinking tools, not demos. AI makes it cheap to create prototypes, but the point is not to impress stakeholders. The point is to test assumptions, expose ambiguity, and learn earlier.
Move evaluation earlier than feels natural. Testing, instrumentation, accessibility, privacy, security, performance, and product-quality checks should be designed before or alongside implementation, not after the code appears complete.
Require human ownership for architecture, privacy, security, and launch decisions. AI can recommend, generate, and review, but accountability must remain with named humans who understand the system and the risk.
Measure production learning, not ticket completion. The most important question is not whether the team shipped the story. It is whether the team learned something meaningful from real users and improved the product outcome.
Keep the pod small enough that context stays shared. The advantage of a 4 to 6 person builder pod is not lower headcount. It is higher shared context, faster decisions, and fewer handoffs.
Keep enough diversity of expertise that review stays honest. AI-native teams need breadth, but not sameness. The best pods combine product judgment, design taste, engineering depth, operational discipline, data fluency, and dissent.
Design for reversibility. AI accelerates change, so the operating model must make change safer. Feature flags, staged rollouts, observability, rollback paths, and kill criteria should be part of the default workflow.
Do not scale the pod until the operating model works. Scaling a broken AI-native workflow only creates faster chaos. Prove the loop first: clear specs, useful prototypes, reliable evals, safe launches, measurable learning, and accountable ownership.

This writing reflects my personal perspectives on product management, AI, and content discovery. It does not represent the official position of my employer or any affiliated organization.