If AI Can Do Your Job, You Were Probably Measuring the Wrong Work

A lot of the AI conversation is still focused on whether frontier tools can produce the visible outputs of a job. Can AI write the PRD? Can it generate the design? Can it write the code? Can it build the spreadsheet? Can it summarize the meeting? The answer is increasingly yes, and I do not think we should minimize how important that is. These tools are getting very good at creating the artifacts that have historically made knowledge work visible.

But I think that framing misses the more important point. The artifact was never the whole job. It was the evidence that some deeper thinking had happened, or at least that deeper thinking was supposed to have happened. A PRD was supposed to clarify product judgment. A design was supposed to shape human understanding. Code was supposed to create a reliable system. A spreadsheet was supposed to improve a decision. A meeting was supposed to move work across teams. When we confuse those artifacts for the work itself, we end up with a shallow understanding of the roles around them.

That is why AI is such a useful forcing function. It is not only making some tasks faster. It is exposing which parts of the work were proxy work and which parts were the real work. The proxy work is the thing we can point to, review, count, forward, archive, or put into a status update. The real work is the judgment underneath it. As frontier tools absorb more proxy work, they force a much better question: what was this role really designed to address?

I am optimistic about that question, but not casually optimistic. I do not believe AI automatically makes organizations smarter. Used poorly, it can create a flood of polished artifacts that hide weak thinking. Used well, it can free people from some of the mechanical burden of producing those artifacts and push them closer to the part of the job that actually creates value. That is the shift I think every industry needs to take seriously.

A hand-drawn field-notes infographic showing visible artifacts like PRDs, pixels, code, spreadsheets, and meetings flowing into deeper judgment work. — Artifacts are evidence. Judgment is the work they are supposed to reveal.

The PRD was never product management.

Product management is one of the clearest examples because the PRD has become such a recognizable artifact. A strong PRD matters. It gives teams a common language, captures assumptions, defines requirements, clarifies scope, and creates something concrete enough for people to challenge. But the PRD itself is not the product work. The product work is choosing what matters, making tradeoffs explicit, and aligning a team around a path to value.

AI can help write a PRD. It can synthesize notes, propose user stories, sharpen acceptance criteria, generate edge cases, and improve the structure of a document. I think product managers should use it for those things. The mechanical parts of documentation are often slower than they need to be, and better tooling can create more time for higher-quality thinking. But the tool cannot be accountable for the judgment behind the document.

That judgment is the hard part. It means understanding the customer problem deeply enough to know which pain is worth solving now. It means connecting that problem to business value, technical feasibility, organizational timing, and measurable outcomes. It means deciding what not to build. It means being honest about tradeoffs that different stakeholders may prefer to soften or avoid. A PRD can be beautifully written and still fail at all of that.

This is one of the risks AI introduces. It can make weak product thinking look more mature than it is. A vague idea can become a professional-looking document in seconds. A product strategy that has not been tested can be expressed in confident language. A set of requirements can look complete while still avoiding the core decision. The artifact gets better, but the thinking underneath it may not.

That is why the bar for product management should rise, not fall. If a tool can help produce the document, the product manager has more responsibility to make the judgment clear. The question is not whether someone can write a PRD anymore. The better question is whether the PRD expresses a real choice about value, tradeoffs, and the behavior we want to create in the product. AI can help us get to the document faster. It cannot decide what should matter.

The pixels were never design.

Design faces a similar challenge because its artifacts are highly visible. A screen looks like design. A prototype looks like design. A polished interface can create the impression that the design work is done. This is why AI-generated visuals have created so much discussion. When a tool can produce layouts, flows, components, and variations quickly, it becomes tempting to believe the discipline itself has been compressed into visual production.

But design is not just the arrangement of pixels. Design is how people understand, feel, decide, and act inside a system. It is how complexity becomes legible. It is how a product teaches someone what is possible, what matters, what is safe, what is reversible, and what to do next. The interface is the visible layer, but the real work is shaping human understanding.

AI can absolutely accelerate parts of design production. It can generate options, create prototypes, test patterns, produce first drafts, and help teams explore more directions in less time. That is useful. More variation at lower cost can be a real advantage when the team has the judgment to evaluate it. The problem is when visual fluency is mistaken for design quality.

A product can become more polished without becoming more understandable. A broken journey can receive better typography. A confusing flow can be rendered with cleaner components. A strategy that does not make sense can be wrapped in a beautiful prototype. That is not better design. It is surface quality masking unresolved product and human problems.

The designers who become more important in this environment are the ones who understand context, behavior, systems, and consequence. They know when a flow works technically but fails cognitively. They notice friction that other people have normalized. They understand when a choice creates confidence, confusion, trust, hesitation, or fatigue. They do not just ask whether the screen looks good. They ask whether the experience helps a person make sense of what is happening.

As AI makes visual output easier, taste and judgment become more valuable. Not taste as personal preference, but taste as disciplined sensitivity to context, clarity, hierarchy, and human consequence. The pixels matter because people experience products through them. But the pixels were never the whole job. The real design work is deciding what the experience should help people understand and how the system should make that understanding possible.

The code was never engineering.

Engineering is often reduced to code in the same way product management is reduced to PRDs and design is reduced to pixels. Code is the artifact that shows up in the repository. It is reviewed, tested, merged, deployed, and measured. Because it is so concrete, it is easy to mistake code production for the entirety of engineering work. AI coding tools make that mistake more visible because they can now produce useful code at increasing speed.

That matters. I do not think we should dismiss it. Frontier tools can scaffold services, refactor functions, generate tests, explain code paths, inspect dependencies, and help engineers move through implementation work faster. They can also help non-engineers understand technical systems with more precision. These are significant changes, and they will alter how teams build software.

But engineering is not just writing code. Engineering is designing systems that are reliable, adaptable, secure, observable, and clear enough for other people to build on. It is deciding where complexity should live and where it should not. It is understanding failure modes, latency, dependencies, data contracts, privacy boundaries, operational load, migration paths, and long-term maintenance cost. Good engineering is not only about making something work once. It is about making something work under pressure, over time, with other people depending on it.

AI can produce code that compiles and still introduce architectural debt. It can generate a solution that works for the immediate case while making the system harder to reason about later. It can add a dependency without understanding the organizational cost of owning it. It can write tests that validate the happy path while missing the failure mode that matters most. It can make implementation faster without necessarily making the system healthier.

This is why engineering judgment becomes more important as code generation improves. The best engineers will not be the ones who refuse these tools. They will be the ones who use them aggressively while staying accountable for the system. They will use AI to explore options, accelerate scaffolding, review unfamiliar code, and reduce repetitive work. But they will still own the architecture, the tradeoffs, and the long-term integrity of what gets shipped.

In that world, code remains essential, but it becomes easier to see what code was supposed to serve. It was supposed to create system capacity. It was supposed to make a product behavior real in a way that could be operated, extended, and trusted. AI can help produce the artifact. It cannot replace the engineering responsibility to make the system coherent.

The spreadsheet was never analysis.

Analysis has its own artifact problem. Queries, spreadsheets, dashboards, charts, and readouts are the visible outputs of analytical work. They are important because they allow evidence to move through an organization. Without them, teams operate on opinion, anecdote, and hierarchy. But analysis is not the spreadsheet. Analysis is the work of distilling signal from numbers and turning evidence into better decisions.

AI is going to make many analytical artifacts easier to produce. It can draft SQL, explain metric definitions, generate cuts of data, summarize trends, identify anomalies, and create first-pass visualizations. It can help more people ask questions of data without waiting for every request to move through a small number of specialists. That accessibility can be valuable, especially in organizations where data bottlenecks slow down learning.

But easier access to numbers does not guarantee better judgment. A bad question can now get answered faster. A misleading metric can now get a cleaner chart. A shallow conclusion can now be presented in more confident language. A correlation can be made to sound like a causal insight if the person using the tool does not understand the difference. The danger is not that AI creates analysis. The danger is that it creates analysis-shaped artifacts without analytical discipline.

The hard part of analysis is knowing whether the evidence should change what we believe. That requires understanding the metric, the population, the instrumentation, the time window, the segment, the missing data, and the incentives created by the measurement itself. It requires knowing when an average is hiding an important failure. It requires knowing when the result is durable, when it is actionable, and when acting on it could create a new problem somewhere else.

The strongest analysts will not define their value only by their ability to write queries or produce dashboards. They will define their value by improving decision quality. They will ask whether the question is framed correctly, whether the data can support the conclusion, whether the metric reflects the behavior we care about, and whether the organization is interpreting the result honestly. AI can help generate the evidence artifact. It cannot take responsibility for the decision discipline around that evidence.

This distinction matters because many organizations already struggle with measurement theater. They create dashboards that are reviewed but not used. They circulate readouts that summarize movement without clarifying implication. They optimize metrics without revisiting whether the metric still represents the desired outcome. AI will not fix that by default. It may even make it worse unless teams hold themselves to a higher standard.

The meeting was never program management.

Program management is often misunderstood because its artifacts can look administrative. The calendar invite, agenda, tracker, status update, dependency log, and meeting summary are all visible. They create structure around work, and they are often necessary. But they are not the full discipline. Program management is not the meeting. It is the work of connecting teams, surfacing dependencies, and keeping the end-to-end system moving.

AI is already useful here. It can summarize meetings, extract action items, draft updates, identify open questions, and reduce the manual burden of coordination. That is meaningful because organizations spend a lot of time on coordination overhead. Cleaner notes and faster status updates can help teams spend less energy reconstructing what happened and more energy deciding what needs to happen next.

But a summary is not the same thing as progress. A tool can capture what was said without knowing whether the right people were in the room. It can list action items without knowing whether they are owned by the right team. It can record a decision without knowing whether the decision was actually made or simply implied. It can make a meeting look productive even when the core dependency remains unresolved.

The real program management work is often about seeing the system between the teams. It means noticing when two groups believe they are aligned but are using the same words differently. It means recognizing when a date has become fictional. It means identifying when a risk is being softened because nobody wants to escalate it. It means understanding where work is stuck, where ownership is unclear, and where a conversation needs to happen before the status turns red.

The best program managers are not meeting administrators. They are operators of complex human systems. They understand how work moves through an organization, where ambiguity collects, and where handoffs tend to fail. AI can make the coordination artifacts easier to produce, but it cannot replace the judgment required to keep a cross-functional system moving toward an outcome.

This is especially important as organizations become more distributed, more technical, and more dependent on cross-functional execution. The work does not fail only because someone forgot to write a note. It fails because ownership was unclear, incentives were misaligned, dependencies were hidden, or decisions were delayed until the cost of delay became unavoidable. AI can document the system. Program leadership still has to move it.

The artifact is becoming cheaper, so the standard has to rise.

Across all of these roles, the same pattern keeps showing up. Frontier tools make artifacts cheaper to produce. PRDs, pixels, code, SQL, spreadsheets, summaries, status updates, and plans will all become easier to generate. Some of them will become dramatically easier. That does not make the underlying work less important. It makes the underlying work more exposed.

When artifacts were expensive, effort could hide inside them. A long document felt like serious product thinking. A polished prototype felt like a resolved experience. A large code change felt like engineering progress. A detailed spreadsheet felt like analytical rigor. A full calendar felt like coordination. But effort and value were never the same thing. AI is making that distinction harder to ignore.

The question now is not simply whether the artifact exists. The question is whether it clarified the work. Did the PRD make the tradeoff sharper? Did the design make the experience more understandable? Did the code improve the system? Did the analysis make the decision better? Did the meeting resolve a dependency or simply produce a better record of ambiguity? These are harder questions, but they are also more honest questions.

This is where I think AI should make us more demanding. We should not accept lower-quality thinking because the output looks polished. We should not confuse speed with clarity. We should not treat generated work as inherently strategic because it is well structured. The more fluent the tools become, the more disciplined we need to be about judging the work underneath.

The new risk is not only replacement. The more immediate risk is false confidence. AI can make unfinished thinking look finished. It can make shallow work look sophisticated. It can make a team feel further along than it actually is. That is why responsible use of AI is not just about adoption. It is about standards.

A hand-drawn review gate turning AI-generated artifacts into trustworthy decisions, systems, and experiences with checks for clarity, truth, consequence, and fit. — When artifacts get cheaper, the review standard has to rise.

The work underneath the work.

I do not believe the future of knowledge work belongs to people who simply produce the most artifacts the fastest. I believe it belongs to people who understand what those artifacts were supposed to accomplish. The artifact is the interface between thinking and execution, but it is not a substitute for judgment. When the artifact becomes easier to produce, the judgment behind it becomes more important.

For product managers, that means getting better at choosing what matters and aligning teams around value. For designers, it means getting better at shaping how people understand, feel, decide, and act. For engineers, it means getting better at designing systems that are reliable and adaptable over time. For analysts, it means getting better at turning evidence into better decisions. For program managers, it means getting better at keeping complex systems of people and work moving toward real outcomes.

This is the job beneath the job. It is the part that was always there, even when organizations over-focused on the artifact. AI did not create that distinction, but it is making the distinction impossible to avoid. If someone’s role has been reduced entirely to producing proxy work, AI will feel like a threat because AI is increasingly good at proxy work. But if the role is anchored in judgment, accountability, context, and outcome quality, then AI becomes leverage.

That leverage is not automatic. It requires people to know what they are trying to accomplish before they ask the tool to help. It requires teams to verify, critique, and improve what the tool produces. It requires leaders to reward clarity over volume and judgment over performance theater. It requires organizations to ask better questions about what work is for.

I think this is the more useful AI conversation. Not just which tasks can be automated, but what those tasks were standing in for. Not just how fast a tool can produce an artifact, but whether that artifact helps people make better decisions, build better systems, and create better experiences. Not just whether AI can do part of a job, but whether we understood the job correctly in the first place.

The future will not be defined by the disappearance of work. It will be defined by the disappearance of excuses for shallow work. As the proxy work gets easier, the real work becomes harder to avoid. That is not a reason to be afraid of frontier tools. It is a reason to raise the standard for how we use them.

This writing reflects my personal perspectives on product management, AI, and content discovery. It does not represent the official position of my employer or any affiliated organization.