The Frontier Recommendations PM

How AI coding agents will change product management for recommendations, search, and AI discovery.

I spend a lot of my time thinking about how people find what they love inside large digital systems. That work often shows up as a row of recommendations, a search result, a personalized home page, a discovery feed, or an AI-assisted answer. But the real product is underneath the surface. It is retrieval, ranking, query understanding, embeddings, metadata, feedback loops, experimentation, model behavior, latency, constraints, and judgment.

I would like product managers to expand their understanding of frontier tools such as OpenAI Codex, Anthropic Claude Code, and other AI coding agents. Don’t look at these tools only as front-end accelerators. Many people imagine a tool that can create a prototype page, refactor a component, generate a simple UI, or write boilerplate code. That is useful, but it is not the most important opportunity.

Frontier tools can make product managers better at algorithmic product work. They can help us understand complex systems faster, write stronger algorithmic specifications, explore ranking and retrieval tradeoffs, prototype hypotheses, evaluate failure modes, and partner with engineering and science teams at a much higher level.

I do not believe every PM needs to become a production engineer. I do believe that every PM working in recommendations, search, personalization, or AI discovery needs to become more fluent in how algorithmic systems behave. The PM who can connect user intent to system behavior is going to have a material advantage. Frontier tools make that connection easier to build, easier to test, and easier to improve.

The misconception: these tools are not only for front-end work

The most common mistake I see is treating AI coding agents as code generators rather than system reasoning partners. If the only thing we ask these tools to do is generate a web page, then of course they will look like front-end tools. But that is a narrow use case, not the full capability.

The real value of a tool like Codex is not that it can write code in isolation. The real value is that it can read a codebase, follow a data path, explain a service boundary, generate tests, produce small prototypes, summarize a technical change, and help a product person reason through the consequences of an implementation. That is directly relevant to backend systems. It is directly relevant to ranking systems. It is directly relevant to search, recommendations, personalization, and AI-enhanced discovery.

Algorithmic products are not built only in model files. They are built across pipelines, services, feature stores, logs, experiments, caches, content metadata, business rules, quality constraints, and evaluation systems. A PM who does not understand those layers is operating with an incomplete map. Frontier tools help build that map faster.

We should stop asking whether these tools are useful for front-end work or backend work. The better question is whether they help us understand and improve the systems that shape user outcomes. For algorithmic PMs, the answer is yes.

Algorithmic product management is different

Product management for algorithmic systems is different from product management for conventional features. In a conventional feature, the PM can often specify the user experience directly. A button does something. A page shows something. A workflow has a beginning and an end. The product behavior is largely deterministic.

In recommendations and search, the product behavior is mediated by systems that choose among many possible outcomes. The PM is not only deciding what the experience should look like. The PM is deciding what the system should optimize for, what signals it should consider, what tradeoffs are acceptable, what failure modes matter, and how success should be measured.

A recommendation system usually has multiple stages. It may generate candidates from a large corpus, retrieve similar or relevant items, score those candidates, apply business or policy constraints, re-rank for diversity or freshness, and log outcomes for future learning. Search has its own set of layers: query understanding, lexical retrieval, semantic retrieval, entity recognition, ranking, blending, filtering, and evaluation. AI discovery adds another layer: natural language understanding, conversational context, generated explanations, content grounding, safety constraints, and trust.
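
To make the staged structure concrete, here is a minimal sketch of a toy pipeline. Everything in it is an assumption for illustration: the `Item` fields, the `recommend` function, the use of popularity for candidate generation, and the sample numbers are hypothetical and do not describe any real system. It only shows how an item can be lost at one stage before a later stage ever sees it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    popularity: float  # hypothetical cheap signal used for candidate generation
    affinity: float    # hypothetical per-user relevance score from a ranking model
    allowed: bool      # hypothetical business or policy eligibility flag


def recommend(corpus, candidate_limit=4, slate_size=2):
    """Run a toy multi-stage pipeline and log how many items survive each stage."""
    log = {"corpus": len(corpus)}

    # Stage 1: candidate generation narrows the corpus with a cheap signal.
    candidates = sorted(corpus, key=lambda i: i.popularity, reverse=True)[:candidate_limit]
    log["candidates"] = len(candidates)

    # Stage 2: constraints remove items the product must not show.
    eligible = [i for i in candidates if i.allowed]
    log["eligible"] = len(eligible)

    # Stage 3: ranking orders the survivors by the more expensive score.
    ranked = sorted(eligible, key=lambda i: i.affinity, reverse=True)

    # Stage 4: truncate to the slate the user actually sees.
    slate = ranked[:slate_size]
    log["slate"] = len(slate)
    return slate, log


corpus = [
    Item("a", popularity=0.9, affinity=0.20, allowed=True),
    Item("b", popularity=0.8, affinity=0.70, allowed=False),
    Item("c", popularity=0.7, affinity=0.60, allowed=True),
    Item("d", popularity=0.6, affinity=0.40, allowed=True),
    Item("e", popularity=0.1, affinity=0.95, allowed=True),
]
slate, log = recommend(corpus)
print([i.item_id for i in slate], log)
```

In this sample, item "e" has the highest affinity but never reaches the ranking stage because candidate generation drops it. A user complaint about that missing item would look like a ranking problem from the surface, while the cause in this toy setup sits in candidate generation.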

A PM in this environment cannot be satisfied with a vague requirement like “make recommendations better” or “improve search quality.” Those are intentions, not specifications. The PM needs to know whether the problem is retrieval coverage, ranking quality, metadata sparsity, objective-function mismatch, cold start, popularity bias, session context, exploration, freshness, or evaluation. The same user complaint can have many different technical causes.

This is where frontier tools can help PMs move from surface language to system language. They help translate a user problem into hypotheses about retrieval, ranking, data, metrics, and model behavior. That translation is one of the most important skills in modern product leadership.

[Figure: A hand-drawn field-notes system map showing user intent flowing through retrieval, ranking, constraints, evaluation, and a learning loop. Caption: The frontier recommendations PM translates user intent into system behavior.]

The PM role is shifting from requirement writer to algorithm designer

I am careful with the phrase algorithm designer. I do not mean that PMs should bypass engineers or scientists and write production ranking code independently. That would be the wrong lesson. I mean that PMs should be able to shape the design of algorithmic behavior with enough technical precision that engineering and science teams can build the right thing.

In algorithmic products, product intent becomes real only when it is encoded into systems. If we say we want a discovery experience to feel more relevant, the system still needs to know what relevance means. Is it historical affinity? Current session intent? Similarity to recently consumed items? Long-term satisfaction? Novelty? Completion likelihood? Return frequency? Explicit feedback? A blend of these signals? Each answer produces a different product.

The PM’s job is to make those choices explicit. We need to define the user problem, the target behavior, the signals we believe matter, the constraints we must respect, the tradeoffs we are willing to make, and the way we will evaluate whether the system improved. Frontier tools can help us do that work with more rigor.

A strong algorithmic PM should be able to sit with a tool like Codex and ask: where are candidates generated, where is the ranking score assembled, what features are used, where are experiment flags applied, where is the result slate re-ranked, what gets logged, what metrics are downstream, and where could this change produce unintended effects? Asking these questions lets PMs engage with engineering judgment intelligently.

Use frontier tools to understand the system before you specify the work

The first habit I want PMs to build is using frontier tools as system cartographers. Before writing a product requirements document (PRD), before proposing an experiment, before asking engineering for an estimate, the PM should use the tool to understand how the current system actually works.

For a recommendations system, that means mapping the path from a user request to the final ranked slate. The PM should understand the candidate sources, the retrieval logic, the scoring model, the feature inputs, the re-ranking rules, the filtering logic, the logging path, and the experiment framework. For search, the PM should understand how the query is parsed, which retrieval paths are used, how exact matches are blended with semantic results, how titles and entities are handled, and how relevance is measured. For AI discovery, the PM should understand how user intent is interpreted, how content is grounded, how generated responses are constrained, and how the system avoids confident but unsupported answers.

A frontier tool can accelerate this learning loop. The PM can ask it to identify the files or services involved in a ranking flow, explain the sequence of calls, summarize the purpose of each module, trace where a score is produced, and identify where a proposed change would need to be implemented. The PM should verify the output, of course. The tool can be wrong. But even an imperfect first map is valuable because it tells the PM what to inspect, what to ask, and where the unknowns are.

This is especially powerful for PMs joining an established system. Mature recommendation and search systems often carry years of accumulated decisions. There are old heuristics, active models, retired models, special cases, configuration layers, hidden dependencies, and experimental branches. A PM who relies only on architecture diagrams will miss a lot. A PM who uses an AI coding agent to interrogate the real implementation can become effective much faster.

Use frontier tools to write better algorithmic PRDs

The second habit is using these tools to improve product specifications. A normal PRD is not enough for algorithmic work. A strong algorithmic PRD should connect the user problem to the system behavior we expect to change.

When I review this kind of PRD, I want to see more than a goal statement. I want to see the hypothesis. I want to know which part of the system we believe is underperforming. I want to know whether the work affects candidate generation, retrieval, ranking, re-ranking, query understanding, metadata, content embeddings, feedback collection, or evaluation. I want to know which signals are expected to matter and which signals should not dominate. I want to know the target users, the edge cases, the guardrails, the data dependencies, and the launch risks.

A frontier tool can act as a spec reviewer before the document ever reaches engineering. The PM can ask it to find ambiguity in the requirement, identify missing instrumentation, suggest offline and online metrics, list likely failure modes, challenge the proposed objective function, and generate test cases. The PM can ask what would go wrong if the system optimized the stated metric too literally. That question is often where the best product thinking begins.

[Figure: A hand-drawn algorithmic PRD workbench with sections for problem, signals, tradeoffs, metrics, failure modes, and guardrails. Caption: Algorithmic PRDs should make the product hypothesis, system signals, metrics, guardrails, and failure modes explicit.]

For example, if the PRD says we want to increase engagement from personalized recommendations, the agent can help pressure-test whether the metric might over-reward familiar content, reduce exploration, concentrate exposure around already popular items, or degrade long-term satisfaction. If the PRD says we want to improve search relevance, the agent can help separate query understanding problems from retrieval problems, ranking problems, and metadata problems. If the PRD says we want AI discovery to answer natural language questions about content, the agent can help identify grounding, hallucination, latency, and evaluation risks.

The point is not that the tool writes the PRD for us. The point is that it helps us make the PRD more precise. Precision is what makes algorithmic teams faster. Ambiguity does the opposite.

Use frontier tools to prototype product hypotheses

The third habit is prototyping. This is where many PMs underestimate themselves. A PM does not need to write production code to test a product hypothesis. There is enormous value in lightweight prototypes that clarify thinking before a full engineering investment is made.

A PM can use a frontier tool to create a notebook that simulates a scoring formula on sample data. The PM can generate SQL to inspect coverage across content types, user cohorts, or query classes. The PM can create a small evaluation harness that compares two ranking approaches on a labeled sample. The PM can build a synthetic example showing how a diversity constraint changes a recommendation slate. The PM can prototype a query taxonomy, a search-failure classifier, a metadata enrichment workflow, or a side-by-side review interface for qualitative evaluation.
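
As one illustration of simulating a scoring formula on sample data, here is a minimal sketch. The formula (a linear blend of affinity and freshness), the weight values, and the three sample titles are all assumptions I made up for the example. They are not a real ranking function.

```python
def blended_score(item, affinity_weight):
    """Hypothetical linear blend: affinity_weight on affinity, the rest on freshness."""
    return affinity_weight * item["affinity"] + (1 - affinity_weight) * item["freshness"]


def rank(items, affinity_weight):
    """Return item ids ordered by blended score, highest first."""
    ordered = sorted(items, key=lambda i: blended_score(i, affinity_weight), reverse=True)
    return [i["id"] for i in ordered]


# Synthetic sample data, invented for illustration only.
items = [
    {"id": "old_favorite", "affinity": 0.9, "freshness": 0.1},
    {"id": "mid_title", "affinity": 0.7, "freshness": 0.5},
    {"id": "new_release", "affinity": 0.5, "freshness": 0.9},
]

affinity_heavy = rank(items, affinity_weight=0.8)
freshness_heavy = rank(items, affinity_weight=0.4)
print("affinity-heavy :", affinity_heavy)
print("freshness-heavy:", freshness_heavy)
```

With these sample numbers, changing one weight fully reverses the slate order. A small artifact like this lets the PM show the team which tradeoff a proposed weighting implies before anyone touches the real ranker.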

These artifacts are not production systems. They are thinking tools. They help the PM arrive at engineering discussions with a sharper hypothesis. Instead of saying, “I think discovery feels repetitive,” the PM can say, “I sampled low-satisfaction sessions, grouped the repeated recommendations by candidate source, and the issue appears to be coming from retrieval concentration rather than final-stage ranking.” That is a different level of conversation.

This is how frontier tools make PMs better at algorithm work. They allow PMs to move beyond opinion and into structured exploration. They let us test whether a product idea has technical plausibility, whether the data exists, whether the metric can be computed, whether a proposed constraint behaves as expected, and whether an intuition survives contact with examples.

The best PMs will not use these tools to pretend to be engineers. They will use them to reduce the number of vague conversations the team has to tolerate.

Use frontier tools to improve evaluation and experimentation

The fourth habit is using frontier tools to become much more disciplined about evaluation. Algorithmic products live or die by the quality of their evaluation systems. A weak metric can make a bad product look successful. A narrow metric can produce hidden harm. A global average can hide a segment-level failure. A short-term engagement metric can miss long-term dissatisfaction.

This is particularly important in recommendations and discovery. If we only measure clicks or starts, we may reward shallow relevance. If we only measure completion, we may over-serve familiar content. If we only measure total engagement, we may miss fatigue, repetition, narrowing, or lack of novelty. If we only measure search success at the aggregate level, we may miss failures for ambiguous queries, niche interests, multilingual queries, new titles, or sparse metadata.
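
The point about averages hiding segment failures is easy to show with a few rows of synthetic data. The sketch below assumes a hypothetical experiment with two arms and two query segments. The counts are invented, and `success_rate` is an illustrative helper, not a standard metric definition.

```python
# Synthetic experiment results, invented for illustration only.
results = [
    {"arm": "control", "segment": "head", "searches": 1000, "successes": 800},
    {"arm": "control", "segment": "niche", "searches": 100, "successes": 60},
    {"arm": "treatment", "segment": "head", "searches": 1000, "successes": 830},
    {"arm": "treatment", "segment": "niche", "searches": 100, "successes": 40},
]


def success_rate(arm, segment=None):
    """Search success rate for an arm, overall or restricted to one segment."""
    rows = [r for r in results if r["arm"] == arm and segment in (None, r["segment"])]
    return sum(r["successes"] for r in rows) / sum(r["searches"] for r in rows)


for arm in ("control", "treatment"):
    print(
        arm,
        "overall:", round(success_rate(arm), 3),
        "head:", round(success_rate(arm, "head"), 3),
        "niche:", round(success_rate(arm, "niche"), 3),
    )
```

In this made-up data the treatment wins on the overall rate while the niche segment gets clearly worse. That is the pattern a segment cut is meant to catch.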

A frontier tool can help a PM design a stronger evaluation plan. The PM can ask it to propose offline metrics, online experiment metrics, guardrail metrics, quality review dimensions, segment cuts, and failure-mode taxonomies. The PM can ask it to identify where the evaluation plan is vulnerable to gaming. The PM can ask it to generate synthetic edge cases and test scenarios. The PM can ask it to distinguish between leading indicators and durable product outcomes.

For search, this might include exact-match success, semantic relevance, zero-result rates, reformulation rates, result abandonment, latency, title-level coverage, and query-class performance. For recommendations, it might include engagement, satisfaction, diversity, novelty, repetition, freshness, catalog coverage, cold-start performance, and long-term retention. For AI discovery, it might include answer groundedness, helpfulness, deflection quality, user correction rate, policy compliance, and whether the system actually helps users decide what to watch, read, buy, play, or do next.
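
A few of the search metrics named above can be computed from a simple query log. The sketch below is illustrative only: the log schema is hypothetical, and the definitions are assumptions I chose for the example (a reformulation is a query with no click that is followed by another query in the same session; an abandoned session is one whose last query has no click). Real teams often define these differently.

```python
from itertools import groupby

# Synthetic query log, already ordered by session and then by time.
events = [
    {"session": "s1", "query": "batmn", "results": 0, "clicked": False},
    {"session": "s1", "query": "batman", "results": 12, "clicked": True},
    {"session": "s2", "query": "cooking shows", "results": 30, "clicked": True},
    {"session": "s3", "query": "space docs", "results": 8, "clicked": False},
    {"session": "s3", "query": "space documentaries", "results": 15, "clicked": False},
]


def search_metrics(events):
    """Compute toy search-quality metrics under the assumed definitions above."""
    zero_results = sum(1 for e in events if e["results"] == 0)
    reformulated = 0
    abandoned = 0
    sessions = 0
    # groupby only merges adjacent rows, so the log must be grouped by session.
    for _, group in groupby(events, key=lambda e: e["session"]):
        queries = list(group)
        sessions += 1
        # Queries with no click that were followed by another query.
        reformulated += sum(1 for q in queries[:-1] if not q["clicked"])
        # Sessions that ended on a query with no click.
        if not queries[-1]["clicked"]:
            abandoned += 1
    return {
        "zero_result_rate": zero_results / len(events),
        "reformulation_rate": reformulated / len(events),
        "abandonment_rate": abandoned / sessions,
    }


metrics = search_metrics(events)
print(metrics)
```

The value of writing the definitions down as code is that the team can argue about them precisely, for example whether a misspelling corrected by the user should count as a reformulation or as a query understanding failure.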

The PM should not outsource evaluation judgment to the tool. The PM should use the tool to expand the set of considerations and expose blind spots. The final evaluation plan still requires product judgment, data science judgment, and engineering judgment. But the starting point becomes much stronger.

Use frontier tools to review implementation risk

The fifth habit is using these tools to review implementation risk. A PM does not need to approve every line of code. But a PM should understand whether the implementation matches the product intent, whether the necessary metrics are logged, whether the experiment is correctly gated, whether the affected users are the intended users, and whether there is a clear rollback path.

In algorithmic systems, small implementation choices can change product behavior materially. A feature may be computed with the wrong lookback window. A fallback may dominate more often than expected. A diversity rule may apply after filtering instead of before filtering. A model score may be blended with a heuristic in a way that changes the intended tradeoff. A logging change may make an experiment difficult to interpret. A cache may hide the effect of a launch. A guardrail may protect the average user but fail for a specific segment.
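
The ordering of a diversity rule and a filter is a good example of a small choice with a visible effect. Here is a minimal sketch with hypothetical helpers (`filter_available`, `cap_per_genre`) and invented data. It does not claim either order is the right one. It only shows that the two orders produce different slates.

```python
def filter_available(items):
    """Hypothetical filter: drop items the user cannot currently access."""
    return [i for i in items if i["available"]]


def cap_per_genre(items, max_per_genre):
    """Hypothetical diversity rule: keep at most max_per_genre items per genre, in rank order."""
    kept, counts = [], {}
    for item in items:
        if counts.get(item["genre"], 0) < max_per_genre:
            kept.append(item)
            counts[item["genre"]] = counts.get(item["genre"], 0) + 1
    return kept


# Synthetic ranked list, best first. The top thriller is unavailable.
ranked = [
    {"id": "t1", "genre": "thriller", "available": False},
    {"id": "t2", "genre": "thriller", "available": True},
    {"id": "c1", "genre": "comedy", "available": True},
]

diversity_then_filter = [i["id"] for i in filter_available(cap_per_genre(ranked, 1))]
filter_then_diversity = [i["id"] for i in cap_per_genre(filter_available(ranked), 1)]
print("diversity, then filter:", diversity_then_filter)
print("filter, then diversity:", filter_then_diversity)
```

When the diversity rule runs first, the unavailable thriller uses up the genre's only slot and is then filtered out, so the slate loses the thriller genre entirely. When the filter runs first, the available thriller takes that slot. A PM reviewing an implementation can ask which order the code uses and whether that matches the intent in the spec.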

A frontier tool can help the PM ask better review questions. It can summarize a pull request in product language. It can identify where the code appears to implement the requirement. It can compare a design doc to the implementation. It can look for missing instrumentation or tests. It can explain what services, metrics, or data contracts may be affected. It can help prepare the PM for a technical review without pretending the PM is the final authority on code correctness.

This is a practical way to raise the quality of product leadership. The PM becomes less dependent on translation. The PM can see more of the system. The PM can bring better questions to engineers. The team still relies on engineering excellence, but the PM is no longer operating from a distance.

Use frontier tools to design better algorithms, not just better documents

The most important shift is that frontier tools can help PMs design better algorithmic behavior. This is where the tools become strategic, not just operational.

A PM can use an agent to reason through objective functions. What happens if we optimize for immediate engagement? What happens if we blend engagement with satisfaction? How should we think about novelty when users also want familiarity? How should we treat freshness in a catalog where newness matters for some users and not for others? When should diversity be a hard constraint, a soft feature, or a re-ranking rule? What should exploration look like for new users, inactive users, or users with very narrow histories?

A PM can use an agent to reason through retrieval design. Should this experience rely on collaborative signals, content-based similarity, semantic embeddings, editorial pools, contextual signals, or a hybrid strategy? Where does lexical retrieval still matter? Where does vector retrieval help? Where do embeddings fail because metadata is weak, user intent is ambiguous, or content similarity does not match human taste?

A PM can use an agent to reason through feedback loops. Which behaviors should update a user profile? Which behaviors should be ignored or downweighted? How do we distinguish accidental clicks from meaningful interest? How quickly should the system adapt to session intent? How do we avoid overfitting to a short burst of activity? How do we let users escape a recommendation loop?

A PM can use an agent to reason through AI discovery. When should an LLM explain a recommendation? When should it ask a clarifying question? When should it retrieve structured catalog data instead of generating a free-form response? How do we evaluate whether an AI answer actually improved discovery rather than simply sounding helpful?

These are not abstract technical questions. They are product questions. They shape what users experience. The PM who can use frontier tools to explore these questions in detail will make better decisions.

The roadmap for PMs

The roadmap I recommend is simple, but it requires consistency. The first stage is fluency. Use frontier tools to learn the architecture of the systems you own. Do not stop at the product surface. Understand candidate generation, retrieval, ranking, re-ranking, metadata, logging, experimentation, and the metrics that define success. Use the tool to create a map, then validate that map with engineers, scientists, documentation, dashboards, and code.

The second stage is specification. Use the tool to turn product intent into algorithmic requirements. Every major recommendation, search, or AI discovery PRD should explain the user problem, the system hypothesis, the target behavior, the relevant signals, the data dependencies, the expected tradeoffs, the evaluation plan, the guardrails, and the failure modes. If the PRD cannot explain how the system should change, it is not ready.

The third stage is exploration. Use the tool to prototype hypotheses with lightweight artifacts. Write SQL. Build notebooks. Generate synthetic examples. Create test cases. Compare ranking ideas on small samples. Analyze failure modes. The goal is not to replace engineering work. The goal is to make the PM’s hypothesis sharper before the team invests deeply.

The fourth stage is evaluation. Use the tool to challenge the measurement plan. Ask what the metric rewards, what it ignores, what segments may be harmed, what failure could be hidden by averages, and what guardrails are needed. For algorithmic products, evaluation is product strategy. Treat it that way.

The fifth stage is review. Use the tool to understand design docs, pull requests, experiment configurations, logging changes, and rollout plans. Ask whether the implementation matches the product intent. Ask whether the experiment will be interpretable. Ask whether the team has a rollback plan. Ask whether the right dashboards and diagnostics exist before launch.

The final stage is compounding. Turn the best workflows into reusable team assets. A team should have standard agent instructions for algorithmic PRD review, search-quality analysis, recommendation experiment design, metric critique, implementation-risk review, and launch-readiness assessment. Over time, these become part of the team’s operating system. The PM team gets better not only because individuals use tools, but because the organization learns how to use them repeatedly and responsibly.

[Figure: A hand-drawn practice loop for frontier product managers with fluency, specification, exploration, evaluation, review, and compounding around user intent and system behavior. Caption: Fluency compounds when agent workflows become shared team operating habits.]

What I expect from technical PMs now

My expectation for technical PMs is changing. I no longer think it is enough to be a strong communicator, a good prioritizer, and a capable stakeholder manager. Those skills still matter, but they are table stakes in algorithmic product areas. The stronger PM is the one who can reason about the machinery that produces the experience.

A PM working on recommendations should be able to explain the difference between candidate generation and ranking. A PM working on search should be able to distinguish query understanding problems from retrieval problems and ranking problems. A PM working on AI discovery should understand why grounding, evaluation, and trust are not optional details. A PM working on personalization should understand how feedback loops can improve relevance but also create repetition, overfitting, or narrowness.

Frontier tools make this level of fluency more attainable. They give PMs a way to ask questions that previously required a long ramp-up. They help PMs translate between product language and system language. They help PMs inspect examples, generate hypotheses, and see the implications of choices that might otherwise stay hidden.

This does not lower the bar for PMs. It raises it. If a tool can help you understand the system, then not understanding the system becomes less acceptable. If a tool can help you pressure-test a metric, then shipping a weak evaluation plan becomes less acceptable. If a tool can help you identify failure modes, then ignoring them becomes less acceptable. The availability of better tools increases the responsibility to do better product work.

The boundaries matter

I want PMs to be ambitious with these tools, but I also want them to be disciplined. A frontier tool is not an oracle. It can hallucinate. It can misunderstand a codebase. It can produce plausible but wrong explanations. It can generate prototypes that look convincing but fail under real data. It can miss security, privacy, reliability, or performance issues. The PM remains accountable for judgment.

There are also governance boundaries. PMs should use approved tools, respect data-handling rules, avoid exposing sensitive information, and involve engineering and data science when work approaches production systems. Generated code should be reviewed. Generated analysis should be checked. Generated recommendations should be treated as hypotheses, not truth.

This discipline is not a reason to avoid the tools. It is the condition for using them well. The best PMs will learn when to trust, when to verify, when to escalate, and when to stop. That judgment will become part of the craft.

The frontier recommendations PM

The future of product management in personalization, search, recommendations, and AI discovery will not be defined by PMs who simply write more documents. It will be defined by PMs who understand how product intent becomes system behavior.

That is what I mean by the frontier recommendations PM. The frontier recommendations PM is not trying to replace engineers or scientists. The frontier recommendations PM is trying to close the gap between user needs and algorithmic implementation. The frontier recommendations PM can move from a customer complaint to a ranking hypothesis, from a product goal to an evaluation plan, from a vague idea to a small prototype, from a technical diff to a product-risk question, and from an experiment result to a better understanding of the system.

For teams building discovery products, this is a major shift. The product surface is only the visible layer. The real work happens in the systems that decide what to retrieve, what to rank, what to explain, what to suppress, what to measure, and what to learn. PMs who stay at the surface will be limited. PMs who learn to work with the system will shape better products.

Frontier tools are not important because they make PMs look more technical. They are important because they make better product thinking possible. They help us ask better questions, write better specifications, design better experiments, and build better algorithmic experiences. Used poorly, they create noise and false confidence. Used well, they make us more precise, more curious, and more effective.

That is the standard I want for PMs working in this space. Do not use frontier tools only to move faster. Use them to think better. Use them to understand the system. Use them to design better algorithms. Use them to become the kind of product leader that modern discovery systems require.

This writing reflects my personal perspectives on product management, AI, and content discovery. It does not represent the official position of my employer or any affiliated organization.