Using AI to Make Promotion Reviews Fairer
I see promotion and talent review cycles as some of the highest-leverage work a leader does. Yet they are also some of the easiest to under-systematize, which leaves careers vulnerable to the "uneven packet" problem.
The raw material usually arrives in uneven packets: different managers, writing styles, levels of detail, evidence quality, stakeholder feedback, and assumed context. Some packets are crisp and quantitative. Others are sprawling narratives. Some bury the most important evidence on page five. And because these reviews often happen in compressed windows, the people making decisions are forced to move quickly through material that deserves careful attention.
That creates risk because high-stakes people decisions require consistency, and consistency is hard when every packet asks the reviewer to reconstruct the case from scratch.
This is where I believe frontier AI should step in: not to make the decision, which remains my accountability alone, but to make the preparation more structured, the evidence easier to inspect, and the uncertainty impossible to ignore.
In a promotion process, an employee should not be advantaged because their manager wrote a more polished packet. And they should not be disadvantaged because their manager did not communicate the case as clearly as someone else did.
The review process should be about the strength of the evidence, not the storytelling quality of the packet.
I built an experimental workflow to address this. The goal was to turn raw review material into evidence-backed briefs, with a verification layer that checks whether every claim traces back to source material. The output was a better review surface: the same structure for every candidate, with the evidence and the open questions in plain view.
The workflow
The process started with a folder of review packets in mixed formats. Each packet represented one candidate and included some combination of role history, manager narrative, scope changes, accomplishments, stakeholder feedback, growth areas, and recommendation rationale.
From that input, the AI workflow produced:
- A standardized one-page brief for each candidate
- A confidence audit against the original source packets
- A priority ranking to support discussion
- An interactive review dashboard for the live meeting
- A notes and decision tracker
- Final shareable artifacts after corrections

Every candidate was reviewed through the same structure, even when the source packets were very different. The workflow was not trying to optimize for the most persuasive packet. It was trying to create a fairer comparison surface.
The AI was not asked to decide who had the strongest case. It was asked to normalize the inputs so the humans in the room could review the cases more consistently.

How each packet was summarized
For each candidate, the AI generated a structured review brief with the same sections.
The first section was basic employee information: current role, proposed role, time in level, manager or sponsor, performance context where available, and promotion history where the source packet supported it.
The second section was a short package snapshot, constrained to two or three sentences designed to answer: what is the case, at a glance?
Then came key accomplishments. The model extracted the strongest examples from the packet and grouped them into a small set of evidence-backed themes. A synthetic example might be:
- Platform modernization across a legacy workflow
- Launch leadership under ambiguous timelines
- Cross-functional operating model improvements
- Mentorship or team-scaling contributions
- Business impact from a newly standardized process
The next section was a scope change table. This was one of the most useful pieces because promotion discussions are often really scope discussions. What changed from the current level to the proposed level? Is the person already operating at that level, or is the packet mostly describing future potential?
After that came stakeholder support. The brief captured who provided feedback, whether they supported the recommendation, and the most specific quote or evidence available. The AI was told not to compress rich feedback into generic phrases like "strong support" when the source contained a more specific reason.
The brief then moved into growth opportunities. Even strong cases should show what the person needs to keep developing.
Then came the section that turned out to be the most valuable: reasons to challenge the recommendation. This section forced the model to act as a "pressure tester." It pushed on the case: Is the scope expansion demonstrated or just aspirational? Are stakeholder quotes specific or generic? This changed the AI from a cheerleader into a partner that prepared me for the hard questions a panel would ask.
Each brief ended with four practical decision-support sections:
- Key questions for the panel
- Risk if not approved
- Impact if the person leaves
- Replacement difficulty
Those sections helped move the discussion from "is this person good?" to "what decision are we making, what evidence supports it, and what questions still need a human answer?"
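The brief structure described above can be sketched as a fixed schema. This is a minimal illustration, not the format the workflow actually used: the class names, field names, and the `empty_sections` helper are all hypothetical, chosen to mirror the sections listed in this post.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional, Tuple


@dataclass
class StakeholderInput:
    name: str
    supports: Optional[bool]  # None when the packet does not say
    evidence: str             # most specific quote or evidence in the packet


@dataclass
class ReviewBrief:
    """One candidate's brief. Every field name here is illustrative."""
    employee_info: Dict[str, str] = field(default_factory=dict)
    snapshot: str = ""  # two or three sentences: the case at a glance
    accomplishments: List[str] = field(default_factory=list)
    # Each row: (dimension, current level, proposed level)
    scope_changes: List[Tuple[str, str, str]] = field(default_factory=list)
    stakeholder_support: List[StakeholderInput] = field(default_factory=list)
    growth_opportunities: List[str] = field(default_factory=list)
    reasons_to_challenge: List[str] = field(default_factory=list)
    panel_questions: List[str] = field(default_factory=list)
    risk_if_not_approved: str = ""
    impact_if_leaves: str = ""
    replacement_difficulty: str = ""


def empty_sections(brief: ReviewBrief) -> List[str]:
    """Return the names of sections that are still empty in this brief."""
    return [f.name for f in fields(brief) if not getattr(brief, f.name)]
```

Because every candidate shares one schema, a missing section shows up as an explicit gap (for example, `empty_sections(brief)` listing `reasons_to_challenge`) instead of disappearing inside a narrative.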
Verifying every claim against the source
The most important part of the AI workflow was the verification layer.
After the briefs were generated, I ran a confidence audit against the original packets. The instructions to the AI were explicit: do not verify the generated summary against itself. Go back to the source material.
For every data point, the audit classified the claim as one of four categories:
- Verified: the claim matches the source material
- Cross-referenced: the claim is not in this candidate's packet, but is supported by another available source
- Inferred: the claim is plausible, but not actually present in the source
- Error: the claim conflicts with the source

That classification created a much clearer review surface. A sentence that reads confidently in a summary can have very different meanings depending on whether it is verified, inferred, or wrong.
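The four categories can be represented as a small enumeration so that each claim carries its status with it. This is a hedged sketch: the names `Verdict`, `AuditedClaim`, `needs_human_review`, and `summarize` are hypothetical, and treating "inferred" and "error" as the statuses that need a human look is my assumption for the example.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Verdict(Enum):
    VERIFIED = "verified"                  # matches the source material
    CROSS_REFERENCED = "cross-referenced"  # supported by another available source
    INFERRED = "inferred"                  # plausible, but not in the source
    ERROR = "error"                        # conflicts with the source


@dataclass(frozen=True)
class AuditedClaim:
    candidate: str
    claim: str
    verdict: Verdict


def needs_human_review(claims: List[AuditedClaim]) -> List[AuditedClaim]:
    """Assumption: inferred and conflicting claims are the ones to inspect."""
    return [c for c in claims if c.verdict in (Verdict.INFERRED, Verdict.ERROR)]


def summarize(claims: List[AuditedClaim]) -> Dict[str, int]:
    """Count claims per category, including categories with zero claims."""
    counts = Counter(c.verdict for c in claims)
    return {v.value: counts.get(v, 0) for v in Verdict}
```

Keeping the verdict attached to each claim means a confident-sounding sentence cannot be read without also seeing whether it was verified, inferred, or wrong.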
The audit was designed to catch specific failure modes that show up when using AI on personnel material:
- Invented profile links or external references
- Guessed stakeholder titles
- Generic paraphrases replacing specific evidence
- Manager summaries presented as direct stakeholder feedback
- Scope details bleeding from one candidate into another
Asking a model to "double-check" its own work is not enough. If the model only reviews the summary it already wrote, it can confirm its own mistakes.
The verification pass has to force source-grounding.
The rule became: if the source does not support it, mark it unknown. Do not fill the gap with a plausible guess. Once the initial audit surfaced these gaps, I supplied the missing information and ran a fresh source-grounding pass so the briefs were held to the same evidence standard.
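For literal fields such as titles and direct quotes, the "mark it unknown" rule can be approximated with a cheap deterministic check before any model-based audit. This is a sketch under my own assumptions, not the check the workflow ran: `grounded_or_unknown` and the `UNKNOWN` label are hypothetical, and a verbatim match only works for text copied from the packet. Paraphrased claims still need a judging pass.

```python
import re
from typing import Optional

UNKNOWN = "Unknown (not in source)"


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so formatting differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def grounded_or_unknown(value: Optional[str], source_text: str) -> str:
    """Keep a value only if it appears verbatim in the source packet.

    Anything missing, blank, or absent from the source becomes UNKNOWN
    rather than being left as a plausible guess.
    """
    if value is None:
        return UNKNOWN
    needle = _normalize(value)
    if not needle:
        return UNKNOWN
    return value if needle in _normalize(source_text) else UNKNOWN
```

With a synthetic packet that mentions "Staff Engineer", a generated title of "Director of Engineering" would come back as `UNKNOWN` instead of surviving into the brief.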
Using an LLM as an evidence judge, not a talent judge
This is where the "LLM as judge" pattern became useful.
In this workflow, one pass generated the structured brief. A separate judging pass evaluated whether the generated claims were actually supported by the source material.
In people decisions, AI should not be the decision-maker. It should not own the recommendation, weigh the tradeoffs, or replace the accountability of the leaders in the room.
But AI can be useful as an evidence auditor, especially when the task is repetitive, detail-heavy, and vulnerable to inconsistency.
The judging layer was not asked to decide who should be promoted. It answered a narrower and more auditable question: can this claim be backed up?
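A minimal sketch of what a separate judging pass might look like. The prompt wording, the function names, and the `ask_model` callable are all hypothetical: `ask_model` stands in for whatever model client is available and is not a real library API. Falling back to "inferred" when the reply cannot be parsed is my assumption, chosen so that an unreadable answer is never counted as verified.

```python
from typing import Callable

ALLOWED = ("verified", "cross-referenced", "inferred", "error")

JUDGE_TEMPLATE = """You are auditing one claim against source material.
Do not use any generated summary as evidence. Use only the text below.
Answer with exactly one word from: {labels}.

CANDIDATE PACKET:
{packet}

OTHER AVAILABLE SOURCES:
{other}

CLAIM:
{claim}
"""


def build_judge_prompt(claim: str, packet_text: str, other_sources: str) -> str:
    """Put the source material, not the generated brief, in front of the judge."""
    return JUDGE_TEMPLATE.format(
        labels=", ".join(ALLOWED),
        packet=packet_text,
        other=other_sources or "(none provided)",
        claim=claim,
    )


def judge_claim(
    claim: str,
    packet_text: str,
    other_sources: str,
    ask_model: Callable[[str], str],
) -> str:
    """Return one of ALLOWED; unparseable replies fall back to 'inferred'."""
    prompt = build_judge_prompt(claim, packet_text, other_sources)
    reply = ask_model(prompt).strip().lower()
    return reply if reply in ALLOWED else "inferred"
```

The judge never sees a question like "should this person be promoted?". It only sees one claim and the material that claim is supposed to rest on.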
The judging layer also made uncertainty visible. Instead of burying uncertainty inside polished prose, the workflow surfaced it. Missing title? Mark it as missing. Feedback behind a link? Note that limitation. Scope table synthesized from narrative instead of explicitly stated? Flag it.
That made the final human discussion better because I could distinguish between evidence, interpretation, and open questions.
The dashboard

The final step was turning the briefs into an interactive dashboard for the live review meeting.
The dashboard had a candidate sidebar for quick navigation, a summary table, filters for which sections to show, an individual candidate view, side-by-side comparison for two or three candidates, a questions-only view, approve/defer/deny tracking, and a notes field for capturing discussion.
Instead of jumping between long documents, I could compare candidates through the same structure. If the conversation moved to risk, every candidate's risk section was available. If the group needed to focus only on open questions, there was a questions view. If a decision changed during discussion, the tracker updated the tally.
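The tracker logic behind that tally is simple enough to sketch. This is an illustration only, written in Python for consistency with the rest of this post; the `DecisionTracker` class and its method names are hypothetical and say nothing about how the actual dashboard was built.

```python
from collections import Counter
from typing import Dict, List

DECISIONS = ("approve", "defer", "deny")


class DecisionTracker:
    """Keeps the latest decision per candidate plus any discussion notes."""

    def __init__(self) -> None:
        self._decisions: Dict[str, str] = {}
        self._notes: Dict[str, List[str]] = {}

    def record(self, candidate: str, decision: str, note: str = "") -> None:
        if decision not in DECISIONS:
            raise ValueError(f"decision must be one of {DECISIONS}")
        # A later decision for the same candidate replaces the earlier one.
        self._decisions[candidate] = decision
        if note:
            self._notes.setdefault(candidate, []).append(note)

    def notes_for(self, candidate: str) -> List[str]:
        return list(self._notes.get(candidate, []))

    def tally(self) -> Dict[str, int]:
        """Count current decisions, including outcomes with zero candidates."""
        counts = Counter(self._decisions.values())
        return {d: counts.get(d, 0) for d in DECISIONS}
```

Storing only the latest decision per candidate is what lets the tally update when the room changes its mind, while the notes keep a record of why.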
The dashboard did not replace judgment. It reduced the amount of cognitive overhead required to exercise judgment well.
What changed
The biggest benefit was consistency. Every candidate got the same sections. Every case got a challenge section. Every summary went through a source-grounding audit. Every uncertainty had somewhere to go. The workflow made it harder for a polished packet to hide weak evidence, and harder for a messy packet to obscure a strong case.
Fairness in talent processes isn’t just about good intent; it’s about process design. Using AI to standardize the evidence builds that design into the review itself.
Frontier AI tools are well-suited to this kind of work because they can absorb messy inputs, produce structured outputs, and then help critique those outputs. But they only become useful when the workflow is designed with controls:
- Standardize the summary format
- Require source-grounded claims
- Add an explicit verification pass
- Use an LLM as a judge for evidence quality
- Preserve human accountability for the decision
I don’t want AI ever making promotion decisions. I want AI helping leaders prepare so rigorously that our human judgment is as fair and consistent as possible.
For product leaders, this is the near-term opportunity with frontier tools:
- Automate the preparation, not the accountability.
- Standardize the evidence, not the outcome.
- Make the process faster, but also make it more careful.
The best use of AI in leadership work is creating the conditions for better, more consistent human judgment.
Author’s note: I used this workflow as part of my own preparation for participating in a review panel. It was not used as part of the formal promotion review process, and it was not used to make or determine any promotion decision.
For a tool or workflow like this to become part of an official talent process, it would need the appropriate level of due diligence, governance, privacy review, HR review, legal review, and organizational alignment. That was not the case here. This was a personal preparation exercise to help me engage with the material more consistently and thoughtfully.
My hope is that, in the future, rigorously reviewed and approved tools like this can be made available more broadly to leaders and employees, so that important people processes can become more consistent, evidence-based, and fair.