
When AI Meets Incentive Comp: A Curious Test Drive

  • Mari Denton
  • Jun 23
  • 3 min read

by Mari Denton, Managing Partner, I&A Partners Founder & CEO


Introduction


At I&A Partners, we believe compensation should motivate the right behaviors and stand up to scrutiny across finance, HR, and leadership. In an age when everyone is talking about AI, I started to wonder:


Could AI help us interpret incentive models? Not calculate, not automate—but reason through structure? Could it spot patterns, anticipate outcomes, or help refine how we talk about these plans?


I’m not an AI expert. I’m a practitioner, a designer of comp strategies, a builder of systems, a translator between logic and behavior. But I wanted to explore what was possible.

So I decided to run a little experiment. 


What I Built


I created a simplified Excel workbook—streamlined but representative:


  • A tiered commission grid (basis points based on volume and units)

  • A summary tab showing employee earnings—gross, net, recoverable draw, and overrides

  • Two roles: Loan Officers and Branch Managers

  • A few deliberate complexities: a “better-of” logic tier, personal production compensation, override scenarios
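To make the "better-of" idea concrete, here is a minimal Python sketch of that kind of tier. The breakpoints, basis-point rates, and flat-rate floor below are illustrative assumptions, not the figures from my actual workbook:

```python
# Hypothetical tiered commission grid with "better-of" logic.
# Thresholds and rates are made up for illustration only.

TIERS = [            # (monthly volume threshold, rate in basis points)
    (0, 50),
    (1_000_000, 75),
    (2_000_000, 100),
]
FLAT_BPS = 60        # hypothetical flat alternative for the "better-of" tier

def tier_rate(volume: float) -> int:
    """Return the bps rate for the highest tier this volume reaches."""
    rate = TIERS[0][1]
    for threshold, bps in TIERS:
        if volume >= threshold:
            rate = bps
    return rate

def commission(volume: float) -> float:
    """Better-of logic: pay the greater of the tiered or flat calculation."""
    tiered = volume * tier_rate(volume) / 10_000
    flat = volume * FLAT_BPS / 10_000
    return max(tiered, flat)
```

At low volume the flat rate wins; once volume climbs into the upper tiers, the tiered calculation takes over. That crossover is exactly the kind of structural detail I wanted to see whether the AI tools could find on their own.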


I picked three free AI tools—Copilot, ChatGPT, and Claude—and gave each the same starting prompt:

“Analyze this file. Tell me what it does, how it’s structured, and what formulas it uses.”


Then I followed up with a scenario:

“What happens if Pat Johnson closes two additional loans at $300,000 each?”


No special setup. Just a clean test.


Microsoft Copilot: The Curious Collaborator


Copilot impressed me in subtle ways. It interpreted structure well, identified roles quickly, and most notably—it asked clarifying questions. When I explained the “better-of” rule, it reapplied logic and recalculated based on what I shared.


It stumbled a bit when it relied too heavily on job titles to infer compensation structures—assuming role equaled logic—but it adjusted when nudged.


It wasn’t always accurate. But it was curious. Collaborative.


Best for: Exploratory modeling, thinking through edge cases, companion brainstorming.


ChatGPT: The Detached Analyst


ChatGPT’s interpretation was clean and... uninspired. It summarized the tabs, identified some formulas correctly, and laid out structure. But it failed to grasp key components of the recoverable draw—and when it misfired, it didn’t iterate or ask for clarification.


It felt like the output of a search engine: static, polished, but disconnected from any reasoning.


Best for: Research, descriptions, reciting surface-level comp mechanics.


Claude: The Perceptive Interpreter


Claude couldn’t read Excel formulas, so it asked for screenshots. That felt like a limitation—until it became an asset.


Claude inferred structure based on layout and labels, identified trigger points in the commission grid, and reasoned its way to a conclusion. It even spotted a strategic nuance: it explained why tiering was driving behavior, noting that the higher rate now applied to the full volume—not just the incremental production.
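The nuance Claude caught, a rate that applies retroactively to all volume once a threshold is crossed rather than just to the increment, can be sketched in a few lines. Again, the thresholds and rates here are hypothetical, not my model's actual grid:

```python
# Full-volume (retroactive) tiering: crossing a threshold reprices
# ALL volume at the higher rate, not just the marginal production.
# Tiers are illustrative assumptions.

def full_volume_commission(volume, tiers=((0, 50), (1_000_000, 75))):
    rate = max(bps for threshold, bps in tiers if volume >= threshold)
    return volume * rate / 10_000

below = full_volume_commission(990_000)    # 990,000 * 50 bps = 4,950.0
above = full_volume_commission(1_010_000)  # 1,010,000 * 75 bps = 7,575.0
```

Roughly $20,000 of additional volume produces over $2,600 of additional commission in this sketch, which is precisely why that tier boundary drives behavior.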


Then came the test.


When I fed it the same scenario for a branch manager—James Brown—it initially misapplied logic… but quickly caught something important:

“These numbers look like LO compensation. Is James mislabeled, or is the model using role-based overrides?”


It flagged the misalignment and proposed options. When I confirmed override logic was in play, it recalculated—including the personal comp and the new override amount based on additional volume.


It didn’t just model the change—it narrated the design.


Best for: Interpreting intent, stress-testing structure, and flagging role logic inconsistencies.


What I Learned


None of these tools are ready to replace comp engines. And they shouldn’t. The math matters too much.


But they can still play a role:

  • Copilot kept the logic moving

  • ChatGPT got the surface right

  • Claude interpreted incentive behavior


They helped surface assumptions, reveal misalignments, and refine the way I talked about comp—not just how I built it.


So… Would I Use Them Again?


Not to calculate. But to pressure-test logic? To model structure? To reframe language? Yes—carefully, thoughtfully, and never without a qualified human at the table.


Because these tools don’t just answer—they reflect. And sometimes, that’s the perspective you didn’t know you needed.


I&A Partners LLC

469-214-5497

PO Box 304

Melissa, TX 75454
