Leadership · 2026-05-15

The supply chain VP's playbook for AI agents in retail

If an agent re-orders across your network and gets it wrong, it gets it wrong at scale — and you're accountable. How to automate decisions with guardrails.

Kevin Didelot · 12 min read

You run a network. Service levels, inventory turns, on-shelf availability across hundreds of stores and tens of thousands of SKUs. When one of those numbers moves the wrong way, it lands on your desk. So when a vendor says "AI agents that decide and execute," your first reaction isn't excitement. It's a specific, well-earned fear.

The fear is this. A planner who makes a bad call gets one store, one category, one order wrong. An agent that makes the same bad call makes it everywhere at once.

It re-allocates stock away from the stores that needed it. It re-orders against a demand signal that was noise. It propagates one flawed assumption across the entire network before anyone notices. And you are the one who answers for it.

That fear is not Luddism. It's operational accountability working exactly as it should. The question is not whether to take it seriously — you must.

The question is whether "don't automate decisions" is the right conclusion to draw from it. It isn't. The right conclusion is narrower and more useful: don't automate decisions without governance. This article is the playbook for the governed version.

The real risk isn't automation, it's ungoverned automation

Strip the fear down to its mechanism and you find one word: scale. An agent's danger and its value come from the same property — it acts on every SKU, every store, every hour, without the throughput ceiling a human team hits. That scale is exactly why you'd want it. It's also exactly why an error compounds.

But notice what the fear quietly assumes: that the agent acts unconstrained. That it's free to re-order any quantity, re-allocate any stock, trigger any transfer, with nothing standing between its decision and your network.

An agent with no limits is dangerous at scale. An agent with hard limits is safe at the same scale. The scale didn't change. The governance did.

Meanwhile, the do-nothing option has a cost you already pay. Your planners cover the strategic SKUs, the A-class items, the categories that get attention. The long tail — the thousands of slow movers, the secondary stores, the off-peak reorders — gets a rule of thumb, a stale safety stock, or nothing at all. That's not control. It's uncovered surface area dressed up as control, and it leaks margin quietly every week.

So the real comparison isn't agent vs. human judgement. It's governed automation that covers the whole network versus manual judgement that covers the part you have bandwidth for, plus neglect everywhere else. The risk you're trying to avoid — getting it wrong at scale — is real. But ungoverned automation is one way to hit it, and under-coverage is another. The playbook below is how you get the scale without the blast.

Staged autonomy: the recommend → approve → bounded-auto ladder

You don't hand an agent the keys to the network on day one. You move it up a ladder, one rung at a time, where each rung earns the next by proving its decisions against reality. Autonomy is granted per decision type, not per system — and it's always reversible.

Rung one: recommend. The agent reads the live state of the network, applies the rules, and proposes the decision — the reorder quantity, the transfer, the allocation split. A planner sees the proposal and the reasoning, and decides. Nothing executes without a human.

At this stage you're not testing whether the agent is right. You're measuring how often its proposal matches what your best planner would have done, on the SKUs where you can check. This is the same advisory posture most retail AI is stuck in today — but here it's a starting rung, not the destination.

Rung two: approve. Once the match rate on a decision class is consistently high, you flip it. The agent now prepares the decision and the execution, and the planner approves in bulk — confirming a batch of reorders rather than keying each one. The human is still in the loop on every action, but the friction collapses. Crucially, you watch where the planner rejects: every rejection is a rule the agent didn't know yet, and it goes back into the engine.

Rung three: bounded auto-execute. When rejections on a decision class fall to near zero, that class graduates. Inside a defined envelope — quantities under a cap, value under a threshold, the routine reorders that fit every rule — the agent acts on its own. Outside the envelope, it drops back to approve automatically. The result is the inversion that actually scales: the routine 90% runs without a human, and your planners spend their judgement on the 10% of exceptions that deserve it.
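To make the envelope concrete, here is a minimal sketch in Python. It assumes an envelope defined by just two limits, a quantity cap and a value threshold. The names `Envelope`, `Reorder` and `route` are hypothetical, chosen for illustration, and a real envelope would carry more rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    max_units: int    # hypothetical quantity cap per order
    max_value: float  # hypothetical value threshold per order


@dataclass(frozen=True)
class Reorder:
    sku: str
    units: int
    unit_cost: float


def route(order: Reorder, envelope: Envelope) -> str:
    """Return 'auto' inside the envelope, 'approve' for anything outside it."""
    value = order.units * order.unit_cost
    if order.units <= envelope.max_units and value <= envelope.max_value:
        return "auto"
    # Outside the envelope the decision drops back to planner approval.
    return "approve"
```

The fallback is the default branch, not a special case: anything the envelope does not explicitly admit goes to a human.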

The ladder is the whole point. You never face a binary "trust the agent or don't." You face a continuous, evidence-based decision: which decision classes have earned which rung, on the data you've watched accumulate. And every rung is reversible — a class that starts misbehaving drops back down, automatically.
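The ladder logic can be sketched in a few lines of Python. The thresholds below are placeholders, not recommendations: "consistently high" and "near zero" have to be set per decision class. `ClassStats` and `suggested_rung` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rung(IntEnum):
    RECOMMEND = 1
    APPROVE = 2
    BOUNDED_AUTO = 3


# Placeholder thresholds (assumptions, to be tuned per decision class).
MATCH_RATE_TO_APPROVE = 0.95
REJECTION_RATE_TO_AUTO = 0.02
REJECTION_RATE_TO_DEMOTE = 0.10
MIN_DECISIONS = 200  # evidence required before any promotion


@dataclass
class ClassStats:
    decisions: int   # decisions observed in the review window
    matches: int     # rung one: proposals matching the planner's own call
    rejections: int  # rungs two and three: proposals rejected or reversed


def suggested_rung(current: Rung, stats: ClassStats) -> Rung:
    """Suggest the rung a decision class has earned on the observed evidence."""
    if stats.decisions == 0:
        return current
    rejection_rate = stats.rejections / stats.decisions
    # Demotion needs no minimum sample: a misbehaving class drops at once.
    if current is not Rung.RECOMMEND and rejection_rate >= REJECTION_RATE_TO_DEMOTE:
        return Rung(current - 1)
    if stats.decisions < MIN_DECISIONS:
        return current  # not enough evidence to promote yet
    if current is Rung.RECOMMEND:
        match_rate = stats.matches / stats.decisions
        return Rung.APPROVE if match_rate >= MATCH_RATE_TO_APPROVE else current
    if current is Rung.APPROVE and rejection_rate <= REJECTION_RATE_TO_AUTO:
        return Rung.BOUNDED_AUTO
    return current
```

The asymmetry is deliberate in this sketch: promotion waits for evidence, demotion does not. A promotion is best read as a suggestion that a planner confirms, while a demotion can safely apply on its own.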

The guardrails that make agents safe

Staged autonomy is the process. The guardrails are the enforcement — the hard constraints that hold at every rung, including bounded auto-execute. Three of them carry the weight.

Business rules as hard limits, not suggestions

The most important guardrail is that your business rules live inside the engine as constraints the agent cannot violate — not filters applied after the fact. These are the rules that already govern your network:

  • Minimum order quantities and supplier lead times
  • Margin floors and markdown calendars
  • Store receiving capacity and allocation fairness across the network

The agent doesn't consider these and weigh them off against the objective. They bound the space of actions it's allowed to take at all.

This is the difference between an agent that might breach a constraint if the optimisation pulls hard enough, and one that cannot. The action that breaches it was never in the option set. A decision is applicable by construction or it isn't generated.

That property is also what makes recommendations worth approving in the first place. Planners reject outputs that ignore a constraint nobody coded, which is precisely why so much retail AI advice gets discarded. When every rule is enforced by construction, that reason to reject disappears.
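One way to picture "applicable by construction" is a generator that only ever emits feasible quantities. The sketch below is illustrative and assumes three constraints: a minimum order quantity, a store receiving capacity, and a pack size. The pack size and the function names are assumptions added for the example.

```python
from typing import List, Optional


def feasible_quantities(moq: int, pack_size: int, receiving_capacity: int) -> List[int]:
    """Every order quantity that respects MOQ, pack size and store capacity."""
    first = -(-moq // pack_size) * pack_size  # smallest pack multiple >= MOQ
    return list(range(first, receiving_capacity + 1, pack_size))


def choose_quantity(
    target: int, moq: int, pack_size: int, receiving_capacity: int
) -> Optional[int]:
    """Pick the feasible quantity closest to the target, or None if none exists."""
    options = feasible_quantities(moq, pack_size, receiving_capacity)
    if not options:
        return None  # no applicable decision exists, so none is generated
    return min(options, key=lambda q: abs(q - target))
```

The optimisation step only chooses among `options`. A quantity above receiving capacity cannot be selected however hard the target pulls, because it was never in the list.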

Blast-radius controls

Hard limits keep each individual decision inside the rules. Blast-radius controls keep a systemic error from propagating before anyone catches it. These are caps on aggregate movement, not on individual decisions.

No more than X% of a SKU's network stock re-allocated in one cycle. No more than Y total units auto-ordered against a single supplier per day. And no transfer wave that empties a region.

The logic is borrowed from how you'd run any high-throughput system: assume something will eventually go wrong, and make sure the failure is contained rather than networkwide. If a demand signal turns out to be noise, a blast-radius cap means the agent over-orders within a bounded envelope you can absorb. It does not over-order across every SKU it touched that night. The cap converts a catastrophe into an incident.
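As an illustration, the per-supplier daily cap could be enforced by a small running counter. This is a sketch under simple assumptions (one process, in-memory state, a hypothetical `SupplierDailyCap` class), not a production design.

```python
from collections import defaultdict


class SupplierDailyCap:
    """Caps the total units auto-ordered against one supplier per day."""

    def __init__(self, max_units_per_day: int) -> None:
        self.max_units_per_day = max_units_per_day
        # (supplier, day) -> units already auto-ordered
        self._ordered = defaultdict(int)

    def try_auto_order(self, supplier: str, day: str, units: int) -> bool:
        """Reserve units under the cap; return False if the order would breach it."""
        key = (supplier, day)
        if self._ordered[key] + units > self.max_units_per_day:
            return False  # over the cap: hold for planner approval instead
        self._ordered[key] += units
        return True
```

Each order may be perfectly valid on its own. The cap looks only at the running total, which is what bounds the damage when many individually plausible orders share one bad signal.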

Override that always wins

The third guardrail is the simplest and the most reassuring: a human override that takes precedence over any agent decision, instantly, with no ceremony. Pause a decision class. Roll back a batch. Freeze auto-execution on a category, a supplier, a region. The override is not an escalation ticket — it's a switch, and the agent yields to it without argument.

This matters less because you'll use it constantly and more because knowing it's there changes how the organisation relates to the agent. Planners supervise with confidence when they know they can intervene the moment something looks off. The override is what makes the autonomy psychologically affordable — and the audit trail behind it (who acted, what fired, why) is what makes it accountable.
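A freeze switch with an audit trail can be very small. The sketch below assumes decisions are tagged with scopes such as `("category", "dairy")` or `("supplier", "ACME")`. The scope format and the `OverrideSwitch` class are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class OverrideSwitch:
    frozen: set = field(default_factory=set)       # scopes currently frozen
    audit_log: list = field(default_factory=list)  # who acted, what fired, why

    def _record(self, action: str, scope: tuple, who: str, why: str) -> None:
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "scope": scope,
            "who": who,
            "why": why,
        })

    def freeze(self, scope: tuple, who: str, why: str) -> None:
        self.frozen.add(scope)
        self._record("freeze", scope, who, why)

    def release(self, scope: tuple, who: str, why: str) -> None:
        self.frozen.discard(scope)
        self._record("release", scope, who, why)

    def allows(self, scopes: list) -> bool:
        """True only if none of the decision's scopes is frozen."""
        return not any(scope in self.frozen for scope in scopes)
```

The agent would call `allows` before every execution, so a freeze takes effect on the very next decision without any negotiation.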

Change management: getting planners to supervise, not fight

Here is the part the technology can't solve, and the part that decides whether the rollout sticks. Your experienced planners have spent years building the judgement the agent is now partly automating. If they perceive the agent as a replacement, they will fight it. They will override correct decisions to prove they still matter, withhold the unwritten rules that would make it better, and quietly route around it. From where they sit, that reaction is rational, and it sinks the rollout, because nobody supervises a system they're trying to discredit.

The reframe that works is concrete, not motivational. The planner's job is not going away; it's moving up the ladder alongside the agent. They stop keying thousands of routine reorders and start governing the agent that keys them: defining the envelopes, reviewing the exceptions, deciding which decision classes are ready to graduate. The planner becomes the supervisor of a system that covers far more network than they ever could by hand. That is more leverage, not less, on the same accountability they already carry.

Three things make that reframe real in practice, and skipping them is how rollouts fail:

  • The rejection loop is visible and it pays off. When a planner overrides the agent, that correction has to visibly become a rule the agent respects next time. If overrides vanish into a void, planners learn the system doesn't listen, and supervision degrades into resentment.
  • Planners own the envelopes. The decision to move a class from approve to bounded auto-execute is theirs to make, on evidence they can see — not a setting an admin flips on their behalf. Autonomy granted by the experts is supervised; autonomy imposed on them is sabotaged.
  • The first scope is chosen to build trust, not to impress. Start on a decision class where the agent's value is obvious and the downside is bounded — high-volume routine replenishment on stable SKUs, not the strategic call that defines someone's quarter. Trust compounds from there.

This is the same shift that separates a tool you assemble from a transformation you co-own. It's why deployment has to be led with operations, not delivered to them. The retailers who get this right don't have better models. They have a closed decision-to-execution loop their teams actually supervise — and the supervision is the product, not an afterthought.

The real question to ask

The fear you started with was the right instinct pointed at the wrong target. The danger was never automation. It was automation without rules, without blast-radius limits, without an override, deployed to planners instead of with them. Close those gaps and what's left isn't a loss of control. It's control extended across the part of your network you've never had the bandwidth to govern.

So ask the question that actually decides the outcome. Not "can I trust an agent to decide," but "what governance would let me trust it on this decision class, at this rung, with these limits?" That question has answers.

It turns an all-or-nothing leap of faith into a staged, reversible, evidence-based rollout. That's the difference between continuous coverage and the weekly meeting where the network drifts between sessions. The accountability stays yours. The reach finally matches it.

For the governance and technology questions supply chain leaders face during AI rollout, see our Supply Chain VP FAQ.


How would staged autonomy work on your network?

At Solya, we offer supply chain leaders a personalized 30-minute working session. We map, on your own decision classes and constraints, where staged autonomy would start, what guardrails would bound it, and which decisions are ready for which rung.

You'll walk away with:

  • A read on which of your decision classes are candidates for recommend, approve, or bounded auto-execute first
  • The specific business rules and blast-radius limits that would govern an agent on your network
  • A staged, reversible rollout sequence designed around your planners supervising, not fighting

Kevin Didelot, Co-founder & CTO, Solya