Leadership · 2026-04-28

A CFO's guide to retail AI ROI: which P&L line, and when

Most retail AI ROI decks don't survive a finance review. Here's how to tell cost-saving AI from margin-generating AI — and where it actually moves the P&L.

Kevin Didelot · 13 min read

A retail AI ROI deck arrives on a CFO's desk roughly once a quarter. It opens with a market-size slide, moves to a "transformational" headline percentage, and closes with a payback curve that bends upward in year two. The CFO has seen this shape before. They know that a blended ROI number with no named P&L line behind it is not a business case — it's a marketing artifact dressed as a finance one.

The reasonable CFO position is not anti-AI. It's anti-handwaving. "Show me exactly which line of the P&L this moves, by how much, and when it shows up — then tell me what I can't attribute cleanly." That request is not skepticism for its own sake. It's the only frame in which an AI investment can be compared against everything else competing for the same capital.

This article answers that request in finance terms. It draws the one distinction that decides the entire business case — cost-saving versus margin-generating AI. It maps the specific P&L lines retail decision AI actually moves, gives a bottom-up model you can defend in a budget review, and stays explicit about what's hard to attribute. No headline percentage. The number should fall out of your own numbers.

Cost-saving AI vs margin-generating AI

Almost every retail AI pitch belongs to one of two categories, and the categories behave completely differently in a P&L. Conflating them is the single most common reason an AI investment underdelivers against its business case.

Cost-saving AI makes an existing process cheaper or faster. It drafts the supplier email, summarizes the sell-through report, auto-tags product images, answers the tier-one support ticket. The value is real, but it is soft: hours freed, not euros banked. Freed hours only convert to P&L impact under one of two conditions: you remove the headcount, or you redeploy it to demonstrably revenue-generating work. Retail organizations rarely do either cleanly.

The analyst whose Monday-morning report is now automated does not get cut; they absorb the slack. The saving is real on a stopwatch and invisible on the income statement.

Margin-generating AI changes a decision — what gets bought, where stock is allocated, when a markdown is taken, which SKU is replenished. The output is not a faster process; it's a different action, and the better action carries a measurable gross-margin difference versus the action that would otherwise have been taken. This is the only category that lands directly on the gross-margin line, because it changes the economics of the underlying transaction, not just the labor cost of arranging it.

The distinction matters to a CFO for a precise reason: the two categories require different burdens of proof and convert at different rates. Cost-saving AI needs an honest headcount or redeployment argument before it touches the P&L at all — and most don't have one. Margin-generating AI can be tied to a dated decision, a counterfactual, and a gross-margin delta. One is an efficiency story you have to fight to bank. The other is a margin story you can measure per unit of action.

A useful filter for any AI proposal: does this make a decision, or does it make a task faster? If the answer is "faster task," demand the headcount line before crediting any ROI. If the answer is "different decision," ask which decisions, how many, and what the margin is on each — which is exactly the rest of this article.

The P&L lines retail decision AI actually moves

Retail decision AI is not diffuse. It moves four named lines, each tied to a category of operational decision a retailer already makes thousands of times a season. The retailer already makes those decisions; it just makes them more slowly, later, and with less information than the optimal version would use. The value is the gap between the decision taken and the decision that should have been taken.

ROI calculator: what do manual retail decisions cost you?

Example inputs: annual revenue €50,000,000 (total retail revenue across the network); 40 stores (selling locations, used for the per-store view); gross margin 45% (average across the assortment); out-of-stock rate 8% (share of demand lost to items being out of stock); markdown & overstock rate 12% (revenue lost to markdowns, clearance and overstock).

Where it leaks: estimated annual leakage of €7,800,000, made up of €1,800,000 in lost sales from stock-outs and €6,000,000 in markdowns and overstock.

Recoverable with AI decisioning: €1,650,000 a year, or €41,250 per store per year (about 3% of annual revenue).

Estimate based on your inputs and Solya benchmark recovery rates (25% of stockout loss, 20% of markdown loss). Your real baseline is measured during onboarding.

The calculator above turns that gap into a number for your own volumes. It's the annual margin sitting between the decisions you take today and the ones a decision layer would take.
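The example figures can be reproduced with a few lines of arithmetic. The sketch below is a reconstruction, not Solya's published formula: it assumes stock-out leakage is valued at gross margin on the lost revenue (€50M × 8% × 45%) while markdown and overstock leakage is taken at face value (€50M × 12%), then applies the quoted 25% and 20% recovery rates. The class and function names are hypothetical.

```python
from dataclasses import dataclass

# Benchmark recovery rates quoted beside the calculator.
STOCKOUT_RECOVERY = 0.25
MARKDOWN_RECOVERY = 0.20


@dataclass
class NetworkInputs:
    annual_revenue: float  # EUR
    stores: int
    gross_margin: float    # 0.45 for 45%
    oos_rate: float        # share of demand lost to stock-outs
    markdown_rate: float   # share of revenue lost to markdowns, clearance, overstock


def estimate(inp: NetworkInputs) -> dict:
    # Assumption: lost sales are valued at gross margin, not at full revenue.
    stockout_loss = inp.annual_revenue * inp.oos_rate * inp.gross_margin
    # Assumption: markdown and overstock loss is counted at face value.
    markdown_loss = inp.annual_revenue * inp.markdown_rate
    recoverable = (stockout_loss * STOCKOUT_RECOVERY
                   + markdown_loss * MARKDOWN_RECOVERY)
    return {
        "stockout_loss": stockout_loss,
        "markdown_loss": markdown_loss,
        "total_leakage": stockout_loss + markdown_loss,
        "recoverable": recoverable,
        "per_store": recoverable / inp.stores,
        "share_of_revenue": recoverable / inp.annual_revenue,
    }


if __name__ == "__main__":
    example = NetworkInputs(50_000_000, 40, 0.45, 0.08, 0.12)
    for key, value in estimate(example).items():
        print(f"{key}: {value:,.4f}")
```

With the example inputs this gives €7.8M of leakage and €1.65M recoverable, which is 3.3% of revenue; the calculator displays it rounded to 3%.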

Markdown depth

The markdown line is the most direct and the most under-instrumented. A product flagged for markdown four to six weeks late doesn't just get marked down. It gets marked down deeper, because the window to clear it at a shallow discount has closed. In seasonal categories, that delay routinely moves the required discount from -20% to -50%, with brutal leverage on gross margin.

Decision AI moves this line by surfacing the right SKU/store pairs at the right moment, while a shallow markdown still clears the stock. The bankable effect is the difference between the discount you take early and the discount you're forced to take late, and it lands straight on gross margin. We unpack the mechanics of this in "Why 70% of retail markdowns are still manual."
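To make the early-versus-late arithmetic concrete, here is a minimal sketch. It assumes, as a simplification, that the same units clear at either depth, so the margin protected is simply the retail value times the gap between the two discounts. The function name and the €100,000 stock value are hypothetical.

```python
def markdown_margin_delta(retail_value: float, early_discount: float,
                          late_discount: float) -> float:
    """Margin protected by clearing at the early (shallow) discount instead of
    the late (deep) one. Discounts are fractions, e.g. 0.20 for -20%.
    Simplification: the same units are assumed to clear at either depth."""
    if not 0.0 <= early_discount <= late_discount <= 1.0:
        raise ValueError("expected 0 <= early_discount <= late_discount <= 1")
    return retail_value * (late_discount - early_discount)


# Hypothetical: EUR 100,000 of seasonal stock valued at full retail price.
protected = markdown_margin_delta(100_000, 0.20, 0.50)
print(f"Margin protected by marking down early: EUR {protected:,.0f}")
```

In practice a deeper markdown also clears more units, so a fuller model would use your own markdown ladders and sell-through curves rather than a single stock value.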

Overstock carrying cost

Excess stock costs 20–30% of its value per year to hold, before a single markdown. The book value an ERP shows is only a fraction of the true bill. Decision AI moves this line by preventing the overstock from forming and accumulating. It reallocates before the imbalance hardens, returns to the supplier while the contractual window is open, and transfers between stores instead of holding. Every avoided week of carry is a directly recoverable cost.

The full anatomy of this hidden bill, the seven cost layers no ERP consolidates, is laid out in "The real cost of overstock."
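A rough way to price an avoided week of carry is sketched below. It assumes the annual carrying rate accrues evenly across 52 weeks, which is a simplification since real carrying cost is lumpier. The stock value and number of weeks are hypothetical, priced at both ends of the 20–30% annual range cited above.

```python
WEEKS_PER_YEAR = 52


def carrying_cost(stock_value: float, annual_rate: float, weeks: float) -> float:
    """Cost of holding `stock_value` of excess inventory for `weeks`, assuming
    the annual carrying rate accrues evenly week by week."""
    return stock_value * annual_rate * weeks / WEEKS_PER_YEAR


# Hypothetical: EUR 200,000 of excess stock held 8 weeks longer than needed,
# priced at both ends of the 20-30% annual range.
for rate in (0.20, 0.30):
    cost = carrying_cost(200_000, rate, 8)
    print(f"{rate:.0%} per year -> EUR {cost:,.0f} for 8 weeks")
```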

Stock-out lost sales

The most under-counted of the four, because it never appears as a cost: it appears as revenue that simply never happened. Retailers lose up to 4% of sales to stock-outs even while warehouses overflow with the same product elsewhere. Decision AI moves this line by re-localizing stock to where demand actually is, in time to capture the sale rather than after the customer has substituted or walked away. The recovered gross margin on those captured sales is the value. The second- and third-order costs (substitution, lost loyalty) compound on top, as laid out in "The real cost of stock-outs."
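The recovered-margin logic can be sketched as follows. The capture rate (the share of lost sales that re-localized stock actually recaptures) is a hypothetical parameter you would have to estimate from your own data, and the sketch deliberately leaves out the second- and third-order effects, so it understates the total.

```python
def recovered_stockout_margin(revenue: float, lost_share: float,
                              capture_rate: float, gross_margin: float) -> float:
    """Gross margin on lost sales that re-localized stock recaptures in time.

    revenue       -- annual sales actually realized, EUR
    lost_share    -- lost sales as a fraction of revenue (the text cites up to 4%)
    capture_rate  -- hypothetical share of those lost sales recaptured
    gross_margin  -- margin earned on each recaptured sale, e.g. 0.45
    Second- and third-order effects (substitution, loyalty) are not included.
    """
    lost_sales = revenue * lost_share
    return lost_sales * capture_rate * gross_margin


# Hypothetical: EUR 50M network, 4% lost to stock-outs, a quarter recaptured.
print(f"EUR {recovered_stockout_margin(50_000_000, 0.04, 0.25, 0.45):,.0f}")
```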

Inventory turns and working capital

The slowest-moving but most strategic line. Every euro frozen in stock that shouldn't be there is a euro not funding a best-seller's replenishment or a new collection. Decision AI raises turns by keeping the right stock in the right place, which releases working capital and improves the cash-conversion cycle. This is the line a CFO feels most directly, because it touches the balance sheet, not just the income statement. It is the through-line of "From stock to cash: how decisions drive retail performance."
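Turns and working-capital release follow from standard inventory arithmetic. This sketch uses one common definition, turns = cost of goods sold ÷ average inventory at cost, so average inventory is COGS ÷ turns and the cash freed by raising turns is the difference between the two inventory levels. The figures are hypothetical: €50M of revenue at a 45% gross margin implies €27.5M of COGS.

```python
def inventory_metrics(annual_cogs: float,
                      average_inventory: float) -> tuple[float, float]:
    """Return (turns, days of inventory), with turns = COGS / average inventory at cost."""
    turns = annual_cogs / average_inventory
    return turns, 365 / turns


def working_capital_released(annual_cogs: float, turns_before: float,
                             turns_after: float) -> float:
    """Cash freed when the same COGS is supported at a higher turn rate,
    since average inventory at cost = COGS / turns."""
    return annual_cogs / turns_before - annual_cogs / turns_after


# Hypothetical: EUR 27.5M COGS, turns improving from 4 to 5 a year.
cogs = 27_500_000
print(inventory_metrics(cogs, cogs / 4))   # turns 4, about 91 days
print(f"Released: EUR {working_capital_released(cogs, 4, 5):,.0f}")
```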

Notice what these four lines share: none of them is a labor-cost line. They are all transaction-economics lines, moved by changing the decision rather than speeding up the process around it. That is the structural reason decision AI is a margin story, not an efficiency story.

How to build the business case

The vendor instinct is to hand you a blended ROI percentage. Reject it. A defensible retail AI business case is built bottom-up from your own operational volumes, and it has the same shape for every one of the four lines above.

The model has three terms:

Value = addressable decision volume × decision-quality lift × margin per decision

Take each term in turn, because each is a number you already own or can defend.

Addressable decision volume. How many decisions of this type does the network make per season? Markdown decisions per SKU/store, replenishment calls per week, allocation arbitrations at season start. This is countable from your own systems — it is not an estimate the vendor supplies. It also bounds the opportunity honestly: AI cannot move decisions you don't make.

Decision-quality lift. Of those decisions, what fraction is currently suboptimal, and by how much does better timing or information improve them? This is the term to be conservative on. Do not assume every decision improves — assume a realistic adoption rate (the share of recommendations actually executed) and a realistic per-decision improvement. A business case that assumes 100% adoption and a perfect lift is the vendor deck you were trying to avoid.

Margin per decision. What is the gross-margin value of getting one decision right versus the status quo? A markdown taken at -20% instead of -45% on a given inventory value. A reallocation that captures a full-price sale instead of a stock-out. This is where your category margins, your average ticket, and your markdown ladders enter — your numbers, not benchmarks.

Multiply the three, sum across the four P&L lines, and subtract the fully loaded cost of the platform plus the internal change effort. The output is a range, not a point estimate, and it should be expressed as a fraction of EBIT so it can be compared against other capital uses. For a retailer doing several hundred million in revenue, the public benchmarks cluster around a 1–3 point EBIT improvement. But the only number worth committing to is the one this model produces on your volumes, with your adoption assumption and your margins.
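The three-term model translates directly into a small calculation. In this sketch, each line's lift is given as a conservative and an optimistic share of decisions actually improved (adoption times the share that was suboptimal), so the output is a range rather than a point. Every name and number here is hypothetical, and only three of the four lines are filled in; working capital is a balance-sheet effect and is left out of this example.

```python
from dataclasses import dataclass


@dataclass
class PnlLine:
    name: str
    decisions_per_year: int      # addressable decision volume, from your systems
    lift_low: float              # conservative share of decisions improved
    lift_high: float             # optimistic share of decisions improved
    margin_per_decision: float   # EUR gross-margin delta per improved decision


def business_case(lines: list[PnlLine], annual_cost: float, ebit: float) -> dict:
    """Value = volume x lift x margin, per line, as a (low, high) range.

    `annual_cost` is the fully loaded platform cost plus internal change effort.
    Per-line values are kept separate; the net total exists only so the case
    can be compared with other capital uses as a fraction of EBIT.
    """
    per_line = {
        line.name: (
            line.decisions_per_year * line.lift_low * line.margin_per_decision,
            line.decisions_per_year * line.lift_high * line.margin_per_decision,
        )
        for line in lines
    }
    net_low = sum(low for low, _ in per_line.values()) - annual_cost
    net_high = sum(high for _, high in per_line.values()) - annual_cost
    return {
        "per_line": per_line,
        "net_range": (net_low, net_high),
        "share_of_ebit": (net_low / ebit, net_high / ebit),
    }


# Hypothetical volumes, lifts and margins for three of the four lines.
lines = [
    PnlLine("markdown", 12_000, 0.10, 0.20, 150.0),
    PnlLine("carry", 4_000, 0.10, 0.20, 200.0),
    PnlLine("lost_sales", 20_000, 0.05, 0.10, 60.0),
]
case = business_case(lines, annual_cost=200_000, ebit=3_000_000)
for name, (low, high) in case["per_line"].items():
    print(f"{name}: EUR {low:,.0f} to {high:,.0f}")
print("net:", case["net_range"], "share of EBIT:", case["share_of_ebit"])
```

Keeping `per_line` in the result preserves the per-line view, since each line has its own time signature and attribution confidence.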

Two disciplines keep this honest. First, run it per line and never net them into one figure — markdown value and working-capital value have different time signatures and different attribution confidence. Second, write down the baseline before you start, because a business case with no recorded counterfactual cannot be settled at season end.

Time-to-value: why it lands in-season

The payback-in-year-two curve is the part of the vendor deck a CFO should question hardest, because for decision AI it is usually wrong in the conservative direction: the value arrives earlier than the curve shows. The unit of value is a single dated decision, so value accrues the first time a better decision is executed, not after a multi-year platform maturation.

The logic is structural. A markdown taken three weeks earlier banks its margin difference this season. A reallocation that captures a sale captures it this week. There is no model-maturation period during which the value is theoretical — the first correctly-timed decision is already worth its margin delta.

This is the opposite of an analytics platform whose value is deferred until enough dashboards are built and trusted. A decision that executes is worth its delta the moment it executes.

What to measure, so the value is bankable rather than asserted:

Decision adoption rate — the share of recommendations actually executed on the floor without manual rework. This is the lead indicator; value cannot exceed it.
Per-line margin delta against a recorded baseline — markdown depth versus prior season's curve, carrying cost versus prior weeks of cover, captured-versus-lost sales on re-localized stock.
Working-capital release — turns and days-of-inventory at the network level, read off the balance sheet quarter over quarter.
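A minimal scorecard for the first two measures might look like this. It assumes each recommendation is logged with whether it was executed without rework and a margin delta against the recorded baseline decision (both hypothetical fields). Only executed decisions count toward banked value, which is why adoption caps the result.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    line: str            # P&L line, e.g. "markdown", "carry", "lost_sales"
    executed: bool       # executed on the floor without manual rework
    margin_delta: float  # EUR versus the recorded baseline decision


def scorecard(recs: list[Recommendation]) -> dict:
    """Adoption rate plus banked margin per line; unexecuted advice banks nothing."""
    executed = [r for r in recs if r.executed]
    rate = len(executed) / len(recs) if recs else 0.0
    banked: dict[str, float] = {}
    for r in executed:
        banked[r.line] = banked.get(r.line, 0.0) + r.margin_delta
    return {"adoption_rate": rate, "banked_by_line": banked}


# Hypothetical week of recommendations.
week = [
    Recommendation("markdown", True, 300.0),
    Recommendation("markdown", False, 500.0),
    Recommendation("lost_sales", True, 40.0),
    Recommendation("carry", False, 120.0),
]
print(scorecard(week))
```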

And the honest part, which belongs in the business case explicitly: clean attribution is hard, and pretending otherwise is what discredits AI ROI claims. A season has weather, a competitor's promotion, a macro swing — all moving the same lines decision AI moves. The defensible answer is not a fabricated counterfactual; it is a holdout.

Run a comparable set of stores or categories without the new decision layer, then measure the delta against it. Accept that the attributable number is the controlled difference, not the gross movement. A CFO will trust a smaller number that survives a holdout far more than a large one that doesn't.
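One standard way to read a holdout is a difference-in-differences: the change in the stores running the decision layer minus the change in the comparable holdout over the same period. The article does not prescribe a specific estimator, so treat this as one reasonable choice; it assumes the two groups would have moved in parallel without the new decision layer. The gross-margin figures are hypothetical.

```python
def holdout_attribution(treated_before: float, treated_after: float,
                        holdout_before: float, holdout_after: float) -> float:
    """Difference-in-differences: change in the treated stores minus the change
    in the comparable holdout over the same period. Shocks that hit both groups
    (weather, a competitor's promotion, a macro swing) cancel out, provided the
    groups would otherwise have moved in parallel."""
    treated_change = treated_after - treated_before
    holdout_change = holdout_after - holdout_before
    return treated_change - holdout_change


# Hypothetical gross-margin rates (%) for treated and holdout store groups.
gross = 45.5 - 44.0
attributable = holdout_attribution(44.0, 45.5, 44.2, 44.8)
print(f"Gross movement: {gross:.1f} pts, attributable: {attributable:.1f} pts")
```

Here the treated stores moved 1.5 points, but only about 0.9 points survive the holdout: the smaller, defensible number.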

That trade — a more modest figure, defensibly attributed, landing in-season rather than a large figure promised over three years — is precisely the trade a finance function should want. It is also the one most vendor decks refuse to make.

The question to settle before you sign

Run any retail AI proposal through one filter before it reaches a steering committee: is this cost-saving AI or margin-generating AI, and which P&L line does it claim to move? If the answer is a process made faster with no headcount line behind it, the ROI is soft and you should treat it as such. If the answer is a decision changed (markdown, allocation, replenishment, return), then you can model it bottom-up on your own volumes. Measure it against a holdout, and bank what survives.

The retailers getting real return from AI are not the ones who bought the most ambitious deck. They are the ones who insisted on a named P&L line, a defensible per-decision margin, and an in-season measurement plan. They walked away from anything that couldn't produce all three.

For the questions your CFO peers ask first — TCO ranges, payback windows, working-capital release, board framing — see the CFO FAQ.

Want a business case built on your P&L, not a vendor's?

At Solya, we run a 30-minute diagnostic with retail finance and operations leaders. We model the addressable decision volume, per-decision margin, and realistic time-to-value on your own numbers, not a blended ROI percentage.

You'll walk away with:

A per-line view of where decision AI moves your P&L (markdown, carry, lost sales, working capital)
A bottom-up business-case skeleton built on your decision volumes and margins
An honest attribution plan, including what a holdout can and can't prove

Kevin Didelot, Co-founder & CTO, Solya

