Comparisons · 2026-05-13

How to choose a retail decision platform: a guide

You've decided to buy a retail decision platform. The feature checklist won't tell you which one survives production. Here are the criteria that will.

Kevin Didelot · 11 min read

You've made the decision to buy rather than build. The forecasting is good enough, the data lake is in place, and the gap that remains is operational: decisions still get made by hand, late, and inconsistently across stores. So you start collecting vendor demos and comparison sheets — and almost immediately, the evaluation drifts toward a feature checklist.

The feature checklist is the wrong lens. It treats a decision platform as a sum of capabilities you can tick off. What actually determines value is whether those capabilities combine into executed decisions under real production conditions. Two platforms can match line-for-line on features and behave completely differently at scale — one drives adoption, the other produces recommendations nobody applies. This guide ranks the capability criteria that predict that difference, and gives you a way to weight them for your own context.

The capability criteria that actually matter

There are six. Each is a capability, not a feature — meaning you evaluate it by watching the platform behave on your data, not by reading a spec sheet. Together they explain almost every gap between a platform that ships value and one that joins the graveyard of stalled data projects.

1. Business-rule integration

Every retailer runs on rules that never made it into a system: minimum facings, supplier MOQs, store-cluster logic, the markdown cadence your buyers defend by hand. A platform either embeds these rules at the core of how it decides, or it applies them as a filter after the fact. And a filtered recommendation is a recommendation that was already wrong before it got trimmed.

This is the criterion that quietly sinks most deployments. When 80% of retail business rules live in operators' heads rather than in the system, a platform that can't ingest them will produce output your teams immediately distrust. Ask how the rules get in, who maintains them, and what happens when two rules conflict.
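The gap between embedding a rule and filtering on it can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's engine: the function names, the supplier MOQ of 48, the pack size of 12, and the per-unit costs are all hypothetical.

```python
def feasible_quantities(moq, pack_size, max_qty):
    """Order nothing, or any whole-pack quantity at or above the supplier MOQ."""
    first = -(-moq // pack_size) * pack_size  # smallest pack multiple >= MOQ
    return [0] + list(range(first, max_qty + 1, pack_size))


def cost(qty, demand, understock_cost, overstock_cost):
    """Hypothetical cost model: penalize each unit short and each unit in excess."""
    short = max(demand - qty, 0)
    excess = max(qty - demand, 0)
    return short * understock_cost + excess * overstock_cost


def decide_filtered(demand, moq, pack_size):
    """Rules as a filter: take the unconstrained answer, reject it if it breaks a rule."""
    qty = demand
    if qty < moq or qty % pack_size != 0:
        return 0
    return qty


def decide_embedded(demand, moq, pack_size, understock_cost, overstock_cost):
    """Rules at the core: only rule-compliant quantities are ever considered."""
    max_qty = max(moq, demand) + pack_size
    candidates = feasible_quantities(moq, pack_size, max_qty)
    return min(candidates, key=lambda q: cost(q, demand, understock_cost, overstock_cost))


filtered = decide_filtered(demand=37, moq=48, pack_size=12)
embedded = decide_embedded(demand=37, moq=48, pack_size=12,
                           understock_cost=5, overstock_cost=1)
print(filtered, embedded)  # 0 48
```

With these numbers the filter rejects the order outright and the store runs short, while the embedded version orders the 48-unit minimum, because eleven surplus units cost less than 37 missed sales.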

2. Decision-not-recommendation output

A recommendation says "demand for SKU 4471 will rise 12% next week." A decision says "move 40 units from store A to store C by Thursday." The first is a forecast wearing an action's clothing; the second is something an operator can execute without re-deciding anything. The distinction between forecasting and deciding is the single most under-evaluated capability in this category.

Watch for output that still requires human arbitration — quantities to confirm, trade-offs to resolve, a "review and adjust" step. That arbitration is exactly the bottleneck you were buying the platform to remove. A real decision is specific, constrained, and ready to act on.

3. Native execution

A decision that can't reach the system of record is a slide, not a decision. The platform must propagate validated decisions back into your ERP, WMS, pricing engine, or e-commerce stack — bidirectionally, in continuous flow, robust to partial failures. Anything that produces a file to re-key, or an email for someone to action, has pushed the hard part back onto your teams.

This is where many evaluations are too generous. "Integrates with SAP" can mean a nightly export or a live write-back — and the difference is the entire industrialization cost. Make the vendor show the write path on your systems, not a logo wall.
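What "robust to partial failures" means on a write path can be sketched as follows. This is a minimal Python illustration, not a real connector: `WriteBack`, `propagate`, and the two stand-in senders are hypothetical names, and the sketch assumes each target system can use the decision ID to recognize a write it has already applied.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class WriteBack:
    decision_id: str  # assumed to double as an idempotency key on the target side
    payload: dict
    status: Dict[str, str] = field(default_factory=dict)  # target name -> "ok" or "failed"


def propagate(wb: WriteBack, targets: Dict[str, Callable[[str, dict], None]]) -> bool:
    """Send the decision to every target that has not confirmed it yet."""
    for name, send in targets.items():
        if wb.status.get(name) == "ok":
            continue  # already applied there; never write the same decision twice
        try:
            send(wb.decision_id, wb.payload)
            wb.status[name] = "ok"
        except Exception:  # broad on purpose in a sketch; narrow this in real code
            wb.status[name] = "failed"
    return all(wb.status.get(name) == "ok" for name in targets)


# Stand-in connectors: the ERP accepts the write, the WMS fails once and then recovers.
calls = {"erp": 0, "wms": 0}


def send_to_erp(decision_id, payload):
    calls["erp"] += 1


def send_to_wms(decision_id, payload):
    calls["wms"] += 1
    if calls["wms"] == 1:
        raise ConnectionError("WMS unavailable")


wb = WriteBack("transfer-0001", {"sku": "4471", "from": "A", "to": "C", "units": 40})
targets = {"erp": send_to_erp, "wms": send_to_wms}

first_pass = propagate(wb, targets)   # ERP succeeds, WMS fails
second_pass = propagate(wb, targets)  # only the WMS is retried
```

The point of tracking status per target is that a retry never re-sends a decision to a system that already accepted it. A nightly export has no equivalent of this, which is why it is worth asking the vendor to show how a half-applied decision is detected and completed.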

4. Adoption design

The platform that wins is rarely the most accurate one — it's the one operators trust enough to act on without overriding. Adoption is a design property: explainability of each decision, the ability to see why a rule fired, a clean override path that feeds back rather than silently discarding the signal. A platform that treats the operator as a passive recipient will see its decisions ignored at the same 70% rate that kills POCs at scale.

Evaluate this by asking who the daily user is and watching the interface they'll actually live in. Adoption rate, not model accuracy, is the KPI that predicts value.

5. Time-to-value

A platform that needs nine months of integration before the first decision ships is a platform whose ROI you'll be defending in a steering committee long before it arrives. Time-to-value is a capability of the architecture, not a promise on a slide. It depends on how the platform connects, how rules are configured, and whether the first scope can go live without rebuilding for the next one. Ask for the path to the first executed decision in weeks, on a real scope, with your data.

6. The learning loop

A decision platform that doesn't measure its own outcomes is flying blind. The sixth criterion is whether the platform closes the loop — capturing what was decided, what was executed, and what resulted, then feeding that back into the next decision. Without it you get a static engine that degrades as your assortment, network, and season shift. With it you get a system that compounds, which is the whole reason to own a platform rather than run a one-off model.
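A closed loop starts with a record of the three things named above: what was decided, what was executed, and what resulted. The Python sketch below is one hypothetical way to capture them and feed them forward. The `Outcome` fields and the sell-through scaling rule are illustrative assumptions, not a description of how any particular platform learns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outcome:
    decided_units: int   # what the platform decided
    executed_units: int  # what actually reached the system of record
    units_sold: int      # what resulted


def execution_rate(outcomes: List[Outcome]) -> float:
    """Share of decided units that were actually executed."""
    decided = sum(o.decided_units for o in outcomes)
    return sum(o.executed_units for o in outcomes) / decided if decided else 0.0


def sell_through(outcomes: List[Outcome]) -> float:
    """Share of executed units that sold."""
    executed = sum(o.executed_units for o in outcomes)
    return sum(o.units_sold for o in outcomes) / executed if executed else 0.0


def next_quantity(base_qty: int, outcomes: List[Outcome]) -> int:
    """Hypothetical feedback rule: scale the next decision by observed sell-through."""
    if not any(o.executed_units for o in outcomes):
        return base_qty  # nothing executed yet, so nothing to learn from
    return round(base_qty * sell_through(outcomes))


history = [Outcome(40, 40, 30), Outcome(20, 10, 10)]
print(round(execution_rate(history), 2))  # 0.83
print(round(sell_through(history), 2))    # 0.8
print(next_quantity(50, history))         # 40
```

Keeping the execution rate separate from the sell-through matters: a low execution rate is an adoption or integration problem, while a low sell-through is a decision-quality problem, and a platform that only stores the final result cannot tell them apart.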

How to weight them for your context

The six criteria are universal, but their weights are not. The same scorecard, weighted for a 15-store regional chain and a 600-store international group, points at different platforms. Three variables drive the weighting.

Network size. Below roughly 20 stores, a centralized decision system still beats spreadsheets, but adoption design and time-to-value dominate — you have fewer operators and less tolerance for a long integration. Above that threshold, where decisions fragment across stores and regions, native execution and the learning loop become decisive because manual propagation simply stops scaling.

Category mix. Fashion and seasonal assortments live and die on markdown timing and end-of-season risk, so decision-not-recommendation output and business-rule integration carry the most weight. Grocery and high-rotation categories lean on continuous execution and the learning loop, where the cost of a stale decision is paid daily rather than per season.

System landscape. If your stack is consolidated on one ERP with clean APIs, native execution is a lower-risk criterion and you can weight adoption higher. If you run a patchwork of legacy and acquired systems, native execution and time-to-value jump to the top — because that's precisely where build-from-scratch integration estimates double.

The method is simple: assign each criterion a weight from 1 to 5 for your context, score each platform from 1 to 5 on each criterion, multiply each weight by its score, and sum the products. The weighting conversation is more valuable than the final number. It forces your data and operations leaders to agree on what "decisive" means before a vendor frames it for you.
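The arithmetic is small enough to sketch in Python. The criterion keys follow the six criteria in this guide; the weights and the two platforms' scores are made-up numbers for illustration only.

```python
CRITERIA = [
    "business_rule_integration",
    "decision_not_recommendation",
    "native_execution",
    "adoption_design",
    "time_to_value",
    "learning_loop",
]


def weighted_total(weights, scores):
    """Multiply each 1-5 weight by the matching 1-5 score and sum the products."""
    for name in CRITERIA:
        for value in (weights[name], scores[name]):
            if not 1 <= value <= 5:
                raise ValueError(f"{name}: weights and scores must be between 1 and 5")
    return sum(weights[name] * scores[name] for name in CRITERIA)


# Hypothetical weighting and two hypothetical platforms.
weights = dict(zip(CRITERIA, [5, 4, 5, 3, 3, 2]))
platform_a = dict(zip(CRITERIA, [4, 4, 2, 5, 4, 3]))
platform_b = dict(zip(CRITERIA, [4, 3, 5, 3, 3, 3]))

print(weighted_total(weights, platform_a))  # 79
print(weighted_total(weights, platform_b))  # 81
```

The two totals land within two points of each other even though the platforms differ sharply on native execution, which is a good reason to read the per-criterion products and not only the sum.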

Red flags

Failure is usually visible in the demo, if you know what to watch for. These are the signals that predict a stalled deployment more reliably than any reference call.

  • The demo shows recommendations, never executed decisions. You see charts, scores, and suggestions — but never an action propagated into a system. The execution path is the part that's hard to fake, which is exactly why it gets skipped.
  • Business rules are described as a "configuration step" handled later. If rule integration is post-sale work scoped vaguely, the rules will arrive as a filter, and you'll spend the first season fighting rejected output.
  • Accuracy is the headline metric. A vendor leading with RMSE or forecast precision is selling a model, not a decision platform. Accuracy is necessary and nowhere near sufficient.
  • No override-and-learn path. If the operator can only accept or ignore — with no way to correct a decision and have the system absorb the correction — adoption will erode and the loop never closes.
  • The integration estimate is a single confident number. Real integration on a heterogeneous retail stack carries variance. A vendor who won't talk through failure modes hasn't done it at scale.
  • Every answer is "yes, we can do that." A platform with a point of view says no to some things. Uniform agreement means the capability is configurable in principle and proven nowhere.

A scorecard you can reuse

Take this into your evaluation. Weight each criterion 1–5 for your context, score each platform 1–5 on observed behavior (not claims), and compare the weighted totals.

| Criterion | What "5" looks like | Your weight | Platform score |
|---|---|---|---|
| Business-rule integration | Rules embedded at the core; conflicts resolved by the engine | _ | _ |
| Decision-not-recommendation | Output is specific and executable, no human arbitration | _ | _ |
| Native execution | Bidirectional write-back to your systems in continuous flow | _ | _ |
| Adoption design | Explainable, override-and-learn, built for the daily operator | _ | _ |
| Time-to-value | First executed decision in weeks, on a real scope | _ | _ |
| Learning loop | Decided / executed / resulted captured and fed back | _ | _ |

Two rules for using it honestly. Score on what you saw the platform do on your data, not on what the deck claims it can do. And if a criterion you weighted 5 scores a 2, no total elsewhere should rescue it — a decisive weakness is decisive, not averageable.
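The second rule can be made mechanical. The Python sketch below flags any criterion you weighted as decisive that scored at or below a floor. The function name, the example numbers, and the choice to treat "2 or lower" as the floor are assumptions that generalize the single case given above.

```python
def decisive_weaknesses(weights, scores, decisive_weight=5, floor=2):
    """Return criteria weighted as decisive whose observed score is at or below the floor."""
    return sorted(
        name for name, weight in weights.items()
        if weight >= decisive_weight and scores[name] <= floor
    )


# Hypothetical excerpt of a filled-in scorecard.
weights = {"business_rule_integration": 5, "native_execution": 5, "adoption_design": 3}
scores = {"business_rule_integration": 4, "native_execution": 2, "adoption_design": 5}

blockers = decisive_weaknesses(weights, scores)
print(blockers)  # ['native_execution']
```

Run this check before comparing totals. A non-empty list means the platform has a weakness on something you declared decisive, and a strong score on adoption or time-to-value should not be allowed to offset it.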

The Solya angle

This scorecard exists because the market still sells decision platforms as feature lists, while value is decided on capabilities. Solya was built around the six criteria directly:

  • Business rules embedded in the engine rather than filtered after the fact
  • Decisions specific enough to execute, not recommendations to arbitrate
  • Native bidirectional connection to your operational systems
  • Adoption tracked as the primary KPI
  • A first scope live in weeks, on real data
  • A closed loop that compounds with every season

It's the architectural conclusion of the criteria above. Not because they were written to fit it, but because a platform that ignores any one of them stalls in exactly the ways this guide describes.

Before your next demo

Ask yourself one question before the next vendor call: which of the six criteria is decisive for your network, and have you weighted it before the vendor frames the conversation? The evaluation that starts from your weighting survives the demo. The one that starts from the vendor's feature sheet ends in a stalled deployment six months later — and a scorecard nobody filled in.

For a category-level reading of the vendor field before you score individual platforms, see the 2026 AI inventory software landscape — three archetypes, seven criteria, and where each fits.



Kevin Didelot, Co-founder & CTO, Solya
