Leadership · 2026-05-06

Integrating a retail decision engine without the rot

A decision engine reads from and writes back into your core systems. Here's the integration architecture that survives production — and how to prove it.

Kevin Didelot · 13 min read

You read the deck and the part that lands is not the uplift chart. It is the architecture slide that doesn't exist yet. Another platform promising to fix retail, and you are the one who will be living with it. The point-to-point integrations that multiply quietly, the credentials sprayed across systems, the connector you patch long after the champion who bought it moved teams.

You are not the gatekeeper who says no. You are the person accountable when the decision layer becomes a fragile web nobody can safely change. That accountability is the right lens — and it is the one this article is written through.

Most platforms a retailer evaluates are read-only consumers. A BI tool pulls from the warehouse and renders dashboards. A forecasting model reads history and emits a number into a file. If they break, a report is late — annoying, not load-bearing. A decision engine is a different animal, and the integration risk profile is different in kind, not degree.

This article lays out where that risk actually lives, the architecture that contains it, and the security posture it demands. It also shows how to read a vendor's integration maturity before you commit your team to maintaining their choices.

Why a decision engine is a harder integration than most

A decision engine sits in the middle of the operational stack and touches almost all of it. To decide what to replenish, mark down, allocate, or transfer, it reads from many systems at once: the ERP, the WMS, the POS, the e-commerce platform, the pricing engine, and the planning tool.

Then comes the part that separates it from every read-only system you have integrated before. It has to write the decision back into the system that executes it. A markdown that can't post to the pricing engine is a spreadsheet. An allocation that can't write to the WMS is a suggestion nobody acts on.

That bidirectional requirement is where naive integrations rot. Reading is forgiving; you can retry, cache, tolerate a stale feed. Writing is not. A write that fires twice creates a double markdown. A write that half-succeeds leaves the pricing engine and the WMS disagreeing about reality.

This is the precise dead end we describe in how retail data becomes useless without a decision layer. The decision exists but can't reach the system that would act on it. It is also why a WMS and a decision platform are not substitutes — the WMS owns execution, the decision layer owns the choice.

Then add cadence. This is not a nightly batch. Retail decisions are time-sensitive: a transfer decided Tuesday is worthless if it lands Friday.

So the engine runs near-real-time, on tens of thousands of SKU/store pairs. It runs against systems that each have their own latency, their own data model, and their own idea of what a "product" is. The integration has to reconcile those differences continuously, not in a one-off mapping done during onboarding.

And the source systems disagree with each other. The ERP's stock figure and the WMS's stock figure are rarely identical at any given second. An integration architecture that assumes clean, consistent, agreeing inputs will work in the POC and fall over in week three of production. The hard part was never computing the decision. It is moving the right inputs in and the right decisions out, every hour, across systems that were never designed to agree.
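As one illustration of not assuming agreeing inputs, the sketch below compares two stock figures for the same SKU and store and flags the pair for review when they diverge beyond a tolerance. The function name, the tolerance value, and the policy of planning on the lower figure are assumptions made for this example, not a rule this article prescribes.

```python
def reconcile_stock(erp_qty: int, wms_qty: int, tolerance: int = 2) -> dict:
    """Compare two stock figures for the same SKU/store pair.

    Assumed policy (illustrative only): plan on the lower figure and
    flag the pair for review when the gap exceeds the tolerance.
    """
    gap = abs(erp_qty - wms_qty)
    return {
        "usable_qty": min(erp_qty, wms_qty),
        "gap": gap,
        "needs_review": gap > tolerance,
    }


print(reconcile_stock(erp_qty=10, wms_qty=9))
print(reconcile_stock(erp_qty=10, wms_qty=3))
```

The point is less the specific policy than the fact that disagreement is an expected input the integration handles explicitly, rather than an exception it never planned for.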

The integration architecture that doesn't rot

Point-to-point is how the rot starts. One bespoke connector per system pair, each with its own retry logic, its own field mapping, its own failure mode. Connect six systems pairwise and you are maintaining up to fifteen brittle bridges, and a schema change in any one system can break every bridge that touches it. The architecture that survives production is built on four principles that you should be able to find in any platform worth signing.
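The arithmetic behind the fifteen bridges can be checked in a few lines. With one connector per pair, n systems need n(n-1)/2 connectors, whereas mapping each system once into a shared model needs only n. The function names are illustrative.

```python
def point_to_point_connectors(n_systems: int) -> int:
    """One bespoke connector per unordered pair of systems."""
    return n_systems * (n_systems - 1) // 2


def shared_model_mappings(n_systems: int) -> int:
    """One mapping per system into a single shared model."""
    return n_systems


for n in (3, 6, 10):
    print(n, point_to_point_connectors(n), shared_model_mappings(n))
```

The pairwise count grows quadratically while the shared-model count grows linearly, which is why the gap widens with every system added.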

Event-driven and API-first

The engine should consume changes as events and act on them, rather than polling every system on a timer and diffing the result. When stock moves, an event fires; the engine reacts. Event-driven design decouples the systems — the ERP doesn't need to know the decision engine exists, it just emits its events as it always did. And API-first means every read and every write goes through a versioned, documented interface, not a direct database tap. A vendor that integrates by reaching into your ERP's tables has already told you how the next five years will go.
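A minimal in-memory sketch of the event-driven shape described above. The `EventBus`, `StockMoved`, and `DecisionEngine` names, and the reorder threshold, are hypothetical; a production setup would use a real message broker and a versioned event schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StockMoved:
    """Event emitted by a source system when stock changes (hypothetical schema)."""
    sku: str
    store: str
    on_hand: int


class EventBus:
    """Tiny in-process stand-in for a message broker."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event):
        # The publisher does not know who is listening.
        for handler in self._handlers[type(event)]:
            handler(event)


class DecisionEngine:
    REORDER_POINT = 5  # illustrative threshold, not a recommendation

    def __init__(self):
        self.proposed = []

    def on_stock_moved(self, event: StockMoved) -> None:
        # React to the change itself instead of polling and diffing.
        if event.on_hand < self.REORDER_POINT:
            self.proposed.append(("replenish", event.sku, event.store))


bus = EventBus()
engine = DecisionEngine()
bus.subscribe(StockMoved, engine.on_stock_moved)

bus.publish(StockMoved(sku="SKU-1", store="store-12", on_hand=3))
bus.publish(StockMoved(sku="SKU-2", store="store-12", on_hand=40))
print(engine.proposed)
```

The publishing side never references the engine, which is the decoupling the paragraph describes: the source system emits its events and any number of consumers can subscribe.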

The semantic layer as the integration contract

This is the load-bearing one. Every source system has its own vocabulary. A SKU here is an article there, a store is a site is a location, a "season" means three different things in three places. The semantic layer is the single agreed model of your business that every integration maps into and out of. It is the contract that says what a product is, what a store is, what available-to-sell means here.

Without it, every connector reinvents the mapping, and the mappings drift apart. With it, you change the mapping in one place when a system changes, and every decision stays consistent. We unpack why this matters at the decision level in what a decision layer is in retail.
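To make the contract idea concrete, here is a small sketch in which two source systems with different vocabularies map into one canonical record. The `StockPosition` model and every field name in `FIELD_MAPS` are invented for illustration; a real semantic layer would also cover units, hierarchies, and definitions such as what counts as available-to-sell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StockPosition:
    """Canonical record every integration maps into (hypothetical model)."""
    product_id: str
    location_id: str
    available_to_sell: int


# One mapping per source system, all targeting the same canonical fields.
# The source field names are assumptions made for this example.
FIELD_MAPS = {
    "erp": {"product_id": "article_no", "location_id": "site", "available_to_sell": "free_stock"},
    "wms": {"product_id": "sku", "location_id": "location", "available_to_sell": "pickable_qty"},
}


def to_canonical(system: str, record: dict) -> StockPosition:
    """Translate a source record into the shared model."""
    mapping = FIELD_MAPS[system]
    return StockPosition(
        product_id=str(record[mapping["product_id"]]),
        location_id=str(record[mapping["location_id"]]),
        available_to_sell=int(record[mapping["available_to_sell"]]),
    )


erp_row = {"article_no": "A-100", "site": "S12", "free_stock": "7"}
wms_row = {"sku": "A-100", "location": "S12", "pickable_qty": 7}

print(to_canonical("erp", erp_row))
print(to_canonical("wms", wms_row))
```

If the ERP renames a field, only the `"erp"` entry changes; everything downstream keeps reading `StockPosition`.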

Idempotency on every write

Networks fail mid-request. Messages get redelivered. The only safe assumption is that any write might arrive more than once. An idempotent write can be replayed safely — the same markdown decision posted twice produces one markdown, not two.

This is implemented with decision keys: each decision carries a stable identifier, and the execution side recognizes a repeat and ignores it. If a vendor cannot explain their idempotency model in one sentence, assume they don't have one, and assume you will be reconciling duplicates manually.
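A minimal sketch of the decision-key pattern, under stated assumptions: the key here is derived by hashing the decision content, and `PricingEngineStub` is a stand-in for the execution side, not a real pricing API. Assigning an identifier when the decision is made would work equally well, as long as it stays stable across retries.

```python
import hashlib
import json


def decision_key(decision: dict) -> str:
    """Stable identifier: the same decision content always yields the same key."""
    payload = json.dumps(decision, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class PricingEngineStub:
    """Hypothetical execution side that remembers which keys it has applied."""

    def __init__(self):
        self.prices = {"SKU-1": 100.0}
        self.applied_keys = set()

    def post_markdown(self, key: str, decision: dict) -> bool:
        if key in self.applied_keys:
            return False  # repeat delivery: recognized and ignored
        sku = decision["sku"]
        self.prices[sku] = round(self.prices[sku] * (1 - decision["pct"] / 100), 2)
        self.applied_keys.add(key)
        return True


decision = {"type": "markdown", "sku": "SKU-1", "pct": 20, "decision_date": "2026-05-06"}
key = decision_key(decision)

pricing = PricingEngineStub()
first = pricing.post_markdown(key, decision)
second = pricing.post_markdown(key, decision)  # simulated redelivery
print(first, second, pricing.prices["SKU-1"])
```

Without the key check, the replay would apply the 20% markdown a second time. In a real system the set of applied keys has to live in durable storage on the execution side, so that it survives restarts.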

Explicit partial-failure handling

The interesting question is not what happens when everything works. It is what happens when the engine writes a transfer to the WMS, the WMS accepts it, and the write-back to the ERP fails. You now have two systems in different states, and silence is the worst possible response.

A durable architecture treats every cross-system decision as a transaction with a defined outcome. It either completes everywhere, rolls back cleanly, or surfaces a reconciliation task a human can resolve — never a silent divergence. Ask any vendor to walk you through that exact scenario. The quality of the answer tells you whether they have run in real production or only in a demo.
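The three defined outcomes can be sketched as a small compensating-transaction routine. `SystemStub`, `execute`, and the shape of the reconciliation task are hypothetical, and the sketch assumes a failed call means the write did not land; a real client would also have to handle ambiguous timeouts by querying the target system.

```python
from enum import Enum


class Outcome(Enum):
    COMPLETED = "completed"
    ROLLED_BACK = "rolled_back"
    NEEDS_RECONCILIATION = "needs_reconciliation"


class SystemStub:
    """Stand-in for a WMS or ERP client; real clients would call an API."""

    def __init__(self, name, fail_on_apply=False, fail_on_undo=False):
        self.name = name
        self.fail_on_apply = fail_on_apply
        self.fail_on_undo = fail_on_undo
        self.applied = set()

    def apply(self, decision_id):
        if self.fail_on_apply:
            raise ConnectionError(f"{self.name} rejected {decision_id}")
        self.applied.add(decision_id)

    def undo(self, decision_id):
        if self.fail_on_undo:
            raise ConnectionError(f"{self.name} could not undo {decision_id}")
        self.applied.discard(decision_id)


def execute(decision_id, systems, reconciliation_queue):
    """Apply a decision to each system in order, with a defined outcome."""
    done = []
    for system in systems:
        try:
            system.apply(decision_id)
            done.append(system)
        except ConnectionError:
            # Compensate the systems that already accepted the write.
            stuck = []
            for previous in reversed(done):
                try:
                    previous.undo(decision_id)
                except ConnectionError:
                    stuck.append(previous.name)
            if not stuck:
                return Outcome.ROLLED_BACK
            # Never diverge silently: hand the case to a human.
            reconciliation_queue.append(
                {"decision_id": decision_id, "failed_at": system.name, "still_applied_in": stuck}
            )
            return Outcome.NEEDS_RECONCILIATION
    return Outcome.COMPLETED


queue = []
wms = SystemStub("wms")
erp = SystemStub("erp", fail_on_apply=True)
print(execute("transfer-42", [wms, erp], queue), wms.applied, queue)
```

In this run the WMS accepts the transfer, the ERP write fails, and the WMS write is undone, so both systems end in the same state. If the undo itself fails, the routine files a reconciliation task instead of returning quietly.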

Security and data governance posture

A read-only tool needs read access. A decision engine needs write access to the systems that run your stores, and that changes the security conversation entirely. The questions are not optional, and they are yours to answer to your CISO.

Least-privilege, scoped per system. The engine should authenticate to each system with the narrowest possible permissions — read on the feeds it consumes, write only on the specific objects it is allowed to change. The pricing engine credential should be able to post a markdown and nothing else. No shared super-user account spanning every system; one over-privileged credential is the breach that takes the whole stack down. Scoped, rotatable, per-system credentials are the floor, not a premium feature.
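A sketch of what per-system scoping looks like when expressed as data. The scope strings, system names, and the `authorize` helper are invented for illustration; in practice the enforcement lives in each target system's own access control, and a check like this on the engine side is only a second line of defense.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """One credential per system, carrying only the scopes it needs."""
    system: str
    scopes: frozenset


# Hypothetical scope names; note there is no shared super-user entry.
CREDENTIALS = {
    "pricing": Credential("pricing", frozenset({"markdown:write"})),
    "wms": Credential("wms", frozenset({"stock:read", "transfer:write"})),
    "erp": Credential("erp", frozenset({"stock:read"})),
}


def authorize(system: str, scope: str) -> Credential:
    """Return the credential for a system only if it covers the requested scope."""
    credential = CREDENTIALS.get(system)
    if credential is None or scope not in credential.scopes:
        raise PermissionError(f"{scope!r} is not granted on {system!r}")
    return credential


print(authorize("pricing", "markdown:write").scopes)

try:
    authorize("pricing", "price_list:delete")
except PermissionError as error:
    print("denied:", error)
```

The pricing credential can post a markdown and nothing else, which matches the floor described above.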

Know where the data lives. You need a clear answer to where your data physically resides and where it is processed — region, residency, and whether it ever leaves your governance boundary. For a European retailer this is a GDPR question with a regulator attached, not a preference. A platform that cannot tell you precisely where a given dataset is stored and processed has not thought about the thing your legal team will ask about first.

An audit trail on every decision. Because the engine writes back into execution systems, every write it makes must be logged: what decision, on what inputs, under which business rule, at what time, executed where. This is not a compliance checkbox — it is how you debug, prove correctness, and stay reversible.

When a store manager asks why the price changed, the answer has to be retrievable in seconds. When something goes wrong, the trail is the difference between a five-minute rollback and a week of forensic guesswork. An engine that decides but does not record is an engine you cannot trust with write access.
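The fields listed above map naturally onto a record type. The following sketch keeps records in memory for brevity; `AuditRecord`, `AuditLog`, and the sample values are hypothetical, and a real trail would be written to append-only, durable storage.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    decision_id: str
    decision: dict     # what was decided
    inputs: dict       # the inputs it was decided on
    rule: str          # the business rule that applied
    executed_in: str   # the system the write went to
    decided_at: str    # ISO 8601 timestamp, UTC


class AuditLog:
    """In-memory stand-in for an append-only audit store."""

    def __init__(self):
        self._records = []

    def record(self, entry: AuditRecord) -> None:
        self._records.append(entry)

    def why(self, decision_id: str) -> list:
        """Answer 'why did this change?' for a given decision."""
        return [asdict(r) for r in self._records if r.decision_id == decision_id]

    def export(self) -> str:
        """One JSON document per line, so the full trail can be handed over."""
        return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in self._records)


log = AuditLog()
log.record(AuditRecord(
    decision_id="md-001",
    decision={"type": "markdown", "sku": "SKU-1", "pct": 20},
    inputs={"weeks_of_cover": 14, "sell_through": 0.31},  # made-up sample inputs
    rule="end-of-season-clearance",
    executed_in="pricing-engine",
    decided_at=datetime.now(timezone.utc).isoformat(),
))

print(log.why("md-001"))
```

Keying the record on the decision identifier is what makes the store manager's question answerable with a single lookup rather than a search across system logs.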

How to evaluate a vendor's integration maturity

A platform built for integration and a platform that bolted it on after the fact demo identically. The difference only shows in production — unless you know which questions force it into the open before you sign. These separate the two.

  • "Walk me through a write that half-fails." Built-for vendors have a crisp answer about transactions, rollback, and reconciliation. Bolted-on vendors describe the happy path and go quiet on the failure.
  • "Show me your semantic model and how a new system maps into it." A real semantic layer is a thing they can show you. If the answer is "we map fields per integration," every connector is a bespoke liability.
  • "What credentials does the engine need in each system, and at what scope?" The mature answer is narrow and per-system. The worrying answer asks for broad admin access "to make integration simpler."
  • "How do you handle a schema change in our ERP?" Built-for platforms isolate the change to one mapping. Bolted-on ones treat your schema change as a support ticket and a connector rebuild.
  • "What does your audit trail capture, and can I export it?" A platform that writes decisions should hand you the full record without effort. Hesitation here is a tell.

Two more reads. First, look at whether integration is the vendor's product or their professional-services line. A platform built for integration ships connectors and a documented API as product; one that bolted it on sells you a six-month integration project per system.

Second, ask what happens if you leave. A platform built on open, API-first integration is one you can disconnect. A web of point-to-point connectors into your core systems is the lock-in you feared. That is the reason this is fundamentally a build-versus-buy question about who owns the industrialization, not just a feature comparison.

The closing read

The fear you started with is legitimate, and most platforms earn it. But the failure mode is not "decision engines are inherently fragile." It is "an engine integrated without architecture rots, and one integrated with it does not."

Event-driven and API-first, a semantic layer as the contract, idempotency on writes, explicit partial-failure handling, least-privilege security, and a full audit trail — that is the specification. Hold a vendor to it and you are no longer the gatekeeper saying no. You are the architect who made the decision layer safe and durable, which was always the more interesting job.

For the integration and security questions IT directors raise during decision-engine rollout, see our IT Director FAQ.


Pressure-test a decision engine against your stack

At Solya, we offer IT and data leaders a 30-minute architecture review to map a decision engine against your real stack — your systems, your latency, your governance constraints. No generic pitch: a concrete read on what integration would actually take, and where the risk sits.

You'll walk away with:

  • A view of which of your systems are read-only versus write-back, and the integration shape each implies
  • The security and data-residency questions to put to any vendor before you sign
  • An honest read on the integration effort and the lock-in risk for your specific stack

Kevin Didelot, Co-founder & CTO, Solya

