ReTechPrime

Markdown Optimisation with Reinforcement Learning: A Retailer’s Playbook

We trained an RL agent on 3 seasons of clearance data. Here's what it learned — and how you can replicate this approach with Oracle Retail Planning Cloud.

Deepa Sharma

ML Engineering Lead

Clearance markdown pricing is one of the most critical decisions in retail, and one that is still largely driven by intuition.

Most retailers rely on fixed discount strategies:
20% in week one, 30% in week two, 40% in week three.

While simple, this approach often leaves significant margin unrealized.

Reinforcement learning introduces a smarter alternative. Instead of fixed rules, an RL agent learns a dynamic pricing strategy based on real-time conditions such as inventory levels, demand patterns, store performance, and time remaining in the clearance cycle.

The goal is simple:
maximize sell-through while preserving margin.
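That objective can be sketched as a per-week reward function for the agent. The exact reward used in the article is not published, so the shape below, weekly margin plus a terminal adjustment that values leftover stock at salvage, is an illustrative assumption:

```python
def markdown_reward(units_sold, price, unit_cost,
                    leftover_units, is_final_week, salvage_value=0.0):
    """Reward for one week of a clearance episode.

    Margin earned this week, plus (in the final week) an adjustment
    that values unsold stock at salvage, penalizing poor sell-through.
    The salvage term and its weighting are assumptions, not the
    article's actual reward.
    """
    margin = units_sold * (price - unit_cost)
    if is_final_week:
        margin += leftover_units * (salvage_value - unit_cost)
    return margin
```

Because leftover units at season end are costed at salvage value, the agent is pushed toward the sell-through/margin trade-off rather than margin alone.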



Dataset Used:
3 seasons of clearance data
180 stores
14,000+ SKUs
~2.4 million item-store observations

The model was trained using reinforcement learning techniques on cloud infrastructure and integrated into retail planning systems.


What the Model Learned

The model quickly diverged from traditional pricing strategies.

Instead of applying uniform discounts:

  • It held prices longer for products with stable demand
  • It applied aggressive discounts earlier for trend-driven items
  • It adapted pricing strategies based on store location and customer behavior

This resulted in a more nuanced and effective pricing strategy.


Performance Improvements

  • Increased margin recovery
  • Higher sell-through rates
  • Reduced leftover inventory

The biggest takeaway:
Different products and stores require different pricing strategies — a one-size-fits-all approach does not work.


Integration with Planning Systems

The model generates weekly pricing recommendations for each product and store.

These recommendations can be:

  • Reviewed by planners
  • Adjusted if needed
  • Applied through existing planning workflows

This ensures a balance between automation and human control.
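One lightweight way to implement that balance is an override layer on top of the model's output. The record schema and field names below are illustrative assumptions, not the actual planning-system interface:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Recommendation:
    """One weekly markdown recommendation for an item-store pair."""
    sku: str
    store: str
    suggested_discount: float            # e.g. 0.30 means 30% off
    planner_override: Optional[float] = None

    @property
    def final_discount(self) -> float:
        # A planner's override, when present, always wins.
        if self.planner_override is not None:
            return self.planner_override
        return self.suggested_discount

# Hypothetical weekly batch: one rec accepted as-is, one adjusted.
recs = [
    Recommendation("SKU-1001", "S-042", 0.30),
    Recommendation("SKU-1002", "S-042", 0.50, planner_override=0.40),
]
applied = {(r.sku, r.store): r.final_discount for r in recs}
```

Keeping the model's suggestion and the planner's override as separate fields also preserves the disagreement signal, which is useful later for retraining.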


Implementation Approach

Phase 1 — Data Preparation

Define inputs such as inventory levels, pricing history, demand signals, and external factors.
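In practice, Phase 1 means encoding each item-store observation as a state vector the agent can consume. A minimal sketch, where the specific features and scaling constants are assumptions for illustration:

```python
def build_state(on_hand, weekly_sales, weeks_left, current_discount):
    """Encode one item-store observation as a normalized state vector.

    Feature choice and the 12-week scaling horizon are illustrative
    assumptions; a real pipeline would also fold in demand signals
    and external factors (weather, local events, etc.).
    """
    weeks_of_supply = on_hand / max(weekly_sales, 1)
    return [
        min(weeks_of_supply / 12.0, 1.0),   # inventory pressure
        weeks_left / 12.0,                  # time left in the cycle
        current_discount,                   # current markdown depth
    ]
```

Normalizing every feature to a comparable 0-1 range keeps one signal (say, raw on-hand units) from dominating the learned value function.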


Phase 2 — Model Training

Train the model using historical data to learn optimal pricing strategies.
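To make Phase 2 concrete, here is a toy tabular Q-learning loop over a simulated clearance season. Everything in it, the demand model, discount ladder, and hyperparameters, is an assumption for illustration; the article's production system would train on the historical item-store data instead of a simulator:

```python
import random

DISCOUNTS = [0.0, 0.2, 0.3, 0.4]     # candidate markdown depths (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1   # learning rate, discount, exploration

def simulate_week(inventory, discount, rng):
    """Toy demand model: deeper discounts sell more units (assumption)."""
    demand = rng.randint(0, 5) + int(discount * 20)
    sold = min(inventory, demand)
    reward = sold * (1.0 - discount)  # unit revenue after markdown
    return inventory - sold, reward

def train(episodes=2000, weeks=8, start_inventory=40, seed=0):
    rng = random.Random(seed)
    q = {}  # ((week, inventory_bucket), action_index) -> value
    for _ in range(episodes):
        inv = start_inventory
        for week in range(weeks):
            state = (week, inv // 10)
            # Epsilon-greedy action selection.
            if rng.random() < EPS:
                a = rng.randrange(len(DISCOUNTS))
            else:
                a = max(range(len(DISCOUNTS)),
                        key=lambda i: q.get((state, i), 0.0))
            inv, r = simulate_week(inv, DISCOUNTS[a], rng)
            next_state = (week + 1, inv // 10)
            best_next = max(q.get((next_state, i), 0.0)
                            for i in range(len(DISCOUNTS)))
            old = q.get((state, a), 0.0)
            # Standard Q-learning update.
            q[(state, a)] = old + ALPHA * (r + GAMMA * best_next - old)
    return q

q_table = train()
```

The learned table maps (week, inventory bucket) states to discount values, which is exactly the "different products and stores need different strategies" behavior described later in the article, in miniature.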


Phase 3 — Testing (Shadow Mode)

Run the model alongside existing processes without affecting live operations to validate performance.
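The key shadow-mode metric is how the model's (simulated) outcome compares with what stores actually realized. A minimal sketch, assuming both series are weekly revenue totals over the same period:

```python
def shadow_lift(actual_revenue, model_revenue):
    """Relative lift of simulated model revenue over the live baseline.

    In shadow mode the model's recommendations are logged and replayed
    against a demand model, never applied in stores. The revenue-lift
    metric here is an illustrative choice, not the article's.
    """
    baseline = sum(actual_revenue)
    return round((sum(model_revenue) - baseline) / baseline, 4)
```

A sustained positive lift over several weeks of shadow operation is the usual gate before moving to Phase 4.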


Phase 4 — Deployment

Integrate the model into live systems with monitoring and feedback mechanisms.
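One common monitoring mechanism at this stage is a guardrail that clamps each recommendation to business rules before it reaches stores. The bounds below are illustrative assumptions:

```python
def apply_guardrails(discount, prev_discount=0.0,
                     min_d=0.0, max_d=0.6, max_step=0.2):
    """Clamp a recommended discount before it goes live.

    Rules (all assumed for illustration): stay within [min_d, max_d],
    never deepen by more than max_step in one week, and never reduce
    an already-taken markdown.
    """
    d = min(max(discount, min_d), max_d)   # hard bounds
    d = min(d, prev_discount + max_step)   # no big week-over-week jumps
    return max(d, prev_discount)           # markdowns never shrink
```

Logging every clamp alongside the raw model output gives the feedback signal mentioned above: frequent guardrail hits are an early warning that the model and the business rules disagree.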


The Human Factor

Adopting advanced models in retail requires trust.

Experienced buyers may hesitate to rely on automated recommendations, especially when decisions directly impact revenue.

The solution is not replacing human decision-making but enhancing it:

  • Provide clear reasoning behind recommendations
  • Allow easy overrides
  • Continuously improve the model using feedback

The best results come from combining human expertise with data-driven intelligence.


Conclusion

Reinforcement learning offers a powerful way to modernize markdown strategies.

By moving beyond fixed discount rules and embracing adaptive pricing, retailers can significantly improve both profitability and operational efficiency.