The Evolution of Addressing the Cold Start Problem in Amazon’s Product Search

Danny McMillan
November 1, 2023

Searching for newly launched products online often returns frustrating, irrelevant results, because new listings have no engagement history for the ranking system to learn from. This “cold start” problem has challenged e-commerce companies for years. However, significant progress was made between 2016 and 2021 in developing solutions to overcome it.

This in-depth article traces the key milestones researchers have achieved in tackling the cold start challenge over the past several years. We’ll cover:

How the cold start problem was first clearly defined
Early attempts at manual mitigation
Using predicted priors to provide initial signals
Spearfishing to kickstart engagement
Optimizing the approach for balanced improvements
Ongoing innovations to further improve discoverability

The Core Cold Start Challenge

In 2016, Dr. Daria Sorokina of Amazon Search gave a presentation that clearly defined the cold start problem for product search systems. She used examples like new Harry Potter book releases and hot new gadgets to illustrate the issue:

These highly relevant new products would initially rank low in Amazon’s search results, even for very related keyword searches. The rankings gradually improved over time as more behavioral data accumulated from users engaging with the products.

But speeding up that process required tedious, ongoing manual tuning by Amazon’s elite A9 search team. It simply wasn’t scalable.

The core of the issue was obvious – the lack of user engagement data prevented the search algorithms from accurately judging the relevance of new products. This caused them to rank much lower than appropriate, even for very relevant queries.

While the problem was clearly articulated, the available solutions at the time were limited and inefficient. Broadly speaking, the options were:

Wait for behavioral data to slowly accumulate over weeks and months
Manually tweak rankings of high-priority new items

Neither path was sustainable at the massive scale of Amazon’s e-commerce engine. More automated, scalable solutions were desperately needed to address the cold start problem.

Moving Towards Predicted Priors

By 2019, Amazon researchers had made significant progress in developing more programmatic solutions.

Their key innovation was a method to generate “prior” estimates for the missing behavioral signals that stymied new products, such as click-through rate.

These predictive priors leveraged the non-behavioral attributes of products, such as brand, author, or actor, to make initial predictions of how engaging the product would be.
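To make the idea concrete, here is a minimal sketch of deriving a prior click-through rate from a single non-behavioral attribute. It is an illustration only, not Amazon's model: the brand names, the `HISTORY` data, the helper names, and the `PRIOR_STRENGTH` value are all assumptions, and a simple pooled lookup per brand stands in for whatever prediction model the researchers actually used.

```python
from collections import defaultdict

# Hypothetical historical data: (brand, impressions, clicks) per established product.
HISTORY = [
    ("acme", 1000, 50),
    ("acme", 3000, 150),
    ("zenith", 2000, 20),
]

# Assumed number of pseudo-impressions the prior is worth (a tuning choice).
PRIOR_STRENGTH = 20.0


def fit_attribute_ctr(history):
    """Pool clicks and impressions per brand, plus a global fallback rate."""
    totals = defaultdict(lambda: [0, 0])  # brand -> [impressions, clicks]
    all_impressions = 0
    all_clicks = 0
    for brand, impressions, clicks in history:
        totals[brand][0] += impressions
        totals[brand][1] += clicks
        all_impressions += impressions
        all_clicks += clicks
    per_brand = {b: c / i for b, (i, c) in totals.items() if i > 0}
    return per_brand, all_clicks / all_impressions


def predicted_prior(brand, per_brand, global_ctr, strength=PRIOR_STRENGTH):
    """Return Beta(alpha, beta) pseudo-counts whose mean is the predicted CTR."""
    # Unknown brands fall back to the global click-through rate.
    ctr = per_brand.get(brand, global_ctr)
    return ctr * strength, (1.0 - ctr) * strength


per_brand, global_ctr = fit_attribute_ctr(HISTORY)
alpha, beta = predicted_prior("acme", per_brand, global_ctr)
```

Expressing the prediction as pseudo-counts rather than a single number keeps it easy to blend with real clicks and impressions later: the larger the assumed strength, the longer the prediction dominates the observed data.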

Introducing these estimated priors provided enough of an initial behavioral signal boost to improve the rankings of new products. Bayesian updating models then refined the estimates as real user engagement data started accumulating.
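One common way to express this kind of refinement is a Beta-Binomial update, sketched below. This is an assumption about the general shape of the approach rather than the researchers' actual implementation; the function names and the example numbers are hypothetical.

```python
def beta_mean(alpha, beta):
    """Expected click-through rate under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


def update_beta(alpha, beta, impressions, clicks):
    """Fold newly observed impressions and clicks into the Beta belief."""
    if not 0 <= clicks <= impressions:
        raise ValueError("clicks must be between 0 and impressions")
    # Clicks count as successes, impressions without a click as failures.
    return alpha + clicks, beta + (impressions - clicks)


# Hypothetical predicted prior: 5% CTR, worth 20 pseudo-impressions.
prior_alpha, prior_beta = 1.0, 19.0

# Hypothetical first batch of real traffic for the new product.
post_alpha, post_beta = update_beta(prior_alpha, prior_beta, impressions=80, clicks=9)

prior_ctr = beta_mean(prior_alpha, prior_beta)    # 0.05
posterior_ctr = beta_mean(post_alpha, post_beta)  # 0.10
```

With only 20 pseudo-impressions behind the prior, 80 real impressions are enough to move the estimate from 5% to 10%. As more traffic arrives, the observed data increasingly outweighs the initial prediction.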

In A/B testing, this new approach showed very promising results – increasing new product impressions by 97% and clicks by 58% compared to a control baseline without priors.

However, it came at the cost of lower overall purchase rates. Because the exploration was indiscriminate, low-quality new items were over-exposed as well, degrading the overall user experience.

Clearly, further refinements were required to improve the balance of discovery and revenue metrics. But this was a major step forward in programmatically overcoming the cold start problem versus purely manual solutions.

Spearfishing to Accelerate Behavioral Signals

In addition to algorithmic approaches, marketplace designers employ various tactics to help new products gather that critical initial set of engagement data needed to break through the cold start barrier.

One proven human-centered technique is known as “spearfishing” – meticulously targeting very specific, niche queries where a new product is highly likely to rank well and get clicked on.

For example, someone searching for “Led Zeppelin 2022 remastered box set” has clearly signaled an intent to find that specific new release. Tailoring a listing to such narrow, high-intent queries allows new products to start accumulating real behavioral signals.

But this process tends to be slow and unreliable. It depends on product owners correctly guessing these long-tail keywords, and hoping enough volume exists there.

The predictive priors technique provided a more systematic way to give new products a generalized boost across all related queries. But spearfishing remained, and remains, an important human-led supplementary tactic.

Balancing Improvements Via Rapid Testing

By 2021, the Amazon research team had iterated to significantly improve the real-world results of their cold start systems in production.

Their key upgrades focused on balancing the twin goals of discoverability and revenue:

Aggregated behavior data – They expanded the training data for predictive priors models to include aggregated product-level engagement metrics. This provides a stronger signal for generally popular new products while limiting over-exposure.
Rapid Bayesian updating – They implemented real-time indexing of user feedback to accelerate the Bayesian updating loops from 24 hours down to 2 hours. This prevented suboptimal priors from being exposed for too long.
Early stopping – They added business logic to stop showing clearly low-quality explorations after sufficient data came in. This further optimized the user experience.
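The early stopping idea above can be sketched as a simple rule on the Beta posterior: stop exploring a new product once enough data has arrived and even an optimistic estimate of its click-through rate falls below a baseline. The source does not describe the actual business logic, so the function name, the thresholds, and the normal approximation used here are all assumptions.

```python
import math


def should_stop_exploring(alpha, beta, baseline_ctr,
                          min_observations=200.0, z=1.645):
    """Return True when the product looks clearly worse than the baseline.

    alpha and beta are Beta posterior counts (prior pseudo-counts plus
    observed clicks and non-clicks). The thresholds are illustrative.
    """
    n = alpha + beta
    if n < min_observations:
        # Not enough evidence yet, so keep exploring.
        return False
    mean = alpha / n
    # Standard deviation of a Beta(alpha, beta) distribution.
    sd = math.sqrt(alpha * beta / (n * n * (n + 1.0)))
    # Normal approximation to an upper credible bound on the CTR.
    optimistic_ctr = mean + z * sd
    return optimistic_ctr < baseline_ctr


# Hypothetical checks against an assumed 3% baseline CTR.
too_early = should_stop_exploring(1.0, 19.0, baseline_ctr=0.03)
clearly_weak = should_stop_exploring(2.0, 498.0, baseline_ctr=0.03)
performing = should_stop_exploring(20.0, 480.0, baseline_ctr=0.03)
```

Running a check like this after every posterior refresh is where a shorter update cycle pays off: a weak product is caught within hours rather than after a full day of exposure.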

These enhancements led to much stronger real-world results in A/B testing:

14% increase in new product impressions
11% increase in new product purchases
No degradation in overall revenue metrics

The researchers reaffirmed empirical Bayes techniques as an effective framework for tackling cold start through smart, selective feature exploration.

But more work was still needed to turn these promising directions into fully generalized production systems.

Ongoing Innovation to Connect Shoppers with New Relevance

In just five years, techniques to address the long-standing cold start problem went from scattered insights to demonstrated systems making a real impact.

Going forward, researchers focused on further enhancing these approaches:

Improving relevance and engagement predictions to minimize failed explorations
Incorporating long-term metrics beyond immediate revenue impact
Developing more personalized cold start solutions
Quantifying and optimizing diversity gains

The evolution of cold start solutions represented an important area of innovation. Shopper satisfaction relied on their ability to easily find relevant new products. Seller satisfaction depended on fair exposure for their new product launches.

And the long-term health of any marketplace rested on continuously integrating new buyers, sellers, and products.