Ranking models are what make it possible to have an efficient product search experience in the first place. They’re the gears that drive the whole machine.
“Water is wet” moment: Ranking products is complicated.
It should come as no surprise, then, that the models are complicated too.
So let’s start with the basics: What data actually goes into training these models?
There are quite a few. Customer actions like…
…are all used as labels, and fed into the model.
Now, there’s a bit more to this science of labels. We’ll shortly cover positive and negative labels, and how they relate to conversion KPIs.
But for now, it will suffice to know that customer actions are used as signposts to help the ranking models understand what’s driving customer behavior.
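To make that concrete, here’s a minimal sketch of how logged customer actions might be turned into labeled training rows. The action-to-label mapping and the field names are my assumptions for illustration, not Amazon’s actual schema.

```python
# Hypothetical sketch: turning logged customer actions into training labels.
# The action-to-label mapping below is an assumption for illustration only.

RAW_ACTIONS = [
    {"query": "running shoes", "asin": "B001", "action": "purchase"},
    {"query": "running shoes", "asin": "B002", "action": "click"},
    {"query": "running shoes", "asin": "B003", "action": "impression_no_click"},
]

# Assumed mapping: purchases are strong positives, clicks weak positives,
# ignored impressions negatives.
LABEL_FOR_ACTION = {"purchase": 1.0, "click": 0.5, "impression_no_click": 0.0}

def to_training_rows(actions):
    """Convert raw action logs into (query, asin, label) training rows."""
    return [
        (a["query"], a["asin"], LABEL_FOR_ACTION[a["action"]])
        for a in actions
        if a["action"] in LABEL_FOR_ACTION
    ]

print(to_training_rows(RAW_ACTIONS))
# [('running shoes', 'B001', 1.0), ('running shoes', 'B002', 0.5),
#  ('running shoes', 'B003', 0.0)]
```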
The Amazon search engine is a machine—literally—when it comes to gathering data on customer actions and compiling those actions into training sets for the ranking models. In doing so, it regularly computes unique keywords for specific marketplaces, categories, and user features.
If you like, this process is akin to a meticulous librarian categorizing books based on various attributes, ensuring that each book (or in this case, product) is placed where it’s most likely to be found by the right reader (or buyer).
Once the Amazon search engine has computed the keywords, they are then reintroduced or “reissued” into their respective contexts.
This means that each keyword is placed back into the specific marketplace, category, or user feature from which it was originally derived. This ensures that the keywords maintain their relevance and specificity. Once this step is complete, feature values are collected for all items in the match set.
Okay, the terminology is getting a little dense here, so let’s pause and break down what that means.
A “match set” refers to the group of products that match a given search query.
For each of these products, the search engine collects “feature values”. These could include various attributes of the product such as its price, brand, customer ratings, and so on.
Simple enough, right?
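If you prefer to see it in code, here’s a toy sketch of that step – the catalog, feature names, and matching rule are all invented for illustration:

```python
# Toy sketch of a "match set" and feature-value collection.
# Catalog contents and feature names are invented for illustration.

CATALOG = {
    "B001": {"title": "trail running shoe", "price": 89.99, "brand": "Acme", "rating": 4.6},
    "B002": {"title": "running shoe women", "price": 74.50, "brand": "Zoom", "rating": 4.2},
    "B003": {"title": "leather dress shoe", "price": 120.0, "brand": "Derby", "rating": 4.8},
}

def match_set(query):
    """Naive matching: every product whose title shares a word with the query."""
    terms = set(query.lower().split())
    return [asin for asin, item in CATALOG.items()
            if terms & set(item["title"].split())]

def collect_feature_values(asins, features=("price", "brand", "rating")):
    """Collect the chosen feature values for every item in the match set."""
    return {asin: {f: CATALOG[asin][f] for f in features} for asin in asins}

matches = match_set("running shoe")
print(collect_feature_values(matches))
```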
At this stage, feature vectors enter the picture.
As the Amazon search engine does its thing, the collected feature values build up into a comprehensive profile of each product that matches the search query.
So a “feature vector” is essentially a list of the feature values for a given product.
Or, if you’re not jargoned out by now: a feature vector is a multi-dimensional representation of a data item, encoding its various characteristics.
This vector provides a detailed profile of the product in terms of its various attributes. And collecting feature values for every item in the match set ensures that each product’s feature vector captures the attributes most relevant to the customer’s original query.
In other words, the feature vector accurately represents the details of the product that are most relevant to the customer’s search.
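Continuing the toy catalog from above, building a feature vector is just a matter of flattening those values into a fixed order (with categorical fields like brand encoded as numbers – the encoding here is an assumption):

```python
# Turning collected feature values into a fixed-order feature vector.
# The encoding scheme is an assumption for illustration.

BRAND_IDS = {"Acme": 0, "Zoom": 1, "Derby": 2}  # toy categorical encoding

def to_feature_vector(values):
    """Flatten a product's feature values into [price, brand_id, rating]."""
    return [values["price"], float(BRAND_IDS[values["brand"]]), values["rating"]]

print(to_feature_vector({"price": 89.99, "brand": "Acme", "rating": 4.6}))
# [89.99, 0.0, 4.6]
```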
As usual, if we zoom out, the science is all about delivering accurate and relevant search results.
Let’s pause here, because this section makes for a good pop quiz. And a very interesting case study in the inner workings of Amazon.
If I asked you to name the biggest challenge in (algorithmically) ranking products on Amazon listings, what would your guess be?
It’s an interesting one. Spoiler in the next paragraph!
Here’s the real answer: Finding a way to show results from multiple categories, in a meaningful way, for a single search.
Yep, that simple-sounding problem has kept many Amazon data scientists up at night. You’ll recall that the Hunger Score was developed to help solve this very problem.
Here’s why it’s such a challenge:
Let’s imagine that someone searches for “Game of Thrones” on Amazon.
Roadblock #1: Do they want to find a Game of Thrones book? A DVD? A board game? The search results are going to look very different for each of those product types.
Hopefully, this scenario helps to make clear why solving the category problem is so important.
The A9’s solution is to mix together search results from various categories, based on an educated guess about exactly what the shopper is looking for.
Now, it’s easier to guess what the shopper’s looking for if you have the historic data to use as a reference point.
So for popular search terms, this guess is based on what shoppers have actually done in the past when they’ve searched for that term – known in A9 parlance as behavioral search. Amazon’s vast datasets of customer actions pay dividends here.
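As a hedged sketch of what behavioral search might look like under the hood: count which categories shoppers actually clicked into after searching a term, and use those proportions to blend the results. The click log and category names below are made up:

```python
from collections import Counter

# Invented click log: (query, clicked_category) pairs for illustration.
CLICK_LOG = [
    ("game of thrones", "Books"), ("game of thrones", "Books"),
    ("game of thrones", "DVD"), ("game of thrones", "Books"),
    ("game of thrones", "Toys & Games"),
]

def category_mix(query, log):
    """Estimate how to blend categories for a query from past click behavior."""
    counts = Counter(cat for q, cat in log if q == query)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

print(category_mix("game of thrones", CLICK_LOG))
# {'Books': 0.6, 'DVD': 0.2, 'Toys & Games': 0.2}
```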
So far, so good. But it gets trickier when you don’t have any historic data to work from.
Here’s an illuminating quote directly from an A9 team member:
“A large percentage of queries are only seen once or twice and there are quite a few that have been seen a lot of times”
Yep, that means what you think it means. For a huge share of shopper searches, Amazon simply doesn’t have enough data to draw reliable conclusions. It’s not dissimilar to the challenge we saw earlier in working out Query Category Scores.
So how does the A9 handle these cases?
In a sentence: It predicts the person’s intent using a special type of language model.
The methodology is as follows: to predict the shopper’s intent, the algorithm looks at all of the search terms shoppers have used in the past 90 days that resulted in clicks on products in a specific category.
As for the rest of the equation? Well, that remains a “KFC spices” situation: We’ll just never know. And perhaps we’d rather not know.
This brings us to the concept of “affinity” between a search term and a category.
Think of this metric as a measure of how likely it is that the search term would be found in that category. It’s like the algorithm’s way of playing matchmaker between your search and the vast array of products on Amazon.
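Here’s one plausible – and purely illustrative – way such an affinity could be computed from a 90-day click window, tying together the intent-prediction idea from above. The log format and smoothing constant are assumptions:

```python
from collections import Counter
from datetime import datetime, timedelta

# Invented log of (timestamp, query, clicked_category) rows.
NOW = datetime(2023, 6, 1)
LOG = [
    (NOW - timedelta(days=10), "dragon mug", "Home & Kitchen"),
    (NOW - timedelta(days=40), "dragon mug", "Home & Kitchen"),
    (NOW - timedelta(days=80), "dragon mug", "Toys & Games"),
    (NOW - timedelta(days=120), "dragon mug", "Books"),  # outside the window
]

def affinity(query, category, log, window_days=90, smoothing=1.0):
    """Share of recent clicks for `query` that landed in `category`,
    with add-one style smoothing so sparse queries don't produce 0/0."""
    cutoff = NOW - timedelta(days=window_days)
    recent = [cat for ts, q, cat in log if q == query and ts >= cutoff]
    counts = Counter(recent)
    return (counts[category] + smoothing) / (len(recent) + smoothing * 2)

print(round(affinity("dragon mug", "Home & Kitchen", LOG), 2))  # 0.6
```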
The Power Law Distribution is the statistical model that the Amazon A9 Algorithm uses to deal with some of this complexity.
It has a lot in common with the Pareto principle – the famous 80/20 rule from statistics classes, which states that 80% of the results often come from just 20% of the causes.
But in a nutshell, the Power Law Distribution just means that a few search terms are very common and used frequently, while a large number of search terms are used very rarely. It’s like a popularity contest where a few terms are the cool kids and the rest are just trying to fit in.
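For the statistically curious, here’s a tiny illustration: under a Zipf-style power law, the frequency of the term at rank r falls off roughly as 1/r^s. The exponent and the number of terms are illustrative choices:

```python
# Illustrative Zipf-style power law over search terms: freq(rank) ~ 1 / rank**s.
S = 1.0  # power-law exponent (illustrative choice)

N = 10_000
weights = [1 / r**S for r in range(1, N + 1)]
total = sum(weights)

top_10_share = sum(weights[:10]) / total
print(f"Top 10 of {N:,} terms capture {top_10_share:.0%} of searches")
# Roughly 30% of all search volume from 0.1% of terms, under these assumptions.
```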
In terms of Amazon ranking algorithms, this distribution is crucial. Since the most common search terms are used very frequently, the algorithms prioritize results for these terms. In effect, the algorithm is saying, “These are the terms people are most likely to be searching for, so let’s make sure we show them the good stuff first.”
Now, let’s bring it home with some real-world implications.
If you’re an Amazon seller in the US market, you’ve likely built up a deep pool of relevant keywords. The US market is a beast unlike most others in terms of search term depth.
However, when you hit the European market, you might find it’s a different ball game. While the US market might have a depth of 20 search terms, Germany, the second largest market, might only have half of that, and the UK half again.
Managing expectations is key here, as trying to crack Europe often leads to unpleasant surprises for sellers accustomed to the expansive market of the US.
Right, action points.
On the face of it, the above seems like a dilemma. The Amazon A9 Algorithm optimizes results for the search terms that are used very frequently – but isn’t it a tough battle to rank for these?
Yes indeed. But—particularly in the US market on Amazon—if you can make up that search volume by running PPC ads to a lot of long-tail keywords rather than a handful of high-volume ones, it’s possible to reap the benefits of the volume with much less of a fight over market share.
Not that it’s going to be easy (we’re talking about Amazon in 2023). We’ve seen that even the A9 algorithm struggles to deal with some of the nuances of low-volume search terms. Well, those nuances don’t get any easier when you’re dealing with them yourself.
For one, by their very nature, low-volume longtail keywords are going to yield (for the most part) infrequent conversion windows.
The statisticians among you will be able to finish my train of thought here. Infrequent conversions mean that it can be tough to “dial in” your Amazon PPC campaigns, because you may simply not have enough data for your results to be statistically significant.
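To put numbers on that, here’s a rough sketch using a normal-approximation confidence interval for a conversion rate. The click and order counts are invented; the point is how wide the interval gets at long-tail volumes:

```python
import math

def conversion_ci(orders, clicks, z=1.96):
    """95% normal-approximation confidence interval for a conversion rate.
    (Crude for tiny samples, which is exactly the long-tail problem.)"""
    p = orders / clicks
    half_width = z * math.sqrt(p * (1 - p) / clicks)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Head keyword: 2,000 clicks, 200 orders -> tight interval.
print(conversion_ci(200, 2000))   # ~ (0.087, 0.113)

# Long-tail keyword: 20 clicks, 2 orders -> same 10% rate, huge uncertainty.
print(conversion_ci(2, 20))       # ~ (0.0, 0.231)
```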
And given the increased number of PPC campaigns, there’s no getting around this being a more labor-intensive approach.
But, for those prepared to pay the price and roll with the numerical punches, it can nonetheless be a highly fruitful strategy.
And most importantly, it’s one that works in harmony with what we know about the A9.
Speaking of working in harmony with Amazon’s A9, let’s tackle the chunk of the algo where the real magic happens.
The Query Ranking Module is the part of the algorithm that assigns a ranking value to products—”rank juice,” as it’s more memorably called—when a shopper enters a query.
As with most things on Amazon, the concept sounds simple but conceals a lot of complexity.
Because as it turns out, computers don’t automatically understand natural language – which means there’s a lot of work that needs to happen to go from a shopper’s query to ranking products for that query.
First up: the Query Understanding Model. (No, it’s not a scene from Mission Impossible.)
You see, the ranking model can only take over once the Query Understanding Model has broken the user’s search query down into components the ranking algorithm can understand. Only then can it figure out what the user is looking for.
So when you enter a search on Amazon, the Query Understanding Model is the first place it lands.
The query that arrives on the doorstep of this model is usually in the form of unstructured text which can’t be categorized by the algo.
Therefore, the goal of the query extraction process is to structure the query as much as possible to match it to the right product, and improve its ranking.
You could imagine this as the difference between raw iron ore (the user’s search query) and polished steel (the query after being structured and understood). Raw iron ore is all well and good, but if you want to iron your clothes with it, you’re going to need to put some work into refining it.
An example will illustrate how the Query Understanding Model actually works.
At the core of the model is the task of query tagging, which is what it sounds like – parsing the query and assigning Amazon-intelligible tags to its various parts.
Let’s imagine you’re searching for a Nike shoe for women, that costs under 200 dollars, and is Prime eligible. The raw query would likely look something like “nike shoe under 200 women prime”.
The algorithm needs to tag the different parts of the query: the brand (nike), the product type (shoe), the price cap (under 200), the target audience (women), and the program filter (prime).
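Here’s a deliberately simplified, rule-based sketch of what that tagging step might produce – the tag names, vocabularies, and regex are my own assumptions; the real Query Understanding Model is far more sophisticated:

```python
import re

# Toy, rule-based query tagger. Tag names and vocabularies are assumptions.
KNOWN_BRANDS = {"nike", "adidas"}
PRODUCT_TYPES = {"shoe", "shirt", "watch"}
AUDIENCES = {"women", "men", "kids"}
PROGRAMS = {"prime"}

def tag_query(query):
    """Assign structured tags to the recognizable parts of a raw query."""
    tags = {}
    for token in query.lower().split():
        if token in KNOWN_BRANDS:
            tags["brand"] = token
        elif token in PRODUCT_TYPES:
            tags["product_type"] = token
        elif token in AUDIENCES:
            tags["audience"] = token
        elif token in PROGRAMS:
            tags["program"] = token
    # Price constraints like "under 200".
    m = re.search(r"under (\d+)", query.lower())
    if m:
        tags["max_price"] = int(m.group(1))
    return tags

print(tag_query("nike shoe under 200 women prime"))
# {'brand': 'nike', 'product_type': 'shoe', 'audience': 'women',
#  'program': 'prime', 'max_price': 200}
```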
What started as a natural language query has now been translated into Amazon-ese – so the ranking model can get to work on matching it with the most relevant products.
At this stage, I hope you can see how the feature vectors we talked about earlier come full circle in helping the Amazon A9 to rank products. Amazon now has a clear line of communication between what the user’s searched for, and the features of the product itself (for example, it can clearly map the user’s request for a specific brand to the “Brand” field of the feature vector).
Not that this is an exhaustive list. The other types of info that could be (and often are) tagged in search queries include:
And these are just the ones that we know about – so it’s safe to assume that the A9 Algorithm is doing a lot of data crunching under the hood.
One takeaway is that tagging specific features of your product—let’s say “waterproof”, “noise-canceling” or “wireless”—isn’t going to hurt your chances of being deemed relevant to shoppers looking for products with those features.
Trying to cover every possible reason for ranking woes would be impossible.
But let’s walk through an example, which should provide some helpful pointers by using what we’ve covered so far.
Let’s say you’re trying to index your t-shirt for the keyword “vintage” and you’re having no luck. Here are a few possible reasons you might not be ranking:
Once the Amazon search query has been understood, the ranking model gets to work and assigns a ranking value (rank juice) to each product, which represents how relevant each product is to the user’s search query.
After that, you know the drill – most relevant products to the top!
Naturally, there’s a lot of complexity that goes into executing that simple idea (are you noticing a theme here?). So here’s where your usual Amazon listing optimisation tools come into play: Titles, bullet points, price, reviews and a medley of other attributes are taken into account by the ranking model.
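As a loose illustration – the real A9 ranker is a learned model, not a hand-weighted sum, and these feature names and weights are invented – the scoring step works something like this:

```python
# Loose illustration of assigning a relevance score ("rank juice") per product.
# Features and weights are invented; the real ranker is a learned model.

WEIGHTS = {"title_match": 2.0, "rating": 0.5, "review_count_log": 0.3, "price_fit": 1.0}

def rank_juice(features):
    """Weighted sum over a product's feature values."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

candidates = {
    "B001": {"title_match": 0.9, "rating": 4.6, "review_count_log": 3.2, "price_fit": 1.0},
    "B002": {"title_match": 0.4, "rating": 4.8, "review_count_log": 4.0, "price_fit": 0.5},
}

# Most relevant products to the top.
ranked = sorted(candidates, key=lambda asin: rank_juice(candidates[asin]), reverse=True)
print(ranked)  # ['B001', 'B002']
```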
Rank juice also feeds heavily into Amazon’s auto-complete feature.
Like the SERPs themselves, the auto-complete suggestions don’t take long to respond to changing customer demands.
In other words, just as a product’s organic ranking will soon rise if it becomes popular, a search term that starts to get a lot of searches will soon show up in the auto-complete drop-down. Once again, we see Amazon constantly tweaking and refining based on user search behavior.
The principle goes beyond basic search volume data, though. If a product becomes popular, increased organic ranking isn’t the only benefit – Amazon may also start to suggest that product in the auto-complete list.
This gives an encouraging lifeline to stagnating ASINs. If a previously unpopular product becomes popular, Amazon may readily start to suggest it in the auto-complete list. The same holds true for previously unpopular search terms.
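Here’s a minimal sketch of that feedback loop, assuming nothing more than a prefix match weighted by recent search counts (the terms and counts are invented):

```python
from collections import Counter

# Invented recent-search counts; in reality these would be rolling and huge.
RECENT_SEARCHES = Counter({
    "vintage t-shirt": 1200,
    "vintage tea set": 90,
    "vintage telephone": 40,
})

def autocomplete(prefix, n=3):
    """Suggest the most-searched recent terms that start with the prefix."""
    hits = [(term, count) for term, count in RECENT_SEARCHES.items()
            if term.startswith(prefix)]
    return [term for term, _ in sorted(hits, key=lambda tc: -tc[1])[:n]]

print(autocomplete("vintage t"))
# ['vintage t-shirt', 'vintage tea set', 'vintage telephone']

# A surge in searches reshuffles the suggestions almost immediately.
RECENT_SEARCHES["vintage telephone"] += 5000
print(autocomplete("vintage t"))
# ['vintage telephone', 'vintage t-shirt', 'vintage tea set']
```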
Like most aspects of the A9, the algorithm is designed to be dynamic and adaptive – and one corollary of that is a willingness to move in the direction of customer demand.
So far, we’ve covered the journey from query to results. And we’ve seen that Amazon likes to structure its data to the fullest extent that it can.
But much of that structuring is done before we even get to the starting line.
You see, the category ladder—an often overlooked aspect of product search—plays a pivotal role in the process. It’s not just about identifying what a product is (say, a shoe) but also about placing it within a carefully arranged hierarchy of categories.
For instance, a shoe isn’t just a shoe. It could be a women’s shoe, an athletic shoe, or even a women’s athletic shoe. And Amazon has put a tremendous amount of money into understanding that broader context of the user’s search.
This hierarchical categorisation is the compass that guides the ranking model, helping it to understand the intricate relationships between different products and categories.
And this understanding, in turn, empowers the model to make more informed decisions about which products are most relevant to the user’s search.
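If it helps, here’s one way to picture the category ladder in code – the hierarchy below is a made-up fragment, not Amazon’s actual browse tree:

```python
# Made-up fragment of a category ladder; not Amazon's actual browse tree.
CATEGORY_LADDER = {
    "Clothing, Shoes & Jewelry": {
        "Women": {
            "Shoes": {"Athletic": {}, "Boots": {}, "Sandals": {}},
        },
        "Men": {
            "Shoes": {"Athletic": {}, "Loafers": {}},
        },
    },
}

def paths(tree, prefix=()):
    """Yield every root-to-leaf path in the category tree."""
    if not tree:
        yield prefix
    for name, subtree in tree.items():
        yield from paths(subtree, prefix + (name,))

# A shoe isn't just a shoe: each product sits on a specific rung.
for p in paths(CATEGORY_LADDER):
    print(" > ".join(p))
# Clothing, Shoes & Jewelry > Women > Shoes > Athletic
# ...and so on for every leaf.
```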
Implicit in this model is a common cause of indexing problems – and one very seldom appreciated.
Sellers often assume that flagging rankings are an indicator that Amazon doesn’t view their ASIN as sufficiently relevant. And they often have a point. But this is far from the only possible cause. Given that most ranking is measured on a category level, a lot of your product’s fate is decided long before in-category relevance factors enter the picture.
Understanding this can give you a huge leg up when root-causing ranking issues. At the very least, it can help you to avoid sinking a ton of money into an ASIN that never had a chance to begin with.
In considering this part of the algorithm, the “why” remains what it always has been: Amazon’s desire to understand where the shopper is coming from, in order to serve them the most relevant results possible, and in doing so provide them with the best possible experience.
The category ladder, meanwhile, gives us another piece of the “how.”