Generic, high-volume search queries are the bread and butter of any e-commerce site. But a large share of Amazon’s searches come from obscure long tail keywords that have rarely, if ever, been searched before. These long tail queries create unique challenges for both Amazon’s search algorithms and the third-party sellers trying to rank well.
Long Tail Challenges for Amazon
Although obscure queries are common on Amazon, they create difficulties for both A9 (Amazon’s search engine) and the sellers trying to optimise for them.
- Data Sparsity
By definition, long tail keywords have very few searches. So there is little historical click and conversion data for algorithms to learn from. With only a handful of prior searches, it’s hard to establish robust query-item relevance signals. This data sparsity leads to unreliable embeddings.
- Noisy Relations
Besides sparsity, connections derived from limited long tail searches tend to be noisy. For example, misclicks distort relevance, a single ambiguous search maps to unrelated items, or a user abandons a search so no clicks get logged. With so few samples, outliers and errors persist rather than getting drowned out by volume.
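To make the sparsity and noise points concrete, here is a minimal sketch, not Amazon’s method, that compares how much can be said about a click-through rate from a handful of impressions versus thousands. It uses the standard Wilson score interval; the function names and all counts are hypothetical and chosen only for illustration.

```python
import math


def wilson_interval(clicks: int, impressions: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a click-through rate (z=1.96 is roughly 95%)."""
    if impressions == 0:
        return (0.0, 1.0)  # no data at all: the true rate could be anything
    p = clicks / impressions
    denom = 1 + z**2 / impressions
    centre = (p + z**2 / (2 * impressions)) / denom
    half = z * math.sqrt(p * (1 - p) / impressions + z**2 / (4 * impressions**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def misclick_shift(impressions: int) -> float:
    """Change in observed click rate if one impression is wrongly logged as a click."""
    return 1 / impressions


# Hypothetical counts: both query-product pairs show the same observed rate (20%).
head_low, head_high = wilson_interval(clicks=2000, impressions=10000)
tail_low, tail_high = wilson_interval(clicks=1, impressions=5)

print(f"head query interval width: {head_high - head_low:.3f}")
print(f"tail query interval width: {tail_high - tail_low:.3f}")
print(f"one misclick moves the head rate by {misclick_shift(10000):.4f}")
print(f"one misclick moves the tail rate by {misclick_shift(5):.4f}")
```

With identical observed rates, the interval for the tail query is far wider, and a single stray click moves its rate by a large amount, which is the sense in which outliers persist rather than being drowned out by volume.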
- Zero History
The extreme case is “zero-shot” queries that have never been seen before. With absolutely no historical behavioural signals to rely on, ranking results for these queries becomes extremely difficult:
- The search algorithm cannot look at past queries to find semantic matches or click patterns.
- The novel query can only be mapped to inventory through word matches or embeddings, and these are unreliable on their own.
Here, mapping a query to inventory means associating the new query with products based solely on the query text and catalog data, without any behavioural search history for that query.
- No relevance signals exist to rank which products would be best since this exact query has never been seen.
Essentially the search engine is operating completely in the dark. There are no behavioural signals or lookup patterns to go by. While techniques like semantic parsing or generative AI may hold promise for handling zero-shot queries, the paper did not go into specifics there. The main point is that with no historical signals at all, ranking results becomes almost intractable. Even humans would struggle to identify relevant search results for queries never seen before.
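As an illustration of what ranking from query text and catalog data alone can look like, here is a small text-only baseline. It is a sketch under stated assumptions, not a description of A9: the catalog, product IDs, and the `rank_by_text` helper are all hypothetical, and the scoring is a simple IDF-weighted token overlap.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_by_text(query: str, catalog: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank products for a query using only catalog titles (no click history)."""
    doc_tokens = {pid: tokenize(title) for pid, title in catalog.items()}
    # Document frequency: the number of titles each token appears in.
    df = Counter(tok for toks in doc_tokens.values() for tok in toks)
    n_docs = len(catalog)
    q_tokens = tokenize(query)
    scored = []
    for pid, toks in doc_tokens.items():
        # Shared tokens that are rare in the catalog count for more (IDF weighting).
        score = sum(math.log(1 + n_docs / df[t]) for t in q_tokens & toks)
        if score > 0:
            scored.append((pid, score))
    # Highest score first; ties broken by product ID for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


# Hypothetical catalog for illustration only.
catalog = {
    "B001": "Left-handed ceramic pruning shears for bonsai",
    "B002": "Stainless steel pruning shears",
    "B003": "Ceramic coffee mug, left-handed design",
    "B004": "USB-C charging cable 2m",
}

results = rank_by_text("left handed bonsai shears ceramic", catalog)
for pid, score in results:
    print(pid, round(score, 2))
```

The weakness of this baseline is also visible here: the coffee mug outranks the plain pruning shears purely on shared words, and a query phrased as “secateurs” would match nothing. Without behavioural signals there is nothing to correct such mistakes.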
Amazon Seller Difficulties
These long tail problems inherent in Amazon’s search data pose difficulties for third-party sellers as well:
- Low Visibility
With little behavioural data, long tail product listings tend to have poor visibility in Amazon’s search rankings. They get outranked by more popular products matching generic queries.
- Missed Opportunities
Sellers miss sales opportunities from very targeted visitors who would be prime purchasers. Without search visibility, the right products never get seen or bought.
- Tough SEO Optimisation
Traditional SEO approaches rely heavily on volume-based keyword metrics such as search popularity. Long tail queries lack this data, so it’s hard to identify them and optimise for them.
Amazon’s Progress on Long Tail Search
Amazon employs thousands of researchers to push the frontiers of e-commerce search algorithms. Recently they have focused on addressing the long tail problem specifically.
Published Amazon research papers give some clues into their state-of-the-art and future directions:
- Neural Ranking Models
Neural networks now power key parts of Amazon’s search ranking models. Their ability to learn more complex query-product relevance has improved long tail performance. But data sparsity remains a bottleneck, requiring techniques like self-supervised pretraining on unlabeled examples.
- Hypergraph Algorithms
Exploring alternatives to simple bipartite query-product graphs is another avenue. In a hypergraph, a single edge can connect any number of nodes, which allows modeling higher-order relationships among the products themselves. This helps inform search rankings by connecting related items even if they don’t directly match the query.
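A toy sketch of the idea, with every name and data point hypothetical: each hyperedge is a set of products that belong together (a shared basket or category, say), and products directly matched to a query pass some score to the other members of their hyperedges. This is one simple way to use such a structure, not the algorithm from Amazon’s research.

```python
from collections import defaultdict


def expand_via_hyperedges(seeds: set[str], hyperedges: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Score non-seed products by the hyperedges they share with seed products."""
    scores: dict[str, float] = defaultdict(float)
    for members in hyperedges.values():
        hits = members & seeds
        if not hits:
            continue  # this hyperedge contains no directly matched product
        # Weight is the fraction of the hyperedge made up of seeds, so very
        # large hyperedges are diluted and do not dominate the ranking.
        weight = len(hits) / len(members)
        for product in members - seeds:
            scores[product] += weight
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical hyperedges: each one links more than two products at once,
# which an ordinary graph edge cannot do.
hyperedges = {
    "basket_1": {"tent", "sleeping_bag", "camp_stove"},
    "basket_2": {"tent", "tent_footprint"},
    "basket_3": {"camp_stove", "fuel_canister"},
    "category_shelters": {"tent", "tarp", "bivy_sack", "tent_footprint"},
}

# Suppose only "tent" matched the query text directly.
related = expand_via_hyperedges({"tent"}, hyperedges)
for product, score in related:
    print(product, round(score, 2))
```

Here the tent footprint surfaces first because it shares two hyperedges with the matched product, even though it might never match the query text itself.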
- Generative AI
An emerging approach fine-tunes pretrained language models such as BERT (strictly an encoder rather than a generative model) on past search queries and clicked products. The fine-tuned model can then infer reasonable search results for completely unseen queries. So far, these approaches lag behind specialised search engines, but rapid progress in AI research suggests they have potential.
While Amazon keeps the internals of its A9 search algorithm secret, these published papers demonstrate active R&D investment into solving the long tail challenge.
The Future of Long Tail Search
Providing relevant results for obscure long tail queries at scale remains an active challenge across e-commerce. Amazon is pushing the boundaries of neural search algorithms, hypergraphs, and generative AI to conquer the long tail. But work remains to overcome the inherent data sparsity and noise.
Cracking the long tail represents the next horizon for search engines. Companies that make progress first will gain a competitive advantage. For Amazon sellers, staying up to date on the latest AI search research can uncover advantages, though success will depend on operationalising those insights through smart optimisation, creativity, and persistence.
By zeroing in on the long tail, brands have an opportunity to win shelf space in thousands of micro-niches that remain untouched. For those willing and able, huge rewards await.