Item-to-Item Collaborative Filtering with Amazon's Recommendation System

Introduction

In making its product recommendations, Amazon makes heavy use of an item-to-item collaborative filtering approach. This essentially means that for each item X, Amazon builds a neighborhood of related items S(X); whenever you buy or look at an item, Amazon then recommends items from that item’s neighborhood. That’s why when you sign in to Amazon and look at the front page, your recommendations are mostly of the form “You viewed… Customers who viewed this also viewed…”.
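To make this concrete, here's a minimal Python sketch of the serving step, assuming the neighborhoods S(X) have already been computed offline; the item names, the neighborhoods dictionary, and the recommend function are hypothetical illustrations, not Amazon's actual system:

    # A toy store of precomputed neighborhoods S(X): for each item,
    # a ranked list of its most similar items (hypothetical data).
    neighborhoods = {
        "harry_potter_1": ["harry_potter_2", "the_hobbit", "narnia"],
        "the_hobbit": ["lord_of_the_rings", "harry_potter_1"],
    }

    def recommend(recently_viewed, already_owned, k=3):
        """Pull candidates from the neighborhoods of recently viewed items,
        skipping anything the user already owns or we've already picked."""
        recs = []
        for item in recently_viewed:
            for neighbor in neighborhoods.get(item, []):
                if neighbor not in already_owned and neighbor not in recs:
                    recs.append(neighbor)
        return recs[:k]

    print(recommend(["harry_potter_1"], already_owned={"harry_potter_2"}))
    # -> ['the_hobbit', 'narnia']

Note how cheap the lookup is at serving time: all the hard work lives in precomputing the neighborhoods.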

Other approaches

The item-to-item approach can be contrasted to:

  • A user-to-user collaborative filtering approach. This finds users similar to you (e.g., users who bought many of the same items you did) and suggests items that they’ve bought but you haven’t.
  • A global, latent factorization approach. Rather than looking at individual items in isolation (in the item-to-item approach, if you and I both buy a book X, Amazon will make essentially the same recommendations based on X, regardless of what else we’ve bought in the past), a global approach would look at all the items you’ve bought and try to detect latent properties that characterize what you like. For example, if you buy a lot of science fiction books and also a lot of romance books, a global algorithm might recommend you books that combine science fiction and romance elements; see the sketch after this list.
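To make the factorization idea concrete, here's a minimal sketch that factorizes a toy purchase matrix with a plain truncated SVD. This is only one simple way to build latent factors (production recommenders typically use more sophisticated factorizations), and the matrix, the choice of k = 2, and the user/item roles are purely illustrative:

    import numpy as np

    # Toy user-item purchase matrix: rows = users, columns = items
    # (hypothetical data: users 0-1 lean sci-fi, user 2 leans romance).
    R = np.array([
        [1, 1, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 1, 1],
        [0, 1, 1, 1],
    ], dtype=float)

    # Factorize into k latent "taste" dimensions via truncated SVD.
    k = 2
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    user_factors = U[:, :k] * s[:k]   # each user as a point in taste space
    item_factors = Vt[:k, :].T        # each item as a point in taste space

    # Predicted affinity of every user for every item: the low-rank reconstruction.
    scores = user_factors @ item_factors.T

    # Recommend the highest-scoring item this user hasn't bought yet.
    user = 0
    unbought = np.where(R[user] == 0)[0]
    print(unbought[np.argmax(scores[user, unbought])])

Notice that the recommendations here depend on a user's entire purchase history at once, which is exactly why the factorization has to be recomputed as new purchases come in.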

Pros/cons of the item-to-item approach:

  • Pros over the user-to-user approach: Amazon (and most applications) has many more users than items, so it’s computationally simpler to find similar items than it is to find similar users. Finding similar users is also a difficult algorithmic task, since individual users often have a very wide range of tastes, but individual items usually belong to relatively few genres.
  • Pros over the factorization approach: It’s simpler to implement, and it’s faster to update recommendations: as soon as you buy a new book, the item-to-item approach can immediately recommend from that book’s neighborhood, whereas a factorization approach would have to wait until the factorization has been recomputed. The item-to-item approach is also easy to reuse across the site: not only in the recommendations made to you, but also in the “similar items/other customers also bought” section when you look at a particular item.
  • Cons of the item-to-item approach: You don’t get very much diversity or surprise in item-to-item recommendations, so recommendations tend to be kind of “obvious” and boring.

How to find similar items

Since the item-to-item approach makes crucial use of similar items, here’s a high-level view of how to find them. First, associate each item with the set of users who have bought or looked at it. The similarity between any two items can then be a normalized measure of the number of users they have in common (i.e., the Jaccard index) or the cosine similarity between the two items (imagine each item as a vector, with a 1 in the ith element if user i has bought it, and 0 otherwise).
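Here's a small Python sketch of both measures, representing each item by the set of users who bought it; the user IDs and item names are hypothetical:

    from math import sqrt

    # Each item maps to the set of users who have bought it (toy data).
    buyers = {
        "item_a": {1, 2, 3, 4},
        "item_b": {2, 3, 4, 5, 6},
    }

    def jaccard(x, y):
        """Shared buyers over total distinct buyers: |X ∩ Y| / |X ∪ Y|."""
        return len(buyers[x] & buyers[y]) / len(buyers[x] | buyers[y])

    def cosine(x, y):
        """For 0/1 user vectors, cosine similarity reduces to
        |X ∩ Y| / sqrt(|X| * |Y|)."""
        return len(buyers[x] & buyers[y]) / sqrt(len(buyers[x]) * len(buyers[y]))

    print(jaccard("item_a", "item_b"))  # 3 shared of 6 distinct buyers -> 0.5
    print(cosine("item_a", "item_b"))   # 3 / sqrt(4 * 5) ≈ 0.67

The most similar items to X under whichever measure you pick then form its neighborhood S(X).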
