In 2003, Amazon revealed their secret to the world: their item-item based collaborative filtering algorithm! In their paper, *Amazon.com Recommendations Item-to-Item Collaborative Filtering*, which has ~6000 citations on Google Scholar as of October 2019, they share the details on their recommender system that drove their website’s revenue for years.

There are three interesting problems solved in this paper:

- how to represent items
- how to compute similarity of items, and
- how to scale the solution.

And, the *most* interesting part of this is: they patented it.

Before we discuss the patent, and ask some questions, let’s step through each sub-problem and how they solved it.

**Problem 1: Representing Items**

A question when expressing real world items in mathematics is: how do we represent them?

There are many ways to do this: we could assign a number to an item and let that be its representation, we could create a one-hot encoding, we could label an item with some text and convert that text into some latent representation, we could learn item representations in an embedding matrix, and the list goes on.

The way Amazon, in this paper, chooses to represent items is simply:

*Every item is represented by the set of users who purchased it.*

That means that if users A, B, and C have purchased item 1, then we represent item 1 as: *{A,B,C}*.

We can then create a binary vector where the 0th entry is 1 if user A purchased the item and 0 otherwise, the 1st entry is 1 if user B purchased the item and 0 otherwise, and so on.

Then item 1 is represented as *[1,1,1,…,0]* in *R^M*, where we have *M* users, so *M* entries in our item vector.

It’s simple and easy to understand. An added benefit is that we can easily represent users in the same manner, if we so choose. That is, *every user is represented by the set of items they have purchased*.
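As a toy sketch (not Amazon’s actual code, and with made-up item names), this representation can be built by mapping each item to its purchaser set and expanding that set into a binary vector over all *M* users:

```python
# Hypothetical purchase data: item -> set of users who purchased it.
purchases = {
    "item1": {"A", "B", "C"},
    "item2": {"B", "D"},
}

# Fix an ordering of all M users so every item vector lines up.
users = sorted({u for buyers in purchases.values() for u in buyers})

def to_vector(item):
    """Binary vector in R^M: entry k is 1 if user k purchased the item."""
    buyers = purchases[item]
    return [1 if u in buyers else 0 for u in users]

print(to_vector("item1"))  # [1, 1, 1, 0]
```

With users ordered *[A, B, C, D]*, item 1 (bought by A, B, C) becomes *[1,1,1,0]*, matching the description above.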

Now that we have our item representations, we can compute similarity.

**Problem 2: Computing Similarity**

Another question when trying to find similarities between mathematical objects is: what do we mean by similarity? There are many ways to compute it. For a few of those methods, check out my blog post on similarity metrics.

Amazon uses cosine similarity, I believe because it’s very simple:

*similarity(i, j) = cos(θ) = (i · j) / (‖i‖ ‖j‖)*

The smaller the angle between the two item vectors (items *i* and *j*), the more similar the two items.
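As a quick sketch (my own illustration, not code from the paper), cosine similarity over these binary item vectors can be computed directly from the definition:

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|). For binary vectors, the dot
    product is simply the number of customers the two items share."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Two items sharing 2 of their 3 purchasers each:
i = [1, 1, 1, 0]
j = [1, 1, 0, 1]
print(round(cosine_similarity(i, j), 3))  # 0.667
```

Here the two items share two common purchasers, giving 2 / (√3 · √3) ≈ 0.667; identical purchaser sets would give 1.0, and disjoint sets 0.0.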

**Problem 3: Scaling**

So, great solution, but the naive runtime isn’t: to compute all similarities, you’d have to loop over every possible item pair, which is *O(N^2)* pairs for *N* items, each similarity taking *O(M)* time.

Since many product pairs have no common customers (perhaps someone who buys mountain biking equipment would not also buy textbooks to learn about mushroom foraging), they propose another method:

```
For each item in product catalog, I1
  For each customer C who purchased I1
    For each item I2 purchased by customer C
      Record that a customer purchased I1 and I2
  For each item I2
    Compute the similarity between I1 and I2
```

This means comparisons are only done between items that have common customers.

It’s *O(N^2M)* in the worst case, but *O(NM)* in practice, as most customers have only a few purchases. Sampling users who purchase the most popular items helps reduce runtime further, with only a small reduction in quality.
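The iteration above can be sketched in Python (a toy illustration with hypothetical data, not Amazon’s production code), assuming purchases are given per customer. For binary vectors, the cosine similarity simplifies to *common purchasers / √(count_i1 · count_i2)*:

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical input: customer -> set of purchased items.
baskets = {
    "A": {"bike", "helmet"},
    "B": {"bike", "helmet", "pump"},
    "C": {"book"},
}

co_counts = defaultdict(int)    # (i1, i2) -> number of common customers
item_counts = defaultdict(int)  # item -> total number of purchasers
for items in baskets.values():
    for item in items:
        item_counts[item] += 1
    # Record co-purchases only for item pairs sharing this customer.
    for i1, i2 in combinations(sorted(items), 2):
        co_counts[(i1, i2)] += 1

# Cosine similarity on binary vectors: dot product = common customers,
# each norm = sqrt(number of purchasers).
similar = {
    (i1, i2): n / math.sqrt(item_counts[i1] * item_counts[i2])
    for (i1, i2), n in co_counts.items()
}

print(similar[("bike", "helmet")])  # 1.0
```

Note that the pair ("bike", "book") never appears in the table at all: no customer bought both, so no comparison is ever made, which is exactly the savings the algorithm is after.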

So they pre-compute the whole similar-items table offline and save it, giving fast *O(1)* lookups at recommendation time.

It’s a great little trick. But there’s a catch! They patented it. Such a simple idea, too!

**Open thoughts, questions, rambling**

Did Amazon seriously patent this? And did the patent seriously expire only in 2018? I find it fascinating that Amazon was able to patent “[c]ollaborative recommendations.” Does that mean that if someone pre-2018 implemented this paper and put it into production, they could have been sued? Why is this a patent, when it’s really just academic research (in my naive opinion and perspective)? I suppose they want to protect their intellectual property (IP), but what they patented seems so rudimentary.

**> The argument *for* patenting**

Yes, Amazon patented their algorithm. But that’s fair for them to do – when added to their infrastructure, it becomes a piece of proprietary technology for the firm. And they wouldn’t want other companies to go out and copy their system. It’s their IP, after all. A patent isn’t just made for physical inventions; a piece of software is an invention too.

**> The argument *against* patenting**

The machine learning community is all about open source. The knowledge of many is greater than the knowledge of one individual. To me, patenting comes across as a way of hoarding your knowledge, and your money. I see patents as a great way to protect inventions, and to give credit to their inventors. But I always saw that as applied to physical inventions – ones you can hold in your hand, and that interact with the built environment. Is a math equation really an invention, or is it simply an advancement in the literature that should be published, and free for anyone to use, and importantly, free for anyone to advance and improve upon?