Recommendation System


Table of Contents

1. Key Points
2. Introduction
3. Exploratory Data Analysis
4. Rank Based Recommendations
5. User Based Collaborative Filtering
6. Matrix Factorization
7. Conclusions
8. Acknowledgements

Key Points
Experimental Design
1. Concepts of Experiment Design
    Types of Experiments
    Types of Sampling
    Measuring Outcomes
    Creating Metrics
    Controlling Variables
    Checking Validity
    Checking Bias
    Ethics in Experimentation
2. Statistical Considerations in Testing
    Statistical Significance
    Practical Significance
    Experiment Size
    Using Dummy Tests
    Non-Parametric Tests
    Missing Data
    Analyzing Multiple Metrics
    Early Stopping
3. A/B Testing
    Building a Funnel
    Deciding on Metrics
    Bias
    Analyze Data
    Draw Conclusions
Recommendation Engines
1. Introduction to Recommendation Engines
    Knowledge Based Recommendations
    Content Based Recommendations
    Collaborative Filtering Based Recommendations
      Model Based Collaborative Filtering
      Neighborhood Based Collaborative Filtering
        Pearson's correlation coefficient
        Spearman's correlation coefficient
        Kendall's Tau
        Euclidean Distance
        Manhattan Distance
2. Matrix Factorization for Recommendations
    Singular Value Decomposition - SVD
    Latent Factors
    Funk SVD
    Cold Start Problem

Introduction

For this project, we analyze the interactions that users have with articles on the IBM Watson Studio platform and recommend new articles we think they will like.


Exploratory Data Analysis

Before making recommendations, we need to explore the data.

We are interested in the distribution of the number of articles each user interacts with in the dataset.
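A minimal sketch of this count, assuming the interactions live in a pandas DataFrame df with one row per interaction and hypothetical columns user_id and article_id. The helper name and the toy values are made up for illustration:

```python
import pandas as pd


def user_interaction_counts(df: pd.DataFrame) -> pd.Series:
    """Interactions per user, most active first; repeat views of an article each count."""
    return df.groupby("user_id")["article_id"].count().sort_values(ascending=False)


# Hypothetical toy interactions: one row per (user, article) interaction.
df = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2, 3],
    "article_id": [10, 10, 20, 20, 30, 10],
})

counts = user_interaction_counts(df)
print(counts.describe())  # count, mean, std, min, quartiles and max of the distribution
print("median interactions per user:", counts.median())
```

To count distinct articles per user instead of raw interactions, `nunique()` can replace `count()`.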



Rank Based Recommendations

Before building personalized recommendations, we find the most popular articles, ranked simply by number of interactions. These are articles we might recommend to new users, for whom we have no interaction history.

Here, the popularity of an article can only be based on how often it was interacted with.
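A sketch of what the top_articles function mentioned later might look like, assuming the same hypothetical df layout (user_id and article_id columns, one row per interaction). Breaking ties by article_id is our own choice, not something the project specifies:

```python
import pandas as pd


def top_articles(n: int, df: pd.DataFrame) -> list:
    """Return the ids of the n most-interacted-with articles."""
    counts = df.groupby("article_id").size()  # interactions per article
    # Most interactions first; ties broken by the smaller article_id for a deterministic order
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [article_id for article_id, _ in ranked[:n]]


# Hypothetical toy interactions
df = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2, 3],
    "article_id": [10, 10, 20, 20, 30, 10],
})
print(top_articles(2, df))  # [10, 20]
```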



User Based Collaborative Filtering

We need a function that reshapes the df dataframe into a user-item matrix, with users as the rows and articles as the columns. The matrix must satisfy the following:

Each user appears in exactly one row.
Each article appears in exactly one column.
If a user has interacted with an article, the cell where that user's row meets that article's column is 1. The number of interactions does not matter: any interaction at all yields a 1.
If a user has not interacted with an article, that cell is 0.
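One way to build this matrix with pandas, sketched under the same assumed df layout; create_user_item_matrix is a hypothetical name and the toy data is made up:

```python
import pandas as pd


def create_user_item_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Users as rows, articles as columns, 1 if the user ever interacted with the article."""
    # crosstab counts interactions for every (user, article) pair, 0 where there were none
    counts = pd.crosstab(df["user_id"], df["article_id"])
    # Collapse any count >= 1 to 1, so repeat interactions do not matter
    return (counts > 0).astype(int)


# Hypothetical toy interactions (user 1 viewed article 10 twice)
df = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2, 3],
    "article_id": [10, 10, 20, 20, 30, 10],
})
user_item = create_user_item_matrix(df)
print(user_item)
```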



Next, we need a function that takes a user_id and returns an ordered list of the users most similar to that user, from most to least similar. The result should not contain the provided user_id itself, since every user is trivially similar to him/herself. Because the entries of the user-item matrix are binary, it (perhaps) makes sense to compute similarity as the dot product of two users' rows, which counts the articles both users have interacted with.
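A minimal sketch of dot-product similarity on a 0/1 user-item matrix. The function name find_similar_users and the toy matrix are hypothetical; ties are simply left in row order here:

```python
import pandas as pd


def find_similar_users(user_id, user_item: pd.DataFrame) -> list:
    """Other users ordered from most to least similar to user_id (dot-product similarity)."""
    # For 0/1 rows, the dot product counts the articles two users have in common
    similarity = user_item.dot(user_item.loc[user_id])
    similarity = similarity.drop(user_id)  # a user is not returned as similar to themselves
    # Stable sort: equally similar users keep their original row order
    return similarity.sort_values(ascending=False, kind="mergesort").index.tolist()


# Hypothetical 3-user x 3-article binary user-item matrix
user_item = pd.DataFrame(
    [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 0]],
    index=[1, 2, 3],
    columns=[10, 20, 30],
)
print(find_similar_users(1, user_item))  # [2, 3]
```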



We then use this most-similar-users function to find articles we can recommend to each user: articles that the most similar users have interacted with but the given user has not.
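A sketch of this basic version, named user_user_recs to match the function mentioned in the next step, but otherwise an assumption of ours. It walks the neighbors from most to least similar and collects articles they interacted with that the given user has not, taking a 0/1 user-item matrix as input; the toy matrix is hypothetical:

```python
import pandas as pd


def user_user_recs(user_id, user_item: pd.DataFrame, m: int = 10) -> list:
    """Up to m unseen articles, taken from the most similar users first."""
    user_row = user_item.loc[user_id]
    seen = set(user_row[user_row == 1].index)

    # Neighbors ordered by dot-product similarity (ties keep row order)
    similarity = user_item.dot(user_row).drop(user_id)
    neighbors = similarity.sort_values(ascending=False, kind="mergesort").index

    recs = []
    for neighbor in neighbors:
        neighbor_row = user_item.loc[neighbor]
        for article in neighbor_row[neighbor_row == 1].index:
            if article not in seen and article not in recs:
                recs.append(article)
        if len(recs) >= m:
            break
    # The last neighbor may push the list past m; this cut keeps articles in column order
    return recs[:m]


# Hypothetical 3-user x 3-article binary user-item matrix
user_item = pd.DataFrame(
    [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 0]],
    index=[1, 2, 3],
    columns=[10, 20, 30],
)
print(user_user_recs(3, user_item))  # [20, 30]
```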



The next step is to improve the consistency of our user_user_recs function.

When several users are equally similar to a given user, instead of choosing among them arbitrarily, choose the users with the most total article interactions before those with fewer.
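A sketch of this tie-breaking rule, assuming "total article interactions" means the number of rows a user has in df (repeat views included). rank_neighbors is a hypothetical helper, and the toy data is made up so that users 1 and 4 tie on similarity to user 3:

```python
import pandas as pd


def rank_neighbors(user_id, df: pd.DataFrame) -> pd.DataFrame:
    """Other users sorted by similarity, breaking ties by total interactions."""
    user_item = (pd.crosstab(df["user_id"], df["article_id"]) > 0).astype(int)
    similarity = user_item.dot(user_item.loc[user_id])
    total_interactions = df.groupby("user_id").size()  # repeat views included
    neighbors = pd.DataFrame({
        "similarity": similarity,
        "num_interactions": total_interactions.reindex(similarity.index),
    }).drop(index=user_id)
    # Most similar first; among equally similar users, the most active first
    return neighbors.sort_values(["similarity", "num_interactions"], ascending=False)


# Hypothetical toy interactions
df = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    "article_id": [10, 10, 20, 20, 30, 10, 10, 30, 30, 30],
})
print(rank_neighbors(3, df))
```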

Likewise, when adding a neighbor's articles takes the number of recommendations from below m (the number requested) to above m, instead of choosing among that neighbor's articles arbitrarily, choose the articles with the most total interactions before those with fewer. This ranking should match the one obtained from the top_articles function.
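A sketch combining both rules, under the same assumed df layout: neighbors are ranked by similarity and then by total interactions, and each neighbor's new articles are ordered by their total interaction counts before the list is cut at m. The function name user_user_recs_ranked and the toy data are hypothetical:

```python
import pandas as pd


def user_user_recs_ranked(user_id, df: pd.DataFrame, m: int = 10) -> list:
    """Up to m recommendations; ties broken by user activity and article popularity."""
    user_item = (pd.crosstab(df["user_id"], df["article_id"]) > 0).astype(int)
    article_pop = df.groupby("article_id").size()  # total interactions per article
    user_activity = df.groupby("user_id").size()   # total interactions per user

    # Rank neighbors by similarity, then by total interactions
    user_row = user_item.loc[user_id]
    neighbors = pd.DataFrame({
        "similarity": user_item.dot(user_row),
        "num_interactions": user_activity,
    }).drop(index=user_id).sort_values(["similarity", "num_interactions"], ascending=False)

    seen = set(user_row[user_row == 1].index)
    recs = []
    for neighbor in neighbors.index:
        row = user_item.loc[neighbor]
        new = [a for a in row[row == 1].index if a not in seen and a not in recs]
        # Most-interacted-with articles first, ties by article_id (a popularity ranking)
        new.sort(key=lambda a: (-article_pop.loc[a], a))
        recs.extend(new)
        if len(recs) >= m:
            break
    return recs[:m]


# Hypothetical toy interactions: article 3 is the most popular one
df = pd.DataFrame({
    "user_id":    [1, 2, 2, 2, 2, 2, 3, 3],
    "article_id": [1, 1, 2, 3, 3, 3, 2, 3],
})
print(user_user_recs_ranked(1, df, m=1))  # [3]
```

With m=1, user 2's unseen articles are 2 and 3; ordering by popularity keeps article 3, whereas keeping articles in column order would have kept article 2.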



Matrix Factorization

In this part, we build a matrix factorization of the user-item matrix and use it to make article recommendations to users on the IBM Watson Studio platform.
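Because every cell of the 0/1 user-item matrix is filled, one option (an assumption here, not necessarily what the notebook does) is plain SVD with numpy.linalg.svd rather than Funk SVD, keeping only k latent factors. The helper names and the toy matrix below are hypothetical:

```python
import numpy as np


def svd_reconstruct(user_item: np.ndarray, k: int) -> np.ndarray:
    """Rebuild the 0/1 user-item matrix from its top-k latent factors."""
    u, s, vt = np.linalg.svd(user_item, full_matrices=False)
    # Keep only the k largest singular values and their latent factors
    approx = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]
    # Round back toward 0/1 predictions
    return np.around(approx)


def reconstruction_accuracy(user_item: np.ndarray, k: int) -> float:
    """Fraction of user-article cells that the rank-k reconstruction gets right."""
    return float(np.mean(svd_reconstruct(user_item, k) == user_item))


# Hypothetical toy matrix: 4 users x 3 articles
user_item = np.array([[1, 1, 0],
                      [0, 1, 1],
                      [1, 0, 0],
                      [1, 0, 1]])
for k in range(1, 4):
    print(k, reconstruction_accuracy(user_item, k))
```

Keeping every latent factor reproduces the matrix exactly, so accuracy on the training data alone cannot choose k; a held-out set of interactions is needed for that. The unrounded scores for cells a user has not interacted with could then rank candidate articles.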

Acknowledgements

The guidance for this notebook and the function descriptions were provided by Udacity to help students build their portfolios.