Movie Recommendation System Using Cosine Similarity

3 min read · May 7, 2021

On the Internet, where the number of choices is overwhelming, there is a need to filter, prioritize, and efficiently deliver relevant information in order to alleviate information overload, which has become a problem for many Internet users. Recommender systems address this problem by searching through large volumes of dynamically generated information to provide users with personalized content and services. This paper explores the characteristics and potential of different prediction techniques in recommendation systems, in order to serve as a compass for research and practice in the field.

Movie Recommendation System (image courtesy: towardDS)

What is a Recommendation System?

Simply put, a recommendation system is a filtering program whose primary goal is to predict the “rating” or “preference” of a user towards a domain-specific item or items. In our case, the domain-specific item is a movie, so the main focus of our recommendation system is to filter and predict only those movies that a user would prefer, given some data about that user.

What is Cosine Similarity?

Cosine similarity measures the similarity between two vectors of an inner product space. It is measured by the cosine of the angle between two vectors and determines whether two vectors are pointing in roughly the same direction. It is often used to measure document similarity in text analysis.
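For two vectors A and B, the definition above works out to cos θ = (A · B) / (‖A‖ ‖B‖). Here is a minimal sketch of that formula in plain Python. The function name and the sample rating vectors are only illustrative, and returning 0 for a zero vector is a convention chosen for this sketch.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Convention used in this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)

# Parallel vectors point in the same direction, so the similarity is about 1.
same_direction = cosine_similarity([5, 4, 0], [10, 8, 0])

# Vectors with no overlapping non-zero entries are orthogonal: similarity 0.
orthogonal = cosine_similarity([5, 0, 0], [0, 3, 0])
```

Note that only the direction matters, not the magnitude: doubling every rating in a vector leaves its similarity to other vectors unchanged.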

In the project below, we are going to use cosine similarity.

You can follow along with the Jupyter notebooks from my GitHub repository.

The data set can be downloaded from: Here

Steps Followed:

Import the libraries and load the two datasets, movies.csv and rating.csv, then merge them on their common columns.
Perform exploratory data analysis and visualization.
Create a matrix ‘movie_user’ with movieId as the row index and userId as the column index, and fill the cells with ratings.
Finally, write a recommender function that returns the top 10 recommended movies.
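The last two steps can be sketched roughly as follows. This is not the notebook's exact code. The toy data, the `recommend` function name, and the choice to fill unrated cells with 0 are assumptions made for illustration, and the similarity is computed with NumPy directly instead of a library helper.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the merged movies/ratings table.
movie_data = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 2, 3, 3],
    "movieId": [10, 20, 30, 10, 20, 30, 10, 40],
    "title":   ["A", "B", "C", "A", "B", "C", "A", "D"],
    "rating":  [5.0, 4.0, 1.0, 4.0, 5.0, 2.0, 1.0, 5.0],
})

# Rows are movies, columns are users; unrated cells become 0 (an assumption).
movie_user = movie_data.pivot_table(
    index="movieId", columns="userId", values="rating"
).fillna(0)

# Cosine similarity between every pair of movie rows.
vectors = movie_user.to_numpy(dtype=float)
norms = np.linalg.norm(vectors, axis=1, keepdims=True)
norms[norms == 0] = 1.0  # avoid division by zero for movies with no ratings
unit = vectors / norms
similarity = pd.DataFrame(
    unit @ unit.T, index=movie_user.index, columns=movie_user.index
)

# Lookup table from movieId to title.
titles = movie_data.drop_duplicates("movieId").set_index("movieId")["title"]

def recommend(movie_id, top_n=10):
    """Return titles of the top_n movies most similar to movie_id."""
    scores = similarity[movie_id].drop(movie_id)  # exclude the movie itself
    best = scores.sort_values(ascending=False).head(top_n)
    return [titles[m] for m in best.index]

recommendations = recommend(10, top_n=2)
```

With the real dataset, `top_n` would stay at its default of 10 to match the final step above.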

Load and merge the datasets to get the final data set
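A minimal sketch of this step, assuming the two files share a `movieId` column and that the movies file carries a `title` column. The inline CSV text below is a hypothetical stand-in so the example runs on its own. With the real files you would pass their paths to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Hypothetical inline stand-ins for movies.csv and rating.csv.
movies_csv = io.StringIO(
    "movieId,title\n1,Toy Story (1995)\n2,Jumanji (1995)\n"
)
ratings_csv = io.StringIO(
    "userId,movieId,rating\n1,1,4.0\n2,1,5.0\n2,2,3.0\n"
)

movies = pd.read_csv(movies_csv)
ratings = pd.read_csv(ratings_csv)

# Inner join on the shared column; each rating row gains its movie title.
movie_data = pd.merge(ratings, movies, on="movieId")
```

The merged `movie_data` frame has one row per rating, which is the shape the later groupby and pivot steps expect.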

Perform Exploratory Data Analysis

# Mean rating per movie title
ratings_mean_count = pd.DataFrame(movie_data.groupby('title')['rating'].mean())
# Number of ratings each title received
ratings_mean_count['rating_count'] = movie_data.groupby('title')['rating'].count()