Photo by cottonbro from Pexels

Movies Recommendation System Using Cosine Similarity

Ameer Hamza


On the Internet, where the number of choices is overwhelming, there is a need to filter, prioritize, and efficiently deliver relevant information. Without this, users face information overload. Recommender systems address the problem by searching through large volumes of dynamically generated information to provide users with personalized content and services. This article walks through one such technique: a movie recommender built on cosine similarity.

Movie Recommendation System (image courtesy of towardDS)

What is a Recommendation System?

Simply put, a recommendation system is a filtering program whose prime goal is to predict the “rating” or “preference” of a user for a domain-specific item or set of items. In our case, the domain-specific item is a movie, so the main focus of our recommendation system is to filter the catalog and predict only those movies a user would prefer, given some data about that user.

What is Cosine Similarity?

Cosine similarity measures the similarity between two vectors of an inner product space. It is measured by the cosine of the angle between two vectors and determines whether two vectors are pointing in roughly the same direction. It is often used to measure document similarity in text analysis.
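To make the definition concrete: for two vectors A and B, the cosine similarity is the dot product divided by the product of the vector lengths, cos(θ) = (A · B) / (‖A‖ ‖B‖). The sketch below computes it with NumPy. The helper name cosine_sim and the example vectors are made up for illustration, and returning 0 for a zero vector is a convention chosen here, not something the formula defines.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm_product = np.linalg.norm(a) * np.linalg.norm(b)
    if norm_product == 0:
        # Assumption: treat a zero vector as similar to nothing.
        return 0.0
    return float(np.dot(a, b) / norm_product)

same_direction = cosine_sim([1, 2, 3], [2, 4, 6])  # parallel vectors
orthogonal = cosine_sim([1, 0], [0, 1])            # perpendicular vectors
print(same_direction, orthogonal)
```

Vectors pointing the same way score close to 1 regardless of their length, while perpendicular vectors score 0. That length-independence is why the measure is popular for comparing rating or word-count vectors.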

The project below uses cosine similarity to find movies that are similar to a given movie.

You can follow along with the Jupyter notebooks from my GitHub repository.

The data set can be downloaded from: Here

Steps Followed:

  1. Import the libraries and load the two datasets, (i) movies.csv and (ii) rating.csv, then merge them on their common column.
  2. Perform exploratory data analysis and visualization.
  3. Create a matrix ‘movie_user’ with movieId as the row index and userId as the column index, and fill the cells with the ratings.
  4. Finally, write a recommender function that returns the top 10 recommended movies.

Load and merge the datasets to get the final data set
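A minimal sketch of this step is shown below. So that it runs on its own, it reads two tiny in-memory tables with made-up rows instead of the real files. The column names (movieId, title, userId, rating) and the choice of movieId as the merge key are assumptions based on the matrix described in the steps above, so check them against the actual CSV headers.

```python
import io
import pandas as pd

# Tiny stand-ins for movies.csv and rating.csv (made-up rows).
# With the real files, use pd.read_csv("movies.csv") and pd.read_csv("rating.csv").
movies_csv = io.StringIO(
    "movieId,title\n"
    "1,Movie A (1995)\n"
    "2,Movie B (1995)\n"
    "3,Movie C (1996)\n"
)
ratings_csv = io.StringIO(
    "userId,movieId,rating\n"
    "1,1,4.0\n"
    "1,2,3.5\n"
    "2,1,5.0\n"
    "2,3,2.0\n"
)

movies = pd.read_csv(movies_csv)
ratings = pd.read_csv(ratings_csv)

# Merge on the shared column (assumed here to be movieId), so every
# rating row also carries the title of the movie it refers to.
movie_data = pd.merge(ratings, movies, on="movieId")
print(movie_data.head())
```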

Perform Exploratory Data Analysis

# Mean rating per movie title
ratings_mean_count = pd.DataFrame(movie_data.groupby('title')['rating'].mean())
# Number of ratings each title received
ratings_mean_count['rating_count'] = movie_data.groupby('title')['rating'].count()

Visualization

Count plot showing the distribution of movie ratings:

Joint Plot:

Denser regions indicate a larger number of ratings. Very few users have given ratings between 3 and 4.

Create the movie_user matrix with a pivot table, using movieId as the row index, userId as the column index, and rating as the cell values.
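A pivot like this can be sketched with pandas' pivot_table. The sample rows below are made up, and filling missing ratings with 0 is an assumption: it keeps the matrix fully numeric for the similarity step, at the cost of treating "not rated" the same as a rating of zero.

```python
import pandas as pd

# Made-up ratings standing in for the merged movie_data frame.
movie_data = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [1, 2, 1, 3, 2],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5],
})

# One row per movie, one column per user, ratings in the cells.
movie_user = movie_data.pivot_table(index="movieId", columns="userId", values="rating")

# Movies a user has not rated come out as NaN; replace them with 0 (assumption).
movie_user = movie_user.fillna(0)
print(movie_user)
```

Each row is now a vector of ratings for one movie, which is the form cosine similarity needs.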

Create a function to get the top 10 recommended movies:
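One possible shape for such a function is sketched below. It is not the notebook's implementation. The name recommend, its parameters, and the sample data are hypothetical, and the cosine similarity is computed directly with NumPy by normalizing each movie's rating vector and taking dot products.

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the merged movie_data frame.
movie_data = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3, 3],
    "movieId": [1, 2, 3, 1, 2, 3, 4],
    "rating":  [5.0, 5.0, 1.0, 4.0, 4.0, 5.0, 4.0],
    "title":   ["Movie A", "Movie B", "Movie C", "Movie A",
                "Movie B", "Movie C", "Movie D"],
})

# Movie-by-user rating matrix; unrated cells are filled with 0 (assumption).
movie_user = movie_data.pivot_table(
    index="movieId", columns="userId", values="rating"
).fillna(0)

# Lookup from movieId to title.
titles = movie_data.drop_duplicates("movieId").set_index("movieId")["title"]

def recommend(title, top_n=10):
    """Return up to top_n titles most similar to `title` (hypothetical helper)."""
    matches = titles[titles == title]
    if matches.empty:
        raise KeyError(f"{title!r} is not in the dataset")
    movie_id = matches.index[0]

    # Scale every movie vector to unit length so a dot product is a cosine.
    matrix = movie_user.to_numpy(dtype=float)
    norms = np.linalg.norm(matrix, axis=1)
    norms[norms == 0] = 1.0  # avoid dividing by zero for unrated movies
    unit = matrix / norms[:, None]

    # Cosine similarity between the chosen movie and every movie.
    target = unit[movie_user.index.get_loc(movie_id)]
    scores = pd.Series(unit @ target, index=movie_user.index)

    # Drop the movie itself, then keep the highest-scoring neighbours.
    best = scores.drop(movie_id).sort_values(ascending=False).head(top_n)
    return [titles.loc[m] for m in best.index]

print(recommend("Movie A", top_n=2))
```

In this toy data, Movie A and Movie B were rated identically by the same users, so Movie B comes back first. With the full dataset loaded, the same call would take a real title from the title column.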

Final Result

Call the function to get the top 10 recommendations for the movie ‘Jumanji, The (1995)’:

Similarly, you can put in the name of any movie present in the dataset and get its top 10 recommendations.

