Movies Recommendation System Using Cosine Similarity
On the Internet, where the number of choices is overwhelming, there is a need to filter, prioritize, and efficiently deliver relevant information in order to alleviate information overload, which has become a problem for many Internet users. Recommender systems address this problem by searching through large volumes of dynamically generated information to provide users with personalized content and services. This paper explores the characteristics and potential of different prediction techniques in recommendation systems in order to serve as a compass for research and practice in the field.
What is a Recommendation System?
Simply put, a recommendation system is a filtering program whose primary goal is to predict the “rating” or “preference” a user would give to a domain-specific item or items. In our case, the domain-specific item is a movie, so the main focus of our recommendation system is to filter and predict only those movies that a user would prefer, given some data about that user.
What is Cosine Similarity?
Cosine similarity measures the similarity between two vectors of an inner product space. It is measured by the cosine of the angle between two vectors and determines whether two vectors are pointing in roughly the same direction. It is often used to measure document similarity in text analysis.
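To make the definition concrete: for two vectors a and b, cosine similarity is their dot product divided by the product of their lengths, so it is 1 for vectors pointing the same way and 0 for perpendicular ones. Below is a minimal NumPy sketch; the helper name `cosine_sim` and the sample vectors are made up for illustration and are not taken from the notebook.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two 1-D vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return float(np.dot(a, b) / denom)

same_direction = cosine_sim([1, 2, 3], [2, 4, 6])  # parallel vectors
orthogonal = cosine_sim([1, 0], [0, 1])            # perpendicular vectors
print(same_direction, orthogonal)
```

Note that only the direction matters: scaling a vector (here, doubling every component) does not change its cosine similarity to another vector.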
The project below uses cosine similarity.
You can follow along with the Jupyter notebooks from my GitHub repository.
The data set can be downloaded from: Here
Steps Followed:
- Import the libraries, load the two datasets (movies.csv and rating.csv), and merge them on their common columns.
- Perform exploratory data analysis and visualization.
- Create a matrix ‘movie_user’ with movieId as the row index, userId as the column index, and the ratings as the cell values.
- Finally, write a recommender function that returns the top 10 recommended movies.
Load and merge the datasets to get the final data set
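A minimal sketch of this step, assuming the two files share a `movieId` column. The rows below are made-up stand-ins so the example runs on its own; with the real files you would load them with `pd.read_csv` instead.

```python
import pandas as pd

# Tiny made-up stand-ins for movies.csv and rating.csv (illustration only).
# With the real files: movies = pd.read_csv("movies.csv"), and likewise for ratings.
movies = pd.DataFrame({
    "movieId": [1, 2, 3],
    "title": ["Movie A (1995)", "Movie B (1995)", "Movie C (1996)"],
})
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [1, 2, 1, 3, 2],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5],
})

# Join on the shared movieId column; an inner join keeps only movies that have ratings.
movie_data = pd.merge(ratings, movies, on="movieId", how="inner")
print(movie_data.head())
```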
Perform Exploratory Data Analysis
ratings_mean_count = pd.DataFrame(movie_data.groupby('title')['rating'].mean())
ratings_mean_count['rating_count'] = pd.DataFrame(movie_data.groupby('title')['rating'].count())
Visualization
Count plot showing the movie ratings:
Joint Plot:
Denser regions of points indicate a larger number of ratings. Very few users have given ratings between 3 and 4.
Create the movies_user matrix with a pivot table, using movieId as the row index, userId as the columns, and rating as the values.
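One way to build this matrix is with pandas' `pivot_table`. The sketch below uses made-up ratings; filling unrated cells with 0 is an assumption made here so that similarities can be computed later, and the notebook may handle missing ratings differently.

```python
import pandas as pd

# Made-up ratings, for illustration only.
movie_data = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "movieId": [1, 2, 1, 3, 2],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5],
})

# Rows are movies, columns are users, cells are ratings.
# fillna(0) treats "not rated" as 0 (an assumption of this sketch).
movies_user = movie_data.pivot_table(
    index="movieId", columns="userId", values="rating"
).fillna(0)
print(movies_user)
```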
Create a function to get the top 10 recommended movies:
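A minimal sketch of such a function, under a few assumptions: the name `recommend` is hypothetical, the data is made up, and the matrix is indexed by title rather than movieId so that a movie can be looked up by name directly. Each movie is treated as a vector of user ratings, and the movies whose vectors have the highest cosine similarity to the target are returned.

```python
import numpy as np
import pandas as pd

# Made-up ratings, for illustration only.
movie_data = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    "title":  ["A", "B", "C", "A", "B", "D", "C", "D", "A", "B"],
    "rating": [5.0, 4.0, 1.0, 4.0, 5.0, 2.0, 5.0, 4.0, 5.0, 5.0],
})

# Movie-by-user matrix, indexed by title here (an assumption of this sketch).
movies_user = movie_data.pivot_table(
    index="title", columns="userId", values="rating"
).fillna(0)

def recommend(title, matrix, top_n=10):
    """Return up to top_n titles most similar to `title` by cosine similarity."""
    if title not in matrix.index:
        raise KeyError(f"{title!r} is not in the dataset")
    values = matrix.to_numpy(dtype=float)
    norms = np.linalg.norm(values, axis=1)
    norms[norms == 0] = 1.0            # avoid dividing by zero for all-zero rows
    unit = values / norms[:, None]     # scale every movie vector to unit length
    target = unit[matrix.index.get_loc(title)]
    # Dot products of unit vectors are cosine similarities.
    scores = pd.Series(unit @ target, index=matrix.index)
    return scores.drop(title).sort_values(ascending=False).head(top_n)

recommendations = recommend("A", movies_user, top_n=2)
print(recommendations)
```

In this toy data, movie B is rated almost identically to movie A by the same users, so it comes out as the closest match.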
Final Result
Call the function to get the top 10 recommendations for the movie ‘Jumanji, The (1995)’:
Similarly, you can put in the name of any movie present in the dataset and get its top 10 recommendations.