Cynthia Fonderson
November 3, 2022

Building & Optimizing A Movie Recommendation System

Posted on November 3, 2022  •  1 minutes  • 212 words

The goal of this project is to build a movie recommendation system, using latest data scraped from GroupLens and The Movie Database. The GroupLens data was last updated on September 26, 2018. The dataset includes data from 283228 users between January 09, 1995 and September 26, 2018, and contains 27,753,444 ratings and 1,108,997 tag applications across 58,098 movies.

Data collection

Movie ids and information were taken from GroupLens and used to scraped movie data from The Movie Database using an API. The scraped data contain information about a movies name, cast, crew, release year, adult rating, poster, revenue and runtime amongst others. Following data cleaning and feature engineering, the dataset had 18 fields descibed below:

Variable Description
id Movie ID in the TMDb
year Movie release year
title Movie title in english
runtime Movie runtime in minutes
collection Collection name, if applicable
genres Movie genres
tagline Movie tagline
overview Plot overview
cast Names of the first 5 cast members
director Movie director name
producer Movie producer(s) name
keywords Movie’s keywords
adult Movie’s adult rating (bool)
prod_comp Name of production company/ies
languages Languages spoken in the movie
popularity Movie’s popularity on TMDb
vote_count Number of users who rated the movie on TMDb
vote_avg Movie’s average score on TMDb



Optimization work flow

workflow

Full Project

Contact me / Contactez-moi

I work on everything data / J'aime tout ce qui concerne la science des données.