Set up a PostgreSQL database and load IMDb data using Docker Compose to facilitate analysis and querying.
- Any machine with Docker and Docker Compose installed.
- `wget` for downloading IMDb datasets.
This project uses Docker Compose to set up a PostgreSQL database and load data from IMDb datasets. The data is downloaded, unzipped, and inserted into PostgreSQL tables.
The IMDb datasets used include:
- `name.basics.tsv.gz`: Information about people involved in the film industry. (Download URL)
- `title.basics.tsv.gz`: Information about film titles. (Download URL)
- `title.crew.tsv.gz`: Crew information for each title. (Download URL)
- `title.episode.tsv.gz`: Information about TV episodes. (Download URL)
- `title.principals.tsv.gz`: Principal cast and crew information. (Download URL)
- `title.ratings.tsv.gz`: User ratings for each title. (Download URL)
- `title.akas.tsv.gz`: Alternative titles for each title in different languages and regions. (Download URL)
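Before loading, it can help to inspect one of these files directly. The following is a minimal sketch, not the loader this project uses. It assumes the files are gzipped, tab-separated, UTF-8 text with a header row, and that missing values are written as `\N`. The `read_rows` helper, the sample header names, and the sample row are all hypothetical.

```python
import csv
import gzip
import os
import tempfile

NULL_MARKER = r"\N"  # assumption: how the dataset files mark a missing value


def read_rows(path):
    """Yield one dict per data row of a gzipped TSV file, mapping the null marker to None."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE treats quote characters inside fields as ordinary text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield {key: (None if value == NULL_MARKER else value)
                   for key, value in row.items()}


# Build a tiny stand-in file; the header names and the row are illustrative only.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.tsv.gz")
with gzip.open(sample_path, "wt", encoding="utf-8", newline="") as handle:
    handle.write("nconst\tprimaryName\tbirthYear\tdeathYear\n")
    handle.write("nm_example_1\tExample Person\t1950\t\\N\n")

rows = list(read_rows(sample_path))
print(rows)
```

To try it on real data, pass the path of a downloaded file to `read_rows` instead of `sample_path`. Because the function is a generator, it streams rows, so large files do not need to fit in memory.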
Table Name | Columns |
---|---|
person | id, primary_name, birth_year, death_year, primary_profession, known_for_titles |
title | id, title_type, primary_title, original_title, is_adult, start_year, end_year, runtime_minutes, genres |
rating | title_id, average_rating, num_votes |
crew | title_id, directors, writers |
episode | id, parent_title_id, season_number, episode_number |
principal | title_id, ordering, person_id, category, job, characters |
title_aka | title_id, ordering, aka, region, language, types, attributes, is_original_title |
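As an illustration of how these tables might be queried together, here is a small sketch that joins `title` and `rating`. It is written under several assumptions: an in-memory SQLite database stands in for PostgreSQL so the example runs without the containers, only a subset of the columns is created, the column types are guesses, `rating.title_id` is assumed to reference `title.id`, and the rows are made-up placeholders.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the PostgreSQL instance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE title (id TEXT PRIMARY KEY, title_type TEXT, primary_title TEXT, start_year INTEGER);
CREATE TABLE rating (title_id TEXT, average_rating REAL, num_votes INTEGER);
""")

# Placeholder rows, not real IMDb data.
conn.executemany("INSERT INTO title VALUES (?, ?, ?, ?)", [
    ("tt_example_1", "movie", "Example Movie A", 2001),
    ("tt_example_2", "movie", "Example Movie B", 2002),
    ("tt_example_3", "tvSeries", "Example Series C", 2003),
])
conn.executemany("INSERT INTO rating VALUES (?, ?, ?)", [
    ("tt_example_1", 7.5, 1200),
    ("tt_example_2", 8.1, 300),
    ("tt_example_3", 9.0, 5000),
])

# Movies with at least a given number of votes, best rated first.
TOP_MOVIES_SQL = """
SELECT t.primary_title, r.average_rating, r.num_votes
FROM title AS t
JOIN rating AS r ON r.title_id = t.id
WHERE t.title_type = ? AND r.num_votes >= ?
ORDER BY r.average_rating DESC
"""
top_movies = conn.execute(TOP_MOVIES_SQL, ("movie", 1000)).fetchall()
print(top_movies)
```

The `SELECT` statement itself uses only standard SQL, so it should carry over to PostgreSQL. The `?` placeholders are SQLite's style; a PostgreSQL driver may expect a different placeholder syntax.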
- Clone the repository and change into its directory: `git clone [email protected]:soumya-codes/imdb-postgres.git`, then `cd imdb-postgres`
- Download the IMDb data files: `make download-dataset`
- Start the Docker Compose setup: `make run`
To stop and remove the Docker containers and network created by Docker Compose, run `make teardown`.