Make sure that the correct host is selected in the configuration file. Also, provide the data paths for Movies Title, Actors Details, and Principals in the configuration, or place the files in the already created folders.
The solution is containerized and automated using Docker and Make. Use the following commands to run the solution.
First, build and start the containers:
make build
Execute the solution:
make run
Execute test cases:
make run-testcases
Stop and remove the containers:
make stop
The solution takes approximately 2 minutes to run with 4 cores and 12 GB of memory.
The distribution graph is stored in resources/graph.
To run the solution locally, open the config file, comment out line 07, and uncomment line 08. Then run:
python runner.py --Remote False
A top-down approach is used in combination with divide-and-conquer: the larger data frame is selected first and reduced using the filters, and the calculations are then performed on the reduced data to achieve better performance.
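The filter-first idea can be illustrated with a minimal sketch. It is not the code from this repository: plain Python lists of dicts stand in for the data frames, and the column names (title_id, person_id, category, type), the sample rows, and the function count_movie_roles are all hypothetical.

```python
from collections import Counter

# Hypothetical sample rows standing in for the Principals (larger) and
# Movies Title (smaller) data; the column names are assumptions.
principals = [
    {"title_id": "t1", "person_id": "p1", "category": "actor"},
    {"title_id": "t1", "person_id": "p2", "category": "director"},
    {"title_id": "t2", "person_id": "p1", "category": "actor"},
    {"title_id": "t3", "person_id": "p3", "category": "actor"},
]
titles = [
    {"title_id": "t1", "type": "movie"},
    {"title_id": "t2", "type": "movie"},
    {"title_id": "t3", "type": "short"},
]


def count_movie_roles(principals, titles, category="actor", title_type="movie"):
    """Count appearances per person, filtering before calculating."""
    # Step 1: reduce the larger table first with a cheap filter.
    reduced = [row for row in principals if row["category"] == category]
    # Step 2: reduce the smaller table to a set of matching title ids.
    wanted = {row["title_id"] for row in titles if row["type"] == title_type}
    # Step 3: run the calculation only on the rows that survived both filters.
    return Counter(row["person_id"] for row in reduced if row["title_id"] in wanted)


result = count_movie_roles(principals, titles)
print(result)
```

Because the filters run before the counting step, the expensive part of the work only touches rows that can contribute to the result. The same ordering applies when the tables are real data frames.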
For more samples of my work, please visit GitHub.
Email :[email protected]