Project 1: Wikipedia view count

This project takes Wikipedia's page view data files, which record how many views each page received. The files are available at https://dumps.wikimedia.org/other/pageviews/
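
As a rough illustration of fetching one input file, the sketch below builds the URL of a single hourly dump. The helper name pageviews_url and its argument order are made up for this example, and the year/year-month/pageviews-date-hour layout is an assumption based on how the dump directory is currently organised, so check the directory listing before relying on it.

```shell
#!/usr/bin/env bash
# pageviews_url YYYY MM DD HH
# Prints the URL of one hourly pageviews dump file.
# Hypothetical helper: the name and argument order are illustrative only.
pageviews_url() {
  printf 'https://dumps.wikimedia.org/other/pageviews/%s/%s-%s/pageviews-%s%s%s-%s0000.gz\n' \
    "$1" "$1" "$2" "$1" "$2" "$3" "$4"
}

# Example (needs network access): peek at the first lines of one file.
# curl -sSL "$(pageviews_url 2020 01 01 00)" | gunzip -c | head
```

Each line of a dump file is expected to hold a domain code, a page title, a view count and a response size, separated by spaces.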

It then combines the views from each country and discards any pages with fewer than 100 views.

This layer of the project lists the top 10 of the resulting page counts, from most to least viewed.
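
The combine, filter and top 10 steps described above can be sketched in plain shell. This is not the project's Hive implementation, only a small stand-in for the same logic. The function name top_pages is hypothetical, the input is assumed to be an uncompressed file with "domain_code page_title view_count response_size" on each line, and the 100-view cutoff is assumed to apply after the views are combined.

```shell
#!/usr/bin/env bash
# top_pages FILE [MIN_VIEWS] [LIMIT]
# Hypothetical helper illustrating the pipeline; defaults: MIN_VIEWS=100, LIMIT=10.
top_pages() {
  awk -v min="${2:-100}" '
    # Sum the view count (field 3) per page title (field 2) across all domain codes.
    NF >= 3 { views[$2] += $3 }
    END {
      # Keep only pages whose combined total reaches the minimum.
      for (page in views)
        if (views[page] >= min) print views[page], page
    }' "$1" |
    sort -k1,1nr -k2,2 |   # highest totals first, ties broken by title
    head -n "${3:-10}"     # keep the top LIMIT rows
}

# Example: top_pages pageviews-sample.txt
```

Each output line is the combined view count followed by the page title.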

Technologies Used

Hive
Big Data Europe's Hive Docker container - https://github.com/big-data-europe/docker-hive
More are listed in the first part of the project, linked at the bottom of this README

To-do list:

Set up Big Data Europe's Hive container:

git clone https://github.com/big-data-europe/docker-hive && cd docker-hive
docker-compose up -d

Copy the output data from the previous part to the namenode container, then open a shell in that container and prepare the data:

docker cp {YOUR HADOOP NAMENODE CONTAINER ID}:output output
docker exec -it {YOUR docker-hive_namenode} bash
hadoop fs -get output output

Now you can exit the namenode container and run:

sbt run