A Docker image to play around with Apache Hadoop in pseudo-distributed mode (a single-node cluster).
Below are the steps to try this image out on Play with Docker.
- First of all, create an account on Docker Hub.
- Log in to Play with Docker using your Docker Hub account.
- You should see a green "Start" button; click it to start a session.
- Click "+ Add new instance" in the left pane to create an instance (a VM).
- A terminal for the new instance should appear in the right pane. Here, pull the Docker image from GitHub Container Registry (GHCR) by executing:
docker pull ghcr.io/kasipavankumar/hadoop-docker:latest
- Once the image has been pulled into the VM, start a new container and attach to its terminal (typically bash) by executing:
docker run -it ghcr.io/kasipavankumar/hadoop-docker:latest
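The pull and run steps above can be wrapped in a small helper script. This is a minimal sketch, not part of the image: the function names `run` and `start_hadoop`, the `DRY_RUN` switch, and the default container name `hadoop` are all hypothetical additions for illustration. Naming the container (via `docker run --name`) makes it easier to reattach to later.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the Hadoop image and start an interactive container.
IMAGE="ghcr.io/kasipavankumar/hadoop-docker:latest"

# Print the command instead of executing it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Pull the image, then start a named container with an interactive TTY.
# The container name defaults to "hadoop" (our assumption, not from the image).
start_hadoop() {
  local name="${1:-hadoop}"
  run docker pull "$IMAGE"
  run docker run -it --name "$name" "$IMAGE"
}
```

Usage: source the script and call `start_hadoop` (or `DRY_RUN=1 start_hadoop` to only print the commands). Because the container is named, `docker run` will refuse to reuse the same name while that container exists; in that case `docker start -ai hadoop` restarts it, and `docker exec -it hadoop bash` opens a second shell in a running container, assuming bash is present in the image.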
At this point, the container boots up and runs all the steps required to start Hadoop.
From then on, you are inside the container's bash shell and can start using Hadoop's filesystem (HDFS) commands. 🚀
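As a first check that HDFS is working, a short smoke test can be run from the container's shell. This is a sketch assuming the `hdfs` CLI is on the `PATH` inside the container; the HDFS directory `/demo`, the local file `/tmp/hello.txt`, and the `run`/`hdfs_smoke_test` helpers are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical HDFS smoke test, meant to run inside the container.
HDFS_DIR="/demo"              # arbitrary example directory in HDFS
LOCAL_FILE="/tmp/hello.txt"   # arbitrary local file to upload

# Print the command instead of executing it when DRY_RUN=1 (hypothetical switch).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

hdfs_smoke_test() {
  # Create a small local file to upload.
  printf 'hello from hadoop\n' > "$LOCAL_FILE"
  run hdfs dfs -mkdir -p "$HDFS_DIR"               # create the directory in HDFS
  run hdfs dfs -put -f "$LOCAL_FILE" "$HDFS_DIR/"  # upload, overwriting if present
  run hdfs dfs -ls "$HDFS_DIR"                     # list the directory contents
  run hdfs dfs -cat "$HDFS_DIR/hello.txt"          # read the file back from HDFS
}
```

If the last command prints `hello from hadoop`, the upload and read path through HDFS works. If the image ships a full JDK (an assumption), `jps` can also be used to list the running Hadoop daemons.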
The final Docker image weighs around 1.8 GB, the majority of which is taken up by Hadoop and Java. When analyzed with Dive, its image efficiency score came out to around 99% (sweet).
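To check these numbers on your own machine, a sketch like the one below reads the image size from `docker image inspect` and, if Dive is installed, runs Dive non-interactively through its CI mode (`CI=true`). It assumes Docker is installed and the image has already been pulled; Dive is optional. The `bytes_to_gb` helper is a hypothetical name, and it reports decimal gigabytes (10^9 bytes), so the result may differ slightly from other tools' rounding.

```shell
#!/usr/bin/env bash
# Sketch: report the local image size and, optionally, Dive's analysis.
IMAGE="ghcr.io/kasipavankumar/hadoop-docker:latest"

# Convert a byte count to decimal gigabytes (1 GB = 10^9 bytes), two decimals.
bytes_to_gb() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1000000000 }'
}

# Only query Docker if it is installed and the image is available locally.
if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
  size_bytes="$(docker image inspect --format '{{.Size}}' "$IMAGE")"
  echo "Image size: $(bytes_to_gb "$size_bytes") GB"

  # Dive's CI mode analyzes the image without the interactive UI.
  if command -v dive >/dev/null 2>&1; then
    CI=true dive "$IMAGE"
  fi
fi
```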
D. Kasi Pavan Kumar (c) 2021