Apache Hadoop using Docker 🐳

A Docker image to play around with Apache Hadoop in Pseudo-Distributed Mode (a single-node cluster).

Below are the steps to play around with this image using Play with Docker.

First, create an account on Docker Hub.
Log in to Play with Docker using the Docker Hub account you just created.
You should see a green "Start" button; click it to start a session.
Click "+ Add new instance" in the left pane to create a VM.
A new terminal should appear in the right pane. Here, we need to pull the Docker image from the GitHub Container Registry (GHCR). To do so, execute:

docker pull ghcr.io/kasipavankumar/hadoop-docker:latest

After the image has been pulled into the VM, we need to start a new container and switch into its terminal (typically bash). To do so, execute:

docker run -it ghcr.io/kasipavankumar/hadoop-docker:latest

At this stage, the container will boot up, executing all the steps required to start Hadoop.
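If you want to come back to the same container later in the session, a variation on the run command above is to give the container a name. The sketch below is only an illustration: the container name `hadoop-playground` and the three helper function names are made up for this example, while the `docker run`, `docker start` and `docker rm` commands and their flags are standard Docker CLI. Whether files stored in HDFS survive a restart depends on how the image is built, which this README does not state.

```shell
# Image reference copied from the commands above.
IMAGE="ghcr.io/kasipavankumar/hadoop-docker:latest"
# Hypothetical container name, chosen for this example only.
NAME="hadoop-playground"

# Start a new, named container and attach to its terminal.
start_hadoop() {
  docker run -it --name "$NAME" "$IMAGE"
}

# Restart the same (stopped) container later and attach to it again.
resume_hadoop() {
  docker start -ai "$NAME"
}

# Delete the container once you are done with it.
remove_hadoop() {
  docker rm "$NAME"
}
```

After pasting these definitions into the VM terminal, run `start_hadoop` once, `resume_hadoop` to re-enter after exiting, and `remove_hadoop` to clean up.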

From now on, you will be inside the container's bash terminal and can start using Hadoop's filesystem commands. 🚀
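As a first thing to try, here is a minimal sketch of a few HDFS filesystem commands. It assumes that the `hdfs` command is on the `PATH` inside the container and that the Hadoop daemons have finished starting; the directory `/user/demo`, the file `/tmp/hello.txt` and the function name `hdfs_smoke_test` are placeholders invented for this example.

```shell
# Hypothetical HDFS directory and local file, used only for this example.
DEMO_DIR="/user/demo"
LOCAL_FILE="/tmp/hello.txt"

hdfs_smoke_test() {
  # Create a small local file to upload.
  echo "hello hadoop" > "$LOCAL_FILE"
  # Create the target directory in HDFS (including parents).
  hdfs dfs -mkdir -p "$DEMO_DIR"
  # Copy the local file into HDFS, overwriting any earlier copy.
  hdfs dfs -put -f "$LOCAL_FILE" "$DEMO_DIR/"
  # List the directory and print the file back from HDFS.
  hdfs dfs -ls "$DEMO_DIR"
  hdfs dfs -cat "$DEMO_DIR/hello.txt"
}

# Only run the commands when the hdfs CLI is actually available.
if command -v hdfs >/dev/null 2>&1; then
  hdfs_smoke_test
else
  echo "hdfs not found on PATH; run this inside the Hadoop container" >&2
fi
```

If the listing and the file contents come back without errors, HDFS is up and accepting reads and writes.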

A note on the size of the image

The final Docker image weighs around 1.8 GB, with Hadoop and Java accounting for most of it. When analyzed using Dive, the image efficiency came out to be around 99% (sweet).
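To check these numbers yourself, something like the following sketch should work once the image has been pulled. It assumes Docker is available and, for the last step, that Dive has been installed separately (it does not ship with Docker); the function name `inspect_image_size` is made up for this example.

```shell
# Image reference copied from the pull command in this README.
IMAGE="ghcr.io/kasipavankumar/hadoop-docker:latest"

inspect_image_size() {
  # Show the total size of the pulled image.
  docker image ls "$IMAGE"
  # Show the size contributed by each layer.
  docker history "$IMAGE"
  # Open Dive's layer and efficiency analysis, if Dive is installed.
  if command -v dive >/dev/null 2>&1; then
    dive "$IMAGE"
  else
    echo "dive is not installed; skipping layer analysis" >&2
  fi
}
```

Paste the definition into a terminal and call `inspect_image_size` to run the checks.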

D. Kasi Pavan Kumar (c) 2021
