- To create a user and add it to the sudo group:

  ```
  sudo adduser testuser
  sudo adduser testuser sudo
  ```
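  To confirm the new account is in the sudo group, a quick check with the standard `groups` command (using the `testuser` example name from above):

  ```
  groups testuser
  # the output should list "sudo" among the groups
  ```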
- For the local host:

  ```
  ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  chmod 0600 ~/.ssh/authorized_keys
  ```

- For other hosts:

  ```
  ssh-copy-id -i ~/.ssh/id_rsa.pub user@host
  ssh user@host
  ```
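  For a multi-node cluster the key has to be copied to every slave. A minimal sketch, assuming hypothetical slave hostnames `slave1` and `slave2` and the same user account on all nodes:

  ```
  for host in slave1 slave2; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub testuser@"$host"   # copy the public key to each slave
    ssh testuser@"$host" hostname                       # should print the slave's hostname without asking for a password
  done
  ```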
- Java setup should be completed and `JAVA_HOME` should be set as an environment variable in the `~/.bashrc` file.
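  A minimal sketch of the `~/.bashrc` entries, assuming the JDK lives under `/usr/lib/jvm/java-8-openjdk-amd64` (adjust the path to your own installation):

  ```
  export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # path is an example; point this at your JDK
  export PATH=$JAVA_HOME/bin:$PATH
  ```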
- Make sure the nodes are set up for password-less SSH both ways (master -> slaves and slaves -> master).
- Since the scripts make heavy use of environment variables, update `~/.bashrc` and delete/comment the check that follows the statement `If not running interactively, don't do anything`:

  ```
  # If not running interactively, don't do anything
  case $- in
      *i*) ;;
        *) return;;
  esac
  ```
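  After the edit, the same block would simply be commented out, for example:

  ```
  # case $- in
  #     *i*) ;;
  #       *) return;;
  # esac
  ```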
- The same username/user account is needed on the master and slave nodes for a multi-node installation.
- The following dependencies must be installed: make, ctags, gcc, flex, bison (see the example below).
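  On a Debian/Ubuntu system (an assumption; package names differ on other distributions), the dependencies could be installed with:

  ```
  sudo apt-get update
  sudo apt-get install -y make exuberant-ctags gcc flex bison   # exuberant-ctags provides ctags
  ```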
- To automate the Hadoop installation, follow these steps:

  ```
  git clone https://github.com/kmadhugit/hadoop-cluster-utils.git
  cd hadoop-cluster-utils
  ```
- Configuration
  - To configure `hadoop-cluster-utils`, run `./autogen.sh`, which will create `config` with appropriate field values.
  - The user can enter SLAVE hostnames (if more than one, comma-separated) interactively while running `./autogen.sh`.
  - By default, `Spark-2.0.1` and `Hadoop-2.7.1` are the versions available for installation.
  - While running `./autogen.sh`, it will prompt for your choice of setting up Hive and MySQL on your master; select the desired option. The Hive and MySQL setup is required for running Spark benchmarks like TPCDS and HiBench.
  - The user can edit the default port values and the Spark and Hadoop versions in `config`.
  - Before executing `./setup.sh`, the user can verify or edit `config`.
  - Once the setup script has completed, `source ~/.bashrc` to export the updated Hadoop and Spark environment variables for the current login session. (The full command sequence is sketched below.)
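  Putting the configuration steps together, the overall flow looks roughly like this (a sketch based on the steps above; the editor is your choice and the prompts are answered interactively):

  ```
  ./autogen.sh       # prompts for slave hostnames and the Hive/MySQL choice, then creates config
  vi config          # optional: verify or edit ports and the Spark/Hadoop versions
  ./setup.sh         # run the automated installation
  source ~/.bashrc   # export the updated Hadoop and Spark environment variables
  ```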
- Ensure that the following Java processes are running on the master. If not, check the log files. Invoke `checkall.sh` to ensure all services are started on the master & slaves.
  - NameNode
  - JobHistoryServer
  - ResourceManager
- Ensure that the following Java processes are running on the slaves. If not, check the Hadoop log files (a quick `jps` check is sketched below).
  - DataNode
  - NodeManager
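  The JDK's `jps` tool (assumed to be on the PATH once `JAVA_HOME/bin` is exported) is a quick way to list these processes on each node:

  ```
  jps              # on the master: expect NameNode, ResourceManager and JobHistoryServer
  ssh slave1 jps   # on a slave ("slave1" is a placeholder hostname): expect DataNode and NodeManager
  ```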
- HDFS, Resource Manager, Node Manager and Spark web addresses

  ```
  HDFS web address : http://localhost:50070
  Resource Manager : http://localhost:8088/cluster
  Node Manager     : http://datanode:8042/node   (for each node)
  Spark            : http://localhost:8080       (default)
  ```
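  A simple reachability check from the master, assuming the default ports above and that `curl` is installed:

  ```
  curl -s -o /dev/null -w "HDFS UI:  %{http_code}\n" http://localhost:50070
  curl -s -o /dev/null -w "YARN UI:  %{http_code}\n" http://localhost:8088/cluster
  curl -s -o /dev/null -w "Spark UI: %{http_code}\n" http://localhost:8080
  # a 200 response code means the corresponding web UI is up
  ```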
- Useful scripts

  ```
  > stop-all.sh                                  # stop HDFS and Yarn
  > start-all.sh                                 # start HDFS and Yarn
  > CP <localpath to file> <remotepath to dir>   # copy a file from the name node to all slaves
  > AN <command>                                 # execute a given command on all nodes including the master
  > DN <command>                                 # execute a given command on all nodes excluding the master
  > checkall.sh                                  # ensure all services are started on the Master & slaves
  ```
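  For example, to push a file to every slave and confirm it arrived (the paths are placeholders, not files shipped with the repo):

  ```
  CP /home/testuser/myconf.xml /home/testuser    # copy the file to the same directory on all slaves
  DN "ls -l /home/testuser/myconf.xml"           # list it on every node except the master
  AN jps                                         # show running Java processes on all nodes, master included
  ```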