Pseudo Distributed Hadoop Setup
You need an operating system that can run Hadoop. Hadoop should run well on most Linux distributions; we have tried Ubuntu and Fedora. Mac OS X 10.6 (Snow Leopard) or later should work as well.
If you are using Windows, try installing Hadoop in a Virtual Machine running Linux.
NOTE: When you use Mac OS X, make sure to replace any /home directory references with /Users.
Make sure you can ssh to your local system
Create an ssh key - no need to do this if you already have one
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Add the ssh key to authorized keys so you can log in without a password
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
Try connecting to localhost with ssh (you should not be prompted for a password)
ssh localhost
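If you want the whole key setup as a single idempotent snippet, something along these lines should work (a sketch; it assumes the default id_dsa path used above):

# generate a key only if one does not already exist
[ -f ~/.ssh/id_dsa ] || ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
# append the public key only if it is not already in authorized_keys
grep -qxF "$(cat ~/.ssh/id_dsa.pub)" ~/.ssh/authorized_keys 2>/dev/null || cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys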
Navigate to http://hadoop.apache.org/releases.html
Click the Download link, then click the Download release now! link
Pick a download mirror
Click the hadoop-1.0.4/ directory link
Download "hadoop-1.0.4-bin.tar.gz"
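Alternatively, if you prefer the command line, older releases are kept on the Apache archive; something like this should fetch the same tarball (the exact URL is an assumption based on the archive layout, so adjust it if the file has moved):

wget -P ~/Downloads http://archive.apache.org/dist/hadoop/core/hadoop-1.0.4/hadoop-1.0.4-bin.tar.gz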
I've unpacked it in a directory named ~/Hadoop:
mkdir ~/Hadoop
cd ~/Hadoop
tar xvzf ~/Downloads/hadoop-1.0.4-bin.tar.gz
Create a hadoop-1.0.4-env file with the following content (modify the JAVA_HOME and HADOOP_PREFIX to match your system):
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.19.x86_64
export HADOOP_PREFIX="${HOME}/Hadoop/hadoop-1.0.4"
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR=$HADOOP_PREFIX/conf
export HADOOP_LIBEXEC_DIR=$HADOOP_PREFIX/libexec
export PATH=$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin:$PATH
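To check that the file resolves correctly on your system, you can source it and ask Hadoop for its version (a quick sanity check; it assumes the paths above match your install):

source hadoop-1.0.4-env
hadoop version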
core-site.xml should have this content (you can modify the value of 'hadoop.tmp.dir' to your liking; remember to replace '/home' with '/Users' for Mac OS X):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/${user.name}/Hadoop/hadoop-1.0.4-store</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
</configuration>
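Hadoop expands ${user.name} from the corresponding Java system property, so the value above resolves under your own home directory. If you want the base directory in place before the first format, you can create it yourself (optional; this simply mirrors the value above):

mkdir -p ~/Hadoop/hadoop-1.0.4-store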
hdfs-site.xml should have this content:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.support.broken.append</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
mapred-site.xml should have this content:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
Edit conf/hadoop-env.sh and add the JAVA_HOME setting that we also set in the previous step above (again, adjust this to match your system):
...
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.19.x86_64
...
Start by sourcing the environment settings
source hadoop-1.0.4-env
Format the Hadoop file system
You only do this step once for a new cluster!
hadoop namenode -format
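If you script your setup, a small guard helps avoid re-formatting an existing cluster by accident (a sketch; it assumes the hadoop.tmp.dir value from core-site.xml above, under which Hadoop 1.x keeps the name node data in dfs/name):

# only format if the name node storage directory does not exist yet
[ -d ~/Hadoop/hadoop-1.0.4-store/dfs/name ] || hadoop namenode -format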
Start Hadoop namenode, datanode and secondary-namenode
start-dfs.sh
Check that you have the dfs daemons running
jps
You should see something like:
[trisberg@localhost ~]$ jps
27932 SecondaryNameNode
27827 DataNode
26384 NameNode
27988 Jps
Start Hadoop job-tracker and task-tracker
start-mapred.sh
Check that you have the dfs and mapred daemons running
jps
You should see something like:
[trisberg@localhost ~]$ jps
28170 TaskTracker
27932 SecondaryNameNode
28053 JobTracker
27827 DataNode
26384 NameNode
28259 Jps
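If all five daemons are running, you can optionally verify MapReduce end to end with one of the examples bundled in the root of the 1.0.4 distribution (the pi arguments are just a small sample size):

hadoop jar $HADOOP_PREFIX/hadoop-examples-1.0.4.jar pi 2 10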
If one of the processes is missing, check the logs for clues about what went wrong. Log files are available in the hadoop-1.0.4/logs directory.
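For example, to scan the NameNode log for errors (daemon log names follow the hadoop-<user>-<daemon>-<hostname>.log pattern, so adjust the glob to your system):

grep -i error ~/Hadoop/hadoop-1.0.4/logs/hadoop-*-namenode-*.log | tail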
Once the cluster is up and running, you can access the web interfaces at these addresses:
- NameNode: http://localhost:50070/dfshealth.jsp
- JobTracker: http://localhost:50030/jobtracker.jsp
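Before moving on, a quick smoke test confirms that HDFS accepts writes and that WebHDFS answers over REST (the file name is an arbitrary example; the relative HDFS path lands in your HDFS home directory):

# write a small file into HDFS and read it back
echo "hello hadoop" > /tmp/hello.txt
hadoop fs -put /tmp/hello.txt hello.txt
hadoop fs -cat hello.txt
# WebHDFS was enabled in hdfs-site.xml; list the HDFS root over REST
curl -s "http://localhost:50070/webhdfs/v1/?op=LISTSTATUS"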
This would be a good time to run the tests for the spring-hadoop project.
- source is available here: https://github.com/SpringSource/spring-hadoop
- run tests using the command:
./gradlew -Phd.fs=hdfs://localhost:8020 -Phd.jt=localhost:8021 clean build
When you are done testing you can use these commands to shut the cluster down:
stop-mapred.sh
stop-dfs.sh