1. Install Java on the machine
a. Copy the jdk1.7.0 folder to /usr/lib/jvm/ [download Java from java.sun.com and extract it to get the jdk1.7.0 folder; save it to a USB device for copying to all machines]. If the jvm folder does not exist in /usr/lib/, you will have to create it first.
b. Link java/javac/jar into alternatives
$> ln -s /usr/lib/jvm/jdk1.7.0/bin/java /etc/alternatives/java
$> ln -s /usr/lib/jvm/jdk1.7.0/bin/javac /etc/alternatives/javac
$> ln -s /usr/lib/jvm/jdk1.7.0/bin/jar /etc/alternatives/jar
If ln does not work [maybe because an older version of Java is already linked to alternatives]:
$> update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.7.0/bin/java 1
$> update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk1.7.0/bin/javac 1
$> update-alternatives --install /usr/bin/jar jar /usr/lib/jvm/jdk1.7.0/bin/jar 1
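If several Java installations are registered, --install alone may not make jdk1.7.0 the active choice. On Debian/Ubuntu systems you can usually pick it interactively (a sketch; the menu lists whatever alternatives are registered on your machine):
$> sudo update-alternatives --config java [type the selection number for /usr/lib/jvm/jdk1.7.0/bin/java; repeat with javac and jar]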
c. Check with
$> java -version
You should see jdk1.7.0 instead of OpenJDK. The command should output something comparable to the following on every node of your cluster:
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0_22-b04)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
2. Install the OpenSSH server and client on the machine
a. $> sudo apt-get install openssh-server
b. $> sudo apt-get install openssh-client
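To confirm the SSH daemon actually came up after the install, a quick check (assuming Ubuntu's service wrapper; the exact status text varies by release):
$> sudo service ssh status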
3. Add a dedicated hadoop user on the machine
a. $> sudo addgroup hadoop
b. $> sudo adduser --ingroup hadoop hduser
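To verify that the user landed in the right group, id can be used; it should list hadoop among hduser's groups:
$> id hduser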
4. Configure ssh
a. $> su - hduser
b. $> ssh-keygen -t rsa -P "" [press Enter to accept the default key file $HOME/.ssh/id_rsa]
c. $> cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
d. $> ssh localhost [to test whether the OpenSSH server has been configured correctly]
e. $> exit
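If ssh localhost still prompts for a password, overly loose permissions on the key files are a common culprit; a typical fix, run as hduser (an assumption, adjust if your permissions already differ):
$> chmod 700 $HOME/.ssh
$> chmod 600 $HOME/.ssh/authorized_keys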
5. Install Hadoop on the machine
a. Download hadoop from hadoop.apache.org to get the hadoop folder, make the changes as per step 6, and then save it to a USB device for copying to all machines
b. $> sudo mv hadoop-1.0.3 hadoop
c. $> sudo cp -r hadoop /home/hduser/.
d. $> cd /home/hduser
e. $> sudo chown -R hduser:hadoop hadoop
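The download from hadoop.apache.org arrives as a tarball, not a ready-made folder; a typical way to unpack it before step 5b (the archive name hadoop-1.0.3.tar.gz is assumed from the version used above):
$> tar xzf hadoop-1.0.3.tar.gz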
6. Configure the hadoop environment [to be done only once after copying hadoop; then copy the folder with the changes to all machines]
a. Open the file hadoop/conf/hadoop-env.sh [$> gedit hadoop/conf/hadoop-env.sh] and set
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0
b. Open the file hadoop/conf/mapred-site.xml [$> gedit hadoop/conf/mapred-site.xml] and replace master with the IP of the master node (see the sample snippets after this step)
c. Open the file hadoop/conf/core-site.xml [$> gedit hadoop/conf/core-site.xml] and replace master with the IP of the master node
d. Open the file hadoop/conf/hdfs-site.xml [$> gedit hadoop/conf/hdfs-site.xml] and replace master with the IP of the master node
e. Open the file hadoop/conf/masters and type the IP of the master
f. Open the file hadoop/conf/slaves and type the IP of each slave, one per line
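For reference, minimal file contents comparable to what these three XML files typically hold under Hadoop 1.x are sketched below. The property names are the standard Hadoop 1.x ones; the port numbers 54310/54311, the replication factor, and the reuse of the dfsdata folder from step 8 as hadoop.tmp.dir are assumptions, so adjust them to your setup and replace master with the IP of the master node.
core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- assumption: reuses the folder created in step 8 -->
    <value>/home/hduser/dfsdata</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- assumption: at most the number of datanodes in the cluster -->
    <value>2</value>
  </property>
</configuration>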
7. Update $HOME/.bashrc
a. $> sudo gedit $HOME/.bashrc
b. Make the following changes to the .bashrc file
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0
export HADOOP_HOME=/home/hduser/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
export HADOOP_CLASSPATH=$HADOOP_HOME
c. To reload the file into the current shell
$> . ~/.bashrc
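Once the new PATH is in effect, the hadoop binary should resolve without a full path; a quick sanity check:
$> which hadoop
$> hadoop version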
8. Create the folders needed by hadoop
a. $> mkdir -p /home/hduser/dfsdata
9. Edit /etc/hosts on all nodes so that the master and every slave can resolve each other by name
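A sketch of what the entries might look like, with made-up addresses; use your nodes' actual IPs and the same hostnames you put in the masters and slaves files:
192.168.1.100  master
192.168.1.101  slave01
192.168.1.102  slave02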
10. Only on master:
a. $> ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave01
b. $> ssh slave01 [to verify whether master is able to talk to slave01 without a password]
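Steps 10a and 10b have to be repeated for every slave listed in hadoop/conf/slaves; with several nodes a small loop saves typing (the slave names here are hypothetical):
$> for s in slave01 slave02; do ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@$s; done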