1. Install Java on the machine
a. Copy the jdk1.7.0 folder to /usr/lib/jvm/ [download Java from java.sun.com and extract it to get the jdk1.7.0 folder; save it to a USB device for copying to all machines]. If the jvm folder does not exist in /usr/lib/, you will have to create it first.
b. Link java/javac/jar into alternatives
$> ln -s /usr/lib/jvm/jdk1.7.0/bin/java /etc/alternatives/java
$> ln -s /usr/lib/jvm/jdk1.7.0/bin/javac /etc/alternatives/javac
$> ln -s /usr/lib/jvm/jdk1.7.0/bin/jar /etc/alternatives/jar
If ln does not work [maybe because an older version of Java is already linked to alternatives]:
$> update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.7.0/bin/java 1
$> update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk1.7.0/bin/javac 1
$> update-alternatives --install /usr/bin/jar jar /usr/lib/jvm/jdk1.7.0/bin/jar 1
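If several Java installations are registered, --install alone may not make jdk1.7.0 the active choice. On Debian/Ubuntu systems you can usually pick it interactively (a sketch; the menu lists whatever alternatives are registered on your machine):
$> sudo update-alternatives --config java [type the selection number for /usr/lib/jvm/jdk1.7.0/bin/java; repeat with javac and jar]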
c. Check with
$> java -version
You should see jdk1.7.0 instead of OpenJDK. The command should output something comparable to the following on every node of your cluster:
java version "1.7.0"
Java(TM) SE Runtime Environment (build 1.7.0_22-b04)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
2. Install the OpenSSH server and client on the machine
a. $> sudo apt-get install openssh-server
b. $> sudo apt-get install openssh-client
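To confirm the SSH daemon actually came up after the install, a quick check (assuming Ubuntu's service wrapper; the exact status text varies by release):
$> sudo service ssh status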
3. Add a dedicated hadoop user on the machine
a. $> sudo addgroup hadoop
b. $> sudo adduser --ingroup hadoop hduser
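To verify that the user landed in the right group, id can be used; it should list hadoop among hduser's groups:
$> id hduser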
4. Configure ssh
a. $> su - hduser
b. $> ssh-keygen -t rsa -P "" [press Enter to accept the default key file $HOME/.ssh/id_rsa]
c. $> cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
d. $> ssh localhost [to test whether the OpenSSH server has been configured correctly]
e. $> exit
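If ssh localhost still prompts for a password, overly loose permissions on the key files are a common culprit; a typical fix, run as hduser (an assumption, adjust if your permissions already differ):
$> chmod 700 $HOME/.ssh
$> chmod 600 $HOME/.ssh/authorized_keys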
5. Install Hadoop on the machine
a. Download hadoop from hadoop.apache.org to get the hadoop folder, make the changes as per step 6, and then save it to a USB device for copying to all machines
b. $> sudo mv hadoop-1.0.3 hadoop
c. $> sudo cp -r hadoop /home/hduser/.
d. $> cd /home/hduser
e. $> sudo chown -R hduser:hadoop hadoop
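The download from hadoop.apache.org arrives as a tarball, not a ready-made folder; a typical way to unpack it before step 5b (the archive name hadoop-1.0.3.tar.gz is assumed from the version used above):
$> tar xzf hadoop-1.0.3.tar.gz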
6. Configure the hadoop environment [to be done only once after copying hadoop; then copy the folder with the changes to all machines]
a. Open the file hadoop/conf/hadoop-env.sh [$> gedit hadoop/conf/hadoop-env.sh] and set
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0
b. Open the file hadoop/conf/mapred-site.xml [$> gedit hadoop/conf/mapred-site.xml] and replace master with the IP of the master node (see the sample snippets after this step)
c. Open the file hadoop/conf/core-site.xml [$> gedit hadoop/conf/core-site.xml] and replace master with the IP of the master node
d. Open the file hadoop/conf/hdfs-site.xml [$> gedit hadoop/conf/hdfs-site.xml] and replace master with the IP of the master node
e. Open the file hadoop/conf/masters and type the IP of the master
f. Open the file hadoop/conf/slaves and type the IP of each slave, one per line
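For reference, minimal file contents comparable to what these three XML files typically hold under Hadoop 1.x are sketched below. The property names are the standard Hadoop 1.x ones; the port numbers 54310/54311, the replication factor, and the reuse of the dfsdata folder from step 8 as hadoop.tmp.dir are assumptions, so adjust them to your setup and replace master with the IP of the master node.
core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- assumption: reuses the folder created in step 8 -->
    <value>/home/hduser/dfsdata</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:54311</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- assumption: at most the number of datanodes in the cluster -->
    <value>2</value>
  </property>
</configuration>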
7. Update $HOME/.bashrc
a. $> sudo gedit $HOME/.bashrc
b. Make the following changes to the .bashrc file
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0
export HADOOP_HOME=/home/hduser/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
export HADOOP_CLASSPATH=$HADOOP_HOME
c. To reload the file into the current shell
$> . ~/.bashrc
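Once the new PATH is in effect, the hadoop binary should resolve without a full path; a quick sanity check:
$> which hadoop
$> hadoop version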
8. Create the folders needed by hadoop
a. $> mkdir -p /home/hduser/dfsdata
9. Edit /etc/hosts on all nodes so that the master and every slave can resolve each other by name
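A sketch of what the entries might look like, with made-up addresses; use your nodes' actual IPs and the same hostnames you put in the masters and slaves files:
192.168.1.100  master
192.168.1.101  slave01
192.168.1.102  slave02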
10. Only on master:
a. $> ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave01
b. $> ssh slave01 [to verify whether master is able to talk to slave01 without a password]
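Steps 10a and 10b have to be repeated for every slave listed in hadoop/conf/slaves; with several nodes a small loop saves typing (the slave names here are hypothetical):
$> for s in slave01 slave02; do ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@$s; done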