Installing Hadoop 2.4.0
First of all, this guide assumes the following setup:
* Ubuntu Server 14.04
* A VM in VirtualBox with 1 GB RAM
* Swap space: more than 2 GB
Start from an up-to-date system:
kutayzorlu@coder_telekom:~$ sudo apt-get update
kutayzorlu@coder_telekom:~$ sudo apt-get upgrade
Moreover, it is advisable not to run Hadoop services as a general-purpose user, so the next step is to add a group hadoop and a user hadoop-user belonging to that group:
kutayzorlu@coder_telekom:~$ sudo addgroup hadoop
kutayzorlu@coder_telekom:~$ sudo adduser --ingroup hadoop hadoop-user
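To double-check that the account ended up in the right group, the following sketch wraps `id -nG`, which prints a user's group names separated by spaces. The function name `user_in_group` is hypothetical and not part of Ubuntu or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if USER belongs to GROUP.
# Relies only on `id -nG USER`, which prints the user's group names
# separated by spaces.
user_in_group() {
    user="$1"
    group="$2"
    # id exits non-zero if the user does not exist
    groups=$(id -nG "$user" 2>/dev/null) || return 1
    for g in $groups; do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}
```

For example, `user_in_group hadoop-user hadoop && echo ok` should print ok after the two commands above.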
Installing Java
Some tutorials suggest a potentially unsafe procedure for installing the JDK through apt-get. Here the JDK is installed manually from the tarball instead:
kutayzorlu@coder_telekom:~$ wget "http://server12.kutayzorlu.com/7u45-b18/jdk-7u45-linux-x64.tar.gz"
...
kutayzorlu@coder_telekom:~$ tar -xvzf jdk-7u45-linux-x64.tar.gz
kutayzorlu@coder_telekom:~$ sudo mkdir /usr/local/java
kutayzorlu@coder_telekom:~$ sudo cp -r jdk1.7.0_45 /usr/local/java
kutayzorlu@coder_telekom:~$ sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_45/bin/javac" 1
kutayzorlu@coder_telekom:~$ sudo update-alternatives --set javac /usr/local/java/jdk1.7.0_45/bin/javac
Finally, a couple of environment variables should be set so that the Java executables are in $PATH and Hadoop knows where Java has been installed. This is easily accomplished by adding

JAVA_HOME=/usr/local/java/jdk1.7.0_45
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export JAVA_HOME
export PATH

at the end of /etc/profile (editing it requires root), so that JAVA_HOME and PATH are set automatically at every login. Then reload the profile and check the installed version:
kutayzorlu@coder_telekom:~$ . /etc/profile
kutayzorlu@coder_telekom:~$ javac -version
javac 1.7.0_45
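Editing /etc/profile by hand works, but a setup script that blindly appends to it duplicates the lines every time it is re-run. Below is a minimal sketch of an idempotent append, assuming a marker comment in the profile file is acceptable; `append_java_env` and the marker text are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: append the JAVA_HOME/PATH lines from this guide to
# a profile file only once, so running it twice does not duplicate them.
append_java_env() {
    profile="$1"
    java_home="$2"
    # A marker comment lets us detect a previous run
    marker="# >>> java environment (added by append_java_env) >>>"
    if grep -qF "$marker" "$profile" 2>/dev/null; then
        return 0
    fi
    {
        printf '%s\n' "$marker"
        printf 'JAVA_HOME=%s\n' "$java_home"
        # Quoted heredoc: $PATH, $HOME and $JAVA_HOME stay literal in the
        # file and are expanded at login time, not now
        cat <<'EOF'
PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
export JAVA_HOME
export PATH
EOF
    } >> "$profile"
}
```

In this guide it would be called as root with /etc/profile and /usr/local/java/jdk1.7.0_45 as its two arguments.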
Set up SSH
Hadoop's control scripts (start-dfs.sh, start-yarn.sh) use SSH to start the daemons, even on a single machine, so an SSH server must be installed:
kutayzorlu@coder_telekom:~$ sudo apt-get install openssh-server
Next, hadoop-user must be given a key pair, and its public key must be authorized for logging in to the local machine:
kutayzorlu@coder_telekom:~$ su - hadoop-user
hadoop-user@coder_telekom:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
The key's randomart image is:
hadoop-user@coder_telekom:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
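With its default StrictModes setting, sshd ignores authorized_keys when ~/.ssh or the file itself is writable by other users, which is a common reason for still being asked for a password. The sketch below uses a hypothetical `authorize_key` helper that appends the key only once and sets the conventional 700/600 permissions.

```shell
#!/bin/sh
# Hypothetical helper: add a public key to a user's authorized_keys once,
# and set the permissions sshd expects (~/.ssh 700, authorized_keys 600).
authorize_key() {
    pubkey="$1"
    home="$2"
    ssh_dir="$home/.ssh"
    auth="$ssh_dir/authorized_keys"
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth"
    key=$(cat "$pubkey") || return 1
    # Append the key only if that exact line is not already present
    if ! grep -qxF "$key" "$auth"; then
        printf '%s\n' "$key" >> "$auth"
    fi
    chmod 600 "$auth"
}
```

As hadoop-user it would be called as `authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME"`, replacing the cat command above.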
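Now hadoop-user should be able to log in to localhost via SSH without providing a password, since the key was generated with an empty passphrase (-P ""). On the first connection SSH asks you to confirm the host key; answer yes: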
hadoop-user@coder_telekom:~$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
Last login:
$
Disable IPv6
Hadoop and IPv6 do not agree on the meaning of the 0.0.0.0 address, so it is simpler to disable IPv6 by adding the following lines to /etc/sysctl.conf:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
Reboot the system, then check the setting:
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
# should print 1, meaning that IPv6 is actually disabled
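The two manual steps above can be scripted. Here is a sketch with two hypothetical helpers: `add_ipv6_sysctl` appends each setting only if it is missing, and `ipv6_disabled` reads the live kernel flag (or any file passed as an argument). As an alternative to rebooting, `sudo sysctl -p` reloads /etc/sysctl.conf, which is normally enough for these settings to take effect.

```shell
#!/bin/sh
# Hypothetical helper: append each disable_ipv6 setting to FILE unless
# that exact line is already present.
add_ipv6_sysctl() {
    conf="$1"
    for iface in all default lo; do
        line="net.ipv6.conf.$iface.disable_ipv6 = 1"
        grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
    done
}

# Hypothetical check: succeed if FILE (default: the kernel's live setting)
# contains 1.
ipv6_disabled() {
    flag="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    [ "$(cat "$flag" 2>/dev/null)" = "1" ]
}
```

Appending to /etc/sysctl.conf requires root; `ipv6_disabled && echo disabled` can be run as any user.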
Hadoop
Download and install Hadoop
Download hadoop-2.4.0.tar.gz, unpack it and move the result to /usr/local, adding a symlink with the friendlier name hadoop and changing ownership of the directory contents to hadoop-user:
kutayzorlu@coder_telekom:~$ wget http://apache.mirrors.pair.com/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz
...
kutayzorlu@coder_telekom:~$ tar -xzvf hadoop-2.4.0.tar.gz
kutayzorlu@coder_telekom:~$ sudo mv hadoop-2.4.0 /usr/local
kutayzorlu@coder_telekom:~$ cd /usr/local
kutayzorlu@coder_telekom:/usr/local$ sudo ln -s hadoop-2.4.0 hadoop
kutayzorlu@coder_telekom:/usr/local$ sudo chown -R hadoop-user:hadoop hadoop-2.4.0
Set up the dedicated user environment
Switch to hadoop-user and add the following lines at the end of ~/.bashrc:
# Set Hadoop-related environment variables
export HADOOP_PREFIX=/usr/local/hadoop
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
# Native Path
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_PREFIX}/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib/native"
# Java path
export JAVA_HOME='/usr/local/java/jdk1.7.0_45'
# Add Hadoop bin/ and sbin/ directories and Java bin/ to PATH
export PATH=$PATH:$HADOOP_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/sbin
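Once .bashrc has been reloaded as described next, a quick sanity check helps catch typos such as a misspelled variable name. This is a sketch; `check_hadoop_env` is a hypothetical helper that only tests whether HADOOP_HOME, HADOOP_CONF_DIR and JAVA_HOME point to existing directories.

```shell
#!/bin/sh
# Hypothetical sanity check: report any of the listed variables that is
# empty or does not point to an existing directory.
check_hadoop_env() {
    status=0
    for var in HADOOP_HOME HADOOP_CONF_DIR JAVA_HOME; do
        # Indirect lookup of the variable named by $var (POSIX-compatible)
        eval "dir=\${$var:-}"
        if [ -z "$dir" ] || [ ! -d "$dir" ]; then
            echo "$var is unset or not a directory: '$dir'"
            status=1
        fi
    done
    return $status
}
```

If it succeeds, `hadoop version` should also run and report the installed release.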
To put the new environment variables in place:
1- reload .bashrc with "source ~/.bashrc";
2- open /usr/local/hadoop/etc/hadoop/hadoop-env.sh;
3- find the line setting JAVA_HOME (uncommenting it if necessary) and set its value to the JDK directory:
export JAVA_HOME=/usr/local/java/jdk1.7.0_45
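The same change can be made without an editor. Below is a sketch assuming GNU sed (the default on Ubuntu) and a JDK path that contains no `|` or `&` characters, since the path is inserted into a sed replacement. `set_hadoop_java_home` is a hypothetical name. It rewrites the first `export JAVA_HOME=` line, commented or not, and appends one if none exists.

```shell
#!/bin/sh
# Hypothetical helper: set (or uncomment and set) the JAVA_HOME export in
# hadoop-env.sh. Assumes GNU sed for -i and the 0,/regex/ address form.
set_hadoop_java_home() {
    env_file="$1"
    java_home="$2"
    if grep -qE '^[#[:space:]]*export JAVA_HOME=' "$env_file"; then
        # Replace only the first matching line, commented or not
        sed -i -E "0,/^[#[:space:]]*export JAVA_HOME=/s|^[#[:space:]]*export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file"
    else
        printf 'export JAVA_HOME=%s\n' "$java_home" >> "$env_file"
    fi
}
```

For this guide it would be called as `set_hadoop_java_home $HADOOP_CONF_DIR/hadoop-env.sh /usr/local/java/jdk1.7.0_45`.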
Configure Hadoop
Before you can actually use the Hadoop file system, some configuration files inside /usr/local/hadoop/etc/hadoop must be modified. All of these files use an XML format, and the new properties go inside the top-level configuration node (likely empty after the Hadoop installation). Specifically:
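- in yarn-site.xml:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
- in core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
- in mapred-site.xml (if the file does not exist yet, copy mapred-site.xml.template to mapred-site.xml first):
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
- in hdfs-site.xml:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop/yarn_data/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop/yarn_data/hdfs/datanode</value>
</property>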
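Since all four files share the same structure, they can also be generated. Here is a minimal sketch with a hypothetical `write_hadoop_conf` helper. Note that it overwrites the whole file (including the license header shipped with Hadoop) and does not XML-escape values, which is fine for the plain values used here.

```shell
#!/bin/sh
# Hypothetical helper: write a complete Hadoop XML configuration file from
# name/value pairs. Overwrites FILE; values must not contain <, > or &.
# Usage: write_hadoop_conf FILE NAME VALUE [NAME VALUE ...]
write_hadoop_conf() {
    file="$1"; shift
    {
        printf '<?xml version="1.0"?>\n'
        printf '<configuration>\n'
        # Consume the arguments two at a time: property name, then value
        while [ "$#" -ge 2 ]; do
            printf '  <property>\n'
            printf '    <name>%s</name>\n' "$1"
            printf '    <value>%s</value>\n' "$2"
            printf '  </property>\n'
            shift 2
        done
        printf '</configuration>\n'
    } > "$file"
}
```

For instance, `write_hadoop_conf $HADOOP_CONF_DIR/mapred-site.xml mapreduce.framework.name yarn` produces the mapred-site.xml settings listed above.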
Then create the local directories referenced in hdfs-site.xml:
hadoop-user@coder_telekom:~$ mkdir -p /usr/local/hadoop/yarn_data/hdfs/namenode
hadoop-user@coder_telekom:~$ mkdir -p /usr/local/hadoop/yarn_data/hdfs/datanode
Formatting the distributed file system
Run the format command as hadoop-user:
hadoop-user@coder_telekom:~$ hdfs namenode -format
...
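Then start HDFS and YARN with the start-dfs.sh and start-yarn.sh scripts, both located in $HADOOP_HOME/sbin (already added to PATH in ~/.bashrc). Finally, create the home directory of hadoop-user in HDFS and run the bundled pi example to test the installation. The NativeCodeLoader warning is harmless: as the message says, Hadoop falls back to its built-in Java classes.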
hadoop-user@coder_telekom:~$ start-dfs.sh
8/08/10 13:18:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop-2.4.0/logs/hadoop-hadoop-user-namenode-coder_telekom.out
localhost: starting datanode, logging to /usr/local/hadoop-2.4.0/logs/hadoop-hadoop-user-datanode-coder_telekom.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.4.0/logs/hadoop-hadoop-user-secondarynamenode-coder_telekom.out
8/08/10 13:18:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop-user@coder_telekom:~$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-user-resourcemanager-coder_telekom.out
localhost: starting nodemanager, logging to /usr/local/hadoop-2.4.0/logs/yarn-hadoop-user-nodemanager-coder_telekom.out
....
hadoop-user@coder_telekom:~$ hdfs dfs -mkdir /user
hadoop-user@coder_telekom:~$ hdfs dfs -mkdir /user/hadoop-user
...
hadoop-user@coder_telekom:~$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar pi 10 1000
...
Job Finished in 22.136 seconds
Estimated value of Pi is 3.142857....
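Besides the pi job, `jps` (shipped in the JDK's bin directory, already in PATH) lists the running Java processes, which gives a quick check that all five daemons came up. Below is a sketch with a hypothetical `check_daemons` filter. In Hadoop 2.x the NameNode and ResourceManager web interfaces are usually also reachable at http://localhost:50070 and http://localhost:8088.

```shell
#!/bin/sh
# Hypothetical check: read `jps` output ("PID ClassName" per line) on stdin
# and report which of the five single-node daemons are not running.
check_daemons() {
    running=$(awk '{print $2}')
    missing=0
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -x: whole-line match, so NameNode does not match SecondaryNameNode
        if ! printf '%s\n' "$running" | grep -qx "$d"; then
            echo "not running: $d"
            missing=1
        fi
    done
    return $missing
}
```

Run `jps | check_daemons` as hadoop-user; a daemon reported as missing usually has a matching log file in /usr/local/hadoop/logs.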
...
# To stop the daemons:
$ stop-dfs.sh
$ stop-yarn.sh