Installing Hadoop 2.4.0 on Ubuntu Server
First of all, these are the prerequisites used in this walkthrough:
* Ubuntu Server 14.04
* VM in VirtualBox with 1 GB RAM
* Swap space: more than 2 GB
* An up-to-date system:
[crayon-669e1d93cdec3561914008/]
It is also advisable not to run Hadoop services as a general-purpose user, so the next step is to add a group hadoop and a user hadoop-user belonging to that group:
[crayon-669e1d93cdeca298352458/]
Installing Java
The tutorials mentioned suggest a potentially unsafe procedure for installing the JDK through apt-get:
[crayon-669e1d93cdecc430629671/]
Finally, a couple of environment variables should be set so that the Java executables are in $PATH and Hadoop knows where Java is installed. This is easily accomplished by adding
[crayon-669e1d93cdecf642948686/]
at the end of /etc/profile. Editing this file sets the Java path automatically at every login.
[crayon-669e1d93cded1822160755/]
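A minimal sketch of the lines that could go at the end of /etc/profile. The JDK path /usr/lib/jvm/java-7-openjdk-amd64 is an assumption (a typical location for OpenJDK 7 on 64-bit Ubuntu 14.04), and the variable names JDK_DIR and SNIPPET_FILE are only illustrative. The lines are written to a scratch file first so they can be reviewed before touching the system file.

```shell
# Assumed JDK location; check where javac really lives with:
#   readlink -f "$(which javac)"
JDK_DIR="/usr/lib/jvm/java-7-openjdk-amd64"

# Write the two export lines to a scratch file for review.
SNIPPET_FILE="$(mktemp)"
cat > "$SNIPPET_FILE" <<EOF
export JAVA_HOME=$JDK_DIR
export PATH=\$PATH:\$JAVA_HOME/bin
EOF

cat "$SNIPPET_FILE"
# Once checked, append it as root:
#   sudo sh -c "cat $SNIPPET_FILE >> /etc/profile"
```

After the next login (or after running "source /etc/profile"), "echo $JAVA_HOME" should print the JDK directory.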
Set up SSH
Hadoop's start-up scripts use SSH to launch the daemons on each node (including localhost), so an SSH server should be installed:
[crayon-669e1d93cded3588356474/]
The hadoop-user must then be given a key pair, and its public key must be granted access to the local machine:
[crayon-669e1d93cded5060250181/]
Now hadoop-user should be able to log in to localhost via ssh without providing a password (the key was created with an empty passphrase):
[crayon-669e1d93cded7965021387/]
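The key setup described above can be sketched as a small shell function. The function name setup_passwordless_ssh is hypothetical, and the sketch assumes the OpenSSH client tools are installed. It takes the target directory as an argument so it can be tried on a scratch directory before being pointed at ~/.ssh.

```shell
# Create an RSA key pair with an empty passphrase in the given directory
# and authorize its public key for logins to this machine.
setup_passwordless_ssh() {
    ssh_dir="$1"
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    # -P "" sets an empty passphrase, -q suppresses the progress output
    ssh-keygen -q -t rsa -P "" -f "$ssh_dir/id_rsa"
    cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}

# Run as hadoop-user:
#   setup_passwordless_ssh "$HOME/.ssh"
#   ssh localhost    # should open a shell without asking for a password
```

An empty passphrase is what allows the Hadoop scripts to log in unattended; the trade-off is that anyone who can read the private key can use it, which is one more reason to keep it under the dedicated user.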
Disable IPv6
Hadoop and IPv6 do not agree on the meaning of the 0.0.0.0 address, so IPv6 should be disabled by editing /etc/sysctl.conf:
[crayon-669e1d93cded9576624144/]
Reboot the system, then check that IPv6 is disabled:
[crayon-669e1d93cdedb946969571/]
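The IPv6 steps above can be sketched as follows. The three sysctl keys are the standard Linux settings for disabling IPv6; the variable name SYSCTL_SNIPPET is only illustrative, and the lines are written to a scratch file rather than directly to /etc/sysctl.conf.

```shell
# Settings that disable IPv6 on all interfaces, including loopback.
SYSCTL_SNIPPET="$(mktemp)"
cat > "$SYSCTL_SNIPPET" <<'EOF'
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
EOF
cat "$SYSCTL_SNIPPET"

# After appending these lines to /etc/sysctl.conf (as root) and rebooting,
# the following should print 1:
#   cat /proc/sys/net/ipv6/conf/all/disable_ipv6
```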
Hadoop
Download and install Hadoop
Download hadoop-2.4.0.tar.gz, unpack it, and move the result to /usr/local, adding a symlink with the friendlier name hadoop and changing ownership of the directory contents to hadoop-user:
[crayon-669e1d93cdedd943038810/]
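A possible sketch of the unpack-and-symlink step, assuming the tarball has already been downloaded. The function name install_hadoop is hypothetical; the prefix is a parameter so the function can be tried on a scratch directory before running it against /usr/local as root.

```shell
# Unpack a Hadoop tarball under a prefix and add a version-independent symlink.
install_hadoop() {
    tarball="$1"   # e.g. hadoop-2.4.0.tar.gz
    prefix="$2"    # e.g. /usr/local
    version_dir="$(basename "$tarball" .tar.gz)"
    tar -xzf "$tarball" -C "$prefix"
    # -n replaces an existing symlink instead of descending into it
    ln -sfn "$prefix/$version_dir" "$prefix/hadoop"
}

# As root:
#   install_hadoop hadoop-2.4.0.tar.gz /usr/local
#   chown -R hadoop-user:hadoop /usr/local/hadoop-2.4.0
```

The chown targets the real directory rather than the symlink, because a recursive chown does not follow a symlink given on the command line by default.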
Set up the dedicated user environment
Switch to hadoop-user and add the following lines at the end of ~/.bashrc:
[crayon-669e1d93cdedf134570468/]
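As a rough guide, the lines added to ~/.bashrc typically look like the sketch below. The exact set of variables is an assumption; it relies only on the /usr/local/hadoop symlink created earlier and on the bin and sbin directories that ship with Hadoop 2.x.

```shell
# Assumed contents for the end of ~/.bashrc
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# bin holds the hadoop and hdfs commands, sbin the start/stop scripts
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```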
To put the new environment variables in place:
1- reload .bashrc with "source ~/.bashrc"
2- open /usr/local/hadoop/etc/hadoop/hadoop-env.sh
3- uncomment the line setting JAVA_HOME and set its value to the JDK directory:
[crayon-669e1d93cdee2481880721/]
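The edit to hadoop-env.sh can also be scripted. The sketch below uses a hypothetical helper, set_java_home, which rewrites the JAVA_HOME line whether or not it is commented out; the JDK path in the usage line is an assumption and should match the real installation.

```shell
# Replace the (possibly commented-out) JAVA_HOME line in an env file.
set_java_home() {
    env_file="$1"
    jdk_dir="$2"
    tmp_file="$(mktemp)"
    sed "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=$jdk_dir|" "$env_file" > "$tmp_file"
    # Copy back instead of moving, so the original file's permissions are kept
    cat "$tmp_file" > "$env_file"
    rm -f "$tmp_file"
}

# As hadoop-user (path is an assumption):
#   set_java_home /usr/local/hadoop/etc/hadoop/hadoop-env.sh /usr/lib/jvm/java-7-openjdk-amd64
```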
Configure Hadoop
Before the Hadoop file system can be used, some configuration files inside /usr/local/hadoop/etc/hadoop must be modified. All of these files are in XML format, and the changes go inside the top-level configuration node (likely empty after the Hadoop installation). Specifically:
- in yarn-site.xml:
[crayon-669e1d93cdee4926871250/]
- in core-site.xml:
[crayon-669e1d93cdee6487571519/]
- in mapred-site.xml:
[crayon-669e1d93cdee8145099540/]
- in hdfs-site.xml:
[crayon-669e1d93cdeea821159560/]
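For orientation, a minimal single-node setup usually needs one property per file. The sketch below generates such files with two hypothetical helpers, write_site and write_hadoop_config. The property names are standard Hadoop 2.x settings, but the values (in particular the hdfs://localhost:9000 address and a replication factor of 1) are assumptions for a one-machine setup, and each generated file replaces the whole configuration node. Note that Hadoop 2.4.0 ships only mapred-site.xml.template, so mapred-site.xml has to be created.

```shell
# Emit a configuration file holding a single property.
write_site() {
    # $1 = target file, $2 = property name, $3 = property value
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# Write assumed single-node settings into a configuration directory.
write_hadoop_config() {
    conf_dir="$1"
    write_site "$conf_dir/core-site.xml"   fs.defaultFS hdfs://localhost:9000
    write_site "$conf_dir/hdfs-site.xml"   dfs.replication 1
    write_site "$conf_dir/mapred-site.xml" mapreduce.framework.name yarn
    write_site "$conf_dir/yarn-site.xml"   yarn.nodemanager.aux-services mapreduce_shuffle
}

# Try it on a scratch directory first, then compare with the real files:
#   write_hadoop_config "$(mktemp -d)"
```

Generating into a scratch directory keeps any existing settings in /usr/local/hadoop/etc/hadoop intact until the output has been reviewed.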
Then run these commands:
[crayon-669e1d93cdeec675939163/]
Formatting the distributed file system
This must be done as hadoop-user:
[crayon-669e1d93cdeee567295102/]
Finally, locate these two scripts (in /usr/local/hadoop/sbin) and run them:
start-dfs.sh
start-yarn.sh
[crayon-669e1d93cdef0158225198/]
[crayon-669e1d93cdef3166745375/]
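The format-and-start sequence can be sketched as a dry-run script: by default it only prints the commands, and it executes them when DRY_RUN=0 is set. The helper names run and start_single_node are hypothetical; the commands themselves are the standard Hadoop 2.x ones and assume that the bin and sbin directories are on the PATH of hadoop-user.

```shell
# Print each command, and execute it only when DRY_RUN is set to 0.
DRY_RUN="${DRY_RUN:-1}"
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

start_single_node() {
    run hdfs namenode -format   # one-time step: erases existing HDFS metadata
    run start-dfs.sh            # NameNode, DataNode, SecondaryNameNode
    run start-yarn.sh           # ResourceManager, NodeManager
    run jps                     # lists the running Java daemons
}

start_single_node
```

Formatting is needed only once; repeating it on a file system that already holds data wipes the NameNode metadata, which is why the sketch defaults to printing rather than executing.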