Installing Hadoop 2.4.0 on Ubuntu Server

First of all, here is the setup this guide assumes:

* Ubuntu Server 14.04
* a VM in VirtualBox with 1 GB of RAM
* swap space of more than 2 GB

The system should also be up to date.
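For example, with the standard Ubuntu package tools:

sudo apt-get update     # refresh the package index
sudo apt-get upgrade    # install available updates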

Moreover, it is advisable not to run Hadoop services as a general-purpose user, so the next step consists in adding a group hadoop and a user hadoop-user belonging to that group:
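A sketch using the standard Ubuntu user-management commands (the group and user names are the ones above):

sudo addgroup hadoop                         # dedicated group
sudo adduser --ingroup hadoop hadoop-user    # dedicated user in that group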

Installing Java
The mentioned tutorials suggest a potentially unsafe procedure for installing the JDK through apt-get, so it is safer to stick to OpenJDK from the official Ubuntu repositories.
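A minimal sketch, assuming the openjdk-7-jdk package from the Ubuntu 14.04 repositories (Hadoop 2.4.0 runs on Java 7):

sudo apt-get install openjdk-7-jdk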

Finally, a couple of environment variables should be set up so that the Java executables are in $PATH and Hadoop knows where Java has been installed. This is easily accomplished by adding the following lines at the end of /etc/profile, so the Java path is set automatically at login:
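A sketch assuming the OpenJDK 7 install path on amd64; adjust JAVA_HOME to wherever your JDK actually lives:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64   # assumed JDK location
export PATH=$PATH:$JAVA_HOME/bin                     # put java executables on PATH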

Set up SSH

Hadoop manages its nodes over SSH, thus the corresponding server should be installed:
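For example, via the standard openssh-server package:

sudo apt-get install openssh-server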

 

and hadoop-user must be associated with a key pair, which is then granted access to the local machine:
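A sketch of the usual procedure: generate an RSA key pair with an empty passphrase (see the note below) and authorize it for local logins:

su - hadoop-user
ssh-keygen -t rsa -P ""                            # empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # authorize the key locally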

 

Now hadoop-user should be able to log in to localhost via ssh without providing a password, since the key pair was created with an empty passphrase:
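A quick check (the first connection will ask to confirm the host key):

ssh localhost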

 

Disable IPv6

Hadoop and IPv6 do not agree on the meaning of the 0.0.0.0 address, so it is advisable to disable IPv6.

To do so, add the following lines at the end of /etc/sysctl.conf:
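These are the standard sysctl switches that disable IPv6 system-wide:

net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1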

Then reboot the system.

After the reboot, check that IPv6 is actually disabled:
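The kernel flag should now read 1:

cat /proc/sys/net/ipv6/conf/all/disable_ipv6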

Hadoop

Download and install Hadoop

Download hadoop-2.4.0.tar.gz, unpack it and move the result into /usr/local, adding a symlink with the friendlier name hadoop and changing ownership of the directory contents to hadoop-user:
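A sketch; the Apache archive URL is an assumption (any mirror carrying hadoop-2.4.0 works):

wget http://archive.apache.org/dist/hadoop/common/hadoop-2.4.0/hadoop-2.4.0.tar.gz
tar xzf hadoop-2.4.0.tar.gz
sudo mv hadoop-2.4.0 /usr/local                          # move into /usr/local
sudo ln -s /usr/local/hadoop-2.4.0 /usr/local/hadoop     # friendlier name
sudo chown -R hadoop-user:hadoop /usr/local/hadoop-2.4.0 # hand over ownership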

Set up the dedicated user environment
Switch to hadoop-user and add the following lines at the end of ~/.bashrc:
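A minimal sketch, assuming the /usr/local/hadoop symlink created above:

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin   # hadoop commands and start/stop scripts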

To have the new environment variables in place:

1- reload .bashrc with "source ~/.bashrc";

2- open /usr/local/hadoop/etc/hadoop/hadoop-env.sh;

3- uncomment the line setting JAVA_HOME and set its value to the JDK directory:
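For instance, with the OpenJDK 7 path assumed earlier:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64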

Configure Hadoop

Before being able to actually use the Hadoop file system it is necessary to modify some configuration files inside /usr/local/hadoop/etc/hadoop. All these files follow the same XML format, and the updates concern the top-level configuration node (likely empty after the Hadoop installation). Specifically, the following files need to be edited (sketches are shown after the list):

  • yarn-site.xml
  • core-site.xml
  • mapred-site.xml
  • hdfs-site.xml
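Minimal single-node sketches of the four files follow; the specific values (the HDFS address hdfs://localhost:9000, replication factor 1, and the local data directories) are common illustrative choices, not something this guide mandates.

In yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

In core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

In mapred-site.xml (created, if needed, by copying mapred-site.xml.template):

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

In hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
  </property>
</configuration>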

Run these commands to create the local directories referenced by hdfs-site.xml:
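A sketch, assuming the illustrative paths from the hdfs-site.xml sketch above:

mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode   # namenode storage (assumed path)
mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode   # datanode storage (assumed path)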

Formatting the distributed file system

Formatting must be performed as hadoop-user:
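With the Hadoop bin directory on PATH, the standard format command is:

hdfs namenode -format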

Then run the two start-up scripts (they live in /usr/local/hadoop/sbin, which is already on PATH):

start-dfs.sh

start-yarn.sh 
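A quick sanity check: jps should list, among others, NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager:

jps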

 

Hadoop security: file and directory permissions

Permissions for both HDFS and local filesystem paths

The following table lists various paths on HDFS and local filesystems (on all nodes) and recommended permissions:

Filesystem  Path                                         User:Group     Permissions
----------  -------------------------------------------  -------------  -----------
local       dfs.namenode.name.dir                        hdfs:hadoop    drwx------
local       dfs.datanode.data.dir                        hdfs:hadoop    drwx------
local       $HADOOP_LOG_DIR                              hdfs:hadoop    drwxrwxr-x
local       $YARN_LOG_DIR                                yarn:hadoop    drwxrwxr-x
local       yarn.nodemanager.local-dirs                  yarn:hadoop    drwxr-xr-x
local       yarn.nodemanager.log-dirs                    yarn:hadoop    drwxr-xr-x
local       container-executor                           root:hadoop    --Sr-s--*
local       conf/container-executor.cfg                  root:hadoop    r-------*
hdfs        /                                            hdfs:hadoop    drwxr-xr-x
hdfs        /tmp                                         hdfs:hadoop    drwxrwxrwxt
hdfs        /user                                        hdfs:hadoop    drwxr-xr-x
hdfs        yarn.nodemanager.remote-app-log-dir          yarn:hadoop    drwxrwxrwxt
hdfs        mapreduce.jobhistory.intermediate-done-dir   mapred:hadoop  drwxrwxrwxt
hdfs        mapreduce.jobhistory.done-dir                mapred:hadoop  drwxr-x---

For more details, see:
http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-common/SecureMode.html#Proxy_user