Spark Practice Notes 2: Installing Hadoop

  1. Download Hadoop: I chose hadoop-2.7.0.

  2. Upload hadoop-2.7.0.tar to ubuntu1 and extract it to /usr/local/hadoop:


mkdir /usr/local/hadoop

tar xvf hadoop-2.7.0.tar -C /usr/local/hadoop/
  3. Edit ~/.bashrc and add the environment variables:

export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.0

export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH

Reload the file so the changes take effect: source ~/.bashrc
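A quick sanity check after reloading (paths match the install layout above; once this passes, hadoop version should also print the 2.7.0 release):

```shell
# Re-assert the variables from ~/.bashrc (same values as above).
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.0
export PATH=${HADOOP_HOME}/bin:$PATH
# Verify that the Hadoop bin directory is actually on PATH.
case ":$PATH:" in
  *":${HADOOP_HOME}/bin:"*) echo "PATH contains HADOOP_HOME/bin" ;;
  *)                        echo "PATH is missing HADOOP_HOME/bin" ;;
esac
```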

  4. Create the required directories:

mkdir ${HADOOP_HOME}/dfs

mkdir ${HADOOP_HOME}/tmp

mkdir ${HADOOP_HOME}/dfs/data

mkdir ${HADOOP_HOME}/dfs/name
  5. Edit the configuration files; they live in ${HADOOP_HOME}/etc/hadoop:
  • Check the JAVA_HOME setting in hadoop-env.sh, yarn-env.sh, and mapred-env.sh; if it is missing or wrong, correct it. JAVA_HOME should be an absolute path in all three files.
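For example, the line in each of the three *-env.sh files might look like this (the JDK path below is only an illustration; substitute your actual install path):

```shell
# In hadoop-env.sh, yarn-env.sh, and mapred-env.sh: use an absolute JDK path,
# not ${JAVA_HOME}. /usr/local/java/jdk1.8.0 is an example location only.
export JAVA_HOME=/usr/local/java/jdk1.8.0
```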

  • Edit the slaves file so it lists the correct worker node names:


root@ubuntu1:/usr/local/hadoop/hadoop-2.7.0/etc/hadoop# vi slaves

root@ubuntu1:/usr/local/hadoop/hadoop-2.7.0/etc/hadoop# more slaves

ubuntu2

ubuntu3

root@ubuntu1:/usr/local/hadoop/hadoop-2.7.0/etc/hadoop#
  • Edit core-site.xml and add the following. This is a minimal configuration; see the official documentation for the meaning of every property.

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ubuntu1:9000/</value>
    <description>The name of the default file system</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-2.7.0/tmp</value>
    <description>A base for other temporary directories</description>
  </property>
</configuration>
  • Edit hdfs-site.xml and add the following. This is a minimal configuration; see the official documentation for the meaning of every property.

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/hadoop-2.7.0/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/hadoop-2.7.0/dfs/data</value>
  </property>
</configuration>

  • Edit mapred-site.xml and add the following. This is a minimal configuration; see the official documentation (mapred-default.xml) for all properties. The file must first be created from its template:

cp mapred-site.xml.template mapred-site.xml


<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
  • Edit yarn-site.xml and add the following. This is a minimal configuration; see the official documentation for the meaning of every property.

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>ubuntu1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
  6. Distribute Hadoop to the other two machines:

scp -r /usr/local/hadoop ubuntu2:/usr/local

scp -r /usr/local/hadoop ubuntu3:/usr/local

Edit the environment variables on ubuntu2 and ubuntu3 in the same way, adding HADOOP_HOME and the PATH entry as in step 3.

  7. Start Hadoop
  • Format the NameNode; on ubuntu1 run:

root@ubuntu1:/usr/local/hadoop/hadoop-2.7.0/bin# hadoop namenode -format

(In Hadoop 2.x this invocation is deprecated; hdfs namenode -format is the equivalent current form.)

  • In ${HADOOP_HOME}/sbin run ./start-dfs.sh, then check with jps: the NameNode process appears on ubuntu1, and a DataNode process appears on ubuntu2 and ubuntu3.

Running ./start-dfs.sh produces a warning:

WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

The thorough fix is to compile Hadoop from source by hand and replace the bundled native-hadoop library with the freshly built one.
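A lighter-weight workaround that is often suggested (it helps in some setups, but recompiling as above is the thorough fix) is to pass the bundled native directory to the JVM explicitly, e.g. in ~/.bashrc:

```shell
# Assumes the install path used throughout these notes.
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.0
# Point the JVM at the bundled native-hadoop library directory.
export HADOOP_OPTS="${HADOOP_OPTS} -Djava.library.path=${HADOOP_HOME}/lib/native"
```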

  • In ${HADOOP_HOME}/sbin run ./start-yarn.sh to bring up YARN, then check each worker's NodeManager status via http://10.211.55.22:8042/ and http://10.211.55.23:8042/

  • Run ./mr-jobhistory-daemon.sh start historyserver to start the job history server; once it is up, job history is available at http://10.211.55.21:19888
  8. Verify Hadoop with wordcount
  • Create the input and output directories in HDFS:

hadoop fs -mkdir -p /data/wordcount

hadoop fs -mkdir -p /output/
  • Put Hadoop's own configuration files into HDFS as the input for wordcount:

hadoop fs -put ${HADOOP_HOME}/etc/hadoop/*.xml /data/wordcount/

  • Run the bundled wordcount example:

hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount /data/wordcount /output/wordcount

  • After the job succeeds, view the results:

hadoop fs -cat /output/wordcount/part-r-00000
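As a cross-check, what the example job computes can be approximated locally with standard tools (a rough sketch of map/shuffle/reduce, not the actual MapReduce implementation; /tmp/wc_sample.txt is made-up sample data):

```shell
# Tiny sample input: two lines, three words.
printf 'hadoop spark\nhadoop\n' > /tmp/wc_sample.txt
# map: split into one word per line; shuffle: sort; reduce: count per key.
# Output format mirrors part-r-00000: word<TAB>count.
tr -s ' ' '\n' < /tmp/wc_sample.txt | sort | uniq -c | awk '{print $2"\t"$1}'
```

This prints hadoop 2 and spark 1, the same tab-separated word/count pairs wordcount would emit for that input.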

With that, the Hadoop installation is complete!