Hadoop 3.3.5 Fully Distributed Cluster Setup


Start from the earlier pseudo-distributed setup and clone two more machines,

giving three virtual machines in total.

Assign each of the three VMs its own static IP address and hostname.

Mine are:

billsaifu 192.168.15.130

hadoop1 192.168.15.131

hadoop2 192.168.15.132

Static IP configuration

#switch to root first

vim /etc/sysconfig/network-scripts/ifcfg-ens33

#change it to the following configuration

TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static

#configure the IP address for the current machine below
IPADDR=192.168.15.130
NETMASK=255.255.255.0
GATEWAY=192.168.15.2
DNS1=192.168.15.2

#-----------------------
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=73e49046-f307-4320-9b82-fc9108ff2f23
DEVICE=ens33
ONBOOT=yes
IPV6_PRIVACY=no
PREFIX=24
DEFROUTE=yes
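
After saving the file, restart networking so the new static address takes effect. A minimal sketch, assuming CentOS 7 with the legacy network service (on NetworkManager-only systems use nmcli instead):

#apply the new configuration and confirm the address
systemctl restart network
ip addr show ens33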

Next, the corresponding Hadoop configuration files.

core-site.xml

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://billsaifu:9000</value>
</property>
<property>
<name>hadoop.http.staticuser.user</name>
<value>bill</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-3.3.5/tmp</value>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
<description>The DataNode has an upper limit on the number of files it serves at the same time; it should be at least 4096.</description>
</property>
</configuration>

hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.replication.min</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/mnt/data01/hadoop</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/mnt/data01/hdfs_dndata</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>freedom</value>
</property>
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
<property>
<name>dfs.datanode.use.datanode.hostname</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.clusterID</name>
<value>hadoopMaster</value>
</property>
<!-- NameNode web UI address -->
<property>
<name>dfs.namenode.http-address</name>
<value>billsaifu:9870</value>
</property>
<!-- SecondaryNameNode (2NN) web UI address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop2:9868</value>
</property>
</configuration>

yarn-site.xml

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>
JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME
</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/mnt/data01/yarn_nmdata</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Log aggregation server URL -->
<property>
<name>yarn.log.server.url</name>
<value>http://billsaifu:19888/jobhistory/logs</value>
</property>
<!-- Keep aggregated logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
<!-- ResourceManager host -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop1</value>
</property>
</configuration>

mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>billsaifu:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>billsaifu:19888</value>
</property>
</configuration>
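
One more file matters for a fully distributed cluster: start-dfs.sh and start-yarn.sh read etc/hadoop/workers to decide which hosts run DataNodes and NodeManagers. A minimal sketch, assuming all three machines act as workers (which matches dfs.replication=3 above):

#/usr/local/hadoop-3.3.5/etc/hadoop/workers, one hostname per line
billsaifu
hadoop1
hadoop2

Remember to distribute this file (and the XML files above) to all three nodes, for example with the xsync script shown later.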

Here I use billsaifu as the master node, hadoop1 for resource management (ResourceManager), and hadoop2 as the 2NN (SecondaryNameNode).

I also created a new group named freedom to serve as Hadoop's superuser group,

created a new user bill and added bill to that group,

and granted the group permissions on the Hadoop directories:

sudo groupadd freedom

sudo useradd -m bill

sudo passwd bill

sudo usermod -aG freedom bill

mkdir -p /home/bill/bin

#change the owner and group of data01 to bill:freedom

sudo chown -R bill:freedom /mnt/data01

sudo chmod -R 755  /mnt/data01
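
A quick sanity check that the user, group, and permissions ended up as intended (just a verification sketch):

id bill                 # freedom should appear among bill's groups
ls -ld /mnt/data01      # should show bill:freedom and drwxr-xr-x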

You also need to set the corresponding hostnames:

vi /etc/hostname #set to billsaifu, hadoop1, hadoop2 respectively

#set up the hostname mapping

vi /etc/hosts


127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.15.130  billsaifu
192.168.15.131  hadoop1
192.168.15.132  hadoop2
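
The xsync and myhadoop scripts below log in to the other nodes over SSH, so passwordless SSH between the three hosts is assumed. A minimal sketch (run as the bill user on each node):

ssh-keygen -t rsa        # accept the defaults
ssh-copy-id billsaifu
ssh-copy-id hadoop1
ssh-copy-id hadoop2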

Below are a few commonly used scripts.

xsync distribution script

#!/bin/bash
#1. Check the number of arguments
if [ $# -lt 1 ]
then
  echo "Not Enough Arguments!"
  exit
fi
#2. Loop over every machine in the cluster
for host in billsaifu hadoop1 hadoop2
do
  echo ==================== $host ====================
  #3. Loop over all files/directories passed in and send them one by one
  for file in $@
  do
    #4. Check whether the file exists
    if [ -e $file ]
    then
      #5. Get the parent directory (resolving symlinks)
      pdir=$(cd -P $(dirname $file); pwd)
      #6. Get the file name
      fname=$(basename $file)
      ssh $host "mkdir -p $pdir"
      rsync -av $pdir/$fname $host:$pdir
    else
      echo "$file does not exist!"
    fi
  done
done
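
Put the script in /home/bill/bin (created earlier; on CentOS that directory is on bill's PATH by default) and make it executable. For example, to push the Hadoop configuration to all three nodes:

chmod +x /home/bill/bin/xsync
xsync /usr/local/hadoop-3.3.5/etc/hadoop/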

myhadoop start/stop script

#!/bin/bash
if [ $# -lt 1 ]; then
  echo "No Args Input..."
  exit
fi
case $1 in
"start")
  echo " =================== 启动 hadoop 集群 ==================="
  echo " --------------- 启动 hdfs ---------------"
  ssh billsaifu "/usr/local/hadoop-3.3.5/sbin/start-dfs.sh"
  echo " --------------- 启动 yarn ---------------"
  ssh hadoop1 "/usr/local/hadoop-3.3.5/sbin/start-yarn.sh"
  echo " --------------- 启动 historyserver ---------------"
  ssh billsaifu "/usr/local/hadoop-3.3.5/bin/mapred --daemon start historyserver"
  ;;
"stop")
  echo " =================== 关闭 hadoop 集群 ==================="
  echo " --------------- 关闭 historyserver ---------------"
  ssh billsaifu "/usr/local/hadoop-3.3.5/bin/mapred --daemon stop historyserver"
  echo " --------------- 关闭 yarn ---------------"
  ssh hadoop1 "/usr/local/hadoop-3.3.5/sbin/stop-yarn.sh"
  echo " --------------- 关闭 hdfs ---------------"
  ssh billsaifu "/usr/local/hadoop-3.3.5/sbin/stop-dfs.sh"
  ;;
*)
  echo "Input Args Error..."
  ;;
esac
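
Before the very first start, the NameNode must be formatted once on billsaifu (do this only on a fresh cluster, as it wipes HDFS metadata). After that, a rough usage sketch:

hdfs namenode -format            # run once on billsaifu as bill
chmod +x /home/bill/bin/myhadoop
myhadoop start
myhadoop stop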

jpsall check script

#!/bin/bash

declare -a hosts=("billsaifu" "hadoop1" "hadoop2")

for host in "${hosts[@]}"
do
  echo "=============== $host ==============="
  ssh -t  "$host" /usr/local/java8/jdk1.8.0_371/bin/jps
done
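
Make it executable and run it from any node; with the role layout above (and the workers file listing all three hosts), each host should roughly show the following daemons:

chmod +x /home/bill/bin/jpsall
jpsall
# billsaifu: NameNode, DataNode, NodeManager, JobHistoryServer
# hadoop1:   ResourceManager, DataNode, NodeManager
# hadoop2:   SecondaryNameNode, DataNode, NodeManager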

Configuration is now complete, so start the cluster. Note that the user bill in the freedom group needs the appropriate permissions on these scripts and on the Hadoop files, otherwise Hadoop will not start; it is recommended to change the ownership of the relevant files to bill:freedom as well.

You can see that there are three live nodes.
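
Two quick ways to verify this (the web port comes from dfs.namenode.http-address above):

hdfs dfsadmin -report | grep "Live datanodes"
# or open the NameNode web UI in a browser: http://billsaifu:9870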