HDFS Cluster Setup: Fully Distributed

Published 2023-07-05 16:08:07 | Author: 等不到的口琴

The main difference from pseudo-distributed mode is where each role runs: in a fully distributed deployment, the roles are spread across different nodes.

1. Base environment: deployment configuration

NN: core-site.xml

DN: workers: node01

SNN: hdfs-site.xml  dfs.namenode.secondary.http-address  node01:50090

2. Per-role startup details:

dfs.namenode.name.dir

dfs.datanode.data.dir

3. Initialize & start

Format (creates the initial FsImage and VERSION)

start-dfs.sh
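
For reference, a minimal core-site.xml sketch for this non-HA layout. The article does not show this file, so the hdfs://node01:9000 address is an assumption (only the fact that the NameNode runs on node01 is given); adjust the port to whatever your cluster actually uses.

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node01:9000</value>  <!-- assumed port; 8020 or 9820 are also common -->
    </property>
</configuration>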

My setup here has four nodes in total:

node01: 192.168.182.111

node02: 192.168.182.112

node03: 192.168.182.113

node04: 192.168.182.114

Check that the nodes can ping each other:

[root@node03 ~]# ping node04
PING node04 (192.168.182.114) 56(84) bytes of data.
64 bytes from node04 (192.168.182.114): icmp_seq=1 ttl=64 time=0.799 ms
64 bytes from node04 (192.168.182.114): icmp_seq=2 ttl=64 time=0.815 ms
^C

Once the network is reachable, distribute the local files (the JDK, Hadoop, and so on) to the other machines:

scp ./jdk  node02:/root/

Then configure /etc/profile and the other environment files on the three remaining machines; a sketch of the typical variables follows.
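
A sketch of the environment variables typically appended to /etc/profile (or ~/.bash_profile) and then sourced on every node. The Hadoop path matches the one visible in the log output later in this article; the JDK path is an assumption.

export JAVA_HOME=/usr/local/lib/jdk                       # assumed JDK install path
export HADOOP_HOME=/usr/local/lib/hadoop/hadoop-3.1.3     # path seen in later log output
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin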

Passwordless SSH:

1. start-dfs.sh starts the daemons on the other nodes over SSH ===> the public key has to be distributed

2. Distribute the public key

scp id_rsa.pub   node02:/root/.ssh/node01.pub
scp id_rsa.pub   node03:/root/.ssh/node01.pub
scp id_rsa.pub   node04:/root/.ssh/node01.pub

Then on node02, node03 and node04, append the key to the authorized_keys file:

cat /root/.ssh/node01.pub >> /root/.ssh/authorized_keys 
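
The three scp commands and the per-node cat can also be collapsed into a single loop run from node01; a sketch, assuming the key pair already exists under /root/.ssh (each scp/ssh will still prompt for a password until the key is in place):

for h in node02 node03 node04; do
  scp /root/.ssh/id_rsa.pub "$h":/root/.ssh/node01.pub                   # copy node01's public key over
  ssh "$h" 'cat /root/.ssh/node01.pub >> /root/.ssh/authorized_keys'     # append it to authorized_keys
done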

Next, edit hdfs-site.xml on node01:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/var/bigdata/hadoop/full/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/var/bigdata/hadoop/full/dfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node02:50090</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>/var/bigdata/hadoop/full/dfs/secondary</value>
    </property>
</configuration>
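
The DataNode membership is controlled by $HADOOP_HOME/etc/hadoop/workers (one hostname per line). A sketch of its contents, assuming the DataNodes run on node02–node04, which matches the jps output further down:

node02
node03
node04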

Distribute the modified Hadoop directory (configuration included) to the remaining nodes:

  scp -r hadoop-3.1.3/  node02:`pwd`
  scp -r hadoop-3.1.3/  node03:`pwd`
  scp -r hadoop-3.1.3/  node04:`pwd`

Check the relevant directory first; at this point the full directory we configured should not exist yet:

[root@localhost hadoop]# ll
total 0
drwxr-xr-x. 3 root root 17 Feb 28 05:37 local

Format the NameNode:

hdfs namenode -format

Start HDFS. The different roles come up on their respective nodes, and /var/bigdata/hadoop/full/dfs/data/current gets created on the DataNode hosts.

[root@localhost hadoop]# start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [node01]
Last login: Sun Mar  5 03:15:48 PST 2023 from 192.168.182.217 on pts/4
Starting datanodes
Last login: Sun Mar  5 04:23:48 PST 2023 on pts/1
Starting secondary namenodes [node02]
Last login: Sun Mar  5 04:23:50 PST 2023 on pts/1

Run jps on each node to see which processes started:

[root@localhost current]# jps
12113 NameNode
12945 Jps
[root@node02 var]# jps
3600 SecondaryNameNode
3513 DataNode
3771 Jps
[root@node03 current]# jps
3544 DataNode
3721 Jps
[root@node04 hadoop]# jps
3923 Jps
3765 DataNode

Web UI

Problems with this architecture

Master/slave cluster: a relatively simple structure in which the master coordinates the slaves.

Master: a single point, so data consistency is easy to manage.

Problems:

  • A single point of failure makes the whole cluster unavailable

    A NameNode crash, a broken network connection, or an out-of-memory error can all cause this single point to fail. Since the first step of every HDFS read or write is to look up metadata on the NameNode, once the NameNode goes down, HDFS goes down with it.

  • Too much load, and memory is the limit

    First, what "memory-limited" means: the NameNode keeps its metadata in memory, so the heap size caps how much metadata it can hold. With a large cluster but a small NameNode heap, once the metadata hits that ceiling no new data can be written, even though the DataNodes still have plenty of free space; the NameNode's memory, not the disks, limits the storage capacity of the whole cluster (a back-of-envelope estimate is sketched below). Second, the load problem: in a large cluster with a single NameNode, every client read and write has to talk to that NameNode, and the stream of requests and checks puts it under heavy pressure, so this single NameNode becomes the bottleneck of the entire cluster.
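
A rough sense of "memory-limited" (a back-of-envelope sketch; the ~150 bytes of NameNode heap per file/block object is a widely quoted rule of thumb, not a figure from this article):

# ~100 million blocks * ~150 bytes of NameNode heap per metadata object
echo "$((100 * 1000 * 1000 * 150 / 1024 / 1024 / 1024)) GiB"   # prints "13 GiB": roughly 14 GiB of heap just for metadata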

HDFS's solutions

Single point of failure

High-availability scheme: HA (High Availability)

Multiple NameNodes with active/standby failover.

Too much load, memory-limited

Federation mechanism: Federation (metadata sharding)

Multiple NameNodes, each managing a different slice of the metadata.

Single point of failure: the HDFS HA solution

Simplified line of thought

Are the distributed nodes known in advance? Are their weights known? Strong consistency undermines availability; majority (quorum) agreement strikes a balance between strong consistency and high availability; the simplest form of self-coordination is master/slave.

Electing the master: the number of nodes and their weights must be known.

The master's duties:

Master: handles creates, deletes, updates and reads.

Slaves: serve reads; writes are forwarded to the master.

Master and slaves: a write counts as synced once a majority of nodes have it.

For the NameNode single point of failure, HDFS offers an active/standby NameNode scheme. In 2.x at most two NameNodes are supported, i.e. one active and one standby; 3.x supports more, with three recommended, since too many nodes means more inter-node communication overhead and more uncertainty.

Problem recap

For the sake of discussion, assume two NameNodes, one active (a) and one standby (s). Clients send their requests to the active NameNode; if it can no longer serve them, the standby must take over seamlessly. Taking over seamlessly means the standby has to hold the same data as the active node, which raises the question of how the two stay in sync. The NameNode stores two kinds of metadata: the block locations of files across the DataNodes, and the namespace operations issued by clients (for example hdfs dfs -mkdir /a). When a client performs an operation against the active NameNode, should replicating it to the standby block the client or not? If it blocks, performance suffers badly: strong consistency destroys availability (see the CAP theorem). If it does not block, what happens when the write to the standby fails? Does the active NameNode need to roll back? This is what the JournalNodes (three of them, acting like a small piece of middleware) are for: when the active NameNode is written to, it forwards the edit to the JournalNodes, and once more than half of them acknowledge it, the confirmation is returned to the active NameNode. In other words, the JournalNodes decouple data consistency from the NameNodes themselves.

Journal Node

The JournalNode ensemble works like a message middleware with its own leader/follower mechanism. It starts out leaderless; when the JournalNodes come up they elect a leader (by default the JournalNode with the largest ID). The NameNode talks to the leader JournalNode, which records the request and propagates it to the other JournalNodes. To guarantee that the propagated edits are current, the JournalNodes use a Paxos-style quorum protocol (similar to ZooKeeper): roughly speaking, once more than half of the nodes have written a request, it counts as successfully committed. This ensures that when the Standby NameNode reads from the JournalNodes it sees the latest metadata and can stay in sync.

ZKFC

A new question arises: how does the Standby NameNode get promoted to Active? ===> Manually, which is clearly not workable, so a further mechanism is needed to automate switching the Standby NameNode to Active.

That mechanism is ZooKeeper's ZKFC (ZKFailoverController, managing active & standby), which performs the automatic switchover. The FailoverController runs on the same physical machine as its NameNode, so the two only need cross-process calls rather than calls across the network; if they lived on different machines their communication would have to go over the network, introducing a lot of unreliability and making the monitoring itself risky.

Note: ZKFC requires passwordless SSH (a sketch of what that implies in this layout follows).
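
A sketch of what that usually means here (an assumption beyond the one-line note above): node01's key was distributed earlier, but node02 also runs a ZKFC and the sshfence method, so node02 needs its own key pair and at least node01 has to trust it:

# on node02
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa                               # generate node02's key pair
scp ~/.ssh/id_rsa.pub node01:/root/.ssh/node02.pub
ssh node01 'cat /root/.ssh/node02.pub >> /root/.ssh/authorized_keys'   # let node01 trust node02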

ZKFC responsibilities

  • Monitor the NameNode: watch the NameNode's liveness as well as OS and hardware health.
  • Connect to ZooKeeper and create the Active NameNode node: ZooKeeper keeps a directory tree. When the NameNodes and ZKFCs start, each ZKFC asks ZooKeeper for a creation lock; whoever gets the lock creates an ephemeral child node. Of the two ZKFCs only one succeeds, and the child node it creates records which NameNode is Active.
  • Demote the Active NameNode and promote the Standby: if the Active NameNode is alive but its ZKFC process dies, this mechanism kicks in. The ephemeral node in ZooKeeper is removed, the callback fires, the other ZKFC grabs the creation lock, demotes the old Active NameNode to Standby, and promotes its own Standby NameNode to Active.

Active NameNode switchover process

  • First, when the NameNodes and ZKFCs start, each ZKFC asks ZooKeeper for a creation lock; whoever obtains it creates an ephemeral child node, so only one of the two ZKFCs succeeds, and that node records the Active NameNode. The node ZKFC creates is ephemeral (as opposed to a persistent znode): it exists only while the ZKFC's session with ZooKeeper is alive, and once the session drops the node is deleted, which triggers the callback mechanism.
  • If the Active NameNode dies, its ZKFC demotes it to Standby and deletes the ephemeral node in ZooKeeper. The callback fires, the second ZKFC grabs the creation lock, checks whether the Active NameNode really is down, and if so promotes its Standby NameNode to Active.
  • If the Active NameNode is still alive but the ZKFC on its host has died, the ephemeral node is removed, the callback fires, the second ZKFC grabs the creation lock, demotes the old Active NameNode to Standby itself, and promotes its Standby NameNode to Active.
  • Extreme case: the Active NameNode and its ZKFC are both fine and the Active NameNode can still reach the DataNodes, but its ZKFC cannot reach ZooKeeper, and the second ZKFC cannot reach the Active NameNode to confirm whether it is really down, so it will not promote its own NameNode. The ephemeral node has been deleted and no new one gets created, leaving the cluster stuck. The probability of this is extremely low, so the remedies are correspondingly drastic: one option is a fencing mechanism that simply cuts the power to the current Active NameNode.

Question: what other mechanisms are implemented in a similar way to this kind of event callback?

Note: there is no SNN role in HA mode; the SecondaryNameNode exists only in non-HA mode.

The Paxos algorithm

Not covered in detail here; see Lamport's "The Part-Time Parliament" (lamport-part-time-parliament).

HA environment setup

HOST     NN    JNN   DN    ZKFC   ZK
node01   *     *           *
node02   *     *     *     *      *
node03         *     *            *
node04               *            *

hdfs-site.xml

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>node01:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>node02:8020</value>
</property>

<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>node01:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>node02:50070</value>
</property>

<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node01:8485;node02:8485;node03:8485/mycluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/var/bigdata/hadoop/ha/dfs/jn</value>
</property>

<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>

<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
</property>

 <property>
   <name>dfs.ha.automatic-failover.enabled</name>
   <value>true</value>
 </property>

core-site.xml

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property> 
<property>
   <name>ha.zookeeper.quorum</name>
   <value>node02:2181,node03:2181,node04:2181</value>
 </property>

Deployment details

After all of the necessary configuration options have been set, you must start the JournalNode daemons on the set of machines where they will run. This can be done by running the command “hdfs --daemon start journalnode” and waiting for the daemon to start on each of the relevant machines.

Once the JournalNodes have been started, one must initially synchronize the two HA NameNodes’ on-disk metadata.

  • If you are setting up a fresh HDFS cluster, you should first run the format command (hdfs namenode -format) on one of NameNodes.
  • If you have already formatted the NameNode, or are converting a non-HA-enabled cluster to be HA-enabled, you should now copy over the contents of your NameNode metadata directories to the other, unformatted NameNode(s) by running the command “hdfs namenode -bootstrapStandby” on the unformatted NameNode(s). Running this command will also ensure that the JournalNodes (as configured by dfs.namenode.shared.edits.dir) contain sufficient edits transactions to be able to start both NameNodes.
  • If you are converting a non-HA NameNode to be HA, you should run the command “hdfs namenode -initializeSharedEdits”, which will initialize the JournalNodes with the edits data from the local NameNode edits directories.

At this point you may start all your HA NameNodes as you normally would start a NameNode.

You can visit each of the NameNodes’ web pages separately by browsing to their configured HTTP addresses. You should notice that next to the configured address will be the HA state of the NameNode (either “standby” or “active”.) Whenever an HA NameNode starts, it is initially in the Standby state.

In short:

Once all the required configuration is in place, start the JournalNode daemon on every machine that will run one ("hdfs --daemon start journalnode") and wait for it to come up on each of them.
With the JournalNodes running, the on-disk metadata of the two HA NameNodes has to be synchronized first.

  • If you are setting up a brand-new HDFS cluster, first run the format command (hdfs namenode -format) on one of the NameNodes.

  • If a NameNode is already formatted, or you are converting a non-HA cluster to HA, copy the NameNode metadata directories over to the unformatted NameNode(s) by running "hdfs namenode -bootstrapStandby" on the unformatted node(s). This also ensures the JournalNodes (as configured by dfs.namenode.shared.edits.dir) contain enough edit transactions to start both NameNodes.

  • If you are converting a non-HA NameNode to HA, run "hdfs namenode -initializeSharedEdits", which seeds the JournalNodes with the edits from the local NameNode edits directories.

    At this point you can start all the HA NameNodes just as you would normally start a NameNode.
    You can visit each NameNode's web page at its configured HTTP address; next to the address you will see the NameNode's HA state ("standby" or "active"). Whenever an HA NameNode starts, it is initially in the Standby state.

Startup procedure

  • Start the JournalNodes first: hadoop-daemon.sh start journalnode
  • Pick one NameNode and format it: hdfs namenode -format
  • Start that formatted NameNode so the other one can sync from it: hadoop-daemon.sh start namenode
  • On the other NameNode run: hdfs namenode -bootstrapStandby
  • Format ZooKeeper: hdfs zkfc -formatZK
  • start-dfs.sh (a consolidated sketch of the whole sequence follows)
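
A consolidated sketch of the same sequence, using the hdfs --daemon form that Hadoop 3.x recommends (the deprecated hadoop-daemon.sh still works, as the warnings below show). Commands run on node01 unless noted otherwise:

hdfs --daemon start journalnode                 # run on node01, node02 and node03
hdfs namenode -format                           # on node01 only, first time
hdfs --daemon start namenode                    # start the freshly formatted NameNode
ssh node02 'hdfs namenode -bootstrapStandby'    # or run it directly on node02
hdfs zkfc -formatZK                             # creates /hadoop-ha/mycluster in ZooKeeper
start-dfs.sh                                    # brings up NNs, DNs, JNs and ZKFCs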

Hands-on details

1. Stop the HDFS cluster that is currently running

[root@localhost current]# jps
12113 NameNode
129274 Jps
[root@localhost current]# stop-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Stopping namenodes on [node01]
Last login: Sun Mar  5 04:23:54 PST 2023 on pts/1
Stopping datanodes
Last login: Wed Mar  8 00:17:54 PST 2023 on pts/1
Stopping secondary namenodes [node02]
Last login: Wed Mar  8 00:17:56 PST 2023 on pts/1
[root@localhost current]# jps
129769 Jps

2. Deploy ZooKeeper

  • Install the JDK
  • Configure the JDK for ZooKeeper (a sketch of the install and environment setup follows)
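
A sketch of the ZooKeeper install and environment setup assumed here; the 3.5.7 paths match the zkServer.sh output further down, while the JDK path and tarball name are assumptions:

mkdir -p /usr/local/lib/zookeeper/zookeeperData                             # dataDir used in zoo.cfg below
tar -xzf apache-zookeeper-3.5.7-bin.tar.gz -C /usr/local/lib/zookeeper/
cat >> /etc/profile <<'EOF'
export JAVA_HOME=/usr/local/lib/jdk
export ZOOKEEPER_HOME=/usr/local/lib/zookeeper/apache-zookeeper-3.5.7-bin
export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin
EOF
source /etc/profile
cp $ZOOKEEPER_HOME/conf/zoo_sample.cfg $ZOOKEEPER_HOME/conf/zoo.cfg         # then edit zoo.cfg as below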

Configuration file:

[root@node02 conf]# vim zoo.cfg 
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/lib/zookeeper/zookeeperData
clientPort=2181
server.1=192.168.182.111:2888:3888
server.2=192.168.182.112:2888:3888
server.3=192.168.182.113:2888:3888
server.4=192.168.182.114:2888:3888

2181: client connections

2888: peer communication between servers

3888: leader election

In the configured dataDir, create the myid file that identifies this server in leader election (see the note after the listing):

[root@node02 zookeeper]# cd zookeeperData/
[root@node02 zookeeperData]# ll
total 4
-rw-r--r--. 1 root root  2 Nov 29 05:08 myid
drwxr-xr-x. 2 root root 65 Nov 29 05:29 version-2
[root@node02 zookeeperData]# cat myid
1
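
The number in myid has to match this host's server.N entry in zoo.cfg, and each node gets its own value; a sketch (the value 2 is only an example):

echo 2 > /usr/local/lib/zookeeper/zookeeperData/myid   # on the host listed as server.2 in zoo.cfg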

Start ZooKeeper:

[root@node02 zookeeperData]# zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/lib/zookeeper/apache-zookeeper-3.5.7-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[root@node02 zookeeperData]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/lib/zookeeper/apache-zookeeper-3.5.7-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Error contacting service. It is probably not running.  # it has started, but cannot reach the other nodes
[root@node02 zookeeperData]# jps
30323 Jps

Suspecting that the port was already in use:

[root@node02 zookeeperData]# netstat -apn | grep 2181
tcp6       0      0 :::2181                 :::*                    LISTEN      31023/java          
unix  2      [ ACC ]     STREAM     LISTENING     21815    673/VGAuthService    /var/run/vmware/guestServicePipe
unix  2      [ ]         DGRAM                    21810    673/VGAuthService    
unix  3      [ ]         STREAM     CONNECTED     21819    1/systemd            /run/systemd/journal/stdout
[root@node02 zookeeperData]# kill -9 31023

It still fails, so take a look at the ZooKeeper startup log:

[root@node02 zookeeperData]# cat zookeeper-root-server-node04.out 
java.net.NoRouteToHostException: No route to host (Host unreachable)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:656)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:713)
	at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:626)
	at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)

A quick check shows the firewall was never turned off:

[root@node02 zookeeperData]# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
   Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
   Active: active (running) since Sun 2023-03-05 02:00:41 PST; 3 days ago
     Docs: man:firewalld(1)
 Main PID: 759 (firewalld)
    Tasks: 2
   CGroup: /system.slice/firewalld.service
           └─759 /usr/bin/python2 -Es /usr/sbin/firewalld --nofork --nopid

Mar 05 02:00:41 localhost.localdomain systemd[1]: Starting firewalld - dynamic firewall daemon...
Mar 05 02:00:41 localhost.localdomain systemd[1]: Started firewalld - dynamic firewall daemon.
[root@node02 zookeeperData]# systemctl stop firewalld     # stop the firewall
[root@node02 ~]# systemctl disable firewalld              # keep it from starting on boot

Check ZooKeeper again:

[root@node04 logs]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/lib/zookeeper/apache-zookeeper-3.5.7-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader
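
To check the whole ensemble instead of one node at a time, a quick loop over the ZooKeeper hosts (a sketch; relies on the passwordless SSH set up earlier and on zkServer.sh being on the PATH for non-interactive shells):

for h in node02 node03 node04; do
  echo "== $h =="
  ssh "$h" 'zkServer.sh status'
done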

Update the configuration files (the HA versions of core-site.xml and hdfs-site.xml shown above), then distribute them:

[root@localhost hadoop]# scp core-site.xml hdfs-site.xml  node02:`pwd`
core-site.xml                                                                                   100% 1001   840.7KB/s   00:00 
hdfs-site.xml                                                                                   100% 2466     2.0MB/s   00:00 
[root@localhost hadoop]# scp core-site.xml hdfs-site.xml  node03:`pwd`
core-site.xml                                                                                   100% 1001     1.1MB/s   00:00 
hdfs-site.xml                                                                                   100% 2466     1.0MB/s   00:00 
[root@localhost hadoop]# scp core-site.xml hdfs-site.xml  node04:`pwd`
core-site.xml                                                                                   100% 1001   635.0KB/s   00:00 
hdfs-site.xml  

Start the JournalNodes:

[root@localhost hadoop]# hadoop-daemon.sh start journalnode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.

Meanwhile, open another terminal on node03 to watch the JournalNode log:

[root@node03 logs]# tail -f hadoop-root-journalnode-node03.log 
2023-03-08 23:11:54,403 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler@21e360a{/logs,file:///usr/local/lib/hadoop/hadoop-3.1.3/logs/,AVAILABLE}
2023-03-08 23:11:54,404 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.s.ServletContextHandler@58d75e99{/static,file:///usr/local/lib/hadoop/hadoop-3.1.3/share/hadoop/hdfs/webapps/static/,AVAILABLE}
2023-03-08 23:11:54,511 INFO org.eclipse.jetty.server.handler.ContextHandler: Started o.e.j.w.WebAppContext@710c2b53{/,file:///usr/local/lib/hadoop/hadoop-3.1.3/share/hadoop/hdfs/webapps/journal/,AVAILABLE}{/journal}
2023-03-08 23:11:54,540 INFO org.eclipse.jetty.server.AbstractConnector: Started ServerConnector@18302335{HTTP/1.1,[http/1.1]}{0.0.0.0:8480}
2023-03-08 23:11:54,540 INFO org.eclipse.jetty.server.Server: Started @3010ms
2023-03-08 23:11:54,676 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNode: RPC server is binding to 0.0.0.0:8485
2023-03-08 23:11:54,751 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 500 scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler
2023-03-08 23:11:54,767 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8485
2023-03-08 23:11:54,969 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2023-03-08 23:11:54,970 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8485: starting

Now format the NameNode; either of the two NameNodes will do:

[root@localhost hadoop]# hdfs namenode -format 

Check the freshly written VERSION file:

[root@localhost current]# cat VERSION 
#Wed Mar 08 23:27:46 PST 2023
namespaceID=1867933108
clusterID=CID-7e565f8d-877d-4e71-807e-2cc1933bbc50
cTime=1678346866613
storageType=JOURNAL_NODE
layoutVersion=-64
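
The VERSION shown above is from the JournalNode storage (storageType=JOURNAL_NODE). A quick sanity check that the clusterID is identical on all three JournalNodes (a sketch; the path is an assumption derived from the dfs.journalnode.edits.dir configured earlier plus the mycluster journal id):

for h in node01 node02 node03; do
  echo "== $h =="
  ssh "$h" 'grep clusterID /var/bigdata/hadoop/ha/dfs/jn/mycluster/current/VERSION'
done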

Format ZKFC

Before formatting the ZooKeeper state, look at the znodes it currently holds:

[root@node03 ~]# zkCli.sh
[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper]

Format the ZooKeeper state on one of the NameNode hosts:

[root@localhost current]# hdfs zkfc -formatZK
......
2023-03-08 23:40:23,830 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.

The znodes in ZooKeeper now look like this:

[root@node03 ~]# zkCli.sh
[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 2] ls /
[hadoop-ha, zookeeper]
[zk: localhost:2181(CONNECTED) 3] ls /hadoop-ha
[mycluster]

Starting HDFS in HA mode

Start HDFS:

[root@localhost current]# start-dfs.sh 
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [node01 node02]
Last login: Wed Mar  8 22:15:57 PST 2023 from 192.168.182.112 on pts/4
node01: namenode is running as process 84532.  Stop it first.
Starting datanodes
Last login: Wed Mar  8 23:46:27 PST 2023 on pts/1
Starting journal nodes [node01 node02 node03]
ERROR: Attempting to operate on hdfs journalnode as root
ERROR: but there is no HDFS_JOURNALNODE_USER defined. Aborting operation.
Starting ZK Failover Controllers on NN hosts [node01 node02]
ERROR: Attempting to operate on hdfs zkfc as root
ERROR: but there is no HDFS_ZKFC_USER defined. Aborting operation.

The errors complain that the user variables for these roles are not defined, so define them:

vi ~/.bash_profile

export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HDFS_ZKFC_USER=root

. ~/.bash_profile
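
The same variables can also live in $HADOOP_HOME/etc/hadoop/hadoop-env.sh (a common alternative, not used in this article), which keeps them with the Hadoop installation rather than the shell profile:

# $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root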

Start again: perfect!

[root@localhost current]# start-dfs.sh 
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [node01 node02]
Last login: Wed Mar  8 23:51:28 PST 2023 on pts/1
node02: namenode is running as process 45336.  Stop it first.
node01: namenode is running as process 84532.  Stop it first.
Starting datanodes
Last login: Wed Mar  8 23:52:12 PST 2023 on pts/1
node04: datanode is running as process 41819.  Stop it first.
node02: datanode is running as process 45434.  Stop it first.
node03: datanode is running as process 39678.  Stop it first.
Starting journal nodes [node01 node02 node03]
Last login: Wed Mar  8 23:52:12 PST 2023 on pts/1
node03: journalnode is running as process 39315.  Stop it first.
node02: journalnode is running as process 44913.  Stop it first.
node01: journalnode is running as process 83718.  Stop it first.
Starting ZK Failover Controllers on NN hosts [node01 node02]
Last login: Wed Mar  8 23:52:14 PST 2023 on pts/1

ZooKeeper lock acquisition demo

Check in ZooKeeper which NameNode currently holds the lock:

[zk: localhost:2181(CONNECTED) 9] get /hadoop-ha/mycluster/ActiveStandbyElectorLock
	myclusternn1node01 �>(�>

Visiting the two NameNode web UIs confirms it: node01 is in the active state and node02 is in standby, i.e. whoever grabs the lock becomes Active.

(Screenshot: the node01 web UI showing the active state.)

(Screenshot: the node02 web UI showing the standby state.)
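
The same state can be read from the command line with hdfs haadmin (a quick check, not shown in the original), using the nn1/nn2 ids from dfs.ha.namenodes.mycluster:

hdfs haadmin -getServiceState nn1    # expected: active  (node01 currently holds the lock)
hdfs haadmin -getServiceState nn2    # expected: standby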

Automatic NameNode active/standby failover

[root@localhost current]# jps
84532 NameNode
87559 DFSZKFailoverController
83718 JournalNode
94438 Jps
[root@localhost current]# kill -9 84532

[zk: localhost:2181(CONNECTED) 10] get /hadoop-ha/mycluster/ActiveStandbyElectorLock
	myclusternn2node02 �>(�>

(Screenshot: the node02 web UI now showing the active state.)

Restart the NameNode on node01:

[root@localhost current]# hadoop-daemon.sh start namenode
WARNING: Use of this script to start HDFS daemons is deprecated.
WARNING: Attempting to execute replacement "hdfs --daemon start" instead.

(Screenshot: the node01 web UI back up, now in the standby state.)