HBase Distributed Deployment (Intermediate Level)

Published: 2023-07-06 18:49:23  Author: 雙_木

1. HBase Distributed Deployment (Intermediate Level)

1. Lab Task 1: Pre-deployment Preparation

1.1. Step 1: Install and deploy the Hadoop HA distributed environment
1.2. Step 2: Extract the installation archive
[root@master ~]# cd
[root@master ~]# ls
anaconda-ks.cfg         jdk-8u152-linux-x64.tar.gz
hadoop-2.7.1.tar.gz     zookeeper-3.4.8.tar.gz
hbase-1.2.1-bin.tar.gz
[root@master ~]# tar xf hbase-1.2.1-bin.tar.gz -C /usr/local/src
[root@master ~]# cd /usr/local/src
[root@master src]# ls
hadoop  hbase-1.2.1  jdk  zookeeper
[root@master src]# mv hbase-1.2.1/ hbase
# Configure environment variables
[root@master src]# vi /etc/profile.d/hbase.sh

export HBASE_HOME=/usr/local/src/hbase
export PATH=$PATH:$HBASE_HOME/bin
# Distribute the environment variable file
[root@master src]# scp -r /etc/profile.d/hbase.sh root@slave1:/etc/profile.d/
hbase.sh                          100%   74    69.5KB/s   00:00    
[root@master src]# scp -r /etc/profile.d/hbase.sh root@slave2:/etc/profile.d/
hbase.sh                          100%   74    80.8KB/s   00:00  
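To verify the distribution, source the new profile script on each node and check the variable (a quick sanity check; shown here for slave1):
[root@slave1 ~]# source /etc/profile.d/hbase.sh
[root@slave1 ~]# echo $HBASE_HOME
/usr/local/src/hbase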

2. Lab Task 2: Modify the Configuration Files (master node)

2.1. Step 1: Edit the files under conf
HBase's configuration files are kept in the conf folder under the installation directory. Switch to that directory, then first edit the environment file hbase-env.sh and set JAVA_HOME to the JDK you installed. Append the following settings to the end of hbase-env.sh.
[root@master conf]# vi hbase-env.sh
export JAVA_HOME=/usr/local/src/jdk
export HADOOP_HOME=/usr/local/src/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HBASE_MANAGES_ZK=false 
export HBASE_LOG_DIR=${HBASE_HOME}/logs
export HBASE_PID_DIR=${HBASE_HOME}/pid
# Next, edit hbase-site.xml and add the following properties between its <configuration> and </configuration> tags.
[root@master conf]# vi hbase-site.xml
<configuration>
    <property>
        <name>hbase.rootdir</name>
         <value>hdfs://master:8020/hbase</value>
    </property>
    <property>
         <name>hbase.master.info.port</name>
         <value>16010</value>
    </property>
    <property>
         <name>hbase.zookeeper.property.clientPort</name>
         <value>2181</value>
    </property>
    <property>
        <name>hbase.tmp.dir</name>
        <value>/usr/local/src/hbase/tmp</value>
    </property>
    <property>
         <name>zookeeper.session.timeout</name>
         <value>120000</value>
    </property>
    <property>
         <name>hbase.cluster.distributed</name>
         <value>true</value>
    </property>
    <property>
         <name>hbase.zookeeper.quorum</name>
         <value>master,slave1,slave2</value>
    </property>
    <property>
         <name>hbase.zookeeper.property.dataDir</name>
        <value>/usr/local/src/hbase/tmp/zookeeper-hbase</value>
    </property>
</configuration>

hbase.rootdir: the HBase storage directory on HDFS. Since this cluster runs HDFS in HA mode, it is generally safer to point this at the HA nameservice (for example hdfs://<nameservice>/hbase, matching fs.defaultFS in core-site.xml) rather than a single NameNode; hdfs://master:8020/hbase only works while master is the active NameNode.

hbase.master.info.port: the port for the master's web UI.

hbase.zookeeper.property.clientPort: the ZooKeeper client connection port.

hbase.tmp.dir: the path where HBase writes local files, analogous to hadoop.tmp.dir.

zookeeper.session.timeout: the session timeout between a RegionServer and ZooKeeper. When it expires, ZooKeeper removes the RegionServer from the cluster list; once the HMaster receives the removal notice, it rebalances the regions that server was responsible for so that other live RegionServers take them over.

hbase.cluster.distributed: whether HBase runs in distributed mode.

hbase.zookeeper.quorum: the servers in the ZooKeeper ensemble; the default value is localhost.

hbase.zookeeper.property.dataDir: the data directory of the ZooKeeper instance that HBase would manage itself; it only takes effect when HBASE_MANAGES_ZK=true (here it is false).

[root@master conf]# vi regionservers

slave1
slave2
# So that HBase can read Hadoop's configuration, copy core-site.xml and hdfs-site.xml into $HBASE_HOME/conf/.
[root@master conf]# cp /usr/local/src/hadoop/etc/hadoop/core-site.xml /usr/local/src/hbase/conf/
[root@master conf]# cp /usr/local/src/hadoop/etc/hadoop/hdfs-site.xml /usr/local/src/hbase/conf/

2.2. Step 2: Distribute to the cluster
Distribute the HBase installation configured on the master node to the slave1 and slave2 nodes.
[root@master ~]# scp -r /usr/local/src/hbase root@slave1:/usr/local/src  # copy the hbase files from master to slave1
[root@master ~]# scp -r /usr/local/src/hbase root@slave2:/usr/local/src  # copy the hbase files from master to slave2
[root@master ~]# chown -R hadoop:hadoop /usr/local/src/
[root@master ~]# ll /usr/local/src
total 4
drwxr-xr-x. 11 hadoop hadoop  172 May 23 23:21 hadoop
drwxr-xr-x.  7 hadoop hadoop  160 Jun  3 03:37 hbase
drwxr-xr-x.  8 hadoop hadoop  255 Sep 14  2017 jdk
drwxr-xr-x. 12 hadoop hadoop 4096 May 23 22:08 zookeeper

[root@slave1 ~]# chown -R hadoop:hadoop /usr/local/src/
[root@slave1 ~]# ll /usr/local/src
total 4
drwxr-xr-x. 11 hadoop hadoop  172 May 23 23:21 hadoop
drwxr-xr-x.  7 hadoop hadoop  160 Jun  3 04:18 hbase
drwxr-xr-x.  8 hadoop hadoop  255 Mar 14 23:25 jdk
drwxr-xr-x. 12 hadoop hadoop 4096 May 23 22:25 zookeeper

[root@slave2 ~]# chown -R hadoop:hadoop /usr/local/src/
[root@slave2 ~]# ll /usr/local/src
total 4
drwxr-xr-x. 11 hadoop hadoop  172 May 23 23:21 hadoop
drwxr-xr-x.  7 hadoop hadoop  160 Jun  3 04:18 hbase
drwxr-xr-x.  8 hadoop hadoop  255 Mar 14 23:25 jdk
drwxr-xr-x. 12 hadoop hadoop 4096 May 23 22:25 zookeeper
[root@master ~]# su hadoop
[root@slave1 ~]# su hadoop
[root@slave2 ~]# su hadoop
2.3. Step 3: Start the HBase cluster
[hadoop@master ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@master ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[hadoop@master ~]$ jps
1458 Jps
1401 QuorumPeerMain
[hadoop@slave1 ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@slave1 ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: leader
[hadoop@slave2 ~]$ zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[hadoop@slave2 ~]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/src/zookeeper/bin/../conf/zoo.cfg
Mode: follower

[hadoop@master ~]$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [slave1 master]
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-master.out
slave1: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-namenode-slave1.out
slave1: datanode running as process 1509. Stop it first.
master: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-master.out
slave2: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-datanode-slave2.out
Starting journal nodes [master slave1 slave2]
slave1: starting journalnode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-journalnode-slave1.out
slave2: starting journalnode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-journalnode-slave2.out
master: starting journalnode, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-journalnode-master.out
Starting ZK Failover Controllers on NN hosts [slave1 master]
master: starting zkfc, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-zkfc-master.out
slave1: starting zkfc, logging to /usr/local/src/hadoop/logs/hadoop-hadoop-zkfc-slave1.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-resourcemanager-master.out
slave1: nodemanager running as process 1737. Stop it first.
slave2: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-slave2.out
master: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-hadoop-nodemanager-master.out
[hadoop@master ~]$ jps
1712 DataNode
2115 DFSZKFailoverController
2355 NodeManager
1401 QuorumPeerMain
2233 ResourceManager
3610 Jps
1916 JournalNode
1599 NameNode
[hadoop@slave1 ~]$ jps
1587 JournalNode
2277 NameNode
1846 NodeManager
2390 DataNode
1384 QuorumPeerMain
1705 DFSZKFailoverController
2027 ResourceManager
2524 Jps
[hadoop@slave2 ~]$ jps
1280 QuorumPeerMain
1590 NodeManager
1384 DataNode
1897 Jps
1484 JournalNode
[hadoop@master ~]$ cd /usr/local/src/hbase/bin
[hadoop@master bin]$  ./start-hbase.sh
starting master, logging to /usr/local/src/hbase/logs/hbase-hadoop-master-master.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
slave2: starting regionserver, logging to /usr/local/src/hbase/logs/hbase-hadoop-regionserver-slave2.out
slave1: starting regionserver, logging to /usr/local/src/hbase/logs/hbase-hadoop-regionserver-slave1.out
slave1: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
slave1: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
slave1: SLF4J: Class path contains multiple SLF4J bindings.
slave1: SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
slave1: SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
slave1: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
slave1: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
slave2: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
slave2: Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
slave2: SLF4J: Class path contains multiple SLF4J bindings.
slave2: SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
slave2: SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
slave2: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
slave2: SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[hadoop@master bin]$ jps
1712 DataNode
3744 HMaster
3986 Jps
2115 DFSZKFailoverController
2355 NodeManager
1401 QuorumPeerMain
2233 ResourceManager
1916 JournalNode
1599 NameNode

Check the cluster in the web UI. Note in particular that the master web UI port is 16010 (the default since HBase 1.0, and what hbase.master.info.port was set to above).
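If you prefer the command line, a quick probe of the master UI (a minimal sketch; it assumes the master hostname resolves as configured above) should return HTTP 200:

[hadoop@master ~]$ curl -s -o /dev/null -w "%{http_code}\n" http://master:16010/master-status
200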


2. Lab 2: HBase Database and Table Operations

1. Lab Task 1: HBase database operations

1.1. Step 1: Start the HBase cluster (already started in the previous lab)

HBase depends on the HDFS service; following the dependency chain, the startup order is: ZooKeeper > Hadoop > HBase.

First start ZooKeeper by running the command on every node.

[hadoop@master ~]$ zkServer.sh start
[hadoop@slave1 ~]$ zkServer.sh start
[hadoop@slave2 ~]$ zkServer.sh start
# ZooKeeper's election mechanism chooses the Leader node automatically. Start the hadoop services on the master node.
[hadoop@master ~]$ start-all.sh
# The hadoop slave daemons start on their own. Finally, start HBase (on the master node).
[hadoop@master ~]$ start-hbase.sh
[hadoop@master bin]$ jps
1712 DataNode
3744 HMaster
4064 Jps
2115 DFSZKFailoverController
2355 NodeManager
1401 QuorumPeerMain
2233 ResourceManager
1916 JournalNode
1599 NameNode

1.2. Step 2: Dynamically removing an HBase node
Node upgrades and disk expansion are routine on storage servers; when a storage node needs a brief outage for an expansion or upgrade, it must first be taken offline. Suppose the slave2 node is being expanded and upgraded: run the following command to stop the HBase service on that node.
[hadoop@master bin]$ graceful_stop.sh slave2
2023-06-03T04:34:38 Disabling load balancer
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2023-06-03T04:34:41 Previous balancer state was true
2023-06-03T04:34:41 Unloading slave2 region(s)
......
zookeeper.ClientCnxn: Opening socket connection to server slave1/192.168.100.20:2181. Will not attempt to authenticate using SASL (unknown error)
2023-06-03 04:34:44,265 INFO  [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Socket connection established to slave1/192.168.100.20:2181, initiating session
2023-06-03 04:34:44,270 INFO  [main-SendThread(slave1:2181)] zookeeper.ClientCnxn: Session establishment complete on server slave1/192.168.100.20:2181, sessionid = 0x288805901e70005, negotiated timeout = 40000
Valid region move targets: 
slave1,16020,1685780992418
2023-06-03 04:34:44,588 INFO  [main] region_mover: Moving 2 region(s) from slave2,16020,1685780992274 on 1 servers using 1 threads.
2023-06-03 04:34:44,603 INFO  [main] region_mover: Waiting for the pool to complete
2023-06-03 04:34:44,605 INFO  [RubyThread-6: /usr/local/src/hbase/bin/thread-pool.rb:28] region_mover: Moving region 1588230740 (1 of 2) to server=slave1,16020,1685780992418 for slave2,16020,1685780992274
2023-06-03 04:34:44,797 INFO  [RubyThread-6: /usr/local/src/hbase/bin/thread-pool.rb:28] zookeeper.RecoverableZooKeeper: Process identifier=region_mover connecting to ZooKeeper ensemble=master:2181,slave1:2181,slave2:2181
2023-06-03 04:34:44,797 INFO  [RubyThread-6: /usr/local/src/hbase/bin/thread-pool.rb:28] zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,slave1:2181,slave2:2181 sessionTimeout=120000 watcher=region_mover0x0, quorum=master:2181,slave1:2181,slave2:2181, baseZNode=/hbase
2023-06-03 04:34:44,798 INFO  [RubyThread-6: /usr/local/src/hbase/bin/thread-pool.rb:28-SendThread(master:2181)] zookeeper.ClientCnxn: Opening socket connection to server master/192.168.100.10:2181. Will not attempt to authenticate using SASL (unknown error)
2023-06-03 04:34:44,798 INFO  [RubyThread-6: /usr/local/src/hbase/bin/thread-pool.rb:28-SendThread(master:2181)] zookeeper.ClientCnxn: Socket connection established to master/192.168.100.10:2181, initiating session
2023-06-03 04:34:44,808 INFO  [RubyThread-6: /usr/local/src/hbase/bin/thread-pool.rb:28-SendThread(master:2181)] zookeeper.ClientCnxn: Session establishment complete on server master/192.168.100.10:2181, sessionid = 0x188805901e20004, negotiated timeout = 40000
2023-06-03 04:34:44,822 INFO  [RubyThread-6: /usr/local/src/hbase/bin/thread-pool.rb:28] zookeeper.ZooKeeper: Session: 0x188805901e20004 closed
2023-06-03 04:34:44,823 INFO  [RubyThread-6: /usr/local/src/hbase/bin/thread-pool.rb:28-EventThread] zookeeper.ClientCnxn: EventThread shut down
2023-06-03 04:34:46,377 INFO  [RubyThread-6: /usr/local/src/hbase/bin/thread-pool.rb:28] region_mover: Moved region hbase:meta,,1.1588230740 cost: 1.721
2023-06-03 04:34:46,378 INFO  [RubyThread-6: /usr/local/src/hbase/bin/thread-pool.rb:28] region_mover: Moving region eb81dc9fba49d197d3fe4ef209d97204 (2 of 2) to server=slave1,16020,1685780992418 for slave2,16020,1685780992274
2023-06-03 04:34:48,245 INFO  [RubyThread-6: /usr/local/src/hbase/bin/thread-pool.rb:28] region_mover: Moved region hbase:namespace,,1685780998903.eb81dc9fba49d197d3fe4ef209d97204. cost: 1.845
2023-06-03 04:34:48,246 INFO  [main] region_mover: Pool completed
2023-06-03 04:34:48,251 INFO  [main] region_mover: Wrote list of moved regions to /tmp/slave2
2023-06-03T04:34:48 Unloaded slave2 region(s)
2023-06-03T04:34:48 Stopping regionserver on slave2
slave2: stopping regionserver..........
2023-06-03T04:34:58 Restoring balancer state to true
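After graceful_stop.sh returns, you can optionally confirm from the HBase shell that the balancer really was restored; balance_switch sets a new state and echoes the previous one, so the sketch below should print true:

hbase> balance_switch true
true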

The graceful_stop.sh script disables the balancer by itself and moves the data on the slave2 node to the other nodes; this step can take a long time. The node also has to be removed from hadoop. Add the configuration below to hdfs-site.xml, and create a new exclude file listing the names of the nodes to remove.
[hadoop@master bin]$ vi /usr/local/src/hadoop/etc/hadoop/exclude

slave2
[hadoop@master bin]$ vi /usr/local/src/hadoop/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.hosts.exclude</name>
<value>/usr/local/src/hadoop/etc/hadoop/exclude</value>
</property>
# dfs.hosts.exclude: the nodes listed in the exclude file are to be removed.
# Refresh so the configuration takes effect
[hadoop@master bin]$ cd
[hadoop@master ~]$ hadoop dfsadmin -refreshNodes
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Refresh nodes successful for slave1/192.168.100.20:8020
Refresh nodes successful for master/192.168.100.10:8020

Open the web UI monitoring page: the node now shows (Decommission In Progress), meaning it is migrating its data. After waiting for the node to stop, the dead node list shows the offlined node.

Once the node is offline, remove slave2 from both the slaves and exclude files and run the hadoop refresh command again; the removal is then complete.
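The same information is available on the command line: hdfs dfsadmin -report prints a Decommission Status per DataNode (a trimmed, illustrative sketch; the addresses follow this cluster's plan):

[hadoop@master ~]$ hdfs dfsadmin -report | grep -E 'Name:|Decommission'
Name: 192.168.100.20:50010 (slave1)
Decommission Status : Normal
Name: 192.168.100.30:50010 (slave2)
Decommission Status : Decommissioned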


1.3. Step 3: Dynamically adding an HBase node

Distributed scale-out is the biggest advantage non-relational databases have over traditional databases. Here we add the new node slave2 back onto the existing cluster. Adding a node first requires that the underlying hadoop cluster is running normally; there is no need to shut the cluster down, just run the following commands.

Start the service on the new node: switch to the newly added node and run the following commands.

[hadoop@slave2 ~]$ cd /usr/local/src/hbase/bin
[hadoop@slave2 bin]$ ./hbase-daemon.sh start regionserver
starting regionserver, logging to /usr/local/src/hbase/logs/hbase-hadoop-regionserver-slave2.out
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]


Note: the steps above presuppose that this node has already been added to the hadoop cluster and is working normally.
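If the node were not yet part of the hadoop cluster, a typical sequence (a sketch; adjust to your own slaves and regionservers files) is to start its DataNode and NodeManager first, and to make sure the hostname is listed in $HBASE_HOME/conf/regionservers so that later start-hbase.sh runs also start it:

[hadoop@slave2 ~]$ hadoop-daemon.sh start datanode
[hadoop@slave2 ~]$ yarn-daemon.sh start nodemanager
# slave2 is already listed in conf/regionservers from the earlier configuration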

2. Lab Task 2: HBase Table Management

# Enter the HBase shell
[hadoop@master ~]$ hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.2.1, r8d8a7107dc4ccbf36a92f64675dc60392f85c015, Wed Mar 30 11:19:21 CDT 2016
# Create table student with two column families: name and num
hbase(main):001:0> create 'student',{NAME=>'name'},{NAME=>'num'}
0 row(s) in 1.4000 seconds

=> Hbase::Table - student
# This creates a student table storing names and student numbers.
# Syntax: create '<table>', {NAME => '<family>', VERSIONS => <versions>}
# List all tables and their basic info
hbase(main):002:0> list
TABLE                                                                          
student                                                                        
1 row(s) in 0.0150 seconds

=> ["student"]
# Show the detailed table description
hbase(main):003:0> describe 'student'
Table student is ENABLED                                                       
student                                                                        
COLUMN FAMILIES DESCRIPTION                                                    
{NAME => 'name', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', K
EEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', 
COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => 
'65536', REPLICATION_SCOPE => '0'}                                             
{NAME => 'num', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KE
EP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', C
OMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '
65536', REPLICATION_SCOPE => '0'}                                              
2 row(s) in 0.0640 seconds
# Alter the table
hbase(main):004:0>  alter 'student' ,{NAME=>'tel'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9400 seconds
# This adds a new column family tel; alter can also delete column families and modify their attributes.
hbase(main):005:0> alter 'student' ,{'NAME'=>'name',VERSIONS=>'2'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9030 seconds

hbase(main):006:0>  alter 'student',{NAME=>'tel',METHOD=>'delete'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
0 row(s) in 1.9030 seconds
# The first alter set the original name family's VERSIONS attribute to 2; the second deleted the newly added tel family.
# Drop the table
hbase(main):007:0> disable 'student'
0 row(s) in 2.2670 seconds

hbase(main):008:0> drop 'student'
0 row(s) in 1.2490 seconds
# Finally you can check the database status, including running nodes, dead nodes, and so on.
hbase(main):009:0> status
1 active master, 0 backup masters, 2 servers, 0 dead, 1.0000 average load
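status also takes a format argument; 'simple' and 'detailed' print per-RegionServer load and region metrics (output omitted here, as it is long and varies):

hbase> status 'simple'
hbase> status 'detailed'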

3. Lab 3: HBase Data Operations

1. Lab Task 1: Basic operations

1.1. Step 1: Inserting and updating data
# Create table student with two column families: name and num
hbase(main):010:0> create 'student',{NAME=>'name'},{NAME=>'num'}
0 row(s) in 1.2340 seconds

=> Hbase::Table - student
hbase(main):011:0> list
TABLE                                                                          
student                                                                        
1 row(s) in 0.0070 seconds

=> ["student"]
# Insert a few rows of data:
hbase(main):012:0> put 'student','rk1','name','Tom'
0 row(s) in 0.0530 seconds

hbase(main):013:0> put 'student','rk1','num','123456'
0 row(s) in 0.0040 seconds

hbase(main):014:0> put 'student','rk2','name','Sun'
0 row(s) in 0.0040 seconds

hbase(main):015:0> put 'student','rk2','num','123456'
0 row(s) in 0.0070 seconds

hbase(main):016:0> put 'student','rk3','name:cha','wangyu'
0 row(s) in 0.0070 seconds
1.2. Step 2: Read a specified row, or specified columns of a row
hbase(main):017:0>  get 'student','rk1'
COLUMN               CELL                                                      
 name:               timestamp=1685782242322, value=Tom                        
 num:                timestamp=1685782242351, value=123456                     
2 row(s) in 0.0170 seconds

hbase(main):018:0>  get 'student','rk1','name'
COLUMN               CELL                                                      
 name:               timestamp=1685782242322, value=Tom                        
1 row(s) in 0.0070 seconds
1.3. Step 3: Scan the full table with the scan command

Syntax: scan '<table>', {COLUMNS => ['<family:qualifier>', ...], LIMIT => num}

Note: watch the data format when importing; otherwise values display as hexadecimal.

hbase(main):019:0> scan 'student'
ROW                  COLUMN+CELL                                               
 rk1                 column=name:, timestamp=1685782242322, value=Tom          
 rk1                 column=num:, timestamp=1685782242351, value=123456        
 rk2                 column=name:, timestamp=1685782242368, value=Sun          
 rk2                 column=num:, timestamp=1685782242384, value=123456        
 rk3                 column=name:cha, timestamp=1685782242400, value=wangyu    
3 row(s) in 0.0220 seconds

1.4. Step 4: Delete specified columns of a row, delete a row, truncate the table
hbase(main):020:0> delete 'student','rk2','name'
0 row(s) in 0.0240 seconds

hbase(main):021:0> deleteall 'student','rk2'
0 row(s) in 0.0070 seconds

hbase(main):022:0> truncate 'student'
Truncating 'student' table (it may take a while):
 - Disabling table...
 - Truncating table...
0 row(s) in 3.3750 seconds

Syntax: delete '<table>', '<rowkey>', '<family:column>'. The column must be specified here; note that if the column holds multiple versions of data, they are all deleted together.

Use the deleteall command to delete an entire row of data, as was done above for row rk2 of the student table.

Syntax: deleteall '<table>', '<rowkey>'. The column can be left out, deleting the whole row.

Use the truncate command to delete all of the data in a table.

Syntax: truncate '<table>'. Internally it runs disable table -> drop table -> create table.
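delete also accepts an optional timestamp, removing the matching version and any older ones. A sketch using a timestamp from the earlier scan (illustrative only, as the table has just been truncated):

hbase> delete 'student','rk1','name',1685782242322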

2. Lab Task 2: Fuzzy Queries

2.1. Step 1: Limiting a query
hbase(main):026:0* put 'student','rk1','name','Tom'
0 row(s) in 0.1220 seconds

hbase(main):027:0> put 'student','rk1','num','123456'
0 row(s) in 0.0090 seconds

hbase(main):028:0> put 'student','rk2','name','Sun'
0 row(s) in 0.0050 seconds

hbase(main):029:0> put 'student','rk2','num','123456'
0 row(s) in 0.0080 seconds

hbase(main):030:0> put 'student','rk3','name:cha','wangyu'
0 row(s) in 0.0080 seconds

hbase(main):031:0> scan 'student',{COLUMNS=>'name'}
ROW                  COLUMN+CELL                                               
 rk1                 column=name:, timestamp=1685782607060, value=Tom          
 rk2                 column=name:, timestamp=1685782607104, value=Sun          
 rk3                 column=name:cha, timestamp=1685782607136, value=wangyu    
3 row(s) in 0.0110 seconds

hbase(main):032:0> scan 'student',{COLUMNS=>['name','num'],LIMIT=>2}
ROW                  COLUMN+CELL                                               
 rk1                 column=name:, timestamp=1685782607060, value=Tom          
 rk1                 column=num:, timestamp=1685782607088, value=123456        
 rk2                 column=name:, timestamp=1685782607104, value=Sun          
 rk2                 column=num:, timestamp=1685782607120, value=123456        
2 row(s) in 0.0180 seconds
hbase(main):033:0> count 'student'
3 row(s) in 0.0210 seconds

=> 3

Syntax: scan '<table>', {COLUMNS => '<column>'}

When count runs over a table, INTERVAL is the number of rows between progress lines (default 1000) and CACHE is the scanner cache size per fetch (default 10). Tuning these parameters can raise the query speed, which noticeably speeds up counting on large tables.

Syntax: count '<table>', {INTERVAL => intervalNum, CACHE => cacheNum}
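For example, on a big table you might report progress every 10000 rows and fetch 500 rows per RPC (the parameter values here are illustrative):

hbase> count 'student', {INTERVAL => 10000, CACHE => 500}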

2.2. Step 2: Limiting the time range
hbase(main):034:0> scan 'student', {TIMERANGE => [1685782607060,1685782607120]}
ROW                  COLUMN+CELL                                               
 rk1                 column=name:, timestamp=1685782607060, value=Tom          
 rk1                 column=num:, timestamp=1685782607088, value=123456        
 rk2                 column=name:, timestamp=1685782607104, value=Sun          
2 row(s) in 0.0160 seconds
2.3. Step 3: PrefixFilter: row-key prefix filtering
hbase(main):035:0>  scan 'student',{FILTER=>"PrefixFilter('rk')"}
ROW                  COLUMN+CELL                                               
 rk1                 column=name:, timestamp=1685782607060, value=Tom          
 rk1                 column=num:, timestamp=1685782607088, value=123456        
 rk2                 column=name:, timestamp=1685782607104, value=Sun          
 rk2                 column=num:, timestamp=1685782607120, value=123456        
 rk3                 column=name:cha, timestamp=1685782607136, value=wangyu    
3 row(s) in 0.0170 seconds
# There are also QualifierFilter (column-name filter), TimestampsFilter (timestamp filter), and others, and filters can be combined with AND.
# ValueFilter: exact-value queries (value = Tom) and fuzzy queries (value contains m)

hbase(main):036:0> scan 'student',FILTER=>"ValueFilter(=,'binary:Tom')"
ROW                  COLUMN+CELL                                               
 rk1                 column=name:, timestamp=1685782607060, value=Tom          
1 row(s) in 0.0230 seconds

hbase(main):037:0>  scan 'student',FILTER=>"ValueFilter(=,'substring:m')"
ROW                  COLUMN+CELL                                               
 rk1                 column=name:, timestamp=1685782607060, value=Tom          
1 row(s) in 0.0180 seconds
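Filters can also be combined. For example, requiring both a row-key prefix and a value substring (a sketch against the table above; it should return only the rk1 name cell):

hbase> scan 'student', {FILTER => "PrefixFilter('rk') AND ValueFilter(=,'substring:m')"}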


3. Lab Task 3: Bulk Import/Export

3.1. Step 1: The ImportTsv tool

Command: bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv

First the data is saved into a .csv file and uploaded to the HDFS server. HBase then invokes the MapReduce service; when the data volume is large this takes a while.
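Each line of the file has to match the -Dimporttsv.columns mapping used below (row key first, then the name and num values). A made-up example of what /home/student.csv might contain:

rk10,zhangsan,10001
rk11,lisi,10002
rk12,wangwu,10003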
[hadoop@master ~]$ hdfs dfs -put /home/student.csv /input
[hadoop@master ~]$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,name,num student /input/student.csv
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2023-06-03 05:03:49,915 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x7dcf94f8 connecting to ZooKeeper ensemble=master:2181,slave1:2181,slave2:2181
2023-06-03 05:03:49,920 INFO  [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2023-06-03 05:03:49,920 INFO  [main] zookeeper.ZooKeeper: Client environment:host.name=master
2023-06-03 05:03:49,920 INFO  [main] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_152
2023-06-03 05:03:49,920 INFO  [main] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
......
2023-06-03 05:03:50,468 WARN  [main] mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present.  Continuing without it.
2023-06-03 05:03:50,474 INFO  [main] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x38880590ddb0007
2023-06-03 05:03:50,477 INFO  [main] zookeeper.ZooKeeper: Session: 0x38880590ddb0007 closed
2023-06-03 05:03:50,477 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2023-06-03 05:03:50,601 INFO  [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2023-06-03 05:03:52,381 INFO  [main] input.FileInputFormat: Total input paths to process : 1
2023-06-03 05:03:52,442 INFO  [main] mapreduce.JobSubmitter: number of splits:1
2023-06-03 05:03:52,450 INFO  [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2023-06-03 05:03:52,947 INFO  [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1685780654813_0001
2023-06-03 05:03:53,266 INFO  [main] impl.YarnClientImpl: Submitted application application_1685780654813_0001
2023-06-03 05:03:53,287 INFO  [main] mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1685780654813_0001/
2023-06-03 05:03:53,287 INFO  [main] mapreduce.Job: Running job: job_1685780654813_0001
2023-06-03 05:03:59,375 INFO  [main] mapreduce.Job: Job job_1685780654813_0001 running in uber mode : false
2023-06-03 05:03:59,376 INFO  [main] mapreduce.Job:  map 0% reduce 0%
2023-06-03 05:04:05,487 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2023-06-03 05:04:06,502 INFO  [main] mapreduce.Job: Job job_1685780654813_0001 completed successfully
2023-06-03 05:04:06,554 INFO  [main] mapreduce.Job: Counters: 31
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=148394
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=8795
		HDFS: Number of bytes written=0
		HDFS: Number of read operations=2
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=0
	Job Counters 
		Launched map tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3715
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=3715
		Total vcore-seconds taken by all map tasks=3715
		Total megabyte-seconds taken by all map tasks=3804160
	Map-Reduce Framework
		Map input records=51
		Map output records=14
		Input split bytes=99
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=88
		CPU time spent (ms)=1060
		Physical memory (bytes) snapshot=167784448
		Virtual memory (bytes) snapshot=2144755712
		Total committed heap usage (bytes)=88604672
	ImportTsv
		Bad Lines=37
	File Input Format Counters 
		Bytes Read=8696
	File Output Format Counters 
		Bytes Written=0


3.2. Step 2: Exporting data with Export

Command: bin/hbase org.apache.hadoop.hbase.mapreduce.Export

[hadoop@master ~]$ cd /usr/local/src/hbase/bin
[hadoop@master bin]$ hbase org.apache.hadoop.hbase.mapreduce.Export student /output/hbase-data-back
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2023-06-03 05:09:16,241 INFO  [main] mapreduce.Export: versions=1, starttime=0, endtime=9223372036854775807, keepDeletedCells=false
2023-06-03 05:09:16,532 WARN  [main] mapreduce.TableMapReduceUtil: The hbase-prefix-tree module jar containing PrefixTreeCodec is not present.  Continuing without it.
2023-06-03 05:09:18,172 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x2cb3d0f7 connecting to ZooKeeper ensemble=master:2181,slave1:2181,slave2:2181
2023-06-03 05:09:18,176 INFO  [main] zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2023-06-03 05:09:18,176 INFO  [main] zookeeper.ZooKeeper: Client environment:host.name=master
2023-06-03 05:09:18,176 INFO  [main] zookeeper.ZooKeeper: Client environment:java.version=1.8.0_152
2023-06-03 05:09:18,176 INFO  [main] zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
2023-06-03 05:09:18,176 INFO  [main] zookeeper.ZooKeeper: Client environment:java.home=/usr/local/src/jdk/jre
......
2023-06-03 05:09:18,176 INFO  [main] zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
2023-06-03 05:09:18,176 INFO  [main] zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
2023-06-03 05:09:18,177 INFO  [main] zookeeper.ZooKeeper: Client environment:os.name=Linux
2023-06-03 05:09:18,177 INFO  [main] zookeeper.ZooKeeper: Client environment:os.arch=amd64
2023-06-03 05:09:18,177 INFO  [main] zookeeper.ZooKeeper: Client environment:os.version=3.10.0-862.el7.x86_64
2023-06-03 05:09:18,177 INFO  [main] zookeeper.ZooKeeper: Client environment:user.name=hadoop
2023-06-03 05:09:18,177 INFO  [main] zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
2023-06-03 05:09:18,177 INFO  [main] zookeeper.ZooKeeper: Client environment:user.dir=/usr/local/src/hbase/bin
2023-06-03 05:09:18,177 INFO  [main] zookeeper.ZooKeeper: Initiating client connection, connectString=master:2181,slave1:2181,slave2:2181 sessionTimeout=120000 watcher=hconnection-0x2cb3d0f70x0, quorum=master:2181,slave1:2181,slave2:2181, baseZNode=/hbase
2023-06-03 05:09:18,187 INFO  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Opening socket connection to server slave2/192.168.100.30:2181. Will not attempt to authenticate using SASL (unknown error)
2023-06-03 05:09:18,187 INFO  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Socket connection established to slave2/192.168.100.30:2181, initiating session
2023-06-03 05:09:18,193 INFO  [main-SendThread(slave2:2181)] zookeeper.ClientCnxn: Session establishment complete on server slave2/192.168.100.30:2181, sessionid = 0x38880590ddb0009, negotiated timeout = 40000
2023-06-03 05:09:18,230 INFO  [main] util.RegionSizeCalculator: Calculating region sizes for table "student".
2023-06-03 05:09:18,438 INFO  [main] client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2023-06-03 05:09:18,439 INFO  [main] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x38880590ddb0009
2023-06-03 05:09:18,442 INFO  [main] zookeeper.ZooKeeper: Session: 0x38880590ddb0009 closed
2023-06-03 05:09:18,442 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2023-06-03 05:09:18,495 INFO  [main] mapreduce.JobSubmitter: number of splits:1
2023-06-03 05:09:18,505 INFO  [main] Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2023-06-03 05:09:18,574 INFO  [main] mapreduce.JobSubmitter: Submitting tokens for job: job_1685780654813_0002
2023-06-03 05:09:18,896 INFO  [main] impl.YarnClientImpl: Submitted application application_1685780654813_0002
2023-06-03 05:09:18,914 INFO  [main] mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1685780654813_0002/
2023-06-03 05:09:18,914 INFO  [main] mapreduce.Job: Running job: job_1685780654813_0002
2023-06-03 05:09:24,040 INFO  [main] mapreduce.Job: Job job_1685780654813_0002 running in uber mode : false
2023-06-03 05:09:24,041 INFO  [main] mapreduce.Job:  map 0% reduce 0%
2023-06-03 05:09:28,102 INFO  [main] mapreduce.Job:  map 100% reduce 0%
2023-06-03 05:09:28,121 INFO  [main] mapreduce.Job: Job job_1685780654813_0002 completed successfully
2023-06-03 05:09:28,181 INFO  [main] mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=148543
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=65
		HDFS: Number of bytes written=6354
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Rack-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=2077
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=2077
		Total vcore-seconds taken by all map tasks=2077
		Total megabyte-seconds taken by all map tasks=2126848
	Map-Reduce Framework
		Map input records=17
		Map output records=17
		Input split bytes=65
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=63
		CPU time spent (ms)=870
		Physical memory (bytes) snapshot=171892736
		Virtual memory (bytes) snapshot=2135916544
		Total committed heap usage (bytes)=91750400
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=6354
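The counterpart of Export is Import, which loads the exported SequenceFiles back into an existing table. A minimal sketch, assuming the student table still exists:

[hadoop@master bin]$ hbase org.apache.hadoop.hbase.mapreduce.Import student /output/hbase-data-back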
