Springboot Series (27) - Springboot + HBase Big Data Storage (5) | HBase REST Service

Published: 2023-04-02 17:38:58 | Author: 垄山小站


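REST (Representational State Transfer) is a software architectural style that Dr. Roy Fielding proposed in his doctoral dissertation in 2000. It is an approach to designing and developing networked applications that reduces development complexity and improves system scalability.

Of the three mainstream ways to implement Web services, REST is simpler and more efficient than the more complex SOAP and XML-RPC, and a growing number of Web services are designed and implemented in the REST style.

HBase ships with a REST server that can run as a daemon. The daemon starts an embedded Jetty servlet container and deploys the servlet into it. Once configured and running, the REST server exposes HBase tables, rows, cells and metadata as resources addressed by URL.

HBase REST documentation: https://hbase.apache.org/book.html#_rest

For installing and configuring HBase, see "Springboot Series (24) - Springboot + HBase Big Data Storage (2) | Installing and Configuring Apache HBase and Apache Zookeeper".

For the commands used to work with tables, see "Springboot Series (25) - Springboot + HBase Big Data Storage (3) | HBase Shell".

This article describes how to use the HBase REST service.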

 

1. System Environment

    Operating system: Ubuntu 20.04
    Java version: openjdk 11.0.18
    Hadoop version: 3.2.2
    Zookeeper version: 3.6.3

    HBase version: 2.4.4
    HBase path: ~/apps/hbase-2.4.4

    In this article HBase runs against a standalone (separately managed) Zookeeper, which uses port 2182.

 

2. Starting the REST Server

    $ cd ~/apps

    # Run in the foreground (the default port is 8080; -p 8888 overrides it)
    $ ./hbase-2.4.4/bin/hbase rest start -p 8888

    # Run in the background
    $ ./hbase-2.4.4/bin/hbase-daemon.sh start rest -p 8888

    # Show the HBase version
    $ curl -X GET -H "Accept: text/plain" "http://localhost:8888/version/cluster"

        2.4.4

    # Show the HBase cluster status
    $ curl -X GET -H "Accept: text/plain" "http://localhost:8888/status/cluster"

        1 live servers, 0 dead servers, 4.0000 average load

        1 live servers
            Test-Ubuntu20:16020 1679556060710
                requests=16, regions=4
                heapSizeMB=41
                maxHeapSizeMB=494
        ...


    # Stop the REST server
    $ ./hbase-2.4.4/bin/hbase-daemon.sh stop rest

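    Note: This article uses curl on Ubuntu as the REST API client; wget, or a tool such as Postman on Windows, works as well.

        The examples mostly use JSON. In the curl commands above, for instance, "Accept: text/plain" can be changed to "Accept: application/json" or "Accept: text/xml" to get the response in a different format.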


3. Table Operations

    # List all tables
    $ curl -X GET -H "Accept: application/json" "http://localhost:8888"

        {"table":[{"name":"test"},{"name":"user"}]}

    $ curl -X GET -H "Accept: text/plain" "http://localhost:8888"

        test
        user


    # Create the demo table with one column family, cf1
    $ curl -v -X PUT \
        -H "Accept: application/json" \
        -H "Content-Type: application/json" \
        -d '{"name":"demo","ColumnSchema":[{"name":"cf1"}]}' \
        "http://localhost:8888/demo/schema"


    # Show the table schema
    $ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/schema"

        {"name":"demo","ColumnSchema":[{"name":"cf1","BLOOMFILTER":"ROW","IN_MEMORY":"false","VERSIONS":"1","KEEP_DELETED_CELLS":"FALSE","DATA_BLOCK_ENCODING":"NONE","COMPRESSION":"NONE","TTL":"2147483647","MIN_VERSIONS":"0","BLOCKCACHE":"true","BLOCKSIZE":"65536","REPLICATION_SCOPE":"0"}],"IS_META":"false"}


    # Add two column families
    $ curl -v -X POST \
        -H "Accept: application/json" \
        -H "Content-Type: application/json" \
        -d '{"name":"demo","ColumnSchema":[{"name":"cf2"},{"name":"cf3"}]}' \
        "http://localhost:8888/demo/schema"


    # Remove the cf3 column family: POST cannot do this, so use PUT to replace the table schema
    $ curl -v -X PUT \
        -H "Accept: application/json" \
        -H "Content-Type: application/json" \
        -d '{"name":"demo","ColumnSchema":[{"name":"cf1"},{"name":"cf2"}]}' \
        "http://localhost:8888/demo/schema"


    # Set VERSIONS of column family cf1 to 2; the default is 1 (only the latest version is kept)
    $ curl -v -X POST \
        -H "Accept: application/json" \
        -H "Content-Type: application/json" \
        -d '{"name":"demo","ColumnSchema":[{"name":"cf1","VERSIONS":"2"}]}' \
        "http://localhost:8888/demo/schema"


    # Show the table's regions
    $ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/regions"

        {"name":"demo","Region":[{"id":1680327485917,"startKey":"","endKey":"","location":"Test-Ubuntu20:16020","name":"demo,,1680327485917.c9ca998d01c045d8f87535ba444de7c2."}]}


    # Delete the table
    $ curl -v -X DELETE -H "Accept: application/json" "http://localhost:8888/demo/schema"
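
    The schema bodies sent with PUT and POST above all share one shape: a table name plus a ColumnSchema list. As a minimal sketch, a small shell function can build that body from a list of column family names. The name schema_json is a hypothetical helper, not part of HBase, and the function assumes plain names that need no JSON escaping.

```shell
#!/usr/bin/env bash
# schema_json TABLE CF [CF ...]
# Hypothetical helper (not part of HBase): prints a schema body in the same
# shape as the -d payloads used above. Assumes names need no JSON escaping.
schema_json() {
    local table="$1"; shift
    local families="" cf
    for cf in "$@"; do
        # Append {"name":"<cf>"}, comma-separated after the first entry.
        families="${families:+$families,}{\"name\":\"$cf\"}"
    done
    printf '{"name":"%s","ColumnSchema":[%s]}\n' "$table" "$families"
}

# Prints the same body as the PUT that keeps only cf1 and cf2.
schema_json demo cf1 cf2

# Possible use against the REST server from this article:
#   curl -X PUT -H "Content-Type: application/json" \
#       -d "$(schema_json demo cf1 cf2)" "http://localhost:8888/demo/schema"
```

    Because PUT replaces the schema, the list passed to the helper would have to name every column family that should remain.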

 


4. Adding Data

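    The examples use the demo table created above, with the column families cf1, cf2 and cf3 (recreate the table first if you deleted it in the previous section; only cf1 and cf2 are used below). We will add the following data to the demo table:

        id      name     age    job
        row1    Tom      12     Student
        row2    Jerry    9      Engineer
        row3    Jerry    10     Engineer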

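    When reading and writing data, the REST interface Base64-encodes and decodes values such as key, column and value. The following commands do the encoding and decoding:

        $ echo -ne "Tom" | base64   # encode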

            VG9t

        $ echo -ne "VG9t" | base64 -d   # decode

            Tom


        Note: echo's -n option suppresses the trailing newline, and -e enables interpretation of backslash escapes.

    The Base64 encodings used in the data operations below:

        Original    Encoded
        cf1:name    Y2YxOm5hbWU=
        cf1:age     Y2YxOmFnZQ==
        cf2:job     Y2YyOmpvYg==
        row1        cm93MQ==
        row2        cm93Mg==
        row3        cm93Mw==
        Tom         VG9t
        Jerry       SmVycnk=
        12          MTI=
        9           OQ==
        10          MTA=
        Student     U3R1ZGVudA==
        Engineer    RW5naW5lZXI=
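
    The encoding list can also be regenerated rather than typed by hand. The sketch below wraps the base64 command in two functions; b64enc and b64dec are hypothetical names chosen for this example. It uses printf '%s' instead of echo -ne so that backslashes in a value are passed through untouched, and it strips the line breaks that base64 inserts into long output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the base64 command used above.
b64enc() { printf '%s' "$1" | base64 | tr -d '\n'; }   # plain text -> Base64
b64dec() { printf '%s' "$1" | base64 -d; }             # Base64 -> plain text

# Print a few original/encoded pairs in two columns.
for value in cf1:name cf1:age cf2:job row1 Tom 12; do
    printf '%-10s %s\n' "$value" "$(b64enc "$value")"
done
```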


    Examples:

        # Add name and age for row1
        $ curl -v -X PUT \
            -H "Accept: application/json" \
            -H "Content-Type: application/json" \
            -d '{"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YxOm5hbWU=","$":"VG9t"},{"column":"Y2YxOmFnZQ==","$":"MTI="}]}]}' \
            "http://localhost:8888/demo/row1"


        # Add job for row1
        $ curl -v -X PUT \
            -H "Accept: application/json" \
            -H "Content-Type: application/json" \
            -d '{"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YyOmpvYg==","$":"U3R1ZGVudA=="}]}]}' \
            "http://localhost:8888/demo/row1"


        # Add name, age and job for row2
        $ curl -v -X PUT \
            -H "Accept: application/json" \
            -H "Content-Type: application/json" \
            -d '{"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOm5hbWU=","$":"SmVycnk="},{"column":"Y2YxOmFnZQ==","$":"OQ=="},{"column":"Y2YyOmpvYg==","$":"RW5naW5lZXI="}]}]}' \
            "http://localhost:8888/demo/row2"


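        # Add age for row2 again, this time with value 10 (the cell keeps 2 versions, so this does not replace the original value 9; a new version is created, distinguished by its timestamp)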
        $ curl -v -X PUT \
            -H "Accept: application/json" \
            -H "Content-Type: application/json" \
            -d '{"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","$":"MTA="}]}]}' \
            "http://localhost:8888/demo/row2"


        # Add name, age and job for row3
        $ curl -v -X PUT \
            -H "Accept: application/json" \
            -H "Content-Type: application/json" \
            -d '{"Row":[{"key":"cm93Mw==","Cell":[{"column":"Y2YxOm5hbWU=","$":"SmVycnk="},{"column":"Y2YxOmFnZQ==","$":"MTA="},{"column":"Y2YyOmpvYg==","$":"RW5naW5lZXI="}]}]}' \
            "http://localhost:8888/demo/row3"


        # Show the row3 data
        $ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row3"

            {"Row":[{"key":"cm93Mw==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680424477376,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680424477376,"$":"SmVycnk="},{"column":"Y2YyOmpvYg==","timestamp":1680424477376,"$":"RW5naW5lZXI="}]}]}
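
    Hand-encoding every key, column and value becomes tedious as rows grow. As a sketch under the same assumptions as the examples above (JSON bodies, Base64-encoded fields), the function below builds the request body from plain-text arguments. The names row_json and b64enc are hypothetical helpers, not part of HBase.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of HBase.
b64enc() { printf '%s' "$1" | base64 | tr -d '\n'; }

# row_json ROWKEY COLUMN VALUE [COLUMN VALUE ...]
# Prints a {"Row":[...]} body with key, column and value Base64-encoded.
row_json() {
    local key="$1"; shift
    local cells=""
    while [ "$#" -ge 2 ]; do
        cells="${cells:+$cells,}{\"column\":\"$(b64enc "$1")\",\"\$\":\"$(b64enc "$2")\"}"
        shift 2
    done
    printf '{"Row":[{"key":"%s","Cell":[%s]}]}\n' "$(b64enc "$key")" "$cells"
}

# Prints the same body as the "add job for row1" example.
row_json row1 cf2:job Student

# Possible use against the REST server from this article:
#   curl -X PUT -H "Content-Type: application/json" \
#       -d "$(row_json row2 cf1:name Jerry cf1:age 9 cf2:job Engineer)" \
#       "http://localhost:8888/demo/row2"
```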

 

 

5. Querying Data

    1) GET operations

        # Get all data for row2
        $ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2"

            {"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413754490,"$":"SmVycnk="},{"column":"Y2YyOmpvYg==","timestamp":1680413754490,"$":"RW5naW5lZXI="}]}]}

            Note: key, column and value are Base64-encoded; decode them with a command such as:

                $ echo -ne "cm93Mg==" | base64 -d

                    row2


        # Get all data in column family cf1 of row2
        $ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2/cf1"

            {"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413754490,"$":"SmVycnk="}]}]}


        # Get the age of row2
        $ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2/cf1:age"

            {"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="}]}]}

            Note: MTA= decodes to 10


        # Get the age and job of row2 (multiple columns)
        $ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2/cf1:age,cf2:job"

            {"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YyOmpvYg==","timestamp":1680413754490,"$":"RW5naW5lZXI="}]}]}


        # Get the latest 2 versions of row2's age with the v parameter (v also works on the row and column family endpoints)
        $ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2/cf1:age?v=2"

            {"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOmFnZQ==","timestamp":1680416123316,"$":"OQ=="}]}]}

            Note: OQ== decodes to 9


        # Get the first version of row2's age, using a timestamp
        $ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2/cf1:age/1680416123316"

            {"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416113142,"$":"OQ=="}]}]}
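
    Reading the responses above means decoding each field by hand. The following is a rough sketch that turns a JSON response into column=value lines using only grep, sed, paste and base64. decode_cells is a hypothetical name, and the approach assumes the field order shown in the outputs above ("column" before "$" in every cell) and values without embedded newlines; a real JSON parser would be more robust.

```shell
#!/usr/bin/env bash
# decode_cells: hypothetical helper, reads a REST JSON response on stdin and
# prints one "column=value" line per cell, with both parts Base64-decoded.
decode_cells() {
    grep -oE '"(column|\$)":"[^"]*"' |          # keep "column" and "$" fields
    sed 's/^"[^"]*":"\(.*\)"$/\1/' |            # strip the field name and quotes
    while IFS= read -r encoded; do
        printf '%s\n' "$(printf '%s' "$encoded" | base64 -d)"
    done |
    paste -d '=' - -                            # join each column with its value
}

# The response from the "get the age of row2" example above.
response='{"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="}]}]}'
printf '%s' "$response" | decode_cells

# Possible use against the REST server from this article:
#   curl -s -H "Accept: application/json" "http://localhost:8888/demo/row2" | decode_cells
```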


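    2) Stateless Scanner

        A stateless scanner keeps no state about the query; all query conditions are passed as parameters and the scan is done in a single request.

        # Scan the whole table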
        $ curl -X GET -H "Accept: application/json"  "http://localhost:8888/demo/*"

            {"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413723937,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413683325,"$":"VG9t"},{"column":"Y2YyOmpvYg==","timestamp":1680413707691,"$":"U3R1ZGVudA=="}]},{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413754490,"$":"SmVycnk="},{"column":"Y2YyOmpvYg==","timestamp":1680413754490,"$":"RW5naW5lZXI="}]},{"key":"cm93Mw==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413772460,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413772460,"$":"SmVycnk="},{"column":"Y2YyOmpvYg==","timestamp":1680413772460,"$":"RW5naW5lZXI="}]}]}


        # Scan column family cf2
        $ curl -X GET -H "Accept: application/json"  "http://localhost:8888/demo/*/cf2"

            {"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YyOmpvYg==","timestamp":1680413707691,"$":"U3R1ZGVudA=="}]},{"key":"cm93Mg==","Cell":[{"column":"Y2YyOmpvYg==","timestamp":1680413754490,"$":"RW5naW5lZXI="}]},{"key":"cm93Mw==","Cell":[{"column":"Y2YyOmpvYg==","timestamp":1680413772460,"$":"RW5naW5lZXI="}]}]}


        # Scan column families cf1 and cf2
        $ curl -X GET -H "Accept: application/json"  "http://localhost:8888/demo/*/cf1,cf2"

            {"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413723937,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413683325,"$":"VG9t"},{"column":"Y2YyOmpvYg==","timestamp":1680413707691,"$":"U3R1ZGVudA=="}]},{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413754490,"$":"SmVycnk="},{"column":"Y2YyOmpvYg==","timestamp":1680413754490,"$":"RW5naW5lZXI="}]},{"key":"cm93Mw==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413772460,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413772460,"$":"SmVycnk="},{"column":"Y2YyOmpvYg==","timestamp":1680413772460,"$":"RW5naW5lZXI="}]}]}


        # Scan the age column of column family cf1, showing the latest two versions
        $ curl -X GET -H "Accept: application/json"  "http://localhost:8888/demo/*/cf1:age?v=2"

            {"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413723937,"$":"MTA="},{"column":"Y2YxOmFnZQ==","timestamp":1680413683325,"$":"MTI="}]},{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOmFnZQ==","timestamp":1680416123316,"$":"OQ=="}]},{"key":"cm93Mw==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413772460,"$":"MTA="}]}]}


        # Scan the whole table, limiting the number of rows returned
        $ curl -X GET -H "Accept: application/json"  "http://localhost:8888/demo/*?limit=1"

            {"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413723937,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413683325,"$":"VG9t"},{"column":"Y2YyOmpvYg==","timestamp":1680413707691,"$":"U3R1ZGVudA=="}]}]}


        # Scan the whole table, starting from the given row (inclusive)
        $ curl -X GET -H "Accept: application/json"  "http://localhost:8888/demo/*?startrow=row2"

            {"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413754490,"$":"SmVycnk="},{"column":"Y2YyOmpvYg==","timestamp":1680413754490,"$":"RW5naW5lZXI="}]},{"key":"cm93Mw==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413772460,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413772460,"$":"SmVycnk="},{"column":"Y2YyOmpvYg==","timestamp":1680413772460,"$":"RW5naW5lZXI="}]}]}


        # Scan the whole table, stopping at the given row (exclusive)
        $ curl -X GET -H "Accept: application/json"  "http://localhost:8888/demo/*?endrow=row2"

            {"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413723937,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680413683325,"$":"VG9t"},{"column":"Y2YyOmpvYg==","timestamp":1680413707691,"$":"U3R1ZGVudA=="}]}]}


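        # Scan the whole table with combined conditions (note that the output below still contains all three rows, so limit and startrow did not take effect with this URL form)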
        $ curl -X GET -H "Accept: application/json"  "http://localhost:8888/demo/*/cf1:age?v=2&limit=1&startrow=row2"

            {"Row":[{"key":"cm93MQ==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413723937,"$":"MTA="},{"column":"Y2YxOmFnZQ==","timestamp":1680413683325,"$":"MTI="}]},{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOmFnZQ==","timestamp":1680416123316,"$":"OQ=="}]},{"key":"cm93Mw==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680413772460,"$":"MTA="}]}]}
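
    The limit, startrow and endrow examples above all put their parameters on the /demo/* form of the URL. As a small sketch, the function below assembles such a URL from a base address, a table name and any number of name=value parameters. scan_url is a hypothetical helper name, and it does no URL encoding, so it assumes row keys made of plain characters like the ones in this article.

```shell
#!/usr/bin/env bash
# scan_url BASE TABLE [NAME=VALUE ...]
# Hypothetical helper: prints a whole-table scan URL of the form
# BASE/TABLE/*?name=value&name=value (no URL encoding is applied).
scan_url() {
    local base="$1" table="$2"; shift 2
    local query="" param
    for param in "$@"; do
        query="${query:+$query&}$param"
    done
    printf '%s/%s/*%s\n' "$base" "$table" "${query:+?$query}"
}

scan_url http://localhost:8888 demo
scan_url http://localhost:8888 demo limit=1 startrow=row2

# Possible use against the REST server from this article:
#   curl -s -H "Accept: application/json" "$(scan_url http://localhost:8888 demo limit=1)"
```

    Keeping the URL inside double quotes matters in the shell: an unquoted & would end the command and run it in the background, and an unquoted * could be expanded as a file glob.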


    3) Stateful Scanner

        Omitted.


6. Deleting Data

        # Delete the job column in column family cf2 of row3
        $ curl -v -X DELETE -H "Accept: application/json" -H "Content-Type: application/json" "http://localhost:8888/demo/row3/cf2:job/?check=delete"

        # Show the row3 data
        $ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row3"

            {"Row":[{"key":"cm93Mw==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680423950256,"$":"MTA="},{"column":"Y2YxOm5hbWU=","timestamp":1680423950256,"$":"SmVycnk="}]}]}


        # Delete column family cf1 of row3; the Base64 encoding of "cf1" is "Y2Yx"
        $ curl -v -X DELETE -H "Accept: application/json" -H "Content-Type: application/json" "http://localhost:8888/demo/row3/cf1/?check=delete"

        # Show the row3 data
        $ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row3"

            Not found


        # After re-adding the row3 data, delete the entire row3 row
        $ curl -v -X DELETE -H "Accept: application/json" -H "Content-Type: application/json" "http://localhost:8888/demo/row3/?check=delete"

        # Show the row3 data
        $ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row3"

            Not found


        # Show the latest 2 versions of row2's age
        $ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2/cf1:age?v=2"

            {"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="},{"column":"Y2YxOmFnZQ==","timestamp":1680416123316,"$":"OQ=="}]}]}

        # Delete the first version of row2's age, whose timestamp is 1680416123316
        $ curl -v -X DELETE -H "Accept: application/json" -H "Content-Type: application/json" "http://localhost:8888/demo/row2/cf1:age/1680416123316/?check=delete"

        # Show the latest 2 versions of row2's age again
        $ curl -X GET -H "Accept: application/json" "http://localhost:8888/demo/row2/cf1:age?v=2"

            {"Row":[{"key":"cm93Mg==","Cell":[{"column":"Y2YxOmFnZQ==","timestamp":1680416162304,"$":"MTA="}]}]}
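
    The GET and DELETE examples address data with one path pattern: table, then row, then an optional column, then an optional timestamp. The sketch below assembles that path from its parts. cell_path is a hypothetical helper name; it applies no URL encoding and leaves the server address and any query string (such as the ?check=delete used above) to the caller.

```shell
#!/usr/bin/env bash
# cell_path TABLE ROW [COLUMN [TIMESTAMP]]
# Hypothetical helper: prints /TABLE/ROW[/COLUMN[/TIMESTAMP]].
cell_path() {
    local path="/$1/$2"
    if [ -n "${3:-}" ]; then
        path="$path/$3"              # column family or family:qualifier
        if [ -n "${4:-}" ]; then
            path="$path/$4"          # timestamp in milliseconds
        fi
    fi
    printf '%s\n' "$path"
}

cell_path demo row3
cell_path demo row3 cf2:job
cell_path demo row2 cf1:age 1680416123316

# Possible use against the REST server from this article:
#   curl -X DELETE "http://localhost:8888$(cell_path demo row2 cf1:age 1680416123316)/?check=delete"
```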

 


7. Namespace Operations

    # Create a namespace
    $ curl -v -X POST -H "Accept: application/json" "http://localhost:8888/namespaces/ns_test"


    # List all namespaces
    $ curl -X GET -H "Accept: application/json" "http://localhost:8888/namespaces"

        {"Namespace":["default","hbase","ns_test"]}


    # Create table tbl_01 in ns_test with one column family, cf1
    $ curl -v -X PUT \
        -H "Accept: application/json" \
        -H "Content-Type: application/json" \
        -d '{"name":"ns_test:tbl_01","ColumnSchema":[{"name":"cf1"}]}' \
        "http://localhost:8888/ns_test:tbl_01/schema"


    # List the tables in ns_test
    $ curl -X GET -H "Accept: application/json" "http://localhost:8888/namespaces/ns_test/tables"

        {"table":[{"name":"tbl_01"}]}


    # A namespace that still contains tables cannot be deleted: delete the table ns_test:tbl_01 first, then delete ns_test
    $ curl -v -X DELETE -H "Accept: application/json" "http://localhost:8888/ns_test:tbl_01/schema"

    $ curl -v -X DELETE -H "Accept: application/json" "http://localhost:8888/namespaces/ns_test"


    # List all namespaces
    $ curl -X GET -H "Accept: application/json" "http://localhost:8888/namespaces"

        {"Namespace":["SYSTEM","default","hbase"]}