Trying out the arroyo + redpanda integration

Published 2023-04-11 07:33:40  Author: 荣锋亮

arroyo has very good integration with Kafka (in the current version it is arguably the first-class connector). Running vanilla Kafka is one option, but deploying and managing it feels like a lot of work.
I briefly introduced redpanda before, so this post tries the integration with it.

Environment setup

  • docker-compose
    The stack includes the redpanda console, connect, and a test data generator (owl-shop); arroyo is attached to the same network as redpanda.
 
version: '3'
networks:
  redpanda_network:
    driver: bridge
volumes:
  redpanda: null
services:
  redpanda:
    image: docker.redpanda.com/redpandadata/redpanda:v23.1.4
    command:
      - redpanda start
      - --smp 1
      - --overprovisioned
      - --kafka-addr PLAINTEXT://0.0.0.0:29092,OUTSIDE://0.0.0.0:9092
      - --advertise-kafka-addr PLAINTEXT://redpanda:29092,OUTSIDE://localhost:9092
      - --pandaproxy-addr 0.0.0.0:8082
      - --advertise-pandaproxy-addr localhost:8082
    ports:
      - 8081:8081
      - 8082:8082
      - 9092:9092
      - 9644:9644
      - 29092:29092
    volumes:
      - redpanda:/var/lib/redpanda/data
    networks:
      - redpanda_network
 
  console:
    image: docker.redpanda.com/redpandadata/console:v2.2.3
    entrypoint: /bin/sh
    command: -c "echo \"$$CONSOLE_CONFIG_FILE\" > /tmp/config.yml; /app/console"
    environment:
      CONFIG_FILEPATH: /tmp/config.yml
      CONSOLE_CONFIG_FILE: |
        kafka:
          brokers: ["redpanda:29092"]
          schemaRegistry:
            enabled: true
            urls: ["http://redpanda:8081"]
        redpanda:
          adminApi:
            enabled: true
            urls: ["http://redpanda:9644"]
        connect:
          enabled: true
          clusters:
            - name: local-connect-cluster
              url: http://connect:8083
    ports:
      - 8080:8080
    networks:
      - redpanda_network
    depends_on:
      - redpanda
 
  # owl-shop generates fake e-commerce traffic; its topics are used later as the arroyo source
  owl-shop:
    image: quay.io/cloudhut/owl-shop:latest
    networks:
      - redpanda_network
    #platform: 'linux/amd64'
    environment:
      - SHOP_KAFKA_BROKERS=redpanda:29092
      - SHOP_KAFKA_TOPICREPLICATIONFACTOR=1
      - SHOP_TRAFFIC_INTERVAL_RATE=1
      - SHOP_TRAFFIC_INTERVAL_DURATION=0.1s
    depends_on:
      - redpanda
 
  connect:
    image: docker.redpanda.com/redpandadata/connectors:latest
    hostname: connect
    container_name: connect
    networks:
      - redpanda_network
    #platform: 'linux/amd64'
    depends_on:
      - redpanda
    ports:
      - "8083:8083"
    environment:
      CONNECT_CONFIGURATION: |
          key.converter=org.apache.kafka.connect.converters.ByteArrayConverter
          value.converter=org.apache.kafka.connect.converters.ByteArrayConverter
          group.id=connectors-cluster
          offset.storage.topic=_internal_connectors_offsets
          config.storage.topic=_internal_connectors_configs
          status.storage.topic=_internal_connectors_status
          config.storage.replication.factor=-1
          offset.storage.replication.factor=-1
          status.storage.replication.factor=-1
          offset.flush.interval.ms=1000
          producer.linger.ms=50
          producer.batch.size=131072
      CONNECT_BOOTSTRAP_SERVERS: redpanda:29092
      CONNECT_GC_LOG_ENABLED: "false"
      CONNECT_HEAP_OPTS: -Xms512M -Xmx512M
      CONNECT_LOG_LEVEL: info
  # arroyo single-node image, joined to redpanda_network so it can reach the broker at redpanda:29092
  app:
    image: ghcr.io/arroyosystems/arroyo-single:multi-arch
    networks:
      - redpanda_network
    ports:
      - "8000:8000"
      - "8001:8001"

Startup & testing

  • Start the stack
docker-compose up -d
    The published ports above expose the arroyo UI (8000) and the redpanda console (8080).
  • Create the kafka topic (see the rpk sketch below)
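A minimal sketch of this step with rpk, assuming the sink topic is named demo (as it is used below) and that a single partition and replica are enough for the test:

# create the demo sink topic inside the redpanda container
docker-compose exec redpanda rpk topic create demo -p 1 -r 1
# list the topics produced by owl-shop (one of them will serve as the source)
docker-compose exec redpanda rpk topic list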
  • Configure the arroyo connection, source, and sink
    Because the stack already runs an owl-shop service, one of the owl-shop topics can be used directly as the source, and the results are written into the demo topic (the sink).
    Connection configuration: the connection's bootstrap servers point at redpanda:29092 (the internal listener advertised in the compose file), which is reachable because arroyo shares redpanda_network.
Kafka source configuration (the source reads an owl-shop topic as JSON, described by the schema below)
JSON schema definition for the source messages:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "title": "Product",
  "type": "object",
  "properties": {
    "version": { "type": "number" },
    "id": { "type": "string" },
    "customer": {
      "type": "object",
      "properties": { "id": { "type": "string" }, "type": { "type": "string" } }
    },
    "type": { "type": "string" },
    "firstName": { "type": "string" },
    "lastName": { "type": "string" },
    "state": { "type": "string" },
    "street": { "type": "string" },
    "houseNumber": { "type": "string" },
    "city": { "type": "string" },
    "zip": { "type": "string" },
    "latitude": { "type": "number" },
    "longitude": { "type": "number" },
    "phone": { "type": "string" },
    "additionalAddressInfo": { "type": "string" },
    "createdAt": { "type": "string" },
    "revision": { "type": "number" }
  }
}

Kafka sink creation (the sink writes to the demo topic)
  • Create the job (pipeline)

SQL query
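A minimal sketch of the kind of SQL used here, assuming the source table registered in the arroyo UI is named owl_shop_addresses (a hypothetical name) and that the demo sink is attached when the pipeline is created:

-- read every field declared in the JSON schema from the owl-shop source;
-- the pipeline output goes to the configured kafka sink (the demo topic)
SELECT * FROM owl_shop_addresses;

Any projection or filter over the fields in the schema above can be expressed the same way.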

Run the job

Result in the redpanda console (an rpk-based check is also sketched below)
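Besides the console, a quick check that the pipeline really produces to the sink is to consume the demo topic with rpk (a sketch, assuming the topic name demo from above):

# tail the sink topic; stop with Ctrl-C
docker-compose exec redpanda rpk topic consume demo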
Summary

The arroyo + redpanda pipeline is a solid combination: redpanda is easy to deploy and operate, and it is compatible with a large part of the kafka ecosystem.
arroyo does not support many sources and sinks yet (kafka is still the core one), but its SQL-based processing model is very convenient and quite a bit simpler than kafka streams. Worth trying.

References

https://github.com/rongfengliang/arroyo-redpanda-learning
https://www.cnblogs.com/rongfengliang/p/17302703.html
https://github.com/ArroyoSystems/arroyo
https://github.com/redpanda-data/redpanda
https://redpanda.com/