Spark 3.5.0 High Availability Deployment

Published 2024-01-02 19:45:39, author: SpringCore

1. Download Spark 3.5.0

https://spark.apache.org/downloads.html
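The tarball for this exact build can also be pulled non-interactively from the Apache archive; a minimal sketch (the URL follows the standard archive layout for the spark-3.5.0 release):

wget https://archive.apache.org/dist/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz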

2. Install the JDK

Installing OpenJDK on Linux
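A minimal install sketch that matches the JAVA_HOME used later in this post (it assumes the Eclipse Temurin jdk8u392-b08 tarball; any JDK 8 build works if you adjust the path):

mkdir -p /usr/java
# The Temurin 8u392-b08 archive unpacks to a jdk8u392-b08 directory
tar -zxvf OpenJDK8U-jdk_x64_linux_hotspot_8u392b08.tar.gz -C /usr/java/
/usr/java/jdk8u392-b08/bin/java -version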

3. Install Hadoop

Hadoop 3.3.6 distributed cluster setup steps

4. Install ZooKeeper

ZooKeeper 3.9.1 cluster-mode installation
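Before wiring Spark to ZooKeeper, it is worth confirming the quorum is healthy on every node (this assumes zkServer.sh is on the PATH; adjust to your install location):

zkServer.sh status
# Expect "Mode: leader" on exactly one node and "Mode: follower" on the rest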

5. Extract the archive

mkdir /usr/spark
tar -zxvf spark-3.5.0-bin-hadoop3.tgz -C /usr/spark/

6. Configure

1. Edit the workers file to list the cluster nodes

cd /usr/spark/spark-3.5.0-bin-hadoop3/conf
mv workers.template workers
vi workers

# A Spark Worker will be started on each of the machines listed below.
localhost
192.168.58.131
192.168.58.132
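Note that the localhost entry makes the node where start-all.sh is run (192.168.58.130 in this setup) host a Worker as well; listing its IP explicitly would work just as well.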

2. Configure spark-env.sh (Java path, web UI port, and ZooKeeper recovery)

mv spark-env.sh.template spark-env.sh
vi spark-env.sh
export JAVA_HOME=/usr/java/jdk8u392-b08
# The Master web UI listens on port 8080 by default, which can collide with
# ZooKeeper (whose AdminServer also defaults to 8080 since 3.5), so it is
# changed to 8989 here; any free port works, just note it when opening the UI.
SPARK_MASTER_WEBUI_PORT=8989
# Enable standby-master recovery through ZooKeeper; without explicit ports the
# ZooKeeper client falls back to the default 2181.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
-Dspark.deploy.zookeeper.url=192.168.58.130,192.168.58.131,192.168.58.132 \
-Dspark.deploy.zookeeper.dir=/spark"
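Once the masters are up (step 7), you can confirm that election state is actually being written under the configured znode; a quick check, assuming zkCli.sh is on the PATH (the child names are what Spark's standalone recovery mode creates):

zkCli.sh -server 192.168.58.130:2181 ls /spark
# Expect children such as [leader_election, master_status] once a master registers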

3. Sync the configuration to all nodes (omitted in the original; a minimal sketch follows)
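A sketch using scp (it assumes passwordless SSH and the same /usr/spark layout on the two other nodes listed in the workers file):

ssh 192.168.58.131 "mkdir -p /usr/spark"
ssh 192.168.58.132 "mkdir -p /usr/spark"
scp -r /usr/spark/spark-3.5.0-bin-hadoop3 192.168.58.131:/usr/spark/
scp -r /usr/spark/spark-3.5.0-bin-hadoop3 192.168.58.132:/usr/spark/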

7. Start the cluster

/usr/spark/spark-3.5.0-bin-hadoop3/sbin/start-all.sh
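start-all.sh brings up a Master on the local node plus one Worker per entry in the workers file. For real failover you need at least one more master; a sketch, assuming 192.168.58.131 is chosen as the standby:

# On 192.168.58.131
/usr/spark/spark-3.5.0-bin-hadoop3/sbin/start-master.sh

# On each node, verify the JVMs with jps; expect Master and/or Worker
# alongside ZooKeeper's QuorumPeerMain
jps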

8. Access the web UI

http://192.168.58.130:8989/

Note that the port is the SPARK_MASTER_WEBUI_PORT set in step 6, not the 8080 default. The active master's page shows Status: ALIVE, while a standby master (once started) reports Status: STANDBY.

9. Run a built-in example job

# Run from the Spark home directory (the relative paths assume it)
bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://192.168.58.130:7077 \
  ./examples/jars/spark-examples_2.12-3.5.0.jar 10
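To let a submission survive a master failover, Spark's standalone HA mode accepts a comma-separated list of masters (spark://host1:port1,host2:port2); a variant using the standby assumed in step 7:

bin/spark-submit --class org.apache.spark.examples.SparkPi \
  --master spark://192.168.58.130:7077,192.168.58.131:7077 \
  ./examples/jars/spark-examples_2.12-3.5.0.jar 10
# The driver log ends with a line like "Pi is roughly 3.14..."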