[Lustre] Application Deployment 01: Building the IB Driver and Lustre Packages from Source

Published: 2023-11-25 15:12:47 | Author: Luxf0

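I. Build and Installation

OS version: CentOS Linux release 7.9.2009 (Core)
Kernel version: 3.10.0-1160.el7.x86_64
NIC model: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
Software version: Lustre 2.12.9-ib

Note: the system was installed from the CentOS-7-x86_64-Everything-2009 ISO using the Minimal Install option, with the Debugging Tools and Development Tools package groups selected.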

1. Install e2fsprogs

Download: https://downloads.whamcloud.com/public/e2fsprogs/latest/el7/RPMS/x86_64/

Download the e2fsprogs packages and install them:

[root@node91 01-e2fsprogs]# ls
e2fsprogs-1.46.6-wc1.el7.x86_64.rpm            e2fsprogs-libs-1.46.6-wc1.el7.x86_64.rpm    libcom_err-devel-1.46.6-wc1.el7.x86_64.rpm
e2fsprogs-debuginfo-1.46.6-wc1.el7.x86_64.rpm  e2fsprogs-static-1.46.6-wc1.el7.x86_64.rpm  libss-1.46.6-wc1.el7.x86_64.rpm
e2fsprogs-devel-1.46.6-wc1.el7.x86_64.rpm      libcom_err-1.46.6-wc1.el7.x86_64.rpm        libss-devel-1.46.6-wc1.el7.x86_64.rpm
[root@node91 01-e2fsprogs]# yum install *.rpm
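
After the install, it may be worth confirming that the Whamcloud build (the "wc" release tag visible in the file names above) actually replaced the stock package. The following is a minimal sketch; the helper name is_wc_build is hypothetical and not part of any package.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if an RPM name-version-release string
# carries the "wc" release tag seen in the Whamcloud e2fsprogs packages.
is_wc_build() {
    case "$1" in
        *-wc[0-9]*) return 0 ;;
        *)          return 1 ;;
    esac
}

# On the target node, query the installed package (skipped if rpm is absent).
if command -v rpm >/dev/null 2>&1; then
    installed=$(rpm -q e2fsprogs 2>/dev/null || true)
    if is_wc_build "$installed"; then
        echo "OK: $installed"
    else
        echo "WARNING: e2fsprogs does not look like a Whamcloud build: $installed"
    fi
fi
```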

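2. Install the Lustre-patched kernel

Download: https://downloads.whamcloud.com/public/lustre/lustre-2.12.9-ib/el7.9.2009/server/RPMS/x86_64/

Install the Lustre-patched kernel packages. After the reboot, the running kernel version should be 3.10.0-1160.49.1.el7_lustre.x86_64.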

[root@node91 02-kernel-lustre]# ls
kernel-3.10.0-1160.49.1.el7_lustre.x86_64.rpm            kernel-debuginfo-common-x86_64-3.10.0-1160.49.1.el7_lustre.x86_64.rpm  kernel-headers-3.10.0-1160.49.1.el7_lustre.x86_64.rpm
kernel-debuginfo-3.10.0-1160.49.1.el7_lustre.x86_64.rpm  kernel-devel-3.10.0-1160.49.1.el7_lustre.x86_64.rpm
[root@node91 02-kernel-lustre]# yum install *.rpm
[root@node91 02-kernel-lustre]# reboot
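
Once the node is back up, a quick check along these lines can confirm that it booted into the patched kernel rather than the stock one. This is only a sketch: the helper is_lustre_kernel is hypothetical and simply looks for the "_lustre" marker that appears in the package names above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a kernel release string belongs to a
# Lustre-patched kernel, recognised here by the "_lustre" marker.
is_lustre_kernel() {
    case "$1" in
        *_lustre*) return 0 ;;
        *)         return 1 ;;
    esac
}

running=$(uname -r)
if is_lustre_kernel "$running"; then
    echo "OK: running Lustre kernel $running"
else
    echo "WARNING: running $running; check the default boot entry"
fi
```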

3. Build and install the IB driver package

Download: https://network.nvidia.com/products/infiniband-drivers/linux/mlnx_ofed/#tabs-1
Select the following options:
Archive Version
- Version (Archive): 5.8-1.1.2.1-LTS
- OS Distribution: RHEL/CentOS/Rocky
- OS Distribution Version: RHEL/CentOS 7.9
- Architecture: x86_64
- Download: MLNX_OFED_LINUX-5.8-1.1.2.1-rhel7.9-x86_64.tgz

  • Install the dependencies:
yum install libusbx pciutils lsof tcl fuse-libs tcsh tk python-devel createrepo
  • Build and install the IB driver:
tar -zxvf MLNX_OFED_LINUX-5.8-1.1.2.1-rhel7.9-x86_64.tgz
cd MLNX_OFED_LINUX-5.8-1.1.2.1-rhel7.9-x86_64
./mlnxofedinstall --all --force --without-kmod-iser --without-xpmem-modules --without-libxpmem --add-kernel-support
dracut -f
/etc/init.d/openibd restart
  • Check the status of the openibd and opensmd services:
[root@node91 MLNX_OFED_LINUX-5.8-1.1.2.1-rhel7.9-x86_64]# /etc/init.d/openibd status


  HCA driver loaded


Configured IPoIB devices:
ib0 ib1


Currently active IPoIB devices:
ib0
ib1
Configured Mellanox EN devices:


Currently active Mellanox devices:
ib0
ib1


The following OFED modules are loaded:


  rdma_ucm
  rdma_cm
  ib_ipoib
  mlx5_core
  mlx5_ib
  ib_uverbs
  ib_umad
  ib_cm
  ib_core
  mlxfw
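
The module list above can also be checked non-interactively, for example from a provisioning script. The sketch below reads /proc/modules and prints any expected module that is not loaded. The helper name missing_modules is hypothetical, and the module names passed to it are just a subset of the status output above.

```shell
#!/bin/sh
# Hypothetical helper: reads a module listing on stdin (module name in the
# first column, as in /proc/modules) and prints every module named in the
# arguments that does not appear in that listing.
missing_modules() {
    loaded=$(awk '{print $1}')
    for m in "$@"; do
        printf '%s\n' "$loaded" | grep -qxF "$m" || echo "$m"
    done
}

# Prints nothing when all of the listed modules are loaded.
if [ -r /proc/modules ]; then
    missing_modules mlx5_core mlx5_ib ib_core ib_ipoib rdma_cm < /proc/modules
fi
```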

4. Build and install Lustre

Download: https://downloads.whamcloud.com/public/lustre/lustre-2.12.9-ib/el7.9.2009/server/SRPMS/

  • Install the dependencies:
yum -y install automake xmlto asciidoc elfutils-libelf-devel zlib-devel binutils-devel newt-devel python-devel libyaml-devel
yum -y install pesign numactl-devel pciutils-devel ncurses-devel libselinux-devel
yum -y install attr cifs-utils gssproxy keyutils libbasicobjects libcollection libevent libini_config libldb libnfsidmap libpath_utils libref_array libtalloc libtdb libtevent libtirpc  libverto-libevent libwbclient net-tools  nfs-utils psmisc quota quota-nls resource-agents rpcbind samba-client-libs samba-common samba-common-libs tcp_wrappers
  • Download the source RPM and build the packages:
wget https://downloads.whamcloud.com/public/lustre/lustre-2.12.9-ib/el7.9.2009/server/SRPMS/lustre-2.12.9-1.src.rpm
rpm2cpio lustre-2.12.9-1.src.rpm |cpio -div
tar -zxvf lustre-2.12.9.tar.gz
cd lustre-2.12.9
time ./configure --with-o2ib=/usr/src/ofa_kernel/default 2>&1 | tee log-configure.txt
time make -j $(nproc) rpms  2>&1 | tee log-make.txt
  • Resolve the ksym errors by building and installing the mlnx kmod package:

Reference (lustre-discuss mailing list): "Re: [lustre-discuss] ksym errors on kmod-lustre RPM after 2.12.0 build against MOFED 4.5-1"

rpmbuild --rebuild --define 'KMP 1' mlnx-ofa_kernel-5.8-OFED.5.8.1.1.2.1.src.rpm
rpm -ivh /root/rpmbuild/RPMS/x86_64/kmod-mlnx-ofa_kernel-5.8-OFED.5.8.1.1.2.1.x86_64.rpm
  • Install the Lustre packages produced by the build:
[root@node91 04-lustre]# ls *.rpm
kmod-lustre-2.12.9-1.el7.x86_64.rpm              lustre-2.12.9-1.el7.x86_64.rpm            lustre-osd-ldiskfs-mount-2.12.9-1.el7.x86_64.rpm
kmod-lustre-osd-ldiskfs-2.12.9-1.el7.x86_64.rpm  lustre-debuginfo-2.12.9-1.el7.x86_64.rpm  lustre-resource-agents-2.12.9-1.el7.x86_64.rpm
kmod-lustre-tests-2.12.9-1.el7.x86_64.rpm        lustre-iokit-2.12.9-1.el7.x86_64.rpm      lustre-tests-2.12.9-1.el7.x86_64.rpm
[root@node91 04-lustre]# yum install *.rpm
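
The ksym errors mentioned in the previous step show up as unmet ksym(...) dependencies when kmod-lustre is installed. A rough way to list them before running yum is sketched below. It assumes that RPM reports these capabilities one per line in the form "ksym(symbol) = checksum" with no trailing whitespace; the helper name unmet_ksyms and the file names req.txt and prov.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints every ksym(...) requirement listed in file $1
# that does not appear verbatim in the provides list in file $2.
unmet_ksyms() {
    grep '^ksym(' "$1" | sort -u | grep -Fxv -f "$2" || true
}

# Example usage on the build node (package name taken from the listing above):
#   rpm -qp --requires kmod-lustre-2.12.9-1.el7.x86_64.rpm > req.txt
#   rpm -qa --provides | grep '^ksym(' > prov.txt
#   unmet_ksyms req.txt prov.txt
```

An empty result suggests that the installed kmod-mlnx-ofa_kernel package already provides the symbols kmod-lustre needs.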

II. Software Deployment

1. IB network configuration

  • Confirm that two IB ports are present:
[root@node91 ~]# ibstatus 
Infiniband device 'mlx5_0' port 1 status:
    default gid:     fe80:0000:0000:0000:e8eb:d303:0032:056e
    base lid:     0xa4
    sm lid:         0x33
    state:         4: ACTIVE
    phys state:     5: LinkUp
    rate:         200 Gb/sec (4X HDR)
    link_layer:     InfiniBand


Infiniband device 'mlx5_1' port 1 status:
    default gid:     fe80:0000:0000:0000:e8eb:d303:0032:2d6a
    base lid:     0xa5
    sm lid:         0x33
    state:         4: ACTIVE
    phys state:     5: LinkUp
    rate:         200 Gb/sec (4X HDR)
    link_layer:     InfiniBand
  • Edit the ib0 interface configuration and restart the network service:
[root@node91 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib0 
CONNECTED_MODE=no
TYPE=InfiniBand
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ib0
UUID=32420cf2-6708-4cc7-b2b6-c27b55e3480b
DEVICE=ib0
ONBOOT=yes
IPADDR=30.6.1.147
PREFIX=16
[root@node91 ~]# systemctl restart network
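
Before relying on ib0, the ibstatus output shown earlier in this section can be checked automatically. The sketch below prints the name of every device whose "state:" line is not ACTIVE. The helper name inactive_ib_devices is hypothetical, and the parsing assumes the output layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads ibstatus-style text on stdin and prints the name
# of every device whose "state:" line does not report ACTIVE.
# The "phys state:" lines are ignored because they do not begin with "state:".
inactive_ib_devices() {
    awk -F"'" '
        /^Infiniband device/ { dev = $2 }
        /^[ \t]*state:/ && $0 !~ /ACTIVE/ { print dev }
    '
}

# Prints nothing when every port is ACTIVE (skipped if ibstatus is absent).
if command -v ibstatus >/dev/null 2>&1; then
    ibstatus | inactive_ib_devices
fi
```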

2. Configure the Lustre cluster

  • Edit the Lustre configuration and load the modules:
modinfo lustre
echo "options lnet networks=o2ib(ib0)" > /etc/modprobe.d/lustre.conf
depmod -a
systemctl restart lustre
  • Disable the firewall:
systemctl disable firewalld
systemctl stop firewalld
  • Create the combined MGT/MDT and the OST, then mount the Lustre targets:
mkdir -p /lustre/mdt0
mkdir -p /lustre/ost0
mkfs.lustre --fsname=lustre --mgs --mdt --index 0 --backfstype=ldiskfs /dev/sdb
mount -t lustre /dev/sdb /lustre/mdt0/
mkfs.lustre --fsname=lustre --ost --mgsnode=30.6.1.147@o2ib --index 0 --backfstype=ldiskfs /dev/sdc
mount -t lustre /dev/sdc /lustre/ost0/

mkdir /lustrefs
mount -t lustre 30.6.1.147@o2ib:/lustre /lustrefs/
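
The LNet options line and the client mount source used above follow a fixed pattern, so they can be generated rather than typed by hand when more nodes are added. The sketch below shows one way to do that and then runs two read-only checks. The helper names lnet_options_line and lustre_mount_source are hypothetical; lctl list_nids and lfs df come with the Lustre packages and only run here if they are installed.

```shell
#!/bin/sh
# Hypothetical helper: builds the modprobe options line for LNet over o2ib
# on the given IPoIB interface.
lnet_options_line() {
    printf 'options lnet networks=o2ib(%s)\n' "$1"
}

# Hypothetical helper: builds a client mount source from an MGS NID and
# a file system name.
lustre_mount_source() {
    printf '%s:/%s\n' "$1" "$2"
}

lnet_options_line ib0
lustre_mount_source 30.6.1.147@o2ib lustre

# On a configured node these report the local NIDs and the per-target usage.
if command -v lctl >/dev/null 2>&1; then lctl list_nids || true; fi
if command -v lfs >/dev/null 2>&1; then lfs df -h /lustrefs || true; fi
```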