手把手教你在Ubuntu上部署中文LLAMA-2大模型-526互联

一、前言

llama2作为目前最优秀的的开源大模型，相较于chatGPT，llama2占用的资源更少，推理过程更快，本文将借助llama.cpp工具在ubuntu(x86\ARM64）平台上搭建纯CPU运行的中文LLAMA2中文模型。

二、准备工作

1、一个Ubuntu环境（本教程基于Ubuntu20 LTS版操作）

2、确保你的环境可以连接GitHub

3、建议至少60GB以上存储空间（用于存放模型文件等）

　4、建议不低于6GB内存（仅限7B_q4k量化模型）

三、开始部署

1、配置系统

　　　输入下列命令升级和安装所需依赖

sudo apt update

sudo apt-get install gcc g++ python3 python3-pip

#安装python依赖
python3 -m pip install torch numpy sentencepiece

　　2、构建llama.cpp

　　　　从GitHub拉取llama.cpp工具，并进行构建

#拉取llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git

#构建llama.cpp
cd llama.cpp/
make -j8

　　　　(注：make -j后的数字为你的设备物理核心数）

　　3、下载LLAMA2中文模型

　　　　在Chinese-LLaMA-Alpaca-2项目中下载7B/13B的指令模型（apache模型），并将模型文件解压缩放入llama.cpp/models文件夹下

https://github.com/ymcui/Chinese-LLaMA-Alpaca-2#%E6%A8%A1%E5%9E%8B%E4%B8%8B%E8%BD%BD

　　4、量化部署模型

# 安装 Python dependencies
python3 -m pip install torch numpy sentencepiece

# 生成量化模型
python3 convert.py models/前面放入的模型文件夹名称

#4-bit量化
./quantize ./models/前面放入的模型文件夹名称/ggml-model-f16.gguf ./models/7B_q4k.gguf q4k

　　顺利完成上述操作后，models文件夹下会生成一个名为7B_q4k.gguf的模型文件

　　5、启动模型

　　　　将中文llama2模型项目中的scripts/llama-cpp/chat.sh文件拷贝到llama.cpp目录下，并执行以下指令

chmod +x chat.sh

#使用以下命令启动聊天
./chat.sh models/7B_q4k.gguf '请列举5条文明乘车的建议'

llama2-chinese模型chinese之旅

llama2-chinese模型chinese项目

模型llama

模型llama2 llama 3090

词表llama2-chinese模型chinese

ziya-llama模型llama ziya

模型alpaca vicuna llama

项目llama2-chinese模型chinese