Pitfalls of running ChatGLM2 on Linux

Notes on the pitfalls I ran into…

0. Environment:

Everything was done on a mainland China network. Ubuntu 20.04 with Python 3.8 and CUDA 12.2

RTX 3060 Laptop GPU (6 GB VRAM)

1. Downloading ChatGLM:

# clone the repository
git clone https://gitclone.com/github.com/THUDM/ChatGLM2-6B
cd ChatGLM2-6B
pip3 install -r requirements.txt # look up a pip mirror yourself to speed this up
export HF_ENDPOINT=https://hf-mirror.com # mirror to speed up the model download
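
The export above only lasts for the current shell session. As a minimal sketch, the same variable can be set from Python instead, assuming it runs before transformers or huggingface_hub is imported. The helper name use_hf_mirror is hypothetical and not part of any library.

```python
import os


def use_hf_mirror(endpoint="https://hf-mirror.com"):
    """Set HF_ENDPOINT unless it is already set; return the effective value.

    Hypothetical helper for illustration. It only touches the environment
    of the current Python process.
    """
    return os.environ.setdefault("HF_ENDPOINT", endpoint)


# Call this first, then import transformers / huggingface_hub afterwards,
# so the libraries see the mirror when they read the environment.
effective = use_hf_mirror()
print("HF_ENDPOINT =", effective)
```

Using setdefault keeps any endpoint that was already exported in the shell, so the two approaches do not fight each other.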

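2. Look up how to install CUDA yourself. After installing, a few libraries have to be symlinked manually (sudo may be needed if the directory is owned by root), as follows: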

cd /usr/local/cuda/lib64
ln -s libcudart.so.12 libcudart.so
ln -s libcublasLt.so.12 libcublasLt.so
ln -s libcublas.so.12 libcublas.so
ln -s /usr/lib/x86_64-linux-gnu/libcuda.so .
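
To see whether the links are in place, a small check like the following may help. It is only a sketch: the function name missing_links is hypothetical, and the default directory assumes the same /usr/local/cuda/lib64 path used above.

```python
import os

# The unversioned names created by the ln -s commands in this step.
EXPECTED_LIBS = [
    "libcudart.so",
    "libcublasLt.so",
    "libcublas.so",
    "libcuda.so",
]


def missing_links(lib_dir="/usr/local/cuda/lib64", names=EXPECTED_LIBS):
    """Return the names that do not resolve to an existing file in lib_dir.

    os.path.exists follows symlinks, so a dangling link is reported
    as missing as well.
    """
    return [n for n in names if not os.path.exists(os.path.join(lib_dir, n))]


if __name__ == "__main__":
    missing = missing_links()
    if missing:
        print("missing or broken:", ", ".join(missing))
    else:
        print("all expected libraries are present")
```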

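3. Change the model loading logic. The 3060 has only 6 GB of VRAM, which is not enough to load the full model, so change the AutoModel line as follows: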

model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True, max_memory={0: "6GiB",  "cpu": "10GiB"})

After that the model loads smoothly, with VRAM usage at around 4 GB.
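
A rough back-of-the-envelope calculation shows why the quantized checkpoint fits where the full one does not. This is a sketch that counts weights only and assumes about 6.2 billion parameters. Real usage is higher because of activations, the KV cache, and layers that are not quantized, which is consistent with the roughly 4 GB observed here.

```python
GIB = 1024 ** 3

# Assumption: ChatGLM2-6B has roughly 6.2 billion parameters.
N_PARAMS = 6.2e9


def weight_gib(n_params, bits_per_weight):
    """Approximate size of the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB


for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name}: about {weight_gib(N_PARAMS, bits):.1f} GiB of weights")
```

Under these assumptions the fp16 weights alone come to well over 6 GiB, while the int4 weights come to about 3 GiB.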

4. To exit cli_demo.py, type stop. No more Ctrl+C!
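
The idea can be sketched as a simplified loop. This is not the actual cli_demo.py code: chat_loop, read_line and respond are hypothetical names used only for illustration.

```python
def chat_loop(read_line, respond):
    """Read queries until the user types "stop"; return the number of turns.

    read_line: callable returning the next line of user input.
    respond:   callable that handles one query (e.g. calls the model).
    """
    turns = 0
    while True:
        query = read_line()
        if query.strip() == "stop":
            # Leave the loop normally instead of relying on Ctrl+C,
            # which would raise KeyboardInterrupt.
            break
        respond(query)
        turns += 1
    return turns


if __name__ == "__main__":
    # Echo back the input in place of a real model call.
    chat_loop(lambda: input("User: "), lambda q: print("Echo:", q))
```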

This post is from cnblogs (博客园), author: 星如雨yu. When reposting, please credit the original link: https://www.cnblogs.com/tianpanyu/p/17909478.html
