Notes on the pitfalls of debugging langchain-ChatGLM - 526互联

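Introduction:

ChatGPT set off this year's AI boom. ChatGLM-6B is an open-source conversational model released by the Knowledge Engineering Group (KEG) & Data Mining at Tsinghua University. According to the official introduction, ChatGLM is a Chinese-English bilingual language model at the hundred-billion-parameter scale, with optimizations for Chinese. The open-source release is its small version with about 6 billion parameters, and local deployment needs only about 6 GB of VRAM at the INT4 quantization level.

Whether it actually lands in production still depends on how well we programmers can sell it.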

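1: Project overview

chatchat-space/langchain-ChatGLM: langchain-ChatGLM, local knowledge based ChatGLM with langchain | ChatGLM question answering over a local knowledge base (github.com)

🤖 A question-answering application over a local knowledge base, built on the ideas behind langchain. The goal is a knowledge-base Q&A solution that works well for Chinese-language scenarios, is friendly to open-source models, and can run offline.

💡 Inspired by GanymedeNil's project document.ai and the ChatGLM-6B Pull Request created by AlexZhangji, the project builds a local knowledge-base Q&A application whose entire pipeline can run on open-source models. It now supports connecting large language models such as ChatGLM-6B directly, or connecting Vicuna, Alpaca, LLaMA, Koala, RWKV and other models through the fastchat API.

✅ The default Embedding model in this project is GanymedeNil/text2vec-large-chinese, and the default LLM is ChatGLM-6B. With these models, the project can be deployed privately and offline using open-source models only.

Perhaps this is the offline, privately deployed AI knowledge base we have been looking for.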

2: Hardware

ASUS ROG 枪神6 Plus laptop: Intel Core i9-12900H, 16 GB RAM, RTX 3060 with 6 GB VRAM

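3: Installing CUDA and cuDNN

There are plenty of guides online, so I won't go into detail.

Download a recent CUDA version and the cuDNN release that matches it, then extract cuDNN into the CUDA installation directory.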

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\extras\demo_suite

Open cmd in that directory and run bandwidthTest. If it reports PASS, the installation works.
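The same check can be scripted. This is a minimal sketch, assuming the executable is named bandwidthTest.exe, lives in the demo_suite directory shown above, and prints a line containing "Result = PASS" on success. The helper names are made up for illustration.

```python
import subprocess
from pathlib import Path

# Path taken from the article; adjust the version folder to match your install.
DEMO_SUITE = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\extras\demo_suite")


def bandwidth_test_passed(output: str) -> bool:
    """Return True if the captured output contains the PASS marker (assumed format)."""
    return "Result = PASS" in output


def run_bandwidth_test(demo_dir: Path = DEMO_SUITE) -> bool:
    """Run bandwidthTest.exe if it exists and report whether it passed."""
    exe = demo_dir / "bandwidthTest.exe"
    if not exe.exists():
        # Nothing to run: wrong path, or CUDA is not installed here.
        return False
    result = subprocess.run([str(exe)], capture_output=True, text=True)
    return bandwidth_test_passed(result.stdout)


if __name__ == "__main__":
    print("CUDA bandwidthTest passed:", run_bandwidth_test())
```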

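4: Development tools

I used to freeload on PyCharm, but that is no longer an option. VS Code is perfectly fine, and free.

VS Code extensions to install:

AI work is mostly done in Python these days, but I was told that debugging has its pitfalls, so Conda it is. Reportedly it can install dependencies automatically.

Conda :: Anaconda.org

Base Python version: 3.10.11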

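5: Creating the virtual environment

Open VS Code, open the folder you cloned, press F1, and type create. In the search results, click the Python entry for creating an environment, then choose Conda, then select the base Python version.

When the notification in the bottom-right corner disappears, the virtual environment has been created.

Pitfall 1: conda is not recognized

I tried to install the dependencies from the terminal: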

PS D:\artificialIntelligence\langchain-ChatGLM> C:/ProgramData/anaconda3/Scripts/activate
PS D:\artificialIntelligence\langchain-ChatGLM> conda activate d:\artificialIntelligence\langchain-ChatGLM\.conda
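conda : The term 'conda' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1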
+ conda activate d:\artificialIntelligence\langchain-ChatGLM\.conda
+ ~~~~~
    + CategoryInfo          : ObjectNotFound: (conda:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException
 
PS D:\artificialIntelligence\langchain-ChatGLM>

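Make sure conda has been added to the environment variables.

Remember to restart VS Code after changing environment variables.

Opening the terminal again, I was prompted to run conda init (I didn't capture a screenshot).

Run conda init in any terminal.

After that, the terminal opens normally.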
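To confirm that the environment-variable change took effect, one option is to ask Python whether conda is reachable through PATH. This is a small sketch using only the standard library; the helper name find_on_path is made up for illustration.

```python
import shutil
from typing import Optional


def find_on_path(tool: str) -> Optional[str]:
    """Return the full path of `tool` if it is reachable via PATH, else None."""
    return shutil.which(tool)


if __name__ == "__main__":
    location = find_on_path("conda")
    if location:
        print("conda found at", location)
    else:
        print("conda is not on PATH; fix the environment variables and restart VS Code")
```

Run it from a freshly opened terminal, since an already open terminal keeps the old PATH.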

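Next problem: GBK encoding errors

Back to the environment variables: add PYTHONUTF8 with the value 1.

Remember to restart VS Code after changing environment variables.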
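Why PYTHONUTF8 helps can be shown with a small experiment. On a Chinese Windows system, Python typically opens text files as GBK unless told otherwise, so a UTF-8 file containing Chinese text either fails to decode or comes back garbled. The sketch below writes such a file and reads it back both ways; the file name and sample text are illustrative only.

```python
import os
import sys
import tempfile


def read_text(path: str, encoding: str) -> str:
    """Read a file with an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()


def demo():
    """Write Chinese text as UTF-8, then read it back as UTF-8 and as GBK."""
    text = "依赖列表"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "requirements.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        as_utf8 = read_text(path, "utf-8")
        try:
            as_gbk = read_text(path, "gbk")  # wrong codec: garbled text at best
        except UnicodeDecodeError:
            as_gbk = None  # GBK could not decode the bytes at all
    return text, as_utf8, as_gbk


if __name__ == "__main__":
    # 1 means UTF-8 mode is active (for example via PYTHONUTF8=1), 0 means it is not.
    print("utf8_mode:", sys.flags.utf8_mode)
    original, as_utf8, as_gbk = demo()
    print("read as utf-8 matches:", as_utf8 == original)
    print("read as gbk matches:", as_gbk == original)
```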

(d:\artificialIntelligence\langchain-ChatGLM\.conda) PS D:\artificialIntelligence\langchain-ChatGLM> pip install -r .\requirements.txt

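Now the dependencies can be installed. Because of the AI libraries, the full installation comes to 2.33 GB.

Trying to run webui.py

Open webui.py and click the small triangle (run) button at the top right.

Minor pitfall: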

pip install accelerate

(d:\artificialIntelligence\langchain-ChatGLM\.conda) PS D:\artificialIntelligence\langchain-ChatGLM> & d:/artificialIntelligence/langchain-ChatGLM/.conda/python.exe d:/artificialIntelligence/langchain-ChatGLM/webui.py
INFO  2023-07-31 20:28:03,704-1d: 
loading model config
llm device: cpu
embedding device: cpu
dir: d:\artificialIntelligence\langchain-ChatGLM
flagging username: ab285827e5eb4d5ca45f7022804603eb

load_model_config THUDM/chatglm-6b...
Downloading (…)lve/main/config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 773/773 [00:00<?, ?B/s]
D:\artificialIntelligence\langchain-ChatGLM\.conda\lib\site-packages\huggingface_hub\file_download.py:133: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\jacka\.cache\huggingface\hub. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
Downloading (…)iguration_chatglm.py: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.28k/4.28k [00:00<?, ?B/s]
A new version of the following files was downloaded from https://huggingface.co/THUDM/chatglm-6b:
- configuration_chatglm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Loading THUDM/chatglm-6b...
Warning: self.llm_device is False.
This means that no use GPU  bring to be load CPU mode

Downloading (…)/modeling_chatglm.py: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 57.6k/57.6k [00:00<00:00, 147kB/s]
A new version of the following files was downloaded from https://huggingface.co/THUDM/chatglm-6b:
- modeling_chatglm.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Traceback (most recent call last):
  File "d:\artificialIntelligence\langchain-ChatGLM\webui.py", line 334, in <module>
    model_status = init_model()
  File "d:\artificialIntelligence\langchain-ChatGLM\webui.py", line 106, in init_model
    llm_model_ins = shared.loaderLLM()
  File "d:\artificialIntelligence\langchain-ChatGLM\models\shared.py", line 39, in loaderLLM
    loaderCheckPoint.reload_model()
  File "d:\artificialIntelligence\langchain-ChatGLM\models\loader\loader.py", line 453, in reload_model
    self.model, self.tokenizer = self._load_model()
  File "d:\artificialIntelligence\langchain-ChatGLM\models\loader\loader.py", line 252, in _load_model
    model = LoaderClass.from_pretrained(checkpoint, **params).to(self.llm_device, dtype=float)
  File "D:\artificialIntelligence\langchain-ChatGLM\.conda\lib\site-packages\transformers\models\auto\auto_factory.py", line 462, in from_pretrained
    return model_class.from_pretrained(
  File "D:\artificialIntelligence\langchain-ChatGLM\.conda\lib\site-packages\transformers\modeling_utils.py", line 2184, in from_pretrained
    raise ImportError(
ImportError: Using `low_cpu_mem_usage=True` or a `device_map` requires Accelerate: `pip install accelerate`

The error message spells out the fix: `pip install accelerate`.
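To catch this kind of missing package before launching webui.py, a quick pre-flight check can be run in the same environment. This is a sketch only: the helper name and the package list are assumptions for illustration, not the project's full requirements.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Illustrative list; extend it with whatever the traceback complains about.
    missing = missing_packages(["accelerate", "torch", "transformers"])
    if missing:
        print("Install these first: pip install " + " ".join(missing))
    else:
        print("All checked packages are importable")
```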

Minor pitfall: check failed

OSError: Consistency check failed: file should be of size 1980385902 but has size 1253815098 ((…)l-00003-of-00008.bin).

The download was incomplete. Run it again.
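The consistency check in that error compares the size on disk with the expected size. The sketch below does the same comparison by hand and removes a partial file so that the next run starts clean. Simply rerunning was enough in my case; deleting the file is an assumption about how to force a fresh download, and the file name used in the demo is hypothetical.

```python
import os
import tempfile


def is_complete(path: str, expected_size: int) -> bool:
    """True if the file exists and has exactly the expected size in bytes."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_size


def remove_if_incomplete(path: str, expected_size: int) -> bool:
    """Delete a partial file; return True if something was deleted."""
    if os.path.isfile(path) and not is_complete(path, expected_size):
        os.remove(path)
        return True
    return False


if __name__ == "__main__":
    # Demo with a small temporary file standing in for a model shard.
    with tempfile.TemporaryDirectory() as tmp:
        shard = os.path.join(tmp, "shard.bin")  # hypothetical file name
        with open(shard, "wb") as f:
            f.write(b"x" * 10)
        print("complete:", is_complete(shard, 20))
        print("removed:", remove_if_incomplete(shard, 20))
```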

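Minor pitfall: An existing connection was forcibly closed by the remote host

requests.exceptions.ConnectionError: (ProtocolError('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host.', None, 10054, None)), '(Request ID: a4081b67-8dc6-4854-b815-0b9d7f72cd3a)')

Without a proxy ("magic", as we call it), the download server won't play with users in China. Just use a proxy.

Alternatively, follow the project's instructions and load the model manually from a local, offline copy.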
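For the offline route, it helps to check that the downloaded folder looks like a complete model before pointing the project at it. This is a rough sketch: the helper names are made up, the check (config.json plus at least one weight file) is a heuristic, and the directory path is a placeholder. HF_HUB_OFFLINE and TRANSFORMERS_OFFLINE are environment variables read by the Hugging Face libraries; they need to be set before those libraries are imported.

```python
import os
from pathlib import Path


def looks_like_model_dir(model_dir) -> bool:
    """Heuristic: a usable model folder has config.json and at least one weight file."""
    d = Path(model_dir)
    if not (d / "config.json").is_file():
        return False
    weights = list(d.glob("*.bin")) + list(d.glob("*.safetensors"))
    return len(weights) > 0


def enable_offline_mode() -> None:
    """Ask the Hugging Face libraries not to contact the network."""
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"


if __name__ == "__main__":
    enable_offline_mode()
    local_dir = "path/to/your/local/model"  # placeholder, not a real path
    print("looks usable:", looks_like_model_dir(local_dir))
```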

Minor pitfall: not enough memory

RuntimeError: [enforce fail at ..\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 2138570752 bytes.

Out of memory?
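A back-of-the-envelope estimate suggests why. The log above shows the model being loaded on the CPU, where the weights are presumably kept as full-precision floats. The sketch below assumes roughly 6.2 billion parameters and counts only the weights, ignoring activations and other overhead, so the real footprint is higher.

```python
# Assumption: ChatGLM-6B has roughly 6.2 billion parameters.
PARAMS = 6.2e9

# Storage cost per parameter at different precisions, in bytes.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gib(params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    return params * bytes_per_param / 2**30


if __name__ == "__main__":
    for name, size in BYTES_PER_PARAM.items():
        print(f"{name}: about {weights_gib(PARAMS, size):.1f} GiB")
```

Under these assumptions the float32 weights alone exceed the 16 GB of RAM in this laptop, while the INT4 weights fit within the 6 GB figure quoted in the introduction.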

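Switching models

The default model, ChatGLM-6B, fails with an out-of-memory error, and I'm not sure why. Since ChatGLM2 has been released anyway, let's try a different model.

Edit configs/model_config.py. In that file, llm_model_dict is the list of available models and LLM_MODEL selects the one to use; change it to chatglm2-6b-int4. Then run webui.py again.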
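The change amounts to something like the following. Only the names llm_model_dict and LLM_MODEL and the value chatglm2-6b-int4 come from the steps above; the shape of the dictionary entries and the key pretrained_model_name are assumptions for illustration, so check them against the real file.

```python
# Sketch of the relevant part of configs/model_config.py (entry layout assumed).
llm_model_dict = {
    "chatglm-6b": {"pretrained_model_name": "THUDM/chatglm-6b"},
    "chatglm2-6b-int4": {"pretrained_model_name": "THUDM/chatglm2-6b-int4"},
}

# Must be one of the keys of llm_model_dict.
LLM_MODEL = "chatglm2-6b-int4"


def selected_model() -> dict:
    """Look up the configured model and fail early on a typo in LLM_MODEL."""
    if LLM_MODEL not in llm_model_dict:
        raise KeyError(f"LLM_MODEL {LLM_MODEL!r} is not defined in llm_model_dict")
    return llm_model_dict[LLM_MODEL]


if __name__ == "__main__":
    print(selected_model())
```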

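Minor pitfall: 'gcc' is not recognized as an internal or external command, operable program or batch file

GCC is not installed.

Download MinGW from https://sourceforge.net/projects/mingw/ and run the downloaded exe.

Under Basic Setup, right-click mingw32-gcc-g++ on the right and choose Mark for Installation, then open the Installation menu at the top and choose Apply Changes.

Configure the environment variables: add MinGW\bin to PATH.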

Using the GPU

Still no luck, still out of memory. But another part of the log says:

loading model config
llm device: cpu
embedding device: cpu

The GPU is not being used at all, only the CPU. Let's get it running on the GPU first.

Start Locally | PyTorch https://pytorch.org/get-started/locally/

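Pick your options, copy the generated command, and you can install the CUDA 11.8 build of torch.

People keep recommending an offline installation, probably because the download needs a proxy.

A preview build for CUDA 12.1 is also available.

Run a simple test: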

import torch
import torchvision

print(torch.__version__)
print(torch.cuda.is_available())

2.0.1
True

True means the CUDA build of torch is installed and usable.
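The earlier "llm device: cpu" line is presumably the result of a check like this one coming back False with the CPU-only torch build. The sketch below is an assumption about how such a device choice is commonly written, not the project's actual code; it falls back to "cpu" when torch is missing or cannot see a GPU.

```python
def pick_device() -> str:
    """Return "cuda" if a CUDA-enabled torch can see a GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch is not installed in this environment
    return "cuda" if torch.cuda.is_available() else "cpu"


def cuda_build_version():
    """Return the CUDA version torch was built against, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda  # None for CPU-only builds


if __name__ == "__main__":
    print("device:", pick_device())
    print("torch built for CUDA:", cuda_build_version())
```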
