Installing lxml on a Mac

Posted 2023-10-11 21:12:54 by jrleader

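I recently decided to start learning web scraping, so that I could extract paragraph text, lists, tables and other information from web pages. Since HTML forms a DOM tree, it occurred to me that XPath might be a good fit for parsing it, so I chose to approach web page parsing by combining Python with XPath.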

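When it comes to using XPath from Python, the package to reach for is lxml. It is an open-source project maintained by Stefan Behnel and other developers that simplifies XML (and HTML) parsing, and its etree module supports XPath queries directly, which makes it well suited to this kind of parsing.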
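To make the idea concrete, here is a minimal sketch of pulling a paragraph, list items and table cells out of HTML with lxml.etree and XPath. It assumes lxml is already installed; the HTML snippet and its class/id values are made up for illustration and do not come from any real page.

```python
from lxml import etree

# A small made-up page standing in for a downloaded HTML document.
html = """
<html><body>
  <p class="intro">First paragraph.</p>
  <ul id="menu"><li>Home</li><li>Docs</li></ul>
  <table>
    <tr><th>Name</th><th>Version</th></tr>
    <tr><td>lxml</td><td>4.9</td></tr>
  </table>
</body></html>
"""

# etree.HTML parses with libxml2's lenient HTML parser and returns the <html> root.
root = etree.HTML(html)

# Text of every <p> whose class attribute is "intro".
paragraphs = root.xpath('//p[@class="intro"]/text()')

# Text of each <li> inside the <ul> with id="menu".
items = root.xpath('//ul[@id="menu"]/li/text()')

# Data rows only: [td] keeps rows that contain <td> cells, skipping the <th> header.
# //table//tr works whether or not the parser inserted a <tbody>.
rows = [row.xpath("./td/text()") for row in root.xpath("//table//tr[td]")]

print(paragraphs)
print(items)
print(rows)
```

This prints the intro paragraph text, the two menu entries, and the single data row as a list of cell strings.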

Unfortunately, installing lxml with pip failed. The terminal output was as follows (system: macOS 13.2.1):

building 'lxml.etree' extension
creating build/temp.macosx-11.1-arm64-3.9
creating build/temp.macosx-11.1-arm64-3.9/src
creating build/temp.macosx-11.1-arm64-3.9/src/lxml
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /Users/xiaoming/miniconda3/envs/datasci/include -arch arm64 -I/Users/xiaoming/miniconda3/envs/datasci/include -fPIC -O2 -isystem /Users/xiaoming/miniconda3/envs/datasci/include -arch arm64 -DCYTHON_CLINE_IN_TRACEBACK=0 -I/usr/include -Isrc -Isrc/lxml/includes -I/Users/xiaoming/miniconda3/envs/datasci/include/python3.9 -c src/lxml/etree.c -o build/temp.macosx-11.1-arm64-3.9/src/lxml/etree.o -w -flat_namespace
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
Compile failed: command '/usr/bin/clang' failed with exit code 1
cc -I/usr/include -I/usr/include/libxml2 -c /var/folders/5l/1dwkjn917h51_g_2x3wf8v240000gn/T/xmlXPathInitppw2l733.c -o var/folders/5l/1dwkjn917h51_g_2x3wf8v240000gn/T/xmlXPathInitppw2l733.o
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
*********************************************************************************
Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?
Perhaps try: xcode-select --install
*********************************************************************************
error: command '/usr/bin/clang' failed with exit code 1

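After some searching online, I found that the cause was that the C compiler on macOS was not set up correctly. pip was building lxml's C extension from source (note the clang command compiling src/lxml/etree.c), and the xcrun error shows that the Xcode Command Line Tools, which provide clang on macOS, were missing. The "Is libxml2 installed?" message is a side effect of the same problem: the setup script checks for libxml2 by compiling a small test program, and that compile failed for the same reason.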
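Before retrying pip, the diagnosis can also be checked from Python. The sketch below is an illustration of mine, not part of the original fix: it runs gcc --version and looks for the xcrun error text quoted in the log above. The function names compiler_status and check_compiler are hypothetical, and the classification targets macOS (on Linux, a real GCC will simply report "unknown").

```python
import shutil
import subprocess

# Text printed by xcrun when the Command Line Tools are missing (taken from the log above).
XCRUN_ERROR = "invalid active developer path"


def compiler_status(output: str) -> str:
    """Classify the text printed by `gcc --version` on macOS (hypothetical helper)."""
    if XCRUN_ERROR in output:
        return "missing"   # Command Line Tools absent or broken
    if "clang version" in output:
        return "ok"        # Apple clang answered through the gcc shim
    return "unknown"       # some other compiler, or unexpected output


def check_compiler() -> str:
    """Run `gcc --version` and classify the result (hypothetical helper)."""
    if shutil.which("gcc") is None:
        return "missing"
    result = subprocess.run(["gcc", "--version"], capture_output=True, text=True)
    # xcrun writes its error to stderr, so inspect both streams.
    return compiler_status(result.stdout + result.stderr)


if __name__ == "__main__":
    status = check_compiler()
    print("C compiler:", status)
    if status == "missing":
        print("Try: xcode-select --install")
```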

Solution:

1. Open Terminal.

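2. Run gcc --version to check the C compiler. (On macOS, gcc is provided by the Command Line Tools and actually runs Apple clang.) If it is installed, the output looks like this: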

Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

If it is not installed, the output looks like this:

xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun

3. If it is not installed, run xcode-select --install in Terminal and confirm the installation in the dialog that appears.

4. After the installation finishes, run gcc --version again. It should now print the "installed" output shown in step 2.

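5. (If you use conda) activate the relevant conda environment and run the following inside it:

pip install --user lxml -i https://pypi.tuna.tsinghua.edu.cn/simple --progress-bar on

Here -i points pip at the Tsinghua University PyPI mirror; omit it to use the default index. Note that --user installs into your per-user site-packages directory rather than into the conda environment itself, so drop it if you want lxml to live inside the environment.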
6. In Python, run import lxml; lxml.__version__. If it prints a version number without errors, the problem is solved.
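Step 6 can be extended into a slightly fuller check. This is a sketch that assumes a reasonably recent lxml. Besides the package version, it prints the version tuples exposed as etree.LXML_VERSION and etree.LIBXML_VERSION (the libxml2 version lxml is running against), and it runs one XPath query so that the compiled extension is actually exercised, not just imported.

```python
import lxml
from lxml import etree

# Version string of the lxml package (the same value checked in step 6).
print("lxml", lxml.__version__)

# Version tuples from lxml.etree: the lxml build and the libxml2 it is using at runtime.
print("lxml.etree", etree.LXML_VERSION)
print("libxml2", etree.LIBXML_VERSION)

# Smoke test: parse a tiny XML document and evaluate an XPath expression.
root = etree.fromstring("<root><item>ok</item></root>")
smoke = root.xpath("string(/root/item)")
print("XPath smoke test:", smoke)
```

If the import line itself raises an error, the build from the earlier steps did not succeed and the compiler check is worth repeating.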