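Recently I wanted to start learning web scraping, to extract paragraphs, lists, tables, and similar information from web pages. Thinking of the HTML DOM tree structure, I wondered whether XPath would be a good fit for parsing it, so I decided to start by combining Python with XPath to parse web page content.
Combining Python with XPath means using the lxml package. It is an open-source project developed by Stefan Behnel and other contributors that simplifies XML parsing; its etree module works well with XPath for parsing XML.
Unexpectedly, installing lxml with pip failed. The terminal output was as follows (system: macOS 13.2.1):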
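As a rough illustration of the idea, here is a minimal sketch of pulling a paragraph and list items out of a small page with etree and path expressions. The page fragment and the extract function are hypothetical, made up for this example. To keep the sketch runnable even before lxml is installed, it falls back to the standard library's xml.etree.ElementTree, which only understands a limited subset of XPath through findall.

```python
# Prefer lxml; fall back to the standard library so the sketch still runs
# when lxml is not installed yet.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical, well-formed page fragment used only for illustration.
PAGE = """
<html>
  <body>
    <p class="intro">Hello, XPath.</p>
    <ul>
      <li>first</li>
      <li>second</li>
    </ul>
  </body>
</html>
"""


def extract(page):
    """Return (paragraph texts, list item texts) found in the page."""
    root = etree.fromstring(page.strip())
    # findall accepts simple path expressions in both lxml and the stdlib.
    paragraphs = [p.text for p in root.findall(".//p")]
    items = [li.text for li in root.findall(".//ul/li")]
    return paragraphs, items


paragraphs, items = extract(PAGE)
print(paragraphs)
print(items)
```

With lxml installed, elements also provide an xpath() method that evaluates full XPath 1.0 expressions, which is the more natural choice for real pages.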
building 'lxml.etree' extension
creating build/temp.macosx-11.1-arm64-3.9
creating build/temp.macosx-11.1-arm64-3.9/src
creating build/temp.macosx-11.1-arm64-3.9/src/lxml
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /Users/xiaoming/miniconda3/envs/datasci/include -arch arm64 -I/Users/xiaoming/miniconda3/envs/datasci/include -fPIC -O2 -isystem /Users/xiaoming/miniconda3/envs/datasci/include -arch arm64 -DCYTHON_CLINE_IN_TRACEBACK=0 -I/usr/include -Isrc -Isrc/lxml/includes -I/Users/xiaoming/miniconda3/envs/datasci/include/python3.9 -c src/lxml/etree.c -o build/temp.macosx-11.1-arm64-3.9/src/lxml/etree.o -w -flat_namespace
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
Compile failed: command '/usr/bin/clang' failed with exit code 1
cc -I/usr/include -I/usr/include/libxml2 -c /var/folders/5l/1dwkjn917h51_g_2x3wf8v240000gn/T/xmlXPathInitppw2l733.c -o var/folders/5l/1dwkjn917h51_g_2x3wf8v240000gn/T/xmlXPathInitppw2l733.o
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
*********************************************************************************
Could not find function xmlCheckVersion in library libxml2. Is libxml2 installed?
Perhaps try: xcode-select --install
*********************************************************************************
error: command '/usr/bin/clang' failed with exit code 1
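After searching online, I found that the problem came from the C compiler on macOS not being set up correctly: the xcrun error shows that the developer path /Library/Developer/CommandLineTools is invalid, so clang cannot run.
Solution:
1. Open a terminal.
2. Run gcc --version to check the C compiler version. If it is installed, the output looks like this: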
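To make the diagnosis concrete, here is a small sketch that scans captured build output for the two messages seen in the log above. The function name diagnose_build_log and the returned labels are hypothetical and exist only for this illustration; the matched strings are copied from the log in this post.

```python
# Messages copied from the failed build output in this post.
XCRUN_MARKER = "xcrun: error: invalid active developer path"
LIBXML2_MARKER = "Could not find function xmlCheckVersion"


def diagnose_build_log(log_text):
    """Classify a captured build log (hypothetical helper, labels are made up)."""
    # Check the xcrun message first: when the compiler cannot run at all,
    # the libxml2 check fails as a consequence.
    if XCRUN_MARKER in log_text:
        return "command-line-tools-missing"
    if LIBXML2_MARKER in log_text:
        return "libxml2-check-failed"
    return "unknown"


sample = (
    "xcrun: error: invalid active developer path "
    "(/Library/Developer/CommandLineTools), missing xcrun at: "
    "/Library/Developer/CommandLineTools/usr/bin/xcrun\n"
    "Could not find function xmlCheckVersion in library libxml2. "
    "Is libxml2 installed?\n"
)
print(diagnose_build_log(sample))
```

In the log above both messages appear, which is why the sketch treats the xcrun error as the more fundamental one.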
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
If it is not installed, the output looks like this:
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
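3. If it is not installed, run xcode-select --install in the terminal.
4. Once the installation finishes, run gcc --version again; the output should match the installed case shown in step 2.
5. (If conda is installed) activate the relevant conda environment and, inside that environment, run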
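
The compiler check in steps 2 to 4 can also be scripted. The following is a minimal sketch, assuming that a zero exit status from gcc --version is a good enough sign that the compiler works; the function name compiler_available is hypothetical.

```python
import subprocess
import sys


def compiler_available(cmd=("gcc", "--version")):
    """Return True if the command runs and exits with status 0."""
    try:
        result = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        # Command not found, not executable, or it did not finish in time.
        return False
    return result.returncode == 0


if compiler_available():
    print("C compiler found.")
elif sys.platform == "darwin":
    # Same suggestion as step 3 above.
    print("No working C compiler; try: xcode-select --install")
else:
    print("No working C compiler found on this platform.")
```

The sketch only reports what it finds and does not run the installer itself.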