Getting all href links with lxml in Python

Published: 2023-04-18 21:40:39 | Author: ministep88

Getting all tags with lxml

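from lxml import etree
# text must already hold the HTML source of the page as a string
xhtmle = etree.HTML(text)
# select every <li> inside the sidebar list
eles = xhtmle.xpath("//div[@class='sidebar']/ul/li")
for ele in eles:
    # xpath() returns a list of href strings, one per <a> under this <li>
    href = ele.xpath(".//a/@href")
    print(href)
    print('- - - ' * 30)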
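
The snippet above depends on a `text` variable that is not shown. As a minimal self-contained sketch, the version below runs the same kind of query against a small inline page. The sample markup, the link targets other than `/document/path/97108`, and the helper names `sidebar_hrefs` and `all_hrefs` are assumptions made up for illustration, not part of the original post.

```python
from lxml import etree

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <div class="sidebar">
    <ul>
      <li><a href="/document/path/90664">Overview</a></li>
      <li><a href="/document/path/97108">Details</a> <a href="https://example.com/faq">FAQ</a></li>
      <li>No link here</li>
    </ul>
  </div>
  <a href="/outside">Outside the sidebar</a>
</body></html>
"""


def sidebar_hrefs(html_text):
    """Return every href under div.sidebar as a flat list, in document order."""
    tree = etree.HTML(html_text)
    # Selecting @href directly yields the attribute values, not the elements.
    found = tree.xpath("//div[@class='sidebar']/ul/li//a/@href")
    return [str(h) for h in found]


def all_hrefs(html_text):
    """Return the href of every <a> element anywhere in the page."""
    tree = etree.HTML(html_text)
    return [str(h) for h in tree.xpath("//a/@href")]


if __name__ == "__main__":
    print(sidebar_hrefs(SAMPLE_HTML))
    print(all_hrefs(SAMPLE_HTML))
```

Querying `//a/@href` in one pass avoids the nested loop when the per-`<li>` grouping is not needed; an `<li>` without a link simply contributes nothing to the result.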

Converting short (relative) hrefs from lxml into full URLs

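from urllib.parse import urlparse
url = "https://developer.work.weixin.qq.com/document/path/97108"
parsed = urlparse(url)
# netloc alone carries no scheme, so prepend it to get a usable base
base = f"{parsed.scheme}://{parsed.netloc}"
# href is the list returned by xpath above; this assumes root-relative paths such as "/document/..."
urls = [base + h for h in href]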
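
String concatenation only works when every href is a root-relative path. As an alternative sketch (a swapped-in technique, not the method used in the post), the standard library's `urljoin` resolves each href against the page URL and also handles absolute links, sibling-relative paths, and fragments. The sample hrefs and the helper name `absolutize` are hypothetical.

```python
from urllib.parse import urljoin

# Page the links were scraped from (taken from the post).
BASE_URL = "https://developer.work.weixin.qq.com/document/path/97108"


def absolutize(base_url, hrefs):
    """Resolve each href against base_url, the way a browser would."""
    return [urljoin(base_url, h) for h in hrefs]


if __name__ == "__main__":
    # Hypothetical href values covering the common shapes.
    sample_hrefs = [
        "/document/path/90664",     # root-relative path
        "90665",                    # relative to the current "directory"
        "https://example.com/faq",  # already absolute, returned unchanged
        "#anchor",                  # fragment on the current page
    ]
    for full_url in absolutize(BASE_URL, sample_hrefs):
        print(full_url)
```

Because `urljoin` leaves absolute URLs untouched, the list returned by `xpath` can be passed through it without first checking what kind of href each entry is.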

References:

python - Getting URL links from href with beautifulsoup, without redirect links - 爱编程的大狗
web scraping - retrieve links from web page using python and BeautifulSoup - Stack Overflow
Python lxml/beautiful soup to find all links on a web page - Stack Overflow