Maoyan Movie Scraper

Published: 2023-11-15 13:12:21    Author: darling1004

Steps
First, use pip to install the required libraries, requests and beautifulsoup4. (If pip downloads are slow, you can point pip at a domestic PyPI mirror with the -i option.)
Then analyse the Maoyan movie site: fetch each page with requests, and use BeautifulSoup to locate and collect the movie information. Finally, write the results out to a txt file.
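As a minimal sketch of the parsing step, BeautifulSoup's find_all can pull every <p class="name"> element out of the page. The HTML fragment below is a made-up stand-in for the real board markup, used only to show the call:

from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the board page's title markup.
sample_html = ('<p class="name"><a href="#">霸王别姬</a></p>'
               '<p class="name"><a href="#">肖申克的救赎</a></p>')

soup = BeautifulSoup(sample_html, 'html.parser')
for p in soup.find_all('p', {'class': 'name'}):
    print(p.text)  # prints each movie title found in the fragment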
The full implementation is as follows:
import requests
from bs4 import BeautifulSoup


def movie(url):
    # Browser-like request headers so the site serves the normal HTML page.
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/58.0.3029.110 Safari/537.3",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
    }

    # print("craw html:", url)

    def write(file_name, data):
        # Helper that overwrites a file with the given string (not called below).
        with open(file_name, "w", encoding="utf-8") as file:
            file.write(data)

    # Fetch the page and parse it with BeautifulSoup.
    response = requests.get(url, headers=headers)
    response.encoding = 'utf-8'
    html_content = response.text
    soup = BeautifulSoup(html_content, 'html.parser')

    # Movie titles on the board page sit in <p class="name"> elements.
    outputs = soup.find_all('p', {'class': 'name'})
    # print(outputs)

    # Append this page's titles to output.txt.
    with open("output.txt", "a", encoding="utf-8") as file:
        for output in outputs:
            file.write(output.text + "\n")


# The Top 100 board spans 10 pages, selected by the offset parameter (0, 10, ..., 90).
for a in range(0, 100, 10):
    url = f"https://www.maoyan.com/board/4?offset={a}"
    movie(url)

Problems encountered

The movie board is paginated, so I had to work out the pattern behind each page's URL: the offset query parameter grows by 10 per page. Looping over the offsets and calling the function once per page fetches all 10 pages in turn and completes the output (see the sketch below).
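Using the same offset rule as the script above, generating and printing the ten page addresses first is an easy way to confirm the pattern before crawling:

# Each page of the Top 100 board holds 10 movies, selected via the offset parameter.
base = "https://www.maoyan.com/board/4?offset={}"
page_urls = [base.format(offset) for offset in range(0, 100, 10)]
for url in page_urls:
    print(url)  # offset = 0, 10, ..., 90 -> pages 1 to 10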