Rotating the User-Agent from a list in the Scrapy framework

Published: 2023-09-12 17:43:38 Author: 章叁理寺
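When sending requests we sometimes run into anti-scraping checks based on the User-Agent (UA) header. One way around them is to keep a list of UA strings and pick a different one at random for each request: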
import random


class RandomUADownloaderMiddleware:

    def process_request(self, request, spider):
        # Pool of User-Agent strings to choose from
        ua_list = [
            "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36",
            "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/537.75.14",
            "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)",
            "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36"
        ]
        # Set a randomly chosen User-Agent on the outgoing request
        request.headers["User-Agent"] = random.choice(ua_list)
        # Returning None lets Scrapy continue processing this request
        return None

The class above is a custom downloader middleware; define it in the project's middlewares.py.
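
To check quickly that the middleware really changes the header, it can be exercised without starting a crawl. The sketch below is only an illustration: it uses a condensed copy of the middleware with a shortened UA list, and `make_fake_request` is a hypothetical stand-in for a real `scrapy.Request` that only provides a dict-like `headers` attribute.

```python
import random
from types import SimpleNamespace

# Shortened list for illustration only; the real middleware uses the full list.
UA_LIST = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0",
    "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; Win64; x64; Trident/6.0)",
]


class RandomUADownloaderMiddleware:
    def process_request(self, request, spider):
        # Overwrite the header with a randomly chosen User-Agent.
        request.headers["User-Agent"] = random.choice(UA_LIST)
        # None tells Scrapy to keep processing the request as usual.
        return None


def make_fake_request():
    # Hypothetical stand-in for scrapy.Request: only .headers is needed here.
    return SimpleNamespace(headers={})


if __name__ == "__main__":
    mw = RandomUADownloaderMiddleware()
    seen = set()
    for _ in range(20):
        req = make_fake_request()
        mw.process_request(req, spider=None)
        seen.add(req.headers["User-Agent"])
    # Every value printed here should come from UA_LIST.
    print(seen)
```

In a real crawl, Scrapy stores header values as bytes, so the value read back from a real request may look slightly different from the plain string shown here.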

The middleware then has to be enabled in settings.py:

DOWNLOADER_MIDDLEWARES = {
    # "scrapy_proxy.middlewares.ScrapyProxyDownloaderMiddleware": 543,
    "scrapy_proxy.middlewares.RandomUADownloaderMiddleware": 2,
}
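
Since settings.py is already being edited, one possible variation is to keep the UA list there as well and let the middleware read it through `from_crawler`, which Scrapy calls when it builds a middleware. This is a sketch under assumptions, not the method used above: `USER_AGENT_LIST` is a hypothetical custom setting name (not a built-in Scrapy setting), and `SettingsUADownloaderMiddleware` is a made-up class name.

```python
import random


class SettingsUADownloaderMiddleware:
    """Variant that reads its User-Agent pool from the project settings."""

    def __init__(self, ua_list):
        self.ua_list = ua_list

    @classmethod
    def from_crawler(cls, crawler):
        # USER_AGENT_LIST is a hypothetical custom setting defined in settings.py.
        ua_list = crawler.settings.getlist("USER_AGENT_LIST")
        if not ua_list:
            # Fail early instead of crashing later inside random.choice().
            raise ValueError("USER_AGENT_LIST is empty or missing")
        return cls(ua_list)

    def process_request(self, request, spider):
        # Same behaviour as before, but the pool comes from the settings.
        request.headers["User-Agent"] = random.choice(self.ua_list)
        return None
```

With this variant, settings.py would also need a `USER_AGENT_LIST = [...]` entry holding the UA strings, and the class would be registered in `DOWNLOADER_MIDDLEWARES` in the same way as above.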