Python Web Crawler Environment Setup

Published: 2024-01-01 20:28:59 | Author: Dreaife

Environment setup

Python 3 / request libraries / parsing libraries / databases / storage libraries / web libraries / app-crawling tools / crawler frameworks

  • Python 3

    • On Windows 11 it can now be installed directly from the Microsoft Store.
    • On Linux (Debian/Ubuntu): apt-get install python3
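
After installing, it is worth confirming that the interpreter you are actually running is Python 3. A minimal sketch; the helper name `is_python3` is made up for this example:

```python
import sys


def is_python3() -> bool:
    """Return True when the running interpreter is Python 3 or newer."""
    return sys.version_info.major >= 3


# Print the full version string so it can be compared with what was installed.
print(sys.version)
print("Python 3 available:", is_python3())
```
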
  • Request libraries

    • requests

      pip3 install requests

    • selenium

      pip install selenium

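    • ChromeDriver

      1. Check the installed Chrome version on the "About Chrome" page
      2. Download the ChromeDriver release that matches that version
      3. Add the ChromeDriver directory to the PATH environment variable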
    • PhantomJS

      Recent Selenium releases no longer support PhantomJS; headless mode can be used directly through chromedriver instead.

      Verification:

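      from selenium import webdriver
      from selenium.webdriver.chrome.options import Options

      # Run Chrome without a visible window (the replacement for PhantomJS)
      chrome_options = Options()
      chrome_options.add_argument('--headless')
      chrome_options.add_argument('--disable-gpu')
      driver = webdriver.Chrome(options=chrome_options)
      driver.get("https://dreaife.icu/")
      print(driver.current_url)  # the page URL is printed if the setup works
      driver.quit()  # close the browser and stop the chromedriver process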
      
    • aiohttp

      pip install aiohttp

      Optional, for faster DNS resolution: pip install aiodns
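
Before running the Selenium check, a quick way to confirm the last ChromeDriver step (adding it to the environment variables) is to ask Python whether the executable can be found on PATH. This is only a sketch using the standard library; the helper name `find_on_path` is made up for this example:

```python
import shutil
from typing import Optional


def find_on_path(executable: str = "chromedriver") -> Optional[str]:
    """Return the full path of `executable` if it is on PATH, else None."""
    return shutil.which(executable)


location = find_on_path()
if location is None:
    print("chromedriver was not found on PATH; revisit the PATH step")
else:
    print("chromedriver found at", location)
```

If this prints a path but Selenium still fails to start, the usual suspect is a mismatch between the Chrome and ChromeDriver versions.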

  • Parsing libraries

    • lxml

      pip install lxml

    • beautifulsoup4

      pip install beautifulsoup4

    • pyquery

      pip install pyquery

    • tesserocr

      • Install tesseract

        Windows

      • Install tesserocr

        On Windows, install the wheel with pip install <name>.whl

      • Verification

        import tesserocr
        from PIL import Image
        
        image = Image.open('G:/codeS/backOnGithub/Jupyter/spider/image.png')
        print(tesserocr.image_to_text(image))
        

        Note: if you see the error File "tesserocr.pyx", line 2580, in tesserocr._tesserocr.image_to_text
        RuntimeError: Failed to init API, possibly an invalid tessdata path, first copy Tesseract's tessdata folder to the location named in the error.
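
When the tessdata error above shows up, it can help to first find out where the language files actually live. The sketch below looks for a folder containing `*.traineddata` files. The helper name `find_tessdata` and the candidate paths are assumptions for illustration, not locations taken from this setup:

```python
from pathlib import Path
from typing import Iterable, Optional


def find_tessdata(candidates: Iterable[str]) -> Optional[Path]:
    """Return the first candidate directory that holds *.traineddata files."""
    for candidate in candidates:
        folder = Path(candidate)
        if folder.is_dir() and any(folder.glob("*.traineddata")):
            return folder
    return None


# Hypothetical locations; replace them with the paths on your own machine.
CANDIDATES = [
    "C:/Program Files/Tesseract-OCR/tessdata",
    "/usr/share/tesseract-ocr/4.00/tessdata",
]

tessdata_dir = find_tessdata(CANDIDATES)
print("tessdata directory:", tessdata_dir)
```

If a directory is found, one option besides copying the folder is to pass it explicitly, for example `tesserocr.image_to_text(image, path=str(tessdata_dir))`; this assumes your tesserocr version accepts the `path` argument.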

  • Databases

    • MySQL
    • MongoDB
    • Redis
  • Storage libraries

    • PyMySQL

      pip install pymysql

    • PyMongo

      pip install pymongo

    • redis-py

      pip install redis

    • RedisDump

      Install Ruby first, then:

      gem install redis-dump
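
The storage libraries above only help if the matching database server is running. As a rough check before touching PyMySQL, PyMongo, or redis-py, the sketch below tries a plain TCP connection to each server's default port. It assumes the servers run on the local machine with their default ports (3306, 27017, 6379); the helper name `is_port_open` is made up for this example:

```python
import socket

# Default ports of the three servers; change them if yours are configured differently.
DEFAULT_PORTS = {"MySQL": 3306, "MongoDB": 27017, "Redis": 6379}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


for name, port in DEFAULT_PORTS.items():
    state = "reachable" if is_port_open("127.0.0.1", port) else "not reachable"
    print(f"{name} on port {port}: {state}")
```

An open port only shows that something is listening there; authentication and the actual client connection still need to be tested with the corresponding library.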

  • Web libraries

    • Flask

      pip install flask

    • Tornado

      pip install tornado

  • App-crawling tools

    • Charles

    • mitmproxy

      pip install mitmproxy

    • Appium

  • Crawler frameworks

    • pyspider

      pip install pyspider

    • scrapy

    • scrapy-splash

    • scrapy-redis
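
Once everything is installed, a single script can report which of the Python packages from this list are importable. This is a minimal sketch using only the standard library. The helper name `check_packages` is made up for this example, and the mapping lists each pip name next to the module name it is normally imported as:

```python
import importlib.util
from typing import Dict

# pip package name -> module name used in `import`.
PACKAGES = {
    "requests": "requests",
    "selenium": "selenium",
    "aiohttp": "aiohttp",
    "lxml": "lxml",
    "beautifulsoup4": "bs4",
    "pyquery": "pyquery",
    "tesserocr": "tesserocr",
    "pymysql": "pymysql",
    "pymongo": "pymongo",
    "redis": "redis",
    "flask": "flask",
    "tornado": "tornado",
    "mitmproxy": "mitmproxy",
    "pyspider": "pyspider",
    "scrapy": "scrapy",
    "scrapy-splash": "scrapy_splash",
    "scrapy-redis": "scrapy_redis",
}


def check_packages(packages: Dict[str, str]) -> Dict[str, bool]:
    """Map each pip package name to True if its module can be located."""
    return {
        pip_name: importlib.util.find_spec(module_name) is not None
        for pip_name, module_name in packages.items()
    }


for pip_name, installed in check_packages(PACKAGES).items():
    print(f"{pip_name:16} {'ok' if installed else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check stays fast and does not trigger side effects from the packages themselves. Non-Python tools such as Charles, Appium, and RedisDump are not covered and have to be checked separately.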