NLTK debug notes: "[nltk_data] Error loading xxx" when downloading a dataset

Posted: 2023-10-26 15:08:18  Author: Mju_halcyon

Problem: running nltk.download("xxx") fails with a connection/download error.

Solution:

  1. Download the corresponding .zip corpus package from gitee (e.g. via the download links under the nltk_data/packages/corpora/ directory);
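  2. When downloading or loading data, NLTK automatically searches a fixed set of directories whose names end in nltk_data/ (see the note below). Pick one of them that you have write permission for. If its parent directory has no nltk_data folder yet, create one. Upload the .zip file from step 1 into the subfolder of nltk_data/ that mirrors its location in the repository (for a package from packages/corpora/, that is nltk_data/corpora/), then rerun your code.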
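The placement step above can be sketched as a small stdlib-only helper. The function name install_nltk_zip is hypothetical, and the category folder ("corpora", "tokenizers", ...) is assumed to match the folder the zip came from in the nltk_data repository.

```python
import os
import shutil
import zipfile


def install_nltk_zip(zip_path, nltk_data_dir, category="corpora"):
    """Copy a manually downloaded package zip into <nltk_data_dir>/<category>/.

    Hypothetical helper: `category` is assumed to mirror the folder the zip
    came from in the nltk_data repo (e.g. "corpora", "tokenizers").
    Returns the destination path of the copied zip.
    """
    # Fail early on a broken download, e.g. an HTML error page saved as .zip.
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"{zip_path} is not a valid zip archive")

    target_dir = os.path.join(nltk_data_dir, category)
    os.makedirs(target_dir, exist_ok=True)  # create nltk_data/<category>/ if missing
    if not os.access(target_dir, os.W_OK):
        raise PermissionError(f"no write permission for {target_dir}")

    dest = os.path.join(target_dir, os.path.basename(zip_path))
    shutil.copy2(zip_path, dest)  # keep the original file name, e.g. stopwords.zip
    return dest


if __name__ == "__main__":
    # Example paths only; adjust to your own download location.
    print(install_nltk_zip("stopwords.zip", os.path.expanduser("~/nltk_data")))
```

After copying, NLTK should find the package if that nltk_data directory is on `nltk.data.path` (for instance, `nltk.data.find("corpora/stopwords")` looks inside `corpora/stopwords.zip`). A non-default location can be added with `nltk.data.path.append(...)` or the `NLTK_DATA` environment variable.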


[Note] Finding the directories NLTK searches when downloading datasets and stores them in:

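Open the download code in downloader.py under the NLTK installation directory (the path below is for a per-user install under Python 3.8; adjust it to your environment):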

vim ~/.local/lib/python3.8/site-packages/nltk/downloader.py
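Rather than hard-coding the site-packages path, which depends on the Python version and the install method, a short stdlib-only sketch can ask the interpreter where NLTK is installed. The function name find_nltk_downloader is hypothetical. It assumes downloader.py sits next to nltk/__init__.py, as it does in the installed package.

```python
import importlib.util
import os


def find_nltk_downloader():
    """Return the path of nltk/downloader.py for this interpreter, or None.

    Hypothetical helper; it locates the package without importing it.
    """
    spec = importlib.util.find_spec("nltk")  # None if nltk is not installed
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at .../nltk/__init__.py; downloader.py is a sibling.
    return os.path.join(os.path.dirname(spec.origin), "downloader.py")


if __name__ == "__main__":
    print(find_nltk_downloader())
```

Run it with the same interpreter that runs the failing code, so the reported path matches the NLTK copy actually in use.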

The docstring of the function that picks the download directory lists these candidate directories:

``/usr/share/nltk_data``, ``/usr/local/share/nltk_data``,
``/usr/lib/nltk_data``, ``/usr/local/lib/nltk_data``, ``~/nltk_data``
... ...
    def default_download_dir(self):
        """
        Return the directory to which packages will be downloaded by
        default.  This value can be overridden using the constructor,
        or on a case-by-case basis using the ``download_dir`` argument when
        calling ``download()``.

        On Windows, the default download directory is
        ``PYTHONHOME/lib/nltk``, where *PYTHONHOME* is the
        directory containing Python, e.g. ``C:\\Python25``.

        On all other platforms, the default directory is the first of
        the following which exists or which can be created with write
        permission: ``/usr/share/nltk_data``, ``/usr/local/share/nltk_data``,
        ``/usr/lib/nltk_data``, ``/usr/local/lib/nltk_data``, ``~/nltk_data``.
        """
... ...
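The docstring's rule can be approximated with a dry-run check that creates nothing on disk. The name pick_download_dir is hypothetical. NLTK's actual implementation may differ from its docstring (newer versions check the directories in `nltk.data.path`), so treat this as a sketch for finding a usable directory, not as NLTK's own logic.

```python
import os

# Candidate directories copied from the docstring above (non-Windows).
CANDIDATES = [
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
    "~/nltk_data",
]


def _creatable(path):
    """True if the nearest existing ancestor of `path` is a writable directory."""
    parent = os.path.dirname(os.path.abspath(path))
    while not os.path.exists(parent):
        up = os.path.dirname(parent)
        if up == parent:  # reached a root that does not exist (e.g. missing drive)
            return False
        parent = up
    return os.path.isdir(parent) and os.access(parent, os.W_OK)


def pick_download_dir(candidates):
    """Return the first candidate that exists and is writable, or could be created.

    Dry run: nothing is created. Returns None if no candidate qualifies.
    """
    for raw in candidates:
        d = os.path.expanduser(raw)
        if os.path.isdir(d):
            if os.access(d, os.W_OK):
                return d
        elif not os.path.exists(d) and _creatable(d):
            return d
    return None


if __name__ == "__main__":
    print(pick_download_dir(CANDIDATES))
```

On a shared server without root access, this typically ends up at ~/nltk_data, the last entry in the list.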

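Choose one of these directories that you have write permission for, create nltk_data/ there if needed, and place the dataset's .zip file in the matching category subfolder (e.g. nltk_data/corpora/).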