NLTK reuters datasets not found


I downloaded Reuters dataset from nltk using the following command:

import nltk
nltk.download('reuters')

I got a confirmation that the dataset was downloaded, and I can see it under "C:/Users/username/AppData/Roaming/nltk_data".

However, when I want to read the dataset, python can't see it! I get the following error:

C:\Users\username\python\Python37-32\Lib\site-packages\sklearn\externals\joblib\externals\cloudpickle\cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
Traceback (most recent call last):
  File "C:\Users\username\python\Python37-32\Lib\site-packages\nltk\corpus\util.py", line 80, in __load
    try: root = nltk.data.find('{}/{}'.format(self.subdir, zip_name))
  File "C:\Users\username\python\Python37-32\Lib\site-packages\nltk\data.py", line 675, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource reuters not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('reuters')

  Searched in:
    - 'C:\\Users\\username/nltk_data'
    - 'C:\\nltk_data'
    - 'D:\\nltk_data'
    - 'E:\\nltk_data'
    - 'C:\\Users\\username\\python\\Python37-32\\nltk_data'
    - 'C:\\Users\\username\\python\\Python37-32\\share\\nltk_data'
    - 'C:\\Users\\username\\python\\Python37-32\\lib\\nltk_data'
    - 'C:\\Users\\username\\AppData\\Roaming\\nltk_data'
**********************************************************************
During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "C:\Users\username\eclipse-workspace\ML\src\PAs\pa2\Test.py", line 17, in <module>
        from commons import util, datasets, runClassifier, mlGraphics
      File "C:\Users\username\eclipse-workspace\ML\src\commons\datasets.py", line 258, in <module>
        class Reuters:
      File "C:\Users\username\eclipse-workspace\ML\src\commons\datasets.py", line 259, in Reuters
        documents = reuters.fileids()
      File "C:\Users\username\python\Python37-32\Lib\site-packages\nltk\corpus\util.py", line 116, in __getattr__
        self.__load()
      File "C:\Users\username\python\Python37-32\Lib\site-packages\nltk\corpus\util.py", line 81, in __load
        except LookupError: raise e
      File "C:\Users\username\python\Python37-32\Lib\site-packages\nltk\corpus\util.py", line 78, in __load
        root = nltk.data.find('{}/{}'.format(self.subdir, self.__name))
      File "C:\Users\username\python\Python37-32\Lib\site-packages\nltk\data.py", line 675, in find
        raise LookupError(resource_not_found)
    LookupError: 
    **********************************************************************
      Resource reuters not found.
      Please use the NLTK Downloader to obtain the resource:

      >>> import nltk
      >>> nltk.download('reuters')

      Searched in:
        - 'C:\\Users\\username/nltk_data'
        - 'C:\\nltk_data'
        - 'D:\\nltk_data'
        - 'E:\\nltk_data'
        - 'C:\\Users\\username\\python\\Python37-32\\nltk_data'
        - 'C:\\Users\\username\\python\\Python37-32\\share\\nltk_data'
        - 'C:\\Users\\username\\python\\Python37-32\\lib\\nltk_data'
        - 'C:\\Users\\username\\AppData\\Roaming\\nltk_data'

I tried manually creating a directory "C:/Users/username/nltk_data" and pasting reuters.zip there, but that didn't help. When I download it again using nltk.download(), it shows the following:

[nltk_data] Downloading package reuters to C:\Users\username/nltk_data...
[nltk_data]   Package reuters is already up-to-date!

Any hints? I'm also wondering why the paths printed by Python contain both forward slashes (/) and backslashes (\) at the same time.
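For reference, the directories NLTK actually searches (the same list shown in the error message) can be inspected at runtime:

```python
import nltk.data

# NLTK looks for corpora such as "reuters" under <path>/corpora/
# in each of these directories, in order.
for path in nltk.data.path:
    print(path)
```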

3 Answers

dejanmarich (accepted answer):

Since the imp module is deprecated in Python 3.7, use import importlib instead of import imp, or run the code with an older version of Python.
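A minimal sketch of that replacement, assuming the old code only used imp to locate modules (the exact change depends on how imp is used in cloudpickle):

```python
import importlib.util

# Old style: imp.find_module("json")
# Modern equivalent: importlib.util.find_spec
spec = importlib.util.find_spec("json")
print(spec.origin)  # filesystem location of the module
```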

Yawar Abbas:

This is my code; you may find it helpful:

import nltk
# nltk.download('punkt')
# nltk.download('averaged_perceptron_tagger')

# Raw string avoids backslash escapes being interpreted in the Windows path
with open(r"e:\Assignment\my_file.txt", "r") as f:
    text = f.read()

sentences = nltk.sent_tokenize(text)  # split text into sentences
nouns = []  # collect all nouns here

for sentence in sentences:
    for word, pos in nltk.pos_tag(nltk.word_tokenize(sentence)):
        if pos in ('NN', 'NNP', 'NNS', 'NNPS'):
            nouns.append(word)

print(nouns)
Denys Filippov:

In my case I just had to go to the folder where the corpus was downloaded and unzip the archive. To see where the corpus was downloaded:

nltk.download('reuters')

[nltk_data] Downloading package reuters to /home/denys/nltk_data...
[nltk_data] Package reuters is already up-to-date!