What is wrong with the following piece of code?

I copied the following piece of code from the book Programming Collective Intelligence, page 118, chapter "Document Filtering". The function breaks the text into words by splitting it on any character that isn't a letter. This leaves only the actual words, all converted to lowercase.

import re
import math
def getwords(doc):
    splitter=re.compile('\\W*')
    words=[s.lower() for s in splitter.split(doc) 
           if len(s)>2 and len(s)<20]
    return dict([(w,1) for w in words])

I implemented the function and got the following error:

>>> import docclass
>>> t=docclass.getwords(s)
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    t=docclass.getwords(s)
  File "docclass.py", line 6, in getwords
    words=[s.lower() for s in splitter.split(doc)
NameError: global name 'splitter' is not defined
Best answer

It works for me:

>>> import re
>>> 
>>> def getwords(doc):
...     splitter=re.compile('\\W*')
...     words=[s.lower() for s in splitter.split(doc) 
...            if len(s)>2 and len(s)<20]
...     return dict([(w,1) for w in words])
... 
>>> getwords("He's fallen in the water!")
{'water': 1, 'the': 1, 'fallen': 1}

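I'm guessing you made a typo in your own file but got it right when you pasted the code here. The NameError means that, in your docclass.py, the name used in the list comprehension was never assigned inside getwords, so check the spelling of splitter on the line above it.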
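
As a side note, the session above looks like Python 2. Here is a minimal sketch of the same function for Python 3.7 or later, which is an assumption about your setup and not something from the book. On those versions a pattern that can match the empty string, such as '\\W*', splits between every character, so every piece is too short to pass the length filter and the result is an empty dict. Using '\W+' (one or more non-word characters) avoids that:

```python
import re


def getwords(doc):
    # Assumption: Python 3.7+. Split on runs of one or more non-word
    # characters; the book's '\\W*' can match the empty string and would
    # split between every character on these versions.
    splitter = re.compile(r'\W+')
    # Keep only pieces longer than 2 and shorter than 20 characters,
    # converted to lowercase.
    words = [s.lower() for s in splitter.split(doc)
             if 2 < len(s) < 20]
    # Map each unique word to 1, as the original function does.
    return {w: 1 for w in words}


print(getwords("He's fallen in the water!"))
# {'fallen': 1, 'the': 1, 'water': 1}
```

The behavior is otherwise the same as the book's version: short pieces such as "He", "s" and "in" are dropped by the length filter.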