This is weird. I'm trying to implement a text frequency calculation, and I'm running the following code using Python 2.7 in an IPython notebook. There are three versions of the function.
First version, just count the instances of words in a list of strings and stick it in a dictionary:
testList = ['I', 'am', 'a', 'list', 'of', 'strings']
def tf1(listOfStrs):
    thedict = dict((x,listOfStrs.count(x)) for x in set(listOfStrs))
    print thedict
tf1(testList)
# produces expected output:
> {'a': 1, 'I': 1, 'am': 1, 'list': 1, 'of': 1, 'strings': 1}
Ok, that's working fine. Time to actually get the frequencies by dividing each occurrence by the total number of words. Should produce 0.16... etc.
def tf2(listOfStrs):
    print len(listOfStrs)
    thedict = dict((x,listOfStrs.count(x)/len(listOfStrs)) for x in set(listOfStrs))
    print thedict
tf2(testList)
> 6
> {'a': 0, 'I': 0, 'am': 0, 'list': 0, 'of': 0, 'strings': 0}
"Ah!" I think. This is the easiest bug in the world to fix. I'm doing integer division. I don't want to be doing integer division. Just cast one of the terms to float. Bam.
def tf2(listOfStrs):
    print len(listOfStrs)
    thedict = dict((x,listOfStrs.count(x)/float(len(listOfStrs)) for x in set(listOfStrs))
    print thedict
> File "<ipython-input-13-db67e35f2596>", line 3
> thedict = dict((x,listOfStrs.count(x)/float(len(listOfStrs)) for x in set(listOfStrs))
> ^
> SyntaxError: invalid syntax
???? I know there isn't a syntax error in the for statement, because it bloody well worked fine in the previous two versions. Um. So obviously casting to float broke the dict comprehension. But that seems insane. It's just casting an int to a float. It's the easiest operation in the world... how did it break a dict comprehension?
I'm totally stumped by this one... anyone have any bright ideas?
According to this meta discussion, I'll answer this question on behalf of Tim Peters and mark it as community wiki.
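The SyntaxError has nothing to do with float() itself: the parentheses in the failing line are unbalanced. Adding float( opened one more parenthesis, but only the parentheses for len( and float( get closed, so the (x, ... tuple is still open when the parser reaches for. Below is a minimal sketch of a corrected version. The name tf3 and the choice to return the dict rather than print it are illustrative assumptions, not part of the original code; returning also lets the snippet run unchanged on Python 2.7 and Python 3.

```python
def tf3(listOfStrs):
    # tf3 is a hypothetical name for the corrected function.
    # Note the third ")" after float(len(listOfStrs)): it closes the
    # (word, frequency) tuple before the "for" clause begins.
    return dict((x, listOfStrs.count(x) / float(len(listOfStrs))) for x in set(listOfStrs))

testList = ['I', 'am', 'a', 'list', 'of', 'strings']
freqs = tf3(testList)  # each of the six distinct words maps to 1/6
```

Counting parentheses from left to right in the failing line is a quick way to confirm this kind of error; most editors will also highlight the matching bracket.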