How to perform XOR of all words in a file


I want to convert all words in a standard dictionary (for example, /usr/share/dict/words on a Unix machine) to integers, find the XOR between every two words in the dictionary (after converting them to integers, of course), and probably store the results in a new file.

Since I am new to Python, and because the file is large, the program hangs every now and then.

dictionary = open("/usr/share/dict/words", "r")
words = dictionary.readlines()
dictionary.close()

foo = open("word_integer.txt", "a")

for word in words:
    word = word.strip()  # drop the trailing newline so it is not written or converted
    # Python 2 only: str.encode('hex') was removed in Python 3
    int_word = int(word.encode('hex'), 16)
    foo.write(word + "\t" + str(int_word) + "\n")

foo.close()
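In Python 3, str.encode('hex') no longer exists. Below is a minimal sketch of the same conversion, assuming the dictionary is UTF-8 encoded and that each word should be read as one big-endian integer, which is what int(word.encode('hex'), 16) computes in Python 2 for a byte string. The helper names word_to_int_py3 and xor_words are hypothetical.

```python
def word_to_int_py3(word):
    """Interpret the UTF-8 bytes of a word as one big-endian integer."""
    # Same value as Python 2's int(word.encode('hex'), 16) for UTF-8 byte strings
    return int.from_bytes(word.strip().encode("utf-8"), "big")


def xor_words(w1, w2):
    """XOR the integer forms of two words."""
    return word_to_int_py3(w1) ^ word_to_int_py3(w2)


if __name__ == "__main__":
    print(word_to_int_py3("ab"))   # 0x6162 = 24930
    print(xor_words("ab", "ba"))   # 0x6162 ^ 0x6261 = 0x0303 = 771
```

The integer grows by one byte per character, so long words give large numbers; Python integers have arbitrary precision, so this does not overflow.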

Best answer:

First, we need a function to convert a string to an int. I'll make one up, since what you're doing isn't working for me at all (maybe you meant to encode as Unicode?):

def word_to_int(word):
    return sum(ord(i) for i in word.strip())
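Summing character codes is not one-to-one: any two words made of the same letters (anagrams) map to the same number, so their XOR is always 0. The quick check below repeats the made-up word_to_int so it runs on its own; a conversion that keeps every byte, such as int.from_bytes(word.encode('utf-8'), 'big') in Python 3, avoids the collision.

```python
def word_to_int(word):
    # Sum of character codes, as in the answer above
    return sum(ord(c) for c in word.strip())


# "listen" and "silent" contain the same letters, so their sums are equal
print(word_to_int("listen"), word_to_int("silent"))
print(word_to_int("listen") ^ word_to_int("silent"))  # 0: the pair is indistinguishable
```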

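Next, we need to process the files. The following reads the dictionary two lines at a time and writes the XOR of each pair of words. It nests two separate with blocks, which also works in Python 2.6; from Python 2.7 onward you can open both files in a single with statement.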

with open("/usr/share/dict/words", "r") as dictionary:  # "rU" is rejected by Python 3.11+
    with open("word_integer.txt", "a") as foo:
        while True:
            try:
                # take the next two lines, so each word is used in exactly one pair
                w1, w2 = next(dictionary), next(dictionary)
            except StopIteration:
                print("We've run out of words!")
                break
            foo.write(str(word_to_int(w1) ^ word_to_int(w2)) + "\n")  # one result per line
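In Python 3, the same pairing of consecutive lines can be written without try/except by zipping an iterator with itself: each step pulls two lines, and the loop stops cleanly at end of file (a trailing unpaired word is skipped, as in the loop above). This is a sketch; the function name xor_consecutive_pairs is hypothetical, and it accepts any iterable of lines so it can be tried without a file.

```python
import io


def word_to_int(word):
    # Sum of character codes, as in the answer above
    return sum(ord(c) for c in word.strip())


def xor_consecutive_pairs(lines, out):
    """Write the XOR for lines (1, 2), (3, 4), ..., one result per line."""
    it = iter(lines)
    # zip pulls two items from the same iterator per step, so each line is used once
    for w1, w2 in zip(it, it):
        out.write(str(word_to_int(w1) ^ word_to_int(w2)) + "\n")


if __name__ == "__main__":
    buf = io.StringIO()
    xor_consecutive_pairs(["a\n", "b\n", "c\n", "d\n", "e\n"], buf)
    print(buf.getvalue())  # prints 3 and 7 on separate lines; "e" has no partner
```

With real files, pass the open file objects directly, e.g. xor_consecutive_pairs(dictionary, foo) inside the with blocks above.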
Another answer:

This code seems to work for me. You're likely running into efficiency issues because you are calling readlines() on the entire file, which pulls it all into memory at once.

This solution walks through the file line by line and, for each line, reads the file again from the start to compute the XOR with every word.

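pairwise_xors = {}

def str_to_int(w):
    # Python 2 only; in Python 3 use int.from_bytes(w.encode('utf-8'), 'big')
    return int(w.encode('hex'), 16)

with open('/usr/share/dict/words', 'r') as f:
    for line1 in f:
        line1 = line1.strip()
        if not line1:
            continue
        # reopen the file so line1 is paired with every word, not just the first one
        with open('/usr/share/dict/words', 'r') as g:
            for line2 in g:
                line2 = line2.strip()
                if line2:
                    # note: for n words this dict grows to n * n entries
                    pairwise_xors[(line1, line2)] = str_to_int(line1) ^ str_to_int(line2)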
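For n words there are n(n-1)/2 unordered pairs; a 100,000-word dictionary already gives about 5 billion results, so keeping them all in a dict will not fit in memory. A sketch of one alternative, assuming Python 3 and that the results go to a file as the question suggests: convert each word once, then stream every unordered pair straight to the output with itertools.combinations. The function name write_all_pair_xors is hypothetical, and int.from_bytes stands in for the Python 2 hex conversion.

```python
import io
from itertools import combinations


def word_to_int(word):
    # Python 3 stand-in for int(word.encode('hex'), 16): UTF-8 bytes as a big-endian integer
    return int.from_bytes(word.encode("utf-8"), "big")


def write_all_pair_xors(lines, out):
    """Write 'w1<TAB>w2<TAB>xor' for every unordered pair of non-empty lines.

    Returns the number of pairs written.
    """
    # Convert each word once; the word list is small compared with the number of pairs
    words = [(w, word_to_int(w)) for w in (line.strip() for line in lines) if w]
    count = 0
    # combinations yields each pair once, in order, and never pairs a word with itself
    for (w1, n1), (w2, n2) in combinations(words, 2):
        out.write("%s\t%s\t%d\n" % (w1, w2, n1 ^ n2))
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    n = write_all_pair_xors(["a\n", "b\n", "c\n"], buf)
    print(n)  # 3 pairs for 3 words
    print(buf.getvalue())
```

Only the word list is held in memory; the output file still grows quadratically, so it is worth testing on a slice of the dictionary first.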