How to search a text file table in Python?

Asked by UserBlackBox on 26 March 2019 at 21:58

I am creating a rainbow table as a text file in which each line holds a string and its hash, separated by a space. The rainbow table looks like this:

j)O 3be44b195706cdd25e29d2b01a0e88d4
j)P a83079350701398672677a9ffe07108c
j)Q 2952c4654c127f2bb1086b75d8f1f986
j)R 6621ec6e1ba3c3669259894db8cde339
j)S 0442a2ee045e1913cd2eb094e8945399

I want to know how I can write a Python program that looks up a string and returns its hash, or vice versa.

I have made it search the whole document, but I want it to search only a specific column.

I used pandas and can now make it search a specific column, but I want it to return only exact matches:

import pandas as pd
from termcolor import colored
working_table = pd.read_csv('rainbow_table_md5.txt', sep=' ', names=["string", "hash"])
print(working_table['hash'].where(working_table['string'] == input(colored("String: ", 'cyan'))))

The code right now outputs this:

String: a
0           0cc175b9c0f1b6a831c399e269772661
1                                        NaN
2                                        NaN
                          ...               
14094701                                 NaN
14094702                                 NaN
Name: hash, Length: 14094731, dtype: object

I don't need any of the lines other than the match in row 0.

Ideally, I only need the hash as the output.

There is 1 best solution below

Answered by J_H on 27 March 2019 at 01:29

You want "lookup" rather than "search", since only an exact match matters. Pandas might be overkill for this application. A pair of dictionaries suffices:

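class Rainbow:
    """Exact-match lookup in both directions: string -> hash, hash prefix -> string."""

    def __init__(self, infile, k=20):
        self.k = k
        # Forward map: string -> full hex digest.
        self.s_to_hash = dict(self._read_tuples(infile))
        # Reverse map, keyed on the first k hex characters to save memory.
        self.hash_to_s = {digest[:k]: s
                          for s, digest in self.s_to_hash.items()}

    @staticmethod
    def _read_tuples(infile):
        """Yield (string, digest) pairs from a whitespace-separated file."""
        with open(infile) as fin:
            for line in fin:
                if not line.strip():
                    continue  # skip blank lines
                s, digest = line.split()
                yield s, digest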

Choosing k < 32 is an attempt to save some memory, at the (small) risk of having hashes collide based on their common prefix. Tune it up or down to taste, based on your memory, table size, and appetite for collision risk. Consider writing a getter function and then making hash_to_s private.
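One way the getter idea might look is sketched below. The class name RainbowLookup and the method names hash_for and string_for are hypothetical, and the constructor takes (string, hash) pairs directly instead of a file name so that the example runs on its own. The reverse lookup truncates the query to k characters and then confirms the hit against the full stored hash, so a shared prefix is reported as a miss rather than as a wrong string.

```python
class RainbowLookup:
    """Hypothetical wrapper exposing exact-match lookups in both directions."""

    def __init__(self, pairs, k=20):
        # pairs: iterable of (string, hex_digest) tuples, e.g. parsed from the table file.
        self._k = k
        self._s_to_hash = dict(pairs)
        # Private reverse map, keyed on the first k hex characters only.
        self._hash_to_s = {digest[:k]: s
                           for s, digest in self._s_to_hash.items()}

    def hash_for(self, s):
        """Return the hash stored for s, or None if s is not in the table."""
        return self._s_to_hash.get(s)

    def string_for(self, digest):
        """Return the string whose full hash equals digest, or None."""
        s = self._hash_to_s.get(digest[:self._k])
        # Check the full hash so that a prefix collision is not reported as a hit.
        if s is not None and self._s_to_hash[s] == digest:
            return s
        return None


table = RainbowLookup([
    ("j)O", "3be44b195706cdd25e29d2b01a0e88d4"),
    ("j)P", "a83079350701398672677a9ffe07108c"),
])
print(table.hash_for("j)P"))                                   # a83079350701398672677a9ffe07108c
print(table.string_for("3be44b195706cdd25e29d2b01a0e88d4"))   # j)O
```

Both lookups return None on a miss, so the caller gets only the hash (or only the string) with no surrounding rows to filter out. If two strings in the table share the same k-character prefix, only the later one survives in the reverse map, which is the collision risk mentioned above.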

Storing raw bytes would be twice as memory-efficient as storing ASCII hex nybbles, since each byte holds two hex digits.
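A minimal sketch of the bytes idea, using the first hash from the question as sample data. It assumes the hashes are plain hex strings; the variable names are illustrative only. The 2x figure applies to the digest payload itself; each Python object also carries a fixed overhead, so the saving for the whole dictionary is likely to be smaller.

```python
hex_digest = "3be44b195706cdd25e29d2b01a0e88d4"

# Two hex characters encode one byte, so 32 hex characters become 16 bytes.
raw_digest = bytes.fromhex(hex_digest)
print(len(hex_digest), len(raw_digest))  # 32 16

# The conversion is lossless: .hex() restores the original text form.
assert raw_digest.hex() == hex_digest

# A reverse map keyed on raw bytes; queries are converted the same way.
hash_to_s = {raw_digest: "j)O"}
query = "3be44b195706cdd25e29d2b01a0e88d4"
print(hash_to_s.get(bytes.fromhex(query)))  # j)O
```

Note that bytes.fromhex raises ValueError on input that is not valid hex, so a lookup on user input may want to catch that and treat it as a miss.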
