Python algorithm speed up / Performance tips


I'm working with big file manipulation (over 2 GB) and I have a lot of processing functions to deal with the data. My problem is that it is taking a lot (A LOT) of time to finish the processing. Of all the functions, the one that seems to take the longest is this one:

def BinLsb(data):
    Len = len(data)
    databin = [0] * Len
    num_of_bits = 8
    # convert each octet to binary, LSB first
    for i in range(Len):
        newdatabin = bin(int(data[i], 16))[2:].zfill(num_of_bits)[::-1]
        databin[i] = newdatabin
    # group into 14-bit chunks, LSB first again
    databin = ''.join(databin)
    composite_list = [databin[x:x + 14] for x in range(0, len(databin), 14)]
    LenComp = len(composite_list)
    for i in range(LenComp):
        composite_list[i] = int(composite_list[i][::-1], 2)
    return composite_list

I'd really appreciate some performance tips / another approach to this algorithm in order to save me some time. Thanks in advance!

There are 2 answers below.

BEST ANSWER

Basic analysis of your function: time complexity is about 3·O(n) and space complexity is about 3·O(n), because you loop over the data three times. My suggestion is to loop once and use a generator, which should cost roughly a third of the time and space.

I upgraded your code, removed some unneeded variables, and turned it into a generator:

def binLsb(data):
    databin = ""
    num_of_bits = 8
    for octet in data:
        # append the next octet's bits, LSB first
        databin += bin(int(octet, 16))[2:].zfill(num_of_bits)[::-1]
        # emit every complete 14-bit group as soon as it is available
        while len(databin) >= 14:
            yield int(databin[:14][::-1], 2)
            databin = databin[14:]
    # flush the incomplete trailing group, as the original function does
    if databin:
        yield int(databin[::-1], 2)

enjoy

Oliver


You can hunt performance issues by profiling the software, but you'll probably be best served by pushing the heavy work into a faster language wrapped by Python. That could mean using a scientific library like numpy, using an FFI (foreign function interface), or creating and calling a custom program.
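To make the profiling step concrete, here is a minimal sketch using the standard library's cProfile and pstats modules; `slow` is just a hypothetical stand-in for one of your processing functions:

```python
import cProfile
import io
import pstats

def slow(n):
    # stand-in workload: replace with a call into your own pipeline
    return sum(i * i for i in range(n))

pr = cProfile.Profile()
pr.enable()
slow(100_000)
pr.disable()

# print the five functions with the highest cumulative time
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats('cumulative').print_stats(5)
report = buf.getvalue()
print(report)
```

The report shows which functions dominate the runtime, so you optimize the real hotspot instead of guessing.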

More specifically, Python is natively quite slow at raw computation, as each operation carries a lot of baggage with it (such as the infamous GIL). Passing this work off to another language lets you pay this overhead cost less often, rather than at every possible point in every loop!

Scientific libraries can do this for you, at least by:

  • behaving like Python logic (which is friendly for you!) while doing many known steps per action (rather than one at a time)
  • vectorizing operations (performing many actions in the same processor step, making better use of processing time)
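For instance, the whole bit-reshuffling in the question can be vectorized with numpy. The sketch below (assuming `data` is a sequence of two-character hex strings, as in the original) replaces the per-character string work with `np.unpackbits` and a matrix product; `bin_lsb_np` is an illustrative name, not part of the original code:

```python
import numpy as np

def bin_lsb_np(data):
    # parse the hex octets in one shot instead of per-element int(x, 16)
    octets = np.frombuffer(bytes.fromhex(''.join(data)), dtype=np.uint8)
    # unpack every octet into 8 bits, least-significant bit first
    bits = np.unpackbits(octets[:, None], axis=1, bitorder='little').ravel()
    # full 14-bit groups; within a group, bit j carries weight 2**j
    n_full = (bits.size // 14) * 14
    weights = (1 << np.arange(14, dtype=np.int64))
    out = (bits[:n_full].reshape(-1, 14) @ weights).tolist()
    # the original keeps a short trailing group, so do the same here
    tail = bits[n_full:]
    if tail.size:
        out.append(int(tail @ (1 << np.arange(tail.size, dtype=np.int64))))
    return out
```

Because the loops run inside compiled numpy code rather than the interpreter, this kind of rewrite typically gives a large speedup on multi-gigabyte inputs.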