Make this faster (min and max in the same iteration, using a condition)


I would like to know whether (and how) I could rewrite the lines below to make them run faster.

*(-10000, 10000) is just a range that I know all my numbers fall within.

    first = 10000
    last = -10000

    for key in my_data.keys():
        if "LastFirst_" in key:  # In my_data there are many more keys with lots of vals.
            first = min(first, min(my_data[key]))
            last = max(last, max(my_data[key]))

    print first, last
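If the (-10000, 10000) guarantee might not always hold, a common alternative is to seed the loop with infinite sentinels, which any real number beats. Below is a minimal Python 3 sketch of the same loop under that assumption. `min_max_with_sentinels` is a hypothetical helper name, and the sample dict is made up.

```python
import math


def min_max_with_sentinels(my_data, marker="LastFirst_"):
    """Same loop as in the question, but seeded with +/- infinity
    instead of a hard-coded range (hypothetical helper)."""
    first = math.inf   # larger than any number, so the first real value wins
    last = -math.inf   # smaller than any number
    for key, values in my_data.items():
        if marker in key:
            first = min(first, min(values))
            last = max(last, max(values))
    # If no key matched, this returns (inf, -inf).
    return first, last


my_data = {"LastFirst_a": [3, -20000], "LastFirst_b": [15000], "Other": [10**9]}
print(*min_max_with_sentinels(my_data))  # -20000 15000
```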

Also, is there a more Pythonic way to write this (even if it wouldn't run any faster)?

Thx


    values = [my_data[k] for k in my_data if 'LastFirst_' in k]
    flattened = [item for sublist in values for item in sublist]
    first = min(first, min(flattened))
    last = max(last, max(flattened))

or

    values = [item for sublist in (v for k, v in my_data.iteritems() if 'LastFirst_' in k) for item in sublist]
    first = min(first, min(values))
    last = max(last, max(values))

I ran some benchmarks, and the second solution seems slightly faster than the first. I also compared both versions with the code posted in the other answers:

solution one:  0.648876905441
solution two:  0.634277105331
solution three (TigerhawkT3):  2.14495801926
solution four (Def_Os):  1.07884407043
solution five (leewangzhong):  0.635314941406

These timings are based on a randomly generated dictionary with 1 million keys. I think leewangzhong's solution is really good: apart from the run shown above, in later runs it came out slightly faster than my second solution (by only milliseconds, though), for example:

solution one:  0.678879022598
solution two:  0.62641787529
solution three:  2.15943193436
solution four:  1.05863213539
solution five:  0.611482858658

Itertools is really a great module!
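The benchmark harness behind these numbers is not shown. Here is a minimal sketch of how a comparison like this could be set up with the standard `timeit` module. The dictionary size, key names, and list lengths are assumptions (much smaller than the 1 million keys quoted), so the absolute numbers will not match the ones above.

```python
import itertools
import random
import timeit

# Hypothetical test data: odd-numbered keys match the "LastFirst_" marker.
random.seed(0)
my_data = {
    ("LastFirst_%d" if i % 2 else "Other_%d") % i:
        [random.randint(-10000, 10000) for _ in range(5)]
    for i in range(10000)
}


def with_list_comprehension(data):
    # Flatten the matching lists with nested comprehension clauses.
    values = [x for k, v in data.items() if "LastFirst_" in k for x in v]
    return min(values), max(values)


def with_chain(data):
    # Flatten the matching lists with itertools.chain.from_iterable.
    values = list(itertools.chain.from_iterable(
        v for k, v in data.items() if "LastFirst_" in k))
    return min(values), max(values)


for func in (with_list_comprehension, with_chain):
    seconds = timeit.timeit(lambda: func(my_data), number=20)
    print("%s: %.4f s" % (func.__name__, seconds))
```

Checking that both functions return the same result before timing them is worthwhile; a faster function that answers a different question is no improvement.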


You could use generator expressions to simplify the code:

    first = min(min(data) for (key, data) in my_data.items() if "LastFirst_" in key)
    last = max(max(data) for (key, data) in my_data.items() if "LastFirst_" in key)
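These two generator expressions walk the dict separately, once for min and once for max. Since the title asks for min and max in the same iteration, here is a minimal single-pass sketch. Whether it is actually faster is not established here: a pure-Python loop is often slower than two calls to the C-implemented `min`/`max`, so it should be measured. `min_max_single_pass` is a hypothetical name.

```python
def min_max_single_pass(my_data, marker="LastFirst_"):
    """Return (smallest, largest) over all values whose key contains
    `marker`, visiting each value exactly once (hypothetical helper)."""
    first = last = None
    for key, values in my_data.items():
        if marker not in key:
            continue
        for x in values:
            if first is None:          # the first matching value seeds both
                first = last = x
            elif x < first:            # x < first implies x <= last, so elif is safe
                first = x
            elif x > last:
                last = x
    return first, last   # (None, None) if nothing matched


my_data = {"LastFirst_1": [1, 4, 5], "LastFirst_2": [2, 4, 6], "Other": [-99]}
print(min_max_single_pass(my_data))  # (1, 6)
```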

Use the * operator to unpack the values:

    >>> my_data = {'LastFirst_1':[1, 4, 5], 'LastFirst_2':[2, 4, 6]}
    >>> d = [item for k,v in my_data.items() if 'LastFirst_' in k for item in v]
    >>> first = 2
    >>> last = 5
    >>> min(first, *d)
    1
    >>> max(last, *d)
    6
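One caveat with `*` unpacking: if no key matches, `d` is empty, the call becomes `min(first)`, and Python raises `TypeError` because a single argument must be an iterable. Here is a minimal sketch of a variant that keeps the seed value inside the iterable instead, using `itertools.chain`, with the same sample data as above.

```python
import itertools

my_data = {"LastFirst_1": [1, 4, 5], "LastFirst_2": [2, 4, 6]}
d = [item for k, v in my_data.items() if "LastFirst_" in k for item in v]
first, last = 2, 5

# Chaining the seed in front of d means the iterable is never empty,
# so this also works when no key matches (unlike min(first, *d)).
print(min(itertools.chain([first], d)))  # 1
print(max(itertools.chain([last], d)))   # 6

empty = []
print(min(itertools.chain([first], empty)))  # 2
```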

The min and max functions accept either several separate arguments (as you use them) or a single iterable, so you can pass in an iterable (e.g. a list) and get its min or max directly.
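A quick illustration of the two call forms (the numbers are arbitrary):

```python
# Several positional arguments: the arguments themselves are compared.
print(min(3, 1, 2))                    # 1
# A single iterable argument: the items it yields are compared.
print(min([3, 1, 2]))                  # 1
print(max(x * x for x in (3, -4, 2)))  # 16
# Both forms also accept key=, e.g. the longest list by length:
print(max([1], [1, 2, 3], [4, 5], key=len))  # [1, 2, 3]
```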

Also, if you're only interested in the values, use .values() or .itervalues(). If you need both keys and values, use .items() or .iteritems(). (In Python 3 there are no .iter* versions; .values() and .items() already return lightweight views.)
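A small Python 3 illustration of the distinction, using a made-up dict: iterate `.values()` when every list counts, and `.items()` when the key decides which lists count.

```python
my_data = {"LastFirst_a": [3, 7], "Other": [100]}

# Only the values matter: iterate .values().
largest_overall = max(x for v in my_data.values() for x in v)

# The key decides which values count: iterate .items().
largest_matching = max(x for k, v in my_data.items() if "LastFirst_" in k for x in v)

print(largest_overall, largest_matching)  # 100 7
```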

If you have many sequences, you can use itertools.chain to join them into one long iterable. You can also string them together manually using multiple for clauses in a single comprehension, but that can be hard to read.

    import itertools

    def flatten1(iterables):
        # The "list" is necessary because we want to use the result twice,
        # but `chain` returns an iterator, which can only be consumed once.
        return list(itertools.chain(*iterables))

    # Note: the parentheses make this a generator expression (a lazy iterator), not a list.
    valid_lists = (v for k, v in my_data.iteritems() if "LastFirst_" in k)
    valid_values = flatten1(valid_lists)
    # Alternative: [w for k, v in my_data.iteritems() if "LastFirst_" in k for w in v]

    first = min(valid_values)
    last = max(valid_values)

    print first, last

If no values match (so there is no minimum or maximum to find), the coder has to decide what to do, but I would suggest allowing the default behavior of min/max (raising a ValueError for an empty sequence) or explicitly using None, rather than trying to guess the upper or lower bound. Either would be more Pythonic.

In Python 3.4 and later, min and max accept a default argument that is returned when the iterable is empty, e.g. min(valid_values, default=10000).
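A minimal sketch of both behaviors on an empty list (standing in for the case where no key contained "LastFirst_"). Note that `default` is only used when the iterable is empty; unlike the question's seeded `min(first, ...)`, it does not take part in the comparison otherwise.

```python
valid_values = []   # e.g. no key contained "LastFirst_"

try:
    min(valid_values)
except ValueError as exc:
    print("min() on an empty list raises ValueError:", exc)

# Python 3.4+: default is returned only when the iterable is empty.
print(min(valid_values, default=10000))   # 10000
print(max(valid_values, default=-10000))  # -10000
print(min([3, 1], default=10000))         # 1
```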

    my_data = {'LastFirst_a': [1, 2, 34000], 'LastFirst_b': [-12000, 1, 5]}

    first = 10000
    last = -10000

    # Note: replace .items() with .iteritems() if you're using Python 2.
    # Filtering on the key before the inner loop skips non-matching lists entirely.
    relevant_data = [el for k, v in my_data.items() if "LastFirst_" in k for el in v]
    # maybe faster (but only matches keys that *start* with the marker):
    # relevant_data = [el for k, v in my_data.items() if k.startswith("LastFirst_") for el in v]

    first = min(first, min(relevant_data))
    last = max(last, max(relevant_data))

    print(first, last)
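On the "maybe faster" comment: `str.startswith` only matches at the beginning of the key, while `in` matches anywhere in it, so the two filters are interchangeable only if every relevant key really begins with the marker. A small sketch with made-up keys:

```python
my_data = {"LastFirst_a": [1], "old_LastFirst_b": [2], "Other": [3]}

by_substring = sorted(k for k in my_data if "LastFirst_" in k)
by_prefix = sorted(k for k in my_data if k.startswith("LastFirst_"))

print(by_substring)  # ['LastFirst_a', 'old_LastFirst_b']
print(by_prefix)     # ['LastFirst_a']
```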