What is the best way to make branchless counters?

196 Views Asked by At

I am working on a counter in Python3 for devices in the field. It tries to contact the device and returns the total number that are online and offline.

def count_online(mac_list):
    online = 0
    offline = 0

    for mac in mac_list:

        is_online = poll_device(mac) # ask the device if it is online, return bool

        if is_online:
            online += 1
        else:
            offline += 1

    return online, offline

I have seen "Branchless method to convert false/true to -1/+1?" as well as "Why "If" is Sloowww", however I need a count of both the online and offline devices. I also appreciate that in Python, trying to go branchless is arguably a bit pointless as it's not a low level language like C. On the other hand, the Hopper Necklace reminds us not to waste our microseconds, so is there a more efficient way to write this? Would a branchless approach decrease run time?

3

There are 3 best solutions below

0
Tomerikoo On

You can use the fact that isinstance(True, int) and use pure mathematical operations with one go on your input iterator (which doesn't have to be a list):

def count_online(mac_list):
    online_cnt = sum(poll_device(mac) for mac in mac_list)
    offline_cnt = len(mac_list) - online_cnt

    return online_cnt, offline_cnt
1
Patrick Artner On

You can simply add up the Trues and subtract them from the length - this will give you a branchless code:

def poll_device(_):
    poll_device.state = not poll_device.state
    return poll_device.state

poll_device.state = False

def count_online(mac_list):
    online = 0
    offline = 0
    for mac in mac_list:
        online += poll_device(mac) # ask the device if it is online, return bool

    return online, len(mac_list)-online

on,off = count_online(range(10))

print(on,off)

import dis

dis.dis(count_online)

Output:

5 5

# disassembly
  9           0 LOAD_CONST               1 (0)
              2 STORE_FAST               1 (online)

 10           4 LOAD_CONST               1 (0)
              6 STORE_FAST               2 (offline)

 11           8 SETUP_LOOP              24 (to 34)
             10 LOAD_FAST                0 (mac_list)
             12 GET_ITER
        >>   14 FOR_ITER                16 (to 32)
             16 STORE_FAST               3 (mac)

 12          18 LOAD_FAST                1 (online)
             20 LOAD_GLOBAL              0 (poll_device)
             22 LOAD_FAST                3 (mac)
             24 CALL_FUNCTION            1
             26 INPLACE_ADD
             28 STORE_FAST               1 (online)
             30 JUMP_ABSOLUTE           14
        >>   32 POP_BLOCK

 14     >>   34 LOAD_FAST                1 (online)
             36 LOAD_GLOBAL              1 (len)
             38 LOAD_FAST                0 (mac_list)
             40 CALL_FUNCTION            1
             42 LOAD_FAST                1 (online)
             44 BINARY_SUBTRACT
             46 BUILD_TUPLE              2
             48 RETURN_VALUE

Versus your codes disassembly:

 21           0 LOAD_CONST               1 (0)
              2 STORE_FAST               1 (online)

 22           4 LOAD_CONST               1 (0)
              6 STORE_FAST               2 (offline)

 24           8 SETUP_LOOP              42 (to 52)
             10 LOAD_FAST                0 (mac_list)
             12 GET_ITER
        >>   14 FOR_ITER                34 (to 50)
             16 STORE_FAST               3 (mac)

 26          18 LOAD_GLOBAL              0 (poll_device)
             20 LOAD_FAST                3 (mac)
             22 CALL_FUNCTION            1
             24 STORE_FAST               4 (is_online)

 28          26 LOAD_FAST                4 (is_online)
             28 POP_JUMP_IF_FALSE       40

 29          30 LOAD_FAST                1 (online)
             32 LOAD_CONST               2 (1)
             34 INPLACE_ADD
             36 STORE_FAST               1 (online)
             38 JUMP_ABSOLUTE           14

 31     >>   40 LOAD_FAST                2 (offline)
             42 LOAD_CONST               2 (1)
             44 INPLACE_ADD
             46 STORE_FAST               2 (offline)
             48 JUMP_ABSOLUTE           14
        >>   50 POP_BLOCK

 33     >>   52 LOAD_FAST                1 (online)
             54 LOAD_FAST                2 (offline)
             56 BUILD_TUPLE              2
             58 RETURN_VALUE

But I am not quite sure that kind of "optimization" beforehand is wise to to - I would simply go for the if ... else ... approach until this code part is a bottleneck (measuring it)... the call towards poll_device(mac) is 100-1000 times slower then this branchless construct and that is eating your performance.

0
Steven Rumbalski On
def count_online(mac_list):
    online = sum(map(poll_device, mac_list))
    return online, len(mac_list) - online