How to improve performance of many appends to a list?


How can I improve the performance of the following code?

BANNED_DOORBOTS = {...}

async def execute_query(self, query):
    async with self.pool.acquire() as conn:
        async with conn.cursor() as cur:
            await cur.execute(query)
            records = []
            async for row in cur:
                if row[0] not in BANNED_DOORBOTS:
                    records.append({
                        'key1': row[0],
                        'key2': row[1]
                    })

    return records

I don't want to check if row[0] not in BANNED_DOORBOTS on every iteration. How can I avoid this?

Usually I have more than a hundred (up to 20 000) elements in records. Maybe I can pre-allocate some space to avoid reallocations?

BEST ANSWER

You are rebuilding the list from a database query each time.

I'd ask the database not to return the banned records in the first place:

from psycopg2 import sql  # safe SQL composing

# Add a NOT IN clause to filter out banned doorbots, generating a
# separate parameter placeholder per value
query = sql.SQL(query) + sql.SQL(' WHERE ding_id NOT IN ({})').format(
    sql.SQL(', ').join([sql.Placeholder()] * len(BANNED_DOORBOTS)))
# pass the banned ids as a list so they map onto the generated placeholders
await cur.execute(query, list(BANNED_DOORBOTS))

I used the psycopg2.sql framework to do the composing here, but you could get away with plain string formatting too (use '%s' for the placeholders).
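For example, a minimal sketch of the plain string formatting variant could look like this (same ding_id column as above; the values themselves are still passed as query parameters, never interpolated into the SQL):

# Plain string formatting variant (sketch): build one '%s' placeholder
# per banned id; the driver still escapes the actual values.
placeholders = ', '.join(['%s'] * len(BANNED_DOORBOTS))
query = query + ' WHERE ding_id NOT IN ({})'.format(placeholders)
await cur.execute(query, list(BANNED_DOORBOTS))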

Consider putting the BANNED_DOORBOTS set into a table in the database instead, so you can use a WHERE ding_id NOT IN (SELECT id FROM banned_doorbots WHERE id IS NOT NULL) subquery (sketched below). That way you get better performance still (the database can optimise the filter), and you don't have to generate any placeholders.
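A rough sketch of that variant, assuming the banned ids live in a banned_doorbots table with an id column:

# Sketch: filter against a banned_doorbots(id) table instead of passing
# parameters; the IS NOT NULL guard avoids the NOT IN / NULL pitfall.
query = query + (
    ' WHERE ding_id NOT IN '
    '(SELECT id FROM banned_doorbots WHERE id IS NOT NULL)'
)
await cur.execute(query)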

Next, use a list comprehension to build the list. This is faster because it avoids repeated list.append lookups and method calls. Define your column names as a tuple and zip that together with each row:

keys = ('ding_id', 'doorbot_id', 'created_at', 'address', 'latitude', 
        'longitude', 'ding_kind')
return [dict(zip(keys, row)) async for row in cur]

The async for list comprehension syntax requires Python 3.6 or newer.
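On Python 3.5 you can fall back to an explicit loop that builds the same list of dictionaries:

# Equivalent explicit loop for Python 3.5, where async comprehensions
# are not yet available.
records = []
async for row in cur:
    records.append(dict(zip(keys, row)))
return records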

The aiopg driver lets you configure an alternative cursor factory, one that already produces dictionaries, which may be faster still. You then don't have to use any list comprehension at all:

from psycopg2.extras import RealDictCursor

# configure cursor to yield dictionaries rather than tuples
async with conn.cursor(cursor_factory=RealDictCursor) as cur:
    await cur.execute(query, list(BANNED_DOORBOTS))
    # directly return the cursor; have the caller do the async iteration
    return cur
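A rough sketch of what the calling side could then look like; note that the cursor is only usable while its connection is still checked out of the pool, so the acquire/cursor blocks would have to move to the caller (process() is just a hypothetical per-record handler):

# Caller drives the iteration itself; the connection must stay acquired
# while the cursor is being consumed.
async with pool.acquire() as conn:
    async with conn.cursor(cursor_factory=RealDictCursor) as cur:
        await cur.execute(query, list(BANNED_DOORBOTS))
        async for record in cur:
            process(record)  # hypothetical per-record handler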

If you don't want to make the caller responsible for looping but still need to produce a list, use the cursor.fetchall() method to produce that list; each element will be a dictionary:

# configure cursor to yield dictionaries rather than tuples
async with conn.cursor(cursor_factory=RealDictCursor) as cur:
    await cur.execute(query, list(BANNED_DOORBOTS))
    return await cur.fetchall()
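Putting the pieces together, a rewritten execute_query could look roughly like this (a sketch only, combining the parameterised NOT IN filter with the dictionary cursor):

from psycopg2 import sql
from psycopg2.extras import RealDictCursor

async def execute_query(self, query):
    # Filter the banned ids in SQL instead of checking each row in Python.
    query = sql.SQL(query) + sql.SQL(' WHERE ding_id NOT IN ({})').format(
        sql.SQL(', ').join([sql.Placeholder()] * len(BANNED_DOORBOTS)))
    async with self.pool.acquire() as conn:
        # Dictionary rows straight from the driver; no manual zip() needed.
        async with conn.cursor(cursor_factory=RealDictCursor) as cur:
            await cur.execute(query, list(BANNED_DOORBOTS))
            return await cur.fetchall()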