Optimization of bulk update/insert


I'm writing a web application that shows player statistics for an online game, using Django 1.6 and PostgreSQL 9.1. I've created a script using django-extensions' "runscript" that fetches all players who are online and inserts or updates them in my table. This script is executed 4 times per hour by cron. I need insert-or-update behavior since a player may already be in the table (and should then be updated) or may not be there yet.

To my problem: there are around 25,000 players online at peak hours, and I'm not really sure how I should optimize this (minimize disk I/O). This is what I have so far:

from datetime import date

from django.db import transaction

@transaction.commit_manually
def run():
    for fetched_player in FetchPlayers():
        defaults = {
            'level': fetched_player['level'],
            'server': fetched_player['server'],
            'last_seen': date.today(),
        }
        player, created = Player.objects.get_or_create(
            name=fetched_player['name'], defaults=defaults)
        if not created:
            player.level = fetched_player['level']
            if player.server != fetched_player['server']:
                pass  # I save this info to another table
            player.server = fetched_player['server']
            player.last_seen = date.today()
            player.save()
    transaction.commit()

Would it be (considerably) faster to bypass Django and access the database using psycopg2 or similar? Would Django get confused when 'someone else' modifies the database? Note that Django only reads the database; all writes are done by this script.

What about (using either Django or psycopg2) bulk-fetching players from the database, updating those that were found, and then inserting those that were not found? If this is even possible? The query would get huge: 'SELECT * FROM player WHERE name = name[0] OR name = name[1] OR ... OR name = name[25000]'. :)
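For the bulk fetch, a long chain of ORs isn't necessary: Django's name__in lookup compiles to a single SQL IN (...) clause, and a very large name list can be split into chunks. A minimal sketch, assuming the Player model from the question and a hypothetical myapp.models import path:

```python
def chunked(seq, size):
    """Yield successive slices of seq of length at most size."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def fetch_existing(names, chunk_size=1000):
    """Return a {name: Player} dict for players already in the table.

    Queries in chunks so each IN (...) list stays a manageable size.
    Assumes the Player model from the question.
    """
    from myapp.models import Player  # hypothetical import path

    existing = {}
    for chunk in chunked(list(names), chunk_size):
        for player in Player.objects.filter(name__in=chunk):
            existing[player.name] = player
    return existing
```

With the dict in hand, you can update the players you found and collect the rest for insertion, instead of issuing one SELECT per player.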

1 Answer

If you want to reduce the number of queries, here is what I suggest: call update() directly for each player; it returns the number of rows affected. If the count is 0 (meaning the player is new), put the player's data in a temporary list. When you are done with all fetched players, use bulk_create() to insert all the new players with a single SQL statement.

Assuming you have M+N players (M new, N existing), the number of queries is:

Before: (M+N) SELECTs + M INSERTs + N UPDATEs

After: (M+N) UPDATEs + 1 bulk INSERT.
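The suggestion above can be sketched as follows. This is an illustration only: it assumes the Player model from the question, a hypothetical myapp.models import path, and Django 1.6's transaction.atomic() in place of the older commit_manually decorator. The two counting helpers just restate the M/N query arithmetic.

```python
from datetime import date

def queries_before(m, n):
    # get_or_create: one SELECT per player, plus one INSERT per
    # new player and one UPDATE per existing player
    return (m + n) + m + n

def queries_after(m, n):
    # one UPDATE attempt per player, plus a single bulk INSERT
    return (m + n) + 1

def sync_players(fetched_players):
    """Update-or-collect pattern: try an UPDATE first, and queue
    players whose UPDATE matched no rows for one bulk INSERT."""
    from django.db import transaction
    from myapp.models import Player  # hypothetical import path

    new_players = []
    with transaction.atomic():
        for fp in fetched_players:
            updated = Player.objects.filter(name=fp['name']).update(
                level=fp['level'],
                server=fp['server'],
                last_seen=date.today(),
            )
            if updated == 0:  # no row matched: the player is new
                new_players.append(Player(
                    name=fp['name'],
                    level=fp['level'],
                    server=fp['server'],
                    last_seen=date.today(),
                ))
        # one INSERT ... VALUES (...), (...), ... for all new rows
        Player.objects.bulk_create(new_players)
```

At 25,000 players with, say, 100 new per run, this drops from roughly 50,000 queries to about 25,001.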