Access GAE datastore from background thread


I'm writing a web app on Google App Engine, and I'd like a script to frequently update user profiles based on live, temporary information I'm getting from an XML feed. I'm doing this with a GAE background thread (background_thread) so the site can keep operating while the update runs.

Meanwhile, outside this background thread, users can still navigate the website and make changes to their own profiles.

The background thread does exactly what it should: it updates user profiles based on the live XML data and writes each profile back to the datastore. However, when a user changes their profile, the background thread does not pick up that change. The list returned by the ndb datastore query does not reflect the changes users make.

The curious detail is that it DOES reflect the change when a new user is added to the datastore; it just doesn't reflect changes when an existing user profile is modified. I should be able to query and put to the datastore from a background thread, right?

The meat of the background thread:

import time

from google.appengine.api import background_thread


def update_accounts():
    while True:
        # Get data from the XML feed.
        info_dict = get_live_data()

        # Get all the users from the GAE datastore.
        gprofiles = mUserStats.query()

        for profile in gprofiles:
            # This isn't the actual condition, but there's a condition here.
            if needs_update in profile.m_needsUpdate:
                # Modify the current profile.
                profile.make_change(info_dict)
                # Write it back to the datastore.
                profile.put()

        # Sleep between passes, as this doesn't need to run that frequently.
        time.sleep(20)


class updateAccounts(object):

    def start_thread(self):
        # Pass the function itself, not update_accounts(): calling it here
        # would run the infinite loop in the calling thread instead.
        t = background_thread.start_new_background_thread(update_accounts, [])

This is where profiles are modified:

def post(self):
    session = get_current_session()
    user_key = mUserStats_key(session['me'].m_email)
    # Load the current user's profile (user_key is passed as the parent key).
    curr_user = mUserStats.get_by_id(session['me'].m_email, user_key)
    curr_user.change_profile()
    curr_user.put()

Answer

Just some random thoughts; I don't really know which would work best (if any):

  1. Instead of calling profile.put() inside the loop, maybe you could store the changed entities in a list and make a few ndb.put_multi() calls after the loop? This would replace one datastore call per changed mUserStats entity with a handful of batched calls, reducing the execution time and leaving less chance for a user to change a profile while the background task is running.

  2. If the gprofiles = mUserStats.query() line actually fetches whole entities, maybe you could run the query with keys_only=True and get each mUserStats entity individually inside the loop. This will increase the execution time and add one datastore call per mUserStats entity, but there will be much less chance that a user changed an entity between the moment the background task fetched it and the moment it was written back.

  3. Are the properties updated by the XML feed the same properties the user updates? If not, maybe they could be stored in separate models.
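A minimal sketch of suggestion 1, assuming the mUserStats model, its make_change() method and get_live_data() from the question. needs_update here is a hypothetical callable standing in for the real condition, and batch_size is an arbitrary choice. The selection logic lives in plain functions so it can be tried without App Engine; only update_accounts_once() touches ndb.

```python
def collect_changes(profiles, needs_update, info_dict):
    """Apply the feed data to each profile that needs it and return the
    modified entities, instead of calling put() on each one in the loop."""
    changed = []
    for profile in profiles:
        if needs_update(profile):
            profile.make_change(info_dict)
            changed.append(profile)
    return changed


def chunks(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def update_accounts_once(needs_update, batch_size=100):
    # ndb only exists on App Engine, so it is imported where it is used.
    # mUserStats and get_live_data() are the question's own names.
    from google.appengine.ext import ndb

    info_dict = get_live_data()
    changed = collect_changes(mUserStats.query(), needs_update, info_dict)
    # A few batched writes instead of one put() per changed entity.
    for batch in chunks(changed, batch_size):
        ndb.put_multi(batch)
```

Inside the question's while True loop, the body would then shrink to a call to update_accounts_once(...) followed by time.sleep(20).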

You could also take a look at query cursors and iterators, which might help automate suggestions 1 and 2.
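A rough sketch of how cursors could combine suggestions 1 and 2: page through keys only with ndb's Query.fetch_page(), re-read each page with ndb.get_multi() just before modifying it, and write the page back with one ndb.put_multi() call. The paging loop takes its datastore functions as arguments (a choice made for this sketch so it can be exercised without App Engine); needs_update and page_size are hypothetical, while mUserStats and make_change() come from the question.

```python
def process_in_pages(fetch_page, get_multi, put_multi, needs_update,
                     info_dict, page_size=100):
    """Update entities one page at a time; return how many were changed.

    fetch_page(page_size, cursor) must return (keys, next_cursor, more),
    the same shape as the tuple ndb's Query.fetch_page() returns.
    """
    cursor, more, total = None, True, 0
    while more:
        keys, cursor, more = fetch_page(page_size, cursor)
        # Re-read this page right before changing it; skip deleted entities.
        profiles = [p for p in get_multi(keys) if p is not None]
        changed = []
        for profile in profiles:
            if needs_update(profile):
                profile.make_change(info_dict)
                changed.append(profile)
        if changed:
            put_multi(changed)  # one batched write per page
        total += len(changed)
    return total


def run_with_ndb(needs_update, info_dict, page_size=100):
    # Only importable on App Engine; mUserStats is the question's model.
    from google.appengine.ext import ndb

    def fetch_keys(size, cursor):
        # Keys-only page starting at the previous page's cursor.
        return mUserStats.query().fetch_page(
            size, start_cursor=cursor, keys_only=True)

    return process_in_pages(fetch_keys, ndb.get_multi, ndb.put_multi,
                            needs_update, info_dict, page_size)
```

Smaller pages shorten the gap between reading and writing each entity, at the cost of more round trips.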