Datajoint locking the table? Populating on multiple machines


I have a table that runs a heavy computation (roughly 5 minutes per key). I want to reserve jobs and populate it from multiple machines. I noticed that the other machines get locked out of the table as soon as one machine starts processing a job: they effectively have to wait until that job finishes before they get a chance to grab one of their own. Where does this behavior stem from? When a job takes too long, I run into "Lock wait timeout exceeded" errors on the machines other than the one currently processing the job.

@schema
class HeavyComputation(dj.Computed):
    definition = """
    # ... 
    -> Table1
    class_label      :    varchar(25)      
    -> Table2.proj(somekey2="somekey")
    ---
    analyzed  :    longblob
    """

I am running .populate() on the table with

settings = {"display_progress": True,
            "reserve_jobs": True,
            "suppress_errors": True,
            "order": "random"}

HeavyComputation.populate(**settings)

2 Answers

Dimitri Yatsenko:

Yes, this is a tricky problem with how transaction serialization works. I will explain in more detail and provide additional background later, but the solution is to reorder the primary-key attributes in the table:

@schema
class HeavyComputation(dj.Computed):
    definition = """
    # ... 
    -> Table1
    -> Table2.proj(somekey2="somekey")
    class_label      :    varchar(25)      
    ---
    analyzed  :    longblob
    """

Again, I will provide a detailed explanation later since it will take some time to write up; I did not want to make you wait.
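Pending that write-up, one plausible reading of the "transaction serialization" remark can be sketched with a toy model (plain Python, not DataJoint or MySQL): during each insert transaction, InnoDB takes range locks on the primary-key index, so two concurrent jobs can block each other when their inserted rows interleave in index order. With `class_label` moved after both foreign keys, each job's rows occupy a contiguous index range. The `interleaved` helper and the key values below are illustrative assumptions, not part of any API.

```python
# Toy model of why primary-key attribute order matters: jobs conflict
# when their inserted rows interleave in the sorted (index) order.

def interleaved(rows_a, rows_b):
    """True if the two jobs' rows interleave when sorted in index order."""
    order = sorted(rows_a + rows_b)
    tags = ["a" if row in rows_a else "b" for row in order]
    # Contiguous ranges look like aabb or bbaa; anything else interleaves.
    return tags != sorted(tags) and tags != sorted(tags, reverse=True)

# Original order: (-> Table1, class_label, -> Table2).
# Each job inserts several class labels for one (Table1, Table2) pair.
job1 = [(1, "cat", 1), (1, "dog", 1)]
job2 = [(1, "cat", 2), (1, "dog", 2)]
print(interleaved(job1, job2))  # True: the jobs' index ranges overlap

# Reordered: (-> Table1, -> Table2, class_label).
job1 = [(1, 1, "cat"), (1, 1, "dog")]
job2 = [(1, 2, "cat"), (1, 2, "dog")]
print(interleaved(job1, job2))  # False: each job is a contiguous range
```

In the first ordering, rows from the two jobs alternate in the index because the differing `Table2` key sits last; in the second, each job's rows sort together, so their lock ranges no longer overlap.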

Horst:

The problem turned out to be a .delete() call inside a sub-function of my make function. I keep track of temporary files in another (unrelated) table and wanted them cleaned up once the make routine finishes. However, that .delete() ran into a table lock and thereby prevented the .populate() call from finishing.
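One pattern that avoids this kind of nested lock is to defer the cleanup until after populate's transaction has finished. The sketch below uses hypothetical names (`cleanup_queue`, `run_cleanup`, the `table1_id` key) and plain temp files rather than DataJoint tables, since the real code needs a database connection; the point is only the deferral.

```python
# Sketch of the fix: instead of calling .delete() on another table
# inside make(), which runs inside populate's transaction and can hit
# a table lock, record the cleanup work and perform it afterwards.
import os
import tempfile

cleanup_queue = []  # temp-file paths recorded during make()

def make(key):
    # ... heavy computation that writes a scratch file ...
    fd, path = tempfile.mkstemp(suffix=".tmp")
    os.close(fd)
    cleanup_queue.append(path)  # defer: do NOT delete inside make()

def run_cleanup():
    # Called once populate() returns, outside any open transaction.
    while cleanup_queue:
        path = cleanup_queue.pop()
        if os.path.exists(path):
            os.remove(path)

make({"table1_id": 1})
run_cleanup()
print(len(cleanup_queue))  # 0: queue drained, temp file removed
```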