How to more efficiently test if data anomalies occur in transaction (Django)

198 Views Asked by At

I want to test if data anomalies such as dirty read, non-repeatable read, phantom read, lost update and so on occur in transaction.

Actually, I used person table which has id and name as shown below.

person table:

id name
1 John
2 David

Then, I tested non-repeatable read with test view below and one command prompt. *During sleep(10), one command prompt updates "David" to "Tom" and commits:

# "store/views.py"

from .models import Person
from django.http import HttpResponse
from django.db import transaction
from time import sleep

@transaction.atomic
def test(request):
    print(Person.objects.get(pk=2)) # "David"
    sleep(10) # Update "David" to "Tom" and commit by one command prompt.
    print(Person.objects.get(pk=2)) # "Tom"

    return HttpResponse("Test")

But, every time I test data anomalies, I manually need to run test view and update and commit with one command prompt which takes much time.

So, how can I more efficiently test if data anomalies occur in transaction?

1

There are 1 best solutions below

0
On

With threads, you can more efficiently test if data anomalies occur in transaction.

I created 5 sets of code with threads to test 5 common data anomalies dirty read, non-repeatable read, phantom read, lost update and write skew with the Django's default isolation level READ COMMITTED and PostgreSQL as shown below. *Lost update or write skew occurs by race condition and I used PostgreSQL.

I explain about:

  • dirty read => Here
  • non-repeatable read and phantom read => Here
  • lost update and write skew => Here
  • which anomaly occurs in which isolation level => Here

<Dirty read>, <Non-repeatable read> and <Phantom read>

First, I created store_person table with id and name with models.py as shown below:

store_person table:

id name
1 John
2 David
# "store/models.py"

from django.db import models

class Person(models.Model):
    name = models.CharField(max_length=30)

Then, I created and ran the test code of dirty read as shown below:

# "store/views.py"

from django.db import transaction
from time import sleep
from .models import Person
from threading import Thread
from django.http import HttpResponse

@transaction.atomic
def transaction1(flow):
    while True:
        while True:
            if flow[0] == "Step 1":
                sleep(0.1)
                print("<T1", flow[0] + ">", "BEGIN")
                flow[0] = "Step 2"
                break
        
        while True:
            if flow[0] == "Step 2":
                sleep(0.1)
                print("<T1", flow[0] + ">", "SELECT")
                person = Person.objects.get(id=2)
                print(person.id, person.name)
                flow[0] = "Step 3"
                break

        while True:
            if flow[0] == "Step 5":
                sleep(0.1)
                print("<T1", flow[0] + ">", "SELECT")
                person = Person.objects.get(id=2)
                print(person.id, person.name)
                flow[0] = "Step 6"
                break
        
        while True:
            if flow[0] == "Step 6":
                sleep(0.1)
                print("<T1", flow[0] + ">", "COMMIT")
                flow[0] = "Step 7"
                break
        break

@transaction.atomic
def transaction2(flow):
    while True:
        while True:
            if flow[0] == "Step 3":
                sleep(0.1)
                print("<T2", flow[0] + ">", "BEGIN")
                flow[0] = "Step 4"
                break

        while True:
            if flow[0] == "Step 4":
                sleep(0.1)
                print("<T2", flow[0] + ">", "UPDATE")
                Person.objects.filter(id=2).update(name="Tom")
                flow[0] = "Step 5"
                break
        
        while True:
            if flow[0] == "Step 7":
                sleep(0.1)
                print("<T2", flow[0] + ">", "COMMIT")
                break
        break

def call_transcations(request): 
    flow = ["Step 1"]

    thread1 = Thread(target=transaction1, args=(flow,), daemon=True)
    thread2 = Thread(target=transaction2, args=(flow,), daemon=True)

    thread1.start()
    thread2.start()

    thread1.join()
    thread2.join()

    return HttpResponse("Call_transcations")

Then, dirty read didn't occur according to the result belew on console because in any isolation levels in PostgreSQL, dirty read doesn't happen:

<T1 Step 1> BEGIN
<T1 Step 2> SELECT
2 David # Here
<T2 Step 3> BEGIN
<T2 Step 4> UPDATE
<T1 Step 5> SELECT
2 David # Here
<T1 Step 6> COMMIT
<T2 Step 7> COMMIT

And also, I could get the PostgreSQL query logs below. You can check how to log PostgreSQL queries:

[23576]: BEGIN
[23576]: SELECT "store_person"."id", "store_person"."name" 
         FROM "store_person" 
         WHERE "store_person"."id" = 2 
         LIMIT 21
[8600]: BEGIN
[8600]: UPDATE "store_person" SET "name" = 'Tom' 
        WHERE "store_person"."id" = 2
[23576]: SELECT "store_person"."id", "store_person"."name" 
         FROM "store_person" 
         WHERE "store_person"."id" = 2 
         LIMIT 21
[23576]: COMMIT
[8600]: COMMIT

And, this table below shows the flow and SQL query logs of PostgreSQL above:

Flow Transaction 1 (T1) Transaction 2 (T2) Explanation
Step 1 BEGIN; T1 starts.
Step 2 SELECT "store_person"."id", "store_person"."name" FROM "store_person" WHERE "store_person"."id" = 2 LIMIT 21;

2 David
T1 reads David.
Step 3 BEGIN; T2 starts.
Step 4 UPDATE "store_person" SET "name" = 'Tom' WHERE "store_person"."id" = 2; T2 updates David to Tom.
Step 5 SELECT "store_person"."id", "store_person"."name" FROM "store_person" WHERE "store_person"."id" = 2 LIMIT 21;

2 David
T1 reads David instead of Tom before T2 commits.

*Dirty read doesn't occur!!

Step 6 COMMIT; T1 commits.
Step 7 COMMIT; T2 commits.

Next, I created and ran the test code of non-repeatable read as shown below:

# "store/views.py"

# ...

@transaction.atomic
def transaction1(flow):
    while True:
        while True:
            if flow[0] == "Step 1":
                sleep(0.1)
                print("<T1", flow[0] + ">", "BEGIN")
                flow[0] = "Step 2"
                break
        
        while True:
            if flow[0] == "Step 2":
                sleep(0.1)
                print("<T1", flow[0] + ">", "SELECT")
                person = Person.objects.get(id=2)
                print(person.id, person.name)
                flow[0] = "Step 3"
                break

        while True:
            if flow[0] == "Step 6":
                sleep(0.1)
                print("<T1", flow[0] + ">", "SELECT")                
                person = Person.objects.get(id=2)
                print(person.id, person.name)
                flow[0] = "Step 7"
                break
        
        while True:
            if flow[0] == "Step 7":
                sleep(0.1)
                print("<T1", flow[0] + ">", "COMMIT")
                break
        break

@transaction.atomic
def transaction2(flow):
    while True:
        while True:
            if flow[0] == "Step 3":
                sleep(0.1)
                print("<T2", flow[0] + ">", "BEGIN")
                flow[0] = "Step 4"
                break

        while True:
            if flow[0] == "Step 4":
                sleep(0.1)
                print("<T2", flow[0] + ">", "UPDATE")
                Person.objects.filter(id=2).update(name="Tom")
                flow[0] = "Step 5"
                break
        
        while True:
            if flow[0] == "Step 5":
                sleep(0.1)
                print("<T2", flow[0] + ">", "COMMIT")
                flow[0] = "Step 6"
                break
        break

# ...

Then, non-repeatable read occurred according to the result belew on console because in READ COMMITTED isolation level in PostgreSQL, non-repeatable read occurs:

<T1 Step 1> BEGIN
<T1 Step 2> SELECT
2 David # Here
<T2 Step 3> BEGIN
<T2 Step 4> UPDATE
<T2 Step 5> COMMIT
<T1 Step 6> SELECT
2 Tom # Here
<T1 Step 7> COMMIT

And also, I could get the PostgreSQL query logs below:

[23128]: BEGIN
[23128]: SELECT "store_person"."id", "store_person"."name" 
         FROM "store_person" 
         WHERE "store_person"."id" = 2 
         LIMIT 21
[6368]: BEGIN
[6368]: UPDATE "store_person" SET "name" = 'Tom' 
        WHERE "store_person"."id" = 2
[6368]: COMMIT
[23128]: SELECT "store_person"."id", "store_person"."name" 
         FROM "store_person" 
         WHERE "store_person"."id" = 2 
         LIMIT 21
[23128]: COMMIT

And, this table below shows the flow and SQL query logs of PostgreSQL above:

Flow Transaction 1 (T1) Transaction 2 (T2) Explanation
Step 1 BEGIN; T1 starts.
Step 2 SELECT "store_person"."id", "store_person"."name" FROM "store_person" WHERE "store_person"."id" = 2 LIMIT 21;

2 David
T1 reads David.
Step 3 BEGIN; T2 starts.
Step 4 UPDATE "store_person" SET "name" = 'Tom' WHERE "store_person"."id" = 2; T2 updates David to Tom.
Step 5 COMMIT; T2 commits.
Step 6 SELECT "store_person"."id", "store_person"."name" FROM "store_person" WHERE "store_person"."id" = 2 LIMIT 21;

2 Tom
T1 reads Tom instead of David after T2 commits.

*Non-repeatable read occurs!!

Step 7 COMMIT; T1 commits.

Next, I created and ran the test code of phantom read as shown below:

# "store/views.py"

# ...

@transaction.atomic
def transaction1(flow):
    while True:
        while True:
            if flow[0] == "Step 1":
                sleep(0.1)
                print("<T1", flow[0] + ">", "BEGIN")
                flow[0] = "Step 2"
                break
        
        while True:
            if flow[0] == "Step 2":
                sleep(0.1)
                print("<T1", flow[0] + ">", "SELECT")
                persons = Person.objects.all()
                for person in persons:
                    print(person.id, person.name)
                flow[0] = "Step 3"
                break

        while True:
            if flow[0] == "Step 6":
                sleep(0.1)
                print("<T1", flow[0] + ">", "SELECT")                
                persons = Person.objects.all()
                for person in persons:
                    print(person.id, person.name)
                flow[0] = "Step 7"
                break
        
        while True:
            if flow[0] == "Step 7":
                sleep(0.1)
                print("<T1", flow[0] + ">", "COMMIT")
                break
        break

@transaction.atomic
def transaction2(flow):
    while True:
        while True:
            if flow[0] == "Step 3":
                sleep(0.1)
                print("<T2", flow[0] + ">", "BEGIN")
                flow[0] = "Step 4"
                break

        while True:
            if flow[0] == "Step 4":
                sleep(0.1)
                print("<T2", flow[0] + ">", "INSERT")
                Person.objects.create(id=3, name="Tom")
                flow[0] = "Step 5"
                break
        
        while True:
            if flow[0] == "Step 5":
                sleep(0.1)
                print("<T2", flow[0] + ">", "COMMIT")
                flow[0] = "Step 6"
                break
        break

# ...

Then, phantom read occurred according to the result belew on console because in READ COMMITTED isolation level in PostgreSQL, phantom read occurs:

<T1 Step 1> BEGIN
<T1 Step 2> SELECT
1 John # Here
2 David # Here
<T2 Step 3> BEGIN
<T2 Step 4> INSERT
<T2 Step 5> COMMIT
<T1 Step 6> SELECT
1 John # Here
2 David # Here
3 Tom # Here
<T1 Step 7> COMMIT

And also, I could get the PostgreSQL query logs below:

[15912]: BEGIN
[15912]: SELECT "store_person"."id", "store_person"."name" 
         FROM "store_person"
[2098]: BEGIN
[2098]: INSERT INTO "store_person" ("id", "name") 
        VALUES (3, 'Tom') 
        RETURNING "store_person"."id"
[2098]: COMMIT
[15912]: SELECT "store_person"."id", "store_person"."name" 
         FROM "store_person"
[15912]: COMMIT

And, this table below shows the flow and SQL query logs of PostgreSQL above:

Flow Transaction 1 (T1) Transaction 2 (T2) Explanation
Step 1 BEGIN; T1 starts.
Step 2 SELECT "store_person"."id", "store_person"."name" FROM "store_person";

1 John
2 David
T1 reads 2 rows.
Step 3 BEGIN; T2 starts.
Step 4 INSERT INTO "store_person" ("id", "name") VALUES (3, 'Tom') RETURNING "store_person"."id"; T2 inserts the row with 3 and Tom to person table.
Step 5 COMMIT; T2 commits.
Step 6 SELECT "store_person"."id", "store_person"."name" FROM "store_person";

1 John
2 David
3 Tom
T1 reads 3 rows instead of 2 rows after T2 commits.

*Phantom read occurs!!

Step 7 COMMIT; T1 commits.

<Lost update>

First, I created store_product table with id, name and stock with models.py as shown below:

store_product table:

id name stock
1 Apple 10
2 Orange 20
# "store/models.py"

from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=30)
    stock = models.IntegerField()
# "store/views.py"

# ...

@transaction.atomic
def transaction1(flow):
    while True:
        while True:
            if flow[0] == "Step 1":
                sleep(0.1)
                print("T1", flow[0], "BEGIN")
                flow[0] = "Step 2"
                break
        
        while True:
            if flow[0] == "Step 2":
                sleep(0.1)
                print("T1", flow[0], "SELECT")
                product = Product.objects.get(id=2)
                print(product.id, product.name, product.stock)
                flow[0] = "Step 3"
                break

        while True:
            if flow[0] == "Step 5":
                sleep(0.1)
                print("T1", flow[0], "UPDATE")
                Product.objects.filter(id=2).update(stock=13)
                flow[0] = "Step 6"
                break
        
        while True:
            if flow[0] == "Step 6":
                sleep(0.1)
                print("T1", flow[0], "COMMIT")
                flow[0] = "Step 7"
                break
        break

@transaction.atomic
def transaction2(flow):
    while True:
        while True:
            if flow[0] == "Step 3":
                sleep(0.1)
                print("T2", flow[0], "BEGIN")
                flow[0] = "Step 4"
                break

        while True:
            if flow[0] == "Step 4":
                sleep(0.1)
                print("T2", flow[0], "SELECT")
                product = Product.objects.get(id=2)
                print(product.id, product.name, product.stock)
                flow[0] = "Step 5"
                break
        
        while True:
            if flow[0] == "Step 7":
                sleep(0.1)
                print("T2", flow[0], "UPDATE")
                Product.objects.filter(id=2).update(stock=16)
                flow[0] = "Step 8"
                break

        while True:
            if flow[0] == "Step 8":
                sleep(0.1)
                print("T2", flow[0], "COMMIT")
                break
        break

# ...

Then, lost update occurred according to the result belew on console because in READ COMMITTED isolation level in PostgreSQL, lost update occurs:

T1 Step 1 BEGIN
T1 Step 2 SELECT # Reads the same row
2 Orange 20
T2 Step 3 BEGIN
T2 Step 4 SELECT # Reads the same row
2 Orange 20
T1 Step 5 UPDATE # Writes "stock"
T1 Step 6 COMMIT # And commits the write
T2 Step 7 UPDATE # Overwrites "stock"
T2 Step 8 COMMIT # And commits the overwrite

And also, I could get the PostgreSQL query logs below:

[20504]: BEGIN
[20504]: SELECT "store_product"."id", "store_product"."name", "store_product"."stock" 
         FROM "store_product" 
         WHERE "store_product"."id" = 2 
         LIMIT 21
[3840]: BEGIN
[3840]: SELECT "store_product"."id", "store_product"."name", "store_product"."stock" 
        FROM "store_product" 
        WHERE "store_product"."id" = 2 
        LIMIT 21
[20504]: UPDATE "store_product" SET "stock" = 13 
         WHERE "store_product"."id" = 2
[20504]: COMMIT
[3840]: UPDATE "store_product" SET "stock" = 16 
        WHERE "store_product"."id" = 2
[3840]: COMMIT

And, this table below shows the flow and SQL query logs of PostgreSQL above:

Flow Transaction 1 (T1) Transaction 2 (T2) Explanation
Step 1 BEGIN; T1 starts.
Step 2 SELECT "store_product"."id", "store_product"."name", "store_product"."stock" FROM "store_product" WHERE "store_product"."id" = 2 LIMIT 21;

2 Orange 20
T1 reads 20 which is updated later to 13 because a customer buys 7 oranges.
Step 3 BEGIN; T2 starts.
Step 4 SELECT "store_product"."id", "store_product"."name", "store_product"."stock" FROM "store_product" WHERE "store_product"."id" = 2 LIMIT 21;

2 Orange 20
T2 reads 20 which is updated later to 16 because a customer buys 4 oranges.
Step 5 UPDATE "store_product" SET "stock" = 13 WHERE "store_product"."id" = 2; T1 updates 20 to 13.
Step 6 COMMIT; T1 commits.
Step 7 UPDATE "store_product" SET "stock" = 16 WHERE "store_product"."id" = 2; T2 updates 13 to 16 after T1 commits.
Step 8 COMMIT; T2 commits.

*Lost update occurs.

<Write skew>

First, I created store_doctor table with id, name and on_call with models.py as shown below:

store_doctor table:

id name on_call
1 John True
2 Lisa True
# "store/models.py"

from django.db import models

class Doctor(models.Model):
    name = models.CharField(max_length=30)
    on_call = models.BooleanField()
# "store/views.py"

# ...

@transaction.atomic
def transaction1(flow):
    while True:
        while True:
            if flow[0] == "Step 1":
                print("T1", flow[0], "BEGIN")
                flow[0] = "Step 2"
                break
        
        while True:
            if flow[0] == "Step 2":
                print("T1", flow[0], "SELECT")
                doctor_count = Doctor.objects.filter(on_call=True).count()
                print(doctor_count)
                flow[0] = "Step 3"
                break

        while True:
            if flow[0] == "Step 5":
                print("T1", flow[0], "UPDATE")
                Doctor.objects.filter(id=1).update(on_call=False)
                flow[0] = "Step 6"
                break
        
        while True:
            if flow[0] == "Step 6":
                print("T1", flow[0], "COMMIT")
                flow[0] = "Step 7"
                break
        break

@transaction.atomic
def transaction2(flow):
    while True:
        while True:
            if flow[0] == "Step 3":
                print("T2", flow[0], "BEGIN")
                flow[0] = "Step 4"
                break

        while True:
            if flow[0] == "Step 4":
                print("T2", flow[0], "SELECT")
                doctor_count = Doctor.objects.filter(on_call=True).count()
                print(doctor_count)
                flow[0] = "Step 5"
                break
        
        while True:
            if flow[0] == "Step 7":
                print("T2", flow[0], "UPDATE")
                Doctor.objects.filter(id=2).update(on_call=False)
                flow[0] = "Step 8"
                break

        while True:
            if flow[0] == "Step 8":
                print("T2", flow[0], "COMMIT")
                break
        break

# ...

Then, write skew occurred according to the result belew on console because in READ COMMITTED isolation level in PostgreSQL, write skew occurs:

T1 Step 1 BEGIN
T1 Step 2 SELECT # Reads the same data
2 
T2 Step 3 BEGIN
T2 Step 4 SELECT # Reads the same data
2
T1 Step 5 UPDATE # Writes 'False' to John's "on_call" 
T1 Step 6 COMMIT # And commits the write
T2 Step 7 UPDATE # Writes 'False' to Lisa's "on_call" 
T2 Step 8 COMMIT # And commits the write

And also, I could get the PostgreSQL query logs below:

[11252]: BEGIN
[11252]: SELECT COUNT(*) AS "__count" 
         FROM "store_doctor" 
         WHERE "store_doctor"."on_call"
[2368]: BEGIN
[2368]: SELECT COUNT(*) AS "__count" 
        FROM "store_doctor" 
        WHERE "store_doctor"."on_call"
[11252]: UPDATE "store_doctor" 
         SET "on_call" = false 
         WHERE "store_doctor"."id" = 1
[11252]: COMMIT
[2368]: UPDATE "store_doctor" 
        SET "on_call" = false 
        WHERE "store_doctor"."id" = 2
[2368]: COMMIT

And, this table below shows the flow and SQL query logs of PostgreSQL above:

Flow Transaction 1 (T1) Transaction 2 (T2) Explanation
Step 1 BEGIN; T1 starts.
Step 2 SELECT COUNT(*) AS "__count" FROM "store_doctor" WHERE "store_doctor"."on_call";

2
T1 reads 2 so John can take a rest.
Step 3 BEGIN; T2 starts.
Step 4 SELECT COUNT(*) AS "__count" FROM "store_doctor" WHERE "store_doctor"."on_call";

2
T2 reads 2 so Lisa can take a rest.
Step 5 UPDATE "store_doctor" SET "on_call" = false WHERE "store_doctor"."id" = 1; T1 updates True to False which means John takes a rest.
Step 6 COMMIT; T1 commits.
Step 7 UPDATE "store_doctor" SET "on_call" = false WHERE "store_doctor"."id" = 2; T2 updates True to False which means Lisa takes a rest.
Step 8 COMMIT; T2 commits.

John and Lisa both take a rest.

*Write skew occurs.