Situation: I have two models with a one-to-one relation, for example:
class User(models.Model):
    user_name = models.CharField(max_length=40)
    type = models.CharField(max_length=255)
    created_at = models.DateTimeField()
    ...

class Book(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
And I have around 200,000 records.
- Language: Python
- Framework: Django
- Database: Postgres
Question:
- How can I delete the 200,000 records above with minimal cost?
Solution I have tried:
user_ids = list(
    User.objects.filter(
        type='sample',
        created_at__gte='2022-11-15 08:00',
        created_at__lt='2022-11-15 08:30',
    ).values_list('id', flat=True)[:200000]
)  # Fetch 200,000 user ids.

batch_size = 1000
for batch_start in range(0, len(user_ids), batch_size):
    batch_end = batch_start + batch_size
    with transaction.atomic():
        deleted, _ = User.objects.filter(
            id__in=user_ids[batch_start:batch_end]
        ).delete()
With this solution, my server uses around:
- 600MB CPU
- 300MB RAM
- and it takes more than 15 minutes to finish the workload.
Does anyone have a better solution?
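For reference, the batching loop above boils down to slicing the id list into fixed-size chunks; a minimal sketch of just that slicing, independent of Django (the `chunked` helper name is illustrative, not from the original code):

```python
def chunked(ids, batch_size=1000):
    """Yield successive batch_size-long slices of the id list."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]

# 200,000 ids in batches of 1,000 -> 200 batches of 1,000 each.
batches = list(chunked(list(range(200_000)), 1000))
print(len(batches), len(batches[0]))  # 200 1000
```

Each chunk then feeds one `DELETE ... WHERE id IN (...)` statement inside its own transaction.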
By first principles, nothing beats raw SQL (bypassing the Django ORM) in terms of speed, because it operates closest to the database. One caveat: Django emulates on_delete=CASCADE in Python rather than in the database schema, so with raw SQL you must delete the dependent Book rows before the User rows. Assuming the default app_user table name:

cursor.execute("DELETE FROM app_user WHERE type = %s", ['sample'])
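As a runnable illustration of that batched raw-SQL delete, here is a sketch using Python's built-in sqlite3 in place of Postgres (so placeholders are `?` instead of psycopg2's `%s`), with a hypothetical app_user table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_user (id INTEGER PRIMARY KEY, type TEXT)")
conn.executemany(
    "INSERT INTO app_user (id, type) VALUES (?, ?)",
    [(i, "sample") for i in range(10_000)],
)

# Delete in batches of 1,000 ids so each transaction stays small.
BATCH = 1000
ids = [row[0] for row in
       conn.execute("SELECT id FROM app_user WHERE type = ?", ("sample",))]
for start in range(0, len(ids), BATCH):
    chunk = ids[start:start + BATCH]
    placeholders = ",".join("?" * len(chunk))
    conn.execute(f"DELETE FROM app_user WHERE id IN ({placeholders})", chunk)
    conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM app_user").fetchone()[0]
print(remaining)  # 0
```

In Postgres the same pattern applies, but a single `DELETE FROM app_user WHERE type = 'sample' AND created_at >= ... AND created_at < ...` is usually even cheaper than batching by id, since the database does all the work in one statement.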
Or you can do it through the ORM:

Model.objects.filter(field=value).delete()