I have two models with M2M field. Because there wont be any update or deletion (just need to read data from db) I'm looking to have single db hit to retrieve all the required data. I used prefetch_related with Prefetch to be able to filter data and also have filtered objects in a cached list using to_attr. I tried to achieve the same result using annotate along with Subquery. but here I can't understand why the annotated filed contains only one value instead of a list of values. let's review the code I have:
- some Routes may have more than one special point (Point instances with is_special=True).
models.py
class Route(models.Model):
indicator = models.CharField()
class Point(models.Model):
indicator = models.CharField()
route = models.ManyToManyField(to=Route, related_name="points")
is_special=models.BooleanField(default=False)
views.py
routes = Route.objects.filter(...).prefetch_related(
Prefetch(
"points",
queryset=Point.objects.filter(is_special=True),
to_attr="special_points",
)
)
this will work as expected but it will result in a separate database querying to fetch the points data. in the following code I tried to use Subquery instead to have a single database hit.
routes = Route.objects.filter(...).annotate(
special_points=Subquery(
Point.objects.filter(route=OuterRef("pk"), is_special=True).values("indicator")
)
the problem is in the second approach will have either one or none special-point indicator when printing route_instance.special_points even if when using prefetch the printed result for the same instance of Route shows that there are two more special points.
I know in the first approach
route_instance.special_pointswill contains the Point instances and not their indicators but that is the problem.I checked the SQL code of the Subquery and there is no sign of limitation in the query as I did not used slicing in the python code as well. but again the result is limited to either one (if one or more exists) or none if there isn't any special point.
This is how I check db connection
# Enable query counting
from django.db import connection
connection.force_debug_cursor = True
route_analyzer(data, err)
# Output the number of queries
print(f"Total number of database queries: {len(connection.queries)}")
for query in connection.queries:
print(query["sql"])
# Disable query counting
connection.force_debug_cursor = False
with the help from GPT, I have raw sql code that gives the result:
- it is based on some python code so it's not clean template.
SELECT "general_route"."id", "general_route"."indicator",
(SELECT GROUP_CONCAT(U0."indicator", ', ')
FROM "points_point" U0
INNER JOIN "points_point_route" U1 ON (U0."id" = U1."point_id")
WHERE (U1."route_id" = "general_route"."id" AND U0."is_special")
) AS "special_points",
(SELECT GROUP_CONCAT(U0."indicator", ', ')
FROM "points_point" U0
INNER JOIN "points_point_route" U1 ON (U0."id" = U1."point_id")
WHERE (U1."route_id" = "general_route"."id" AND U0."indicator" IN ('CAK', 'NON'))
) AS "all_points"
FROM "general_route"
WHERE ("general_route"."indicator" LIKE 'OK%' OR "general_route"."indicator" LIKE 'OI%')
ORDER BY "general_route"."indicator" ASC
Answer to my question
After reviewing the generated SQL code of the django-ORM I tried to create a custom version of GroupConcat method so to concatenate the
indicatorfield of all filtered objects together (then simply use split to generate a list of them). here I realized that inheritance fromAggregate(from django.db.models), will cause the generated SQL code to have an unwantedGROUP BYstatement which also includes all the fields of the target model so the output will include only one value(if any) or nothing at all. even if theGroupConcatis well implemented, that group-by will end up kinda loop that will restarts GroupConcat and feed it only one value per rosw, so there is nothing from previous rows left to be concatenated with new row. But inheritance fromFunc(from django.db.models) dropped that extra and unwantedGROUP BYso the query will fetch all the rows and feeds the expected column of all rows to theGroupConcatat once and at the end of day the expected result is produced.