Django query not respecting order for ltree field subquery (PostgreSQL)

Background: I have an application with a Team model that includes a field backed by the PostgreSQL ltree extension. An ltree value stores the entire ancestral chain of an object, including the object itself, in a single field (the representation I see is dot-separated id values). Django does not support this field type out of the box, so some custom extensions were added to support the ancestor/descendant checks; I think they are based on existing packages that do much the same. The ltree field is used as the path column of our Team table to store team hierarchies.
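
To make the path representation concrete, here is a minimal Python sketch, assuming paths are plain dot-separated labels as described above. The helper name ancestor_paths is made up for illustration and is not part of the application or of any ltree package.

```python
def ancestor_paths(path: str) -> list[str]:
    """Return every prefix of a dot-separated ltree-style path, root first.

    The path itself is the last element, mirroring how the stored value
    contains the whole chain including the node itself.
    """
    labels = path.split(".")
    # Join the first 1, 2, ... n labels to rebuild each ancestor's path.
    return [".".join(labels[:depth]) for depth in range(1, len(labels) + 1)]


print(ancestor_paths("1.2.3.4.5"))
```

For the five-team chain discussed below, this yields the paths 1, 1.2, 1.2.3, 1.2.3.4 and 1.2.3.4.5, which is the root-to-leaf order the export is expected to show.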

The Problem: An export option generates a CSV copy of the team structure, and for each team the ancestors are shown in their own columns next to the team's other information. When some larger teams are exported, the order of the ancestors is scrambled. Some digging shows that the subquery used to build the list annotation, which explodes the path into individual teams, is not respecting the ordering specified on it.

The basic query that is being used is as follows:

from django.db.models import Func, OuterRef, Subquery

ancestors_qs = Team.objects.filter(path__ancestorsof=OuterRef("path")).order_by("path")
teams = Team.objects.all()
teams = teams.annotate(ancestors=Func(Subquery(ancestors_qs.values("name")), function="array"))
teams.get(id=5).ancestors

where the ancestorsof query method is defined as below:

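from django.db import models

# LtreeField is the project's custom field class for the ltree column.
@LtreeField.register_lookup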
class Ancestor(models.Lookup):
    """Lookup to find ancestors of current node using GiST @> operator."""

    lookup_name = "ancestorsof"

    def as_sql(self, compiler, connection):
        """Return the SQL generated for ancestorsof lookup on ltree field."""
        lhs, lhs_params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.process_rhs(compiler, connection)
        params = lhs_params + rhs_params
        return f"{lhs} operator(public.@>) {rhs}", params

This query returns the ancestor teams for that team, but not in the expected order (the team is the last in a chain of five teams, 1 through 5). The ancestors come back as [4, 1, 2, 3, 5] instead of the expected [1, 2, 3, 4, 5]. Printing the query that Django generates for that ORM call gives the following:

SELECT
  "core_team"."id",
  "core_team"."name",
  "core_team"."path",
  array(
    (SELECT
      U0."name"
    FROM
      "core_team" U0
    WHERE
      U0."path" operator(public.@>) "core_team"."path")
  ) AS "ancestors"
FROM
  "core_team"

Building the SQL by hand, I was able to get the proper output order with the following:

SELECT * FROM core_team WHERE path @> '1.2.3.4.5' ORDER BY path;

Looking at the Django output, I believe an ORDER BY is missing from the subquery inside the array function call. I don't know why it is omitted, because order_by is set on the original ancestors_qs subquery.
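
One possible stopgap while the missing ORDER BY is investigated is to reorder the ancestors in Python after the query returns. This is a minimal sketch, assuming each ancestor is available as a (path, name) pair; the function name order_ancestors and the sample team names are hypothetical. Because every ancestor's path is a prefix of the team's own path, sorting by the number of labels gives root-to-leaf order.

```python
def order_ancestors(rows: list[tuple[str, str]]) -> list[str]:
    """Sort (path, name) pairs root first and return only the names."""
    # Depth is the number of dot-separated labels; the ancestors of a single
    # node all sit at distinct depths, so counting dots is enough to sort.
    by_depth = sorted(rows, key=lambda row: row[0].count("."))
    return [name for _path, name in by_depth]


# Rows in the scrambled order reported above (team names are made up).
scrambled = [
    ("1.2.3.4", "Team 4"),
    ("1", "Team 1"),
    ("1.2", "Team 2"),
    ("1.2.3", "Team 3"),
    ("1.2.3.4.5", "Team 5"),
]
print(order_ancestors(scrambled))
```

This requires selecting the path alongside the name, and it moves work out of the database, so it is a workaround rather than a fix for the generated SQL.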

Answer by Muhammad Sarmad:

The ordering problem comes from the way Django handles subqueries. Annotating with a subquery does not guarantee that the subquery's rows reach the outer expression in the order you asked for. Although ancestors_qs calls order_by("path"), the SQL you printed has no ORDER BY inside the array call, so PostgreSQL is free to return the ancestors in any order. That is why the result is not in the order you expected.

You can change the query so that the ancestors are ordered explicitly. First, create the ancestors subquery with an explicit order_by clause. Then assign each ancestor a row number based on the desired order, using Django's Window expression, and order the subquery by that number. Finally, filter for the particular team you want and read the sorted ancestors from the ancestors_ordered field. The aim is for the ancestors to come out in the correct order in the output.

Hope this helps!

Below is example code for this approach. Keep in mind that it is only a sketch of what you want to do. Do let me know if you need any help.

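from django.contrib.postgres.fields import ArrayField
from django.db import models
from django.db.models import F, Func, OuterRef, Subquery, Window
from django.db.models.functions import RowNumber

# Team is the model from the question; import it from your own app.

# Create a subquery to fetch the ancestors of each outer team
ancestors_qs = Team.objects.filter(path__ancestorsof=OuterRef("path"))

# Add row numbering based on the desired order ("path" here, as in the
# question) and sort the subquery by it
ancestors_qs = ancestors_qs.annotate(row_number=Window(
    expression=RowNumber(),
    order_by=F("path").asc(),
)).order_by("row_number")

# Annotate the main query with the ordered ancestors. The subquery returns
# several rows, so it is wrapped in PostgreSQL's array() constructor, as in
# the question, rather than used as a bare Subquery.
teams = Team.objects.all().annotate(
    ancestors_ordered=Func(
        Subquery(ancestors_qs.values("name")),
        function="array",
        output_field=ArrayField(models.CharField()),
    )
)

# Check print(teams.query) to confirm the ORDER BY appears in the generated SQL.

# Filter for the specific team you need
desired_team = teams.get(id=5)

# Retrieve the sorted ancestors from the ancestors_ordered field
sorted_ancestors = desired_team.ancestors_ordered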
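
As an alternative sketch, and assuming Django 4.0 or later on PostgreSQL, ArraySubquery from django.contrib.postgres.expressions builds the array directly from a subquery. To my understanding, those versions keep the subquery's order_by when it is used as an expression, while older releases drop the ordering of unsliced subqueries, which would explain the SQL in the question. Treat that as an assumption and confirm it by printing teams.query. The helper name with_ordered_ancestors is hypothetical, and ancestorsof is the custom lookup from the question.

```python
def with_ordered_ancestors(teams):
    """Annotate a Team queryset with ancestor names ordered by path, root first."""
    # Imported inside the function so the sketch can be defined without a
    # configured Django project; in real code these belong at module level.
    from django.contrib.postgres.expressions import ArraySubquery
    from django.db.models import OuterRef

    ancestors_qs = (
        teams.model.objects
        .filter(path__ancestorsof=OuterRef("path"))  # custom lookup from the question
        .order_by("path")
        .values("name")  # ArraySubquery needs a single-column values() queryset
    )
    return teams.annotate(ancestors_ordered=ArraySubquery(ancestors_qs))


# Usage (requires the Team model from the question):
# team = with_ordered_ancestors(Team.objects.all()).get(id=5)
# print(team.ancestors_ordered)
```

If the ORDER BY still does not show up in the printed SQL, the Django version in use is probably the deciding factor, and sorting in Python remains a fallback.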