Load filtered hierarchy from self-referencing table in EF Core

786 Views Asked by At

I have a self-referencing entity Comment in my project:

public class Comment
{
    public Guid CommentId { get; set; }
    
    public string Content { get; set; }
        
    public Guid? ParentCommentId { get; set; }
    public virtual Comment ParentComment { get; set; }
        
    public virtual ICollection<Comment> Children { get; set; }
}

I am trying to do the following:

await context.Comment
    .Where(c => c.ParentCommentId == null)
    .Include(c => c.Children)
    .ToListAsync();

I want to get all root comments (comments that don't have parents) and load the entire hierarchy of comments.

Want do I want to see in the result?

I have the following comments in the database:

To make it more readable, I represent it in hierarchically order (only Content property):

  • Hello World!
    • Are you a programmer?
      • Sure
      • What?
  • I wanna go to Mars too!
    • See you on the Moon :)

When executing the query, shown above, I want to get something like this (in JSON):

[
   {
      "commentId":"be02742a-9170-4335-afe7-3c7c22684424",
      "content":"Hello World!",
      "parentCommentId":null,
      "children":[
         {
            "commentId":"59656765-d1ed-4648-8696-7d576ab7419f",
            "content":"Are you a programmer?",
            "parentCommentId":"be02742a-9170-4335-afe7-3c7c22684424",
            "children":[
               {
                  "commentId":"0bb77a43-c7bb-482f-9bf8-55c4050974da",
                  "content":"Sure",
                  "parentCommentId":"59656765-d1ed-4648-8696-7d576ab7419f",
                  "children":[
                     
                  ]
               },
               {
                  "commentId":"b8d61cfd-d274-4dae-a2be-72e08cfa9066",
                  "content":"What?",
                  "parentCommentId":"59656765-d1ed-4648-8696-7d576ab7419f",
                  "children":[
                     
                  ]
               }
            ]
         }
      ]
   },
   {
      "commentId":"cfe126b3-4601-4432-8c87-445c1362a225",
      "content":"I wanna go to Mars too!",
      "parentCommentId":null,
      "children":[
         {
            "commentId":"ab6d6b49-d772-48cd-9477-8d40f133c37a",
            "content":"See you on the Moon :)",
            "parentCommentId":"cfe126b3-4601-4432-8c87-445c1362a225",
            "children":[
               
            ]
         }
      ]
   }
]

But what do I get?

When I execute this query I have the following result:

[
   {
      "commentId":"be02742a-9170-4335-afe7-3c7c22684424",
      "content":"Hello World!",
      "postId":"69f3ca3a-66fc-4142-873d-01e950d83adf",
      "post":null,
      "parentCommentId":null,
      "parentComment":null,
      "commentRates":[
         
      ],
      "inverseParentComment":[
         {
            "commentId":"59656765-d1ed-4648-8696-7d576ab7419f",
            "content":"Are you a programmer?",
            "postId":"69f3ca3a-66fc-4142-873d-01e950d83adf",
            "post":null,
            "parentCommentId":"be02742a-9170-4335-afe7-3c7c22684424",
            "commentRates":[
               
            ],
            "inverseParentComment":[
               
            ]
         }
      ]
   },
   {
      "commentId":"cfe126b3-4601-4432-8c87-445c1362a225",
      "content":"I wanna go to Mars too!",
      "postId":"69f3ca3a-66fc-4142-873d-01e950d83adf",
      "post":null,
      "parentCommentId":null,
      "parentComment":null,
      "commentRates":[
         
      ],
      "inverseParentComment":[
         {
            "commentId":"ab6d6b49-d772-48cd-9477-8d40f133c37a",
            "content":"See you on the Moon :)",
            "postId":"69f3ca3a-66fc-4142-873d-01e950d83adf",
            "post":null,
            "parentCommentId":"cfe126b3-4601-4432-8c87-445c1362a225",
            "commentRates":[
               
            ],
            "inverseParentComment":[
               
            ]
         }
      ]
   }
]

So, I get the following hierarchy:

  • Hello World!
    • Are you a programmer?
  • I wanna go to Mars too!
    • See you on the Moon :)

Instead of this:

  • Hello World!
    • Are you a programmer?
      • Sure
      • What?
  • I wanna go to Mars too!
    • See you on the Moon :)

Why does it work so? How can I fix this problem?

Working, but dirty solution

List<Comment> allComments = await context.Comment
    .Include(c => c.Children)
    .ToListAsync();

List<Comment> filteredComments = allComments.Where(c => c.ParentCommentId == null);

This works, but it's ugly - we load the full hierarchy of comments from the database into the memory and than filter them. It'll work really slow if the database contains, e.g., 1 million comments.

Update

In my case there is no need to use recursive CTE. See update for this question. It looks like the idea introduced by Greg Ogle, but more simplified.

1

There are 1 best solutions below

1
On

If you simply load the flat data from the Comment table, it is reasonably fast. To assist in lookups, you can use a Dictionary<{Guid or whatever type makes sense}, {Comment object type}>. The lookup on Dictionary uses a HashTable (I'm pretty sure) and is fast. So, once you have your root Comments, the subsequent lookups will be fast.

Here is code for a Console app to demonstrate. It does take a few seconds for it to run. Perhaps further optimization is possible.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Timers;

namespace DictionaryHierarchyTest
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            var stopwatch = new Stopwatch();
            stopwatch.Start();

            var commentsFlat = new List<CommentFlat>();
            var parentLookup = new Dictionary<Guid, List<CommentFlat>>();

            for (var i = 0; i < 100000; i++)
            {
                var id = Guid.NewGuid();
                commentsFlat.Add(new CommentFlat
                {
                    Id = id,
                    ParentId = Guid.Empty,
                    Text = $"Some text for ID:{id}"
                });

                for (var j = 0; j < 5; j++)
                {
                    var childId = Guid.NewGuid();
                    commentsFlat.Add(new CommentFlat
                    {
                        Id = childId,
                        ParentId = id,
                        Text = $"Some text for ID:{childId} Parent ID:{id}"
                    });

                    
                    for (var k = 0; k < 5; k++)
                    {
                        var grandChildId = Guid.NewGuid();
                        commentsFlat.Add(new CommentFlat
                        {
                            Id = grandChildId,
                            ParentId = childId,
                            Text = $"Some text for ID:{grandChildId} Parent ID:{childId}"
                        });
                    }

                }
            }

            foreach (var cf in commentsFlat)
            {
                if (!parentLookup.ContainsKey(cf.ParentId))
                {
                    parentLookup.Add(cf.ParentId, new List<CommentFlat>());
                }

                parentLookup[cf.ParentId].Add(cf);
            }
            stopwatch.Stop();
            var ts = stopwatch.Elapsed;
            Console.WriteLine($"There are {commentsFlat.Count} comments");
            Console.WriteLine($"{String.Format("{0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds / 10)} to randomly populated.");
            stopwatch.Reset();
            stopwatch.Start();

            var commentHierarchy = new List<Comment>();

            foreach (var cf in commentsFlat)
            {
                if (cf.ParentId == Guid.Empty)
                {
                    var comment = new Comment
                    {
                        Id = cf.Id,
                        ParentId = cf.ParentId,
                        Children = BuildHierarchy(cf.Id, parentLookup),
                        Text = cf.Text
                    };
                    commentHierarchy.Add(comment);
                }
            }

            stopwatch.Stop();
            ts = stopwatch.Elapsed;
            Console.WriteLine($"{String.Format("{0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds / 10)} to make hierarchical");
            //foreach (var c in commentHierarchy) {
            //    PrintHiearchy(c, 1);
            //}
        }
        static string Tabs(int n)
        {
            return new String('\t', n);
        }
        private static void PrintHiearchy(Comment comment, int level) {
            Console.WriteLine($"{Tabs(level)}{comment.Text}");
            foreach(var child in comment.Children){ PrintHiearchy(child, level + 1); }
        }

        private static void OnTimedEvent(Object source, ElapsedEventArgs e)
        {
            Console.WriteLine("The Elapsed event was raised at {0:HH:mm:ss.fff}",
                              e.SignalTime);
        }

        private static List<Comment> BuildHierarchy(Guid parentId, Dictionary<Guid, List<CommentFlat>> flatComments)
        {
            if (flatComments.ContainsKey(parentId))
            {
                return flatComments[parentId].Select(c => new Comment
                {
                    Id = c.Id,
                    ParentId = c.ParentId,
                    Text = c.Text,
                    Children = BuildHierarchy(c.Id, flatComments)
                }).ToList();
            }
            else
            {
                return new List<Comment>();
            }
        }

    }

    public class Comment
    {
        public Guid Id { get; set; }
        public Guid ParentId { get; set; }
        public string Text { get; set; }
        public List<Comment> Children { get; set; }
    }

    public class CommentFlat
    {
        public Guid Id { get; set; }
        public Guid ParentId { get; set; }
        public string Text { get; set; }
    }
}