We have two types of documents: books and book sections. We use TDE to define views for the two types.
Schema (relevant part):
- view books: id, title
- view booksections: id, bookid
The use case is to list the books with more than 5000 sections. For each book, title and number of sections should be returned. Using the Optic API, the query with group-by is like this:
op:from-view("myschema", "books") => op:join-inner(op:from-view("myschema", "booksections"), op:on( op:view-col("books", "id"), op:view-col("booksections", "bookid"))) => op:group-by( (op:view-col("books", "title")), (op:count("count", op:view-col("booksections", "id")))) => op:where(op:ge(op:col("count"), 5000)) => op:select((op:view-col("books", "title"), "count")) => op:order-by(op:desc("count")) => op:result()
The query returns a small result set: 4 books. Now the interesting thing is that this query needs 5 seconds to complete, and if I remove the op:order-by statement, only 3 seconds. Somehow 2 seconds are spent to order the 4 books in the result.
Is there anything I can do to speed up the ordering (except for doing the ordering as a post-processing step)?
The times were measured with warm triple caches. op:explain shows the order-by operation as the outer-most operation, indicating that the ordering is applied to the small set of 4 books. Using SQL has resulted in the same run times and the same acceleration without order-by.
You will probably be better off contacting MarkLogic Support with your test case. I'd say that Ramesh is right, and that the query optimizer is picking a sub-optimal query plan for your query.