I am running a query in the Stack Exchange Data Explorer (SEDE).
This is my query:
```sql
SELECT A.Id
     , A.PostTypeId
     , A.Title
     , A.Body
     , A.ParentId
     , A.Tags
     , A.CreationDate
FROM posts A
LEFT JOIN users U
    ON A.OwnerUserId = U.id
WHERE U.Id = ##UserId##
  AND A.PostTypeId = 1
UNION
SELECT A.Id
     , A.PostTypeId
     , A.Title
     , A.Body
     , A.ParentId
     , B.Tags
     , A.CreationDate
FROM posts A
LEFT JOIN users U
    ON A.OwnerUserId = U.id
RIGHT JOIN posts B
    ON A.ParentId = B.Id
WHERE U.Id = ##UserId##
  AND A.PostTypeId = 2
```
In the code above, note that posts on Stack Overflow come in two types: questions and answers. Questions (`PostTypeId` is 1 in the database schema) have tags, but answers (`PostTypeId` is 2) do not. Answers belong to questions through `ParentId`.
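To make that relationship concrete, here is a minimal sketch on an invented four-row table. It uses SQLite through Python's standard `sqlite3` module instead of SEDE's SQL Server, keeps only the `Posts` columns the question mentions, and the tag strings are made up. An answer's tags are read from its parent question with a self-join on `ParentId`.

```python
import sqlite3

# Toy stand-in for the SEDE Posts table (invented rows, only the columns used here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (
    Id          INTEGER PRIMARY KEY,
    PostTypeId  INTEGER NOT NULL,   -- 1 = question, 2 = answer
    ParentId    INTEGER,            -- set only on answers
    OwnerUserId INTEGER,
    Tags        TEXT                -- set only on questions
);
INSERT INTO Posts VALUES
    (1, 1, NULL, 10, '<sql><sede>'),
    (2, 2, 1,    20, NULL),
    (3, 1, NULL, 20, '<python>'),
    (4, 2, 3,    10, NULL);
""")

# Answers carry no Tags of their own, so look them up on the parent question.
rows = conn.execute("""
    SELECT a.Id, q.Tags
    FROM Posts a
    JOIN Posts q ON q.Id = a.ParentId
    WHERE a.PostTypeId = 2
    ORDER BY a.Id
""").fetchall()
print(rows)  # [(2, '<sql><sede>'), (4, '<python>')]
```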
But my query above is too slow: I can only get the tags for some posts (filtering by a single user id).
How can I get the tags for all users' posts within the SEDE timeout?
Several things:

1. If you are not using anything from the `Users` table but the ID, then don't include that table. It chews up cycles, and `Posts.OwnerUserId` is the same thing.
2. Avoid `UNION` statements if possible (it is in this case).
3. When you do use `UNION` statements, use `UNION ALL` if possible (it is in this case). This saves the engine from having to do duplicate checks.

So, here is the execution plan for the original query:
Here is a streamlined plan:
And the query that corresponds to it:
This query also gives more readable results, especially when the `WHERE` clause is removed.

But if you can limit by, say, user beforehand, you get an even more efficient query:
(This query adds a convenient hyperlink to the user id.)
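A minimal sketch of a query in the spirit of the three points above, not necessarily the answer's exact SQL: one pass over `Posts`, no `Users` join, no `UNION`, with questions keeping their own tags and answers taking their parent's tags. It runs on SQLite via Python's `sqlite3` with invented rows; the helper name `post_tags` is hypothetical, and on SEDE the per-user variant would use `##UserId##` where the `?` placeholder is.

```python
import sqlite3
from typing import List, Optional, Tuple

# Invented rows standing in for SEDE's Posts table (only the columns used here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (
    Id          INTEGER PRIMARY KEY,
    PostTypeId  INTEGER NOT NULL,
    ParentId    INTEGER,
    OwnerUserId INTEGER,
    Tags        TEXT
);
INSERT INTO Posts VALUES
    (1, 1, NULL, 10, '<sql><sede>'),
    (2, 2, 1,    20, NULL),
    (3, 1, NULL, 20, '<python>'),
    (4, 2, 3,    10, NULL),
    (5, 5, NULL, 10, NULL);  -- neither a question nor an answer; filtered out
""")

# One pass over Posts: no Users join and no UNION. Questions keep their own
# Tags; answers take the Tags of the parent question found through ParentId.
BASE = """
SELECT p.Id,
       p.PostTypeId,
       p.ParentId,
       CASE WHEN p.PostTypeId = 1 THEN p.Tags ELSE q.Tags END AS Tags
FROM Posts p
LEFT JOIN Posts q ON q.Id = p.ParentId
WHERE p.PostTypeId IN (1, 2)
"""

def post_tags(user_id: Optional[int] = None) -> List[Tuple]:
    """Hypothetical helper: (Id, PostTypeId, ParentId, Tags) rows, optionally for one user."""
    sql, params = BASE, ()
    if user_id is not None:
        # Filter on Posts.OwnerUserId directly instead of joining Users.
        sql += "  AND p.OwnerUserId = ?\n"
        params = (user_id,)
    sql += "ORDER BY p.Id"
    return conn.execute(sql, params).fetchall()

print(post_tags())    # every user's questions and answers
print(post_tags(10))  # only user 10's posts
```

One behavioral difference to be aware of: with `LEFT JOIN`, an answer whose parent question is missing still comes back with `NULL` tags, whereas the question's `RIGHT JOIN` version drops it.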
Note that just the top 10 users have more than 50K posts.
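The third point in the list above can be checked on a toy table. This is a minimal sketch with invented rows, again using SQLite via Python's `sqlite3` rather than SEDE's SQL Server, and it does not measure speed. It only shows that when the branches cannot overlap (one filters `PostTypeId = 1`, the other `PostTypeId = 2`), `UNION` and `UNION ALL` return the same rows, so `UNION`'s duplicate elimination is wasted work.

```python
import sqlite3

# Invented rows: (Id, PostTypeId, OwnerUserId).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (Id INTEGER PRIMARY KEY, PostTypeId INTEGER, OwnerUserId INTEGER);
INSERT INTO Posts VALUES (1, 1, 10), (2, 2, 10), (3, 1, 20), (4, 2, 10);
""")

questions = "SELECT Id, PostTypeId FROM Posts WHERE OwnerUserId = 10 AND PostTypeId = 1"
answers = "SELECT Id, PostTypeId FROM Posts WHERE OwnerUserId = 10 AND PostTypeId = 2"

# The branches filter on different PostTypeId values, so no row can appear in
# both; UNION's duplicate check therefore removes nothing here.
union_rows = sorted(conn.execute(f"{questions} UNION {answers}").fetchall())
union_all_rows = sorted(conn.execute(f"{questions} UNION ALL {answers}").fetchall())
print(union_rows == union_all_rows)  # True

# When branches can overlap, the two operators do differ.
overlap = "SELECT PostTypeId FROM Posts"
print(len(conn.execute(f"{overlap} UNION {overlap}").fetchall()))      # 2 distinct values
print(len(conn.execute(f"{overlap} UNION ALL {overlap}").fetchall()))  # 8 rows
```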