Accessing Separate Upvote and Downvote Counts for Questions and Answers Using Stack Exchange Data Explorer (SEDE)


I'm conducting research on user engagement and satisfaction within specific tags on Stack Overflow. I'm utilizing the Stack Exchange Data Explorer (SEDE) to extract data, particularly the voting patterns of users on questions and answers.

While I can access the net vote score of questions and answers using SEDE, I'm interested in obtaining more detailed information, specifically the separate counts of upvotes and downvotes for both questions and answers. This level of granularity would provide deeper insights into user interactions and sentiment.

Is there a way to retrieve separate upvote and downvote counts for questions and answers using SEDE queries? If so, could you please provide guidance or point me in the right direction on how to modify my queries to access this information?

Any assistance or insights on this matter would be greatly appreciated. Thank you!

I've explored SEDE to analyze user engagement in specific tags on Stack Overflow. I've been unable to find a way to access separate upvote and downvote counts for questions and answers using existing queries. I'm seeking guidance on how to modify queries or access this data to gain deeper insights into user interactions. Thank you!

Here is my SQL code:

SELECT
    Q.*,
    A.*,
    U.*  -- User information for all answers
FROM
    (SELECT *
    FROM Posts
    WHERE Tags LIKE '%google-maps-api-3%'
        OR Tags LIKE '%google-maps-api-3%'
        AND PostTypeId = 1
        AND AcceptedAnswerId IS NOT NULL) AS Q
JOIN
    Posts AS A ON Q.Id = A.ParentId
JOIN
    Users AS U ON A.OwnerUserId = U.Id;

There is 1 best solution below

Charlieface

Yes, you can use the Votes table. You need to pre-aggregate it before joining; otherwise each post row is repeated once per vote. An APPLY is the easiest way to do this.

  • Since you want votes for both questions and answers, it makes sense to do this join in a CTE and refer to it twice.
  • Your subquery is unnecessary.
  • Your AND/OR logic is almost certainly wrong because AND binds more tightly than OR, so it needs parentheses.
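The precedence point in the last bullet can be checked on a few made-up rows. This is only a sketch: SamplePosts and its values are hypothetical stand-ins for the Posts table, and the two CASE expressions mirror the filter with and without parentheses.

```sql
-- Hypothetical sample rows standing in for Posts (values are invented).
WITH SamplePosts AS (
    SELECT *
    FROM (VALUES
        (1, 1, 'google-maps-api-3', 10),                 -- tagged, has accepted answer
        (2, 1, 'google-maps-api-3', CAST(NULL AS int)),  -- tagged, no accepted answer
        (3, 1, 'javascript',        11)                  -- not tagged
    ) AS t (Id, PostTypeId, Tags, AcceptedAnswerId)
)
SELECT
    Id,
    -- No parentheses: parsed as A OR (B AND C AND D), so the first
    -- LIKE alone is enough to match and the other filters are skipped.
    CASE WHEN Tags LIKE '%google-maps-api-3%'
           OR Tags LIKE '%google-maps-api-3%'
          AND PostTypeId = 1
          AND AcceptedAnswerId IS NOT NULL
         THEN 1 ELSE 0 END AS MatchesWithoutParens,
    -- Parentheses around the OR: every row must also pass the AND filters.
    CASE WHEN (Tags LIKE '%google-maps-api-3%'
               OR Tags LIKE '%google-maps-api-3%')
          AND PostTypeId = 1
          AND AcceptedAnswerId IS NOT NULL
         THEN 1 ELSE 0 END AS MatchesWithParens
FROM SamplePosts
ORDER BY Id;
```

On these rows, post 2 matches the unparenthesized filter even though it has no accepted answer, while the parenthesized filter excludes it. The answer's full query continues below.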
WITH PostsWithVotes AS (
    SELECT
      p.*,
      v.*
    FROM Posts AS p
    CROSS APPLY (
        SELECT
          -- VoteTypeId 2 is an upvote, 3 is a downvote
          COUNT(CASE WHEN v.VoteTypeId = 2 THEN 1 END) AS UpvoteCount,
          COUNT(CASE WHEN v.VoteTypeId = 3 THEN 1 END) AS DownvoteCount
        FROM Votes v
        WHERE v.PostId = p.Id
    ) v
)
SELECT
    Q.*,
    A.*,
    U.*  -- User information for all answers
FROM PostsWithVotes AS Q
JOIN
    PostsWithVotes AS A ON Q.Id = A.ParentId
JOIN
    Users AS U ON A.OwnerUserId = U.Id
WHERE (Q.Tags LIKE '%google-maps-api-3%'
       OR Q.Tags LIKE '%google-maps-api-3%')
  AND Q.PostTypeId = 1
  AND Q.AcceptedAnswerId IS NOT NULL;
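To illustrate why the votes are aggregated per post before the question/answer join, here is a small sketch on invented data. SamplePosts and SampleVotes are hypothetical tables, not SEDE tables; the query compares counting after a direct join with counting inside an APPLY.

```sql
-- Hypothetical data: one question (Id 1) with two answers (Ids 2 and 3).
WITH SamplePosts AS (
    SELECT *
    FROM (VALUES
        (1, CAST(NULL AS int)),  -- question
        (2, 1),                  -- answer to question 1
        (3, 1)                   -- answer to question 1
    ) AS t (Id, ParentId)
),
SampleVotes AS (
    SELECT *
    FROM (VALUES
        (1, 2), (1, 2), (1, 3),  -- question 1: two upvotes, one downvote
        (2, 2),                  -- answer 2: one upvote
        (3, 3)                   -- answer 3: one downvote
    ) AS t (PostId, VoteTypeId)
)
SELECT
    q.Id AS QuestionId,
    -- Counting after the joins: each question vote appears once per answer.
    COUNT(CASE WHEN v.VoteTypeId = 2 THEN 1 END) AS NaiveUpvotes,
    -- Counting inside the APPLY: one row per post, unaffected by other joins.
    MAX(pre.UpvoteCount) AS PreAggregatedUpvotes
FROM SamplePosts AS q
JOIN SamplePosts AS a ON a.ParentId = q.Id
JOIN SampleVotes AS v ON v.PostId = q.Id
CROSS APPLY (
    SELECT COUNT(CASE WHEN v2.VoteTypeId = 2 THEN 1 END) AS UpvoteCount
    FROM SampleVotes AS v2
    WHERE v2.PostId = q.Id
) AS pre
GROUP BY q.Id;
```

With two answers, the direct join reports 4 upvotes for the question, while the pre-aggregated count stays at the true value of 2.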

The schema docs for SEDE are here.
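If you want to confirm which VoteTypeId values correspond to upvotes and downvotes, the lookup table can be listed directly. This sketch assumes the VoteTypes table with Id and Name columns as described in the SEDE schema.

```sql
-- List every vote type so the IDs used in the CASE expressions can be checked.
SELECT Id, Name
FROM VoteTypes
ORDER BY Id;
```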