sample query for review for improvement on big query

17 Views Asked by At

review the code and explain what is required to fix it. new to sql and this is an review assigment. So trying to understand what is the logic being done this is to be run on a big query platform this is using CTE this is sample code with sample tables

I'm trying to figure out what is the best way to take a random sample of 100 records for each group in a table in Big Query.

For example, I have a table where column A is a unique recordID, and column B is the groupID to which the record belongs. For every distinct groupID, I would like to take a random sample of 100 recordIDs. Is there a simple way to complete this?

1

There are 1 best solutions below

0
DN Oomsoo On

Try the following:

Essentially I assigned an index number to every columnID under each recordID and ordered it by random. Then I used qualify to filter the number of rows I wanted. If qualify isn't supported, I would use a where index<=100 filter instead. This should give you random results each time.

SELECT *,
  ROW_NUMBER() OVER(PARTITION BY recordID ORDER BY RAND()) AS INDEX
FROM
  `optimal-weft-418017.78247507.78247507_01` QUALIFY INDEX<=3
ORDER BY
  recordID