How to make a random sample in OpenRefine?


Very often we need to extract random samples from a large dataset. What is the best way to do this in OpenRefine? This might be useful for practitioners used to doing it in R and Python.

Thanks in advance for any advice!


Best answer

OpenRefine has no built-in function for this, but you can use Python/Jython to create a new column of random integers. For example, if you have 100,000 rows:

import random
# One random integer per row; sorting on this column shuffles the rows
return random.randint(0, 100000)
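
With 100,000 rows drawing from roughly 100,000 possible integers, many rows will end up sharing the same value. If such ties are a concern, a minimal variant is to draw a float instead. This is only a sketch: the function name `random_key` is hypothetical and is there so the snippet runs as plain Python. In OpenRefine's Jython editor, only the `import` line and the `return` line would be entered.

```python
import random

def random_key():
    # Float in [0.0, 1.0); two rows getting the same key is very unlikely
    return random.random()

# Plain-Python check of the idea: one key per (hypothetical) row
keys = [random_key() for _ in range(5)]
```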

Then you can sort this column, reorder the rows permanently, and select, for example, the first thousand rows with a custom text facet:

row.index < 1000
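
For readers coming from Python, the steps above (add a random column, sort on it, keep the first rows) can be sketched outside OpenRefine as follows. This is an illustration under assumptions, not OpenRefine code: the function `sample_rows`, its `seed` parameter, and the example row labels are all hypothetical.

```python
import random

def sample_rows(rows, k, seed=None):
    """Return k rows chosen at random by sorting on a random key."""
    rng = random.Random(seed)
    # Step 1: attach a random key to every row (the new column)
    keyed = [(rng.random(), i, row) for i, row in enumerate(rows)]
    # Step 2: sort on the key (the permanent reorder); the index i breaks ties
    keyed.sort(key=lambda t: (t[0], t[1]))
    # Step 3: keep the first k rows (the equivalent of row.index < k)
    return [row for _, _, row in keyed[:k]]

# Hypothetical dataset of 100,000 rows, sampled down to 1,000
rows = ["row %d" % i for i in range(100000)]
sample = sample_rows(rows, 1000, seed=42)
print(len(sample))
```

Because every row gets its own key before sorting, no row can appear twice in the result, so this is sampling without replacement.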

EDIT: I forgot that this extension from @OwenStephens adds a randomNumber GREL function. Feel free to install it.
