Very often we need to extract random samples from a large dataset. What is the best way to do it in OpenRefine? This might be useful for practitioners used to doing it in R and Python.
Thanks in advance for any advice!
OpenRefine has no built-in function for that, but you can use Python/Jython to create a new column of random integers, with a range sized to your dataset (for example, 100,000 rows).
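Here is a minimal sketch of what such an expression could look like, assuming the new column is created with the expression language set to Python/Jython. The function name `random_key` and the `n_rows` parameter are only illustrative and not part of OpenRefine. In the expression editor you would normally paste just the body, i.e. the import followed by `return random.randint(0, 100000)`.

```python
import random

def random_key(n_rows=100000):
    # One random integer per row, between 0 and n_rows inclusive.
    # Duplicate values can occur, which is fine because the column
    # is only used as a sort key to shuffle the rows.
    return random.randint(0, n_rows)
```

Using a range at least as large as the row count keeps ties rare, so the sort order is close to a true shuffle.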
Then you can sort this column, reorder the rows permanently, and select, for example, the first thousand rows with a custom text facet.
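For readers coming from Python, the sort-and-take-the-first-thousand procedure can be sketched roughly as below. This is a plain Python illustration of the same idea, not OpenRefine code; `sample_rows`, `k` and `seed` are hypothetical names. Inside OpenRefine itself, the facet expression would presumably be something like `row.index < 1000` in GREL, which is an assumption on my part since `row.index` is the zero-based row position after reordering.

```python
import random

def sample_rows(rows, k, seed=None):
    # Seeded generator so a sample can be reproduced if needed.
    rng = random.Random(seed)
    # Step 1: attach a random integer key to every row (the "new column").
    keyed = [(rng.randint(0, len(rows)), row) for row in rows]
    # Step 2: sort on the random key only (the "sort + reorder rows" step).
    keyed.sort(key=lambda pair: pair[0])
    # Step 3: keep rows whose position is below k (the "facet" step).
    return [row for index, (_, row) in enumerate(keyed) if index < k]
```

Calling `sample_rows(list_of_rows, 1000)` returns at most 1,000 rows without repetition, since every row appears exactly once in the sorted list.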
EDIT: I forgot that this extension from @OwenStephens adds a randomNumber GREL function. Feel free to install it.