Suppose I have the following DataFrame, and I want to shuffle the rows and columns of the DataFrame with a specific seed value. I tried the following to obtain shuffled indexes, but it gave me a different result every time:
julia> using Random, DataFrames, StatsBase
julia> Random.seed!(123)
julia> df = DataFrame(
           col1 = [1, 2, 3],
           col2 = [4, 5, 6]
       );
julia> idx_row, idx_col = sample.(
           [1:size(df, 1), 1:size(df, 2)],
           [length(1:size(df, 1)), length(1:size(df, 2))],
           replace=false
       )
2-element Vector{Vector{Int64}}:
 [1, 2, 3]
 [2, 1]
julia> idx_row, idx_col = sample.(
           [1:size(df, 1), 1:size(df, 2)],
           [length(1:size(df, 1)), length(1:size(df, 2))],
           replace=false
       )
2-element Vector{Vector{Int64}}:
 [2, 1, 3]
 [2, 1]
As you can see, it shuffles the values, but it doesn't seem to respect the seed. How can I shuffle the rows and columns of a DataFrame in a reproducible way, for example by setting a specific seed?
Fortunately, you already imported a helpful package named Random. However, you overlooked its shuffle function, which is all you need here. With a fixed seed, the result is reproducible and won't change between runs, despite being a random process.
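A minimal sketch of one way to do this with shuffle, not necessarily the exact code originally posted. It assumes an explicitly seeded MersenneTwister is acceptable as the generator; the helper name shuffle_rows_cols is hypothetical and used only for illustration.

```julia
using Random, DataFrames

df = DataFrame(
    col1 = [1, 2, 3],
    col2 = [4, 5, 6]
)

# Hypothetical helper: permute rows and columns with a generator
# created from `seed`, so the same seed gives the same permutation.
function shuffle_rows_cols(df::AbstractDataFrame, seed::Integer)
    rng = MersenneTwister(seed)          # fresh generator on every call
    idx_row = shuffle(rng, 1:nrow(df))   # random order of row indices
    idx_col = shuffle(rng, 1:ncol(df))   # random order of column indices
    return df[idx_row, idx_col]          # new DataFrame in shuffled order
end

shuffled = shuffle_rows_cols(df, 123)
```

Because the generator is rebuilt from the seed inside the function, repeated calls with the same seed return the same DataFrame. The two sample calls in the question most likely differ because Random.seed!(123) was called only once, and every draw advances the state of the global generator; calling Random.seed!(123) again before the second draw should reproduce the first result.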
Additional point
Note that there is a dedicated method of the shuffle function for shuffling the rows of a given DataFrame. Keep in mind that this only shuffles the rows, not the columns.
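As a rough illustration of that row-only method, assuming a DataFrames.jl version recent enough to define shuffle for data frames and, again, an explicitly seeded MersenneTwister as the generator:

```julia
using Random, DataFrames

df = DataFrame(
    col1 = [1, 2, 3],
    col2 = [4, 5, 6]
)

# Passing the generator as the first argument makes the row order reproducible.
# The column order is left untouched.
rows_shuffled = shuffle(MersenneTwister(123), df)
```

Each row stays intact, so the values in col1 and col2 still line up as they did in the original DataFrame.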