Bootstrap of DataFrame for Re-estimation


I have a dataset and want to create 10 bootstrap replicates of the data frame (resampling with replacement), with new ids assigned to the resampled subjects in the final dataset. How can I do this?

using DataFrames, Random

Random.seed!(123)
df = DataFrame(
    id = ["1","1","1","2","2","2","3","3","3"],
    time = [0,0.5,2,0,0.5,2,0,0.5,2],
    cmt = [1,0,0,1,0,0,1,0,0],
    value = [0.01,0.02,0.03,0.02,0.03,0.05,0.01,0.05,0.10]
)

Thanks a lot.


Answer 1

Thanks, I finally found what I was looking for. Let me know if there is a better way to do it.

using DataFrames, StatsBase

df = DataFrame(
    id = ["1","1","1","2","2","2","3","3","3"],
    time = [0,0.5,2,0,0.5,2,0,0.5,2],
    evid = [1,0,0,1,0,0,1,0,0],
    dv = [missing,0.02,0.03,0.02,0.03,0.05,0.01,0.05,0.10]
)

# Draw 3 bootstrap samples of the unique ids (StatsBase's sample draws with
# replacement by default) and concatenate them into one vector of 9 ids.
# Use 1:10 instead of 1:3 to get 10 replicates.
sampled_ids = reduce(vcat, [sample(unique(df.id), 3) for _ in 1:3])

# Copy all rows of each sampled id and give every draw its own new_id
new_df = DataFrame(id = String[], time = Float64[], evid = Int64[], new_id = Int[], dv = Union{Missing, Float64}[])

for (i, unique_id) in enumerate(sampled_ids)
    id_rows = df[df.id .== unique_id, :]   # a copy, so adding a column is safe
    id_rows[!, :new_id] .= i               # creates the new_id column
    append!(new_df, id_rows)               # columns are matched by name
end


# Print the new DataFrame
println(new_df)
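Since a better way was asked for: a minimal sketch of the same subject-level (cluster) bootstrap without a preallocated result table, assuming the goal is 10 replicates that each draw as many ids as there are subjects. The helper `bootstrap_by_id` and the `replicate` column are hypothetical names introduced here. `groupby`, indexing a grouped data frame by a key tuple, and `reduce(vcat, ...)` are standard DataFrames.jl, and `rand(ids, n)` samples with replacement, so StatsBase is not needed.

```julia
using DataFrames, Random

Random.seed!(123)

df = DataFrame(
    id = ["1","1","1","2","2","2","3","3","3"],
    time = [0,0.5,2,0,0.5,2,0,0.5,2],
    evid = [1,0,0,1,0,0,1,0,0],
    dv = [missing,0.02,0.03,0.02,0.03,0.05,0.01,0.05,0.10]
)

# Hypothetical helper: subject-level bootstrap. Each replicate draws as many
# ids as there are subjects (with replacement) and keeps all rows of each
# drawn id together. new_id numbers every draw across all replicates.
function bootstrap_by_id(df::AbstractDataFrame, nrep::Integer; idcol::Symbol = :id)
    ids = unique(df[!, idcol])
    gdf = groupby(df, idcol)              # group once for fast lookup by id
    pieces = DataFrame[]
    new_id = 0
    for rep in 1:nrep
        for id in rand(ids, length(ids))  # sampling with replacement
            new_id += 1
            rows = DataFrame(gdf[(id,)])  # copy of this subject's rows
            rows[!, :replicate] .= rep
            rows[!, :new_id] .= new_id
            push!(pieces, rows)
        end
    end
    return reduce(vcat, pieces)
end

boot = bootstrap_by_id(df, 10)
```

Every row keeps its original `id`, so `new_id` distinguishes repeated draws of the same subject, while `replicate` records which bootstrap sample a row belongs to.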
Answer 2

I am not fully sure that this is what you want:

julia> reduce(vcat, [df[rand(1:nrow(df), nrow(df)), :] for i in 1:10], source="replicate")
90×5 DataFrame
 Row │ id      time     cmt    value    replicate
     │ String  Float64  Int64  Float64  Int64
─────┼────────────────────────────────────────────
   1 │ 2           0.5      0     0.03          1
   2 │ 3           2.0      0     0.1           1
   3 │ 2           0.5      0     0.03          1
   4 │ 3           0.0      1     0.01          1
   5 │ 1           0.0      1     0.01          1
   6 │ 2           0.5      0     0.03          1
   7 │ 1           2.0      0     0.03          1
   8 │ 3           2.0      0     0.1           1
   9 │ 3           0.0      1     0.01          1
  10 │ 2           0.5      0     0.03          2
  11 │ 3           0.5      0     0.05          2
  12 │ 3           0.5      0     0.05          2
  13 │ 2           0.0      1     0.02          2
  14 │ 2           0.5      0     0.03          2
  ⋮  │   ⋮        ⋮       ⋮       ⋮         ⋮
  78 │ 2           0.5      0     0.03          9
  79 │ 3           2.0      0     0.1           9
  80 │ 1           0.0      1     0.01          9
  81 │ 1           0.5      0     0.02          9
  82 │ 1           0.5      0     0.02         10
  83 │ 1           0.0      1     0.01         10
  84 │ 2           0.0      1     0.02         10
  85 │ 2           0.5      0     0.03         10
  86 │ 1           0.5      0     0.02         10
  87 │ 1           2.0      0     0.03         10
  88 │ 1           0.0      1     0.01         10
  89 │ 1           0.5      0     0.02         10
  90 │ 3           0.5      0     0.05         10
                                   63 rows omitted

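It creates a data frame in which each value of the replicate column identifies a separate bootstrap sample of the original df data frame. Note that this resamples individual rows, so the rows of one id are not kept together and no new ids are assigned.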
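The title mentions re-estimation; a minimal sketch of that step on the row-level bootstrap above, assuming the quantity to re-estimate is a single summary per replicate. The mean of `value` is only a hypothetical stand-in for the real model fit, and `estimates`/`se` are illustrative names. `combine` and `groupby` are DataFrames.jl; `mean` and `std` come from the Statistics standard library.

```julia
using DataFrames, Random, Statistics

Random.seed!(123)

df = DataFrame(
    id = ["1","1","1","2","2","2","3","3","3"],
    time = [0,0.5,2,0,0.5,2,0,0.5,2],
    cmt = [1,0,0,1,0,0,1,0,0],
    value = [0.01,0.02,0.03,0.02,0.03,0.05,0.01,0.05,0.10]
)

# Row-level bootstrap as in the answer: 10 replicates of nrow(df) rows each,
# drawn with replacement; `source` adds the replicate number as a column.
boot = reduce(vcat, [df[rand(1:nrow(df), nrow(df)), :] for _ in 1:10],
              source = "replicate")

# Re-estimate once per replicate. The mean of `value` is a placeholder
# for whatever model is actually being re-fitted.
estimates = combine(groupby(boot, :replicate), :value => mean => :mean_value)

# Spread of the replicate estimates (bootstrap standard error)
se = std(estimates.mean_value)
```

The same `combine(groupby(...))` pattern works on a subject-level bootstrap by grouping on its replicate column instead.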