Given this:
using DataFrames
dict = Dict(("y" => ":x / 2"))
df = DataFrame(x = [1, 2, 3, 4])
df
4×1 DataFrame
│ Row │ x     │
│     │ Int64 │
├─────┼───────┤
│ 1   │ 1     │
│ 2   │ 2     │
│ 3   │ 3     │
│ 4   │ 4     │
I want to make this:
4×2 DataFrame
│ Row │ x     │ y       │
│     │ Int64 │ Float64 │
├─────┼───────┼─────────┤
│ 1   │ 1     │ 0.5     │
│ 2   │ 2     │ 1.0     │
│ 3   │ 3     │ 1.5     │
│ 4   │ 4     │ 2.0     │
This seems like a perfect application for DataFramesMeta, either @with or @eachrow, but I haven't been able to get my expression to evaluate as expected in an environment where :x exists.
Basically, I want to be able to iterate over (k, v) pairs in dict and create one new column for each Symbol(k) with corresponding values eval(Meta.parse(v)), or something along those lines, where the evaluation occurs such that Symbols like :x exist at the time of evaluation.
I didn't expect this to work, and it doesn't:
[df[Symbol(k)] = eval(Meta.parse(v)) for (k, v) in dict]
ERROR: MethodError: no method matching /(::Symbol, ::Int64)
But this illustrates the problem: I need the expressions to be evaluated in an environment where the symbols they contain exist.
However, moving it inside a @with doesn't work:
using DataFramesMeta
@with(df, [eval(Meta.parse(v)) for (k, v) in dict])
ERROR: MethodError: no method matching /(::Symbol, ::Int64)
Using @eachrow fails the same way:
using DataFramesMeta
@eachrow df begin
    for (k, v) in dict
        @newcol tmp::Vector{Float32}
        tmp = eval(Meta.parse(v))
    end
end
ERROR: MethodError: no method matching /(::Symbol, ::Int64)
I'm guessing I'm unclear on some key element of how DataFramesMeta creates an environment within a DataFrame. I also don't necessarily have to use DataFramesMeta for this; any reasonably concise option will work, since I can encapsulate it in a package function.
Note: I control the format of the strings to be parsed into expressions, but I want to avoid complexity such as specifying the name of the DataFrame object in the string, or broadcasting every operation. I want the expression syntax in the initial string to be reasonably clear to non-Julia programmers.
UPDATE: I tried all three solutions in the comments on this question, and they have a problem: they don't work inside functions.
dict = Dict(("y" => ":x / 2"))
data = DataFrame(x = [1, 2, 3, 4])
function transform_from_dict(df, dict)
    new = eval(Meta.parse("@transform(df, " * join(join.(collect(dict), " = "), ", ") * ")"))
    return new
end
transform_from_dict(data, dict)
ERROR: UndefVarError: df not defined
Or:
function transform_from_dict!(df, dict)
    [df[!, Symbol(k)] = eval(:(@with(df, $(Meta.parse(v))))) for (k, v) in dict]
    return nothing
end
transform_from_dict!(data, dict)
ERROR: UndefVarError: df not defined
I worked on this answer in parallel with @Ajar; nothing is copied from that answer, and I did not know about it while writing. I was totally new to Julia, so I had to install it (I thought the online compilers did not even know about DataFrame). Later I understood that these packages must be loaded at the start anyway, whether online or offline. I have added the package information that beginners might need to know.
The @with solution:
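A minimal sketch of one way to make @with work inside a function, assuming DataFrames and DataFramesMeta are installed. eval always runs at module (global) scope, so a bare df inside the evaluated expression is looked up as a global and fails with UndefVarError. Interpolating the DataFrame value itself with $df avoids that lookup. The function name comes from the question; treat the rest as an illustration under these assumptions.

```julia
# One-time setup for beginners (run once in the REPL):
#   using Pkg; Pkg.add(["DataFrames", "DataFramesMeta"])
using DataFrames
using DataFramesMeta

function transform_from_dict!(df, dict)
    for (k, v) in dict
        # `$df` splices the DataFrame object into the expression, so the
        # globally evaluated code never needs a global variable named `df`.
        # `$(Meta.parse(v))` splices the parsed string, e.g. `:x / 2`.
        df[!, Symbol(k)] = eval(:(@with($df, $(Meta.parse(v)))))
    end
    return nothing
end

dict = Dict("y" => ":x / 2")
data = DataFrame(x = [1, 2, 3, 4])
transform_from_dict!(data, dict)
```

After the call, data should contain the new column y alongside x. Because eval executes arbitrary code, this is only appropriate when the strings in dict come from a trusted source.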
The @transform solution:
Thanks go to the other commenters; the essential ideas are listed in @Ajar's answer.
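Since the question says DataFramesMeta is not strictly required, here is a rough sketch of a variant that uses only DataFrames. It rewrites each quoted symbol such as :x in the parsed string into a column lookup, wraps the result in an anonymous function, and calls that function with Base.invokelatest (needed because the function is created by eval while the caller is already running). The names substitute_columns and the local argument d are hypothetical, chosen for illustration.

```julia
using DataFrames

# Hypothetical helper: replace every quoted symbol (e.g. :x) in `ex` with the
# column lookup `d[!, :x]`, where `d` is the variable named by `dfsym`.
function substitute_columns(ex, dfsym::Symbol)
    if ex isa QuoteNode && ex.value isa Symbol
        return Expr(:ref, dfsym, :!, ex)
    elseif ex isa Expr
        return Expr(ex.head, map(a -> substitute_columns(a, dfsym), ex.args)...)
    end
    return ex  # numbers, operators, and other identifiers pass through unchanged
end

function transform_from_dict(df::AbstractDataFrame, dict::AbstractDict)
    out = copy(df)  # leave the caller's DataFrame untouched
    for (k, v) in dict
        body = substitute_columns(Meta.parse(v), :d)
        f = eval(:(d -> $body))  # the function takes the DataFrame as an argument
        # invokelatest sidesteps the world-age error for the freshly defined `f`
        out[!, Symbol(k)] = Base.invokelatest(f, out)
    end
    return out
end

data = DataFrame(x = [1, 2, 3, 4])
result = transform_from_dict(data, Dict("y" => ":x / 2"))
```

The strings keep the same :x syntax as in the question, and whole-column operations such as :x / 2 need no broadcasting dots. Element-wise operations between two columns (for example :x * :z) would still need a dotted operator in this sketch.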