Here's the code:
using DataFrames, DataFramesMeta
# Creating a sample DataFrame
df = DataFrame(ID = 1:5, Med1 = [0, 1, 0, 1, 0])
# Using @rsubset directly
result1 = @rsubset df :Med1 == 0
# Using a symbol
st = Symbol("Med1")
result2 = @rsubset df st == 0
# Checking if the results are the same
isequal(result1, result2)
The result is false - why?
I've been trying many different combinations of this, and if I don't write the symbol directly in the expression, it never works. I'd appreciate some advice on best practices for working with DataFrame column names (I have a bunch of datasets with columns labeled with numbers like "Med1", "Med2", etc., and I want to iterate over those numbers, which is how I ended up trying to create Symbols).
This is covered near the end of the Introduction section of the DataFramesMeta docs.
So (as the comment mentions), you need `$st` to have `st`'s value used as the column name.
The reason for this (to my understanding) doesn't have to do with any limitations or inner workings of Julia's metaprogramming, but rather with convention. `st == 0` looks like it's comparing `st`'s value to `0`, so to have it silently compare the column whose name is contained within `st` would be unexpected and "magical". When building large codebases, this kind of magic tends to make the code less readable and maintainable. Explicitly marking the column accesses with `:` or `$` makes it easier to see where we're referring to a column, vs. where we're accessing a variable for its own value.
(There do exist packages like Tidier.jl which trade off being somewhat more magical for the sake of convenience. For example, `@rsubset df :Med1 == 0` would be written as `@filter df Med1 == 0` in Tidier, with the name "Med1" automatically referring to the column. This is an exception that's explicitly intended to follow R's conventions rather than Julia's.)
Having column access have special syntax also makes it easier to access normal variables in your code, for example:
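A minimal sketch of such an expression, reusing the question's `df` and `st`. The variables `x` and `y` and their values are made up for illustration, and I've used the parenthesized macro form just to keep the argument boundaries unambiguous:

```julia
using DataFrames, DataFramesMeta

df = DataFrame(ID = 1:5, Med1 = [0, 1, 0, 1, 0])

st = Symbol("Med1")  # column name held in a variable
x = 1                # hypothetical ordinary variables
y = -1

# `$st` is resolved to the column named by `st`;
# `x` and `y` are used as plain values from the surrounding scope
result = @rsubset(df, $st == x + y)
```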
Here, because column access has special syntax (`$`), there's no confusion about `x` or `y` - they refer to the normal variables `x` and `y` as expected.
(In contrast, since Tidier doesn't require special syntax for column names, it goes the other way and has special syntax for referring to normal variables, e.g. `@filter df Med1 == !!x + !!y`.)
So ultimately, it's a design decision by the DataFramesMeta developers, not something intrinsic to Julia metaprogramming.
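Applied to the code in the question, the fix might look like the sketch below, along with one possible way to loop over numbered columns. The `Med2` column, its values, and the `zero_counts` dictionary are assumptions added for illustration; they aren't from the original post.

```julia
using DataFrames, DataFramesMeta

# Sample data; Med2 is a made-up second column
df = DataFrame(ID = 1:5, Med1 = [0, 1, 0, 1, 0], Med2 = [1, 1, 0, 0, 1])

# Column written literally in the expression
result1 = @rsubset(df, :Med1 == 0)

# Column name held in a variable, marked with `$`
st = Symbol("Med1")
result2 = @rsubset(df, $st == 0)

same = isequal(result1, result2)

# Build the column names from their numbers and count the zero rows in each
zero_counts = Dict{Symbol, Int}()
for i in 1:2
    col = Symbol("Med", i)  # Symbol("Med", 1) == :Med1
    zero_counts[col] = nrow(@rsubset(df, $col == 0))
end
```

With `$st` in place of the bare `st`, both filters select the same rows, so `same` should come out `true`.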