I'm just getting started with Polars. I'm trying to count the words in a column of string lists and get the result for each row as a dict in a Polars DataFrame.
Basically, I have this input dataframe:
import polars as pl
df_test = pl.DataFrame({'a': [['the', 'dog', 'is', 'good', 'a'], ['toto', 'tata', 'I']]})
shape: (2, 1)
┌───────────────────────┐
│ a │
│ --- │
│ list[str] │
╞═══════════════════════╡
│ ["the", "dog", … "a"] │
│ ["toto", "tata", "I"] │
└───────────────────────┘
With this command:
df_test.with_columns(
    pl.col('a').arr.eval(pl.element().value_counts())
)
I get this output:
shape: (2, 1)
┌───────────────────────────────────┐
│ a │
│ --- │
│ list[struct[2]] │
╞═══════════════════════════════════╡
│ [{"is",1}, {"dog",1}, … {"good",… │
│ [{"tata",1}, {"I",1}, {"toto",1}… │
└───────────────────────────────────┘
Is there a way to get the result of value_counts like this?
Thanks in advance.

You can cast the `value_counts` output to a string, do some operations on each string (remove all `{}`, replace `,` with `:`), `arr.join` the list into one big string, then add back the final `{}` with `pl.format`.

This works as long as no words contain a `,`, which for a word count seems like a safe assumption. I think there's a potentially more general answer using `pl.format` within `arr.eval`, but I got errors when trying to use `struct` expressions as arguments to `pl.format` within that context.
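
To make the string steps above concrete, here is a minimal plain-Python sketch of what they do to a single row. It is not Polars code and not the original snippet from this answer: `counts_to_json_string` is a hypothetical helper, and it assumes each `value_counts` struct is rendered as a string like `{"dog",1}`, as in the printed output in the question.

```python
import json
from collections import Counter


def counts_to_json_string(words):
    """Mimic the string steps for one row (hypothetical helper)."""
    counts = Counter(words)
    # Assumed string form of each value_counts struct, e.g. {"dog",1}
    struct_strings = ['{"%s",%d}' % (word, n) for word, n in counts.items()]
    # Remove the braces and replace the comma with a colon, e.g. "dog":1
    pairs = [
        s.replace("{", "").replace("}", "").replace(",", ":")
        for s in struct_strings
    ]
    # Join everything into one string and add back the outer braces
    return "{" + ",".join(pairs) + "}"


row = ["the", "dog", "is", "good", "a"]
result = counts_to_json_string(row)
print(result)              # one JSON-like string per row
print(json.loads(result))  # parses into a regular dict
```

In this sketch the comma caveat is easy to see: a word such as `a,b` would come out as the key `a:b`, because the replacement cannot tell the separator comma from a comma inside the word.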