I have a dataset with information about books. A column named author lists the author(s) of each book. When a book has more than one author, the names are separated by ",".
Now I need to count how many authors each book has.
My first idea was to count the commas in the column, but I don't know how to do this in RapidMiner.
I tried Generate Attributes with an if statement, but it just counts to 1 because there is no loop.
I also tried to split the column so that each author is in a separate column, but I don't know how to proceed from there.
This is what I want to have in the end:
Author: Wiebke Krabbe Count Authors: 1
Author: Claudia Fischer, Ilona Butterer Count Authors: 2
Author: Juliane Seidel, Christin Hopf, Paul Walsh Count Authors: 3
I would be so thankful if anyone could help me.
Your idea of splitting the author attribute on the comma separator was the right one. The next step is to count the resulting columns with the 'Generate Aggregation' operator. This operator works row-wise, so it can count the values in the newly created author columns for each book.
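To make the split-then-count logic concrete outside RapidMiner, here is a minimal Python sketch. It is only an illustration of the idea, not the RapidMiner process itself; the function name `count_authors` and the sample list `books` are made up for this example, and it assumes that a comma never appears inside a single author's name.

```python
def count_authors(author_field: str, separator: str = ",") -> int:
    """Count the non-empty names in a separator-delimited author string."""
    # Split on the separator, analogous to splitting the author column.
    parts = author_field.split(separator)
    # Count only non-empty pieces, so a trailing comma does not add an author.
    return sum(1 for part in parts if part.strip())


# Hypothetical sample rows, taken from the desired output in the question.
books = [
    "Wiebke Krabbe",
    "Claudia Fischer, Ilona Butterer",
    "Juliane Seidel, Christin Hopf, Paul Walsh",
]

for author in books:
    print(f"Author: {author} Count Authors: {count_authors(author)}")
```

The count is taken per row after the split, which mirrors the row-wise counting described above; the comma-counting idea from the question gives the same result as the number of commas plus one, as long as there are no empty entries.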
Example RapidMiner process: