I have a CSV file with predictor variables like blood pressure (BP), heart rate (HR), weight, body surface area (BSA), body mass index (BMI), age, and gender.
There is a decision-tree-based algorithm that uses these variables to classify patients as high risk (yes/no). HIGH_RISK is the last column in the CSV, and it is currently empty. I can apply the algorithm to individual subjects (individual rows in the CSV file) to populate the HIGH_RISK column, but there are so many rows that doing this manually would be impractical.
If it were simple addition, subtraction, multiplication, etc., I would have done it in R or even in Excel. But since the algorithm involves a forking decision tree, I am not sure how to do it. I am sure it is possible, since R is so powerful. Any suggestions?
The decision tree is similar to this: http://www.scielo.br/img/revistas/sa/v70n6/a01fig04.jpg
You could use this helper function I wrote for you:
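A minimal sketch of what such a helper could look like. The function name `apply_tree`, the node fields `var`, `cut`, `lo`, and `hi`, and the thresholds in the usage lines are all hypothetical, chosen only for illustration; they are not taken from the original answer and are not clinical cutoffs.

```r
# Hypothetical helper: classify every row of `df` with a nested-list tree.
# Assumed node format (an illustration, not the original answer's format):
#   list(var = "column", cut = threshold, lo = <node or leaf>, hi = <node or leaf>)
# Rows with df[[var]] < cut follow `lo`; all other rows follow `hi`.
# A leaf is any non-list value, e.g. 0 or 1.
apply_tree <- function(df, tree) {
  out <- rep(NA_real_, nrow(df))
  walk <- function(node, rows) {
    if (length(rows) == 0) return(invisible(NULL))
    if (!is.list(node)) {
      # Leaf: every row that reached this point gets the leaf's label.
      out[rows] <<- node
      return(invisible(NULL))
    }
    below <- df[[node$var]][rows] < node$cut
    # which() drops NA comparisons, so rows with missing values stay NA.
    walk(node$lo, rows[which(below)])
    walk(node$hi, rows[which(!below)])
  }
  walk(tree, seq_len(nrow(df)))
  out
}

# Usage with made-up data and placeholder thresholds.
patients <- data.frame(BP  = c(150, 120, 160),
                       BMI = c(32, 24, 25),
                       AGE = c(50, 40, 70))
tree <- list(var = "BP", cut = 140,
             lo = 0,
             hi = list(var = "BMI", cut = 30,
                       lo = list(var = "AGE", cut = 65, lo = 0, hi = 1),
                       hi = 1))
patients$HIGH_RISK <- apply_tree(patients, tree)
```

The sketch walks the tree once per branch rather than once per row, so each comparison is vectorized over all rows that reach that node.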
The general format is to pass a data.frame as the first argument and a nested list representing the decision tree as the second argument, in a format like this:
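One possible layout for such a nested list, as a sketch only: each internal node names a column and a cut point, and each branch is either another node or a leaf label. The field names `var`, `cut`, `lo`, and `hi` are assumptions made for this illustration, and the thresholds are placeholders.

```r
# Hypothetical node layout; field names and thresholds are illustrative only.
example_tree <- list(
  var = "BP", cut = 140,
  lo  = 0,                          # BP < 140: leaf, not high risk
  hi  = list(var = "AGE", cut = 65, # BP >= 140: split again on AGE
             lo = 0,                # AGE < 65: leaf, not high risk
             hi = 1)                # AGE >= 65: leaf, high risk
)

# Inspect the nesting.
str(example_tree)
```

Because every branch is either a list or a plain value, a recursive function can tell nodes from leaves with `is.list()`.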
For the graph you provided, the second argument would look like:
where 1 indicates Discomfort and 0 indicates Comfort.
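For a tree with only a few splits, nested `ifelse()` calls are a simpler alternative that fills the column directly from the CSV. The sketch below assumes columns named BP, AGE, and HIGH_RISK as in the question; the file is a temporary stand-in for the real CSV, and the thresholds are placeholders rather than real cutoffs.

```r
# Build a small stand-in CSV (replace csv_path with the real file in practice).
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(BP = c(150, 120, 160),
                     AGE = c(50, 40, 70),
                     HIGH_RISK = NA),
          csv_path, row.names = FALSE)

patients <- read.csv(csv_path)

# Each ifelse() is one fork of the tree, evaluated for all rows at once.
# Placeholder rule: high risk only if BP >= 140 and AGE >= 65.
patients$HIGH_RISK <- ifelse(patients$BP < 140, 0,
                      ifelse(patients$AGE < 65, 0, 1))

# Write the populated column back to the file.
write.csv(patients, csv_path, row.names = FALSE)
```

This gets hard to read beyond three or four levels, which is where a nested-list tree and a recursive helper become more practical.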