Restructuring dataframe exported from Network Canvas


I am analysing Network Canvas (NC) data for my project in RStudio. On export from NC, each categorical variable is split into one column per category, with a Boolean value in each.

I need to recode a number of these variables into a single factor column each, for example ethnicity and gender.

The code printed on the NC site (copied below) runs without errors. I defined the catToFactor function, created the variable list and applied the function to it. This created columns with the expected factor structure (e.g. a "Gender" column with levels 0 male, 1 female, 2 transgender, etc.), but no data has been pulled through: every case is recorded as NA. When I print the new column, all of its values are NA.

Has this code been updated, or have I done something wrong?

catToFactor <- function(dataframe,variableName) {
    fullVariableName <- paste0(variableName,"_")
    catVariables <- grep(fullVariableName, names(dataframe), value=TRUE)
    # Check if variable exists
    if (identical(catVariables, character(0))){
      stop(paste0("Cannot find variable named -",variableName,"- in the data"))
    # Check if "true" in multiple columns of a single row
    } else if (sum(apply(dataframe[,catVariables], 1, function(x) sum(x %in% "true")>1))>0) {
      stop(paste0("Your variable -",variableName,"  - appears to take multiple values.")) }
    catValues <- sub(paste0('.*',fullVariableName), '', catVariables)
    factorVariable <- c()
    for(i in 1:length(catVariables)){
      factorVariable[dataframe[catVariables[i]]=="true"] <- catValues[i]
    }
    return(factor(factorVariable,levels=catValues))
}

# List of categorical variables in our protocol to convert into factors
categoricalVariablesList <- list('Gender','Race','SexOrient')

# Iterate the list and call our catToFactor function, assigning the result to a new column in our dataframe
for (variable in categoricalVariablesList) {
  alterData[variable] <- catToFactor(alterData, variable)
}

I have tried the code on several variables and looked into alternative ways of structuring it, with the same result.
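One check worth running at this point is how R has typed the exported Boolean columns. The sketch below uses a small made-up CSV (the column names Gender_male and Gender_female are hypothetical stand-ins for the real export) and assumes the data was loaded with read.csv, which converts true/false text into logical values. If that is the case, comparing against the lowercase string "true" never matches.

```r
# Toy stand-in for two exported Boolean columns (hypothetical names and values).
csvText <- "Gender_male,Gender_female\ntrue,false\nfalse,true\n"
toy <- read.csv(text = csvText)

# read.csv has turned the true/false text into logical values.
class(toy$Gender_male)      # "logical"

# A logical TRUE is compared as the string "TRUE", so "true" never matches.
toy$Gender_male == "true"   # FALSE FALSE
toy$Gender_male == "TRUE"   # TRUE FALSE
```

If class() reports "character" instead, the columns were read as text and the stored strings themselves should be inspected, for example with unique().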

There is 1 best solution below

CatrionaC

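NC provided the new code below and will update the guidance on their web platform. The key change is in the assignment loop, which now compares each column against "TRUE" rather than "true":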

catToFactor <- function(dataframe,variableName) {
    fullVariableName <- paste0(variableName,"_")
    catVariables <- grep(fullVariableName, names(dataframe), value=TRUE)
    # Check if variable exists
    if (identical(catVariables, character(0))){
      stop(paste0("Cannot find variable named -",variableName,"- in the data"))
    # Check if "TRUE" in multiple columns of a single row
    } else if (sum(apply(dataframe[,catVariables], 1, function(x) sum(x %in% "TRUE")>1))>0) {
      stop(paste0("Your variable -",variableName,"  - appears to take multiple values.")) }
    catValues <- sub(paste0('.*',fullVariableName), '', catVariables)
    factorVariable <- c()
    for(i in 1:length(catVariables)){
      factorVariable[dataframe[catVariables[i]]=="TRUE"] <- catValues[i]
    }
    return(factor(factorVariable,levels=catValues))
}
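
To confirm the fix on a small case before running it on the full export, something like the following can be used. It is a minimal sketch, not NC's published code: the function is a compact restatement of catToFactor so the example runs on its own, the multiple-value check is omitted for brevity, and the toy alterData columns (Gender_male, Gender_female, Gender_transgender) are made up for illustration.

```r
# Compact restatement of catToFactor so this example is self-contained.
# Assumption: the multiple-value check from the original is left out here.
catToFactor <- function(dataframe, variableName) {
  fullVariableName <- paste0(variableName, "_")
  catVariables <- grep(fullVariableName, names(dataframe), value = TRUE)
  if (length(catVariables) == 0) {
    stop(paste0("Cannot find variable named -", variableName, "- in the data"))
  }
  catValues <- sub(paste0(".*", fullVariableName), "", catVariables)
  # Start from all NA so rows with no selected category stay NA.
  factorVariable <- rep(NA_character_, nrow(dataframe))
  for (i in seq_along(catVariables)) {
    # %in% "TRUE" matches logical TRUE as well as the string "TRUE".
    factorVariable[dataframe[[catVariables[i]]] %in% "TRUE"] <- catValues[i]
  }
  factor(factorVariable, levels = catValues)
}

# Hypothetical toy data: one row per alter, one Boolean column per category.
alterData <- data.frame(
  Gender_male        = c(TRUE,  FALSE, FALSE, FALSE),
  Gender_female      = c(FALSE, TRUE,  FALSE, FALSE),
  Gender_transgender = c(FALSE, FALSE, TRUE,  FALSE)
)

alterData["Gender"] <- catToFactor(alterData, "Gender")

# The first three rows get a category; the last row has none and stays NA.
print(alterData$Gender)
table(alterData$Gender, useNA = "ifany")
```

If the real data still comes back entirely NA after the change, the Boolean columns probably hold some other representation (for example 0/1), and the comparison value would need to be adjusted to match.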