Processing survey data in R (handling NA values)


I'm trying to process a survey containing responses to questions, some of which allow multiple choices. I need to evaluate the NA values for questions that respondents didn't answer, but I'm unsure how to organize the workflow. Could you please advise me on which steps to take to count the truly missing responses, as opposed to artificial NAs? The survey contains over 500 responses and around 50 questions.
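
One possible workflow, as a rough sketch: group the columns by question, flag a question as answered when at least one of its columns holds a real answer, and then count the unanswered rows per question. The toy data, the `questions` mapping, and the `AGE` column below are hypothetical, and the sketch assumes the multiple-choice columns are still coded 1/0 (before conversion to factors).

```r
# Toy data: SP01..SP03 are the options of one multiple-choice question,
# AGE is a hypothetical single-column question
survey <- data.frame(
  SP01 = c(1, 0, 0, NA),
  SP02 = c(0, 1, 0, NA),
  SP03 = c(0, 0, 0, NA),
  AGE  = c(34, NA, 51, 28)
)

# Hypothetical mapping: question name -> columns that belong to it
questions <- list(
  family_situation = c("SP01", "SP02", "SP03"),
  age              = "AGE"
)

# A question counts as answered when at least one of its columns holds a
# non-missing value other than 0 (0 = option not ticked).
# Caveat: a legitimate numeric answer of 0 would also count as unanswered.
answered <- sapply(questions, function(cols) {
  block <- survey[, cols, drop = FALSE]
  rowSums(!is.na(block) & block != 0) > 0
})

# Number and share of respondents without an answer, per question
n_missing <- colSums(!answered)
print(n_missing)
print(round(n_missing / nrow(survey), 2))
```

Keeping the mapping in one list means the same function can be applied to all questions, whether they span one column or several.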

I would also like to check whether nonresponse to a particular question is associated with gender, for example whether it is mostly women or mostly men who did not answer it.
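
A minimal sketch of such a check, assuming hypothetical columns `gender` and `Q1`: build a missingness indicator for the question, cross-tabulate it against gender, and test the table for association.

```r
# Toy data: `gender` and `Q1` are hypothetical column names
survey <- data.frame(
  gender = c("F", "F", "F", "F", "M", "M", "M", "M"),
  Q1     = c(1, NA, NA, 3, 2, 4, NA, 1)
)

# TRUE where the respondent left Q1 unanswered
survey$Q1_missing <- is.na(survey$Q1)

# Counts of answered / missing by gender
tab <- table(gender = survey$gender, missing = survey$Q1_missing)
print(tab)

# Share of missing answers within each gender (rows sum to 1)
print(prop.table(tab, margin = 1))

# Fisher's exact test suits small counts; chisq.test(tab) is an
# alternative when every cell count is reasonably large
test <- fisher.test(tab)
print(test$p.value)
```

With around 500 respondents, the same pattern could be repeated per question, keeping in mind that running many tests calls for a multiple-comparison correction such as `p.adjust()`.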

I tried replacing each multiple-choice question (which can take 3 or 5 columns) with a scoring system: a loop assigns a positive score if at least one option is chosen and NA if none is. However, I struggle to replace a single question spread over multiple columns with just one score column, as everything gets misaligned. Moreover, there are questions where selecting an answer triggers additional questions, and I'm unsure how to analyze such a survey. I also can't simply delete all the NA values, because that would negatively affect my analysis.
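
For the questions that are only triggered by an earlier answer, an NA can mean either "never shown" or "shown but skipped". A minimal sketch of separating the two, assuming a hypothetical filter question `Q5` whose answer "Yes" triggers the follow-up `Q5a`:

```r
# Toy data: Q5 is a hypothetical filter question, Q5a its follow-up
survey <- data.frame(
  Q5  = c("Yes", "Yes", "No", "No", NA),
  Q5a = c("A",   NA,    NA,   NA,   NA),
  stringsAsFactors = FALSE
)

# The follow-up was shown only to respondents who answered "Yes" to Q5;
# %in% returns FALSE (not NA) when Q5 itself is missing
shown <- survey$Q5 %in% "Yes"

# Three-way status: only "missing" is true item nonresponse
survey$Q5a_status <- ifelse(!shown, "not applicable",
                     ifelse(is.na(survey$Q5a), "missing", "answered"))

print(table(survey$Q5a_status))
```

Only the rows with status "missing" would then enter the nonresponse count for `Q5a`; the "not applicable" rows are NAs created by the survey logic.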

Example (a single question, with one column per choice):

# LimeSurvey Field type: F
data[, 9] <- as.numeric(data[, 9])
attributes(data)$variable.labels[9] <- "Married"
data[, 9] <- factor(data[, 9], levels=c(1,0),labels=c("Oui", "Non sélectionné"))
names(data)[9] <- "SP01"
# LimeSurvey Field type: F
data[, 10] <- as.numeric(data[, 10])
attributes(data)$variable.labels[10] <- "Divorced"
data[, 10] <- factor(data[, 10], levels=c(1,0),labels=c("Oui", "Non sélectionné"))
names(data)[10] <- "SP02"
# LimeSurvey Field type: F
data[, 11] <- as.numeric(data[, 11])
attributes(data)$variable.labels[11] <- "Free"
data[, 11] <- factor(data[, 11], levels=c(1,0),labels=c("Oui", "Non sélectionné"))
names(data)[11] <- "SP03"
# LimeSurvey Field type: A
data[, 12] <- as.character(data[, 12])
attributes(data)$variable.labels[12] <- "Another"
names(data)[12] <- "SP0_other"


data$Family_situation <- NA
# Loop over respondents to compute the score
for (i in seq_len(nrow(data))) {
  Family_situation <- 0
  # Add 1 for each selected option
  Family_situation <- Family_situation + ifelse(data[i, "SP01"] == "Oui", 1, 0)
  Family_situation <- Family_situation + ifelse(data[i, "SP02"] == "Oui", 1, 0)
  Family_situation <- Family_situation + ifelse(data[i, "SP03"] == "Oui", 1, 0)
  # A non-empty free-text "other" answer also counts as one selection
  if (!is.na(data[i, "SP0_other"]) && nchar(trimws(as.character(data[i, "SP0_other"]))) > 0) {
    Family_situation <- Family_situation + 1
  }
  # Store the score in the new column; a score of 0 becomes NA
  data[i, "Family_situation"] <- ifelse(Family_situation == 0, NA, Family_situation)
}

So far I have not deleted the three columns that are summarized in the newly created column.
