How would I automate computing correlations within a tibble for various countries and store effectively?

Question

93 Views Asked by TheEconomist At 26 June 2025 at 17:40

I'm somewhat of a beginner in R, and I am working on a relatively large dataset (for me at least) of around 500,000 rows.

I am trying to find the correlation between variables for various countries in the PISA dataset (an education-based survey), specifically to measure the effects of bullying.

I am able to compute the correlation matrix for countries on a case by case basis.

I want to record the correlation between two variables (so not necessarily the entire matrix) across all these countries, automating this and storing all the results in a tibble so that I don’t need to spend time doing it manually.

correl_countries = tibble()

for (each in list_countries){
  countries_bullying %>% #tibble subset of the original data 
    filter(CNTRYID == each)%>%
    select(reading_score, bullied_index)%>%
    correl = cor(use = "pairwise.complete.obs") #something to store the correlation values
    correl_countries %>% add_row(x = each, y = correl) #wanted to add these results to a tibble
}

Currently nothing seems to happen and I receive this error.

Error in is.data.frame(x) : argument "x" is missing, with no default

It may have something to do with the fact that "pairwise.complete.obs" generates a correlation matrix and not a single vector.

Grateful for your recommendations!

There are 2 best solutions below

Donald Seinen On 19 December 2020 at 08:29

New user here, and somehow I can't place comments. If I understood correctly, you want to compute the correlation between two variables, per country, and store it in a separate tibble. Replace "df" with the name of your dataset, "countries" with the column in your dataset that identifies the country, and "var1" and "var2" with the two variables of interest. For large datasets, a more elegant solution is likely available (e.g. subsetting fewer variables in each iteration).

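library(tibble)

vec <- unique(df$countries)               # one entry per country
correl_countries <- numeric(length(vec))  # preallocate the result vector
for (i in seq_along(vec)) {
    new <- df[df$countries == vec[i], ]   # rows for the current country
    # correlation of the two variables, skipping rows with missing values
    correl_countries[i] <- cor(new$var1, new$var2, use = "pairwise.complete.obs")
}
tibble(country = vec, correl = correl_countries)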
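
For comparison, the loop from the question can probably be made to work with two changes. The sketch below assumes the question's column names (CNTRYID, reading_score, bullied_index) and uses a small made-up tibble as a stand-in for countries_bullying; the values are purely illustrative. As far as I can tell, the `correl = cor(...)` at the end of the pipe chain means cor() never receives the piped data, which would explain the missing "x" error, and the result of add_row() is never assigned back, so the tibble stays empty.

```r
library(dplyr)
library(tibble)

# Made-up stand-in for countries_bullying (illustrative values only)
countries_bullying <- tibble(
  CNTRYID       = c("A", "A", "A", "A", "B", "B", "B", "B"),
  reading_score = c(500, 480, 450, NA, 520, 510, 470, 430),
  bullied_index = c(0.1, 0.5, 0.9, 0.3, 0.2, NA, 0.6, 1.0)
)
list_countries <- unique(countries_bullying$CNTRYID)

# Empty tibble with typed columns so add_row() has something to append to
correl_countries <- tibble(CNTRYID = character(), correl = numeric())

for (each in list_countries) {
  # Finish the pipe first, then compute the correlation in a separate step
  sub <- countries_bullying %>%
    filter(CNTRYID == each)

  # cor() on two vectors returns a single number, not a matrix
  correl <- cor(sub$reading_score, sub$bullied_index,
                use = "pairwise.complete.obs")

  # add_row() returns a new tibble, so the result has to be assigned back
  correl_countries <- correl_countries %>%
    add_row(CNTRYID = each, correl = correl)
}

correl_countries
```

Growing a tibble row by row is fine for a few dozen countries, but it copies the tibble on every iteration, so it is not the approach I would pick for many groups.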

**meriops** · Accepted Answer

You don't really need a loop here; the tidyverse has got you covered. The following returns a tibble with two columns, CNTRYID and correl:

library(tidyverse)

# get only the correlations
countries_bullying %>%
  group_by(CNTRYID) %>%
  summarise(correl = cor(reading_score, bullied_index, use = "pairwise.complete.obs"))
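
If you want to try this before running it on the full data, here is a small self-contained sketch. The tibble `toy` and its values are made up for illustration, and the extra `n_pairs` column is my own addition rather than part of the answer above: it counts how many complete pairs each correlation is based on, which may help spot countries where too few observations survive the missing-value filter.

```r
library(dplyr)
library(tibble)

# Made-up data using the question's column names (illustrative values only)
toy <- tibble(
  CNTRYID       = c("A", "A", "A", "A", "B", "B", "B", "B"),
  reading_score = c(500, 480, 450, NA, 520, 510, 470, 430),
  bullied_index = c(0.1, 0.5, 0.9, 0.3, 0.2, NA, 0.6, 1.0)
)

result <- toy %>%
  group_by(CNTRYID) %>%
  summarise(
    # one correlation per country, ignoring rows with a missing value
    correl  = cor(reading_score, bullied_index, use = "pairwise.complete.obs"),
    # hypothetical extra column: number of complete pairs behind each value
    n_pairs = sum(!is.na(reading_score) & !is.na(bullied_index))
  )

result
```

The result is already a tibble with one row per country, so it can be stored, joined, or plotted directly without any further reshaping.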
