Find Mean of all the numeric variables of a Spark dataframe in R

Question

534 Views Asked by Sandeep Gupta At 27 July 2025 at 16:20

I have a Spark DataFrame in R with the structure below:

Var1    Var 2   Var 3      Var 4   Group
98.64   32.35   11906.91    8.65   A
94.83   29.36   17287.57    6.01   B
99.94   35.36   30411.85    8.82   C
99.45   34.58   18267.26   10.09   C
99.93   36.64   23560.04    7.34   A
99.66   48.81   42076.44    8.44   B
99.96   27.38   18474.01   11.39   A
97.49   25.28   14615.50    6.60   B
98.98   32.50   10282.90    7.71   C
99.57   31.54   12725.56    6.17   C
99.91   26.46   10990.13    6.17   C

This is a representative sample. The actual dataset has a very large number of records and more than 200 columns.

Can someone please help me produce the following result set? For a local data frame in R, doing this with dplyr is very easy. But working on a Spark DataFrame seems

Group   Average_Var1   Average_Var2   Average_Var3   Average_Var4
A       99.51          32.13          17980.34       9.13
B       97.32          34.42          24659.83       6.89
C       99.57          32.10          16535.54       7.78

Original Q&A

There are 3 best solutions below

Imran Ali On 01 September 2017 at 10:42

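The base R function by() can be used with colMeans() as follows (this works on a local data.frame, so a Spark DataFrame would first have to be collected into R):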

by(df[, 1:4], df[,"Group"], colMeans)

output:

df[, "Group"]: A
        Var1         Var2         Var3         Var4 
   99.516118    32.130696 17980.341453     9.130542 
----------------------------------------------------------- 
df[, "Group"]: B
        Var1         Var2         Var3         Var4 
   97.328825    34.489235 24659.840630     6.874534 
----------------------------------------------------------- 
df[, "Group"]: C
        Var1         Var2         Var3         Var4 
   99.575422    32.109159 16535.543470     7.787882

Prasanna Nandakumar On 01 September 2017 at 09:55

> aggregate(df[, 1:4], list(df$Group), mean)
  Group.1     Var1    Var.2    Var.3    Var.4
1       A 99.51612 32.13070 17980.34 9.130542
2       B 97.32882 34.48923 24659.84 6.874534
3       C 99.57542 32.10916 16535.54 7.787882
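
Both base R answers address the columns by position (1:4). With more than 200 columns it may be easier to pick the numeric columns by type. Here is a minimal sketch of that idea on a local data.frame rebuilt from the first six rows of the question's sample. The names num_cols and group_means are just illustrative, and the column names are simplified to Var1..Var4. As with the answers above, this assumes the data fits in local R memory.

```r
# Local data.frame rebuilt from the first six rows of the question's sample
df <- data.frame(
  Var1  = c(98.64, 94.83, 99.94, 99.45, 99.93, 99.66),
  Var2  = c(32.35, 29.36, 35.36, 34.58, 36.64, 48.81),
  Var3  = c(11906.91, 17287.57, 30411.85, 18267.26, 23560.04, 42076.44),
  Var4  = c(8.65, 6.01, 8.82, 10.09, 7.34, 8.44),
  Group = c("A", "B", "C", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# Select every numeric column by type instead of by position
num_cols <- names(df)[sapply(df, is.numeric)]

# Group-wise means: one row per group, one column per numeric variable
group_means <- aggregate(df[num_cols], by = list(Group = df$Group), FUN = mean)
print(group_means)
```

Naming the list element (Group = ...) gives the grouping column a readable name instead of the default Group.1.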

tushaR On 01 September 2017 at 10:47 BEST ANSWER

Using sparklyr, try this:

df %>% group_by(Group) %>% summarize_all(.funs = mean)
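
If the table also has non-numeric columns besides Group, a variant restricted to numeric columns might look like the sketch below. It runs on a small local data frame so that it is self-contained. The names local_df and group_means are illustrative, and only a few rows and two columns of the question's sample are used. With sparklyr, the same pipeline would presumably be applied to a Spark table reference instead, for example one obtained via copy_to(sc, local_df) after sc <- spark_connect(master = "local"), followed by collect() to bring the small summary back into R. How the is.numeric predicate is evaluated on a remote table is an assumption worth checking against your dplyr/sparklyr versions.

```r
library(dplyr)

# Small local stand-in for the Spark table (a few rows from the question's sample)
local_df <- data.frame(
  Var1  = c(98.64, 94.83, 99.94, 99.45),
  Var2  = c(32.35, 29.36, 35.36, 34.58),
  Group = c("A", "B", "C", "C"),
  stringsAsFactors = FALSE
)

# Average every numeric column within each group;
# the grouping column itself is excluded from the summary functions
group_means <- local_df %>%
  group_by(Group) %>%
  summarise_if(is.numeric, mean, na.rm = TRUE)

print(group_means)
```

Passing na.rm = TRUE keeps a single missing value from turning a whole group's average into NA on a local data frame.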
