How can I calculate the standard deviation and mean across rows of various columns?

Question

How can I calculate the standard deviation and mean across rows of various columns?

119 Views Asked by carias At 29 July 2025 at 15:48

I want to calculate the mean and standard deviation across all rows for the columns that start with Sum of EBIT [CY 2]. I was able to calculate the mean, by adding the 10 columns and dividing it by 10.

As can be seen below:

pub fn industry_beta_f(raw_data:DataFrame, marginal_tax_rate:Expr) -> DataFrame{

    let df = raw_data.clone().lazy()
                                .with_columns([
                                    ((col("Sum of EBIT [CY 2011] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2012] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2013] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2014] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2015] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2016] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2017] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2018] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2019] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2020] ($USDmm, Historical rate)")) / lit(10.0)).alias("moments_mean"),
                                    (col("Sum of EBIT [CY 2011] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2012] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2013] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2014] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2015] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2016] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2017] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2018] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2019] ($USDmm, Historical rate)") + col("Sum of EBIT [CY 2020] ($USDmm, Historical rate)")).std(1).alias("moments_std"),

                                ])
                                .with_columns([
                                    when(col("moments_mean").gt(lit(0.0)))
                                    .then(col("moments_std") / col("moments_mean"))
                                    .otherwise(f64::NAN)
                                    .alias("Standard deviation in operating income (last 10 years)")
                                ])
                                .select([col("Industry Name"),
                                        col("Number of firms"),
                                        col("Standard deviation in operating income (last 10 years)")])
                                .collect()
                                .unwrap();
    return df
}

I am having trouble calculating the standard deviation across all the rows for the columns that start with Sum of EBIT [CY 2]. Because using the std() formula, it calculates the std for each column instead of across the rows.

Current Output

Expected Output

There is a huge gap between the two outputs. Since, for the current output the std is calculated across the column, and for the expected output the std is calculated across the rows.

Original Q&A

There are 1 best solutions below

**jvanbuel** · Answer 1

You can calculate the mean using the built-in mean_horizontal method of a DataFrame.

The standard deviation is not supported out-of-the-box, and therefore a bit trickier. You first calculate the mean, then the sum of squared errors, and then divide it by the number of columns - 1, as follows:

use polars::prelude::*;

fn main() -> PolarsResult<()> {
    let mut df = df! (
        "col_1" => &[1, 2, 3, 4, 5],
        "col_2" => &[2, 4, 6, 8, 5],
        "col_3" => &[10, 8, 6, 4, 5],
    )?;

    let n = df.get_column_names().len() as i32;

    let col_mean = df.mean_horizontal(polars::frame::NullStrategy::Ignore)?;

    let df_w_mean: &mut DataFrame;
    if let Some(mean) = col_mean {
        df_w_mean = df.with_column(mean.with_name("col_mean"))?;
    } else {
        return Err(PolarsError::ComputeError("No mean can be calculated".into()));
    }
    
    let sse = df_w_mean
        .clone()
        .lazy()
        .with_column((col("*") - col("col_mean")).pow(2))
        .collect()?
        .sum_horizontal(polars::frame::NullStrategy::Ignore)?;
    
    let df_w_sse: &mut DataFrame;
    if let Some(sse) = sse {
        df_w_sse = df_w_mean.with_column(sse.with_name("col_std"))?;
    } else {
        return Err(PolarsError::ComputeError("No sse can be caculated".into()));
    }

    let result = df_w_sse.clone().lazy().with_column((col("col_std") / lit(n - 1)).sqrt()).collect()?;
    println!("{:?}", result);

    Ok(())
}

How can I calculate the standard deviation and mean across rows of various columns?

There are 1 best solutions below

Related Questions in RUST

Related Questions in RUST-POLARS

Trending Questions

Popular # Hahtags

Popular Questions