Making R Script Flexible: User Input for Data and Column Selection in Bash

I want to run a statistical analysis on my dataset. I simplified the dataset manually, so the file contains four columns: ID | value | variable | string.

I have written a working R script; I ran it in R and everything worked fine.

cross_statF <- function(df) {
  by_cross <- df %>%
    group_by(ID) %>%
    summarize(
      count = n(),
      mean = mean(value),
      variance = var(value),
      s.deviation = sd(value),
      minimum = min(value),
      maximum = max(value)
    )
}

bycross_stat <- cross_statF(file)

Now I want to allow flexible file input; in other words, I would like the statistical analysis in the script to be applicable to different datasets. I tried readline, but it only worked with one line at a time.

I know how bash can pass user input to a command in a script file, but I haven't managed to make it work in R.

This is how far I have got:
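
For context on the readline attempt: in non-interactive use (for example under Rscript), readline() does not wait for input and returns an empty string, so prompts are usually read from standard input instead. The following is only a minimal sketch of that idea; the helper names ask and ask_inputs are made up for illustration, and the demonstration uses canned answers in place of a real terminal.

```r
# Hypothetical helper: print a prompt and read one line from `con`.
# `con` must already be open so that successive calls continue where the
# previous one stopped.
ask <- function(prompt, con) {
  cat(prompt)
  readLines(con, n = 1)
}

# Hypothetical wrapper: ask for the two inputs the analysis needs.
ask_inputs <- function(con) {
  list(
    path   = ask("Data file: ", con),
    column = ask("Column name: ", con)
  )
}

# In a script run with Rscript, the connection would be standard input:
#   con <- file("stdin", "r")
#   inputs <- ask_inputs(con)
#   close(con)

# Demonstration with canned answers instead of a real terminal
demo_con <- textConnection(c("modf_dum.csv", "value"))
inputs <- ask_inputs(demo_con)
close(demo_con)
```

Passing the connection in as an argument keeps the prompting logic separate from where the answers come from, which also makes it easy to try out without typing anything.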

file.R

main <- function() {
  args <- commandArgs(trailingOnly = TRUE)
  if (length(args) != 2) {
    stop("Error: Please provide both data file and column name!")
  }

  modf_file <- args[1]
  value <- args[2]

  file <- read.csv(modf_file, header=TRUE, sep="\t", dec=".")

  if (!is.character(value) || !(value %in% names(file))) {
    stop(paste0("Error: Column '", value, "' does not exist in the data file!"))
  }

  library(dplyr)

  # Function to calculate basic statistics
  cross_statF <- function(df, value) {
    by_cross <- df %>%
      group_by_at(1)  
      summarize(
        count = n(),
        mean = mean(get(value)),
        variance = var(get(value)),
        s.deviation = sd(get(value)),
        minimum = min(get(value)),
        maximum = max(get(value))
      )
  }

  tryCatch({
    bycross_stat <- cross_statF(file, value)
    print(bycross_stat)
  }, error = function(e) {
    message(paste0("Error during analysis:", e))
  })
}

main()

I ran it in bash with the command: Rscript file.R modf_dum.csv value

This is the error I received:

Error in UseMethod("group_by") : 
  no applicable method for 'group_by' applied to an object of class "function"
Calls: main -> cross_statF -> %>% -> summarize -> group_by
Execution halted

This is only one statistical test so far; there are more than five that I am planning to integrate.
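
One way several tests could be integrated is a named list of functions that all share the same signature, so the script can pick one by name. This is a rough sketch in base R, not a finished design; the names analyses and run_analysis and the two example entries are hypothetical placeholders for the real tests.

```r
# Hypothetical registry: every analysis takes a data frame and a column name.
analyses <- list(
  basic = function(data, value_col) {
    x <- data[[value_col]]
    c(count = length(x), mean = mean(x), sd = sd(x))
  },
  range = function(data, value_col) {
    x <- data[[value_col]]
    c(minimum = min(x), maximum = max(x))
  }
)

# Look up an analysis by name and run it, failing early on unknown names.
run_analysis <- function(name, data, value_col) {
  if (!name %in% names(analyses)) {
    stop("Unknown analysis '", name, "'. Available: ",
         paste(names(analyses), collapse = ", "))
  }
  analyses[[name]](data, value_col)
}

# Made-up data purely for illustration
demo <- data.frame(ID = c("a", "a", "b"), value = c(1, 3, 10))
basic_result <- run_analysis("basic", demo, "value")
range_result <- run_analysis("range", demo, "value")
```

With this layout, the analysis name could be accepted as a third command-line argument next to the file and the column name.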

How can I correct/improve my script?
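
A possible correction, offered as a sketch rather than a definitive fix. Two things stand out in file.R. First, the line group_by_at(1) is not followed by %>%, so summarize() is not part of the pipeline. Second, the error says group_by received an object of class "function". That can happen when a name such as file or df refers to the built-in R functions of the same name (base::file, stats::df) instead of a data frame, so the script that produced the error may differ slightly from the one shown. The version below assumes dplyr 1.0 or later, renames the arguments to data and value_col (my own choice) to avoid those built-in names, and uses group_by(across(1)) and the .data pronoun in place of group_by_at() and get().

```r
library(dplyr)

# Sketch: summary statistics of one column, grouped by the first column.
# `value_col` is the column name as a string (e.g. taken from commandArgs()).
cross_statF <- function(data, value_col) {
  data %>%
    group_by(across(1)) %>%   # every step except the last must end with %>%
    summarize(
      count       = n(),
      mean        = mean(.data[[value_col]]),
      variance    = var(.data[[value_col]]),
      s.deviation = sd(.data[[value_col]]),
      minimum     = min(.data[[value_col]]),
      maximum     = max(.data[[value_col]]),
      .groups     = "drop"
    )
}

# Made-up data purely for illustration
demo <- data.frame(ID = c("a", "a", "b", "b"), value = c(1, 3, 10, 20))
demo_stat <- cross_statF(demo, "value")
print(demo_stat)
```

Because the pipeline is the last expression in the function, its result is returned directly without an intermediate assignment. In the tryCatch handler, conditionMessage(e) gives just the message text, which tends to read better than pasting the whole condition object.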
