Data:
I also have the total number of cancer patients (case_totals) and non-cancer patients (control_totals), which in this case are 100 and 1000 respectively.
Variant  Cancer IBD AKI CKD CCF IHD
A1         0    5   4   0   0   4
A2         0    8   5   9   0   7
A3         20   9   6   7   0   3
B5         7    2   0   6   5   4
K7         9    1   8   4   2   5
L9         0    0   6   3   3   1
Desired outcome - two tables:
Table1:
 Variant     case_total not_seen_in_cases_total control_total not_seen_in_control_total
    A1             0           100                    13                  987
    A2             0           100                    29                  971
    A3             20          80                     25                  975
    B5             7           93                     17                  983
    K7             9           91                     20                  980
    L9             0           100                    13                  987
Table2:
case_total_in_gene  not_seen_in_gene_cases      control_total_in_gene control_total_not_in_gene
36                         64                            117                 883
I will then run Fisher's exact test on both tables to get per-variant and per-gene p-values, which I can already do.
My issue is that I have multiple such datasets, and the order of the input columns differs in each one. At present I have been using:
ncol(dt)  # total number of columns; in reality the table is very large
which(colnames(dt) == 'Cancer')  # index of the Cancer column
dt$control_total <- rowSums(dt[, 2:7]) - dt[[2]]  # per-row control total: all count columns minus Cancer (column 2 here)
I then subset dt and add the other columns by subtraction, e.g. dt$not_seen_in_control_total <- 1000 - dt$control_total
This won't work when the column indices shift, and I want to run it across hundreds of files, ideally using commandArgs().
Ultimately, how do I reference a column that always has the same name but sits in a different position, inside a function like rowSums()?
Many thanks
                        
You can select columns by position, by a pattern in their names, or by specifying a range of columns. Which one is best depends on how your data is structured.
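As a minimal sketch of the name-based option, assuming the input is (or can be converted to) a plain data.frame whose ID column is always called Variant and whose case column is always called Cancer, with every other column treated as a control: build_variant_table() is a hypothetical helper written for this example, not a function from any package, and the totals of 100 and 1000 are the ones given in the question.

```r
# Sample data from the question.
dt <- data.frame(
  Variant = c("A1", "A2", "A3", "B5", "K7", "L9"),
  Cancer  = c(0, 0, 20, 7, 9, 0),
  IBD     = c(5, 8, 9, 2, 1, 0),
  AKI     = c(4, 5, 6, 0, 8, 6),
  CKD     = c(0, 9, 7, 6, 4, 3),
  CCF     = c(0, 0, 0, 5, 2, 3),
  IHD     = c(4, 7, 3, 4, 5, 1)
)

# Hypothetical helper: builds the per-variant table using column names only.
build_variant_table <- function(dt, case_n = 100, control_n = 1000,
                                id_col = "Variant", case_col = "Cancer") {
  dt <- as.data.frame(dt)  # so data.table/tibble input behaves like a data.frame
  # Assumption: every column that is not the ID or the case column is a control.
  control_cols  <- setdiff(names(dt), c(id_col, case_col))
  case_total    <- dt[[case_col]]
  control_total <- unname(rowSums(dt[control_cols]))
  data.frame(
    Variant                   = dt[[id_col]],
    case_total                = case_total,
    not_seen_in_cases_total   = case_n - case_total,
    control_total             = control_total,
    not_seen_in_control_total = control_n - control_total
  )
}

table1 <- build_variant_table(dt)

# Per-gene table: sum the per-variant counts, then subtract from the totals.
gene_case    <- sum(table1$case_total)
gene_control <- sum(table1$control_total)
table2 <- data.frame(
  case_total_in_gene        = gene_case,
  not_seen_in_gene_cases    = 100 - gene_case,
  control_total_in_gene     = gene_control,
  control_total_not_in_gene = 1000 - gene_control
)

# Column order no longer matters, because nothing is selected by position.
shuffled <- dt[, c("IHD", "Variant", "CKD", "Cancer", "AKI", "CCF", "IBD")]
same_result <- identical(table1, build_variant_table(shuffled))
```

Using setdiff(names(dt), ...) means new control columns are picked up automatically, which may or may not be what you want if some files carry extra non-count columns. To run this over many files, the path could be taken from commandArgs(trailingOnly = TRUE) and read with read.delim() or similar before calling the helper.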
data