How to convert outcome of table function to a dataframe

df = data.frame(table(train$department, train$is_outcome))

Here department and is_outcome are both factors (is_outcome is binary), and the result looks like the table shown below.

It contains only 2 variables (fields), whereas I want the department column to be part of the data frame, i.e. a data frame with 3 variables.

                         0    1 
Analytics             4840  512
Finance               2330  206 
HR                    2282  136 
Legal                  986   53 
Operations           10325 1023
Procurement           6450  688
R&D                    930   69
Sales & Marketing    15627 1213 
Technology            6370  768 

One way I learnt was...

df = data.frame(table(train$department , train$is_outcome))
write.csv(df,"df.csv")
rm(df)
df = read.csv("df.csv")
colnames(df) = c("department", "outcome_0","outcome_1")

but I cannot write a file to disk every time in my program.

Is there any way to do this directly?

There is 1 solution below

When you create a table from a matrix in R (here, the table trial.table from the matrix trial), the table prints exactly like the matrix, but it is a different kind of object. The difference becomes clear when you convert both objects to data frames. Take a look at the outcome of this code:

    > trial.df <- as.data.frame(trial)
    > str(trial.df)
    'data.frame': 2 obs. of 2 variables:
     $ sick   : num 34 11
     $ healthy: num 9 32

Here you get a data frame with two variables (sick and healthy), each with two observations. On the other hand, if you convert the table to a data frame, you get the following result:

    > trial.table.df <- as.data.frame(trial.table)
    > str(trial.table.df)
    'data.frame': 4 obs. of 3 variables:
     $ Var1: Factor w/ 2 levels "risk","no_risk": 1 2 1 2
     $ Var2: Factor w/ 2 levels "sick","healthy": 1 1 2 2
     $ Freq: num 34 11 9 32

Now you get a data frame with three variables. The first two, Var1 and Var2, are factor variables whose levels are the row names and the column names of the table, respectively. The third variable, Freq, contains the frequencies for every combination of the levels in the first two variables.

The as.data.frame() function converts a table to a data frame in the format that you need for regression analysis on count data. If you need to summarize the counts first, you use table() to create the desired table.
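
The objects trial and trial.table are never defined in this answer, so the following is only a minimal sketch of how they might have been built. The counts and labels come from the str() output above; the construction itself, and the names wide and group, are assumptions made for illustration. The last step shows one possible way to get a one-row-per-category layout with the row labels as an ordinary column, which is the shape the question asks for.

```r
# Assumed construction: only the counts and the dimnames are taken from
# the str() output shown in the answer.
trial <- matrix(c(34, 11, 9, 32), ncol = 2,
                dimnames = list(c("risk", "no_risk"), c("sick", "healthy")))
trial.table <- as.table(trial)

# Matrix -> data frame: one column per matrix column (wide layout).
trial.df <- as.data.frame(trial)

# Table -> data frame: one row per cell, with Var1, Var2 and Freq (long layout).
trial.table.df <- as.data.frame(trial.table)

# Wide layout straight from the table, without writing a file:
# drop the "table" class so the matrix method of as.data.frame() is used,
# then move the row names into a regular column.
wide <- as.data.frame(unclass(trial.table))
wide$group <- rownames(wide)          # "group" is a hypothetical column name
rownames(wide) <- NULL
wide <- wide[, c("group", "sick", "healthy")]

str(trial.table.df)
print(wide)
```

For the data in the question, the same pattern would presumably apply to table(train$department, train$is_outcome), with the label column named department and the count columns renamed to outcome_0 and outcome_1; this assumes train has those two columns as described.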

In fact, you can also create tables in more than two dimensions by adding more variables as arguments to table(), or by transforming a multidimensional array to a table using as.table(). You can access the numbers the same way you do for multidimensional arrays, and the as.data.frame() function creates as many factor variables as there are dimensions.
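
As a rough illustration of the multidimensional case, the sketch below builds a three-way table from a small made-up data frame. The data frame d, its column region, and all values in it are hypothetical and do not come from the question's data.

```r
# Hypothetical data with three categorical variables (not from the post).
d <- data.frame(
  department = c("HR", "HR", "Legal", "Legal", "HR", "Legal"),
  outcome    = c(0, 1, 0, 0, 1, 1),
  region     = c("north", "south", "north", "south", "north", "north")
)

# Passing three variables to table() gives a 2 x 2 x 2 table.
tab3 <- table(d$department, d$outcome, d$region)
dim(tab3)

# Index a single cell the same way as in a three-dimensional array.
tab3["HR", "1", "north"]

# One factor column per dimension (Var1, Var2, Var3), plus Freq.
tab3.df <- as.data.frame(tab3)
str(tab3.df)
```

Each row of tab3.df corresponds to one cell of the table, so a 2 x 2 x 2 table yields eight rows.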