How calculate aggregate of some data in the columns of R

122 Views Asked by At

I have a DataSet which includes some women in childbearing ages and beyond. Each woman specified with an id and DataSet includes mothers' age. For each woman we have parity. Parity1 is the first child that women have. Values in parity refers to children's age. For example, a woman with id 1 is 38 years old, his first child is 15 years old, second 13, third 10 and fourth 0 years old at the time of census.

library("tidyverse")

sample_df <- tibble(
    id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
    AGE = c(38, 39, 40, 41, 42, 43, 44, 45, 46, 47),
    parity1 = c(15, 14, 13, 12, 9, 8, 14, 13, 3, 7),
    parity2 = c(13, 9, 9, 10, 7, 4, 13, 11, NA, 5),
    parity3 = c(10, 7, 3, 3, 6, 2, 9, 15, NA, 2),
    parity4 = c( 0, NA, NA, 1, NA, 0, 0 , 1, NA, NA),
)

long story short, I want to know how many children of each age (age of child), as a total, exist in my data set based on their mothers' age. For example how many 0-year-old children we have based on their mother's age. In the end, the code should terminate to a table in which for each woman, in childbearing age, the number of their children in ages 0, 1, 2, 3, 4, 5... to be specified. The final output is supposed to seem something like this:

AGE of mother 0 1 2 3
38 1
39
40 1
41 1 1
42
43 1 1 1
44 1
45 1
46 1
47 1

Seems cruel, but I rellay appreciate any help that you can provide. Thank you so much in advance.

2

There are 2 best solutions below

0
Mohammad Haddadi On BEST ANSWER

after running the codes proposed by @Mark, try to run this code in your data. I think, this code can do the thing you mentioned above.

sample_df<-pivot_longer(sample_df,starts_with("parity"), values_drop_na =TRUE) 

sample_df<- pivot_wider(sample_df,names_from = value,names_glue = 
"year{value}", values_from = name, names_sort = TRUE, values_fn = length, 
values_fill = 0)
        
child<-aggregate(cbind(sample_df$year0, 
sample_df$year1,sample_df$year2,sample_df$year3,sample_df$year4,sample_df$year5,sample_df$year6,sample_df$year7,sample_df$year8,sample_df$year9,sample_df$year10,sample_df$year11,sample_df$year12,sample_df$year13,sample_df$year14), list(sample_df$age), sum)
8
Mark On

Steps:

  1. Make the data long, dropping NA values
  2. Make the data wide, sorting the new columns, getting the length of them (i.e. 1), and then all NAs are replaced by zeros
sample_df |>
  pivot_longer(starts_with("parity"), values_drop_na = TRUE) |>
  pivot_wider(id_cols = AGE, names_from = value, values_from = name, names_sort = TRUE, values_fn = length, values_fill = 0)

Output:

# A tibble: 10 × 17
     AGE   `0`   `1`   `2`   `3`   `4`   `5`   `6`   `7`   `8`   `9`  `10`  `11`
   <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
 1    38     1     0     0     0     0     0     0     0     0     0     1     0
 2    39     0     0     0     0     0     0     0     1     0     1     0     0
 3    40     0     0     0     1     0     0     0     0     0     1     0     0
 4    41     0     1     0     1     0     0     0     0     0     0     1     0
 5    42     0     0     0     0     0     0     1     1     0     1     0     0
 6    43     1     0     1     0     1     0     0     0     1     0     0     0
 7    44     1     0     0     0     0     0     0     0     0     1     0     0
 8    45     0     1     0     0     0     0     0     0     0     0     0     1
 9    46     0     0     0     1     0     0     0     0     0     0     0     0
10    47     0     0     1     0     0     1     0     1     0     0     0     0
# ℹ 4 more variables: `12` <int>, `13` <int>, `14` <int>, `15` <int>