How to create a cumulative variable that groups by PERMNO and arranges by date in R

146 Views Asked by At

I have a dataframe with variables from COMPUSTAT containing data on various accounting items, including SG&A expenses from different companies.

I want to create a new variable in the dataframe which accumulates the SG&A expenses for each company in chronological order. I use PERMNO codes as the unique ID for each company.

I have tried this code, however it does not seem to work:

crsp.comp2$cxsgaq <- crsp.comp2 %>%
  group_by(permno) %>%
  arrange(date) %>%
  mutate_at(vars(xsgaq), cumsum(xsgaq))

(xsgag is the COMPUSTAT variable for SG&A expenses)

Thank you very much for your help

2

There are 2 best solutions below

0
On BEST ANSWER

Your example code is attempting write the entire dataframe crsp.comp2, into a variable crsp.comp2$cxsgaq.

Usually the vars() function variables needs to be "quoted"; though in your situation, use the standard mutate() function and assign the cxsgaq variable there.

crsp.comp2 <- crsp.comp2 %>%
  group_by(permno) %>%
  arrange(date) %>%
  mutate(cxsgaq = cumsum(xsgaq))

Reproducible example with iris dataset:

library(tidyverse)
iris %>% 
  group_by(Species) %>% 
  arrange(Sepal.Length) %>% 
  mutate(C.Sepal.Width = cumsum(Sepal.Width))
0
On

Building on the answer from @m-viking, if using the WRDS PostgreSQL server, you would simply use window_order (from dplyr) in place of arrange. (I use the Compustat firm identifier gvkey in place of permno so that this code works, but the idea is the same.)

library(dplyr, warn.conflicts = FALSE)
library(DBI)

pg <- dbConnect(RPostgres::Postgres(), 
                bigint = "integer", sslmode='allow')

fundq <- tbl(pg, sql("SELECT * FROM comp.fundq"))

comp2 <-
  fundq %>%
  filter(indfmt == "INDL", datafmt == "STD",
         consol == "C", popsrc == "D")

comp2 <- 
  comp2 %>%
  group_by(gvkey) %>%
  dbplyr::window_order(datadate) %>%
  mutate(cxsgaq = cumsum(xsgaq))

comp2 %>%
  filter(!is.na(xsgaq)) %>%
  select(gvkey, datadate, xsgaq, cxsgaq)
#> # Source:     lazy query [?? x 4]
#> # Database:   postgres [[email protected]:9737/wrds]
#> # Groups:     gvkey
#> # Ordered by: datadate
#>    gvkey  datadate   xsgaq cxsgaq
#>    <chr>  <date>     <dbl>  <dbl>
#>  1 001000 1966-12-31 0.679  0.679
#>  2 001000 1967-12-31 1.02   1.70 
#>  3 001000 1968-12-31 5.86   7.55 
#>  4 001000 1969-12-31 7.18  14.7  
#>  5 001000 1970-12-31 8.25  23.0  
#>  6 001000 1971-12-31 7.96  30.9  
#>  7 001000 1972-12-31 7.55  38.5  
#>  8 001000 1973-12-31 8.53  47.0  
#>  9 001000 1974-12-31 8.86  55.9  
#> 10 001000 1975-12-31 9.59  65.5  
#> # … with more rows

Created on 2021-04-05 by the reprex package (v1.0.0)