I have a table of Calpers Private Equity Fund performance from several years. I cleaned and joined all the data into a large table with 186 entries for individual fund investments. Some of these funds have data for 5 yrs, most for 4 or less. I would like to calculate the CAGR for each fund using the earliest value and the latest value in the formula:
CAGR= Latest/First^(1/n)-1 ...
The columns with the data are named:
2017
,2018
,2019
,2020
,2021
, so the formula in R will look something like this: (calper is the table with all the data ... one fund per row)
idx<- which(startsWith(names(calperMV),"2")) # locate columns with data needed for CAGR calc
idx <- rev(idx) # match to NCOL_NA order ...
the values here are (6,5,4,3,2) ... which are the column numbers for 2021-2020-2019-2018-2017.
the indx was formed by counting the number of NA in each row ... all the NA are left to right, so the totals here should be a reference to the idx and thus the correct columns.
I use the !!sym(as.String()) with name()[idx[indx]] to pull out the column names symbolically
calperMV %>% rowwise() %>%
mutate(CAGR=`2021`/!!sym((colnames(.)[idx[indx]])^(1/(5-indx))-1))))
Problem is that the referencing either does not work correctly or gets this error:
"Error in local_error_context(dots = dots, .index = i, mask = mask) : promise already under evaluation: recursive default argument reference or earlier problems?"
I've tried creating test code which shows the addressing is working:
calper %>% rowwise() %>% mutate(test = (names(.)[idx[indx]]),
test1= !!sym(as.String(names(.)[idx[1]])),
test2= !!sym(as.String(names(.)[idx[2]])),
test3= !!sym(as.String(names(.)[idx[3]])),
test4= !!sym(as.String(names(.)[idx[4]])),
test5= !!sym(as.String(names(.)[idx[5]])))
But when I do the full CAGR calc I get that recursive error. Here'a tibble of the test data for reference:
Input data:
Security Name 2017 2018 2019 2020 2021 NA_cols indx
ASIA ALT NA NA NA 6,256,876.00 7,687,037.00 3 2
ASIA ALT NA NA NA 32,549,704.00 34,813,844.00 3 2
AVATAR NA NA NA NA 700,088.00 - 3 2
AVENUE FUND VI (A) NA NA NA 10,561,674.00 19,145,496.00 3 2
BDC III C NA 48,098,429.00 85,808,280.00 100,933,699.00 146,420,669.00 1 4
BIRCH HILL NA NA NA 6,488,941.00 9,348,941.00 3 2
BLACKSTONE NA NA NA 4,011,072.00 2,406,075.00 3 2
BLACKSTONE IV NA NA NA 4,923,625.00 3,101,081.00 3 2
BLACKSTONE V NA NA NA 18,456,472.00 17,796,711.00 3 2
BLACKSTONE VI NA NA NA 245,269,656.00 310,576,064.00 3 2
BLACKSTONE VII NA NA NA 465,415,036.00 607,172,062.00 3 2
Results: The indexing selects the proper String and also selects the proper # from the column ... but won't do when I operate with the selected variable:
selYR test1 test2 test3 test4 test5
2020 7,687,037.00 6,256,876.00 NA NA NA
2020 34,813,844.00 32,549,704.00 NA NA NA
2020 - 700,088.00 NA NA NA
2020 19,145,496.00 10,561,674.00 NA NA NA
2018 146,420,669.00 100,933,699.00 85,808,280.00 48,098,429.00 NA
2020 9,348,941.00 6,488,941.00 NA NA NA
2020 2,406,075.00 4,011,072.00 NA NA NA
2020 3,101,081.00 4,923,625.00 NA NA NA
2020 17,796,711.00 18,456,472.00 NA NA NA
2020 310,576,064.00 245,269,656.00 NA NA NA
2020 607,172,062.00 465,415,036.00 NA NA NA
(Sorry ... I don't know how to put these into proper columns :( )
I never learned all those fancy tidystuff techniques. Here's a base R approach:
Firstand second: Use read.delim to bring in tab data and your data has (yeccch) commas in the numbers. (ignore the warnings, they are correct and you do want the NA's.)
Note that lapply in this case returns a list of numeric vectors which can be assigned back inot the origianl dataframe to overwrite the original character values. Or you could have created new columns which could then have gotten the same treatment as below. Now that the data is in, you can count the number of valid numbers and then calculate the CAGR for each row using
apply
on the numeric columns in a rowwise fashion:Problems remaining ... funds that did not have a value in the most recent year; funds that might have had discontinuous reporting. You need to say how these would be handled and provide example data if you want tested solutions.