I need to summarize a huge CSV file (nrow =1102300). This is daily climate data from various climatic models.
First, I want to sum up all the columns with "historical" names. My target is the maximum values filtered by Date (in Year Year (i.e. 1950, 1951, etc ) for all the unique "lat" and "lon."
All help will be appreciated.
The dataframe is like this:
df = read.csv(text = '"lat","lon","Date","pr_CMCC.ESM2_historical","pr_GFDL.ESM4_historical_ssp126","pr_BCC.CSM2.MR_historical_ssp126","pr_INM.CM4.8_historical_ssp126","pr_FGOALS.g3_historical_ssp126","pr_TaiESM1_historical_ssp126","pr_NorESM2.MM_historical_ssp126","pr_CanESM5_historical_ssp126","pr_KIOST.ESM_historical_ssp126","pr_NorESM2.LM_historical_ssp126","pr_INM.CM5.0_historical_ssp126"
46.29166646,-62.62500314,1/1/1950 12:00,1.7243347,6.10E-05,6.10E-05,2.5483093,1.7853699,6.10E-05,1.846405,6.10E-05,1.4954529,1.4496765,3.769043
46.29166646,-62.62500314,1/2/1950 12:00,6.10E-05,6.10E-05,6.10E-05,9.24704,6.10E-05,12.741333,6.10E-05,6.424103,0.56463623,6.10E-05,1.1139832
46.29166646,-62.62500314,1/3/1950 12:00,6.10E-05,6.10E-05,6.10E-05,6.10E-05,6.10E-05,1.052948,6.10E-05,1.1445007,6.10E-05,6.10E-05,6.10E-05
46.29166646,-62.62500314,1/4/1950 12:00,7.965271,6.10E-05,6.10E-05,6.5919495,1.9684753,6.10E-05,6.10E-05,1.4191589,6.10E-05,0.70196533,3.9368896',header = TRUE)
I want my final output dataframe arrainged like this:
lat | lon | Value
Where Value = Maximum value for each year.
Perhaps this:
This assumes that your
lonandlatare relatively invariant, though even with that sometimes grouping by floating point numbers may be problematic (see Why are these numbers not equal? and https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f). I suggest that if the rest of your data has any variance inlonandlatfor each year, then there are likely different things you should be doing to aggregate your data.The use of
.by=requiresdplyr_1.1.0or newer; if you have an older version, change fromsummarize(.by=c(..), stuff)togroup_by(..) |> summarize(stuff) |> ungroup().