Can this Excel pivot chart be duplicated in ggplot?

217 Views Asked by At

Is it possible to duplicate this Excel pivot chart with ggplot2 or have something relatively similar? I really like that the chart displays the prices and the year-to-year percent changes on the legend below. enter image description here

Here's the dput() output of my sample dataset:

price <- structure(list(selling_date = structure(c(17869, 17870, 17870, 
                                          17872, 17872, 17872, 17875, 17878, 17879, 17879, 17882, 17882, 
                                          17885, 17889, 17892, 17893, 17893, 17893, 17896, 18232, 18232, 
                                          18233, 18234, 18235, 18235, 18235, 18236, 18236, 18239, 18239, 
                                          18240, 18242, 18242, 18243, 18243, 18246, 18250, 18250, 18254, 
                                          18257, 18260, 18260, 18261, 18261, 18261, 18598, 18599, 18599, 
                                          18600, 18607, 18612, 18613, 18614, 18617, 18617, 18619, 18619, 
                                          18619, 18624, 18626, 18627, 18964, 18967, 18968, 18970, 18970, 
                                          18976, 18977, 18977, 18981, 18983, 18990), class = "Date"), selling_price = c(800000, 
                                                                                                                        625000, 105000, 580000, 419000, 360000, 795000, 320000, 790000, 
                                                                                                                        625000, 779450, 415000, 615000, 555000, 500000, 710000, 655000, 
                                                                                                                        600000, 639950, 680000, 280000, 365000, 730000, 581000, 360000, 
                                                                                                                        530000, 950000, 825000, 550000, 300000, 385000, 590000, 590000, 
                                                                                                                        660000, 410000, 510000, 799999, 575000, 690000, 700000, 565000, 
                                                                                                                        735000, 866250, 339000, 360000, 674900, 560000, 435000, 1485000, 
                                                                                                                        840000, 525000, 900000, 886000, 647000, 775000, 648000, 650000, 
                                                                                                                        349000, 675000, 1225000, 1260000, 585000, 1100000, 1400000, 960000, 
                                                                                                                        695000, 1000000, 1050000, 630000, 751000, 1225000, 1930721), 
               id_num = c(1364645, 1330850, 1385154, 1337665, 1349409, 1360807, 
                          1356805, 1389378, 1333399, 1286526, 1380444, 1371559, 1325930, 
                          1376221, 1386441, 1334435, 1333581, 1372726, 1364979, 1508940, 
                          1534939, 1526490, 1522480, 1504380, 1536453, 1538182, 1490032, 
                          1533847, 1504948, 1529791, 1521259, 1537746, 1538750, 1538455, 
                          1528390, 1526742, 1538271, 1512930, 1518796, 1517900, 1538821, 
                          1511886, 1544288, 1511768, 1535256, 1680111, 1683767, 1675425, 
                          1639303, 1680942, 1669765, 1683493, 1685799, 1637667, 1682404, 
                          1690692, 1677272, 1651257, 1687505, 1674430, 1601173, 1798143, 
                          1857616, 1850499, 1857149, 1840874, 1831562, 1843883, 1859947, 
                          1858923, 1856840, 1854640), year = c(2018, 2018, 2018, 2018, 
                                                               2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 
                                                               2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 
                                                               2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 
                                                               2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 
                                                               2019, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 
                                                               2020, 2020, 2020, 2020, 2020, 2020, 2020, 2021, 2021, 2021, 
                                                               2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021)), row.names = c(NA, 
                                                                                                                               -72L), class = c("tbl_df", "tbl", "data.frame"))

I'm only able to recreate a basic line plot with the same data in ggplot2, which is in the chart below.

enter image description here

The RStudio library I have loaded for this is library(tidyverse). Getting average and median price is easy. Here's the code I'm using for that:

price_groups <- price %>% 
  group_by(year) %>% 
  summarize(avg_sp = mean(selling_price),
            med_sp = median(selling_price)

Getting the percent change year-to-year is in the code below. Unless there is another way, I've found I must create an object of my grouped summaries first before I can use the lag() or lead() functions on my groups.

price_groups <- price_groups %>% 
  mutate(lag_avg_sp = lag(avg_sp),
     avg_pct_change = (avg_sp - lag_avg_sp) / lag_avg_sp,
     lag_med_sp = lag(med_sp),
     med_pct_change = (med_sp - lag_med_sp) / lag_med_sp) %>% 
  select(-c(lag_avg_sp, lag_med_sp))

Here's the code I've used so far in ggplot2:

options(scipen = 999)

ggplot(price_groups, aes(x = year)) +
  geom_line(aes(y = avg_sp, color = "Avg SP"), size = 2) +
  geom_line(aes(y = med_sp, color = "Median SP"), size = 2) +
  scale_y_continuous(breaks = seq(0, 1100000, by = 250000),
                     limits = c(0, 1100000),
                     labels = scales::dollar_format()) +
  scale_color_manual(values = c("Avg SP" = "steelblue", "Median SP" = "dark gray")) +
  labs(x = "Year",
       y = "Selling Price",
       color = "Price Type",
       title = "Annual Change in Price\n") +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5))

I don't know enough about ggplot2 or the tidyverse functionality to duplicate this Excel chart or create something relatively similar. I'd like to display average and median price along with average and median percent changes in the legend on the bottom on the same plot. Is this possible?

I'd prefer not to display the numbers on the lines themselves (above or below) as I think it looks cluttered.

Also, the date in the Excel chart is in wide format rather than long. After creating the price_groups object, I tried messing with pivot_wider() to recreate the format, but I don't have enough experience with this feature to figure it out. Is there a way to make the variables avg_sp, avg_pct_change, med_sp, and med_pct_change row names with the years as column names and have the values be the prices and percentages?

If recreating the Excel pivot chart isn't possible, I'd be fine with an option using pivot_wider() to create a table and make a separate image I could append to the graph.

1

There are 1 best solutions below

2
On

Here's a shot. It's not perfect, and I don't know how to align the years in the plot with the columns in the table ...

Several packages needed for this:

library(ggplot2)
library(dplyr)
library(tidyr) # pivot_*
# library(gridExtra) # tableGrob
library(patchwork)

Data reshaping:

price_summ <- price %>%
  mutate(year = format(selling_date, format = "%Y")) %>%
  group_by(year) %>%
  summarize(
    "Avg SP" = mean(selling_price),
    "Median SP" = median(selling_price),
    .groups = "drop"
  ) %>%
  arrange(year) %>%
  mutate(
    across(
      `Avg SP`:`Median SP`,
      ~ c(NA, zoo::rollapplyr(., 2, FUN = function(z) diff(z) / z[1])),
      .names = "{sub('SP', '% Change', .col)}"
    )
  ) %>%
  pivot_longer(-year, names_to = "Price Type", values_to = "Selling Price")
price_summ
# # A tibble: 16 x 3
#    year  `Price Type`    `Selling Price`
#    <chr> <chr>                     <dbl>
#  1 2018  Avg SP             573074.     
#  2 2018  Median SP          615000      
#  3 2018  Avg % Change           NA      
#  4 2018  Median % Change        NA      
#  5 2019  Avg SP             574086.     
#  6 2019  Median SP          578000      
#  7 2019  Avg % Change            0.00177
#  8 2019  Median % Change        -0.0602 
#  9 2020  Avg SP             783431.     
# 10 2020  Median SP          674950      
# 11 2020  Avg % Change            0.365  
# 12 2020  Median % Change         0.168  
# 13 2021  Avg SP            1029702.     
# 14 2021  Median SP         1000000      
# 15 2021  Avg % Change            0.314  
# 16 2021  Median % Change         0.482  

And two plot components (with a little more reshaping in gg2):

gg1 <- dplyr::filter(price_summ, !grepl("%", `Price Type`)) %>%
  ggplot(aes(as.integer(year), `Selling Price`)) +
  geom_line(aes(group = `Price Type`, color = `Price Type`)) +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(x = NULL) +
  theme(legend.position = "bottom")

gg2 <- price_summ %>%
  mutate(
    value2 = case_when(
      is.na(`Selling Price`) ~ "",
      grepl("%", `Price Type`) ~ scales::percent(`Selling Price`, accuracy = 0.01),
      TRUE ~ scales::dollar(`Selling Price`)
    )
  ) %>%
  pivot_wider(`Price Type`, names_from = "year", values_from = "value2") %>%
  arrange(grepl("%", `Price Type`), `Price Type`) %>%
  rename(` ` = `Price Type`) %>%
  gridExtra::tableGrob(., rows = NULL)

gg1 / gg2

ggplot2 line plot and table under it