Is it possible to duplicate this Excel pivot chart with ggplot2
or have something relatively similar? I really like that the chart displays the prices and the year-to-year percent changes on the legend below.
Here's the dput()
output of my sample dataset:
price <- structure(list(selling_date = structure(c(17869, 17870, 17870,
17872, 17872, 17872, 17875, 17878, 17879, 17879, 17882, 17882,
17885, 17889, 17892, 17893, 17893, 17893, 17896, 18232, 18232,
18233, 18234, 18235, 18235, 18235, 18236, 18236, 18239, 18239,
18240, 18242, 18242, 18243, 18243, 18246, 18250, 18250, 18254,
18257, 18260, 18260, 18261, 18261, 18261, 18598, 18599, 18599,
18600, 18607, 18612, 18613, 18614, 18617, 18617, 18619, 18619,
18619, 18624, 18626, 18627, 18964, 18967, 18968, 18970, 18970,
18976, 18977, 18977, 18981, 18983, 18990), class = "Date"), selling_price = c(800000,
625000, 105000, 580000, 419000, 360000, 795000, 320000, 790000,
625000, 779450, 415000, 615000, 555000, 500000, 710000, 655000,
600000, 639950, 680000, 280000, 365000, 730000, 581000, 360000,
530000, 950000, 825000, 550000, 300000, 385000, 590000, 590000,
660000, 410000, 510000, 799999, 575000, 690000, 700000, 565000,
735000, 866250, 339000, 360000, 674900, 560000, 435000, 1485000,
840000, 525000, 900000, 886000, 647000, 775000, 648000, 650000,
349000, 675000, 1225000, 1260000, 585000, 1100000, 1400000, 960000,
695000, 1000000, 1050000, 630000, 751000, 1225000, 1930721),
id_num = c(1364645, 1330850, 1385154, 1337665, 1349409, 1360807,
1356805, 1389378, 1333399, 1286526, 1380444, 1371559, 1325930,
1376221, 1386441, 1334435, 1333581, 1372726, 1364979, 1508940,
1534939, 1526490, 1522480, 1504380, 1536453, 1538182, 1490032,
1533847, 1504948, 1529791, 1521259, 1537746, 1538750, 1538455,
1528390, 1526742, 1538271, 1512930, 1518796, 1517900, 1538821,
1511886, 1544288, 1511768, 1535256, 1680111, 1683767, 1675425,
1639303, 1680942, 1669765, 1683493, 1685799, 1637667, 1682404,
1690692, 1677272, 1651257, 1687505, 1674430, 1601173, 1798143,
1857616, 1850499, 1857149, 1840874, 1831562, 1843883, 1859947,
1858923, 1856840, 1854640), year = c(2018, 2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020,
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2021, 2021, 2021,
2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021)), row.names = c(NA,
-72L), class = c("tbl_df", "tbl", "data.frame"))
I'm only able to recreate a basic line plot with the same data in ggplot2
, which is in the chart below.
The RStudio library I have loaded for this is library(tidyverse)
. Getting average and median price is easy. Here's the code I'm using for that:
price_groups <- price %>%
group_by(year) %>%
summarize(avg_sp = mean(selling_price),
med_sp = median(selling_price)
Getting the percent change year-to-year is in the code below. Unless there is another way, I've found I must create an object of my grouped summaries first before I can use the lag()
or lead()
functions on my groups.
price_groups <- price_groups %>%
mutate(lag_avg_sp = lag(avg_sp),
avg_pct_change = (avg_sp - lag_avg_sp) / lag_avg_sp,
lag_med_sp = lag(med_sp),
med_pct_change = (med_sp - lag_med_sp) / lag_med_sp) %>%
select(-c(lag_avg_sp, lag_med_sp))
Here's the code I've used so far in ggplot2
:
options(scipen = 999)
ggplot(price_groups, aes(x = year)) +
geom_line(aes(y = avg_sp, color = "Avg SP"), size = 2) +
geom_line(aes(y = med_sp, color = "Median SP"), size = 2) +
scale_y_continuous(breaks = seq(0, 1100000, by = 250000),
limits = c(0, 1100000),
labels = scales::dollar_format()) +
scale_color_manual(values = c("Avg SP" = "steelblue", "Median SP" = "dark gray")) +
labs(x = "Year",
y = "Selling Price",
color = "Price Type",
title = "Annual Change in Price\n") +
theme(legend.position = "bottom",
plot.title = element_text(hjust = 0.5))
I don't know enough about ggplot2
or the tidyverse
functionality to duplicate this Excel chart or create something relatively similar. I'd like to display average and median price along with average and median percent changes in the legend on the bottom on the same plot. Is this possible?
I'd prefer not to display the numbers on the lines themselves (above or below) as I think it looks cluttered.
Also, the date in the Excel chart is in wide format rather than long. After creating the price_groups
object, I tried messing with pivot_wider()
to recreate the format, but I don't have enough experience with this feature to figure it out. Is there a way to make the variables avg_sp
, avg_pct_change
, med_sp
, and med_pct_change
row names with the years as column names and have the values be the prices and percentages?
If recreating the Excel pivot chart isn't possible, I'd be fine with an option using pivot_wider()
to create a table and make a separate image I could append to the graph.
Here's a shot. It's not perfect, and I don't know how to align the years in the plot with the columns in the table ...
Several packages needed for this:
Data reshaping:
And two plot components (with a little more reshaping in
gg2
):