Showing flows for ggalluvium

402 Views Asked by At

Seeking some advice around the use of ggalluvium to demonstrate the distribution of preferences in Australia.

Context, in Australia we have preferential voting. Say I live in an area with 4 candidates contesting. The ballot is completed by numbering a box 1-4 according to your party/candidate preference. The candidate with the lowest proportion of the vote after the first count will be eliminated and their votes will be apportioned to where their voters have indicated on their ballot paper. This process is reiterated until two candidates remain and a candidate is elected when they have greater than 50% of the two party preferred vote.

I'm seeking to visualise the above reiterating distribution process using flow diagram, and ggalluvium.

However I can't quite seem to plot the aesthetics to show the flows feeding votes to candidates in the next count of the votes.

Here's what I get so far:

library(tidyverse)
library(magrittr)
library(ggalluvial)


Load Data
house_of_reps <- read_csv("https://results.aec.gov.au/24310/Website/Downloads/HouseDopByDivisionDownload-24310.csv", skip = 1)
house_of_reps$BallotPosition  %<>% as.factor()
house_of_reps$CountNumber %<>% as.factor()

cooper <- house_of_reps %>% 
      filter(DivisionNm == "Cooper") %>% 
      spread(CalculationType, CalculationValue) %>% 
      select(4,9,10,14)

cooper %>% ggplot(aes(x = CountNumber, alluvium = PartyNm, stratum = `Preference Percent`, y = `Preference Percent`, fill = PartyAb)) +
       geom_alluvium(aes(fill = PartyAb), decreasing = TRUE) +
       geom_stratum(decreasing = TRUE) +
       geom_text(stat = "stratum",decreasing = TRUE, aes(label = after_stat(fill))) +
       stat_stratum(decreasing = TRUE) +
       stat_stratum(geom = "text", aes(label = PartyAb), decreasing = TRUE) +
       scale_fill_viridis_d() +
       theme_minimal()

Output image

Would appreciate any guidance on how to show where the votes after each subsequent count are flowing to which political party in the next stratum.

1

There are 1 best solutions below

1
On BEST ANSWER

Unfortunately your dataset is not well suited for the kind of plot you have in mind. While the plotting itself is easy, to achieve the desired plot involves "some" data wrangling and preparation steps.

The general issue is that your dataset as is does not show the flow of votes from one party to a second. It only shows the overall number of votes a party lost or receivd in each count.

However, as in each step only one party drops out this missing information could be extracted from your data. The basic idea is to split the obs for each party or more precisely each party which drops out in one of the later counts by voter's secondary party preference.

Not sure wether each step is clear but I added some explanations as comments and added a plot of the final structure of the dataset which hopefully makes it clearer what's the final result of all the steps:

library(tidyverse)
library(magrittr)
library(ggalluvial)

# Load Data
house_of_reps <- read_csv("https://results.aec.gov.au/24310/Website/Downloads/HouseDopByDivisionDownload-24310.csv", skip = 1)
house_of_reps$BallotPosition  %<>% as.factor()
house_of_reps$CountNumber %<>% as.factor()

cooper <- house_of_reps %>% 
  filter(DivisionNm == "Cooper") %>% 
  spread(CalculationType, CalculationValue) %>% 
  select(count = CountNumber, party = PartyAb, pref = `Preference Count`, trans = `Transfer Count`)

# Helper function to
make_rows <- function(x) {
  # Name of party which gets dropped in this period
  dropped <- filter(x, trans < 0) %>% pull(party)
  if (length(dropped) > 0) {
    x <- filter(x, trans >= 0)
    # Replacements are added two times. Once for the period where the party drops out, 
    # and also for the previous period
    xdrop <- mutate(x, party = dropped, pref = trans, trans = 0, is_drop = FALSE)
    xdrop1 <- mutate(xdrop, count = count - 1, to = party, is_drop = FALSE)
    # For the parties to keep or which receive transfered votes have to adjust the number of votes
    xkeep <- mutate(x, pref = pref - trans, trans = 0) 
    bind_rows(xdrop1, xdrop, xkeep)  
  } else {
    x
  }
}

cooper1 <- cooper %>% 
  # First: Convert count to a numeric. Add a "to" variable for second 
  # party preference or the party where votes are transferred to. This variable 
  # will later on be mapped on the "fill" aes 
  mutate(to = party, count = as.numeric(as.character(count))) %>% 
  group_by(party) %>%
  # Add identifier of obs. to drop. Obs. to drop are obs. of parties which 
  # drop out in the following count
  mutate(is_drop = lead(trans, default = 0) < 0) %>% 
  ungroup() %>% 
  # Split obs. to be dropped by secondary party preference, i.e. in count 0 the 
  # obs for party "IND" is replaced by seven obs. reflecting the secondary preference 
  # for one of the other seven parties
  split(.$count) %>% 
  map(make_rows) %>% 
  bind_rows() %>% 
  # Now drop original obs.
  filter(!is_drop, pref > 0) %>%
  # Add a unique identifier
  group_by(count, party) %>% 
  mutate(id = paste0(party, row_number())) %>% 
  ungroup() %>% 
  # To make the flow chart work we have make the dataset complete, i.e. add 
  # "empty" obs for each type of voter and each count
  complete(count, id, fill = list(pref = 0, trans = 0, is_drop = FALSE)) %>% 
  # Fill up party and "to" columns  
  mutate(across(c(party, to), ~ if_else(is.na(.), str_extract(id, "[^\\d]+"), .))) %>%
  # Filling up the "to" column with last observed value for "to" if any
  group_by(id) %>% 
  mutate(last_id = last(which(party != to)),
         to = if_else(count >= last_id & !is.na(last_id), to[last_id], to)) %>% 
  ungroup()

The final structure of the dataset could be illustrated by means of a tile plot:

cooper1 %>% 
  add_count(count, party) %>% 
  ggplot(aes(count, reorder(id, n), fill = to)) +
  geom_tile(color = "white")

As I said, after all the cumbersome data wrangling making the flow chart itself is the easiest task and could be achieved like so:

cooper1 %>% 
  ggplot(aes(x = count, alluvium = id, stratum = to, y = pref, fill = to)) +
  geom_flow(decreasing = TRUE) +
  geom_stratum(decreasing = TRUE) +
  scale_fill_viridis_d() +
  theme_minimal()