ggplot geom_histogram is not aligning with the left axis


I'm trying to make the first bar of my histogram touch the left axis, but I'm not sure whether I'm passing in the correct values.

Here is my current code:

library(dplyr)
library(ggplot2)

xlimit <- 1000000
upper <- max(unique_cd$Total_Medicare_Payment)
# Male and female profiles at or above the payment threshold
mprofiles_over_x <- unique_cd |>
  filter(Total_Medicare_Payment >= xlimit, Gender == "M")
fprofiles_over_x <- unique_cd |>
  filter(Total_Medicare_Payment >= xlimit, Gender == "F")

# funique_cd and munique_cd (the denominators below) are defined elsewhere
ggplot() +
  geom_histogram(data = fprofiles_over_x, aes(x = Total_Medicare_Payment, y = after_stat(count / nrow(funique_cd)), fill = "Female"),
                 color = "black", bins = 75, alpha = 0.3, linewidth = 0.1, na.rm = TRUE) +
  geom_histogram(data = mprofiles_over_x, aes(x = Total_Medicare_Payment, y = after_stat(count / nrow(munique_cd)), fill = "Male"),
                 color = "black", bins = 75, alpha = 0.3, linewidth = 0.1, na.rm = TRUE) +
  scale_x_continuous(limits = c(xlimit, 12000000), breaks = seq(xlimit, 12000000, by = 3000000), labels = scales::comma) +
  scale_y_continuous(limits = c(0, .001), breaks = seq(0, .001, by = .00025), labels = scales::percent) +
  labs(x = "Total Medicare Payment (2017-2021)", y = "% Radiologists") +
  scale_fill_manual(values = c("Female" = "orange", "Male" = "lightblue"), name = "Gender") +
  theme_minimal() +
  theme(
    legend.position = c(.95, .95),
    legend.justification = c("right", "top"),
    legend.box.just = c("right", "center"),
    legend.box.background = element_rect(linewidth = .1),
    legend.box.margin = margin(-4, 1, 1, 1),
    legend.title = element_blank(),
    legend.text = element_text(size = 7),
    legend.key.width = unit(.4, "cm"),
    legend.key.height = unit(0.4, "cm")
  )

This gives me the following histogram. The values look correct, but visually I don't understand why the leftmost bar starts in the wrong place instead of at the left axis.

I tried setting boundary = 0, but it didn't help.

[Image: the resulting histogram]

Answer by Andy Baxter:

Without your data we can't be sure, but it's quite likely that your taller histogram bars are being cut off by the limits argument in scale_y_continuous(). To demonstrate how this happens, let's create some random data and plot the values above 1,000,000:

library(tidyverse)

xlimit <- 1000000

df <- tibble(a = rnorm(10000, 900000, sd = 2500000),
             Gender = sample(c("Male", "Female"), 10000, replace = TRUE))

df |>
  filter(a >= xlimit) |>
  ggplot(aes(a, fill = Gender)) +
  geom_histogram(
    aes(y = after_stat(count / sum(count))),
    breaks = seq(xlimit, 12000000, 200000),
    color = "black",
    alpha = 0.3,
    linewidth = 0.1,
    position = "identity"
  ) +
  scale_x_continuous(
    limits = c(xlimit, NA),
    breaks = seq(xlimit, 12000000, by = 3000000),
    labels = scales::comma
  ) 

So far so good: there are counts in every bin from the 1,000,000 filter threshold upwards. The next step is to add formatted tick marks on the y-axis, with limits on the y-scale as in your code:

df |>
  filter(a >= xlimit) |>
  ggplot(aes(a, fill = Gender)) +
  geom_histogram(
    aes(y = after_stat(count / sum(count))),
    breaks = seq(xlimit, 12000000, 200000),
    color = "black",
    alpha = 0.3,
    linewidth = 0.1,
    position = "identity"
  ) +
  scale_x_continuous(
    limits = c(xlimit, NA),
    breaks = seq(xlimit, 12000000, by = 3000000),
    labels = scales::comma
  ) +
  scale_y_continuous(
    limits = c(0, .01),
    breaks = seq(0, .01, by = .0025),
    labels = scales::percent
  ) 
#> Warning: Removed 39 rows containing missing values or values outside the scale range
#> (`geom_bar()`).

Then, just like in your graph, many of the bars just above 1,000,000 have disappeared! The warning message gives us a clue: when we limit the y-axis to 0.01, any bar taller than 1% falls outside the scale range and is removed (39 rows here). The solution is to remove the limits altogether. We still want tick marks from breaks on the y-scale, and the breaks sequence can run up to any large value (here 1, i.e. 100%); breaks beyond the data are simply not drawn, so the axis fits the data naturally without cutting off any bars:

df |>
  filter(a >= xlimit) |>
  ggplot(aes(a, fill = Gender)) +
  geom_histogram(
    aes(y = after_stat(count / sum(count))),
    breaks = seq(xlimit, 12000000, 200000),
    color = "black",
    alpha = 0.3,
    linewidth = 0.1,
    position = "identity"
  ) +
  scale_x_continuous(
    limits = c(xlimit, NA),
    breaks = seq(xlimit, 12000000, by = 3000000),
    labels = scales::comma
  ) +
  scale_y_continuous(
    breaks = seq(0, 1, by = .0025),
    labels = scales::percent
  )
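If you do want the y-axis to stop at 1% (for example, to match another figure), one option is to zoom with coord_cartesian() instead of setting scale limits. Scale limits apply ggplot2's default out-of-bounds handling (scales::censor), which turns values outside the limits into NA, so those bars are dropped before drawing. coord_cartesian() only changes the visible window, so taller bars are clipped at the top of the panel rather than removed. This is a minimal sketch, assuming ggplot2 3.4 or later (for linewidth). The data is the same kind of random stand-in as above, built with base R instead of the tidyverse and with set.seed() added for reproducibility. The object name p is illustrative.

```r
library(ggplot2)

# Random stand-in data (hypothetical), seeded so the result is reproducible
set.seed(1)
xlimit <- 1000000
df <- data.frame(
  a = rnorm(10000, 900000, sd = 2500000),
  Gender = sample(c("Male", "Female"), 10000, replace = TRUE)
)
df <- df[df$a >= xlimit, ]

p <- ggplot(df, aes(a, fill = Gender)) +
  geom_histogram(
    # Share of all plotted observations in each bin
    aes(y = after_stat(count / sum(count))),
    # Explicit breaks so the first bin starts exactly at xlimit
    breaks = seq(xlimit, 12000000, 200000),
    color = "black",
    alpha = 0.3,
    linewidth = 0.1,
    position = "identity"
  ) +
  scale_x_continuous(
    limits = c(xlimit, NA),
    breaks = seq(xlimit, 12000000, by = 3000000),
    labels = scales::comma
  ) +
  scale_y_continuous(
    breaks = seq(0, 1, by = 0.0025),
    labels = scales::percent
  ) +
  # Zoom to 0-1% without censoring: bars taller than 1% are clipped
  # at the top of the panel instead of being removed
  coord_cartesian(ylim = c(0, 0.01))

p
```

Alternatively, keeping limits = c(0, .01) in scale_y_continuous() and adding oob = scales::oob_keep should have a similar effect, because it keeps out-of-range values instead of replacing them with NA.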