Why does this CSV data complicate a ggplot2 whisker plot?

Question

247 Views Asked by Léo Léopold Hertz 준영 At 17 August 2025 at 12:15

I can reproduce a working ggplot2 boxplot with the test data (mpg), but not with my CSV data in R. A simplified view of the data, with a single data point per event (Sleep and Awake):

"Vars"    , "Sleep", "Awake"
"Average" , 7      , 12
"Min"     , 4      , 5
"Max"     , 10     , 15

Real-life data about sleep:

"Vars"    , "Sleep1", "Sleep2", ...
"Average" , 7       , 5
"Min"     , 4       , 3
"Max"     , 10      , 8

Real-life data about being awake:

"Vars"    , "Awake1", "Awake2", ...
"Average" , 12      , 14
"Min"     , 10      , 7
"Max"     , 15      , 17

Code with the data embedded:

# only single point!
dat.m <- structure(list(Vars = structure(c(1L, 3L, 2L), .Label = c("Average ", 
"Max     ", "Min     "), class = "factor"), Sleep = c(7, 4, 10
), Awake = c(12L, 5L, 15L)), .Names = c("Vars", "Sleep", "Awake"
), class = "data.frame", row.names = c(NA, -3L))

library('ggplot2')    
# works:
str(mpg)
#mpg$class
#mpg$hwy
ggplot(mpg, aes(x = class, y = hwy)) +
    geom_boxplot()

# http://stackoverflow.com/a/44031194/54964
library(reshape2)  # provides melt()
m <- t(dat.m)
dat.m <- data.frame(m[2:nrow(m),])
names(dat.m) <- m[1,]
dat.m$Vars <- rownames(m)[2:nrow(m)]
dat.m <- melt(dat.m, id.vars = "Vars")

# TODO: this is where it goes wrong, although it should not
ggplot(dat.m, aes(x = Vars, y = value, fill = variable)) +
    geom_boxplot()

The test data output is shown in Fig. 1 and the output of my code in Fig. 2.

[Fig. 1: Test data output. Fig. 2: Output of the code.]

The answer below assumes the following quartiles:

 # http://stackoverflow.com/a/44043313/54964
 quartiles <- data.frame(Vars = c("Q1","Q3"), Sleep = c(6,8), 
               Awake = c(9,13))

I want to set Q1 <- 0.25 * average and Q3 <- 0.75 * average. Assume you have any number of main fields (here Sleep and Awake). How can you query the data (here dat.m) to get the min and max of each main field?

R: 3.3.3
OS: Debian 8.7

**Edgar Santos** · Accepted Answer

There is a base R function that draws boxplots from precomputed summary statistics: bxp(). It needs the 25th, 50th and 75th percentiles, also known as the lower quartile (Q1), the median (Q2) and the upper quartile (Q3), in addition to the minimum and maximum.

For example:

bxp(list(stats = matrix(c( 4,6,7,9,10, 10,11,12,14,15), nrow = 5,
 ncol = 2), n = c(30,30), names = c("Sleep", "Awake")))

Now using your data: (Edited)

Let us use the first dataset that you introduced:

dat.m <- structure(list(Vars = structure(c(1L, 3L, 2L), .Label = c("Average ", 
"Max     ", "Min     "), class = "factor"), Sleep = c(7, 4, 10
), Awake = c(12L, 5L, 15L)), .Names = c("Vars", "Sleep", "Awake"
), class = "data.frame", row.names = c(NA, -3L))

> dat.m
      Vars Sleep Awake
1 Average      7    12
2 Min          4     5
3 Max         10    15


> str(dat.m)
'data.frame':   3 obs. of  3 variables:
 $ Vars : Factor w/ 3 levels "Average ","Max     ",..: 1 3 2
 $ Sleep: num  7 4 10
 $ Awake: int  12 5 15

In your data, the first and third quartiles are missing. The second quartile (the median) is also needed, but let us assume that it is equal to the mean. I will assume that you have all of them, e.g.:

quartiles <- data.frame(Vars = c("Q1","Q3"), Sleep = c(6,8), 
                    Awake = c(9,13))

> str(quartiles)
'data.frame':   2 obs. of  3 variables:
 $ Vars : Factor w/ 2 levels "Q1","Q3": 1 2
 $ Sleep: num  6 8
 $ Awake: num  9 13
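
As a quick check on the rule proposed in the question (Q1 = 0.25 * average, Q3 = 0.75 * average), the sketch below pulls Average, Min and Max for every field and tests whether the resulting five numbers are in boxplot order. It rebuilds the question's table under the hypothetical name smry with unpadded labels; fields, avg, mn, mx, q1, q3 and valid are illustrative names, not part of the original code.

```r
# Summary table from the question, rebuilt with unpadded labels
# (smry is a hypothetical name; the values are copied from dat.m).
smry <- data.frame(
  Vars  = c("Average", "Min", "Max"),
  Sleep = c(7, 4, 10),
  Awake = c(12, 5, 15),
  stringsAsFactors = FALSE
)

# Every column except the label column is a "main field",
# so this works for any number of fields.
fields <- setdiff(names(smry), "Vars")

# Named numeric vectors, one entry per field.
avg <- unlist(smry[smry$Vars == "Average", fields])
mn  <- unlist(smry[smry$Vars == "Min", fields])
mx  <- unlist(smry[smry$Vars == "Max", fields])

# The rule proposed in the question.
q1 <- 0.25 * avg
q3 <- 0.75 * avg

# A box-and-whisker plot needs Min <= Q1 <= median <= Q3 <= Max per field
# (Average stands in for the median here).
valid <- mn <= q1 & q1 <= avg & avg <= q3 & q3 <= mx

print(rbind(Min = mn, Q1 = q1, Average = avg, Q3 = q3, Max = mx))
print(valid)
```

For these values the rule puts Q1 below Min in both fields (for Sleep, 0.25 * 7 = 1.75 < 4), which is one reason to work with assumed quartiles such as those above instead.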


data <- rbind(dat.m ,quartiles)

      Vars Sleep Awake
1 Average      7    12
2 Min          4     5
3 Max         10    15
4 Q1           6     9
5 Q3           8    13

Then sort the rows so that each column runs from Min to Max, which is the order bxp() expects:

library(dplyr)
## Sort by the two known columns; comment this line out to use the generic approach below
data <- dplyr::arrange(data, Sleep, Awake)
## Generic approach for any number of columns (note: arrange_() is deprecated since dplyr 0.7.0)
# data <- arrange_(data, .dots = as.list(strsplit(colnames(data)[2:ncol(data)], ', ')))

# n is the number of observations per group; n = 30 is assumed here.
bxp(list(stats = as.matrix(data[,2:3]), n = c(30,30), names = names(data[,2:3])))
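
Sorting rows by Sleep and then Awake only gives the right order when every column ranks the statistics the same way. A minimal sketch of an alternative is to pick the rows by name in the order bxp() expects. It assumes the combined table from above, rebuilt here under the hypothetical name combined; stat_order, fields and stats are likewise illustrative names, and n = 30 is still an assumption.

```r
# Combined table from above (Average, Min, Max plus the assumed Q1 and Q3).
# The padded labels mimic what the CSV import produced.
combined <- data.frame(
  Vars  = c("Average ", "Min     ", "Max     ", "Q1", "Q3"),
  Sleep = c(7, 4, 10, 6, 8),
  Awake = c(12, 5, 15, 9, 13),
  stringsAsFactors = FALSE
)

# Strip the padding so the labels can be matched by name.
combined$Vars <- trimws(as.character(combined$Vars))

# bxp() reads each column of `stats` as:
# lower whisker, lower hinge, median, upper hinge, upper whisker.
# "Average" stands in for the median, as assumed above.
stat_order <- c("Min", "Q1", "Average", "Q3", "Max")

# Every column except the label column is a field, however many there are.
fields <- setdiff(names(combined), "Vars")

# Select rows by name instead of sorting; drop = FALSE keeps a
# data frame even when there is only one field.
stats <- as.matrix(combined[match(stat_order, combined$Vars), fields, drop = FALSE])
rownames(stats) <- stat_order

bxp(list(stats = stats, n = rep(30, length(fields)), names = fields))
```

Because the rows are matched by label, the same code should work unchanged when more fields (Sleep1, Sleep2, ...) are added as columns.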

With ggplot2

We first convert the dataset from 'wide' to 'long' format with reshape2::melt().

library(reshape2)
library(ggplot2)
(data2 <- melt(data))

       Vars variable value
1  Min         Sleep     4
2  Q1          Sleep     6
3  Average     Sleep     7
4  Q3          Sleep     8
5  Max         Sleep    10
6  Min         Awake     5
7  Q1          Awake     9
8  Average     Awake    12
9  Q3          Awake    13
10 Max         Awake    15

Then:

ggplot(data2, aes(x = variable, y = value)) +
  geom_boxplot()
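
Note that geom_boxplot() recomputes its statistics from the five values per field, which happens to reproduce them for this data. If you would rather hand the five numbers to ggplot2 directly, geom_boxplot() also accepts precomputed values via stat = "identity". The following is a minimal sketch under that approach; box_stats and its column names are hypothetical, and the values are the ones assumed above.

```r
library(ggplot2)

# One row per field, one column per statistic (values copied from above).
box_stats <- data.frame(
  field  = c("Sleep", "Awake"),
  ymin   = c(4, 5),     # Min
  lower  = c(6, 9),     # Q1 (assumed)
  middle = c(7, 12),    # Average, standing in for the median
  upper  = c(8, 13),    # Q3 (assumed)
  ymax   = c(10, 15)    # Max
)

# stat = "identity" tells geom_boxplot() to draw the supplied numbers
# instead of computing quartiles from raw observations.
p <- ggplot(box_stats,
            aes(x = field, ymin = ymin, lower = lower,
                middle = middle, upper = upper, ymax = ymax)) +
  geom_boxplot(stat = "identity")

print(p)
```

With this form the whiskers always end at the supplied Min and Max, regardless of how far they lie from the box.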

You might find these articles interesting:

Points of Significance: Visualizing samples with box plots (http://www.nature.com/nmeth/journal/v11/n2/full/nmeth.2813.html)
The Box Plot: A Simple Visual Method to Interpret Data (http://annals.org/aim/article/703149/box-plot-simple-visual-method-interpret-data)
Variations of box plots (http://amstat.tandfonline.com/doi/abs/10.1080/00031305.1978.10479236)
