GNU Octave - legend() with boxplot()


I am having problems plotting two or more boxes with boxplot() and using legend().

In short: boxplot() seems to create seven entries per box (the box itself, the quartiles, the median and so on). So when I hand two strings to legend(), the first one labels the first box, but the second one does not label the second box; it labels one of the other elements of the first box (I guess). When I create a legend() with eight entries, the last one is the one representing the second box.

(The behaviour of boxplot() seems to differ between my systems: on Windows the first entry is a real rectangle, while the next six are lines. On Linux all of the entries are lines, so the problem is somewhat hidden.)

Here's an example, where the third legend entry should be the dashed line, to show the issue.

The question is how to access the second box to pass it to legend() and leave out the six other parameters of the first box.

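pkg load statistics                  % boxplot() is provided by the statistics package

data1 = randn(100, 1);
data2 = randn(100, 1) + 3;

figure;

b = boxplot({data1, data2});         % two boxes
hold on;
p1 = plot([0 2.5], [2 2], '--');     % extra dashed line, meant to be legend entry 3
legend({'data1', 'data2', 'line'})   % entry 2 ends up labeling part of the first box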

Answer by Nick J:

boxplot isn't designed to work well with legend, as you seem to have discovered. It's designed to provide a useful visualization, so its elements and data aren't especially convenient to reuse for other statistics or plotting purposes. That said, there are two ways to get what you want: either rearrange the child objects so that legend puts them in the order you want, or (easier, in my opinion) call legend with the graphics handles of only the lines you want to see in the legend. Looking at help legend, you'll see there's a calling form:

legend (HOBJS, ...).

HOBJS is just an array of graphics handles. So the rest of this is about finding the correct handles you want to display.
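As a generic illustration of that calling form (ordinary plot() lines rather than boxplot; the variable names are arbitrary), this sketch passes only two of three line handles to legend, so the middle line gets no entry at all:

```octave
% Sketch: only the handles passed to legend() get entries, in the order given.
x = linspace (0, 2*pi, 50);
figure;
h_sin = plot (x, sin (x));               % first line
hold on;
h_cos = plot (x, cos (x));               % second line, deliberately left out
h_ref = plot ([0 2*pi], [0 0], "k--");   % dashed reference line
hleg = legend ([h_sin, h_ref], {"sin(x)", "zero line"});
```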

It would have been helpful to see exactly what you're seeing on screen, as well as which versions of Octave and of the statistics package (which provides boxplot) you're using. Although your comment says the makeup of the boxes depends on your operating system, I'm not seeing this on either my Windows or my Ubuntu installation of Octave (version 8.4.0 with statistics 1.6.0). I suspect it is more likely due to your graphics toolkit setting, which you can check with the graphics_toolkit command. Perhaps in older versions of Octave, gnuplot used individual lines instead of a single multi-point line object to draw the boxes? (Testing in 8.4.0, each box is still a single line object.) The reason I say this is that, as I'll show below, the first two items in the legend are the lines for the two boxes.

By default, legend displays all items associated with the axes object's children, in the reverse of the order in which their handles are stored in the axes "children" property. You can see this by simply typing legend without any parameters, which shows the default legend with all objects:

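pkg load statistics                  % provides boxplot()
data1 = randn(100, 1);
data2 = randn(100, 1) + 3;
b = boxplot({data1, data2})
hold on;
p1 = plot([0 2.5], [2 2], 'g--')     % green dashed line, easy to identify by color
legend                               % default legend: one entry per axes child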

[Figure: base boxplot with extra dashed green line]
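The ordering rule is easier to see without boxplot in the way. In this sketch (three ordinary lines; the names are arbitrary), the newest line is stored first in the axes children, and the default legend walks that list from the end, which puts the labels back in creation order:

```octave
% Sketch: creation order vs. axes "children" order vs. legend order.
figure;
h_first  = plot (1:3, [1 2 3]);
hold on;
h_second = plot (1:3, [3 2 1]);
h_third  = plot (1:3, [2 2 2], "--");
kids = get (gca, "children");
% The most recently created object is stored first in "children" ...
disp (isequal (kids, [h_third; h_second; h_first]))
% ... and the default legend reads that list bottom-up, so these labels
% are assigned to the lines in the order they were created.
legend ("first", "second", "third");
```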

Looking at the axes children:

>> hax = gca;
>> plot_children = get (hax, "children")
plot_children =

  -49.192
  -50.009
  -51.401
  -52.245
  -53.519
  -54.200
  -55.394
  -60.358
  -61.554
  -62.016
  -63.631
  -64.866
  -65.637
  -66.030

>> get (plot_children, "type")
ans =
{
  [1,1] = line
  [2,1] = line
  [3,1] = line
  [4,1] = line
  [5,1] = line
  [6,1] = line
  [7,1] = line
  [8,1] = line
  [9,1] = line
  [10,1] = line
  [11,1] = line
  [12,1] = line
  [13,1] = line
  [14,1] = line
}

>> get (plot_children, "color")
ans =
{
  [1,1] =
     0   1   0
  [2,1] =
     1   0   0
  [3,1] =
     1   0   0
  [4,1] =
     1   0   0
  [5,1] =
     0   0   1
  [6,1] =
     0   0   1
  [7,1] =
     0   0   1
  [8,1] =
     0   0   1
  [9,1] =
     0   0   1
  [10,1] =
     0   0   1
  [11,1] =
     0   0   1
  [12,1] =
     0   0   1
  [13,1] =
     0   0   1
  [14,1] =
     0   0   1
}

So they are all line objects, and the first child in the list is green. The only green item is the dashed line, which was created last and is shown last by legend. Note that you already got the handle for the green line when you created it:

>> p1
p1 = -49.192

>> get(p1, 'color')
ans =

   0   1   0

The two median lines are red, and so is the single line used to draw the outliers for all boxes. Other than color, there is little obvious information identifying the different boxplot line objects. Also, the number of objects may vary with your data, for example with whether or not there are any outliers (if there are none, the line that draws the outlier marks is simply not created). Thankfully, the parts are always built in the same order. A few ways to determine which is which:

1 - Pull all of the x and/or y data for the child objects:

>> xvalues = get (plot_children, "xdata")
xvalues =
{
  [1,1] =

          0   2.5000

  [2,1] =

     1   2   2

  [3,1] =

     1.6000   2.4000

  [4,1] =

     0.6000   1.4000

  [5,1] =

     1.9500   2.0500

  [6,1] =

     0.9500   1.0500

  [7,1] =

     1.9500   2.0500

  [8,1] =

     0.9500   1.0500

  [9,1] =

     2   2

  [10,1] =

     1   1

  [11,1] =

     2   2

  [12,1] =

     1   1

  [13,1] =

     1.6000   1.6000   1.6000   2.4000   2.4000   2.4000   2.4000   2.4000   1.6000   1.6000   1.6000

  [14,1] =

     0.6000   0.6000   0.6000   1.4000   1.4000   1.4000   1.4000   1.4000   0.6000   0.6000   0.6000
}

So the xdata shows that the most complex items are the last two line objects, which are the boxes. The ydata shows the same thing:

>> get(plot_children, "ydata")
ans =
{
  [1,1] =

     2   2

  [2,1] =

    -2.2308   0.7610   5.4604

  [3,1] =

     3.0633   3.0633

  [4,1] =

    -0.1161  -0.1161

  [5,1] =

     5.1581   5.1581

  [6,1] =

     1.9568   1.9568

  [7,1] =

     0.9468   0.9468

  [8,1] =

    -2.1596  -2.1596

  [9,1] =

     5.1581   3.6643

  [10,1] =

     1.9568   0.5113

  [11,1] =

     0.9468   2.5391

  [12,1] =

    -2.1596  -0.5580

  [13,1] =

     3.0633   3.2399   3.6643   3.6643   3.2399   3.0633   2.8866   2.5391   2.5391   2.8866   3.0633

  [14,1] =

    -0.116128   0.051752   0.511314   0.511314   0.051752  -0.116128  -0.284009  -0.557990  -0.557990  -0.284009  -0.116128
}

(The 'box' lines have seemingly extra points because boxplot uses a single line object to draw a notched box whether or not the 'notches' option is used. Without notches, the extra points simply lie on the box boundary.)
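Rather than relying on positions in the children list, you could pick the boxes out by their geometry. The sketch below is a heuristic of my own, not something boxplot documents: it assumes, based on the listings above (Octave 8.4.0, statistics 1.6.0), that each box outline is one closed line with several vertices and nonzero width, while whiskers, caps, and medians are two-point lines and the outlier line is normally not closed.

```octave
pkg load statistics            % boxplot() is provided by the statistics package
data1 = randn (100, 1);
data2 = randn (100, 1) + 3;
figure;
boxplot ({data1, data2});
hold on;
p1 = plot ([0 2.5], [2 2], "g--");

kids = get (gca, "children");
is_box = false (size (kids));
for k = 1:numel (kids)
  if ~strcmp (get (kids(k), "type"), "line")
    continue;                  % only line objects have the data tested below
  end
  x = get (kids(k), "xdata");
  y = get (kids(k), "ydata");
  % Heuristic (assumption): a box outline is a closed polygon
  % (first point == last point) with several vertices and nonzero width.
  is_box(k) = numel (x) >= 5 && x(1) == x(end) && y(1) == y(end) ...
              && max (x) > min (x);
end
% children are stored newest-first, so flip to get the boxes in drawing order
box_handles = flipud (kids(is_box));
legend ([box_handles; p1], {"data1", "data2", "line"});
```

If a later statistics release draws the boxes differently, the test inside the loop would need adjusting.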

2 - To see which part each of the other line objects draws, you can also step through the plot objects and toggle their visibility off and on. A simple for loop like:

for idx = 1:numel(plot_children)
  disp(idx);                                    % show which child is being hidden
  set(plot_children(idx), "visible", "off");    % hide this object
  pause;                                        % wait for a key press
  set(plot_children(idx), "visible", "on");     % restore it
end

will let you step through all of the objects one at a time, seeing them turn off. Again, this shows the boxes being the last plot_children items.
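If you'd rather not click through the objects, a non-interactive alternative is to print one summary line per child (index, color, and number of points), which is often enough to tell the boxes, medians, whiskers, and outliers apart. This sketch rebuilds a plot like the one in the question so that it runs on its own:

```octave
pkg load statistics
figure;
boxplot ({randn(100, 1), randn(100, 1) + 3});
hold on;
plot ([0 2.5], [2 2], "g--");

kids = get (gca, "children");
npoints = zeros (numel (kids), 1);
for idx = 1:numel (kids)
  h = kids(idx);
  if strcmp (get (h, "type"), "line")
    c = get (h, "color");                       % RGB triple
    npoints(idx) = numel (get (h, "xdata"));    % vertices in this line
    printf ("%2d  color = [%g %g %g]  points = %d\n", idx, c, npoints(idx));
  else
    printf ("%2d  (%s)\n", idx, get (h, "type"));
  end
end
```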

So, after all of that, how do you get legend to show just the entries you want? Again, it's hard to tell what you were seeing, but it turns out the 'box' objects are the first entries in the legend (corresponding to the last items in the list of children). You can verify this by changing the color of those boxes, which should show up right away in the legend:

set(plot_children(end), 'color', 'c')
set(plot_children(end-1), 'color', 'm')

This should set the boxes and their legend entries to two different colors: [Figure: boxplot with box objects recolored]

If we just turn on the legend with custom labels for only the first two objects, you'll see only the boxes labeled in the legend:

legend off
legend('dataset1', 'dataset2')

[Figure: boxplot with just the colored boxes in the legend]

Now, if you want the line to be item #3, it's easiest to call legend with the handles of only the objects you want. The boxes are the last two items in the plot_children list, so we can use end to reference them, and we can use the line's plot handle p1 for the third item (alternatively, we saw that it's also item #1 in the plot_children list):

>> legend off

>> box1_handle = plot_children(end)
box1_handle = -66.030
>> box2_handle = plot_children(end-1)
box2_handle = -65.637
>> legend_list = [box1_handle, box2_handle, p1]
legend_list =

  -66.030  -65.637  -49.192

>> legend_labels = {"dataset1","dataset2","extraline"}
legend_labels =
{
  [1,1] = dataset1
  [1,2] = dataset2
  [1,3] = extraline
}

>> legend(legend_list, legend_labels)

[Figure: boxplot with the boxes and the extra line labeled]

That covers showing only the things you want in the legend. As I mentioned above, you can also rearrange the axes children to change the legend display order. For example, you can reverse the display order by flipping the list of children:

legend off
set (hax, "children", flipud(plot_children))
legend

This reverses the legend display order as well (note that the label names have been retained): [Figure: boxplot with flipped legend order]

Now, if you show only the first couple of items, the legend will contain only the extra line and the outliers. The children order can be manipulated however you see fit:
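Combining the two approaches, you can also reorder the children so that plain string labels land on the two boxes and then the dashed line. This sketch assumes, as observed above, that the first box drawn is the last child and the second box is the one before it; the ismember-based filtering is just one way to collect the remaining handles:

```octave
pkg load statistics
data1 = randn (100, 1);
data2 = randn (100, 1) + 3;
figure;
boxplot ({data1, data2});
hold on;
p1 = plot ([0 2.5], [2 2], "g--");

hax  = gca;
kids = get (hax, "children");
box1 = kids(end);          % first box drawn (assumption from the listing above)
box2 = kids(end-1);        % second box drawn
wanted = [box1; box2; p1];
others = kids(~ismember (kids, wanted));
% legend reads the children list bottom-up, so place the wanted handles
% at the end, in reverse of the desired legend order.
set (hax, "children", [others; flipud(wanted)]);
legend ("data1", "data2", "line");
```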

legend off
legend ("firstitem", "seconditem")

[Figure: boxplot with flipped children, showing only 2 items]