*Geek Box: Interpreting Box Plots
Boxplots may appear confusing, but they are in fact easy to read, and very informative, once you know what you are looking at. Boxplots are one of the best ways to display quantitative data. Bar charts are often used in papers instead, but bar charts are generally more appropriate for categorical data [such as frequencies, percentages, or scales], and boxplots provide substantially more information.
So, let’s start with the box itself: this represents the middle 50% of the data – known as the ‘interquartile range’, i.e., the top of the box is the 75th percentile, and the bottom is the 25th percentile. Across the box you can see a horizontal line: this is the median, i.e., the middle value(s) in the data.
A large box, i.e., a large interquartile range, indicates that there is large variability in the values in the data; a small box indicates that most values fall close to the middle of the data, i.e., are gathered around the median. By looking at the median line, you can also gauge where it falls within the middle 50% of the data: a median sitting closer to the top or bottom of the box suggests that the values are skewed rather than evenly spread.
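If it helps to see the numbers behind the box, here is a minimal sketch in Python. The sample values and the function name `box_summary` are made up for illustration, and the quartile method used ("inclusive") is only one of several conventions, so other software may give slightly different percentiles for the same data.

```python
import statistics

def box_summary(values):
    """Return the numbers that define the box of a boxplot."""
    # n=4 asks for the three quartile cut points. method="inclusive" is one of
    # several quartile conventions; others give slightly different results.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range, i.e. the height of the box
    # 0.0 means the median sits on the bottom edge of the box, 1.0 on the top edge.
    median_position = (median - q1) / iqr if iqr else 0.5
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "median_position": median_position,
    }

# Hypothetical values, invented purely for illustration.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 12, 15, 30]
summary = box_summary(sample)
print(summary)
```

For this made-up sample the box runs from 4.5 to 10.5, and the median of 7 sits a little below the middle of the box, hinting that the larger values are more spread out than the smaller ones.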
You’ll also notice ‘whiskers’ extending from the top and bottom of the box, and triangle-shaped ‘dots’ which lie beyond those whiskers. There are several options when plotting whiskers, but they often depict the minimum [bottom] and maximum [top] values once any outliers have been set aside. Any dot or symbol beyond these whiskers indicates an outlier, and in the graphs above you can see triangle symbols which represent the outliers whose values lie beyond the ends of the whiskers. As you can see, there is a lot of data presented in boxplots, which is very helpful for interpreting findings.
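To make the whiskers and outliers concrete, here is a small Python sketch of one common convention: values more than 1.5 times the interquartile range beyond the box are flagged as outliers, and each whisker is drawn to the most extreme value that is not flagged. This is an assumption for illustration only, since the graphs above may have been drawn with a different rule; the sample values and the function name `whiskers_and_outliers` are also invented.

```python
import statistics

def whiskers_and_outliers(values, k=1.5):
    """Whisker ends and outliers under the 'k times the IQR' convention."""
    # Quartiles via the "inclusive" method (one of several conventions).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences: values outside this range are treated as outliers (assumed rule).
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "lower_whisker": min(inside),  # smallest value that is not an outlier
        "upper_whisker": max(inside),  # largest value that is not an outlier
        "outliers": outliers,
    }

# Hypothetical values, invented purely for illustration.
sample = [2, 4, 4, 5, 6, 7, 8, 9, 12, 15, 30]
result = whiskers_and_outliers(sample)
print(result)
```

With these made-up values the whiskers stop at 2 and 15, and the value 30 would be drawn as a separate symbol beyond the top whisker.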