Excel Box Plot
A box plot in excel is a pictorial representation or a chart that is used to represent the distribution of numbers in a dataset. It indicates how the values in the dataset are spread out. In a boxplot, the numerical data is shown using five numbers as a summary: Minimum, Maximum, First Quartile, Second Quartile (Median), Third Quartile.
Between the first and third quartiles, a box is drawn along with an additional line that is drawn along the second quartile to mark the median. Extended lines are drawn outside the first and third quartiles to depict the minimum and maximum. These extended lines are called whiskers in the boxplot.
The box of the boxplot shows first to the third quartile with a line at the second quartile i.e., median. The whiskers’ ends depict the minimum and maximum.
The five numbers represented by the boxplot:
- Minimum: Smallest/Least value of the dataset.
- First Quartile: Mid-value of minimum and median.
- Second Quartile / Median: Mid-value of the dataset.
- Third Quartile: Mid-value of median and maximum.
- Maximum: The Largest value of the dataset.
How to Make Box Plot in Excel? (with an Example)
Let’s say we have a dataset of a class that contains total marks of its students in all subjects (out of 500, max marks: 100 for each subject), and we wish to create a box plot in excel for the same.
We have the following students’ marks data in an excel sheet:
Now the following steps can be used to create a box plot for the above data (basically in excel 2013):
It is very simple to create a box plot with Excel 2016 as it has a ‘Box and Whisker’ chart under the statistical charts in the charts section by default. However, Excel 2013 does not have a chart template for the box plot by default; thus, we have to create it by following the below steps:
Calculate the five statistics from the dataset that are required for box plot: Minimum, Three Quartiles, & Maximum, by using the following functions:
Now calculate minimum statistics from the dataset.
Now calculate quantile1 statistics from the dataset.
Calculation of five statistics for the box plot will be –
We will now create a final table (say differences table) that we will use to create the box plot in excel. In this final table, we will copy the minimum value, and that will be the first entry of our final table. The rest of the entries of this final table would be the differences between each statistic as below:
Now we will create a stacked column chart with this final table and turn it into a box plot.
So let us first create a stacked column chart in excelStacked Column Chart In ExcelA stacked column chart in Excel is a column chart in which multiple series of data representations of various categories are stacked over each other. The stacked series are vertical, and comparing multiple data series is easy, but as the number of data series increases, so does the complexity of representation.:
Select Differences and Value then click on ‘Insert’ -> All Charts -> Stacked Column Charts:
On doing this, we get a stacked chart as below:
We can see that this is not similar to a box plot because, with this stacked chart, excel draws stacked columns from horizontal by default and not from a vertical dataset. So we will have to reverse the chart axes.
To do this, right-click on the chart, and click on ‘Select Data.’
Now click on Switch Row/Column’.
We get stacked chart as below:
We will now convert this stacked chart type to box plot as follows:
Select the bottom part of the column (blue area), and click on ‘Format Data Series.’
In the ‘Format Data Series’ panel, expand the ‘Fill’ option and select the ‘No Fill’ button, and from the ‘border’ dropdown, expand it and select the ‘No Line’ button:
We get the following box plot chart as below:
The next step is to create whiskers by replacing the topmost and second from bottom segments i.e., red and orange regions (as we have deleted the bottom-most) with lines/whiskers.
To draw the top whisker, we select the topmost region/segment (red) and expand the ‘Fill’ tab.
And select the ‘No Fill’ button.
Now click on ‘Design’ -> ‘Add Chart Elements’ -> ‘Error Bars’ -> ‘Standard Deviation’:
Now we click on the Plus button at the top right of the chart, and select and expand ‘Error Bars in excel’ and then select ‘More Options.’
This will open the ‘Format Error Bars’ panel and set the following:
Set: Direction to ‘Plus’
End Style to ‘Cap’
Percentage to ‘100%.’
So we will now have a top whisker drawn as below:
Similarly, to draw the lower whisker, we select the second from the bottom region (orange, which is now visible as last) and repeat the same steps as above. The only change would be that the ‘direction’ in error bars’ will be set to ‘minus.’
So we will now have a lower whisker drawn as below:
So we can see in the above screenshot that the stacked column chart is now resembling a box plot. Box plots are generally of the same color throughout, so we can use a one fill color for it with a slight border.
Now let us see how a box plot is interpreted or viewed:
We can see in the above screenshot that:
- Lower whisker’s endpoints at 300, which depicts the minimum value.
- The top whisker’s endpoints are somewhere below 500, which is exactly 492, which depicts the maximum.
- The top line of the green box depicts quartile3, which we see points at 480.5.
- The mid-line of the green box depicts median/quartile2, which we see points at 450.
- The last line of the green box depicts quartile1, which we see points at 392.
So the box plot for the given dataset is correctly drawn with the five statistics (minimum, three quartiles, and maximum) calculated as above.
Things to Remember
- A box plot is a pictorial representation of a numerical dataset that uses a five-number summary to depict the dataset distribution.
- The Box plot s also known as the box and whisker plot.
- It is generally used for explanatory data analysis.
- They are generally used when we need to compare some samples and test if the data is distributed symmetrically.
- Box plots take less space as compared to density plots or histograms.
- It is used to show the shape of a distribution, central value, and its variability.
- It is not necessary that the median is in the middle of the box.
- The whiskers may be of different lengths.
- Box plot can be used to detect outliers.
This has been a guide to Box Plot in Excel. Here we discuss how to create (make) a box plot in excel along with step by step examples and a downloadable excel template. You may learn more about excel from the following articles –