Histograms: What Are They? How Do You Make One?
A histogram is a graphical representation of data made up of a series of contiguous columns (i.e., there are no gaps between them). The height of each column corresponds to the frequency of observations that fall within the range of values covered by that column.
Here's a histogram that details the height (in inches) of a sample of 100 pigeons.
Here we have 6 classes (14 to just below 15, 15 to just below 16, etc.), each of which has a width (or interval) of one inch. The height of each column corresponds to the number of pigeons that fall into that class - four pigeons between 14 and 15 inches, 18 pigeons between 15 and 16 inches, 38 pigeons between 16 and 17 inches, and so on.
The advantage of a histogram should be apparent. At a glance, it gives us a lot of information about the distribution of our data. In this case, we can see that most pigeons have a height that lies somewhere in the middle of the total range of heights. There aren't many very short pigeons, and there aren't many very tall pigeons.
How Do You Make a Histogram?
You could just use our easy histogram maker, but if you want to do the job by hand, follow these instructions.
1. The first step is to divide your distribution into classes (or bins), which are, in effect, containers for your individual scores. There is no single standard way to calculate how many classes you need, but a good rule of thumb is to take the square root of the total number of scores in your distribution and round it to a whole number, making sure you end up with at least 3 classes and no more than 20. So, for example, if your distribution has 27 items (the square root of which is roughly 5.2), 5 or 6 classes would be appropriate.
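If you'd rather let a few lines of Python do the arithmetic, the rule of thumb might be sketched as follows. The function name and its parameters are made up for illustration, and rounding to the nearest whole number is just one reasonable reading of "rounding up or down".

```python
import math

def number_of_classes(n_scores, minimum=3, maximum=20):
    """Square-root rule of thumb, kept between `minimum` and `maximum`."""
    # Assumption: round to the nearest whole number.
    estimate = round(math.sqrt(n_scores))
    return max(minimum, min(maximum, estimate))

print(number_of_classes(27))   # the square root of 27 is about 5.2, so this prints 5
print(number_of_classes(100))  # prints 10
```

For very small or very large distributions, the limits of 3 and 20 take over from the square root.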
2. Next you've got to work out the width (or interval) of your classes. There are various things to bear in mind: (a) you've got to get all your data into the classes; (b) the classes must be contiguous - for instance, you can't leave out a class in the middle of a distribution just because it has no scores in it; (c) the classes must be mutually exclusive - there can be no ambiguity about which class a score belongs to; and (d) the classes should (normally) be of equal width.
Here's a simple way to get started. Find your lowest score. Subtract it from your highest score. That's the range of your distribution. If you divide that by the number of classes you determined in step 1, and then round up, you'll have a working class width. A caveat here is that you'll need to add a class if there is no remainder when you divide. Otherwise your highest score would sit exactly on the upper bound of the final class, and a score only belongs to a class if it is less than that class's upper bound.
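As a rough sketch, the calculation could look like this in Python. The function name and the sample scores are hypothetical, and the remainder check assumes whole-number scores (with decimal scores, "no remainder" is harder to test reliably).

```python
import math

def working_class_width(scores, n_classes):
    """Return (class width, number of classes) from the range of the scores."""
    data_range = max(scores) - min(scores)
    width = math.ceil(data_range / n_classes)
    # The caveat: if the range divides evenly, add one more class.
    if data_range % n_classes == 0:
        n_classes += 1
    return width, n_classes

scores = [5, 8, 12, 17, 21, 29]         # hypothetical data, range = 24
print(working_class_width(scores, 5))   # 24 / 5 = 4.8, rounds up: (5, 5)
print(working_class_width(scores, 6))   # 24 / 6 = 4 exactly: (4, 7)
```

In the second call the width of 4 would put the highest score, 29, right on the edge of the sixth class, which is why a seventh class is added.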
3. Now you've got to sort out your axes. The horizontal axis (x) represents your scores, and the vertical axis (y) represents your frequencies.
Find your lowest score, and round down if it isn't a whole number (so, for example, 5.2 would become 5). That's the starting point (the lower bound) of your first class. Now just add your class width to find the lower bound of subsequent classes (for example, if the lower bound of your first class is 5, and your class width is 5, the lower bound of your second class will be 10, the lower bound of your third class will be 15, and so on). Plot these along the x-axis, as per the example at the top of the page, until you reach a number that is higher than your highest score. This is the upper bound of your final class (and, all being well, the total number of classes will equal the number you calculated in step 1).
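To make the x-axis procedure concrete, here is a minimal Python sketch. The function name and the sample scores are invented for illustration; the logic simply keeps adding the class width until it passes the highest score.

```python
import math

def class_bounds(scores, width):
    """List the class boundaries, from the first lower bound to the last upper bound."""
    # Round the lowest score down to get the first lower bound.
    bounds = [math.floor(min(scores))]
    # Keep adding the width until we reach a number higher than the highest score.
    while bounds[-1] <= max(scores):
        bounds.append(bounds[-1] + width)
    return bounds

scores = [5.2, 8, 12, 17, 21, 29]   # hypothetical data
print(class_bounds(scores, 5))      # [5, 10, 15, 20, 25, 30]
```

Six boundaries mark out five classes: 5 to just below 10, 10 to just below 15, and so on up to 25 to just below 30.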
The first value on the y-axis will (almost) always be zero. Any other value is likely to be misleading. To determine the highest value, you need to count how many scores are in each of your classes (a score falls into a class if it is greater than or equal to the lower bound and less than the upper bound), identify the class with the greatest number of scores, and then round up appropriately (so, for example, if the highest frequency is 37, you'd probably round up to 40, which would become the highest value on the y-axis). You fill in the rest of the y-axis by dividing it up equally as in the example at the top of the page.
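Counting the scores in each class and choosing the top of the y-axis could be sketched like this. Both function names and the data are hypothetical, and rounding up to the next multiple of 10 is only one way to "round up appropriately".

```python
import math

def class_frequencies(scores, bounds):
    """Count the scores in each class: lower bound <= score < upper bound."""
    counts = [0] * (len(bounds) - 1)
    for score in scores:
        for i in range(len(counts)):
            if bounds[i] <= score < bounds[i + 1]:
                counts[i] += 1
                break
    return counts

def y_axis_top(highest_frequency, step=10):
    """Round the highest frequency up to the next multiple of `step` (an assumption)."""
    return math.ceil(highest_frequency / step) * step

scores = [5.2, 8, 12, 17, 21, 29]    # hypothetical data
bounds = [5, 10, 15, 20, 25, 30]     # class boundaries with a width of 5
print(class_frequencies(scores, bounds))  # [2, 1, 1, 1, 1]
print(y_axis_top(37))                     # 40
```

A score that sits exactly on a boundary, such as 10, is counted in the class that starts there, not the one that ends there.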
4. All that remains is to draw in the columns. The height of each column represents the frequency of the scores found within its associated class. Remember, there are no gaps between columns, because histograms are used for continuous data.
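If you just want a quick look at the shape of your distribution before drawing it properly, a text-only sketch like the one below can help. It is an illustration only: the function name and the numbers are made up, and the columns are turned on their side so that each class is a row of `#` characters.

```python
def text_histogram(bounds, counts):
    """Build one row per class, with one '#' per score in that class."""
    rows = []
    for lower, upper, count in zip(bounds, bounds[1:], counts):
        rows.append(f"{lower}-{upper}: {'#' * count}")
    return "\n".join(rows)

bounds = [5, 10, 15, 20]   # hypothetical class boundaries
counts = [2, 4, 1]         # hypothetical frequencies
print(text_histogram(bounds, counts))
# 5-10: ##
# 10-15: ####
# 15-20: #
```

Each row starts where the previous one ends, which mirrors the rule that there are no gaps between columns.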
- Histograms are appropriate if your data is continuous (i.e., at the interval or ratio level of measurement).
- Classes are contiguous - you can't miss out a class because it's empty, for example.
- Classes should have the same width (interval).
- Classes should encompass the entire distribution of data.
- The height of each column represents the frequency (that is, the number of scores) found within its class.
Last updated: 14 October 2016