A histogram is a graph that is similar to a bar chart.
The difference is that the bars on a histogram do not all have to be the same width. Because of this, it is the area of each bar, rather than its height alone, that shows the frequency.
Histograms look just like bar charts. There are two important differences:
- The widths of the columns can change.
- The height of a column does not give you the frequency; the area does.
The y-axis of the graph is always ‘frequency density’.
The best way to understand why histograms are useful is to look at an example. Imagine you work for the Civil Service and the Government asks you to find out how long it takes people to get to work. What would your survey look like? What would you ask people?
If you gave people tick boxes such as 0-30 minutes, 31-60 minutes, 61-90 minutes and so on, you would miss a lot of the variation in travel time among people who live very close to work. Some might take 5 minutes while others take 15 minutes, and 15 minutes is three times as long as 5! So you might want to split the time into smaller intervals: 0-5m, 6-10m, 11-15m… but some people will take an hour or more to get to work – are you going to have two dozen tick boxes? And does it really matter whether someone’s journey takes 55 minutes or an hour?
There’s an alternative: start with narrow time intervals and then widen them, e.g. 0-5m, 6-10m, 11-20m, 21-30m, 31-60m, 61-120m.
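To see how unequal these intervals are, here is a small Python sketch that turns the interval boundaries into class widths. It assumes travel time is treated as continuous, so the tick box "6-10m" is read as 5 < t ≤ 10 minutes; the `boundaries` list is an illustration of the intervals above, not data from a real survey.

```python
# Illustrative boundaries (minutes) for the tick boxes 0-5, 6-10, 11-20,
# 21-30, 31-60 and 61-120, assuming travel time is continuous so that
# "6-10m" covers 5 < t <= 10, and so on.
boundaries = [0, 5, 10, 20, 30, 60, 120]

# Each class width is the gap between consecutive boundaries.
class_widths = [upper - lower for lower, upper in zip(boundaries, boundaries[1:])]

print(class_widths)  # [5, 5, 10, 10, 30, 60]
print(len(class_widths), "tick boxes instead of", 120 // 5)  # 6 tick boxes instead of 24
```

Six boxes cover the full two hours, compared with 24 if every box were 5 minutes wide; the price is that the widths range from 5 to 60 minutes.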
The best way of showing the different time frames visually would be to have columns of different widths. The 0-5m responses would be shown by a thin column, while the 61-120m responses would be shown by a fat column. But the last category is so wide that far more people are likely to fall into it than into the 0-5m category, so if the height showed the number of people, the last column would look huge and the first column tiny, making the chart difficult to read.
To solve this problem, instead of height, the area of the column is used to represent the number of people who fit into each category.
The biggest mistake learners make is treating a histogram like a bar chart, and looking at the height of the columns. Instead, treat it as a series of rectangles, with a width and a height, and an area.
The width will always be the ‘class width’ (the size of a class such as 0-5m or 31-60m), the height will always be the ‘frequency density’, and the area will always represent the ‘frequency’.
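Putting these three quantities together gives the relationship used to draw every histogram:

$$
\text{frequency density} = \frac{\text{frequency}}{\text{class width}}
\quad\Longrightarrow\quad
\text{area of bar} = \text{class width} \times \text{frequency density} = \text{frequency}
$$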
In a survey, 120 people were asked how far they travel to work each day.
The table shows the results:
| Distance, d (miles) | Frequency |
|---|---|
| 0 < d ≤ 5 | 17 |
| 5 < d ≤ 10 | 29 |
| 10 < d ≤ 20 | 36 |
| 20 < d ≤ 30 | 24 |
| 30 < d ≤ 50 | 14 |
To draw a histogram, we first need to work out the class width and frequency density of each class, using frequency density = frequency ÷ class width:
| Distance, d (miles) | Frequency | Class width | Frequency density |
|---|---|---|---|
| 0 < d ≤ 5 | 17 | 5 | 3.4 |
| 5 < d ≤ 10 | 29 | 5 | 5.8 |
| 10 < d ≤ 20 | 36 | 10 | 3.6 |
| 20 < d ≤ 30 | 24 | 10 | 2.4 |
| 30 < d ≤ 50 | 14 | 20 | 0.7 |
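The class widths and frequency densities in the table above can be checked with a short Python sketch. It is only an illustration: `classes` and the helper `frequency_table` are names chosen here, and each class is assumed to run from its lower boundary (exclusive) to its upper boundary (inclusive), as in the table.

```python
# Survey data from the table: (lower boundary, upper boundary, frequency),
# where each class covers lower < d <= upper, distance d in miles.
classes = [
    (0, 5, 17),
    (5, 10, 29),
    (10, 20, 36),
    (20, 30, 24),
    (30, 50, 14),
]

# The frequencies should add up to the 120 people surveyed.
assert sum(freq for _, _, freq in classes) == 120


def frequency_table(classes):
    """Return rows of (lower, upper, frequency, class width, frequency density)."""
    rows = []
    for lower, upper, freq in classes:
        width = upper - lower
        density = freq / width  # frequency density = frequency / class width
        rows.append((lower, upper, freq, width, density))
    return rows


for lower, upper, freq, width, density in frequency_table(classes):
    print(f"{lower} < d <= {upper}: class width {width}, frequency density {density:.1f}")
```

Each printed line should match a row of the table; for example, the 30 < d ≤ 50 class gives 14 ÷ 20 = 0.7.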
To draw the histogram, plot frequency density on the vertical axis against distance on the horizontal axis. Each bar starts at the lower class boundary and ends at the upper class boundary; for example, the bar for 0 < d ≤ 5 runs from 0 to 5 and has a height of 3.4. (Note: when drawing a histogram on graph paper, the bars will have different widths depending on their class width. For 30 < d ≤ 50, the class width is 20 and the frequency density is 0.7, so this bar will look like a fat, stumpy bar. Its area, 20 × 0.7 = 14, is still the frequency for that class.)
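The bars themselves can be described as rectangles, each with a left edge, a width and a height. The Python sketch below builds them from the survey table and checks that each area gives back the frequency; it also shows what goes wrong if frequency is used as the height, as on a bar chart. The helper name `histogram_bars` is chosen here for illustration.

```python
# Survey data from the table: (lower boundary, upper boundary, frequency),
# where each class covers lower < d <= upper, distance d in miles.
classes = [
    (0, 5, 17),
    (5, 10, 29),
    (10, 20, 36),
    (20, 30, 24),
    (30, 50, 14),
]


def histogram_bars(classes):
    """Return (left edge, width, height) for each bar.

    Each bar starts at its lower class boundary, is as wide as the class,
    and is as tall as the frequency density (frequency / class width).
    """
    bars = []
    for lower, upper, freq in classes:
        width = upper - lower
        bars.append((lower, width, freq / width))
    return bars


bars = histogram_bars(classes)

# Correct reading: area = width x frequency density = frequency.
for (left, width, height), (_, _, freq) in zip(bars, classes):
    assert abs(width * height - freq) < 1e-9
    print(f"{left} to {left + width} miles: height {height:.1f}, area {width * height:.0f}")

# The bar-chart mistake: using the frequency itself as the height.
# The 30 < d <= 50 bar would then have area 20 * 14 = 280, more than three
# times the 5 * 17 = 85 of the first bar, although it holds fewer people.
mistaken_areas = [width * freq for (_, width, _), (_, _, freq) in zip(bars, classes)]
```

To draw the chart itself, these values could be passed to matplotlib (not used in the original, so an assumption here), e.g. `plt.bar(lefts, heights, width=widths, align="edge")`, where `align="edge"` places each bar's left edge at the given x-value.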