A common type of data analysis looks at how spread out, or dispersed, the data is. A simple measure of dispersion is the __range__ of the data, which is the difference between the greatest and lowest values in the list of data. Consider these numbers, for example:
12, 13, 17, 28, 19, 30, 34, 35, 42, 48, 51, 52.
Which choice below gives the range of this data?
Incorrect.
Actually, 12 is the lowest value in the list of data, so you need to find the difference between this value and the greatest value, 52.
Correct.
Incorrect.
Actually, 52 is the greatest value in the list of data, so you need to find the difference between this value and the lowest value, 12.
A third way of expressing the dispersion, or spread, of data is the __standard deviation__, which expresses a type of average distance from the center of the data.
What might you use in a standard deviation calculation as a center of the data?
Incorrect.
The range is another measure of dispersion, not a measure of the center of the data.
Correct.
As you may expect, both the range and the standard deviation are sensitive to extreme values: the range depends only on the greatest and lowest values, while the standard deviation is affected by every data point. __Outliers__ are values that are extreme compared to the other values in the data set. Consider this data set, for example:
1, 3, 5, 2, 6, 3, 5, 3, 3, 1, 6, 12, 4, 6, 2
Which is the outlier for the data set?
Incorrect.
Actually, 3 is the mode of this data set and not the outlier.
Incorrect.
Actually, 6 is not an outlier of this data set.
Correct.
To sum up, measures of dispersion are useful when analyzing data.
[[summary]]
Since 40 is the difference between the greatest value, 52, and the lowest value, 12, the range of the data is 40.
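The subtraction can be checked with a short Python sketch. The variable names are illustrative only.

```python
# The data set from the range question.
data = [12, 13, 17, 28, 19, 30, 34, 35, 42, 48, 51, 52]

greatest = max(data)  # 52
lowest = min(data)    # 12

# Range = greatest value minus lowest value.
data_range = greatest - lowest
print(data_range)  # 40
```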
The range of a data set can be greatly affected by extreme values. To reduce the effect of extreme values, the __interquartile range__ is often used instead. This is the difference between the third quartile (Q3) and the first quartile (Q1). The __quartiles__ are measures of position that divide the data into four roughly equal groups.
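As a rough illustration, the interquartile range of the earlier data set can be computed with Python's standard `statistics` module. There are several conventions for locating quartiles, and this sketch assumes the module's default "exclusive" method, so other tools or textbooks may give slightly different values for Q1 and Q3. The list `extreme` is a hypothetical variation, not data from the lesson, made by replacing 52 with 520 to show how each measure reacts.

```python
import statistics

data = [12, 13, 17, 28, 19, 30, 34, 35, 42, 48, 51, 52]

# With n=4, quantiles() returns the three cut points Q1, Q2, Q3.
# The default "exclusive" method is one of several quartile conventions.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Hypothetical variation: swap 52 for an extreme value, 520.
extreme = [520 if x == 52 else x for x in data]
e_q1, _, e_q3 = statistics.quantiles(extreme, n=4)
extreme_iqr = e_q3 - e_q1
extreme_range = max(extreme) - min(extreme)

print(iqr, extreme_iqr)                      # 29.0 29.0
print(max(data) - min(data), extreme_range)  # 40 508
```

Under these assumptions, the extreme value raises the range from 40 to 508 but leaves the interquartile range unchanged.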
The standard deviation of a list indicates a type of "average" distance of each value from the mean of the list. Keep in mind that this is a special type of average: it is the square root of the mean of the squared distances, not the plain mean of the distances themselves.
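The following sketch works through that calculation for the earlier data set. It assumes the population form of the standard deviation (dividing by the number of values), since the lesson does not say which form it intends. The sample form divides by one less than the number of values and gives a slightly larger result.

```python
import math
import statistics

data = [12, 13, 17, 28, 19, 30, 34, 35, 42, 48, 51, 52]

mean = statistics.mean(data)  # center of the data: 31.75

# Square each value's distance from the mean, average the squares,
# then take the square root (population form, an assumption here).
squared_distances = [(x - mean) ** 2 for x in data]
std_dev = math.sqrt(sum(squared_distances) / len(data))

# For comparison: the plain mean of the distances is a different number.
mean_abs_distance = sum(abs(x - mean) for x in data) / len(data)

print(std_dev, mean_abs_distance)
```

The hand-computed value agrees with `statistics.pstdev(data)`, and it is larger than the plain mean of the distances, which is why the standard deviation is only a "type of" average distance.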
Indeed, 12 is an outlier for this data set because it is an extreme value compared with the others. If outliers are accidental or erroneous data, they are often excluded from analysis of the data.
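To see how much a single outlier can matter, the sketch below compares the range and standard deviation of this data set with and without the value 12. It uses the population standard deviation from Python's `statistics` module, which is an assumption, as the lesson does not specify a form.

```python
import statistics

data = [1, 3, 5, 2, 6, 3, 5, 3, 3, 1, 6, 12, 4, 6, 2]

# Same data with the outlier, 12, excluded.
without_outlier = [x for x in data if x != 12]

range_with = max(data) - min(data)                           # 12 - 1
range_without = max(without_outlier) - min(without_outlier)  # 6 - 1

std_with = statistics.pstdev(data)
std_without = statistics.pstdev(without_outlier)

print(range_with, range_without)  # 11 5
print(std_with > std_without)     # True
```

Excluding the one extreme value cuts the range from 11 to 5 and also lowers the standard deviation.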