Suppose you have a data set with $$n = 435$$ observations, and you want to find the first decile, or the 10th percentile. This means that $$y = 10$$ and you're looking for $$L_{10}$$.
First, you sort the data in ascending order (smallest values at the top of the list, largest values at the bottom of the list). Next, you apply the following formula, which gives the position $$L_y$$ of the $$y$$th percentile in the sorted list:
$$\displaystyle L_y = (n+1) \frac{y}{100}$$
Which observation gives the 10th percentile?
It's relatively easy to find a percentile value when it is simply the mid-point between two observations. However, reconsider the example of looking for the 43.6th observation to find the 10th percentile. Suppose that the 43rd observation is 0.21 and the 44th observation is 0.24. What is the value of the 10th percentile in this case?
Incorrect.
Consider whether the value of the 43.6th observation will be closer to the value of the 43rd observation or the value of the 44th observation.
Good answer. When you apply the appropriate formula, you get
$$\displaystyle L_y = (n+1) \frac{y}{100} = (435+1) \frac{10}{100} = 43.6.$$
But this is both right and wrong. On one hand, it is true that the 10th percentile is between the 43rd and 44th observations. But on the other hand, how can there be a 43.6th observation? When you are counting stuff, like lines in your list of data, you count in whole numbers: 1, 2, 3, etc. You can't have a fractional observation.
To get a better answer, you have to do what is called __linear interpolation__, which means that you use a weighted average to find the correct amount between the two observations.
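As a rough illustration, the position formula and the split into a whole-number part and a fractional part can be sketched in Python. The function name `percentile_position` and the variable names are assumptions made for this example, not part of the lesson.

```python
import math


def percentile_position(n, y):
    """Return the position L_y = (n + 1) * y / 100 in the sorted data.

    Positions are counted from 1, as in the lesson.
    """
    return (n + 1) * y / 100


position = percentile_position(435, 10)
lower = math.floor(position)   # the observation just below the position
fraction = position - lower    # how far to move toward the next observation

print(position, lower, round(fraction, 1))  # 43.6 43 0.6
```

The fractional part (about 0.6 here) is the weight that linear interpolation places on the distance between the 43rd and 44th observations.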
Incorrect.
Only if there are 99 observations would the 10th observation be the first decile (the 10th percentile), because then $$(99+1)\frac{10}{100} = 10$$.
Think back to the median, which is also the 50th percentile. Suppose that you have 100 observations, and you want to calculate the median. In this example, the median is found at position $$(100+1)\frac{50}{100}=50.5$$, that is, at the 50.5th observation.
Using linear interpolation to determine the value of the 50.5th observation, you find the mid-point between the 50th and 51st observations. So if the 50th observation is 192 and the 51st observation is 201 then you get the median:
$$\displaystyle \frac{192+201}{2}=196.5$$
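The mid-point is a special case of linear interpolation. Position 50.5 lies a fraction 0.5 of the way from the 50th observation to the 51st, so writing the same calculation as a weighted step from the lower value gives the same median:

$$\displaystyle 192 + 0.5(201-192) = 192 + 0.5(9) = 192 + 4.5 = 196.5$$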
To sum up:
Linear interpolation uses a weighted average to find the appropriate value between the values of two observations.
Yes!
The answer lies 60% of the way from the value of the 43rd observation to the value of the 44th observation, because the position 43.6 is 0.6 past the 43rd observation. The calculation is:
$$0.21 + 0.6(0.24-0.21) = 0.21 + 0.6(0.03) = 0.21 + 0.018 = 0.228.$$
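The whole procedure (sort, find the position, interpolate) can be sketched as a small Python function. This is a minimal sketch under stated assumptions: the names `interpolate` and `percentile` are made up for this example, and the function simply rejects positions that fall before the first or after the last observation, a case the lesson does not cover.

```python
import math


def interpolate(lower_value, upper_value, fraction):
    """Move `fraction` of the way from lower_value to upper_value."""
    return lower_value + fraction * (upper_value - lower_value)


def percentile(data, y):
    """Return the y-th percentile using the position (n + 1) * y / 100.

    Assumption for this sketch: positions outside 1..n raise an error.
    """
    values = sorted(data)              # ascending order
    n = len(values)
    position = (n + 1) * y / 100       # 1-based position, may be fractional
    if position < 1 or position > n:
        raise ValueError("position falls outside the data")
    lower = math.floor(position)
    fraction = position - lower
    if lower == n:                     # position is exactly the last observation
        return values[-1]
    # Python lists are 0-based, so observation k is values[k - 1].
    return interpolate(values[lower - 1], values[lower], fraction)


# The lesson's example: 60% of the way from 0.21 to 0.24.
print(round(interpolate(0.21, 0.24, 0.6), 3))  # 0.228
```

Results are rounded when printed because floating-point arithmetic can leave tiny errors in the last decimal places.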