
Sample size

As described in "Sampling distributions" (under Inferential statistics), a larger sample contains more information. Hence, a parameter such as the population mean can be estimated with more precision as the sample size increases: the standard deviation of the estimator (its standard error) becomes smaller.
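As a minimal sketch, assuming independent observations from a population with standard deviation sigma, the standard error of the sample mean is sigma / sqrt(n). The Python snippet below (the function name `standard_error` is hypothetical) shows it shrinking as n grows.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean for n independent observations."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma / math.sqrt(n)

# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(n, standard_error(0.725, n))
```

Because of the square root, each halving of the standard error costs four times as many observations.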

The sample size also helps to control the probability of making a Type II error, that is, of failing to reject (accepting) a false null hypothesis. For a fixed significance level, a larger sample lowers this probability.
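To illustrate, the sketch below computes the Type II error probability for one specific case: a one-sided (upper-tailed) z-test of H0: mu = mu0 with sigma known. The test setup, the function name `type_ii_error`, and the numbers (a hypothesized mean GPA of 2.8, a true mean of 3.0, sigma = 0.725) are all hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def type_ii_error(mu0: float, mu1: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Probability of failing to reject H0: mu = mu0 in a one-sided
    (upper-tailed) z-test with known sigma, when the true mean is mu1 > mu0."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)   # critical value (about 1.645 for alpha = 0.05)
    shift = (mu1 - mu0) * n ** 0.5 / sigma    # true mean's distance from mu0, in standard errors
    return std_normal.cdf(z_alpha - shift)    # P(test statistic stays below z_alpha)

# Hypothetical values: H0 mean GPA 2.8, true mean 3.0, sigma 0.725.
for n in (10, 30, 100):
    print(n, round(type_ii_error(2.8, 3.0, 0.725, n), 3))
```

Holding alpha fixed, the printed probabilities fall as n grows, which is the sense in which the sample size controls the Type II error.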

For the purpose of hypothesis testing about the population mean, the following rule of thumb is applied in practice:

Is n large (n>=30)?

no -- Is the population approximately normal?
- no -- increase sample size to 30 or more
- yes -- Is the value of sigma known?
  - no -- estimate sigma by the sample standard deviation s -- use the t-distribution (n - 1 degrees of freedom)
  - yes -- use the normal distribution (z)
yes -- Is the value of sigma known?
- no -- estimate sigma by s -- use the normal distribution (z)
- yes -- use the normal distribution (z)
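The rule of thumb above can be written as a small function. This is only a sketch: the function name and the returned labels are hypothetical, and the flags `approx_normal` and `sigma_known` stand for judgments the analyst has to make about the data.

```python
def choose_test_distribution(n: int, approx_normal: bool, sigma_known: bool) -> str:
    """Encode the rule of thumb for tests about a population mean.

    Returns "z", "t", or "increase n" (sample too small for a non-normal population).
    """
    if n >= 30:
        # Large sample: the normal distribution is used whether sigma is
        # known or estimated by the sample standard deviation s.
        return "z"
    if not approx_normal:
        # Small sample from a clearly non-normal population.
        return "increase n"
    # Small sample, approximately normal population.
    return "z" if sigma_known else "t"

print(choose_test_distribution(12, approx_normal=True, sigma_known=False))
```

For a sample of 12 from an approximately normal population with unknown sigma, the function selects the t-distribution.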

The determination of an "appropriate" sample size depends on the parameter in question. For the population mean, the following information is necessary:

The population standard deviation, sigma (or an estimate of it, e.g., the range divided by four).
The desired degree of confidence, 1 - alpha (e.g., 95%).
A specified bound on the error of estimation, i.e., how far the sample mean may be from the population mean.

The required sample size can then be estimated with the following formula:

n = ( z(a/2) * sigma / bound ) ** 2

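where z(a/2) is the value from the standard normal table that leaves an area of alpha/2 in the upper tail, with 1 - alpha the confidence level (for 95% confidence, alpha = 0.05 and z(0.025) = 1.96); sigma is the population standard deviation; and bound is the maximum acceptable error of estimation, i.e., the half-width of the confidence interval for the mean. The whole expression is squared (** 2), and because n must be a whole number, the result is rounded up.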
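A minimal Python sketch of the formula follows. It assumes the standard library's `statistics.NormalDist` (Python 3.8+) in place of the normal table; `required_sample_size` is a hypothetical name. Rounding up with `math.ceil` ensures the chosen n meets the bound.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma: float, bound: float, confidence: float = 0.95) -> int:
    """Smallest n with z(a/2) * sigma / sqrt(n) <= bound."""
    if sigma <= 0 or bound <= 0:
        raise ValueError("sigma and bound must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1 - confidence
    z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    return math.ceil((z_half_alpha * sigma / bound) ** 2)

print(required_sample_size(0.725, 0.25))
```

Note that `inv_cdf` returns about 1.959964 rather than the rounded table value 1.96, so in borderline cases the rounded-up n can differ by one from a hand calculation.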

Illustration

Suppose we are interested in estimating the mean GPA at UWF, with 95% confidence, to within 0.25 of a point. Then:

z(a/2) = z(0.025) = 1.96
bound = 0.25

Suppose that sigma = 0.725. For instance, assuming that the lowest GPA is 1.1 and the highest is 4.0, the range is 4.0 - 1.1 = 2.9, and 2.9/4 = 0.725.

The estimated sample size is:

n = (1.96 * 0.725 / 0.25) ** 2 = 32.308

Rounding up to the next whole number, we need to sample 33 students.
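The arithmetic of the illustration can be checked step by step. This sketch hard-codes the table value 1.96 and the GPA figures from the example; the variable names are hypothetical, and the maximum GPA of 4.0 is the one implied by the range calculation.

```python
import math

# Figures from the UWF GPA illustration.
lowest_gpa, highest_gpa = 1.1, 4.0
sigma = (highest_gpa - lowest_gpa) / 4   # range/4 rule of thumb, about 0.725
z = 1.96                                 # table value z(0.025) for 95% confidence
bound = 0.25                             # estimate to within 0.25 of a point

n_raw = (z * sigma / bound) ** 2         # about 32.308
n = math.ceil(n_raw)                     # round up to a whole number of students
print(round(n_raw, 3), n)
```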