6.1 Parameter and Statistic

Wanhua Su

We first review the relationship between parameters and statistics.

A parameter is a constant (usually unknown) used to describe some aspect of a population. For example, the population mean [latex]\mu[/latex] is the average value of a characteristic of interest for all individuals in a population (such as the average height of all individuals in Canada).

A statistic describes some aspect of a sample, much like a parameter describes some aspect of a population. However, unlike a parameter (the value of which is assumed to be constant), the value of a statistic varies from one sample to the next. Thus, we generally view a statistic as a random variable before obtaining a random sample, while we view a statistic as a fixed number after the data are collected.

For example, consider the sample mean [latex]\bar{X}[/latex], which is the average of the characteristic of interest over all individuals in a sample. Before we obtain a random sample, the value of [latex]\bar{X}[/latex] has not yet been realized, so we view [latex]\bar{X}[/latex] as a random variable whose value depends on which individuals from the population end up in our sample. Once we have obtained the sample and computed the sample average, we have a fixed number, denoted as [latex]\bar{x}[/latex]. That is, [latex]\bar{X}[/latex] denotes the sample average when viewed as a random variable, whereas [latex]\bar{x}[/latex] denotes a particular realization of [latex]\bar{X}[/latex], which depends on the data actually obtained.
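
The distinction between [latex]\bar{X}[/latex] and [latex]\bar{x}[/latex] can be illustrated with a minimal Python sketch. The population values, the sample size, and the helper name draw_sample_mean are all made up for illustration; the seed is fixed only so that a run can be repeated.

```python
import random
from statistics import mean

# Hypothetical population of N = 8 heights in cm (made-up values).
population = [158, 162, 165, 170, 171, 175, 180, 187]


def draw_sample_mean(rng, n):
    """Draw one simple random sample of size n (without replacement)
    and return its sample mean."""
    sample = rng.sample(population, n)
    return mean(sample)


rng = random.Random(2024)  # seeded only so the run is repeatable

# Before sampling, the sample mean is a random variable:
# each draw may produce a different value.
realizations = [draw_sample_mean(rng, n=3) for _ in range(5)]

# After sampling, each entry is a fixed number, one realization x_bar.
for i, x_bar in enumerate(realizations, start=1):
    print(f"sample {i}: x_bar = {x_bar:.2f}")
```

Each printed value plays the role of [latex]\bar{x}[/latex] for one particular sample, while the function call itself, before it is evaluated, plays the role of [latex]\bar{X}[/latex].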

Suppose there are N individuals in a population from which we plan to obtain a simple random sample of size n. Before computing the value of a statistic, it is of interest to know what values the statistic may assume and how often each value occurs. There are a total of [latex]\binom{N}{n} = {}_N C_n[/latex] (N choose n) distinct possible samples of size n, many of which give different values of the statistic. Under simple random sampling, each of these samples is equally likely, so a histogram of the [latex]\binom{N}{n}[/latex] values displays the sampling distribution of the statistic. We can describe this distribution by its mean, standard deviation, and shape.
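
For a very small population, the sampling distribution can be obtained by listing every possible sample. The sketch below does this for the sample mean, assuming a hypothetical population of N = 5 values and samples of size n = 2; the numbers are chosen only to keep the enumeration short.

```python
from collections import Counter
from itertools import combinations
from math import comb
from statistics import mean

population = [2, 4, 6, 8, 10]  # hypothetical population, N = 5
n = 2                          # sample size

# Every distinct sample of size n (order ignored, no replacement).
samples = list(combinations(population, n))

# Value of the statistic (here the sample mean) for each sample.
sample_means = [mean(s) for s in samples]

# Tally how often each value occurs: the sampling distribution.
distribution = Counter(sample_means)

print(f"number of samples: {len(samples)} "
      f"(N choose n = {comb(len(population), n)})")

# Text histogram: one star per sample that gives this value.
for value in sorted(distribution):
    print(f"x_bar = {value}: {'*' * distribution[value]}")
```

For realistic population sizes, [latex]\binom{N}{n}[/latex] is far too large to enumerate, which is why the sampling distribution is usually described through its mean, standard deviation, and shape instead.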

Sometimes, we use a statistic to estimate the value of a population parameter; in that case, we call the statistic an estimator of the parameter. Once sample data are obtained and the value of the estimator (statistic) is computed, we refer to this observed value as a point estimate of the population parameter. For example, the sample mean [latex]\bar{X}[/latex] is an estimator of the population mean [latex]\mu[/latex], and a value of the random variable [latex]\bar{X}[/latex], denoted as [latex]\bar{x}[/latex], is a point estimate of [latex]\mu[/latex]. Because [latex]\bar{x}[/latex] is based on a sample of size n rather than on the entire population, it is generally the case that [latex]\bar{x}[/latex] is not equal to [latex]\mu[/latex]. The difference between the point estimate and the parameter is called the sampling error. We call an estimator unbiased if the average of its [latex]\binom{N}{n}[/latex] possible values is equal to the population parameter. For example, the sample mean [latex]\bar{X}[/latex] is an unbiased estimator of the population mean [latex]\mu[/latex], i.e., the mean of the sampling distribution of [latex]\bar{X}[/latex] equals the population mean.
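
Sampling error and unbiasedness can be checked numerically on a toy example. The sketch below assumes a hypothetical population of N = 5 values and samples of size n = 3, and uses exact fractions so that the comparison with [latex]\mu[/latex] is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations

population = [3, 7, 8, 12, 15]  # hypothetical population, N = 5
n = 3                           # sample size

# Population mean mu (the parameter), kept as an exact fraction.
mu = Fraction(sum(population), len(population))

# Point estimate x_bar for every possible sample of size n.
estimates = [Fraction(sum(s), n) for s in combinations(population, n)]

# Sampling error of each point estimate: x_bar - mu.
sampling_errors = [x_bar - mu for x_bar in estimates]

# Average of the estimator over all possible samples.
mean_of_estimates = sum(estimates) / len(estimates)

print(f"mu = {mu}")
print(f"mean of all sample means = {mean_of_estimates}")
print(f"unbiased on this population: {mean_of_estimates == mu}")
print("sampling errors:", [str(e) for e in sampling_errors])
```

Individual point estimates typically miss [latex]\mu[/latex], but the sampling errors cancel when averaged over all possible samples, which is what unbiasedness of [latex]\bar{X}[/latex] means.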

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License