Summary of the Video
If we repeat a random sample or a randomized comparative experiment, we will not get the same result as on our first try. A manufacturer taking regular samples of product for quality control purposes also faces the fact that the results will vary even when the process being observed is stable. All processes have inherent variation. But random selection implies that there is a regular pattern to the variation in repeated samples. This pattern is called the sampling distribution. We will look at the sampling distribution of one important numerical description of data, the sample mean x̄.
We visit AT&T's Oklahoma City plant, a highly automated factory with robot carts running about. The plant makes circuit boards for telephone exchange switches. Each board has about 2000 electrical connections, which are soldered all at once by passing the board through a standing wave of molten solder. To monitor this critical operation, workers inspect a sample of 5 boards at regular intervals. Each board is scored for solder quality; 100 is the standard, below 100 is lower quality, above 100 is better than standard. If the average score is too low, that's a sign that something is wrong. This is statistical process control, using samples to monitor the process at critical stages rather than waiting to inspect the finished product.
The pattern of variation in the mean score in repeated samples is described by a normal distribution. If the process is on target, the mean will be 100. How do we decide when an x̄ is so small that we should react? We need to know more about the sampling distribution of x̄.
Take a simple random sample of size n from a population having mean µ and standard deviation σ. In the process control setting, the population is the scores of all circuit boards that would be produced if the process ran on forever in its present state. We can also think of the population distribution as the distribution of individual observations drawn at random from the population. Two big facts:
1. If the population has a normal distribution, so does the mean of a simple random sample;
2. The mean of the distribution of x̄ is the same as the mean µ of the entire population; but the standard deviation of x̄ is smaller than that of the population. In fact, the standard deviation of x̄ is σ/√n.
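The two facts above can be checked with a small simulation. The sketch below is only an illustration, not part of the video: it assumes a normal population with the solder-score values µ = 100, σ = 4 and n = 5, and the number of repeated samples and the random seed are arbitrary choices. If the facts hold, the sample means should center on µ and have a standard deviation close to σ/√n.

```python
import math
import random
import statistics

MU = 100.0      # population mean (process on target)
SIGMA = 4.0     # population standard deviation of individual scores
N = 5           # observations per sample
REPS = 20_000   # number of repeated samples (arbitrary choice)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Draw REPS simple random samples of size N from a normal population
# and record the mean of each one.
sample_means = [
    statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
theory_sd = SIGMA / math.sqrt(N)  # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.2f} (population mean {MU})")
print(f"sd of sample means:   {sd_of_means:.2f} (sigma/sqrt(n) = {theory_sd:.2f})")
```

Raising N in this sketch shrinks the spread of the sample means, which is why averages of several boards are steadier than individual board scores.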
An x̄ control chart helps us distinguish between the natural variability of the process and the extra variation that shows that the process has been disturbed. We want to take action if there is a disturbance, but we don't want to react to the natural variation present in any process. To make a control chart, plot the sample x̄'s in time order. Draw a center line at 100, the process mean when no disturbance is present. If we know that the process σ is 4, then the standard deviation of x̄ is σ/√n = 4/√5 = 1.79. By the 99.7 part of the 68-95-99.7 rule, 99.7% of all samples will have an x̄ falling within 3 standard deviations of 100. That's between 94.63 and 105.37. Draw control lines (always drawn dashed) on the chart at these two levels. The control lines mark off the natural range of variation of x̄ when the process is operating undisturbed. Points outside the control lines are good evidence that something has gone wrong.
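The control-line arithmetic can be written out as a short sketch. The function names and the list of sample means below are hypothetical, made up for illustration; only the center line of 100, the process standard deviation of 4, and the sample size of 5 come from the example above.

```python
import math

def xbar_control_limits(center, sigma, n, k=3.0):
    """Return (lower, upper) control lines: center -/+ k * sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)  # standard deviation of the sample mean
    return center - k * se, center + k * se

def out_of_control(sample_means, lower, upper):
    """Return the positions (0-based, in time order) of means outside the lines."""
    return [i for i, m in enumerate(sample_means) if m < lower or m > upper]

lower, upper = xbar_control_limits(center=100, sigma=4, n=5)

# Hypothetical sample means in time order, invented for this example.
means = [101.2, 99.4, 98.7, 100.9, 93.8, 100.1]
flagged = out_of_control(means, lower, upper)

print(f"control lines: {lower:.2f} and {upper:.2f}")
print(f"samples outside the control lines: {flagged}")
```

In this made-up series only the fifth sample falls below the lower control line, so it is the one that would prompt a closer look at the process.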
The third big fact about the sampling distribution of x̄ is the central limit theorem (CLT). The CLT says that when we take many observations, the distribution of x̄ is close to normal whether or not the population distribution is normal. The Census Bureau, for example, takes a very large sample to estimate average income. The mean income in this sample varies from sample to sample according to a normal distribution even though the distribution of incomes in the population is strongly skewed.
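A simulation can make the central limit theorem concrete. The sketch below is an assumption-laden illustration, not Census Bureau data: it uses an exponential distribution as a stand-in for a strongly right-skewed population, and the sample size, number of repetitions, and seed are arbitrary. It compares the skewness of individual observations with the skewness of sample means; a value near 0 indicates a roughly symmetric, normal-looking shape.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution, > 0 for a right skew."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

rng = random.Random(2)  # fixed seed so the run is repeatable
N = 100                 # observations per sample (arbitrary choice)
REPS = 4_000            # number of repeated samples (arbitrary choice)

# Individual observations from a skewed stand-in population (exponential, mean 1).
individual_draws = [rng.expovariate(1.0) for _ in range(20_000)]

# Means of REPS samples of size N drawn from the same population.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(N)])
    for _ in range(REPS)
]

skew_individual = skewness(individual_draws)
skew_means = skewness(sample_means)

print(f"skewness of individual observations: {skew_individual:.2f}")
print(f"skewness of sample means (n = {N}): {skew_means:.2f}")
```

Increasing N in this sketch pushes the skewness of the sample means closer to 0, which is the sense in which x̄ becomes close to normal when we take many observations.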