The normal distribution, also called the Gaussian distribution, is probably the most important distribution for continuous data from a statistical analysis standpoint. It is sometimes called the “bell curve,” although the tonal qualities of such a bell would be less than pleasing. A normal, or Gaussian, distribution is depicted below.
Normal data is distributed symmetrically around the mean, represented above by the x-bar line. A normal curve is useful for determining the probability that a given data point in a population will fall within a certain range of the distribution.
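As a rough illustration of that range probability, the sketch below uses Python's standard-library `statistics.NormalDist`. The helper name `prob_in_range` and the part-length numbers (mean 50 mm, standard deviation 2 mm) are hypothetical, chosen only for this example.

```python
from statistics import NormalDist


def prob_in_range(mean, sd, low, high):
    """Probability that a normally distributed value lies between low and high."""
    dist = NormalDist(mu=mean, sigma=sd)
    # The area under the curve between two points is the difference of the CDFs.
    return dist.cdf(high) - dist.cdf(low)


# Hypothetical example: part lengths with mean 50 mm and standard deviation 2 mm.
p = prob_in_range(50.0, 2.0, 48.0, 53.0)
print(f"P(48 <= X <= 53) = {p:.4f}")
```

The same calculation can be done in Excel or Minitab; the point is only that the probability is the area under the curve between the two limits.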
Strictly speaking, it is not correct to talk about “the normal distribution,” since there are many normal distributions. Normal distributions can differ in their means and in their standard deviations. The figure above shows three normal distributions. The green distribution has a mean of -3 and a standard deviation of 0.5, the distribution in red has a mean of 0 and a standard deviation of 1, and the distribution in black has a mean of 2 and a standard deviation of 3. These, as well as all other normal distributions, are symmetric, with relatively more values at the center of the distribution and relatively few in the tails.
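For reference, every one of these curves comes from the same standard density formula, with only the mean μ and the standard deviation σ changed:

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$

Changing μ slides the curve left or right, while changing σ makes it narrower and taller or wider and flatter. For the red curve above (μ = 0, σ = 1), the formula reduces to $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$.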
Eight features of normal distributions are listed below. These features are illustrated in more detail in the remaining sections of this chapter.
- Normal distributions are symmetric around their mean.
- The mean, median, and mode of a normal distribution are equal.
- The area under the normal curve is equal to 1.0.
- Normal distributions are denser in the center and less dense in the tails.
- Normal distributions are defined by two parameters, the mean (μ) and the standard deviation (σ).
- Approximately 68% of the area of a normal distribution is within one standard deviation of the mean.
- Approximately 95% of the area of a normal distribution is within two standard deviations of the mean.
- Approximately 99.73% of the area of a normal distribution is within three standard deviations of the mean. This three-standard-deviation range sets the width of the standard control limits in Statistical Process Control (SPC).
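The percentages in the last three features can be checked numerically. The sketch below is one way to do so with Python's standard-library `statistics.NormalDist`; the helper name `area_within` is an assumption made for this example.

```python
from statistics import NormalDist


def area_within(k, mean=0.0, sd=1.0):
    """Area under the normal curve within k standard deviations of the mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


# The result depends only on k, not on the particular mean or standard deviation.
for k in (1, 2, 3):
    standard = area_within(k)                  # mean 0, standard deviation 1
    shifted = area_within(k, mean=2.0, sd=3.0)  # mean 2, standard deviation 3
    print(f"within {k} sd: {standard:.4f} (standard), {shifted:.4f} (mean 2, sd 3)")
```

The areas come out at roughly 0.6827, 0.9545, and 0.9973, which is why the rule holds for any normal distribution, whatever its mean and standard deviation.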
Distributions – even normal distributions – vary a bit. Determining the exact probabilities of various data points requires more advanced statistics; Excel and other programs such as Minitab perform the calculations for you, making analysis easier. Before we discuss other statistical tools, our next post will look at determining whether your data is normal in the first place using a cost-effective solution, Microsoft Excel.
Testing whether data is normal is critical to many steps in statistical analysis, because the results of many tests can be invalid if you don’t account for the kind of data you are working with. The most basic forms of many of these tests are designed to work with normal data.
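As a rough first screen, and not a formal normality test, you can compare the share of your data falling within one, two, and three standard deviations of the mean against the expected 68%, 95%, and 99.73%. The sketch below assumes plain Python with only the standard library. The function name `empirical_rule_fractions` and both simulated samples are hypothetical, used here in place of real process data.

```python
import random
from statistics import mean, stdev


def empirical_rule_fractions(data):
    """Share of the data within 1, 2 and 3 sample standard deviations of the mean."""
    m = mean(data)
    s = stdev(data)
    return {k: sum(abs(x - m) <= k * s for x in data) / len(data) for k in (1, 2, 3)}


random.seed(0)  # fixed seed so the simulated samples are repeatable
normal_sample = [random.gauss(10, 2) for _ in range(5000)]       # simulated normal data
skewed_sample = [random.expovariate(1.0) for _ in range(5000)]   # simulated skewed data

normal_fractions = empirical_rule_fractions(normal_sample)
skewed_fractions = empirical_rule_fractions(skewed_sample)
print("normal sample:", normal_fractions)
print("skewed sample:", skewed_fractions)
```

Fractions close to 0.68, 0.95, and 0.9973 are consistent with normal data, while a clearly different share within one standard deviation (as in the skewed sample) is a hint that the data may not be normal. A histogram or a formal test in Excel or Minitab is still needed to confirm.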
Normally, Lean Six Sigma Green Belt training covers analysis of normally distributed data, while non-normal analysis is covered in the Black Belt session.