# Introduction to Six Sigma Statistics

I thought I’d bring in some key points on basic **Six Sigma Statistics** before embarking on a serious Design of Experiments or multiple regression analysis. I’ve come across several Black Belt candidates in training (like myself) who get so caught up in the statistical side of things (or in Minitab) that they forget the basics, so I thought I’d throw a refresher out there. While you don’t have to be a statistician to master or appreciate Six Sigma Statistics, you do have to have your head screwed on.

## Basic Notions of Six Sigma Statistics

**1. Learn the scales of measure**

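| Scale | Description | Examples |
| --- | --- | --- |
| Nominal | It is in the name: values are labels only, with no order | Marital status, phone numbers |
| Ordinal | Relative ranking; the gaps between ranks are not necessarily equal | Race finish, opinion poll response |
| Interval | Equal intervals represent equal differences | Calendar year, Fahrenheit temperature |
| Ratio | Proportional amount of difference; has a real zero value | Annual income, Kelvin temperature |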

**2. Learn the measures of central tendency**

Mean: Arithmetic average of a set of values

• Reflects the influence of all values

• Strongly influenced by extreme values

Mode: The most frequently occurring value

Median: Midpoint in a string of sorted data, where 50% of the observations, or values, are below and 50% are above

• Does not necessarily include all values in calculation

• Is “robust” to extreme scores

• Organize the data from low values to high when determining the Median
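To see how differently the three measures react to an extreme value, here is a minimal sketch using Python's standard `statistics` module. The cycle times are made-up numbers chosen only for illustration.

```python
from statistics import mean, median, mode

# Hypothetical cycle times in minutes; the last value is an outlier.
cycle_times = [4, 5, 5, 6, 7, 8, 40]

avg = mean(cycle_times)     # uses every value, so the outlier pulls it up
mid = median(cycle_times)   # middle value of the sorted data
most = mode(cycle_times)    # most frequently occurring value

print(f"mean={avg:.2f} median={mid} mode={most}")

# Drop the outlier: the mean moves a lot, the median only slightly.
trimmed = cycle_times[:-1]
trimmed_avg = mean(trimmed)
trimmed_mid = median(trimmed)
print(f"mean={trimmed_avg:.2f} median={trimmed_mid}")
```

With the outlier included the mean is well above the median, which is the "robust to extreme scores" point in practice.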

**3. Learn the measures of dispersion**

Range: the distance between the extreme values of a data set (Highest – Lowest)

The range depends only on the two most extreme values, so it is more sensitive to outliers than the variance

Variance: the Average Squared Deviation of each data point from the Mean

Standard Deviation: the Square Root of the Variance

A measure of the typical deviation about the mean, expressed in the same units as the data
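The three measures can be computed in a few lines. This is a sketch on made-up data using Python's `statistics` module. Note one assumption worth flagging: the "average squared deviation" definition divides by n (the population form), while most tools report the sample form, which divides by n - 1.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical measurements, chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)   # highest minus lowest

pop_var = pvariance(data)   # average squared deviation (divides by n)
pop_sd = pstdev(data)       # square root of the population variance

samp_var = variance(data)   # sample variance (divides by n - 1)
samp_sd = stdev(data)       # square root of the sample variance

print(f"range={data_range}")
print(f"population: variance={pop_var} sd={pop_sd}")
print(f"sample:     variance={samp_var:.3f} sd={samp_sd:.3f}")
```

When the data are a sample from a larger process, which is the usual case on a project, the sample versions are normally the ones to report.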

**4. Understand the different types of data**

Variable (quantitative)

Continuous Data (Decimal subdivisions are meaningful)

o Time (seconds)

o Pressure (psi)

o Conveyor Speed (ft/min)

o Rate (inches

Attribute (qualitative)

o Categories

o Good/Bad (Pass/Fail)

o Machine 1, Machine 2, Machine 3

o Shift number

o Counted things (# of Errors in a document, # units shipped, etc.)

Convert Attribute to Continuous wherever possible. Here are some examples of attribute data converted to continuous data:

– Count of defects to ‘% defects’

– Y/N late to ‘average days late’

– Leaks/No leaks to ‘rate of leaks on a continuous scale’

– Success or failure of electrical parts to ‘voltage flow of good parts’
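The first two conversions can be sketched as follows. The lot and delivery records are hypothetical, and the field names (`inspected`, `defects`) are invented for this example.

```python
# Hypothetical inspection records: units inspected and defects found per lot.
lots = [
    {"inspected": 200, "defects": 4},
    {"inspected": 250, "defects": 10},
    {"inspected": 100, "defects": 1},
]

# Count of defects -> '% defects' per lot.
pct_defects = [100 * lot["defects"] / lot["inspected"] for lot in lots]
print(pct_defects)

# Hypothetical delivery records: days late per order (0 means on time).
days_late = [0, 0, 3, 0, 7, 1, 0, 5]

# Attribute view: each order is simply late (True) or not (False).
late_flags = [d > 0 for d in days_late]
share_late = sum(late_flags) / len(late_flags)

# Continuous view: how late orders are on average.
avg_days_late = sum(days_late) / len(days_late)

print(f"share late={share_late:.0%} average days late={avg_days_late}")
```

The Y/N view only says that half the orders were late; the continuous view also says by how much, which gives the analysis more to work with.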

**5. Descriptive Statistics**

Consists of basic statistics and graphical techniques used to summarize data

Measures of central location

Measures of spread (dispersion)

Evaluation of symmetry & skewness

Typical graphical techniques

- Histograms

- Boxplots

- Dotplots

- Normal probability plots
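As a rough illustration of what a descriptive summary contains, the sketch below computes location, spread, quartiles and a moment-based skewness on made-up data, then prints a crude text dotplot. The skewness formula used here is one common definition; statistical packages may apply a slightly different small-sample correction.

```python
from statistics import mean, pstdev, quantiles

# Hypothetical measurements; the single large value creates a right skew.
data = [12, 13, 13, 14, 15, 15, 16, 18, 25]

center = mean(data)
spread = pstdev(data)                # population standard deviation
q1, q2, q3 = quantiles(data, n=4)    # quartiles; q2 is the median

# Moment-based skewness: positive means a longer right tail.
skewness = sum((x - center) ** 3 for x in data) / len(data) / spread ** 3

print(f"min={min(data)} Q1={q1} median={q2} Q3={q3} max={max(data)}")
print(f"mean={center:.2f} sd={spread:.2f} skewness={skewness:.2f}")

# A crude text dotplot: one row per distinct value.
for value in sorted(set(data)):
    print(f"{value:>3} | {'*' * data.count(value)}")
```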

**6. Be able to identify different data distributions**

Why does it matter? Most of the common tests assume a particular distribution. If you treat non-normal data as normal, the p-value you get can be misleading, and your prediction or analysis will be overstated or understated.

Normal Distribution (Bell Curve)

The “Normal” Distribution is a distribution of data which has certain consistent properties (the mean, median and mode are equal in value)

These properties are very useful in our understanding of the characteristics of the underlying process from which the data were obtained

Many natural phenomena and man-made processes are distributed approximately normally, or can be represented as normally distributed

The Normal Distribution is a continuous, symmetrical, unimodal distribution in which extreme values are less likely than moderate values

An example would be measuring heights of people or the length of a table. In either case the measurement is continuous and can be broken down into finer increments
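One of the "consistent properties" is that a fixed share of the data falls within a given number of standard deviations of the mean. A minimal sketch with Python's `statistics.NormalDist` follows; the process mean and standard deviation are hypothetical, and `share_within` is a helper name made up for this example.

```python
from statistics import NormalDist

# Hypothetical process: part length with mean 10.0 mm and sd 0.2 mm.
process = NormalDist(mu=10.0, sigma=0.2)


def share_within(dist, k):
    """Share of the distribution within k standard deviations of the mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


for k in (1, 2, 3):
    print(f"within ±{k} sd: {share_within(process, k):.4f}")

# Symmetry: half of the distribution lies below the mean.
below_mean = process.cdf(process.mean)
print(f"share below the mean: {below_mean}")
```

The shares do not depend on the particular mean and standard deviation chosen, which is why the normal model is so convenient for describing a process.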

t-distribution

The t-distribution assumes samples are drawn from a normal distribution but the population variance, σ², is not known, so it has to be estimated from the sample. The shape of the t-distribution varies as the sample size, n, changes: the distribution becomes narrower as the sample size becomes larger. As n becomes very large, the critical value for a given area under the curve approaches the Normal distribution’s Z value

Poisson Distribution

Appropriate as a model of number of defects or nonconformities in a unit of product

X is the number of defects found per unit

– Per unit area, per unit volume, per unit time, etc.

– X is a discrete, non-negative integer

– The area of opportunity is a finite region of space, time or product

When the average is high, the distribution can be approximated by the normal distribution

When the average is low, the distribution is skewed to the right
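The two statements about low and high averages can be checked numerically. The sketch below writes the Poisson probability formula out by hand (`poisson_pmf` and `poisson_cdf` are helper names made up here) and compares a high-average case against a normal curve with the same mean and variance. The averages 0.5 and 25 are arbitrary illustrative values.

```python
from math import exp, factorial
from statistics import NormalDist


def poisson_pmf(k, lam):
    """P(X = k) when the average number of defects per unit is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def poisson_cdf(k, lam):
    """P(X <= k), summed term by term."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


# Low average: most units have zero defects, with a long tail to the right.
for k in range(5):
    print(f"lam=0.5  P(X={k}) = {poisson_pmf(k, 0.5):.4f}")

# High average: compare with a normal distribution that has the same mean
# and variance (for a Poisson, the variance equals the mean).
lam = 25.0
exact = poisson_cdf(30, lam)
approx = NormalDist(mu=lam, sigma=lam ** 0.5).cdf(30.5)  # continuity correction
print(f"lam=25   P(X<=30): Poisson={exact:.4f} normal approx={approx:.4f}")
```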

F Distribution

A continuous distribution formed from the ratio of variances calculated from two independent samples drawn from Normal Distributions

Chi-square Distribution

A continuous distribution used in statistical hypothesis testing and confidence interval estimation for many different applications, including inferences about a population variance

**7. Know about samples and population sizes**

Population is every possible observation (census)

Samples are subsets of populations

Data is obtained using samples because we seldom know the entire population

Descriptive statistics apply to any distribution

- Sample or population

Population statistics are desired, but often not available

Samples from a population can be used to ‘infer’ or approximate population parameters
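A small simulation makes the idea of inferring from a sample concrete. Everything here is hypothetical: the "population" is just the numbers 1 to 1,000, and the seed is fixed only so that the run is repeatable.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the example is repeatable

# Hypothetical population of 1,000 measurements (normally we never see this).
population = list(range(1, 1001))
pop_mean = mean(population)

# In practice we only get a sample, drawn here without replacement.
sample = random.sample(population, 100)
sample_mean = mean(sample)

print(f"population mean = {pop_mean}")
print(f"sample mean     = {sample_mean}")
print(f"difference      = {sample_mean - pop_mean:+.2f}")
```

The sample mean will not match the population mean exactly, but it approximates it, and larger samples tend to land closer.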

**8. Know the Statistics and Reporting Tools**

By far the most used Six Sigma tool is Microsoft Excel. Excel covers most of the day-to-day needs of most Six Sigma projects. Please note, though, that Excel must be accompanied by other Microsoft tools such as Word, PowerPoint, and Outlook for email. In other words, the Six Sigma Black or Green Belt candidate should always ensure that he or she has the latest version of Microsoft Office installed.

Also, check with your IT department or on the Microsoft Excel disc for the “Analysis ToolPak” add-in, which will allow you to perform the following in Excel:

– ANOVA

– Correlations

– Covariances

– Descriptive Statistics

– Exponential Smoothing

– F-Test Two-Sample for Variances

– Fourier Analysis

– Histograms

– Various t-tests

However, the ultimate tool for Six Sigma Black Belts is Minitab. Minitab will get you going where Excel leaves you hanging.

Minitab will go into profound levels of enlightenment with:

– Measurement System Analysis

– Multivariate Analysis

– ANOVA

– Regression Analysis

– Statistical Process Control

– Reliability/Survival Analysis

– And other great simulations

Don’t forget to check out Minitab’s powerful reporting tools that will provide the following graphical reports:

– Dotplots / Histograms / Normal Plots

– Run charts / Time Series

– Pareto Diagrams

– Stratification (2nd Level Pareto)

– Boxplots

– Scatter Plots

– Checksheets / Concentration Diagrams

Now you should be ready to go run your multi-vari charts, multiple regression, 2^k full-factorial regression, and last but not least, your Design of Experiments. We’ll keep data transformations for another paper. At least by now you should have a good handle on **Six Sigma statistics**.

Don’t forget to go to templatestaff.com to download free Six Sigma spreadsheets
