[Proofreading Task] Article 13: Understanding Statistical Distributions for Six Sigma
This post was last edited by 小编H on 2011-7-20 17:08
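Group members interested in proofreading the following article: please leave your estimated completion time and send a private message to 小编H, so that 小编H can register proofreader information and handle rewards and penalties when the article is finally completed.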
Understanding Statistical Distributions for Six Sigma
To interpret data, consultants need to understand distributions. This article discusses how to understand different types of statistical distributions, understand the uses of different distributions, and make assumptions given a known distribution.
By J. DeLayne Stroud
Many consultants remember the hypothesis testing roadmap, which was a great template for deciding what type of test to perform. However, think about the type of data one actually gets. What if there is only summarized data? How can that data be used to draw conclusions? Having the raw data is the best-case scenario, but if it is not available, there are still tests that can be performed.
To go beyond looking at data and actually interpret it, consultants need to understand distributions. This article discusses how to:
•Understand different types of statistical distributions.
•Understand the uses of different distributions.
•Make assumptions given a known distribution.
Six Sigma Green Belts receive training focused on shape, center and spread. In that training, however, the concept of shape is usually limited to the normal distribution for continuous data. This article expands on the notion of shape as described by the distribution (for both the population and the sample).
Getting Back to the Basics
With probability, statements are made about the chances that certain outcomes will occur, based on an assumed model. With statistics, observed data is used to determine a model that describes this data. This model relates to the distribution of the data. Statistics moves from the sample to the population while probability moves from the population to the sample.
Inferential statistics is the science of describing population parameters based on sample data. Inferential statistics can be used to:
•Establish a process capability (determine defects per million).
•Utilize distributions to estimate the probability that a variable takes a particular value or range of values, given known parameters.
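Here is a minimal Python sketch of the first use. It assumes a normally distributed characteristic whose mean and standard deviation are already known. The process values and specification limits are hypothetical, and no 1.5-sigma shift is applied, so the result is the defect rate implied directly by the stated parameters.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def defects_per_million(mu, sigma, lsl, usl):
    """Expected defects per million for a normal process with lower/upper spec limits."""
    p_below = normal_cdf(lsl, mu, sigma)        # share of output below the lower spec limit
    p_above = 1.0 - normal_cdf(usl, mu, sigma)  # share of output above the upper spec limit
    return (p_below + p_above) * 1_000_000

# Hypothetical process: mean 10.0, standard deviation 0.5, specs 8.5 to 11.5 (3 sigma each side)
print(round(defects_per_million(10.0, 0.5, 8.5, 11.5)))  # prints 2700
```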
Many commonly used inferential statistics methods assume that the data follow a normal distribution.
Figure 1: Normal Curve and Probability Areas
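The probability areas under the normal curve can be reproduced with the standard library's error function. The Python sketch below is illustrative only. It returns the share of any normal population that lies within k standard deviations of the mean. Because standardizing removes μ and σ, only k matters.

```python
import math

def prob_within_k_sigma(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal distribution."""
    # After standardizing, this is P(|Z| <= k), which equals erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(prob_within_k_sigma(k), 4))
# prints: 1 0.6827, then 2 0.9545, then 3 0.9973
```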
The normal curve is a starting point for learning about other distributions. The appropriate distribution can be chosen based on an understanding of the process being studied, together with the type of data being collected and the dispersion or shape of the data. Choosing the right distribution helps determine the best analysis to perform.
Types of Distributions
Distributions are classified in the same way as data: continuous and discrete.
•Continuous probability distributions describe the probabilities of random variables that can assume any of an infinite number of values along an interval.
•Discrete probability distributions are listings of all possible outcomes of an experiment, along with their respective probabilities of occurrence.
Distribution Descriptions
Probability mass function (pmf) - For discrete variables, the pmf is the probability that a variate takes the value x.
Probability density function (pdf) - For continuous variables, the pdf f(x) is not itself a probability. Instead, the probability that a variate falls between two points a and b is the integral of the pdf between them: P(a ≤ X ≤ b) is the integral of f(x) from a to b.
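For a continuous variable, one cannot assign a nonzero probability to a specific value x on a continuum. Probability is assigned only to some specific (and small) range. For additional insight, think of the interval from x to x + Δx, where Δx is small: the probability that X falls in it is approximately f(x)Δx.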
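To make the small-range idea concrete, this Python sketch compares two quantities for a standard normal variable. The first is the exact probability that the variable lands between x and x + Δx. The second is the approximation f(x)Δx. The choices x = 1 and Δx = 0.01 are arbitrary assumptions for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f(x); a density value, not a probability."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = P(X <= x) for the normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

x, dx = 1.0, 0.01
exact = normal_cdf(x + dx) - normal_cdf(x)   # P(x <= X <= x + dx) from the cdf
approx = normal_pdf(x) * dx                  # density times the width of the small range
print(exact, approx)                         # the two agree to within about 0.5%
print(normal_cdf(x) - normal_cdf(x))         # a single point carries zero probability
```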
The notation for the pdf is f(x). For discrete distributions:
f(x) = P(X = x)
Some refer to this as the probability mass function, since it evaluates the probability at that one discrete point (mass). For continuous distributions, no single point carries a probability mass.
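For a discrete variable, f(x) = P(X = x) is simply a table of point masses. The sketch below uses the sum of two fair six-sided dice as an illustrative variable. It works with exact fractions so that the masses visibly sum to 1.

```python
from fractions import Fraction
from itertools import product

# pmf of X = sum of two fair six-sided dice; each of the 36 ordered rolls has mass 1/36
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

print(pmf[7])             # f(7) = P(X = 7): prints 1/6
print(sum(pmf.values()))  # the masses over all outcomes add to 1: prints 1
```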
Cumulative distribution function (cdf) - The probability that a variable takes a value less than or equal to x.
Figure 2: Normal Distribution Cdf
The cdf rises to a value of 1 because there cannot be a probability greater than 1. In notation, the cdf is F(x) = P(X ≤ x). This definition holds for both continuous and discrete distributions.
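For a discrete variable, the cdf can be built by accumulating the pmf. This produces a step shape that levels off at 1. The Python sketch below uses a fair six-sided die as its example, and `cdf_from_pmf` is a hypothetical helper name.

```python
from fractions import Fraction

def cdf_from_pmf(pmf, x):
    """F(x) = P(X <= x): add the pmf over every value that does not exceed x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

die = {face: Fraction(1, 6) for face in range(1, 7)}  # fair six-sided die (illustrative)

print([str(cdf_from_pmf(die, x)) for x in range(0, 8)])
# prints ['0', '1/6', '1/3', '1/2', '2/3', '5/6', '1', '1']
print(cdf_from_pmf(die, 3.5))  # between faces the cdf stays flat: prints 1/2
```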
Parameters
A parameter is a description of a population. Consultants rely on parameters to characterize distributions. There are three types of parameters:
•Location parameter - the lower end or midpoint (as prescribed by the distribution) of the range of the variate (think of the mean)
•Scale parameter - determines the scale of measurement for x (magnitude of the x-axis scale) (think of the standard deviation)
•Shape parameter - defines the pdf shape within a family of shapes
Not all distributions have all three types of parameters. For example, the normal distribution has just two parameters, the mean and the standard deviation. Only those two need to be known to describe a normal population.
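This short Python sketch shows how the two normal parameters act as location and scale. Any normal density is the standard normal density shifted by μ and stretched by σ. The function names and the values μ = 50, σ = 2 and σ = 4 are illustrative assumptions.

```python
import math

def std_normal_pdf(z):
    """Standard normal density: location 0, scale 1."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_pdf(x, mu, sigma):
    """Normal density with location mu (the mean) and scale sigma (the standard deviation).

    Shifting by mu and stretching by sigma turns the standard curve into any normal
    curve, which is why these two parameters fully describe a normal population.
    """
    return std_normal_pdf((x - mu) / sigma) / sigma

# The peak sits at the location parameter; doubling the scale halves its height.
print(normal_pdf(50.0, 50.0, 2.0), normal_pdf(50.0, 50.0, 4.0))
```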
Summary of Distributions
The remaining portion of this article summarizes the shape, basic assumptions and uses of each distribution. Keep in mind that each distribution has its own pdf and its own distribution parameters.
Normal Distribution (Gaussian Distribution)
Figure 3: Normal Distribution Shape
Basic assumptions:
•Symmetrical distribution about the mean (bell-shaped curve).
•Commonly used in inferential statistics.
•Family of distributions characterized by μ (the mean) and σ (the standard deviation).
Uses include:
•Probabilistic assessments of process capability for continuous data (for example, estimating defects per million).
•Inferential statistics, since many common tests and confidence intervals assume normally distributed data.
Exponential Distribution
Figure 4: Exponential Distribution Shape
Basic assumptions:
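•Family of distributions characterized by its mean μ (equivalently, by its rate λ = 1/μ).
•Distribution of time between independent events occurring at a constant rate.
•The mean is the inverse of the Poisson rate: if events occur at an average rate of λ per unit time, the mean time between them is 1/λ.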
•Shape can be used to describe failure rates that are constant as a function of usage.
Uses include probabilistic assessments of:
•Mean time between failures (MTBF).
•Time between arrivals.
•Time, distance or space between occurrences of the events of interest.
•Queuing (waiting-line) theory.
Lognormal Distribution
Figure 5: Lognormal Distribution Shape
Basic assumptions:
•Asymmetrical, positively skewed distribution that is bounded below by zero.
•Distribution can exhibit many pdf shapes.
•Describes data that has a large range of values.
•Can be characterized by μ and σ, the mean and standard deviation of the logarithm of the variable.
Uses include simulations of:
•Distribution of wealth.
•Machine downtimes.
•Duration of time.
•Phenomena that have a positive skew (a tail to the right).
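The logarithm of a lognormal variable is normally distributed, so its cdf can be computed from the normal cdf of ln(x). The sketch below uses hypothetical parameters μ = 1.0 and σ = 0.8 for the log of machine downtime in hours. It shows that the right tail pulls the mean above the median.

```python
import math

def lognormal_cdf(x, mu, sigma):
    """P(X <= x) when ln(X) ~ Normal(mu, sigma); the variable cannot fall below zero."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 1.0, 0.8                  # hypothetical parameters of ln(downtime in hours)
median = math.exp(mu)                 # half of all downtimes fall below e**mu
mean = math.exp(mu + sigma ** 2 / 2)  # the long right tail pulls the mean above the median
print(round(median, 2), round(mean, 2), round(lognormal_cdf(median, mu, sigma), 3))
# prints 2.72 3.74 0.5
```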
Weibull Distribution
Figure 6: Weibull Distribution Pdf
Basic assumptions:
•Family of distributions.
•Can be used to describe many types of data.
•Can approximate many common distributions (normal, exponential and lognormal); with a shape parameter of 1 it is exactly the exponential distribution.
•Members of the family differ in their scale and shape parameters.
Uses include:
•Lifetime distributions.
•Reliability applications.
•Failure probabilities that vary over time.
•Can describe burn-in, random, and wear-out phases of a life cycle (bathtub curve).
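The bathtub-curve phases correspond to the Weibull shape parameter:

- A shape below 1 gives a decreasing failure rate (burn-in).
- A shape of exactly 1 gives a constant rate (random failures, the exponential case).
- A shape above 1 gives an increasing rate (wear-out).

This Python sketch uses the standard two-parameter Weibull form and a hypothetical characteristic life of 100 hours.

```python
import math

def weibull_cdf(t, shape, scale):
    """P(T <= t) = 1 - exp(-(t / scale) ** shape), for t >= 0."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape / scale) * (t / scale) ** (shape - 1), t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def trend(values):
    """Compare first and last rates (the Weibull hazard is monotone in t)."""
    if values[0] > values[-1]:
        return "decreasing"
    if values[0] < values[-1]:
        return "increasing"
    return "constant"

scale = 100.0  # hypothetical characteristic life, in hours
for shape, phase in ((0.5, "burn-in"), (1.0, "random"), (3.0, "wear-out")):
    rates = [weibull_hazard(t, shape, scale) for t in (10.0, 50.0, 90.0)]
    print(f"shape={shape}: {trend(rates)} failure rate ({phase} phase)")
```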
Binomial Distribution
Figure 7: Binomial Distribution Shape
Basic assumptions:
•Discrete distribution.
•The number of trials is fixed in advance.
•Just two outcomes for each trial.
•Trials are independent.
•All trials have the same probability of success.
Uses include:
•Estimating the probabilities of an outcome in any set of success or failure trials.
•Sampling for attributes (acceptance sampling).
•Number of defective items in a batch of size n.
•Number of items in a batch.
•Number of items demanded from an inventory.
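As an illustration of acceptance sampling, the sketch below computes binomial probabilities with `math.comb` (Python 3.8+). It then finds the chance of accepting a lot under a hypothetical single sampling plan: sample 50 items and accept if at most 1 is defective. The lot is assumed to be 2% defective.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): k 'successes' (here, defectives) in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def prob_accept(n, c, p):
    """Single sampling plan: accept the lot if at most c of the n sampled items are defective."""
    return sum(binomial_pmf(k, n, p) for k in range(c + 1))

# Hypothetical plan: n = 50, acceptance number c = 1, lot assumed 2% defective
print(round(prob_accept(50, 1, 0.02), 4))  # prints 0.7358
```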
Geometric Distribution
Figure 8: Geometric Distribution Pdf
Basic assumptions:
•Discrete distribution.
•Just two outcomes for each trial.
•Trials are independent.
•All trials have the same probability of success.
•Waiting time until the first occurrence.
Uses include:
•Number of failures before the first success in a sequence of trials with probability of success p for each trial.
•Number of items inspected before finding the first defective item; for example, the number of interviews performed before finding the first acceptable candidate.
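Here is a minimal sketch of the geometric pmf in the failures-before-first-success form. The success probability of 0.2 per interview is hypothetical. The mean number of failures before the first success is (1 - p)/p.

```python
def geometric_pmf(k, p):
    """P(Y = k): exactly k failures before the first success, success probability p per trial."""
    return (1 - p) ** k * p

def geometric_mean(p):
    """Expected number of failures before the first success."""
    return (1 - p) / p

p = 0.2  # hypothetical chance that any one interview finds an acceptable candidate
print([round(geometric_pmf(k, p), 3) for k in range(4)])
print(geometric_mean(p))  # on average, about 4 unsuccessful interviews before the first hire
```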
Negative Binomial Distribution
Figure 9: Negative Binomial Distribution Pdf
Basic assumptions:
•Discrete distribution.
•Predetermined number of successes, s.
•Just two outcomes for each trial.
•Trials are independent.
•All trials have the same probability of success.
Uses include:
•Number of failures before the s-th success in a sequence of trials with probability of success p for each trial.
•Number of good items inspected before finding the s-th defective item.
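The negative binomial pmf generalizes the geometric case from the first success to the s-th. This Python sketch uses the failures-before-the-s-th-success form, and "success" here means finding a defective item. The 5% defective rate and the counts are hypothetical.

```python
from math import comb

def neg_binomial_pmf(k, s, p):
    """P(Y = k): exactly k failures before the s-th success, success probability p per trial."""
    # The final trial is the s-th success; the remaining s - 1 successes and k failures
    # can appear in any order among the first k + s - 1 trials.
    return comb(k + s - 1, k) * p ** s * (1 - p) ** k

# With s = 1 the formula reduces to the geometric pmf (1 - p) ** k * p
print(neg_binomial_pmf(3, 1, 0.2), (1 - 0.2) ** 3 * 0.2)
# Hypothetical: 5% defective; chance of exactly 10 good items before the 2nd defective
print(neg_binomial_pmf(10, 2, 0.05))
```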
Poisson Distribution
Figure 10: Poisson Distribution Pdf
Basic assumptions:
•Discrete distribution.
•Length of the observation period (or area) is fixed in advance.
•Events occur at a constant average rate.
•Occurrences are independent.
•Events are rare.
Uses include:
•Number of events in an interval of time (or area) when the events are occurring at a constant rate.
•Number of items in a batch of random size.
•Designing reliability tests where the failure rate is considered to be constant as a function of usage.
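Here is a minimal Poisson sketch. It assumes a hypothetical average of 3 scratches per painted panel, which serves as the fixed area of observation. It computes the chance that a panel has no scratches and the chance that it has more than 4.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a count in a fixed interval when events occur at average rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # hypothetical: an average of 3 scratches per painted panel
print(round(poisson_pmf(0, lam), 4))                             # no scratches: prints 0.0498
print(round(1 - sum(poisson_pmf(k, lam) for k in range(5)), 4))  # more than 4: prints 0.1847
```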
Hypergeometric Distribution
The shape is similar to that of the binomial and Poisson distributions.
Basic assumptions:
•Discrete distribution.
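•The number of trials (the sample size) is fixed in advance.
•Just two outcomes for each trial.
•Trials are not independent: because items are not replaced, each draw changes the composition of what remains.
•Sampling without replacement.
•This is an exact distribution; the binomial and Poisson distributions are approximations to it, which work well when the sample is small relative to the population.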
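The sketch below shows why the binomial is only an approximation here. It compares the hypergeometric probability of finding no defectives in a sample of 5 with the binomial value for the same 20% defective rate. The lot sizes are hypothetical. In a lot of 20 the two values differ noticeably, while in a lot of 2,000 they nearly agree.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k defectives in a sample of n drawn without replacement
    from a lot of N items that contains K defectives."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binomial_pmf(k, n, p):
    """Binomial approximation: treats every draw as independent with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical small lot: 20 items, 4 defective (20%), sample of 5, no defectives found
print(hypergeom_pmf(0, 20, 4, 5), binomial_pmf(0, 5, 0.2))
# Hypothetical large lot with the same 20% rate: the binomial is now a close approximation
print(hypergeom_pmf(0, 2000, 400, 5), binomial_pmf(0, 5, 0.2))
```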
Other Distributions
There are other distributions, for example, sampling distributions and the χ² (chi-square), t and F distributions.
Summary
Distribution refers to the behavior of a process, described by plotting the number of times a variable takes a specific value or range of values rather than by plotting the values themselves. It is often said that a picture is worth a thousand words: viewing data graphically makes a much greater impact on an audience. Becoming familiar with the various distributions can help consultants better interpret their data.
About the Author: J. DeLayne Stroud is a Six Sigma Master Black Belt project manager with DeLeeuw Associates, a division of Conversion Services International. He retired from Bank of America in 2005 with more than 20 years of experience as an executive in project and change management in the banking industry. He has led multiple Design for Six Sigma and Lean initiatives. During his career, Mr. Stroud was a senior project manager in some of the largest mergers and change initiatives in the history of the financial services industry, including former banks such as General Bancshares, Boatmen's Bank, Centerre Bank, Barnett Bank and BankAmerica. He can be reached at jstroud@deleeuwinc.com.
3 replies
wulh (Reputation: 0) (Xuhui District, Shanghai) Aviation-related, Employee
My skill level is rather low, so I'd better stay out of this one.