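Article 34: Rediscover an underused probability distribution method
This post was last edited by 小编D on 2012-6-27 16:07.
Hello, I'm 小编H. Team members interested in proofreading the article below, please leave your estimated completion time and send 小编H a short message, so the editor can record the translator information and handle rewards and penalties once the article is finalized. Thank you for supporting the translation team! Translated by http://www.6sq.net/space-uid-420937.html; proofread by muddy533.
by Lynne B. Hare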
The Poisson distribution (pronounced "pwas-son," where the n is spoken through the nose; don't ask) may be the Rodney Dangerfield of statistics. It doesn't get the use, or the respect, it deserves. Yet, when applied properly, it can aid the decision-making process considerably.
Here are two examples of its application in the real world. Given these, perhaps you can think of more applications in your own line of work.
Needles in haystacks
For want of preventive maintenance, a screen broke off its frame, disintegrated, mixed with a key product ingredient and then mixed with the finished product. "We think we got it all," said the quality manager. "We ran the finished product over magnets, and we've captured enough pieces to reproduce almost an entire screen. Now, we want to sample to make sure we got it all. How many samples do we need if we want to be 100% certain?"
Well, if love means never having to say you're sorry, statistics means never having to say you're certain. There's no such thing as 100% certainty, but you can get close.
Suppose the finished product mass is divided into customer offering quantities, and there are many of these in the offending batch. Define a defective unit as a package containing one or more pieces of screen. If a defective unit is found during sampling, you would conclude the magnets were not completely effective. How many samples must you take to be persuaded the magnets were effective?
A useful model to answer this question is the Poisson distribution. Formally, it is:

$$P(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$

in which e is the base of the natural logarithm (e ≈ 2.718), λ is the distribution mean or expected value (typically estimated by the product of n, the number of samples, and p, the estimate of the proportion defective), and x is the number of screen pieces found.
Because you would conclude the magnets were not fully effective if you found one defective unit, x in the earlier model is set to zero, and the equation reduces to:

$$P = e^{-np}$$

in which P is the probability you would accept or release the product to the marketplace.
For example, if the actual defect rate were 1% (p = 0.01), meaning that 1% of the finished packages contained at least one piece of the screen, and 100 samples were taken, then np = 1 and P = 0.368. This is the chance the batch would be incorrectly released to the marketplace.
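The worked example above can be checked with a few lines of Python. This is only a minimal sketch under the article's zero-defects acceptance rule; the function name `acceptance_probability` is made up for illustration and does not come from the article.

```python
import math


def acceptance_probability(n: int, p: float) -> float:
    """Poisson probability of finding zero defective units in n samples.

    Hypothetical helper: n is the number of sampled packages and p is the
    assumed true proportion of defective packages.
    """
    lam = n * p  # expected number of defective units in the sample
    return math.exp(-lam)  # Poisson P(x = 0) = e^(-lambda)


# The article's example: 100 samples at a 1% defect rate.
print(round(acceptance_probability(100, 0.01), 3))
```

With n = 100 and p = 0.01 this reproduces the 0.368 risk quoted in the text.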
You might not be comfortable with that high a risk. If, instead of 0.368, you wanted to fix the risk at a small number such as 0.05, then you can calculate the corresponding sample size by solving the previous equation for n:

$$n = -\frac{\ln P}{p}$$

Then substitute 0.05, the risk, for P. If the actual defect rate were 1%, the sample size required to detect it with only a 5% chance of error is

$$n = -\frac{\ln(0.05)}{0.01} \approx 300$$
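The same rearrangement, n = -ln(P)/p, is easy to script. The sketch below assumes the zero-defects acceptance rule from the text and rounds up to a whole number of samples; `required_sample_size` is an illustrative name, not something from the article.

```python
import math


def required_sample_size(risk: float, p: float) -> int:
    """Smallest n with e^(-n*p) <= risk under the Poisson zero-defects plan.

    Hypothetical helper: risk is the accepted chance of releasing a batch
    whose true defect rate is p.
    """
    n = -math.log(risk) / p  # solve risk = e^(-n*p) for n
    return math.ceil(n)      # sample counts must be whole numbers


# The article's example: 5% risk at a 1% defect rate.
print(required_sample_size(0.05, 0.01))
```

The unrounded value is about 299.6, so rounding up gives the 300 samples implied by the text.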
As you can see, the choice of sample size depends on the risk of accepting, or releasing to the marketplace, a defective rate of a certain proportion. How are those numbers chosen? They depend on the associated costs and risks.
What is the cost, in terms of negative publicity and consumer alienation, of releasing defective product to the market? Are there health and safety concerns? Conversely, what is the cost of destroying the batch of product? The decision of sample size is not a statistical one, but rather one of choice underpinned by the statistical model.
It can be helpful to display multiple choices so the decision maker can examine the pros and cons of alternative sampling plans. This can be done through the use of curves that show, for fixed risk (P), the sample sizes corresponding to hypothetical true defect rates. Sample sizes grow rapidly as the desired proportion defective to be detected decreases. The curves in Figure 1 relate to the situation in which no defective units are permitted for batch acceptance. Similar curves can be drawn for plans corresponding to other acceptance numbers.
Figure 1: Sample size based on risk and proportion defective
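Since the figure itself did not survive in this post, a rough sketch of the numbers behind such curves may help. The risk levels and defect rates below are arbitrary illustrative choices, not the values plotted in the original figure, and `sample_size_table` is a hypothetical helper name.

```python
import math


def sample_size_table(risks, defect_rates):
    """Map (risk, defect rate) to the sample size n = ceil(-ln(risk) / p).

    Assumes the zero-defects acceptance plan described in the text.
    """
    return {
        (risk, p): math.ceil(-math.log(risk) / p)
        for risk in risks
        for p in defect_rates
    }


# Illustrative grid only; the original figure's axes are not known.
RISKS = [0.10, 0.05, 0.01]
DEFECT_RATES = [0.05, 0.01, 0.005, 0.001]

table = sample_size_table(RISKS, DEFECT_RATES)
for risk in RISKS:
    row = ", ".join(f"p={p}: n={table[(risk, p)]}" for p in DEFECT_RATES)
    print(f"risk={risk}: {row}")
```

Each row corresponds to one curve: for a fixed risk, the required sample size climbs steeply as the defect rate to be detected shrinks.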
Gooey raisins
A process engineer was given responsibility to devise a method of depositing a sugar slurry containing raisins on a breakfast confection. The number of raisins was low relative to the mass of the entire slurry.
The specific gravity of the slurry matched that of the raisins as closely as he could get it, so he was dismayed when the depositor failed to place exactly two raisins on each confection. His conclusion was lack of thorough mixing. Yet, repeated efforts to improve the mixing process failed.
What was going wrong? The Poisson distribution comes to the rescue. Suppose the number of raisins in the slurry tank is such that, on average, he might expect two raisins on each confection. What distribution of raisins might he expect to see due to chance variation?
Recall the Poisson density function:

$$P(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}$$

Here, we have λ, the expected mean, equal to two raisins per confection. If you want to know what percentage of confections will have exactly two raisins (x = 2) under perfect mixing, calculate:

$$P(2) = \frac{e^{-2}\,2^{2}}{2!} = 0.271$$

This means that about 27.1% of the confections will have exactly two raisins deposited on them.
Table 1 shows the full distribution of raisins under perfect mixing. Notice there will be as many confections with one raisin as there are with two (27.1% each). No raisins will appear at all in 13.5% of the confections, and about one-third of the confections will have more than two raisins. Almost 5% will have five or six raisins, and confections with seven or more raisins will be very rare.
Table 1: Distribution of raisins
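The table did not survive in this post, but the percentages quoted in the text follow directly from the Poisson formula with a mean of two raisins. The short sketch below recomputes them; it assumes perfect mixing as the article does, and the name `poisson_pmf` is illustrative.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability of exactly x events when the mean is lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


MEAN_RAISINS = 2.0  # expected raisins per confection, as in the article

# Probabilities for 0 through 6 raisins, then the remaining tail.
probabilities = [poisson_pmf(x, MEAN_RAISINS) for x in range(7)]
for x, prob in enumerate(probabilities):
    print(f"{x} raisins: {prob:.1%}")
print(f"7 or more raisins: {1 - sum(probabilities):.1%}")

# Share of confections with more than two raisins.
more_than_two = 1 - sum(probabilities[:3])
print(f"more than two raisins: {more_than_two:.1%}")
```

The output lines for zero, one and two raisins match the 13.5%, 27.1% and 27.1% cited above, and the last line comes out at roughly one-third.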
The conclusion? If marketplace viability depends on having exactly two raisins per confection, a different process will be needed, so you had better rig a device that places raisins separately from the slurry.
If you have gotten this far, thanks. Right now, you are probably thinking about situations in which you might use the Poisson distribution, or might have used it if you had only thought of it.
In general, the use of the Poisson distribution for this kind of problem is only valid if the number of incidents (screen pieces in the first example and raisins in the second) is low relative to the overall mass. There are a few other assumptions you can find in your favorite statistics book.
The Poisson model is a useful component in your bag of tricks.
Lynne B. Hare is a statistical consultant. He holds a doctorate in statistics from Rutgers University in New Brunswick, NJ. He is a past chairman of the ASQ Statistics Division and a fellow of ASQ and the American Statistical Association.