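[Completed] Article 25: Don’t accept ambiguity; insist on ‘absolute’ information
Last edited by 小编H on 2011-8-10 11:21
Hello, I am editor 小编H. If you have questions or suggestions about the article below, please reply in this thread or send 小编H a private message, so that the editor can record the translator's details and handle rewards and penalties once the article is finalized.
Translated by http://www.6sq.net/space-uid-107973.html; proofread by http://www.6sq.net/space-uid-301206.html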
Don’t accept ambiguity; insist on ‘absolute’ information 拒绝含糊不清,坚持“绝对的”信息
by Christine M. Anderson-Cook
克里斯廷M.安德森-库克编著
We live in an age in which media and marketing often spin data and end up misleading and misinforming the consumer. Consider these recent headlines that left some obvious questions unanswered:
我们生活在一个媒体及销售经常杜撰数据并最终误导消费者的时代,最近的大字标题留下了一些明显的未解决的问题:
"Foreclosure auctions drop more than 30%." From when to when? What is the rate now? Is this usually a volatile rate that naturally fluctuates quite a bit?
“抵押品拍卖下跌了超过30%” 从什么时候到什么时候?现在的比率是多少?这是不是一个通常自然波动的易变比率?
"South Carolina’s unemployment rate shot from 5.5% in February 2008 to 12.5% last January." The accompanying article described how the rate fluctuated in 2010 and was 10.7% in November. Why was this window of time selected?
“南卡罗来纳的失业率从2008年2月的5.5%激增到去年1月的12.5%”,随附的文章描述2010年比率如何波动并且在11月份是10.7%。为什么选择这个时间段?
A lot of advertisers and news outlets seek to sensationalize their messages to catch our attention. Often, the way information is communicated in the workplace seems to have borrowed from media and marketing’s approach.
很多广告商及新闻媒体试图使他们的信息引起轰动来吸引我们的注意力。大多数情况下,这种工作场所中被传达的信息似乎已经被媒体和营销方式虚构了。
Recently, I have been resensitized to these misleading practices of data presentation by reading books by Gerd Gigerenzer [1] and Donald J. Wheeler [2]. Both authors highlight the importance of presenting data in a non-deceptive format that refrains from prejudicing the audience and gives sufficient information for the assessment or decision to be made independently.
最近,我通过读戈尔德.吉格瑞泽和唐纳德J.威乐的书籍已经体会了一些由于数据表达产生误导的实例。这两位作者都强调用一种不被误解的格式表示信息的重要性,这能戒除读者产生偏见,为评价或者独立作出决定给出充足的信息。
Their ideas build on the work of Edward Tufte [3–6]. Key messages are:
他们的观点建立在爱德华塔夫特的工作基础之上,关键的思想是:
Give the raw data in its natural form (absolute summaries), which is strongly preferred to relative comparisons from one observation to another.
以其自然的形式(绝对的概括)给出原始数据,(这种自然形式)指的是两个观察对象的相对比较。
Provide sufficient historical data to allow realistic evaluation of the recent changes, taking into account natural fluctuation and previous trends.
提供充足的历史数据来给予近期变化实际可行的评估,考虑自然的波动和以前的趋势。
Include a quantification of measurement uncertainty with the point estimate if there is uncertainty with an estimated quantity.
如果一个待估计量存在不确定性,要包括一个测量不确定性的量化的点估计。
Consider the following three examples of deceptive or ambiguous statements that illustrate how we might adapt from soundbites and headlines to refocus data presentation to be maximally informative and minimally deceptive:
仔细考虑下面三个易被误解或陈述含糊不清的例子,它阐明我们要怎样适应从话语片段或标题中重新审视数据报告,以使信息最大化以及误导最小化:
‘Production is up 10%’ 产量提高10%
This sounds catchy and impressive, but should this result get you excited? Gigerenzer highlights how the human mind is naturally predisposed to filling in missing information to give the statement context to make it understandable. Without additional details, you are unable to know if this is an important fact.
这听起来易记而且给人印象深刻,但是结果将会使你兴奋吗?吉格瑞泽强调人类的思想是怎样自然地倾向于填充丢失的信息来给出陈述的语境而使意思更易懂。如果没有额外的详细的资料,你不可能知道这是否是一个重要的事实。
What other key information should be provided for you to make an enlightened assessment of this statement?
为使你对这种陈述做出一个有见识的评价,应该为你提供其他什么样的信息呢?
First, you need to know the comparative time period you are relating this interval to: Are you looking at this month’s production compared with last month? Compared with this month last year? Compared with average production in this month for the last 10 years?
首先,你需要知道和这段时间间隔相关的比较期:你在看这个月和上个月的产量对比吗?和去年这个月的产量比较了吗?和过去10年中这个月份的平均产量对比了吗?
Second, when the comparison is based on a single previous time period, it is helpful to acknowledge the natural variation between observations. Figure 1 shows four different plots with a 10% change in production from the last month to this month.
其次,当这种比较是建立在一个以前的单独的时期时,它对确认两个观察值之间的自然变化是有帮助的。 图表1展示了产量从上月到这月变化10%的四个不同的情形。
[Figure 1: four plots (A to D), each showing a 10% change in production from last month to this month]
In all cases except the first situation (A), we are unlikely to think this change is indicative of real change in production. If last month’s observation represented a 13% drop in production from the previous month (B), then you are likely to be less impressed with this month’s increase.
在除了第一种情况(A)的其他案例里面,我们不太可能认为这种变化是产量真正变化的象征。如果最后月份的观察数据代表产量从先前月份(B)一个13%的下跌,那么你有可能会对这个月的增长印象很少。
Similarly, if there is a seasonal trend (C), the increase in production might coincide with the regular annual pattern, and you would likely be better informed by looking at production compared to the average for this month in other years. Finally, (D) shows a high variability process in which fluctuations of 10% are not unexpected, and you should likely react only when the change falls outside the range of the natural variation of the process.
类似地,如果有一个季节趋势(C),产量的增长可能符合规律的年度变化趋势,你有可能通过观察与其他年度这个月份的平均值的对比对情况有个更好的了解。 最后,(D)展示了一个高度变化的过程,在这个过程里有10%的波动不是意外,你有可能只在变化超出过程自然变化范围之外时才有反应。
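Case (D) can be made concrete with a short sketch. The production figures below are hypothetical, invented only for illustration, and the helper names are not from the article. The idea is simply to report the latest change next to the month-to-month changes seen in the recent past, so the reader can judge whether 10% is unusual.

```python
from statistics import mean

def percent_change(previous, current):
    """Relative change from previous to current, in percent."""
    return 100.0 * (current - previous) / previous

def summarize_change(history, current):
    """Put the latest change next to the changes seen in the past.

    history: past monthly production values, oldest first (hypothetical data).
    current: this month's production.
    """
    past_changes = [percent_change(a, b) for a, b in zip(history, history[1:])]
    abs_changes = [abs(c) for c in past_changes]
    return {
        "previous": history[-1],
        "current": current,
        "percent_change": percent_change(history[-1], current),
        "typical_abs_change": mean(abs_changes),
        "largest_abs_change": max(abs_changes),
    }

# Two made-up processes that both end with the same "up 10%" headline.
stable = [1000, 1005, 998, 1002, 1000]
volatile = [1000, 1120, 950, 1080, 1000]

for name, history in (("stable", stable), ("volatile", volatile)):
    print(name, summarize_change(history, 1100))
```

For the stable series the 10% jump dwarfs every earlier change; for the volatile series it is smaller than changes already seen, which is the situation described for (D).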
Your interpretation of the 10% change is very much a function of understanding the pattern of change in recent times. To assess the importance of this change, it would be ideal for the comparison to be made relative to similar months of data (for instance, the average of months with similar seasonal patterns for several years) and with the associated uncertainty of production appropriately characterized.
你关于10%变化的解释对于理解近期变化的形式起到很大作用。为了评估这种变化的重要性,有关类似月份的数据的比较以及相对于产量特性的不确定性的比较(比如,几年有着相似季节模式月份的平均值)是理想的。
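The same-month comparison suggested above might look like the following sketch. All numbers are hypothetical, and using the sample standard deviation as the measure of year-to-year spread is an assumption of this example, not something the article prescribes.

```python
from statistics import mean, stdev

def compare_to_same_month(prior_years, current):
    """Compare this month's production with the same month in earlier years.

    prior_years: production for this calendar month in previous years
                 (hypothetical values).
    current: production for this month in the current year.
    """
    baseline = mean(prior_years)
    spread = stdev(prior_years)  # sample standard deviation across years
    return {
        "baseline": baseline,
        "spread": spread,
        "difference": current - baseline,
    }

# Made-up example: production is "up 10%" on last month (1000 -> 1100),
# but this calendar month is always strong.
same_month_history = [1080, 1110, 1095, 1105, 1100]
result = compare_to_same_month(same_month_history, 1100)
print(result)
```

In this invented case the headline increase disappears once the seasonal baseline is used: the current value sits almost exactly on the average for this month.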
Also, a simple time series plot with enumerated scale shown on the y-axis—and with sufficient history to capture seasonality—is an effective summary to provide a compact and suitable context for interpretation. The inclusion of the actual production numbers and recent history fills in the necessary details, and allows the audience to decide for itself if the change should be considered unusual and important.
同样,一个简单的时间序列图在y轴显示的范围——有充足的历史数据来捕捉季节性——是一个为解释说明提供合适环境的有效概括。 实际产量数据的包含内容和近期的历史数据填满了必要的细节,并且允许读者自己来决定是否应该考虑罕有的和重要的变化。
‘Defect rate doubled last quarter’ 上个季度缺陷率翻倍
Defect rates are typically estimated by sampling from production throughout the time interval. The defect rate change is given relative to a previous time period, but because defect rates across different production environments vary considerably, it is more critical to understand the true defect rates to assess the practical importance of this change.
缺陷率一般通过从一段时间间隔的产品抽样中估计。缺陷率的变化是相对于前一段时间而言的,但是由于缺陷率根据不同的生产环境而变化,所以理解缺陷率真正的涵义对于评价这种变化的实际重要性是至关重要的。
If your focus is on the yield of the process, a defect rate change from one in 50 to two in 50 might have a much higher impact than a defect rate change from one in 100,000 to two in 100,000. If your focus is on safety, any change in defect rate might be considered quite important.
如果你的焦点在过程的产量,那么缺陷率从1/50变化到2/50的比缺陷率从1/100,000变化到2/100,000有更大的影响。如果你的焦点是在安全上,那么在缺陷率上的任何变化可能会被认为相当重要。
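A minimal sketch of why the absolute rates matter, using the two rate changes quoted above. The batch size of 100,000 units is an assumption chosen only to make the counts easy to read.

```python
from fractions import Fraction

def extra_defects(rate_before, rate_after, units):
    """Additional defective units expected in a batch of the given size."""
    return (rate_after - rate_before) * units

BATCH = 100_000  # assumed batch size, for illustration only

cases = {
    "1 in 50 -> 2 in 50": (Fraction(1, 50), Fraction(2, 50)),
    "1 in 100,000 -> 2 in 100,000": (Fraction(1, 100_000), Fraction(2, 100_000)),
}

for label, (before, after) in cases.items():
    ratio = after / before                      # both cases "double"
    extra = extra_defects(before, after, BATCH)
    print(f"{label}: ratio {ratio}, extra defects per {BATCH:,} units: {extra}")
```

Both cases are a doubling, yet the first adds 2,000 defective units per 100,000 and the second adds one.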
Depending on the sampling rate and the cost of testing, the uncertainty associated with the estimates of defect rates can vary substantially. If the point estimate for defect rate doubled but remained within a 95% uncertainty interval for the rate (for instance, 0.002 +/– 0.002 for the previous quarter to 0.004 +/– 0.0025 for the current quarter), it is possible the nature of the sampling procedure might explain a large portion of the observed change.
根据抽样比率和测试成本,与估计缺陷率相关联的不确定性可能变化相当大。如果缺陷率的点估计翻倍但是比率保持在95%的不确定区间(比如,以前季度为0.002 +/– 0.002,当前季度为0.004 +/– 0.0025),抽样程序本身可能就能解释一大部分实测的变化。
But if the associated uncertainty is much smaller (for instance, 0.002 +/– 0.0005 for the previous quarter to 0.004 +/– 0.0005 for the current quarter), the observed change in rates is unlikely to be explained by the sampling process and likely is due to a real change in the defect rate. It is also important, however, to consider the practical importance of the observed change.
但是如果相关联的不确定性很小(比如,以前季度为0.002 +/– 0.0005,当前季度为0.004 +/– 0.0005),那么在利率上实测的变化不太可能被抽样过程解释,而是由于缺陷率真正的变化。然而,考虑实测变化的实际重要性也是很重要的。
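The two scenarios can be compared with a small sketch that uses the article's numbers. Checking whether the two intervals overlap is only a rough screening heuristic chosen for this example, not a formal statistical test, and clamping the lower bound at zero is likewise an assumption (a defect rate cannot be negative).

```python
def interval(estimate, half_width):
    """Uncertainty interval as (low, high), clamped at zero."""
    return (max(0.0, estimate - half_width), estimate + half_width)

def intervals_overlap(a, b):
    """True if the two (low, high) intervals share any value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Scenario 1: wide intervals (sparse sampling).
wide_prev = interval(0.002, 0.002)
wide_curr = interval(0.004, 0.0025)

# Scenario 2: narrow intervals (more sampling).
narrow_prev = interval(0.002, 0.0005)
narrow_curr = interval(0.004, 0.0005)

print("wide:  ", wide_prev, wide_curr, intervals_overlap(wide_prev, wide_curr))
print("narrow:", narrow_prev, narrow_curr, intervals_overlap(narrow_prev, narrow_curr))
```

With the wide intervals the two quarters overlap heavily, so sampling noise remains a plausible explanation; with the narrow intervals they do not overlap at all.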
To clear up this case, show the absolute defect rate with the associated uncertainty of the estimate for the comparison quarter and the new quarter. This helps to calibrate the absolute change and the importance of the change given the intended use of the parts.
为了处理这种情况,我们用估计比较季度和新季度的相关不确定性来说明绝对缺陷率。这帮助我们使绝对的变化以及零件既定用途变化的重要性
In addition, a summary plot of recent trends in the defect rate using a time series plot with included uncertainty will help assess the longer–term trend, as well as the natural fluctuations in estimates given the sampling and testing procedure.
另外,一张用包括不确定性的时间序列图来总结缺陷率最近趋势的图表将帮助我们评估长期趋势,同时评价给定的抽样和测试程序方面的自然波动。
‘A 10% temperature increase gave a 15% yield increase’
温度增加10%引起产量增加15%
The final example illustrates the importance of understanding units and how reporting the absolute numbers, rather than relative change, can improve interpretability. The data that led to this headline originated from a laboratory study in which different production environments were considered.
最后一个例子阐述了理解单位的重要性以及怎样报告绝对的数字,而不是相对的变化,这样将提高说服力。标题中的数据来源于一个实验室研究,这个研究考虑了不同的生产环境。
The default production temperature was 100°F, and it was found that a change to 110°F (the 10% increase) produced the observed increase in yield from 72% to 82.8%.
默认的生产温度是100华氏度,并且发现温度变化到110华氏度(10%的增加)时产出从72%增加到82.8%。
Given the actual numbers, you could formulate a number of alternative headlines, which all appear to characterize the results but are similarly lacking in real information. You could use degrees Celsius (100°F = 37.8°C and 110°F = 43.3°C, giving a 14.6% increase) or report the defect rate (72% yield↔28% defect rate and 82.8% yield↔17.2% defect rate, giving a 38.6% reduction in defects).
如果给了真实的数据,你可能会构想出一些可供选择的标题,这些标题似乎描绘了结果但是同样缺乏真实的信息。你可以使用摄氏度(100华氏度=37.8摄氏度,110华氏度=43.3摄氏度,引起14.6%的增加)或者报告缺陷率(72%的产出↔28%的缺陷率,82.8%的产出↔17.2%的缺陷率,引起缺陷38.6%的减少)。
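The percentages quoted above can be reproduced with a few lines of arithmetic. One detail worth noting: the 14.6% figure follows from the Celsius values rounded to one decimal place, as in the text; with unrounded conversions the result is about 14.7%. The function names are illustrative only.

```python
def f_to_c(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0

def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before

temp_f = (100.0, 110.0)
yield_pct = (72.0, 82.8)

# Celsius values rounded to one decimal, as quoted in the text.
temp_c = tuple(round(f_to_c(t), 1) for t in temp_f)   # (37.8, 43.3)
defect_pct = tuple(100.0 - y for y in yield_pct)      # about (28.0, 17.2)

print("Temperature change (F): %.1f%%" % percent_change(*temp_f))
print("Temperature change (C): %.1f%%" % percent_change(*temp_c))
print("Yield change:           %.1f%%" % percent_change(*yield_pct))
print("Defect rate change:     %.1f%%" % percent_change(*defect_pct))
```

The same experiment yields 10%, 14.6%, 15% or a 38.6% reduction depending only on which unit and which summary is reported.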
Hence, the same absolute results could translate into any of the following misleading or incomplete headlines:
因此,相同的绝对的结果可能会转换成下面任何一种误导或者不完全的标题:
A 14.6% increase in temperature (C) gave a 15% increase in yield.
温度(摄氏度)增加14.6%引起产出增加15%。
A 10% increase in temperature (F) gave a 38.6% reduction in defects.
温度(华氏度)增加10%引起缺陷减少38.6%。
A 14.6% increase in temperature (C) gave a 38.6% reduction in defects, in addition to the original option.
温度(摄氏度)增加14.6%引起缺陷减少38.6%,不包括原来的配件。
Clearly, the percentage changes are highly dependent on the summary chosen and give different impressions of the study’s results. There are several other important errors in this headline.
显然,变化的比例高度依赖选择的总体并且给研究的结果带来不同的印象。在这个标题上有几处其它重大的错误。
First, the percentage increase of temperature is actually meaningless. Percentages assume the zero on the scale corresponds to something absolute. Here, 0°C or 0°F are relatively arbitrary and do not represent a starting point for the scale against which percentage changes can be sensibly measured.
首先,温度增加的百分比实际上毫无意义。百分比假设零点在相当于某种绝对事物的程度上。在这里,0摄氏度和0华氏度相对主观,而且不能代表一个程度的起始点,相反这个变化百分比能够被很明显地测量。
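To see why the arbitrary zero matters, the same two temperatures can be expressed on scales whose zero is absolute. Kelvin and Rankine are not mentioned in the article; they are brought in here only as an illustration.

```python
def f_to_kelvin(deg_f):
    """Convert degrees Fahrenheit to kelvin (zero is absolute zero)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15

def f_to_rankine(deg_f):
    """Convert degrees Fahrenheit to degrees Rankine (zero is absolute zero)."""
    return deg_f + 459.67

def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before

pct_kelvin = percent_change(f_to_kelvin(100.0), f_to_kelvin(110.0))
pct_rankine = percent_change(f_to_rankine(100.0), f_to_rankine(110.0))

print("Change on the Kelvin scale:  %.2f%%" % pct_kelvin)
print("Change on the Rankine scale: %.2f%%" % pct_rankine)
```

On both absolute scales the change comes out just under 2%, and the two scales agree with each other, unlike the 10% (Fahrenheit) and 14.6% (Celsius) figures. Even so, a percentage on an absolute scale would still not make the temperature change comparable to the change in yield.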
Perhaps even more misleading is the idea that a temperature change is in any way comparable to a change in yield. It might make sense to compare a change in input costs of production (how much does it cost to raise the temperature from 100°F to 110°F) against change in output yield, but the headline is a classic apples–to–oranges comparison that lacks intrinsic meaning.
或许更多的误解是那种温度变化同产量变化类似的思想。或许比较产品输入成本的变化(将温度从100华氏度提高到110华氏度花费多少成本)与输出产量的变化更为合理。但是本标题是一个典型的“苹果到桔子”的对比,缺少了内在的含义。
Complete and self-contained
完全且独立的(信息)
There is no substitute for providing complete information on the absolute scale—it allows the audience to directly assess the context and importance of the information. Providing a graphical or numerical summary of recent history is also valuable for enhancing the context and incorporating a measure of natural variation. When the quantities of interest are obtained by estimation, the uncertainty associated with this should be included as well.
没有能在绝对的程度上提供完全信息的替代物——它允许读者直接评定环境及信息的重要性。再这样的条件下,最近历史的绘图或者数据概括对于加强语境及自然变化的测量也是有价值的。当通过判断获得了大量的利益时,与此相关的不确定性也将被包含在内。
While the catchy headlines and relative summaries have the opportunity to be attention–grabbers, statisticians and those who report data–based results should resist these tactics and provide a complete self–contained summary that includes all of the key information from which to make an informed decision.
尽管易记的标题和相对的概要有被注意的机会——数据采集者、统计员和那些报告数据的人们——基本的结果应该不受这些策略的影响而且能够提供一个完全独立的概要,并通过它所包含的所有关键信息来做一个明智的决定。
References 参考文献
[1] Gerd Gigerenzer, _Calculated Risk: How to Know When Numbers Deceive You_, Simon & Schuster, 2002.
戈尔德.吉格瑞泽,《风险计算:怎样知道数字何时欺骗你》,西蒙和舒斯特,2002年。
[2] Donald J. Wheeler, _Understanding Variation: The Key to Managing Chaos_, SPC Press, 2000.
唐纳德J.威乐,《了解变化:管理混乱的关键》,SPC出版社,2000年。
[3] Edward Tufte, _The Visual Display of Quantitative Information_, Graphics Press, 2001.
爱德华.塔夫特,《大量信息的视觉展示》,Graphics出版社,2001年。
[4] Edward Tufte, _Envisioning Information_, Graphics Press, 1990.
爱德华.塔夫特,《想象信息》,Graphics出版社,1990年。
[5] Edward Tufte, _Visual Explanations: Images and Quantities, Evidence and Narrative_, Graphics Press, 1997.
爱德华.塔夫特,《视觉解释:形象和数量,明显的和叙述的》,Graphics出版社,1997年。
[6] Edward Tufte, _Beautiful Evidence_, Graphics Press, 2006.
爱德华.塔夫特,《极好的证据》,Graphics出版社,2006年。
Christine M. Anderson-Cook is a research scientist at Los Alamos National Laboratory in Los Alamos, NM. She earned a doctorate in statistics from the University of Waterloo in Ontario. Anderson-Cook is a fellow of the American Statistical Association and a senior member of ASQ.
克里斯廷M.安德森-库克是一位洛杉矶国家实验室的研究科学家,她在美国安大略湖滑铁卢大学获得了统计学博士学位。安德森-库克是一位美国统计协会会员,而且是ASQ资深会员。