Translation Group

[Proofreading] Article 45: A five-pronged approach to analyze process data

Last edited by Editor D on 2011-12-15 17:24

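Hello, I'm Editor H. Group members interested in proofreading the article below, please reply with your estimated completion time and send Editor H a private message, so the editors can record the translator's details and handle rewards and penalties once the article is finished. Thank you for supporting the translation group!
Translated by: http://www.6sq.net/space-uid-393147.html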
Make Data Matter
A five-pronged approach to analyze process data
by Ronald D. Snee

Is data analysis an art or a science? Arguments exist for both sides, and many people simply come down in the middle. In my mind, it’s both.
Regardless of which view you take, the discussion misses a critical element: the need for an explicitly articulated strategy for data analysis. In fact, the various attitudes toward the nature of data analysis often imply unreflective strategies.

Partisans of data analysis as an art simply might look at the data, manipulate it based on their intuition and experience, and proceed confidently to extract what they believe is useful information. The more scientific folks, with perhaps too much faith in numbers, go straight to statistical software and do some indisputable number crunching.

Those who stand on middle ground (possibly the great majority of practitioners) do a little of both: rely on their insight to manipulate the data, run the numbers, do some further manipulation and rerun the numbers until they achieve what they believe is a satisfactory result.

All of those approaches are likely to produce questionable results in terms of what the analysis addresses and the significance of the results.

Five activities
Practitioners can avoid the pitfalls of these unreflective or ad hoc approaches by adopting a clearly articulated, proven strategy for analyzing process data and systematically following that strategy.1 Such a strategy entails five essential activities:

1. Understanding the context of the analysis.
2. Examining the pedigree of the data.
3. Graphically representing the process.
4. Graphically representing the data.
5. Statistically analyzing the data.

Note that these are iterative, as opposed to sequential, activities. Depending on the circumstances, the order of some of these activities may shift.

For example, in the mutually dependent iterations of this approach, the graphical representation of the process may precede the examination of the pedigree. In any case, most of these activities look forward and backward. The examination of the data’s pedigree (where it came from and how it was collected) may drive the analyst back to a fuller exploration of the context of the process to fill out that pedigree.

But the pedigree of the data also points to how the process should be graphically represented. That, in turn, could retrospectively suggest the need for additional types of data and prospectively affect the graphical representation. By engaging iteratively in these activities, you can arrive at important results that are ready to be fully and persuasively reported.

This approach offers at least three distinct advantages over less structured approaches. First, it is repeatable: it can be used in any situation that calls for the greater understanding of a process. Second, like sound processes themselves, it’s robust: flexible enough to encompass the wide variation of particulars to be found in different situations. Third, and most importantly, it’s more likely to produce useful results.

Understanding the context
It’s difficult to know precisely how to proceed until you ask the most basic of questions: What is the purpose of the analysis? Are you trying to confirm a hypothesis?

For example, a manufacturer that uses raw materials from two different vendors suspects that differences in quality are causing defects in the finished product. Data analysis can confirm or disconfirm the hypothesis and, in this example, identify the offending vendor. Such contexts call for what is sometimes referred to as confirmatory data analysis.

Alternatively, let’s say you’re trying to solve specific problems, the causes of which you do not understand. For example, a chemical process is producing unacceptable variations in purity from batch to batch. Or a business process, like a bank loan approval process, is taking far too long to complete. Or, perhaps a distributor’s percentage of on-time deliveries is fluctuating widely. These contexts call for exploratory data analysis, which must first have a hypothesis to test.

In confirmatory and exploratory analyses of a process, the goal is the same: find the inputs and the controlled and uncontrolled variables that have a major impact on the output of the process.2

Examining the pedigree

Data analysis begins with a data table, which is either provided to or constructed by the analyst. In either case, you should always question the data because data can be, among many other things:

• Incorrect: Some of the information is wrong, for example, when someone monitoring a process records the data incorrectly or a measurement device is faulty.
• Irrelevant: Some of it is the wrong information, for example, when data on the wrong variables are captured.
• Incomplete: Crucial information is missing, for example, when data on an important variable are missing.
• Misleading/biased: Data points you in the wrong direction for analysis, for example, when an important variable has been examined only over a short time, thus making it appear to be a constant.
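
The checklist above can be partly mechanized before any analysis starts. The following is a minimal sketch in Python, not something taken from the article: the column names, the valid ranges and the `screen_table` helper are all hypothetical, and a real screen would depend on the process being studied. It flags missing values, values outside a plausible range, and variables that never vary in the sample.

```python
from statistics import pstdev

def screen_table(rows, required, valid_ranges):
    """Return a list of warnings for a data table given as a list of dicts.

    required     -- column names that must be present in every row (assumed)
    valid_ranges -- {column: (low, high)} plausibility limits (assumed)
    """
    warnings = []
    for col in required:
        missing = sum(1 for r in rows if r.get(col) is None)
        if missing:
            warnings.append(f"incomplete: {col} missing in {missing} row(s)")
    for col, (low, high) in valid_ranges.items():
        values = [r[col] for r in rows if r.get(col) is not None]
        bad = [v for v in values if not low <= v <= high]
        if bad:
            warnings.append(
                f"possibly incorrect: {col} has {len(bad)} value(s) outside [{low}, {high}]"
            )
        # A variable that never moves in the sample may look like a constant
        # only because it was observed over too short a period.
        if len(values) > 1 and pstdev(values) == 0:
            warnings.append(f"possibly misleading: {col} never varies in this sample")
    return warnings

# Hypothetical batch records, invented for illustration only.
batches = [
    {"batch": 1, "temp_c": 71.0, "purity_pct": 98.2},
    {"batch": 2, "temp_c": 71.0, "purity_pct": 101.5},  # above 100 %, likely a recording error
    {"batch": 3, "temp_c": 71.0, "purity_pct": None},   # value never logged
]
found = screen_table(
    batches,
    required=["temp_c", "purity_pct"],
    valid_ranges={"temp_c": (20, 120), "purity_pct": (0, 100)},
)
for warning in found:
    print(warning)
```

A screen like this cannot detect irrelevant data; deciding whether the right variables were captured still requires understanding the context of the process.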

An understanding of the context of the process can guard against these errors, but the context alone is insufficient. Given these and the many other shortcomings that can undermine the value of the data, it is absolutely critical to understand the pedigree of the data: where it came from and how it was collected.

For example, consider a batch manufacturing process in which a sample is taken every shift and carried to an analytical lab where it is tested for purity, and the results are recorded. Thus, the data trail is:

Production process ► sampling process ► testing process ► data-logging process.
To understand the resulting data, it is necessary to understand this data trail and the production process parameters. That is the pedigree of the data.

Incomplete understanding of the data’s pedigree can lead you down wrong analytical trails. Suppose, for example, a pharmaceutical company is experiencing differences in yield from batch to batch of a product because of the properties of the raw materials supplied by a vendor. Although the properties for each batch of raw materials are within specifications, the yield nevertheless varies unacceptably.

The analyst has been given a data table that includes the properties of the raw materials for each batch of product under consideration. But if the analyst does not know that some raw material batches were analyzed by the vendor’s quality assurance lab and some by the manufacturer, then there is a strong possibility the analysis will come up empty. By taking the time to understand the pedigree of the data fully, the analyst can save much frustration and fruitless work.
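
One way to keep that pedigree from getting lost is to carry it in the data table itself. The sketch below is an illustration under assumptions, not the article's method: the `RawMaterialResult` record, the `moisture_pct` property and the lab labels are invented. It stores which lab produced each result and summarizes the property by lab, so a mixed measurement source becomes visible before the analysis begins.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class RawMaterialResult:
    lot: str
    moisture_pct: float  # hypothetical raw-material property
    lab: str             # pedigree: which lab ran the test

def summarize_by_lab(results):
    """Group the measured property by the lab that measured it."""
    by_lab = defaultdict(list)
    for result in results:
        by_lab[result.lab].append(result.moisture_pct)
    return {lab: round(mean(values), 2) for lab, values in by_lab.items()}

# Invented numbers, for illustration only.
results = [
    RawMaterialResult("A1", 1.10, "vendor_qa"),
    RawMaterialResult("A2", 1.14, "vendor_qa"),
    RawMaterialResult("A3", 1.32, "manufacturer"),
    RawMaterialResult("A4", 1.28, "manufacturer"),
]
summary = summarize_by_lab(results)
if len(summary) > 1:
    print("measured by more than one lab:", summary)
```

A gap between the per-lab averages does not by itself say which lab is right; it only signals that the measurement system needs to be understood before the raw-material data are related to yield.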

Some Guiding Principles

• The process provides the context for the problem being studied and the data being analyzed.
• Know the pedigree of the data: the who, what, when, where, why and how of its collection.
• Analysis is defined by how the data were generated.
• Understand the measurement system as well as the process.
• Be aware of human intervention in a process. Humans are often a large source of variation.

Graphing the process

A graphical representation of the process shows how the process works from end to end. Such representations fall into two broad categories: flow charts and schematics. A flow chart maps the sequence and flow of the process and often includes icons, such as pictures of a truck to represent a transportation step or smokestacks to indicate a factory.

A schematic representation is designed to exhibit the inputs and the controlled and uncontrolled variables that go into a process to produce its outputs. Both types of representation reinforce one another by suggesting what types of data are needed, where they can be found and how they can be analyzed.

Figure 1 is an elementary schematic representation of a process (such as pharmaceutical, chemical or loan approval). As the analyst knows, the context is unacceptable variations in yield from batch to batch of the finished product. Therefore, “yield” is the key output.

http://www.asq.org/img/qp/082506_figure1.gif

To get an accurate picture of the process, however, analysts should not simply rely on the context. To find out how the process really works, they should also observe the process first-hand and question the people who operate it. This investigation might also lead the analyst to further refine the pedigree of the data: the who, when and why of its measurement and collection.

With yield as the key output of a manufacturing process, the analyst can now graphically represent the process and fill in the blanks with the sources of possible variation that led to the unacceptable variations in yield. For the inputs, sources of variation might be energy, raw materials and different lots of the same raw materials. Controlled variables that go into the process might include things like temperature, speed of flow and mixing time.

In essence, controlled variables are the things that can be adjusted with a knob or a dial. Uncontrolled variables that go into this process may include human intervention and differences in work teams, production lots, days of the week, machines or even heads on the same machine. In the output of the process, variation may result from the measurement system itself.

A good rule to follow when you have, for example, two production lines doing the same thing or two pieces of equipment performing the same task, is to assume they vary until proven otherwise. That’s especially true for the human factor. Experience shows that in creating the initial data table and in the graphical representation of the process, the human element is a frequently overlooked source of variation.

In the aforementioned pharmaceutical manufacturing process, the analyst may overlook that the process includes three shifts with four different work teams on the shifts.

As a result of the observation and investigation that goes into constructing the graphical representation of the process, however, the analyst makes sure the data table records which team produced which batches on which days and that the data are stratified in the analysis. The failure to take that human element into account can result in a highly misleading data table and might obscure the ultimate solution to the problem.
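
Stratifying the data by team can be sketched in a few lines. This is an illustrative example with invented numbers: the `team`, `shift` and `yield_pct` columns and the `stratify` helper are assumptions, not part of the article. The same helper could stratify by shift, day or machine by changing the `key` argument.

```python
from collections import defaultdict
from statistics import mean, stdev

def stratify(rows, key, value):
    """Summarize `value` separately for each level of `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        level: {
            "n": len(vals),
            "mean": mean(vals),
            "sd": stdev(vals) if len(vals) > 1 else 0.0,
        }
        for level, vals in sorted(groups.items())
    }

# Hypothetical batch records: four work teams rotating over three shifts.
rows = [
    {"batch": 1, "team": "A", "shift": 1, "yield_pct": 90.2},
    {"batch": 2, "team": "B", "shift": 2, "yield_pct": 90.4},
    {"batch": 3, "team": "C", "shift": 3, "yield_pct": 87.1},
    {"batch": 4, "team": "D", "shift": 1, "yield_pct": 90.3},
    {"batch": 5, "team": "A", "shift": 2, "yield_pct": 90.6},
    {"batch": 6, "team": "B", "shift": 3, "yield_pct": 90.0},
    {"batch": 7, "team": "C", "shift": 1, "yield_pct": 87.5},
    {"batch": 8, "team": "D", "shift": 2, "yield_pct": 90.1},
]
by_team = stratify(rows, key="team", value="yield_pct")
for team, stats in by_team.items():
    print(f"team {team}: n={stats['n']} mean={stats['mean']:.2f} sd={stats['sd']:.2f}")
```

In this made-up table the lower yields all belong to one team, a pattern that would stay hidden if the team column had never been recorded.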

Graphing the data
The graphical representation of the process, and the understanding of the possible sources of variation it helps generate, suggests ways in which the analyst can graphically represent the data. Because data are almost always sequential, a run chart is often needed. In our example, the x-axis would register time and the y-axis would register yield.
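
As a rough illustration of a run chart, the sketch below prints yield in time order as a text chart and counts runs above and below the median, a common companion summary for run charts. The yield values and both helper functions are invented for this example; a plotting library would normally draw the chart.

```python
from statistics import median

def runs_about_median(values):
    """Return (median, number of runs above/below it); points equal to the median are skipped."""
    center = median(values)
    sides = [v > center for v in values if v != center]
    runs = 1 if sides else 0
    for previous, current in zip(sides, sides[1:]):
        if current != previous:
            runs += 1
    return center, runs

def text_run_chart(values, width=30):
    """Print one row per time point; bar length is scaled between the minimum and maximum."""
    low, high = min(values), max(values)
    span = (high - low) or 1.0
    for t, v in enumerate(values, start=1):
        bar = "#" * (1 + round((v - low) / span * (width - 1)))
        print(f"batch {t:2d} | {bar} {v}")

# Hypothetical yields in production order.
yields = [91.2, 90.8, 91.5, 90.9, 88.1, 87.9, 88.4, 88.0]
text_run_chart(yields)
center, runs = runs_about_median(yields)
print(f"median = {center}, runs about the median = {runs}")
```

Very few runs, as in this made-up series, suggest a shift in the process level over time rather than random scatter, which is the kind of pattern a run chart is meant to expose.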

A scatter plot also may be used, with process variables registered on the x-axis and process outputs registered on the y-axis. Other familiar graphical techniques include box plots, histograms, dot plots and Pareto charts.
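
The relationship a scatter plot displays can also be summarized numerically. The following sketch computes the Pearson correlation and the least-squares slope for one process variable against one output; the temperature and yield values are invented, and the helper name is an assumption made for this example.

```python
from statistics import mean

def correlation_and_slope(xs, ys):
    """Return (Pearson r, least-squares slope of y on x) for paired data."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5, sxy / sxx

# Hypothetical process variable (x-axis) and process output (y-axis).
temperature_c = [68, 70, 72, 74, 76]
yield_pct = [86.0, 87.1, 88.3, 88.9, 90.2]
r, slope = correlation_and_slope(temperature_c, yield_pct)
print(f"r = {r:.3f}, slope = {slope:.2f} yield points per degree")
```

These two numbers are no substitute for the plot itself: curvature, clusters and outliers show up in the picture but not in a single coefficient.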

In using any of these techniques, the goal is to make sure you are exploring the relationships of potentially important variables and preparing an appropriate graphical representation for purposes of statistical analysis. Plotting the data in different ways can lead to insights and surprises about the sources of variation.

Statistically analyzing the data
The statistical analysis of the data, usually with the aid of statistical software, establishes what factors are statistically significant. For example, are the differences in yield produced by different work teams statistically significant? What about variations in temperature or flow? What about the measurement system itself?
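
The question about work teams can be illustrated with a small significance test. The article does not name a method, so the sketch below swaps in a permutation test on the one-way ANOVA F statistic, chosen only because it needs nothing beyond the Python standard library. The yield figures, the number of permutations and the seed are all assumptions.

```python
import random
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / (len(groups) - 1)
    # Assumes the values are not all identical within every group (within > 0).
    within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (len(values) - len(groups))
    return between / within

def permutation_p_value(groups, n_perm=2000, seed=0):
    """Share of random regroupings whose F is at least as large as the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        if f_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical yields per work team.
team_yields = {
    "A": [90.1, 90.6, 89.8, 90.3],
    "B": [90.4, 89.9, 90.2, 90.5],
    "C": [87.2, 87.9, 87.5, 87.0],
}
p_value = permutation_p_value(list(team_yields.values()))
print(f"F = {f_statistic(list(team_yields.values())):.1f}, permutation p = {p_value:.4f}")
```

A small p-value here says only that team membership is associated with yield in this sample; whether the cause is the team itself, its shift or its equipment is a question for the process knowledge gathered in the earlier activities.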

The key to success lies in intimately knowing the data from the context of the process, graphically representing it and formulating a model that includes the comparisons, relationships, tests and fits you are going to study.

Once you have created the graphics and done the statistical calculations, the results should be checked against the model. Does it account for all of the variation? In short, do the results make sense? If so, you can confidently report your results.3

Beyond analysis to action
The final point about reporting the results offers a reminder that analysis goes beyond the exploratory or confirmatory. The analyst also must be able to display and communicate results to decision makers. The most elegant analysis possible is wasted if it fails to communicate and the organization therefore fails to act.

Early in my career, I was asked to analyze whether a chemical company’s new product had adversely affected animals in safety studies. Personnel in the company’s lab insisted the data from the experiments showed adverse effects, and the company should therefore cease development of the product. Analysts on the company’s business side had concluded the data showed no adverse effects. My analysis reached the same conclusion, and in a showdown meeting between the business and the lab personnel, I presented my findings.

At the conclusion of my presentation, replete with analytical representations of the statistical significance of the data, the lab director remained unconvinced. So I handed him one final graph: a dot plot that, for some reason, I had not included in my presentation.

He looked at the graph and began to think aloud while everyone in the meeting sat silently. He continued to look and talk and look and talk. At last, he said emphatically, “Maybe there isn’t a difference.”

In the absence of that persuasive graphical representation and model of the data, the company might have ceased production of what turned out to be a valuable and harmless product. The bottom line is that the analyst must not only do data analysis that matters, but also make it matter.

©Ronald D. Snee, 2008.