[Translation] Part 45: A five-pronged approach to analyze process data
This post was last edited by Editor H on 2011-11-15 16:24.
Hello, this is Editor H. Members interested in translating the article below, please leave your estimated completion time and send a private message to Editor H, so the editor can register the translator's information and handle the rewards and penalties when the translation is completed. Thank you for supporting the translation team!
Make Data Matter: A five-pronged approach to analyze process data
by Ronald D. Snee
Is data analysis an art or a science? Arguments exist for both sides, and many people simply come down in the middle. In my mind, it’s both.
Regardless of which view you take, the discussion misses a critical element—the need for an explicitly articulated strategy for data analysis. In fact, the various attitudes toward the nature of data analysis often imply unreflective strategies.
Partisans of data analysis as an art simply might look at the data, manipulate it based on their intuition and experience, and proceed confidently to extract what they believe is useful information. The more scientific folks, with perhaps too much faith in numbers, go straight to statistical software and do some indisputable number crunching.
Those who stand on middle ground—possibly the great majority of practitioners—do a little of both: rely on their insight to manipulate the data, run the numbers, do some further manipulation and rerun the numbers until they achieve what they believe is a satisfactory result.
All of those approaches are likely to produce questionable results in terms of what the analysis addresses and the significance of the results.
Five activities
Practitioners can avoid the pitfalls of these unreflective or ad hoc approaches by adopting a clearly articulated, proven strategy for analyzing process data and systematically following that strategy.1 Such a strategy entails five essential activities:
1. Understanding the context of the analysis.
2. Examining the pedigree of the data.
3. Graphically representing the process.
4. Graphically representing the data.
5. Statistically analyzing the data.
Note that these are iterative, as opposed to sequential, activities. Depending on the circumstances, the order of some of these activities may shift.
For example, in the mutually dependent iterations of this approach, the graphical representation of the process may precede the examination of the pedigree. In any case, most of these activities look forward and backward. The examination of the data’s pedigree—where it came from and how it was collected—may drive the analyst back to a fuller exploration of the context of the process to fill out that pedigree.
But the pedigree of the data also points to how the process should be graphically represented. That, in turn, could retrospectively suggest the need for additional types of data and prospectively affect the graphical representation. By engaging iteratively in these activities, you can arrive at important results that are ready to be fully and persuasively reported.
This approach offers at least three distinct advantages over less structured approaches. First, it is repeatable—it can be used in any situation that calls for a greater understanding of a process. Second, like sound processes themselves, it’s robust—flexible enough to encompass the wide variation of particulars to be found in different situations. Third, and most importantly, it’s more likely to produce useful results.
Understanding the context
It’s difficult to know precisely how to proceed until you ask the most basic of questions: What is the purpose of the analysis? Are you trying to confirm a hypothesis?
For example, a manufacturer that uses raw materials from two different vendors suspects that differences in quality are causing defects in the finished product. Data analysis can confirm or disconfirm the hypothesis and, in this example, identify the offending vendor. Such contexts call for what is sometimes referred to as confirmatory data analysis.
Alternatively, let’s say you’re trying to solve specific problems, the causes of which you do not understand. For example, a chemical process is producing unacceptable variations in purity from batch to batch. Or a business process, like a bank loan approval process, is taking far too long to complete. Or, perhaps a distributor’s percentage of on-time deliveries is fluctuating widely. These contexts call for exploratory data analysis, in which the first task is to develop a hypothesis to test.
In confirmatory and exploratory analyses of a process, the goal is the same: find the inputs and the controlled and uncontrolled variables that have a major impact on the output of the process.2
Examining the pedigree
Data analysis begins with a data table, which is either provided to or constructed by the analyst. In either case, you should always question the data because data can be, among many other things:
· Incorrect: Some of the information is wrong—for example, when someone monitoring a process records the data incorrectly or a measurement device is faulty.
· Irrelevant: Some of it is the wrong information—for example, when data on the wrong variables are captured.
· Incomplete: Crucial information is missing—for example, when data on an important variable are missing.
· Misleading/biased: Data points you in the wrong direction for analysis—for example, when an important variable has been examined only over a short time, thus making it appear to be a constant.
An understanding of the context of the process can guard against these errors, but the context alone is insufficient. Given these and the many other shortcomings that can undermine the value of the data, it is absolutely critical to understand the pedigree of the data—where it came from and how it was collected.
For example, consider a batch manufacturing process in which a sample is taken every shift and carried to an analytical lab where it is tested for purity, and the results are recorded. Thus, the data trail is:
Production process ► sampling process ► testing process ► data-logging process.
To understand the resulting data, it is necessary to understand this data trail and the production process parameters. That is the pedigree of the data.
Incomplete understanding of the data’s pedigree can lead you down wrong analytical trails. Suppose, for example, a pharmaceutical company is experiencing differences in yield from batch to batch of a product because of the properties of the raw materials supplied by a vendor. Although the properties for each batch of raw materials are within specifications, the yield nevertheless varies unacceptably.
The analyst has been given a data table that includes the properties of the raw materials for each batch of product under consideration. But if the analyst does not know that some raw material batches were analyzed by the vendor’s quality assurance lab and some by the manufacturer, then there is a strong possibility the analysis will come up empty. By taking the time to understand the pedigree of the data fully, the analyst can save much frustration and fruitless work.
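As a rough sketch of why pedigree matters (the batch numbers, labs and purity values below are invented for illustration), tagging each measurement with the lab that produced it exposes a lab-to-lab offset that a pooled average hides:

```python
from statistics import mean

# Hypothetical raw-material assay results. The "lab" field records part
# of the data's pedigree: which lab analyzed that batch.
assays = [
    {"batch": "RM-01", "lab": "vendor",       "purity": 99.1},
    {"batch": "RM-02", "lab": "vendor",       "purity": 99.0},
    {"batch": "RM-03", "lab": "manufacturer", "purity": 98.2},
    {"batch": "RM-04", "lab": "manufacturer", "purity": 98.3},
]

def mean_by(records, key, value):
    """Group records on `key` and average the `value` field."""
    groups = {}
    for r in records:
        groups.setdefault(r[key], []).append(r[value])
    return {k: mean(v) for k, v in groups.items()}

pooled = mean(r["purity"] for r in assays)   # one number, offset hidden
by_lab = mean_by(assays, "lab", "purity")    # offset between labs visible
print(round(pooled, 2))   # 98.65
print(by_lab)             # vendor ~99.05 vs. manufacturer ~98.25
```

If the pedigree column were absent from the data table, only the pooled figure would be available, and the systematic difference between the two labs would go undetected.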
Some Guiding Principles
· The process provides the context for the problem being studied and the data being analyzed.
· Know the pedigree of the data—the who, what, when, where, why and how of its collection.
· Analysis is defined by how the data were generated.
· Understand the measurement system as well as the process.
· Be aware of human intervention in a process. Humans are often a large source of variation.
Graphing the process
A graphical representation of the process shows how the process works from end to end. Such representations fall into two broad categories: flow charts and schematics. A flow chart maps the sequence and flow of the process and often includes icons, such as pictures of a truck to represent a transportation step or smokestacks to indicate a factory.
A schematic representation is designed to exhibit the inputs and the controlled and uncontrolled variables that go into a process to produce its outputs. Both types of representation reinforce one another by suggesting what types of data are needed, where they can be found and how they can be analyzed.
Figure 1 is an elementary schematic representation of a process (such as a pharmaceutical, chemical or loan approval process). As the analyst knows, the context is unacceptable variations in yield from batch to batch of the finished product. Therefore, “yield” is the key output.
http://www.asq.org/img/qp/082506_figure1.gif
Once again, however, analysts should not simply rely on the context to get an accurate picture of the process. To find out how the process really works, they should also observe the process first-hand and question the people who operate it. This investigation might also lead the analyst to further refine the pedigree of the data—the who, when and why of its measurement and collection.
With yield as the key output of a manufacturing process, the analyst can now graphically represent the process and fill in the blanks with the sources of possible variation that led to the unacceptable variations in yield. For the inputs, sources of variation might be energy, raw materials and different lots of the same raw materials. Controlled variables that go into the process might include things like temperature, speed of flow and mixing time.
In essence, controlled variables are the things that can be adjusted with a knob or a dial. Uncontrolled variables that go into this process may include human intervention and differences in work teams, production lots, days of the week, machines or even heads on the same machine. In the output of the process, variation may result from the measurement system itself.
A good rule to follow when you have, for example, two production lines doing the same thing or two pieces of equipment performing the same task, is to assume they vary until proven otherwise. That’s especially true for the human factor. Experience shows that in creating the initial data table and in the graphical representation of the process, the human element is a frequently overlooked source of variation.
In the aforementioned pharmaceutical manufacturing process, the analyst may overlook that the process includes three shifts with four different work teams on the shifts.
As a result of the observation and investigation that goes into constructing the graphical representation of the process, however, the analyst makes sure the data table records which team produced which batches on which days and that the data are stratified in the analysis. The failure to take that human element into account results in a highly misleading data table and might obscure the ultimate solution to the problem.
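The stratification step just described can be sketched in a few lines of Python (the teams and yield figures are invented for illustration): once the data table carries a team column, the yields can be split by team, and the pooled spread turns out to be dominated by the team-to-team difference:

```python
from statistics import mean, stdev

# Hypothetical batch records: (work team, yield %). The team column is
# the "human element" stratification variable described above.
batches = [
    ("A", 92.0), ("A", 91.5), ("A", 92.5),
    ("B", 88.0), ("B", 87.5), ("B", 88.5),
]

# Stratify: collect the yields produced by each team.
by_team = {}
for team, y in batches:
    by_team.setdefault(team, []).append(y)

pooled_sd = stdev(y for _, y in batches)   # spread ignoring teams, ~2.24
for team, ys in sorted(by_team.items()):
    # Within each team the spread is only 0.5, but the team means
    # differ by about 4 percentage points.
    print(team, mean(ys), stdev(ys))
print(round(pooled_sd, 2))
```

Without the team column, only the pooled standard deviation would be visible, and the analysis could not attribute the variation to its actual source.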
Graphing the data
The graphical representation of the process—and the understanding of the possible sources of variation it helps generate—suggests ways in which the analyst can graphically represent the data. Because data are almost always sequential, a run chart is often needed. In our example, the x-axis would register time and the y-axis would register yield.
A scatter plot also may be used, with process variables registered on the x-axis and process outputs registered on the y-axis. Other familiar graphical techniques include box plots, histograms, dot plots and Pareto charts.
In using any of these techniques, the goal is to make sure you are exploring the relationships of potentially important variables and preparing an appropriate graphical representation for purposes of statistical analysis. Plotting the data in different ways can lead to insights and surprises about the sources of variation.
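As a rough illustration (the yield series below is invented), even a crude text-mode run chart surfaces a time pattern before any formal analysis; real work would use plotting software, but the idea is the same:

```python
# Hypothetical batch yields (%) in production order.
yields = [92.1, 91.8, 92.4, 88.2, 87.9, 88.5, 92.0, 91.6]

def run_chart(values, lo=85.0, hi=95.0, width=40):
    """Render a crude text run chart: one row per observation in time
    order, with '*' placed proportionally between lo and hi."""
    rows = []
    for i, v in enumerate(values, start=1):
        pos = int((v - lo) / (hi - lo) * (width - 1))
        rows.append(f"{i:2d} |" + " " * pos + "*")
    return rows

for row in run_chart(yields):
    print(row)
# Batches 4-6 visibly drop relative to the rest of the series, which
# would prompt the question: what changed during those batches?
```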
Statistically analyzing the data
The statistical analysis of the data, usually with the aid of statistical software, establishes what factors are statistically significant. For example, are the differences in yield produced by different work teams statistically significant? What about variations in temperature or flow? What about the measurement system itself?
The key to success lies in intimately knowing the data from the context of the process, graphically representing it and formulating a model that includes the comparisons, relationships, tests and fits you are going to study.
Once you have created the graphics and done the statistical calculations, the results should be checked against the model. Does it account for all of the variation? In short, do the results make sense? If so, you can confidently report your results.3
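The significance question about work teams can be sketched numerically. What follows is a minimal, hand-computed one-way ANOVA F statistic on hypothetical yields from two teams; it is an illustration of the kind of check statistical software performs, not the article's own calculation:

```python
from statistics import mean

def one_way_f(groups):
    """One-way ANOVA F statistic: between-group mean square divided
    by within-group mean square."""
    k = len(groups)                         # number of groups
    n = sum(len(g) for g in groups)         # total observations
    grand = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical yields (%) for batches produced by two work teams.
team_a = [92.0, 91.5, 92.5]
team_b = [88.0, 87.5, 88.5]
f = one_way_f([team_a, team_b])
print(round(f, 1))  # 96.0 -- far above the ~7.71 critical value of
                    # F(1, 4) at the 5% level, so the team effect
                    # would be flagged as statistically significant.
```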
Beyond analysis to action
The final point about reporting the results offers a reminder that analysis goes beyond the exploratory or confirmatory. The analyst also must be able to display and communicate results to decision makers. The most elegant analysis possible is wasted if it fails to communicate and the organization therefore fails to act.
Early in my career, I was asked to analyze whether a chemical company’s new product had adversely affected animals in safety studies. Personnel in the company’s lab insisted the data from the experiments showed adverse effects, and the company should therefore cease development of the product. Analysts on the company’s business side had concluded the data showed no adverse effects. My analysis reached the same conclusion, and in a showdown meeting between the business and the lab personnel, I presented my findings.
At the conclusion of my presentation, replete with analytical representations of the statistical significance of the data, the lab director remained unconvinced. So I handed him one final graph: a dot plot that, for some reason, I had not included in my presentation.
He looked at the graph and began to think aloud while everyone in the meeting sat silently. He continued to look and talk and look and talk. At last, he said emphatically, “Maybe there isn’t a difference.”
In the absence of that persuasive graphical representation and model of the data, the company might have ceased production of what turned out to be a valuable and harmless product. The bottom line is that the analyst must not only do data analysis that matters, but also make it matter.
© Ronald D. Snee, 2008.