
Part 59: Gaming the Metrics

One of the cornerstones of quality and lean Six Sigma is data: “We insist on it.” “Don’t tell us what you think the situation is; let the data do the talking.” “In God we trust—all others bring data.” You get the idea.

An unfortunate side effect of this emphasis is the proliferation of useless data. If the useless data weren’t used, collecting them would merely be a waste of time. But if a person’s performance is being measured by these data, you can bet your last euro that the measurements will get a lot of attention, and they will drive a lot of behavior. And if the system doesn’t change, there’s still one way to make the measurements look better: cheat.

I often open my face-to-face training sessions with Dr. Deming’s Red Bead Experiment. It’s a great icebreaker, and it introduces some important statistical ideas. The experiment is actually a game with very simple rules. “Willing workers” are required to use a paddle with holes in it to sample beads from a container that holds red and white beads. “We don’t want any red beads,” the workers are told. To drive the point home, there are “quality inspectors” who check the samples for the unwanted red beads and record the results, and “supervisors” who use the results to “coach” and discipline the hapless willing workers.
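The statistical point of the game can be sketched as a small simulation. This is a minimal sketch, not Deming’s own materials: it assumes the commonly cited setup of 4,000 beads, 800 of them red, and a 50-hole paddle, with beads returned and remixed after every draw. The worker names, the four-day run, and the helper names (`draw_paddle`, `run_experiment`, `np_chart_limits`) are hypothetical. Every worker draws from the same system, so the counts differ only by chance, and approximate 3-sigma np-chart limits show that those differences are ordinary common-cause variation.

```python
import math
import random

# Assumed setup (commonly cited for the demonstration, not stated in the text):
# 4,000 beads of which 800 (20%) are red, and a paddle with 50 holes.
TOTAL_BEADS = 4000
RED_BEADS = 800
PADDLE_HOLES = 50

# 1 marks a red bead, 0 a white one; the mix is the same for every draw.
CONTAINER = [1] * RED_BEADS + [0] * (TOTAL_BEADS - RED_BEADS)


def draw_paddle(rng: random.Random) -> int:
    """Count the red beads in one paddle sample (drawn without replacement)."""
    return sum(rng.sample(CONTAINER, PADDLE_HOLES))


def run_experiment(workers: list[str], days: int, seed: int = 0) -> dict[str, list[int]]:
    """Each hypothetical worker draws one paddle per day from the same container."""
    rng = random.Random(seed)
    return {w: [draw_paddle(rng) for _ in range(days)] for w in workers}


def np_chart_limits(n: int, p_bar: float) -> tuple[float, float, float]:
    """Approximate 3-sigma np-chart limits (LCL, center, UCL) for red beads per paddle."""
    center = n * p_bar
    sigma = math.sqrt(n * p_bar * (1 - p_bar))
    return max(0.0, center - 3 * sigma), center, center + 3 * sigma


results = run_experiment(["Ann", "Bo", "Cy", "Di", "Ed", "Flo"], days=4)
all_counts = [c for counts in results.values() for c in counts]
p_bar = sum(all_counts) / (len(all_counts) * PADDLE_HOLES)
lcl, center, ucl = np_chart_limits(PADDLE_HOLES, p_bar)

for worker, counts in results.items():
    flag = "outside limits" if any(c < lcl or c > ucl for c in counts) else "within limits"
    print(f"{worker}: {counts} ({flag})")
print(f"LCL={lcl:.1f}  center={center:.1f}  UCL={ucl:.1f}")
```

In most runs every count falls inside the limits, which is the lesson of the game: ranking, coaching, or disciplining workers on these numbers reacts to variation that belongs to the system, not to the people.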

Before the game concludes, there are always participants who, seeing a bunch of red beads on their paddle, quickly dump the sample back before the count can be made. Others deliberately pick out red beads and throw them back. Still others bring partially filled paddles to the quality inspectors. There are all manner of ways to try to beat the system. And this is just a fun game, played for no stakes at all. Imagine what people do when real consequences, such as pay and promotions, are on the line.

The most serious games are probably played in totalitarian countries, where factory managers are measured, and sometimes executed, when their results fall short of what the authorities require. According to the History Learning Site, in Stalin’s Russia:

Factories took to inflating their production figures and the products produced were frequently so poor that they could not be used—even if the factory producing those goods appeared to be meeting its target. The punishment for failure was severe.

In the book Eat the Rich (Atlantic Monthly Press, 1999), author P. J. O’Rourke tells us that in the former USSR:

The trouble wasn’t that factory managers disobeyed orders. The trouble was that they obeyed them precisely. If a shoe factory was told to produce 1,000 shoes, it produced 1,000 baby shoes because they were the cheapest and easiest to make. If it was told to produce 1,000 men’s shoes, it made them all one size. If it was told to produce 1,000 shoes in a variety for men, women, and children, it produced 998 baby shoes, one pump, and a wingtip. If it was told to produce 3,000 pounds of shoes, it produced one enormous pair of concrete sneakers.

Perhaps O’Rourke is exaggerating, but the point is still essentially valid: Metrics can, and probably will, be gamed. In lean Six Sigma there’s a common metric-gaming activity that I call Denominator Improvement. One of the most popular metrics is defects per million opportunities, or DPMOs. The formula itself is quite simple: DPMO = 1,000,000 x Defects/Opportunities. If someone’s performance is being measured using DPMOs, he can make the metric look better either by reducing defects (the numerator) or by increasing the number of opportunities (the denominator).
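The arithmetic behind Denominator Improvement can be shown directly. This is a minimal sketch with hypothetical figures: halving the defects and doubling the counted opportunities produce exactly the same DPMO, even though only the first changes anything a customer would notice. The function name `dpmo` is illustrative.

```python
def dpmo(defects: int, opportunities: int) -> float:
    """DPMO = 1,000,000 x Defects / Opportunities."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return 1_000_000 * defects / opportunities


# Hypothetical figures, for illustration only.
baseline = dpmo(defects=50, opportunities=10_000)                  # 5000.0
real_improvement = dpmo(defects=25, opportunities=10_000)          # 2500.0: fewer defects
denominator_improvement = dpmo(defects=50, opportunities=20_000)   # 2500.0: same defects, more "opportunities"

print(baseline, real_improvement, denominator_improvement)
```

The metric alone cannot tell the two "improvements" apart; only knowing how the opportunities were defined can.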

For example, we might be interested in the number of typing errors in this post. The DPMO metric might be 1,000,000 x Errors/Total Words. But if this number didn’t look good enough, I might also use 1,000,000 x Errors/Total Letters, or 1,000,000 x Errors/Total Characters, counting spaces and punctuation.
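The typing-error example can be made concrete. In this minimal sketch the sentence, its two deliberate typos, and the hand-counted error total are hypothetical; the point is that the same two errors yield very different DPMO values depending on whether words, letters, or all characters are counted as opportunities.

```python
def dpmo(defects: int, opportunities: int) -> float:
    """DPMO = 1,000,000 x Defects / Opportunities."""
    return 1_000_000 * defects / opportunities


# Hypothetical sentence with two deliberate typos ("Metrcis", "probaly").
text = "Metrcis can and probaly will be gamed."
errors = 2  # counted by hand for this example

opportunity_counts = {
    "words": len(text.split()),                   # 7
    "letters": sum(ch.isalpha() for ch in text),  # 31
    "characters": len(text),                      # 38, spaces and punctuation included
}

for unit, opportunities in opportunity_counts.items():
    print(f"per {unit}: {dpmo(errors, opportunities):,.0f} DPMO")
```

Switching from words to characters cuts the reported DPMO by more than a factor of five without fixing a single typo.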

The solution to metrics gaming is to use metrics to guide improvement, not to measure the performance of people. Metrics should be limited to those numbers that quantify an important outcome (Y metrics) or an input that is critical to the quality of the outcome (a CTQ or X metric). The reason for quantifying these things is to discover, validate, and use a transfer function (a model of the cause-and-effect relationship, such as Y = f(x)) to guide improvement planning and activity. When metrics serve a useful purpose such as this, the tendency to manipulate and game them is, if not eliminated, at least reduced.
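One way to discover and use a simple transfer function is an ordinary least-squares fit of Y on a single X. This is a minimal sketch, assuming a linear relationship and using hypothetical data; real studies often involve several Xs and designed experiments, and validation should use fresh data rather than the R² of the fitting data alone. The helper names `fit_line` and `r_squared` are illustrative.

```python
# Hypothetical paired observations: X is a critical input (CTQ), Y the outcome.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x: list[float], y: list[float], intercept: float, slope: float) -> float:
    """Share of Y's variation explained by the fitted model (a basic validation check)."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


intercept, slope = fit_line(xs, ys)
print(f"Y = {intercept:.2f} + {slope:.2f} * X, R^2 = {r_squared(xs, ys, intercept, slope):.3f}")

# Use the model to plan an improvement: the predicted Y if X were set to 6.
print(f"predicted Y at X = 6: {intercept + slope * 6:.2f}")
```

Here the model, not anyone’s performance rating, is what the numbers feed, which leaves little reason to bend them.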


Tom Pyzdek
Thomas Pyzdek’s career in business process improvement spans more than 40 years. He is the author of more than 50 copyrighted works, including The Six Sigma Handbook (McGraw-Hill, 2003). He provides online certification and training in Six Sigma and lean.


