By Peter Coy


Michael Drosnin has performed a tremendous public service by writing
The Bible Code, the fast-selling new book that claims to find hidden
messages in the Bible about dinosaurs, Bill Clinton, and the Land of
Magog. Not because Drosnin is correct, but because his methodology
is so bad that it's a valuable example of how not to read data.

The pitfall Drosnin tumbled into threatens to ensnare any unwary
practitioner of "data mining," the popular technique for building
predictive models of the real world by discerning patterns in masses of
computer data. Done right, data mining can help discover drugs,
forecast recessions, weed out credit-card fraud, and pinpoint sales
prospects. Done wrong, it produces bogus correlations that range from
useless to dangerous.

The error Drosnin committed in The Bible Code was a data-mining
classic. He wrote out the Hebrew Bible on a huge grid of letters and
used a computer to look for words that appear across, up, down, or
diagonally. The cryptic "messages" consist of seemingly related words
that appear near each other, for instance, dinosaur and asteroid.
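
To see why such a search proves little, it helps to run the same hunt on text known to contain no message at all. The Python sketch below is an illustration under stated assumptions, not Drosnin's actual procedure: the grid holds random lowercase English letters rather than Hebrew text, and the function names (`make_random_grid`, `find_word`), the grid size, and the sample words are all hypothetical.

```python
import random
import string

# The eight directions a word may run in: across, up, down, and diagonally,
# each read forward or backward.
DIRECTIONS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def make_random_grid(rows, cols, seed=0):
    """Build a grid of uniformly random letters (assumption: no hidden message)."""
    rng = random.Random(seed)
    return [[rng.choice(string.ascii_lowercase) for _ in range(cols)]
            for _ in range(rows)]


def find_word(grid, word):
    """Return every (row, col, direction) from which `word` can be read off the grid."""
    rows, cols = len(grid), len(grid[0])
    hits = []
    for r in range(rows):
        for c in range(cols):
            for dr, dc in DIRECTIONS:
                # Skip starting points where the word would run off the grid.
                end_r = r + dr * (len(word) - 1)
                end_c = c + dc * (len(word) - 1)
                if not (0 <= end_r < rows and 0 <= end_c < cols):
                    continue
                if all(grid[r + dr * i][c + dc * i] == ch
                       for i, ch in enumerate(word)):
                    hits.append((r, c, (dr, dc)))
    return hits


# Hypothetical usage: count chance appearances of a few short words.
grid = make_random_grid(200, 200)
for word in ("war", "end", "fire"):
    print(word, len(find_word(grid, word)))
```

On a 200-by-200 grid of random letters, any given three-letter word should turn up about 18 times on average by chance alone, since there are roughly 315,000 possible placements and 17,576 possible three-letter strings. The more words a searcher is willing to try, the more "related" pairs will land near each other.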

GARBAGE IN. It's best not to spill too much ink on The Bible Code.
Drosnin says that he used the code to foresee the assassination of
Israeli Prime Minister Yitzhak Rabin, among other events. But his
approach is immune to statistical verification, or rebuttal, for that
matter. Eliyahu Rips, the Israeli mathematician whom Drosnin credits
as the code's discoverer, says he doesn't support the book. Its main
value, then, is to illustrate a principle enunciated by Andrew W. Lo, a
finance professor at Massachusetts Institute of Technology: "Given
enough time, enough attempts, and enough imagination, almost any
pattern can be teased out of any data set."

Experts from economists to epidemiologists have made similar
mistakes. It was once common to mine health records in search of
"hot spots" with above-average cancer rates. Epidemiologists would
then develop hypotheses about what might have caused the apparent
outbreak. This terrorized residents, usually for no good reason. Some
areas have above-average cancer rates by pure chance.
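
How often pure chance produces a "hot spot" can be estimated with a small simulation. The sketch below is illustrative only: the number of towns, the population, the underlying rate, and the 1.5-times threshold are made-up assumptions rather than figures from any real health study, and `simulate_hot_spots` is a hypothetical name.

```python
import random


def simulate_hot_spots(n_towns=200, population=500, true_rate=0.01,
                       threshold=1.5, seed=1):
    """Count towns whose observed rate exceeds threshold * true_rate.

    Every simulated town shares the same underlying risk (an assumption of
    this sketch), so any apparent hot spot is due to chance alone.
    """
    rng = random.Random(seed)
    hot = 0
    for _ in range(n_towns):
        # Each resident independently becomes a case with probability true_rate.
        cases = sum(rng.random() < true_rate for _ in range(population))
        if cases / population > threshold * true_rate:
            hot += 1
    return hot


# Hypothetical usage: how many of 200 identical towns look alarming?
print(simulate_hot_spots(), "of 200 towns exceed 1.5 times the true rate")
```

With these assumed numbers each town expects five cases, and eight or more are needed to cross the threshold. That happens by luck in roughly one town in eight, so a map of 200 identical towns would be expected to show a couple of dozen false alarms.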

Data mining can lead to costly misinterpretations. ProCyte Corp. in
Kirkland, Wash., was dismayed in 1992 when a clinical trial found that
its new drug, Iamin, didn't seem to promote general healing of diabetic
ulcer wounds. So the company searched through subsets of the data
and found that Iamin seemed to work on certain foot wounds. But that
was a statistical fluke, as it turned out after another expensive and
fruitless clinical trial. Not allowed drug status, Iamin is now sold as a
wound dressing.

Finance is rife with wrong-headed data mining. David J. Leinweber,
managing director of First Quadrant Corp. in Pasadena, Calif., which
manages $20 billion in assets, likes to illustrate the problem with
"Stupid Data-Mining Tricks." For example, he sifted through a United
Nations CD-ROM and discovered that the single best predictor of the
Standard & Poor's 500-stock index was butter production in
Bangladesh. The lesson: A formula that happens to fit the data of the
past won't necessarily have any predictive value. That's true even of
the Index of Leading Economic Indicators, which the Commerce Dept.
turned over to the Conference Board in 1995.

University of Pennsylvania economist Francis X. Diebold says the
Commerce Dept.'s periodic rejiggering of the index made it fit the
historical data more closely but didn't improve it as a forecasting
tool.
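
The gap between fitting the past and forecasting the future can be shown with random numbers. The sketch below does not reproduce Leinweber's analysis or the Commerce Dept. index. It assumes a target series and 1,000 candidate "predictors" that are all independent random noise, picks the candidate that correlates best with the target over a fitting period, and then checks the same candidate on a holdout period. The function names and all parameter values are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def best_spurious_predictor(n_candidates=1000, n_fit=20, n_holdout=20, seed=2):
    """Return (in-sample r, out-of-sample r) for the best-fitting noise series."""
    rng = random.Random(seed)
    total = n_fit + n_holdout
    target = [rng.gauss(0, 1) for _ in range(total)]
    best = None
    for _ in range(n_candidates):
        # Each candidate is pure noise, unrelated to the target by construction.
        series = [rng.gauss(0, 1) for _ in range(total)]
        r_fit = pearson(series[:n_fit], target[:n_fit])
        if best is None or abs(r_fit) > abs(best[0]):
            r_holdout = pearson(series[n_fit:], target[n_fit:])
            best = (r_fit, r_holdout)
    return best


# Hypothetical usage: compare the winner's fit with its forecast record.
r_fit, r_holdout = best_spurious_predictor()
print(f"in-sample r = {r_fit:.2f}, out-of-sample r = {r_holdout:.2f}")
```

Because the winner is chosen for its fit, its in-sample correlation will look impressive even though no candidate has any real relationship to the target. Its holdout correlation is just another random draw, which is why testing on data the search never saw is a common safeguard.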

The problem could get worse. With desktop computers becoming more
powerful, data-mining tools are being used by people who are clueless
about statistics. It's human nature to search for patterns, whether
constellations in the stars or faces in the clouds. And computers
allow that impulse to run wild. Says Alexis DePlanque, a senior
research analyst at META Group in Stamford, Conn.: "We need to be
sure we're not just empowering people to shoot themselves in the
foot." That's true whether the data come from supermarket scanners
or the Bible.

Coy is Business Week's associate economics editor.

June 16, 1997