Dyslexic Digital Digger

Michael Rothschild

This article appeared in Forbes ASAP (October 1994).
Despite stunning advances in technology, experts estimate that over 80% of corporate information still resides on paper. Less than five percent is loaded into neatly structured databases, with the balance -- word processed documents, spreadsheets, and scanned images -- stuffed into a vast electronic attic on disk.

"Information is a company's most important asset, but it's of no value if you can't get at it when you need it," says Casey Stern of Starr Securities in New York. "When I read an article, I make notes in the margin. If I scan it into the computer, how do I get my notes back? Paper may not be high-tech, but it works."

Nonetheless, the dream of the "paperless office" lives on. Like bereaved relatives who cryogenically freeze loved ones' corpses in the hope that future technology will bring them back to life, PC users lovingly bury their spreadsheets, presentations, email, and graphics files on their disk drives in the belief that someday they might bring them back from data heaven. But when a document is important, they print out and file away a hard copy just to be safe.

With desktop multimedia -- and its full motion video and sound -- just around the technological corner, the retrieval problem is about to become infinitely worse. Even if you have unlimited time and money, which elements of a photo deserve to be listed in the index? Is it "boy with dog," "German shepherd," "cloudless sky," "winter scene," or any of dozens of other details which could reasonably be used to index a single image? How would you label a few seconds of a melody, a video, or the readout of an electrocardiogram?

"With all the hype on multimedia and the information superhighway, the real payoffs will only be possible if we can retrieve whatever we want by saying, 'Give me something that kind of looks like this,'" says John Heidbreder, Director of Work Group Solutions for IBM. "I'm afraid there's going to be huge disappointment if we can't do free form searching."

But the very notion of "free form searching" contravenes the two fundamental principles of our centuries-old, index-based retrieval methods. First, the indexer must decide up front which aspects of a document's content are worth categorizing, even though he can't possibly know all the ways in which that content might relate to some future situation. As IBM's Heidbreder puts it, "If you weren't in on the design of the index, your chances of finding what you want are remote."

Second, the indexer has to define the boundaries of each category -- when to lump things together and when to split them apart -- even though real world information rarely falls neatly into pigeon holes. Take fingerprints, for example. The conventional indexing scheme recognizes five types: left swirl, right swirl, small loop, and so on. But fingerprint characteristics are spread along a multi-dimensional continuum. Some are 60% left swirl, 30% small loop, etc. Like every other indexer's, the fingerprint expert's categories are not empirical facts but convenient baskets.

Despite these profound defects in the "index before you know what you might be looking for" approach, the experts long ago concluded that they had no choice but to engineer ever more elaborate index-based systems. A flawed solution is better than none. And selling imperfect retrieval systems pays better than throwing in the towel and admitting that the nirvana of "free form searching" could never be reached.

Somebody forgot to tell Jim Dowe. In the late 1970s, the bearded and Birkenstocked founder and chief scientist of San Diego-based Excalibur Technologies began puzzling through these fundamental questions. A dyslexic computer programmer, Dowe was hampered by his inability to see where he'd mixed up the sequence of letters in the code he was writing. To get around his disability, Dowe decided to program his computer to recognize the overall pattern of a word by piercing beneath its alphanumeric symbols to study the pattern of 1s and 0s that represent each character.

Easier said than done. To expect any human being, much less one with dyslexia, to identify patterns in long strings of 1s and 0s is ridiculous. But unless a human being could first see such patterns, or categories, and define the logical rules that describe them, how could anyone write a program that worked? Back to the indexer's conundrum.

But instead of giving up, Jim Dowe performed one of those miraculous intellectual leaps characteristic of breakthrough technologies. A serious student of biology, Dowe knew nature has its own pattern recognition system, otherwise known as evolution. Random mutations, or copying errors, rearrange the precise sequence of the four biochemical symbols that make up DNA code. Mutations in the genetic code inherited from parents alter the physical characteristics of the offspring.

Usually, a mutation reduces the offspring's chances of surviving long enough to copy its code and pass it along to its own offspring. On occasion, however, a beneficial mutation helps an organism fit into its environment a bit better than its parents. In effect, Charles Darwin's natural selection is the competitive process which sorts the mutations which fit into the pattern of the environment from those that don't. Over the eons, evolution has yielded billions of species, each a finely adapted expression of an unconscious DNA-based pattern recognition system.

Jim Dowe believed that if he could mimic nature's mindless pattern identifier inside a computer, the solution to his problem would emerge spontaneously. To pull it off, he created mutating "digital organisms" -- invisible creatures much like harmless computer viruses -- that are allowed to reproduce only if they evolve the code needed to detect bit patterns in their binary environment. Mutants that can't find patterns, or can't detect them as efficiently, die off. In a few minutes of computer time, thousands of generations come and go until a highly efficient pattern recognizer evolves.
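The mutate, select, and reproduce loop described above can be sketched in a few lines of Python. This is a toy illustration only, not Excalibur's actual code, which the article does not show. The names (`to_bits`, `fitness`, `mutate`, `evolve`), the 8-bit organism width, the population size, and the mutation rate are all assumptions made for the example. Each "organism" here is just a short bit string, and its fitness is the number of times it turns up in the environment's bits.

```python
import random


def to_bits(data: bytes) -> str:
    """Render raw bytes as a string of 1s and 0s."""
    return "".join(f"{byte:08b}" for byte in data)


def fitness(organism: str, environment: str) -> int:
    """Count the (overlapping) places where the organism's bits occur."""
    width = len(organism)
    return sum(
        environment[i:i + width] == organism
        for i in range(len(environment) - width + 1)
    )


def mutate(organism: str, rng: random.Random, rate: float = 0.1) -> str:
    """Copy an organism, flipping each bit with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in organism
    )


def evolve(environment, width=8, population_size=40, generations=100, seed=0):
    """Return the fittest organism found and the best score per generation."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice("01") for _ in range(width))
        for _ in range(population_size)
    ]
    history = []
    for _ in range(generations):
        # Rank organisms by how often their pattern occurs in the data.
        population.sort(key=lambda o: fitness(o, environment), reverse=True)
        history.append(fitness(population[0], environment))
        # The fitter half survives unchanged; the rest die off and are
        # replaced by mutated copies of randomly chosen survivors.
        survivors = population[: population_size // 2]
        offspring = [
            mutate(rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + offspring
    best = max(population, key=lambda o: fitness(o, environment))
    return best, history


if __name__ == "__main__":
    env = to_bits(b"to be or not to be " * 5)
    best, history = evolve(env)
    print(best, fitness(best, env), history[0], history[-1])
```

Because the survivors are carried over unchanged, the best score can never fall from one generation to the next. No one tells the organisms which pattern to look for; whichever bit strings happen to recur in the data are the ones that persist.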

Dowe and his colleagues worked throughout the 1980s gradually refining Excalibur's unique "adaptive pattern recognition" technology. Four years after shipping its first products, the company's sales are running just over $10 million a year. Unlike competing products, Excalibur's document retrieval system requires no up-front indexing. Instead, the software's "learn" mode has a population of digital organisms examine all the bits in a data file, mutating, reproducing, and dying until they evolve a pattern recognition scheme optimized for that particular data set. Without human intervention, it takes the little critters about two minutes to index every word of a five megabyte file, like The Complete Works of William Shakespeare. In "search" mode, Excalibur's system takes less than a second to find Hamlet's "To be or not to be?"

Best of all, because Excalibur's digital creatures operate at the "atomic level" of information, hunting for repeating bit patterns, they aren't thrown off by the misspellings that plague conventional retrieval methods. Close approximations are good enough. Even handwriting and the messy images produced by OCR scanning won't confuse Excalibur's little critters.
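To see why matching on small overlapping fragments tolerates misspellings, consider the following minimal sketch. It substitutes a well-known stand-in technique, character trigram overlap, for Excalibur's bit-level pattern recognition, whose details the article does not give. The function names and the 0.3 threshold are hypothetical choices for illustration.

```python
def trigrams(text: str) -> set:
    """Break a term into overlapping three-character fragments."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two terms' fragment sets, from 0.0 to 1.0."""
    grams_a, grams_b = trigrams(a), trigrams(b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def fuzzy_search(query: str, terms: list, threshold: float = 0.3) -> list:
    """Return the terms that approximately match the query, best first."""
    scored = sorted(((similarity(query, t), t) for t in terms), reverse=True)
    return [term for score, term in scored if score >= threshold]


if __name__ == "__main__":
    # A misspelled query still finds the intended title.
    print(fuzzy_search("Hamlett", ["Hamlet", "Macbeth", "Othello"]))
```

A single wrong letter disturbs only the few fragments that contain it, so most of the pattern still lines up. An exact-match index, by contrast, treats the misspelled word as a different word entirely.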

At Lockheed's Palmdale, California plant, this "fuzzy searching" capability has proven crucial. With 30,000 pages of safety information scanned in from several hundred suppliers of hazardous materials, Lockheed's materials safety group can now retrieve up-to-date information in under twelve seconds. Along with enhanced worker safety, Lockheed saves several hundred thousand dollars a year by no longer having to index, photocopy, collate, and distribute materials safety data sheets.

With multimedia vendors now facing up to the futility of retrieving sound and video by traditional index-based methods, Jim Dowe's digital creatures are finding their most promising market niche. Excalibur has recently licensed its core technology to a slew of industry giants including IBM, Hewlett-Packard, DEC, and Sun. "Quite frankly, most of our customers don't have any understanding of the self-organizing, evolutionary nature of our software," admits Mike Kennedy, Excalibur's CEO. "They just want a system that solves their problem."

But even that may not be enough. Jim Dowe says, "Most programmers react to our technology with hostility. They simply refuse to believe that unconscious evolution can work where intentional design fails. It upsets their world view." In this, Excalibur is not alone. The bionomic view that solutions to the complex problems of the Information Age must be allowed to evolve from experience instead of being engineered in advance offends everyone from the "reengineers" inside corporations to the social engineers in Washington.


Copyright 1994 The Bionomics Institute
