Total Surveillance Technology
David M. Raab
DM Review
April, 2003

It’s like one of those propaganda newsreels from the 1940s: “Marketing Software Goes to War”. Yes, the lowly duplicate detection systems designed to eliminate unnecessary catalogs are now hunting bigger game–terrorists at our borders. Fraud detection software has graduated from guarding your cell phone to protecting airports, water supplies and ballparks. Credit card databases that once minded your spending limit are now watching your every move for suspicious behavior.

Like pictures of determined Cub Scouts searching the skies for Nazi warplanes, this is all very cute but doesn’t inspire much confidence. One hopes there are professionals hidden in the background who have been working on these problems for years and have already come up with more reliable solutions. At a minimum, systems professionals must hope that the people in charge realize how error-prone any automated approach must be.

Just what the people in charge do or do not realize is itself a well kept secret. One public face of the government’s effort is the DARPA Information Awareness Office (www.darpa.mil/aio), which lists the ambitious goal of achieving “total information awareness that is useful for preemption, national security warning, and national security decision making.” Specific projects include “focused warnings within an hour after a triggering event occurs” (Total Information Awareness System), “technology enabling ultra-large, all-source information repositories” (Genisys), “automated discovery, extracting and linking of sparse evidence contained in large amounts of classified and unclassified data sources” (Evidence Extraction and Link Discovery) and “automated and adaptive behavior prediction models tuned to specific terrorist groups and individuals” (Wargaming the Asymmetric Environment). This is in addition to more prosaic work on machine translation, text analysis, physical surveillance, and decision support.

Published descriptions show that some of these programs are based on previous projects within the intelligence community. But private conversations with commercial vendors make clear that they too are involved in this sort of work, if not necessarily in these particular DARPA projects. (In fact, the U.S. Congress in February placed severe restraints on the Information Awareness program. But it’s safe to assume similar research will proceed elsewhere.)

This brings back the original question: how reliable are these new surveillance systems likely to be? If secret government research has produced advanced solutions, the answer should not be based on the performance of commercial products. But the continued involvement of commercial vendors in these projects suggests that their technologies are as good as the government’s own. Another hint is that the government uses commercial technology for many of its own deployed systems, and has delegated critical surveillance functions–such as watching for suspicious funds transfers–to private institutions that themselves rely mostly on commercial software. And at the most basic level, both government and commercial researchers must deal with the same fundamental constraints imposed by the nature of the underlying tasks themselves.

What are those tasks? Apart from issues related to language interpretation, the DARPA program really focuses on two challenges: assembling data from multiple sources and making predictions based on behavior patterns. Both have long been the subject of commercial activity.

Data assembly has a mechanical component, which is essentially the extract, transform and load (ETL) function long familiar to data warehouse developers. A “total” surveillance system would surely supplement batch-oriented ETL with real time data feeds. While this is a different process, it is also reasonably well understood. In short, there are few theoretical obstacles to data assembly, although there will be many practical challenges at the gargantuan scale required for a total surveillance system.

There is, however, one theoretical obstacle: establishing identity. Information with a specific ID, such as a passport or credit card number, can easily be tied to other information using the same ID. But different sources use different ID systems, and the ID numbers themselves can be misreported, intentionally or not. Thus any surveillance system must deal with the decidedly non-mechanical problem of linking records based on name, address and other information that itself may be non-unique and inconsistently recorded.

Happily, governments and businesses already have more effective commercial linking technology than the standard deduplication systems used for catalog mailings. But even this technology is imperfect, and any mistake is costly: a missed match means a terrorist passes undetected, while a false positive can disrupt the life of an innocent person. The obvious solution–accepting lots of false positives to avoid missing any true matches–works in applications like border control, but causes unacceptable pollution when assembling dossiers in a database.
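The tradeoff described above–a lenient matching threshold that catches more true matches at the cost of more false positives–can be made concrete with a small sketch. This is not how any particular commercial linking product works; it is a minimal illustration, with an invented pair of name variants, of the threshold decision every such system must make:

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Crude similarity score between two recorded names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def is_match(a: str, b: str, threshold: float) -> bool:
    """Declare a link when similarity clears the threshold.

    A low threshold links more true matches but pollutes the database
    with false positives; a high one lets real matches slip through.
    """
    return name_similarity(a, b) >= threshold

# The same pair of records links or not depending on where the bar is set.
pair = ("Mohammed al-Rashid", "Mohamed Al Rashid")
print(is_match(*pair, threshold=0.80))  # lenient: linked
print(is_match(*pair, threshold=0.99))  # strict: missed
```

Real linking engines use far richer scoring (phonetic codes, address and date-of-birth corroboration), but the threshold dilemma is the same at any level of sophistication.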

Of course, data generated by terrorists contains more than random errors. They hide their tracks intentionally through simple tricks such as using initials and variant name spellings, and more advanced ones like using multiple identities. Software can counter some techniques: for example, it can link all identities at the same address. But clever operatives will establish–or steal–identities that have no logical connection. So perfect linking will never be possible.
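The address-linking countermeasure mentioned above can be sketched in a few lines. The records and the normalization rules here are invented for illustration; a production system would normalize far more aggressively (postal standardization, unit numbers, and so on):

```python
from collections import defaultdict

def normalize_address(addr: str) -> str:
    """Collapse trivial variations so '12 Oak St., Springfield'
    and '12 oak st Springfield' agree."""
    return " ".join(addr.lower().replace(".", "").replace(",", " ").split())

def link_by_address(records):
    """Group (identity, address) records sharing a normalized address.

    Defeats simple aliasing at one address, but not identities that
    were established or stolen with no shared data at all.
    """
    groups = defaultdict(set)
    for identity, addr in records:
        groups[normalize_address(addr)].add(identity)
    return {addr: ids for addr, ids in groups.items() if len(ids) > 1}

records = [
    ("J. Smith",   "12 Oak St., Springfield"),
    ("John Smyth", "12 oak st Springfield"),
    ("A. Jones",   "99 Elm Ave."),
]
print(link_by_address(records))  # the two Smith variants land in one group
```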

The second DARPA focus is predicting behavior based on patterns. This is another field with an established commercial market, both in security matters such as fraud detection and in marketing applications such as response and attrition modeling. Like linking software, pattern detection systems face a fundamental limit on their accuracy: they need previous patterns to use as a basis for prediction. While it’s unlikely that twenty Middle Eastern men could take flight training today without being noticed, it’s equally unlikely that terrorists will try. Instead, they’ll do something different–and if it’s truly unique, no pattern detection system will notice.

Of course, it’s a good thing that the U.S. has seen too few terrorist incidents to identify many patterns. And it’s possible that some information can be gained from other places where terrorism is more common. It’s also possible that a system might simply highlight activity that’s unusual without necessarily being suspicious. This could at least lead to closer investigation–so long as the investigators have a high tolerance for false alarms. But it’s hard to imagine that pattern detection systems will ever provide anything close to comprehensive protection.
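The “highlight what is unusual” approach can be reduced to a simple statistical sketch. The data here is a hypothetical stream of daily transfer counts, and the three-standard-deviation threshold is an arbitrary illustration, not a recommendation:

```python
from statistics import mean, stdev

def unusual(history, new_value, sigmas=3.0):
    """Flag a value more than `sigmas` standard deviations from the
    historical mean.

    Note what this does and does not do: it marks the merely unusual,
    not the suspicious -- every flag still needs a human investigator,
    and a truly novel attack may never look unusual at all.
    """
    mu, sd = mean(history), stdev(history)
    if sd == 0:
        return new_value != mu
    return abs(new_value - mu) > sigmas * sd

daily_transfers = [4, 5, 3, 6, 5, 4, 5, 6, 4, 5]
print(unusual(daily_transfers, 5))   # ordinary day: False
print(unusual(daily_transfers, 40))  # worth a closer look: True
```

The false-alarm rate is set entirely by the threshold, which is exactly the tolerance question raised above: lower it and investigators drown in flags; raise it and genuinely odd behavior passes unremarked.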

In short, the two key tasks at the heart of total surveillance are inherently limited: all data regarding an individual can never be perfectly linked, and patterns predicting terrorist acts can never be perfectly detected. Of course, perfection is an unnecessarily high standard–remember that DARPA’s own statement sets the bar at a much less ambitious “useful”. But it’s important to recognize that these systems have very real limitations. Readers of this magazine will interact with these systems as professionals by working on them, feeding them data, receiving their outputs, or commenting on them to others. This gives us a special responsibility to insist that total surveillance be treated with the same intelligent skepticism as any other systems project. Or–with both liberty and security at stake–perhaps even a bit more.

* * *

David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.
