1999 Jan 01
Technologies for Complex Segmentation
David M. Raab
Relationship Marketing Report
January, 1999

Today’s state-of-the-art customer management involves many long-term, multi-step campaigns, tailored to different customer segments and executed daily. Although the concept is widely understood, it’s still hard to find companies that are actually doing it. Among other things, this means that the software offered for this task has not necessarily been tested in full-scale, high-volume operations. In fact, given the haste with which most vendors launch products, you can almost guarantee that most systems will have problems the first few times they try to do this for real.

So just how do you design a system to handle really complicated, large-scale contact management programs? Vendors have taken widely different approaches–a sparkling display of creativity unfettered by experience (in the sense that when these systems were designed, no one had run marketing programs on the scale now envisioned). Still, it’s possible to place their approaches roughly on a continuum between two extremes.

The first extreme is to evaluate each customer once, running its data through a complicated selection tree to find all the branches and outputs that apply. This approach has the great merit of loading the customer’s data one time only, thereby minimizing the volume of data access–which is often the greatest bottleneck in system performance. This is the oldest segmentation technique, one that came quite naturally when records were stored on tapes and necessarily read in sequence. Many early database marketing systems, both on mainframe computers and PCs, worked this way: all data relating to a customer was physically stored together in a hierarchical data structure, and selections were made by loading the block of data associated with a customer, processing it through all the segmentation rules, storing the results, and then reading the next customer’s data block and repeating the process. On the big service bureau systems, dozens or even hundreds of separate selections might be made simultaneously during one nightly pass of the file. This is still the approach used by most large data vendors to fulfill their customers’ orders. Among modern database marketing systems, Decision Software Inc.’s TopDog uses the technique in almost pure form, except that it first runs a relational database query to pull together the necessary information from separate tables. TopDog then steps through the resulting file one customer at a time, reading the data and executing the actual selection logic outside of the relational database itself. As a bonus, this lets TopDog use its own logic to execute all those annoying functions–like random sampling and calculations within groups–that are difficult or impossible within the standard relational database query language, SQL.
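
To make the single-pass idea concrete, here is a minimal sketch in Python. It is not how TopDog or any service bureau system is actually written; the rule names, field names, and the sample_rate parameter are all invented for illustration. The point is only that each customer record is read once and tested against every segment rule while it is in memory, and that something like random sampling is a one-line affair once the logic runs outside SQL.

```python
import random

# Hypothetical segment rules: each maps one customer record to True/False.
SEGMENT_RULES = {
    "high_value": lambda c: c["total_spend"] >= 1000,
    "lapsed": lambda c: c["months_since_purchase"] > 12,
    "new_buyer": lambda c: c["order_count"] == 1,
}


def single_pass(customers, rules, sample_rate=1.0, seed=0):
    """Read each customer once and test it against every rule."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    selections = {name: [] for name in rules}
    for customer in customers:  # one sequential read of the file
        for name, rule in rules.items():
            # Sampling is applied only to records that pass the rule.
            if rule(customer) and rng.random() < sample_rate:
                selections[name].append(customer["id"])
    return selections


# Illustrative data only.
customers = [
    {"id": 1, "total_spend": 1500, "months_since_purchase": 2, "order_count": 7},
    {"id": 2, "total_spend": 40, "months_since_purchase": 14, "order_count": 1},
    {"id": 3, "total_spend": 2200, "months_since_purchase": 20, "order_count": 3},
]
result = single_pass(customers, SEGMENT_RULES)
```

Adding a fourth or a four-hundredth rule increases the work done per record but not the number of times the file is read, which is the whole appeal of the approach.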

Of course, relational database purists find the idea of working outside SQL to be anathema. They occupy the other extreme of the design continuum, where each segment is selected directly from the relational database with its own SQL query. This approach has the virtue of consistency with other SQL-based systems, and, of more practical importance, lets the system benefit from performance-enhancing capabilities built into the relational database software and associated hardware. As a result, users get on-line queries and fast execution of campaigns that involve small numbers of records, so long as SQL can find them without having to read the entire database. With a small number of unrelated selections, using separate queries can be faster than running one big sequential pass. But when segments are linked in a hierarchy–with one segment being the subset of another, either due to a sequence over time or to splits based on customer characteristics–selecting each segment with an independent query against the main database is horribly inefficient.
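
A small sketch shows why hierarchies hurt. It uses Python's built-in sqlite3 module purely as a stand-in for a production database; the customer table, its columns, and the segment names are assumptions made up for the example. Each segment gets its own query, and the two child segments have no choice but to restate the parent's condition.

```python
import sqlite3

# In-memory stand-in for the marketing database (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, region TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "east", 1500.0), (2, "east", 40.0), (3, "west", 2200.0), (4, "east", 700.0)],
)

# One independent query per segment. The child segments repeat the
# parent's condition, so the database evaluates it three times.
SEGMENT_SQL = {
    "east": "SELECT id FROM customer WHERE region = 'east'",
    "east_high": "SELECT id FROM customer WHERE region = 'east' AND spend >= 1000",
    "east_low": "SELECT id FROM customer WHERE region = 'east' AND spend < 1000",
}

segments = {
    name: [row[0] for row in conn.execute(sql + " ORDER BY id")]
    for name, sql in SEGMENT_SQL.items()
}
```

With three segments the repetition is harmless. With several hundred segments nested a few levels deep, every query goes back to the main table to rediscover what its parent already found.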

Despite the drawbacks of independent SQL queries, they are used by most of today’s leading campaign management tools, including Exchange Applications’ ValEx, Prime Response’s Prime Vantage and Recognition Systems Inc.’s Ideas Solution. None relies on SQL exclusively, since tasks like sampling must still be handled externally. These systems also can all flag a set of records and select against that set in a subsequent segment–thus limiting the SQL query to a small universe rather than the entire database. This saves much effort, but introduces its own drawbacks: flagging must be specified manually, the flagged sets take time to build and space to store, and each segment still generates the overhead of a separate SQL query. In reality, vendors tend to rely on other methods–including parallel processing, powerful hardware, and splits made outside of the actual campaign selection–to get the throughput they need. Even then, it is not unusual to hear of large systems that need 10 or 12 hours to finish each nightly run.
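
The flagging technique can be sketched as follows, again with sqlite3 standing in for the real database. The table and column names are hypothetical, and the temporary table is just one plausible way to hold a flagged set; the products named above may store their flags quite differently.

```python
import sqlite3

# In-memory stand-in for the marketing database (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, region TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "east", 1500.0), (2, "east", 40.0), (3, "west", 2200.0), (4, "east", 700.0)],
)

# Step 1: flag the parent set once. This costs time to build and space
# to store, and someone has to decide that it should be built at all.
conn.execute(
    "CREATE TEMP TABLE flagged_east AS "
    "SELECT id, spend FROM customer WHERE region = 'east'"
)

# Step 2: later segments select against the small flagged set rather
# than the full customer table. Each is still a separate SQL query.
east_high = [r[0] for r in conn.execute(
    "SELECT id FROM flagged_east WHERE spend >= 1000 ORDER BY id")]
east_low = [r[0] for r in conn.execute(
    "SELECT id FROM flagged_east WHERE spend < 1000 ORDER BY id")]
```

The saving is that the second and third queries touch three rows instead of the whole file. The costs described above are all visible too: an extra build step, an extra table, and one query per segment.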

A less critical problem with independent queries is the need to write increasingly complex queries to execute a multi-level selection. That is, each query must contain SQL code to exclude records selected in other segments, or to include only records belonging to its “parent”. Recognition Systems and Prime Response generate much of this code automatically, limiting the burden on the user and the potential for error. Exchange Applications relies more on limiting the complexity of the campaigns themselves–allowing only one level of segments defined by SQL queries, plus a separate level of splits. While this does impose some limits on users, Exchange would probably argue that they can still accomplish pretty much whatever they might want.
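
To see how quickly the hand-written version gets out of control, and why generating it automatically is attractive, consider this rough sketch of a query generator. It is an illustration only, not the method used by Recognition Systems or Prime Response; the function name, table name, and conditions are all invented.

```python
def build_exclusive_queries(table, ordered_conditions):
    """Return one SELECT per segment, each excluding all earlier segments.

    ordered_conditions is a list of (segment_name, sql_condition) pairs in
    priority order. Assumes the tested columns are never NULL, since
    NOT (NULL comparison) would silently drop a row from every segment.
    """
    queries = {}
    earlier = []
    for name, condition in ordered_conditions:
        clauses = [f"({condition})"] + [f"NOT ({c})" for c in earlier]
        queries[name] = f"SELECT id FROM {table} WHERE " + " AND ".join(clauses)
        earlier.append(condition)
    return queries


queries = build_exclusive_queries(
    "customer",
    [("high_value", "spend >= 1000"), ("east", "region = 'east'")],
)
# queries["east"] now carries the exclusion of the high_value segment:
# SELECT id FROM customer WHERE (region = 'east') AND NOT (spend >= 1000)
```

Every segment added to the list lengthens every query after it, so the tenth segment in a priority list drags nine exclusion clauses along with its own condition.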

Other vendors have taken an approach between the two extremes. Paragren Technologies extracts the required data from the underlying relational database, using special techniques that let it execute a single pass no matter how complicated the original selection. The data is placed in a flat file, where the user can then sort and segment records without requerying the main database. Marketing Synergy accepts separate SQL queries for each segment but then evaluates them together, running them in the appropriate sequence and automatically excluding records in prior segments from later segments. Marketing Synergy also can build a separate, non-relational database using bit-based indexes, and query against that instead of the relational database.
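
For readers unfamiliar with bit-based indexes, the following sketch shows the general idea, using ordinary Python integers as bitmaps with one bit per customer row. It also applies the "exclude prior segments" rule in the same loop. This is a generic illustration of the two techniques, not a description of Marketing Synergy's internals; the data and helper names are assumptions.

```python
def build_bitmap(rows, predicate):
    """Return an int whose bit i is set when rows[i] satisfies predicate."""
    bits = 0
    for position, row in enumerate(rows):
        if predicate(row):
            bits |= 1 << position
    return bits


def positions(bits):
    """List the row positions whose bits are set."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


# Illustrative data only.
rows = [
    {"id": 1, "region": "east", "spend": 1500},
    {"id": 2, "region": "east", "spend": 40},
    {"id": 3, "region": "west", "spend": 2200},
    {"id": 4, "region": "east", "spend": 700},
]
east = build_bitmap(rows, lambda r: r["region"] == "east")
high = build_bitmap(rows, lambda r: r["spend"] >= 1000)

# Evaluate segments in priority order; each later segment automatically
# excludes rows already taken by an earlier one.
taken = 0
assigned = {}
for name, bits in [("high_value", high), ("east", east)]:
    assigned[name] = bits & ~taken
    taken |= assigned[name]

ids = {name: [rows[i]["id"] for i in positions(b)] for name, b in assigned.items()}
```

Once the bitmaps exist, combining or excluding segments is a matter of AND, OR and NOT on integers, with no further trip to the relational database.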

Whatever a vendor’s current approach to segmentation, it will be challenged as nightly updates are replaced by online interactions. Most current systems do not provide a real-time solution: instead, they generally produce a static file of current and planned messages for each customer, which is read by online systems as transactions progress. The messages are typically updated as part of a nightly or weekly batch process, although sometimes the interval is shorter. Still, even updating every few minutes is not the same as reacting to a transaction as it happens–say, deciding what supplemental product to offer in conjunction with a new purchase. This requires that the system actually accept new inputs, make a decision, and feed back the result.

The flow of such a decision-making process looks a lot like a conventional, branching campaign tree. Since one record at a time is run through the system, the underlying technology is probably closer to the old-style sequential access than to multiple independent SQL queries. But while the old sequential products executed the entire tree from start to finish, an interactive system will start and stop within the tree as the transaction progresses. So even vendors using the sequential approach will need to make adjustments.
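
The difference between running a tree from start to finish and stopping partway can be sketched in a few lines. The tree, field names, and offers below are entirely made up, and no vendor's product is being described. The idea is that the walk pauses at the first node whose input has not yet arrived and picks up from that node when the transaction supplies it.

```python
# Hypothetical decision tree: each node names the fact it needs and a
# function that picks the next node. Names not in TREE are final offers.
TREE = {
    "start": ("product", lambda v: "printer" if v == "printer" else "other"),
    "printer": ("monthly_pages", lambda v: "offer_toner_plan" if v > 500 else "offer_paper"),
    "other": ("is_member", lambda v: "offer_upgrade" if v else "offer_membership"),
}


def advance(node, facts):
    """Walk from node until an offer is reached or a needed fact is missing.

    Returns (position, offer); offer is None while the walk is paused.
    """
    while node in TREE:
        field, choose = TREE[node]
        if field not in facts:
            return node, None  # pause here until the transaction supplies it
        node = choose(facts[field])
    return node, node  # reached a leaf, which is the offer itself


# The customer picks a product; the tree cannot finish yet.
facts = {"product": "printer"}
position, offer = advance("start", facts)
paused_at = position

# Later in the same transaction, more input arrives and the walk resumes.
facts["monthly_pages"] = 800
position, offer = advance(position, facts)
```

A batch system never needs the paused state, because all the facts are on file before the run begins. An interactive one has to store the position somewhere between inputs, which is one of the adjustments even a sequential design would need.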

One intriguing question is whether vendors who develop single-record processing for interactive applications will later change their batch processes to use the same technique. So far, the vendors who started with batch selects and added interactive–Recognition Systems and Paragren–appear to have kept the two processes separate. On the other hand, systems that started with an interactive focus, like “marketing automation” vendors Rubric and MarketFirst, appear to use the same technology for both. While the scalability of the “marketing automation” systems is even less tested than that of conventional campaign managers, it does seem likely that most vendors will eventually use a single method for batch and interactive processes.

So what does all this mean for marketers? First, if your daily campaigns have more than a hundred segments, you need to take a close look at the technical details of how a proposed campaign manager would run them. Second, if you want to mix batch and interactive campaigns, you’d best wait until the vendors figure out how to do it–and even then, be ready to do some careful testing to make sure you know what you’re really getting.

* * *

David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.
