1997 May 01
Segmentation Systems
David M. Raab
Relationship Marketing Report
May, 1997

Database marketers often think of segmentation solely in terms of predictive models–that is, which customers or prospects are most likely to behave in a certain way. In fact, segmentation can involve other types of analysis, such as splitting a file into groups with similar characteristics or showing relationships among data elements. General-purpose statistical packages such as SAS (916-677-8000) and SPSS (312-329-3500) provide different tools for all these tasks and remain the most commonly used products for database marketing applications. But these systems require highly trained users and substantial statistical skill, which has limited their application to firms that can afford in-house statisticians or consultants who charge upwards of $25,000 for a single model. Users without such resources must find software that is considerably simpler to use.

Such software needs more than a powerful modeling capability. It has to guide a potentially ignorant user through the segmentation process, paying particular attention to the preparation given to data before it is actually processed. It must run quickly enough to meet marketers’ deadlines. It must provide outputs that are meaningful to non-statisticians. It must make it easy to score new records once the model is built.

This is a demanding set of requirements, but the potential market is large and many vendors have attempted to meet it. Their products fall into three broad groups.

Neural network products attempt to mimic the way the human brain “learns”, by testing different relationships among inputs and outputs, adjusting the results based on errors, and repeating the process until it appears no more improvement is possible. The promise of neural networks is that they can automatically capture complex data relationships that are missed by traditional statistical methods. This should yield more powerful models with less need for a skilled user. Actual performance has been mixed: in some cases the neural nets actually do perform better, but in other situations they can be less reliable than conventional techniques. And it turns out that skilled data preparation is still very much required, even if statistical expertise is not.
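The learn-from-errors cycle described above can be shown in miniature. What follows is a minimal sketch under stated assumptions, not the method used by any product reviewed here: it trains a single logistic unit rather than a full network with hidden layers, and the function names, learning rate and stopping tolerance are all hypothetical choices made for illustration.

```python
import math

def train_single_unit(rows, targets, rate=0.5, max_epochs=2000, tol=1e-6):
    """Fit one logistic unit by repeated error correction.

    Illustrative only: a real neural network has hidden layers, and the
    stopping rule (average loss improves by less than tol) is an assumption.
    """
    n = len(rows)
    n_inputs = len(rows[0])
    weights = [0.0] * n_inputs
    bias = 0.0
    previous_loss = float("inf")
    loss = previous_loss
    for _ in range(max_epochs):
        loss = 0.0
        grad_w = [0.0] * n_inputs
        grad_b = 0.0
        for x, y in zip(rows, targets):
            # Test the current relationship between inputs and output.
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            # Accumulate the adjustment implied by the prediction error.
            error = p - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        # Adjust the weights based on the average error over the file.
        weights = [w - rate * g / n for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / n
        loss /= n
        # Stop when no meaningful improvement appears.
        if previous_loss - loss < tol:
            break
        previous_loss = loss
    return weights, bias, loss

def predict(weights, bias, x):
    """Predicted response probability for one record."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))
```

Calling `train_single_unit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])` returns weights that give low-input records a low predicted probability and high-input records a high one. Even in this toy form the input values had to be supplied as clean numbers on a sensible scale, which is the data preparation burden noted above.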

There are other challenges with neural network systems as well. The technique requires a great deal of calculation, meaning that runs can take hours even on powerful computers. The resulting models are expressed as very complex mathematical formulas, making them hard to understand and evaluate for reasonableness. The complexity of the models also makes it difficult to reproduce them in other systems if scoring is to be done outside of the neural net package itself.

Neural net packages built for marketers must address these issues. ModelMAX (Advanced Software Applications, 412-429-1003) is a PC-based package that provides an extremely simple interface tailored specifically to predictive models for direct response marketing. Introduced in 1993, the system is the best known of the neural net database marketing systems. It includes tools to extract and combine data from multiple sources, to analyze input data to choose which elements to use in a model, and to predict the profitability of mailings to different file segments. The models themselves use a simplified structure that speeds processing–typically two to three hours on a standard PC–although they may miss some subtleties in the data. System reports show the importance of different variables in a completed model and the system can generate “C” program code to score data externally. ASA also produces dbPROFILE, a data analysis and visualization tool that helps with data preparation, and specialized scoring tools for applications including credit card fraud detection. The base system costs about $25,000.

DataBase Mining Marksman (HNC Software, 619-546-8877) adds a powerful supplemental processor to the user’s PC, letting it process a large number of complicated models in three to four hours. The system includes some data preparation and analysis tools but largely relies on its internal processing to select appropriate variables. Reports show the variables used and how their values differ for each file segment, offering some insight into the model’s underlying “logic”. Several other reports illustrate the expected profitability from mailings based on the models. Once the model is built, additional records must be imported to the system to be scored. Marksman is priced at just under $50,000 including a PC workstation with the processor board installed.
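The arithmetic behind a mailing profitability report is simple enough to sketch. The example below is an assumption about how such a report might be calculated, not a description of Marksman or any other product: the function names, the per-order margin and the per-piece cost are all hypothetical inputs.

```python
def expected_profit(pieces, response_rate, margin_per_order, cost_per_piece):
    """Expected profit from mailing one file segment (hypothetical formula)."""
    return pieces * (response_rate * margin_per_order - cost_per_piece)

def break_even_rate(margin_per_order, cost_per_piece):
    """Response rate at which a mailing exactly pays for itself."""
    return cost_per_piece / margin_per_order

def rank_segments(segments, margin_per_order, cost_per_piece):
    """Return (name, profit) for profitable segments, most profitable first.

    Each segment is a (name, pieces, predicted_response_rate) tuple.
    """
    scored = [
        (name, expected_profit(pieces, rate, margin_per_order, cost_per_piece))
        for name, pieces, rate in segments
    ]
    return sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)
```

Given segments with predicted response rates of 3%, 2% and 1%, a $40 margin per order and a $0.50 cost per piece, the break-even rate is 1.25%, so only the first two segments would be selected for mailing.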

dbProphet (Trajecta, 512-326-2411) is aimed at more sophisticated users who want help building neural nets more efficiently. The system has a variety of striking graphical interface tools that let users visualize and transform input data, watch the system build its models, and explore the sensitivity of model results to changes in inputs. It lets users define complex rules for transforming input data and can determine the most profitable promotion strategies given multiple constraints. The system runs on the Unix, Windows NT or DEC VMS operating systems, with separate workstations and servers. It is extremely fast, building a simple model in under ten minutes. Pricing matches the system’s sophistication, starting at $165,000 for a single copy.

The second major group of segmentation systems uses “tree analysis” techniques that repeatedly split a file into smaller and smaller subsegments. CHAID (Chi-Squared Automatic Interaction Detection) is the older and more common technique, originally developed to help statisticians find relationships when preparing conventional regression models. CART (Classification and Regression Trees) is a more recent variant that overcomes some of CHAID’s technical limits. Both methods work by finding the combination of variables that splits a file into groups that are as different as possible and then further splitting the resulting groups. Splitting continues until segments reach a minimum size or no significant differences are found.
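The core step of a chi-squared tree, choosing the variable whose groups differ most on the outcome, can be sketched as follows. This is a simplified illustration under stated assumptions rather than a full CHAID implementation: real CHAID also merges similar categories and compares adjusted significance levels instead of raw statistics, and the function and field names here are hypothetical.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def best_split(records, predictors, target):
    """Return (predictor, statistic) for the strongest single split.

    Simplification: raw statistics are compared directly, which is only
    fair when the predictors have the same number of categories.
    """
    outcomes = sorted({r[target] for r in records})
    best = (None, 0.0)
    for name in predictors:
        categories = sorted({r[name] for r in records})
        # Cross-tabulate predictor categories against outcome values.
        table = [
            [sum(1 for r in records if r[name] == c and r[target] == o)
             for o in outcomes]
            for c in categories
        ]
        stat = chi_squared(table)
        if stat > best[1]:
            best = (name, stat)
    return best
```

A tree builder would call `best_split` on the full file, divide the records by the winning variable, and repeat on each subgroup until a subgroup falls below a minimum size or no split is statistically significant.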

These methods are fast, powerful, easy to use, and provide understandable results. However, they do not produce numeric model “scores” to rank individual records for promotion selection. CHAID can also sometimes make poor “choices” early in the process that harm results later in the tree. CART uses statistical methods that make this somewhat less likely.

KnowledgeSeeker (Angoss Software, 416-593-1122) and SPSS CHAID+ (312-329-3500) are both CHAID-based tools, while CART (Salford Systems, 619-582-7534) uses CART. Each costs under $1,000 for a PC-based version. These systems provide basic data import and preparation capabilities, can develop trees in under a half hour on a reasonable sample of data, and allow the user to view and modify the trees on-screen. KnowledgeSeeker is particularly strong at displaying results graphically, ordering segments based on their predicted response, and providing segment definitions in the form of “pseudo-code” that can be used to generate computer programs.

The third group of software products uses conventional statistical methods but attempts to make them more accessible to non-technical users. ARM (Demographic Research Company, 310-645-7195) was built as a “glass box” that lets the user see and control what is going on within the modeling process. The process is guided by an “expert system” that asks the user about the current project and suggests appropriate techniques at each step. ARM is built for non-technical users, although it is also used to increase the productivity of experienced statisticians. The system includes a wide range of data preparation and analysis routines and offers statistical methods including neural networks, CHAID, linear regression, logistic regression, discriminant function analysis and principal components analysis. It can build both predictive models and classification models. Outputs include a variety of diagnostic reports and some profitability analysis. Once a model is built, new records are scored by importing them into ARM and applying the model. The system runs on DOS, Windows and Unix systems and costs about $30,000 for an annual license and consulting support. (Note: actual pricing as of 4/97 is $25,000 for annual software license with an option for $10,000 in annual on-site consulting.)

Model1 (Group 1 Software, 800-368-5806) was introduced earlier this year as an easy-to-use tool for non-technical modelers. The product provides a variety of data import and transformation tools but offers little in the way of input analysis or diagnostics. Instead, the system relies on its ability to generate models using multiple techniques and combinations of input variables to find the best possible solutions without extensive preprocessing. Model1 can build neural net, CHAID, logistic regression, linear regression and RFM (Recency-Frequency-Monetary) models and will do so automatically against the same data set if the user wants. The time to build the models depends on the technique and the amount of testing the system does for reliability. A single neural network might take 30 minutes to build, but a variety of models with extensive validation might run for several days. As a practical matter, users are expected to run a few model types against a small data sample and then select only the best ones for more extensive exploration. The system will automatically plot the results of the different models on a graph to simplify the comparison. Reports also show the importance of each variable to a model and the values of each variable in different model segments, but do not provide more detailed analysis.
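Of the techniques listed above, RFM is the simplest to illustrate. The sketch below shows one common way to assign RFM cells by ranking customers on each measure; it is an assumption about the general approach, not Model1's actual method, and the field names (`days_since_last`, `orders`, `spend`) are hypothetical.

```python
def rank_scores(values, bins, higher_is_better=True):
    """Score each value from 1 (worst) to bins (best) by rank position.

    Ties are broken by file order, so equal values can land in
    adjacent bins; a production system would handle ties explicitly.
    """
    order = sorted(
        range(len(values)),
        key=lambda i: values[i],
        reverse=not higher_is_better,
    )
    scores = [0] * len(values)
    for position, index in enumerate(order):
        scores[index] = position * bins // len(values) + 1
    return scores

def rfm_cells(customers, bins=5):
    """Map each customer id to a (recency, frequency, monetary) score tuple."""
    # Fewer days since the last purchase is better, so recency is reversed.
    recency = rank_scores(
        [c["days_since_last"] for c in customers], bins, higher_is_better=False
    )
    frequency = rank_scores([c["orders"] for c in customers], bins)
    monetary = rank_scores([c["spend"] for c in customers], bins)
    return {
        c["id"]: (r, f, m)
        for c, r, f, m in zip(customers, recency, frequency, monetary)
    }
```

With the default five bins, a customer scoring (5, 5, 5) bought recently, often and in quantity. Response rates observed for each cell in a past mailing can then serve as the predicted rates for the next one.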

New records can be scored within the system or by creating “C” code modules to export to other systems. The initial release of Model1 provides response modeling with appropriate profitability reports. Additional modules are planned for customer valuation, cross selling and customer segmentation. The system is priced at $30,000 with a single module and $15,000 for each additional module.

* * *

David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.
