1998 Feb 01
SAS Institute Enterprise Miner
Urban Science GainSmarts
CrossZ Software CrossZ Voyager

David M. Raab
Relationship Marketing Report
February, 1998

Modeling is like cooking: some people enjoy doing it, but most of us would rather just consume the result. Unfortunately, database marketing has hugely increased marketers' appetite for models, and an unlimited supply of professionally built models is about as affordable as eating at a five-star restaurant every night. So many marketers must rely on the statistical equivalent of home cooking, with predictably uneven outcomes.

Given the situation, it's no surprise that much of today's statistical modeling software resembles nothing so much as a cookbook. In particular, these systems lay out a step-by-step modeling process and then lead the user through it. But, like all but the most basic cookbooks, this software is best used by someone with a reasonable level of prior experience. True neophytes can still create a disaster.

GainSmarts (Urban Science, 313-259-9900, www.urbanscience.com) is designed specifically for direct response modeling. The system includes modules for sample creation, data preprocessing, profiling, predictive modeling, and selection. In each module, it attempts to mimic the processes expert statisticians use when they build a model manually. But it does not attempt to create a fully automated process; instead, it requires users to examine the output of each step and decide which of the system's recommendations to accept. This is a conscious choice by GainSmarts' developers, who wanted to ensure that a knowledgeable user stayed involved in the project. Ultimately, this makes GainSmarts a tool to enhance the productivity of statisticians rather than replace them. In cooking terms, it resembles a food processor.

The actual models built by GainSmarts use a variety of regression techniques that have proven especially effective in direct marketing applications. It uses a rule-driven “expert system” for the critical task of choosing which variables to include in the model. The system gained considerable notice last year when its methods took first place in a prestigious competition sponsored by the American Association for Artificial Intelligence. Results from this competition are available on the Web at www.epsilon.com/kddcup.

GainSmarts also offers CHAID and several other types of tree-based analysis. These can be used to identify distinct file segments, a task for which regression is generally inappropriate.

The selection portion of the system can either produce scored records or generate SAS code to calculate scores externally. In addition to scores, the system can optimize profits based on promotion costs, order values, a maximum mail quantity and limited product inventory. It can also choose the best of several offers to make to each individual. The system produces extensive reports on its data analysis and projected promotion results.
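The article does not say how GainSmarts performs this optimization. As a rough illustration only, the sketch below assumes each prospect has a response probability and an expected order value for each candidate offer, and applies a simple greedy heuristic (not an exact optimizer, and not Urban Science's method): rank every prospect-offer pair by expected profit, mail each prospect at most one offer, stop at the mail quantity cap, and reserve each mailing's expected orders against that offer's inventory. The `Prospect` class, `select_mailing` function and sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    customer_id: str
    # Hypothetical inputs: offer name -> (response probability, expected order value)
    offers: dict[str, tuple[float, float]]


def select_mailing(prospects, cost_per_piece, max_quantity, inventory):
    """Greedy heuristic: give each prospect at most one offer, most profitable first.

    inventory maps offer name -> units in stock; each mailing reserves its
    expected orders (the response probability) against that offer's stock.
    Returns {customer_id: (offer name, expected profit)}.
    """
    candidates = []
    for p in prospects:
        for name, (prob, value) in p.offers.items():
            profit = prob * value - cost_per_piece  # expected profit of one piece
            if profit > 0:  # never mail at an expected loss
                candidates.append((profit, p.customer_id, name, prob))
    candidates.sort(reverse=True)  # highest expected profit first

    remaining = dict(inventory)
    chosen = {}
    for profit, cid, name, prob in candidates:
        if len(chosen) >= max_quantity:
            break  # mail quantity cap reached
        if cid in chosen or remaining.get(name, 0.0) < prob:
            continue  # already mailed, or not enough stock for this offer
        chosen[cid] = (name, profit)
        remaining[name] -= prob
    return chosen


prospects = [
    Prospect("A", {"catalog": (0.10, 80.0), "sale": (0.05, 200.0)}),
    Prospect("B", {"catalog": (0.02, 80.0)}),
    Prospect("C", {"sale": (0.08, 200.0)}),
]
plan = select_mailing(prospects, 1.0, 2, {"catalog": 100, "sale": 100})
tight = select_mailing(prospects, 1.0, 3, {"catalog": 100, "sale": 0.1})
```

Because offers are considered in profit order, a prospect whose best offer has run short of stock falls back to the next most profitable one, as prospect A does in the `tight` call. A true optimizer (for example, a linear program) could do better when the constraints interact.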

GainSmarts is based on a combination of SAS and proprietary programs. It accepts either flat ASCII files or SAS data sets as input. The system runs on Windows NT or Sun Solaris workstations. It is priced at $45,000, with discounts for multiple users. GainSmarts has been used internally by Urban Science, a large statistical consulting firm, since 1996. The system was released as a product in October 1997, and four external licenses have been sold so far.

Enterprise Miner (SAS Institute, 919-677-8000, www.sas.com) follows SAS’s own sequence of modeling steps, called SEMMA (for Sample, Explore, Modify, Model, and Assess). Enterprise Miner is also designed to help experienced statisticians become more productive. In particular, the system lets the user create a graphical flow chart of the SEMMA process, defining the specific actions to be performed at each step. This flow chart is built by dragging icons from a tool palette, making it easy to understand and modify the approach that will be taken to each project.

Enterprise Miner includes several procedures that automate traditional analysis tasks, such as choosing the variables to include in a model and applying the appropriate transformations. Users can control the details behind these procedures and review and override their results. The system also provides extensive visualization to help users explore the data and decide on additional manipulations that the system itself does not recommend. While many semi-automated products will select and transform single data variables, Enterprise Miner will also use a CHAID-like tree analysis to create dummy variables that incorporate interactions among multiple variables. Identifying such interactions is often critical to successful modeling, and the inability to find them has been one of the main drawbacks of semi-automated modeling products.
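To see why a tree can expose an interaction, consider the following sketch. It is a simplified stand-in for a CHAID-like procedure, not Enterprise Miner's implementation: it grows a depth-two tree using binary "value versus rest" splits chosen by a 2x2 chi-square statistic, then turns each leaf into a 0/1 dummy variable. Real CHAID also merges predictor categories and applies Bonferroni-adjusted significance tests, which this omits. The function names and the toy `region`/`channel` data are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]]; 0 if a margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def best_split(records, variables, target):
    """Find the 'variable == value' versus rest split with the largest chi-square."""
    best = None
    for var in variables:
        for value in sorted({r[var] for r in records}, key=str):
            inside = [r[target] for r in records if r[var] == value]
            outside = [r[target] for r in records if r[var] != value]
            a, c = sum(inside), sum(outside)  # responders in each group
            score = chi_square_2x2(a, len(inside) - a, c, len(outside) - c)
            if best is None or score > best[2]:
                best = (var, value, score)
    return best


def tree_leaves(records, variables, target="resp", depth=2, path=()):
    """Grow a small binary tree; each leaf is a tuple of (variable, value, is_equal) tests."""
    split = best_split(records, variables, target) if depth > 0 else None
    if split is None or split[2] == 0.0:
        return [path]  # no informative split: this node is a leaf
    var, value, _ = split
    inside = [r for r in records if r[var] == value]
    outside = [r for r in records if r[var] != value]
    return (tree_leaves(inside, variables, target, depth - 1, path + ((var, value, True),))
            + tree_leaves(outside, variables, target, depth - 1, path + ((var, value, False),)))


def leaf_dummy(record, leaf):
    """1 if the record passes every test on the path to this leaf, else 0."""
    return int(all((record[var] == value) == is_equal for var, value, is_equal in leaf))


def make(region, channel, n, responders):
    return [{"region": region, "channel": channel, "resp": i < responders} for i in range(n)]


records = make("N", "web", 4, 3) + make("N", "mail", 4, 1) + make("S", "web", 4, 0) + make("S", "mail", 4, 0)
leaves = tree_leaves(records, ["region", "channel"])
```

In the toy data, responders are concentrated among northern web buyers, so the second leaf (region N and channel not mail, that is, web) becomes a single dummy a regression could use directly, even though neither variable alone captures the effect.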

In addition to tree-based models, Enterprise Miner offers several types of regression and neural network modeling. The system can run multiple models within the same project and then compare the results against each other. These comparisons are based on promotion return on investment as well as traditional statistical measures. Scoring can be done within the system or by extracting SAS code modules that can run on any system with base SAS installed.
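A gains table is one common way to compare scored files on both statistical and financial terms. The sketch below ranks records by each model's score, then reports, at each depth of file, the share of all responders captured and the promotion ROI. It assumes a flat revenue per response and a flat cost per piece, and defines ROI as (revenue minus cost) divided by cost; these are simplifying assumptions, and the function name and sample data are hypothetical rather than taken from Enterprise Miner.

```python
def gains_table(scores, responses, revenue_per_response, cost_per_piece, bins=10):
    """Cumulative responders captured and ROI at each depth of file, best scores first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [responses[i] for i in order]  # 1 = responder, 0 = non-responder
    total = sum(ranked)
    rows = []
    for k in range(1, bins + 1):
        mailed = round(len(ranked) * k / bins)  # records mailed at this depth
        hits = sum(ranked[:mailed])
        revenue = hits * revenue_per_response
        cost = mailed * cost_per_piece
        rows.append({
            "depth_pct": 100 * k // bins,
            "mailed": mailed,
            "captured_pct": 100.0 * hits / total if total else 0.0,
            "roi": (revenue - cost) / cost if cost else 0.0,
        })
    return rows


responses = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
model_a = [0.9, 0.2, 0.8, 0.3, 0.1, 0.25, 0.7, 0.15, 0.05, 0.35]
model_b = [0.1, 0.9, 0.2, 0.8, 0.7, 0.6, 0.3, 0.5, 0.4, 0.05]
gains_a = gains_table(model_a, responses, 50.0, 10.0, bins=5)
gains_b = gains_table(model_b, responses, 50.0, 10.0, bins=5)
```

At the 20 percent depth, model A has already captured two of the three responders and shows an ROI of 4.0 (400 percent), while model B shows -1.0; mailed to the whole file, both models necessarily produce the same result.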

Enterprise Miner has been in beta testing at about 30 sites since December 1997 and is scheduled for formal release in March of this year. The system runs on Windows 95 and NT clients and Windows NT and Unix servers. Pricing will start at $80,000 for an NT server and five workstations and will be higher for more powerful servers. It can read data from most standard sources via other SAS products and stores its own data in a proprietary format.

CrossZ Voyager (CrossZ Software, 888-848-3883, www.crossz.com) is primarily a data exploration tool for non-statisticians, but it also incorporates modeling capabilities. It is built to exploit CrossZ’s “QueryObject” technology, which provides fast access to highly compressed multi-dimensional data sets. In particular, it is designed to automatically determine which data elements, from a large warehouse or other source system, should be built into a QueryObject to explore a specific business problem.

The system attempts to make this process as automated as possible. It will read data from dBASE, ASCII, SPSS or any ODBC source and then examine each data element to determine its type. It flags elements that it has trouble classifying and lets the user override its choices. The user can perform a limited number of transformations, although Voyager will not recommend or execute any transformations on its own.
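Voyager's classification rules are not described, so the following is only a sketch of the general idea under stated assumptions: values arrive as strings, dates are in ISO `YYYY-MM-DD` form, and a column counts as numeric or date only when at least 95 percent of its non-blank values parse that way. Columns that only partly parse are flagged for the user to review, mirroring the override step described above. `infer_type` and its threshold are hypothetical.

```python
from datetime import datetime


def _parses_as_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def _parses_as_date(value):
    try:
        datetime.strptime(value, "%Y-%m-%d")  # assumption: ISO dates only
        return True
    except ValueError:
        return False


def infer_type(values, threshold=0.95):
    """Guess a column's type from sample string values.

    Returns (type, flagged); flagged=True means the guess is uncertain and
    should be confirmed or overridden by the user.
    """
    non_blank = [v.strip() for v in values if v.strip()]
    if not non_blank:
        return "unknown", True  # nothing to go on
    shares = {
        "numeric": sum(map(_parses_as_number, non_blank)) / len(non_blank),
        "date": sum(map(_parses_as_date, non_blank)) / len(non_blank),
    }
    best, share = max(shares.items(), key=lambda kv: kv[1])
    if share >= threshold:
        return best, False  # clear-cut column
    if share > 0:
        # Mixed content: offer the majority guess (or text), but ask the user.
        return (best if share >= 0.5 else "text"), True
    return "text", False  # nothing parses: treat as free text or categories
```

For example, a column holding "10", "12", "n/a" and "14" comes back as a flagged numeric guess, since only three of its four values parse as numbers.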

Once the data is loaded, the user is asked to identify a specific element as a "goal". The system then runs statistical routines to determine which other data elements are the best predictors of the goal. Optionally, it will also create a regression model or tree-based model using the selected elements. Although the system provides gains charts and other statistics to assess the power and profitability of these models, it does not allow users to export the actual scored records or the model algorithms. This reflects the product's primary focus on data exploration.
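The article does not identify Voyager's statistical routines. As one illustrative measure (a swapped-in technique, not necessarily what Voyager uses), the sketch ranks categorical elements by their mutual information with the goal: zero means the element says nothing about the goal, and larger values mean it is more informative. The field names and records are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum of p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def rank_predictors(rows, goal):
    """Score every non-goal field against the goal and sort, most informative first."""
    ys = [r[goal] for r in rows]
    fields = [k for k in rows[0] if k != goal]
    scores = {k: mutual_information([r[k] for r in rows], ys) for k in fields}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


rows = [
    {"prior_buyer": "Y", "state": "NY", "bought": 1},
    {"prior_buyer": "Y", "state": "CA", "bought": 1},
    {"prior_buyer": "N", "state": "NY", "bought": 0},
    {"prior_buyer": "N", "state": "CA", "bought": 0},
]
ranking = rank_predictors(rows, "bought")
```

Mutual information scores each element on its own, so, like any single-variable screen, it can miss elements that matter only in combination.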

The final step in the process is to create a QueryObject incorporating the selected data elements. This can then be analyzed using CrossZ's multi-dimensional data viewer or accessed through an ODBC connection by third-party query and reporting tools. Because QueryObjects are highly compressed (often under 100 megabytes, even when built from a much larger input source), they can be loaded onto a laptop or made available across a network for easy access throughout an organization.

Voyager was launched in December 1997 and is priced at $1,000. It runs on Windows 95 and Windows NT systems. As of late January, the vendor had sold about a half-dozen copies.

* * *

David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.
