1997 Jan 01
Group 1 Software Model1
HNC Software Marksman
David M. Raab
DM News
January, 1997

Statisticians are like tax accountants–useful, but painful and costly to visit. So just as tax preparation software is a popular substitute for paying a live accountant, marketers are always interested in modeling software that lets them do without a professional statistician. The challenge, for both tax and modeling software, is to mimic the judgement as well as the mechanics of a trained professional.

Model1 (Group 1 Software, 800-368-5806; www.g1.com) represents one of the most ambitious attempts so far to automate the model-building process. Instead of trying to choose an appropriate method in advance, the way a statistician might, Model1 executes several methods and then picks whichever gives the best results. The system can build models with RFM (Recency-Frequency-Monetary Value), logistic regression, linear regression, CHAID (Chi-squared Automatic Interaction Detection), and neural networks. To make this approach more effective, Model1 builds multiple versions of each model type, using “genetic algorithms” that draw on previous results to select variables and refine the model structure.
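Group 1 has not published how Model1's genetic algorithm works. The Python sketch below is therefore only a generic illustration of genetic-algorithm variable selection. It assumes each candidate model is described by a True/False mask over the input variables. It also assumes a fitness function scores each mask, for example by measuring lift on held-out records for a model built from those variables. The function names, parameters and toy fitness are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[bool, ...]  # True = variable included in the candidate model


def genetic_variable_selection(
    n_vars: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Evolve variable subsets, keeping the better half of each generation.

    `fitness` is a hypothetical stand-in for "build a model on these
    variables and measure how well it performs on held-out records".
    """
    if n_vars < 2 or pop_size < 4:
        raise ValueError("need at least 2 variables and a population of 4")
    rng = random.Random(seed)
    population: List[Mask] = [
        tuple(rng.random() < 0.5 for _ in range(n_vars)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # survivors carry previous results forward
        children: List[Mask] = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_vars)  # single-point crossover
            child = list(a[:cut] + b[cut:])
            for i in range(n_vars):  # occasional random mutation
                if rng.random() < mutation_rate:
                    child[i] = not child[i]
            children.append(tuple(child))
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)


# Toy fitness: variables 0, 3 and 5 are "useful"; every other variable costs 0.5.
USEFUL = {0, 3, 5}


def toy_fitness(mask: Mask) -> float:
    hits = sum(1 for i, on in enumerate(mask) if on and i in USEFUL)
    extras = sum(1 for i, on in enumerate(mask) if on and i not in USEFUL)
    return hits - 0.5 * extras


best_mask, best_score = genetic_variable_selection(8, toy_fitness)
```

Picking the winning method works the same way one level up. Each candidate (RFM, regression, CHAID, neural network) is scored on the same held-out records, and the highest scorer is kept.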

If that sounds like a lot of processing, it is. Model1 can rip through multiple methods on a small demonstration dataset (2,700 records with 25 variables) in a few minutes. But with a more realistic sample of 30,000 records and several hundred variables, a single neural network model takes 10 minutes on a top-of-the-line 200 MHz Pentium Pro PC and 30 minutes on a more common 90 MHz Pentium. More methods and multiple versions of each model would stretch the time to hours or possibly days. Cross validation increases reliability by building ten separate models, each on a sample that leaves out a different tenth of the data, and then testing each model against the tenth it left out. It takes ten times as long every time it is used. Clearly, Model1 users will not employ the system’s full capabilities for every project.
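Ten-fold cross validation is a standard technique and is not specific to Model1. The minimal Python sketch below shows why it costs ten model builds. It assumes the records are already shuffled. `build_model` and `evaluate` are placeholder callables supplied by the caller, not Model1 functions.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

R = TypeVar("R")  # a record
M = TypeVar("M")  # a fitted model


def k_fold_splits(records: Sequence[R], k: int = 10) -> List[Tuple[List[R], List[R]]]:
    """Return k (training, holdout) pairs; each holdout is a different 1/k of the records.

    Records are dealt out round-robin, so they should be shuffled beforehand.
    """
    folds = [list(records[i::k]) for i in range(k)]
    splits = []
    for i, holdout in enumerate(folds):
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        splits.append((training, holdout))
    return splits


def cross_validate(
    records: Sequence[R],
    build_model: Callable[[List[R]], M],
    evaluate: Callable[[M, List[R]], float],
    k: int = 10,
) -> float:
    """Average holdout score over k model builds (k times the work of one build)."""
    scores = [
        evaluate(build_model(training), holdout)
        for training, holdout in k_fold_splits(records, k)
    ]
    return sum(scores) / len(scores)
```

The article's figure is 10 minutes per neural network model on a 200 MHz Pentium Pro. At that rate, ten folds come to roughly 100 minutes for a single neural network configuration, before any other methods or versions are tried.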

The system does let users control which methods are used and how intensely they are pursued. Users can set the number of records in the test sample, the variables included, and detailed parameters such as the number of iterations used to train a neural network. They can also control the sequence in which models are built, allowing them to run the most promising models first and cancel later methods if these do not seem productive. Users who do not want to invest the time for cross validation can still test against a set of records excluded from the model-building process.

There is much more to Model1 than multiple models. The system includes a well-designed data import capability that can read ASCII files or use a third-party data loader to access relational database tables. Data is loaded into a large spreadsheet with a theoretical maximum of 16 million rows and 16,350 columns. (The practical maximum is determined by the user’s hardware.) Users can identify specific data types, such as telephone numbers or time zones, that are automatically treated in special ways during model generation. In addition, users can define macros that create calculated values, assign records to categories, handle missing values and perform other transformations. Model1 provides some basic statistics and plots of the input data, but otherwise relies on users to find, on their own, the relationships that call for complex derived variables. The quality of such data preparation can have a substantial impact on the power of the final models.
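Model1's macro language is not described here. As a rough Python illustration only, transformation macros can be modeled as small functions that each take a record and return a modified copy. The helpers below are hypothetical and cover three of the jobs the article lists: handling missing values, creating calculated values and assigning categories.

```python
from bisect import bisect_left
from typing import Any, Callable, Dict, Iterable, List, Sequence

Record = Dict[str, Any]
Macro = Callable[[Record], Record]


def fill_missing(field: str, default: Any) -> Macro:
    """Replace a missing (None) value with a default."""
    def macro(rec: Record) -> Record:
        out = dict(rec)
        if out.get(field) is None:
            out[field] = default
        return out
    return macro


def ratio(new_field: str, numerator: str, denominator: str) -> Macro:
    """Create a calculated value; None when either input is missing or the denominator is zero."""
    def macro(rec: Record) -> Record:
        out = dict(rec)
        num, den = out.get(numerator), out.get(denominator)
        out[new_field] = num / den if num is not None and den else None
        return out
    return macro


def categorize(new_field: str, field: str, upper_bounds: Sequence[float],
               labels: Sequence[str]) -> Macro:
    """Assign a category; `labels` has one more entry than `upper_bounds` (bounds inclusive)."""
    def macro(rec: Record) -> Record:
        out = dict(rec)
        value = out.get(field)
        out[new_field] = "missing" if value is None else labels[bisect_left(upper_bounds, value)]
        return out
    return macro


def apply_macros(records: Iterable[Record], macros: Sequence[Macro]) -> List[Record]:
    """Run every macro, in order, against every record."""
    result = []
    for rec in records:
        for macro in macros:
            rec = macro(rec)
        result.append(rec)
    return result


MACROS = [
    fill_missing("months_since_last", 24),
    ratio("avg_order", "sales", "orders"),
    categorize("recency_band", "months_since_last", [6, 12], ["0-6", "7-12", "13+"]),
]
prepared = apply_macros(
    [{"sales": 300.0, "orders": 3, "months_since_last": None},
     {"sales": 50.0, "orders": 0, "months_since_last": 5}],
    MACROS,
)
```

The macros are kept as an ordered list. That way the same preparation can be repeated, step for step, on any new file that has to be scored.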

Model1 does simplify matters by providing different modules to handle specific types of problems. The initial module, scheduled for release in mid-January 1997, builds “response” models with a yes/no result. Later modules, due before June, will be tailored for customer valuation, cross selling, and segmentation. Each module will have inputs and reports tailored to its functions. The response module has a campaign optimizer. It lets the user enter key variables (cost per contact, profit per response, maximum number of contacts, maximum budget and minimum number of responses). It then produces reports showing the expected results for different mailing sizes, along with the mailing size that maximizes profit and the size at which the campaign breaks even. Other reports compare the results of different models in a “gains chart” and show how individual variables relate to a selected model. The system lacks more advanced model analysis tools, such as the ability to display the “tree” generated by a CHAID model or to directly examine how a model score changes when a single value in a record is altered.
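The arithmetic behind a response-model campaign optimizer can be sketched in Python. This is an illustration, not Group 1's code. It assumes each prospect's model score is an expected response probability and that names are mailed in descending score order. The function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, float, float]  # (mail size, expected responses, expected profit)


def campaign_curve(scores: Sequence[float], cost_per_contact: float,
                   profit_per_response: float) -> List[Row]:
    """Expected results for every mailing size, mailing the best-scored names first."""
    ranked = sorted(scores, reverse=True)
    return [
        (n, responses, responses * profit_per_response - n * cost_per_contact)
        for n, responses in enumerate(accumulate(ranked), start=1)
    ]


def optimize_mailing(
    scores: Sequence[float],
    cost_per_contact: float,
    profit_per_response: float,
    max_contacts: Optional[int] = None,
    max_budget: Optional[float] = None,
    min_responses: float = 0.0,
) -> Tuple[Optional[Row], Optional[Row]]:
    """Return (most profitable feasible mailing, deepest break-even mailing)."""
    rows = campaign_curve(scores, cost_per_contact, profit_per_response)
    feasible = [
        row for row in rows
        if (max_contacts is None or row[0] <= max_contacts)
        and (max_budget is None or row[0] * cost_per_contact <= max_budget)
        and row[1] >= min_responses
    ]
    best = max(feasible, key=lambda row: row[2], default=None)
    # Each extra name adds no more profit than the one before it, so profit can
    # only rise and then fall; break-even is the largest mailing with profit >= 0.
    break_even = max((row for row in rows if row[2] >= 0),
                     key=lambda row: row[0], default=None)
    return best, break_even
```

The gains chart plots the same cumulative-responses column, divided by total expected responses, against mailing depth.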

Once a model is selected, it can be implemented either by importing a new data set and scoring it within Model1 or by creating a run-time module that works outside the system. Model1 automatically reruns the data transformation macros that were applied to the original data set.
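One way to picture a run-time module is a stand-alone scorer that carries its own transformation steps. The Python sketch below assumes, purely for illustration, that the chosen model is a logistic regression and that the transformations can be written down as data (JSON). The spec layout and the op names `fill_missing` and `log1p` are hypothetical and do not represent Model1's file format.

```python
import json
import math
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def export_scoring_spec(transforms: List[Dict[str, Any]], intercept: float,
                        coefficients: Dict[str, float]) -> str:
    """Bundle the transformation steps and a logistic model into portable JSON."""
    return json.dumps({"transforms": transforms, "intercept": intercept,
                       "coefficients": coefficients})


def score_records(spec_json: str, records: Iterable[Record]) -> List[float]:
    """Stand-alone scorer: replay the saved transformations, then apply the model."""
    spec = json.loads(spec_json)
    scores = []
    for rec in records:
        rec = dict(rec)
        for step in spec["transforms"]:  # same order as during model building
            field = step["field"]
            if step["op"] == "fill_missing":
                if rec.get(field) is None:
                    rec[field] = step["value"]
            elif step["op"] == "log1p":
                rec[field] = math.log1p(rec[field])
            else:
                raise ValueError(f"unknown transformation: {step['op']}")
        z = spec["intercept"] + sum(w * rec[name] for name, w in spec["coefficients"].items())
        scores.append(1.0 / (1.0 + math.exp(-z)))  # logistic response probability
    return scores


spec = export_scoring_spec(
    transforms=[{"op": "fill_missing", "field": "orders", "value": 0.0},
                {"op": "log1p", "field": "orders"}],
    intercept=-1.0,
    coefficients={"orders": 1.0},
)
```

Storing the steps as data rather than code keeps the scorer independent of the modeling tool. Only the small interpreter loop has to be shipped.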

Model1 was developed for Group 1 Software by Unica Solutions Inc., a statistical software and consulting firm which uses similar but less automated technology in its own Pattern Recognition Workbench. The Model1 product is priced at $30,000 for a single module plus $15,000 for additional modules, with a $5,000 introductory discount offered through June 1997. Group 1 is offering special arrangements for service bureaus. Prices include one day of training per module with additional training available. The system runs on a standard PC using any version of Microsoft Windows.

Marksman (HNC Software, 619-546-8877) takes a focused approach to automated modeling, limiting itself to neural networks. It includes correlation and cross tab tools to help users identify relationships among variables and has automated capabilities to combine and select variables for the model. The system builds several models simultaneously–16 is the default–using different model structures. After each set, it attempts to reduce the number of variables and builds the models again. Using specialized hardware to add processing power to a standard PC, Marksman can complete this process in three to four hours on 50,000 records. Reports show the importance of different variables in the model, the expected lift at different mail volumes, and expected results based on mail quantities set to meet a fixed budget, target number of responses or maximum profit.
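Marksman's report calculations are not documented here, but the quantities it reports are standard. The minimal Python sketch below computes lift at chosen mail volumes and the mail quantity that fits a fixed budget or reaches a target number of responses. It assumes each score is an expected response probability, and the function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Optional, Sequence


def lift_at_depths(scores: Sequence[float], fractions: Sequence[float]) -> List[float]:
    """Lift = response rate in the top fraction of the file / response rate of the whole file."""
    ranked = sorted(scores, reverse=True)
    overall_rate = sum(ranked) / len(ranked)
    lifts = []
    for f in fractions:
        n = max(1, round(f * len(ranked)))  # mail at least one name
        lifts.append((sum(ranked[:n]) / n) / overall_rate)
    return lifts


def quantity_for_budget(budget: float, cost_per_contact: float, file_size: int) -> int:
    """Largest mailing the budget pays for, capped at the size of the file."""
    return min(file_size, int(budget // cost_per_contact))


def quantity_for_target(scores: Sequence[float], target_responses: float) -> Optional[int]:
    """Smallest best-first mailing expected to reach the response target, or None."""
    for n, responses in enumerate(accumulate(sorted(scores, reverse=True)), start=1):
        if responses >= target_responses:
            return n
    return None
```

The maximum-profit quantity, the third option the article mentions, comes from the same best-first cumulative response curve. It is the depth at which expected profit peaks.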

Marksman is priced at $48,750 including the PC workstation and specialized hardware. It was introduced in 1996 and has sold about 20 copies.

* * *

David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.
