1996 Jun 01
Trajecta dbProphet
David M. Raab
DM News
June, 1996

It has been about three years since direct marketers first became widely aware of neural networks as an alternative to traditional statistical techniques for building predictive models. Like conventional techniques such as logistic regression, neural nets create equations that use known information about a record to calculate the relative likelihood of that record having another, unknown characteristic–such as responding to a direct mail offer. These equations, or “models”, are developed by working with records for which the correct “answer” is already known, such as a list of people who received a previous mailing, marked to show which of them responded. The promise of neural networks is that they can build more accurate models because they capture subtle relationships among data elements that conventional statistical techniques will miss. A second promise is that these models can be built much faster and without the help of a trained statistician.
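To make the idea of a scoring equation concrete, here is a minimal sketch of how a logistic regression model turns known fields into a relative likelihood of response. The field names and coefficients are invented for illustration; a real model would estimate them from records whose response is already known.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
INTERCEPT = -3.0
WEIGHTS = {"prior_orders": 0.4, "months_since_last_order": -0.1}

def response_score(record):
    """Return a relative likelihood of response, between 0 and 1."""
    z = INTERCEPT + sum(WEIGHTS[name] * record[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function

recent_buyer = {"prior_orders": 5, "months_since_last_order": 2}
lapsed_buyer = {"prior_orders": 1, "months_since_last_order": 24}

print(response_score(recent_buyer), response_score(lapsed_buyer))
```

With these made-up weights the recent buyer scores higher than the lapsed one, which is all a mailer needs to rank a list. A neural net produces a score in the same way, only from a more complicated equation.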

Neither promise has been fully kept. Comparative tests generally show that, with occasional deviations in both directions, neural nets and carefully built conventional models perform about the same. And, although a neural net can definitely be built faster than a statistical model–in hours or days rather than weeks–the bulk of the work still lies with data preparation, which doesn’t change much regardless of the modeling technique. Development time is reduced largely because the marketers or business analysts do the work themselves rather than handing it off to someone else.

Experience has also uncovered some new difficulties. Neural nets have a tendency to “overtrain”, meaning they treat random variations in the training data as significant. A model with this problem will perform considerably worse than expected on fresh data. Overtraining can happen with conventional techniques as well; in both cases it takes a skilled modeler to avoid it. When marketers build a neural net themselves, a person with the right skills may not be present.
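The effect is easy to reproduce on synthetic data. The sketch below is not a neural net: it uses a deliberately extreme stand-in, a model that memorizes its training records and copies the answer of the closest one. The data, the response rates and the function names are all assumptions made for the example.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def make_records(n):
    """Synthetic records: one input x and a noisy yes/no response."""
    records = []
    for _ in range(n):
        x = random.random()
        responded = random.random() < 0.3 + 0.4 * x  # weak signal, much noise
        records.append((x, responded))
    return records

def memorizing_model(training):
    """Predict by copying the answer of the closest training record."""
    def predict(x):
        nearest = min(training, key=lambda rec: abs(rec[0] - x))
        return nearest[1]
    return predict

def accuracy(predict, records):
    hits = sum(predict(x) == responded for x, responded in records)
    return hits / len(records)

training = make_records(500)
fresh = make_records(500)
model = memorizing_model(training)
train_accuracy = accuracy(model, training)
fresh_accuracy = accuracy(model, fresh)
print(train_accuracy, fresh_accuracy)
```

The memorizer is perfect on the records it was built from and much weaker on fresh ones, because most of what it learned was noise. Comparing the two figures is the basic check for overtraining, whatever the modeling technique.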

A second problem is that neural net models can be hard to understand. This is partly because they are more complex than conventional models and partly because they are generated automatically instead of being built incrementally by a statistician. Understanding a model helps to find errors and can provide valuable business insights.

Neural net vendors are well aware of these problems and have produced data preparation and model analysis tools to address them. Armed with these tools and a healthy attention to detail, non-statisticians can indeed use neural nets to create sophisticated, accurate models at a fraction of the time and cost of conventional techniques.

dbProphet (Trajecta, Inc., 800-250-2242; 512-326-2411) is carefully tailored to exploit the strengths and minimize the weaknesses of neural network modeling. It relies heavily on visualization to simplify data preparation and to help users understand the resulting model.

dbProphet works with data stored in a flat file–that is, all data for each “case” is in a single record. Currently the data must be converted to flat ASCII format before it can be imported. Trajecta is working to import data directly from multiple relational tables.

Once the data has been loaded, dbProphet provides several tools to help understand and prepare it for modeling. Exploratory tools include histograms, frequency distributions, correlation plots, and principal components charts. Transformations can include calculated fields, treatments for missing values, and processes to exclude out-of-range or invalid entries. Some transformations can be accomplished visually, such as excluding values by highlighting a range on a graph. There are special functions to handle time-series data.
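Two of these transformations, filling in missing values and excluding out-of-range entries, can be sketched in a few lines. This is a generic illustration rather than dbProphet's own logic; the function name, the choice of the mean as the fill value and the sample field are assumptions.

```python
def prepare_field(records, field, low, high):
    """Fill missing values with the mean of valid ones; drop out-of-range records."""
    valid = [r[field] for r in records
             if r[field] is not None and low <= r[field] <= high]
    fill = sum(valid) / len(valid)  # assumes at least one valid value
    cleaned = []
    for r in records:
        value = r[field]
        if value is None:
            cleaned.append({**r, field: fill})  # treat the missing value
        elif low <= value <= high:
            cleaned.append(dict(r))             # keep the valid record
        # records outside [low, high] are excluded
    return cleaned

records = [{"age": 30}, {"age": None}, {"age": 50}, {"age": 999}]
cleaned = prepare_field(records, "age", 18, 100)
print(cleaned)
```

Here the record with no age receives the average of the valid ages, and the record with an age of 999 is dropped.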

Before the system builds a model, the user specifies the data elements to use as inputs and the values to predict. The user also determines how records will be split among the training, testing and validation sets, which are used respectively to build the model, test the model for errors, and report on the final model performance. A typical analysis uses about 40,000 training cases, although the software could actually handle millions of records given adequate hardware.
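A three-way split of this kind might look like the following sketch. The 60/20/20 proportions and the function name are assumptions for the example; the article does not say what proportions dbProphet uses or recommends.

```python
import random

def split_records(records, train_pct=60, test_pct=20, seed=0):
    """Shuffle the records, then cut them into three non-overlapping sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # avoid any ordering in the file
    n_train = len(shuffled) * train_pct // 100
    n_test = len(shuffled) * test_pct // 100
    training = shuffled[:n_train]                  # used to build the model
    testing = shuffled[n_train:n_train + n_test]   # used to check it for errors
    validation = shuffled[n_train + n_test:]       # used for the final report
    return training, testing, validation

training, testing, validation = split_records(range(100))
print(len(training), len(testing), len(validation))
```

Shuffling first matters because mailing files are often sorted, for example by ZIP code or by date, and an unshuffled split would give each set a different mix of customers.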

Like any neural network, dbProphet works by starting with an arbitrary model and then modifying it repeatedly, retaining changes that yield more accurate results. On a Pentium PC with 32 MB of RAM, the system can run through 30,000 cases with 16 input variables in about 15 seconds. A typical run might repeat the process 25 times and take about six minutes. One striking feature is a horizontally scrolling line chart that compares the predicted and actual values as each case is run through the test model. This lets the user watch the system improve its accuracy as it trains itself.
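Read literally, that description is a trial-and-error loop: change the model, measure the error, keep the change if the error falls. The sketch below applies it to a simple weighted-sum model on made-up data. It is an illustration of the idea only; neural network software generally adjusts weights with more directed methods such as backpropagation, and nothing here is taken from dbProphet.

```python
import random

def mean_squared_error(weights, cases):
    total = 0.0
    for inputs, actual in cases:
        predicted = sum(w * x for w, x in zip(weights, inputs))
        total += (predicted - actual) ** 2
    return total / len(cases)

def train(cases, n_inputs, passes=25, step=0.1, seed=0):
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(n_inputs)]  # arbitrary start
    best_error = mean_squared_error(weights, cases)
    history = [best_error]
    for _ in range(passes):
        candidate = [w + rng.gauss(0, step) for w in weights]
        error = mean_squared_error(candidate, cases)
        if error < best_error:  # retain only changes that improve accuracy
            weights, best_error = candidate, error
        history.append(best_error)
    return weights, history

# Made-up cases whose true answer is 2 * x1 - x2.
cases = [((x1, x2), 2.0 * x1 - x2)
         for x1 in range(-2, 3) for x2 in range(-2, 3)]
weights, history = train(cases, n_inputs=2)
print(history[0], history[-1])
```

The `history` list plays the role of the scrolling chart: the error after each pass never rises, because a change that makes the model worse is discarded.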

Default settings allow the system to build a model without much direction from the user, although neural net experts can control the details if they wish. dbProphet also includes patented algorithms to deal with correlated variables, which Trajecta says allow it to perform better than other neural network systems.

dbProphet provides a range of analytical tools to understand and exploit the final model. Again, these are highly visual, including graphs of estimated vs. actual results, the impact of individual variables on projected results, the expected output of a set of user-defined inputs, and the inputs needed to produce a user-specified output. The last two analyses are done by moving on-screen “sliders” representing the value of one variable, and watching how sliders representing other variables change as a result. The system has extremely powerful optimization features, capable of associating non-linear values and costs with input and output variables and of determining the most profitable business strategy under a variety of constraints.
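Finding the inputs needed for a given output amounts to running the model in reverse. For a model with a single input whose output rises steadily with that input, a bisection search is enough, as in this sketch. The one-input restriction, the function names and the toy model are assumptions; the article does not describe how dbProphet performs this search.

```python
def input_for_target(model, target, low, high, tolerance=1e-6):
    """Bisection search; assumes model(x) increases as x goes from low to high."""
    while high - low > tolerance:
        mid = (low + high) / 2
        if model(mid) < target:
            low = mid   # output too small: the answer lies above mid
        else:
            high = mid  # output large enough: the answer lies at or below mid
    return (low + high) / 2

def toy_model(x):
    """Hypothetical stand-in for a trained model."""
    return 2 * x + 1

needed_input = input_for_target(toy_model, target=7, low=0, high=10)
print(needed_input)
```

With several inputs and business constraints the problem becomes a genuine optimization, which is where the features described above come in.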

Like other neural net systems, dbProphet does not produce explicit if/then rules or cell definitions. Models can be applied either by loading additional data into dbProphet and appending scores or by exporting “C” programming code generated by the system.

dbProphet runs on Unix, DEC VMS and Windows NT workstations and servers. The system was introduced in December 1995 and has several sales pending. With a price of $165,000 per copy, it is aimed at large direct marketers and retailers. Trajecta also offers modeling services beginning at $20,000 for a straightforward response model.

* * *

David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.
