1995 Jun 01

Miglautsch Marketing LOOP MD
Newspaper Marketing Solutions Inc. Pro*Filer
David M. Raab
DM News
June, 1995

One of the most common reasons to build a marketing database is better file segmentation. So it may seem surprising that few marketing database systems directly integrate advanced segmentation tools with their other functions. Instead, most products require users to export data to external statistical or neural network systems, and then to somehow reimport the results.

There are several reasons for this paradox. People already doing advanced segmentation are reluctant to abandon their current tools for a new product. People not already doing advanced segmentation often have other, more pressing objectives for their marketing database system. And software developers are reluctant to build functions that are already available in third-party packages. Indeed, the decision to leave out integrated segmentation is probably a sound one: most marketing software developers who do offer integrated segmentation tools report that only a small fraction of their clients actually purchase those options.

Still, some users would employ a segmentation tool if it were made easy enough–that is, if they didn’t have to learn a complicated segmentation technique and then manage the messy details of file extracts, conversions and imports. The key to building such a system lies in finding the right statistical technique: one that is powerful enough to be useful, yet efficient to run, easy to understand and simple to automate. Regression methods are too hard to learn, while neural network software is hard to create and takes a lot of computing power. Perhaps the best choice is Chi-Squared Automated Interaction Detection (CHAID), a technique that repeatedly splits a file into significantly different segments.
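
To make the idea concrete, here is a minimal sketch of the chi-squared test that CHAID applies at each split. It is a simplification, not the algorithm as shipped in any product named here: real CHAID also merges similar categories and compares adjusted p-values, whereas this sketch simply picks the predictor with the largest statistic, which is only a fair comparison when the predictors have the same number of categories. The function names and the sample data are hypothetical.

```python
from collections import Counter

def chi_squared(rows):
    """Pearson chi-squared statistic for a list of (category, responded) pairs."""
    n = len(rows)
    cells = Counter(rows)
    category_totals = Counter(category for category, _ in rows)
    outcome_totals = Counter(outcome for _, outcome in rows)
    statistic = 0.0
    for category, category_n in category_totals.items():
        for outcome, outcome_n in outcome_totals.items():
            # Count expected in this cell if category and response were unrelated.
            expected = category_n * outcome_n / n
            statistic += (cells[(category, outcome)] - expected) ** 2 / expected
    return statistic

def best_predictor(candidates):
    """Return the name of the predictor with the largest statistic."""
    return max(candidates, key=lambda name: chi_squared(candidates[name]))

# Hypothetical test mailing: 200 customers, 40 responders.
candidates = {
    "region": [("east", True)] * 30 + [("east", False)] * 70
            + [("west", True)] * 10 + [("west", False)] * 90,
    "gender": [("f", True)] * 20 + [("f", False)] * 80
            + [("m", True)] * 20 + [("m", False)] * 80,
}
print(best_predictor(candidates))
```

In this made-up file, response differs sharply by region and not at all by gender, so the file would be split on region first. The same test would then be repeated inside each resulting segment.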

CHAID is relatively simple to program, runs quickly, provides reliable results and yields easily-understandable segments. It is available in stand-alone form in SPSS’s CHAID+ (800-543-9262) and Angoss’ KnowledgeSEEKER (416-593-1122). More to the point, it is directly embedded in two marketing database packages.

LOOP MD (Miglautsch Marketing, 414-542-5633) is aimed at catalog and retail marketers. The system was developed for Miglautsch Marketing’s consulting clients, who needed access to their data and an easy way to integrate automated file segmentation with sophisticated campaign management. It runs on Unix servers, either Intel-based machines with one or more processors or mid-range systems such as the HP 9000. Data is stored in the Borland Interbase database system, a relational product that can scan about 300,000 records per minute on a powerful Unix server. Users access the system from an OS/2 or Windows desktop computer.

The system is typically mounted on the customer’s in-house computer, where files are built by regular extracts from company transaction and customer files. The update process is highly automated, although adding new fields or calculations usually requires assistance from the vendor. A typical load pulls down transaction detail and creates customer-level transaction summaries, so that most processing can be done without reading the transaction tables themselves. The load also usually creates the sample files used for quick tests of new selections and reports.
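
The roll-up from transaction detail to customer-level summaries can be sketched roughly as follows. This is an illustration only: the field names (order count, total spend, last order date) are assumed, and the actual LOOP MD load is defined per installation.

```python
from datetime import date

def summarize(transactions):
    """Roll transaction detail up to one summary row per customer."""
    summary = {}
    for customer_id, order_date, amount in transactions:
        row = summary.setdefault(
            customer_id, {"orders": 0, "total": 0.0, "last_order": order_date}
        )
        row["orders"] += 1
        row["total"] += amount
        row["last_order"] = max(row["last_order"], order_date)
    return summary

# Hypothetical transaction extract: (customer id, order date, amount).
transactions = [
    ("C1", date(1995, 1, 10), 25.0),
    ("C2", date(1995, 2, 3), 60.0),
    ("C1", date(1995, 4, 22), 40.0),
]
customers = summarize(transactions)
print(customers["C1"])
```

Once these summaries sit on the customer record, most selections and reports never need to touch the transaction tables.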

The heart of the system is the integrated segmentation and selection process. This incorporates SPSS CHAID+, and allows the user to build a CHAID model to predict any desired variable. The system will then use the model to rank records in the file, present the results in a graph, let the user pick a cut-off level, and automatically create the SQL statements that extract the chosen records.
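
The rank, cut-off and extract sequence might look something like the sketch below. The table name, column names, segment rules and response rates are all invented for illustration, and the SQL that LOOP MD actually generates is not documented here.

```python
def build_selection_sql(segments, total_records, depth):
    """Take the best-ranked segments until `depth` (a fraction of the file) is covered."""
    ranked = sorted(segments, key=lambda s: s["response_rate"], reverse=True)
    chosen, covered = [], 0
    for segment in ranked:
        if covered >= depth * total_records:
            break
        chosen.append(segment)
        covered += segment["count"]
    if not chosen:
        raise ValueError("cut-off selects no segments")
    where = " OR ".join(f"({segment['rule']})" for segment in chosen)
    # Hypothetical table and key names.
    return f"SELECT customer_id FROM customer WHERE {where}"

# Hypothetical CHAID output: each segment is a rule plus its predicted response.
segments = [
    {"rule": "recency_months <= 6 AND orders >= 3", "count": 1000, "response_rate": 0.08},
    {"rule": "recency_months > 12", "count": 6000, "response_rate": 0.01},
    {"rule": "recency_months <= 6 AND orders < 3", "count": 3000, "response_rate": 0.04},
]
print(build_selection_sql(segments, total_records=10000, depth=0.3))
```

Because each CHAID segment is defined by plain field values, turning the chosen segments into a query is a mechanical step, which is what makes the technique easy to automate.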

Promotions can include multiple cells, each of which can be a single predefined CHAID selection, another type of predefined query, or a selection created by choosing values for a limited number of fields listed on the promotion screen itself. The campaign management feature includes a full range of functions, including random samples and splits within cells; keycodes, offers, and other data stored for each cell; selection by title or recency within a business site or household; and automatic elimination of duplicates across cells within a promotion. The system can create separate output files for individual cells, to handle situations such as retail chains where each store gets a tape with only its own customers.
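
Eliminating duplicates across cells can be sketched as below. The priority rule used here, where a customer goes to the first cell that selects it, is an assumption for illustration; the article does not say how LOOP MD decides which cell keeps a duplicate.

```python
def assign_cells(cells):
    """Keep each customer only in the first cell that selects it."""
    seen = set()
    assigned = {}
    for cell_name, customer_ids in cells:
        kept = []
        for customer_id in customer_ids:
            if customer_id not in seen:
                seen.add(customer_id)
                kept.append(customer_id)
        assigned[cell_name] = kept
    return assigned

# Hypothetical promotion with two overlapping cells, listed in priority order.
cells = [
    ("best_buyers", ["C1", "C2", "C3"]),
    ("recent_buyers", ["C3", "C4", "C1", "C5"]),
]
print(assign_cells(cells))
```

Each cell's list can then be written to its own output file with its keycode attached.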

While the segmentation and campaign management areas were custom-developed by Miglautsch Marketing, the balance of the system relies mostly on third-party software. Address processing and consolidation are handled primarily by Postalsoft products. Cross tabs and other analysis are done primarily by Cognos PowerPlay, a powerful drill-down and reporting tool that works by extracting selected data elements into its own files. It can take over an hour to build a PowerPlay extract table, although response is virtually instantaneous once the table is built. Users can also apply Paradox query and reporting tools, Crystal Reports, the Quattro Pro spreadsheet, or other products that read the Interbase files via ODBC (Open Database Connectivity) links.

Although Interbase allows multiple data tables, LOOP MD databases are usually set up so most queries can be handled by accessing data on the customer level alone. Where transaction detail is needed, it is summarized and posted to the customer table, typically in an overnight process. Reports and selections are also usually run overnight or against small samples of the main database. The largest existing database handles about 5 gigabytes of data, relating to 5 million customers.

The first LOOP MD system was developed in 1992, and the system now has four installations. Installations usually begin with a $50,000 to $100,000 project to build and test an initial model. The system itself is priced based on the number of customers, starting at around $50,000 per year for up to 500,000 customers.

Pro*Filer (Newspaper Marketing Solutions Inc., 415-721-2975) was built to let field salespeople–with little training and low-powered portable computers–generate instant lists of households to show prospective buyers of direct mail or non-postal carrier delivered advertising. Created by a subsidiary of Alternate Postal Delivery, the nation’s largest private delivery company, the system has been used initially to help newspapers develop targeted marketing campaigns for their own circulation and their advertisers.

Using a highly-compressed flat file structure, the efficient “C” programming language and the DOS operating system, Pro*Filer can develop a custom CHAID analysis in four to ten minutes on a not-very-powerful notebook computer–fast enough to complete during a sales call. (The same analysis might run in under two minutes on a Pentium machine with a fast disk drive.) Once the analysis is built, the system can generate a list of selected names at rates from 60,000 to 200,000 records per minute, again depending on hardware.

Pro*Filer is considerably more than a CHAID-based selection tool, however. The system begins with a household file for the entire market area, typically based on the user’s own records merged with a compiled household file such as R.L. Polk’s TotaList. (Polk helped develop the system, although other vendors’ data could be used instead.) The master database can either hold one record per household, with data on individuals stored in separate fields, or separate records for each individual. The actual record layout is determined by each user, and typically holds 600 to 800 bytes of data.

Data can be appended from external sources such as advertisers’ own house files, provided they are in fixed-length ASCII format. After the user defines the record layout with a simple visual screen, the system automatically runs proprietary address matching routines to link the input records with existing households, and then append whatever data is desired. On reasonably clean data, the process can usually match 85% to 95% of the records. It provides a file of records that did not match any existing households, but will not add them to the system. The matching process runs at about 75,000 records per hour on a fast machine.
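
The append step can be pictured with the rough sketch below. Pro*Filer's matching routines are proprietary, so the match key used here (Zip code plus a normalized address line) is purely an assumed stand-in, and the record layout, field names and sample data are hypothetical.

```python
def parse_fixed(line, layout):
    """Cut a fixed-length record into fields; layout holds (name, start, length)."""
    return {name: line[start:start + length].strip() for name, start, length in layout}

def match_key(record):
    """Assumed key: Zip code plus the address in upper case without punctuation."""
    kept = "".join(ch for ch in record["address"].upper() if ch.isalnum() or ch == " ")
    return (record["zip"], " ".join(kept.split()))

def append_data(lines, layout, households):
    """Append input fields to matching households; set non-matches aside."""
    unmatched = []
    for line in lines:
        record = parse_fixed(line, layout)
        household = households.get(match_key(record))
        if household is None:
            unmatched.append(record)  # reported to the user, never added
        else:
            household["advertiser_customer"] = True
    return unmatched

layout = [("name", 0, 20), ("address", 20, 25), ("zip", 45, 5)]
households = {("94901", "12 MAIN ST"): {"household_id": "H1"}}
lines = [
    "JANE DOE".ljust(20) + "12 Main St.".ljust(25) + "94901",
    "JOHN ROE".ljust(20) + "9 Elm Ave".ljust(25) + "94903",
]
unmatched = append_data(lines, layout, households)
print(1 - len(unmatched) / len(lines))  # match rate for this input file
```

As in the product, records that find no household are set aside in their own file instead of creating new households.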

Once the master database is built, it is typically broken up into separate files for each Zip code in the region. These files are then compressed–often with the name and address removed for both efficiency and security–and used in the actual processing.

The CHAID system itself works by comparing two file segments, typically a 25,000 record sample of the master universe (or some user-defined subset) and up to 20,000 records from a distinct group such as customers of a potential advertiser. It defines segments that are significantly more or less likely than average to belong to the second group, and then provides a report that ranks the segments by their concentration of group members. To use the CHAID definitions in an actual selection, the user pages through the segments one at a time, in rank order, viewing the criteria that define each segment and indicating whether or not to include that segment in the selection. When this is complete, the system will pull names from the main database that match the selected segments. There is no way at present to select a group of segments–say, all those with more than a 500% concentration index–with one command, however.
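
The article does not define the concentration index. One plausible reading, assumed here, is the segment's share of the target group divided by its share of the universe sample, times 100. Under that assumption, a bulk selection of the kind the product lacks could be sketched as follows, with hypothetical segment names and counts.

```python
def concentration_index(seg_universe, total_universe, seg_group, total_group):
    """Assumed definition: 100 x (share of group) / (share of universe)."""
    # Multiplying before dividing keeps round numbers exact.
    return 100 * seg_group * total_universe / (seg_universe * total_group)

def select_segments(segments, total_universe, total_group, min_index):
    """Names of segments whose index exceeds min_index, best first."""
    ranked = sorted(
        ((concentration_index(u, total_universe, g, total_group), name)
         for name, u, g in segments),
        reverse=True,
    )
    return [name for index, name in ranked if index > min_index]

# A few hypothetical segments: (name, records in universe sample, records in group).
segments = [("A", 500, 2400), ("B", 5000, 4000), ("C", 2500, 500), ("D", 250, 1500)]
print(select_segments(segments, total_universe=25000, total_group=20000, min_index=500))
```

Segment A, for example, holds 2% of the universe sample but 12% of the advertiser's customers, for an index of 600.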

Although CHAID provides the most impressive segmentation abilities in the system, Pro*Filer also gives users a conventional point-and-shoot selection screen that allows selections based on all fields in the file. Complex selections can be handled by defining two sets separately and then adding them together. The current version of the system will not eliminate any resulting duplicates, although this capability is scheduled to be added in a June 1995 release. The new release will also include postal sorting capabilities and allow users to place flags on records when they are included in a selection. Features such as multi-cell selection (with different tags or keycodes on each cell), test/control splits, and response analysis reports are scheduled to be added in 1996. A Windows version is expected sooner, around December 1995.

Reporting in the current system is limited. A simple cross tab can compare two fields in the database, showing the percentage of records for each combination of values. In addition, a profile report will show counts for occurrences of each value of demographic variables, while a comparison report will show percentages of values in the demographic variables for up to ten files, side by side. The system does have an extract capability that can export user-selected data, either as flat ASCII files, DBASE records, or multi-line print formats such as mailing labels or form letters. It can also produce DBASE files in the specialized formats needed for Atlas and MapInfo mapping software.
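
For readers unfamiliar with the format, a two-field cross tab of the kind described amounts to something like this sketch. The field names and records are hypothetical.

```python
from collections import Counter

def cross_tab(records, field_a, field_b):
    """Percentage of records for each combination of two fields."""
    counts = Counter((record[field_a], record[field_b]) for record in records)
    total = len(records)
    return {combo: 100.0 * n / total for combo, n in counts.items()}

records = [
    {"tenure": "own", "income": "high"},
    {"tenure": "own", "income": "high"},
    {"tenure": "own", "income": "low"},
    {"tenure": "rent", "income": "low"},
]
for combo, percent in sorted(cross_tab(records, "tenure", "income").items()):
    print(combo, percent)
```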

Pro*Filer is priced at $20,000 to $40,000 per year, based on the circulation of the newspaper it is sold to. This does not include the cost of any external data such as TotaList, which costs about $60 per thousand without automobile information. Installation and training costs an additional $5,000 in the first year, and it takes four to six weeks to set up a new client. The system was introduced in 1994 and now has about a dozen installations at newspapers with circulations from 200,000 to over one million.

* * *

David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.
