David M. Raab
DM News
August, 1994
“In the beginning, there were customers. Customers begat transactions, and transactions begat transaction processing systems. Transaction processing systems multiplied, until the earth was covered with data and customers could no longer be seen. And this was not good.
“So the marketers built great warehouses to store their data, that they might see their customers once again. And it came to pass that a great hunger for information came upon the marketers, and they opened their warehouses and drew forth the data and molded it into information. And the marketers grew great and prosperous from the information, and their customers became as many as the stars in the sky.”
Whether the Promised Land is truly studded with data warehouses remains to be seen. But many businesses have experienced the vision of a central corporate repository for customers and other marketing data, and software developers are happy to support their quest. They offer two types of aids:
– “Data management” tools to handle the huge volumes of data–hundreds of millions of records–in a large corporation’s files. Some tools help extract and merge the data from operational systems; some provide quick response to analytical queries that are difficult for conventional databases; and some let end-users generate queries and manipulate the results. In the first category, earlier columns have reviewed tools from companies including Customer Potential Management, Worth Information, Customer Insight Company, Harte-Hanks, and OKRA Marketing. In the second category, we have reviewed Expressway Technology, MarketPulse, MegaPlex Systems, Mercantile Software Systems, Red Brick Systems, and HOPS International. The third category is beyond the range of this column because it is fairly well covered in the general computer press.
– “Relationship management” tools that execute marketing programs based on the central data repository. Several of the “data management” systems also fit this category because they have integrated marketing components. In addition, the group would include advanced sales automation products from companies such as Brock, Data Code, Information Management Associates, and Software Development Group. But probably the most interesting niche holds products such as Third Wave International’s MIND, reviewed earlier this year. These systems are specifically designed to integrate marketing activities with company-wide operational systems.
Here are three more systems (one from each category) to help run your data warehouse.
ISI-Match (Innovative Systems, Incorporated; 800-622-6390) is one of several products that ISI offers to combine data from multiple sources. These include systems that assess the quality of existing customer files, standardize name and address data, consolidate records by household, and link fragmented client or vendor relationships. The company also sells a business editing and matching system through an alliance with Dun & Bradstreet, as well as on-line software to improve the quality of data entry.
This column cannot “review” ISI software in the usual sense, because the quality of its data scrubbing and matching algorithms can only be assessed through detailed testing. But the software has a good reputation in the industry and has long been one of the few sophisticated packages in the market. In short, it’s worth knowing about when you consider data warehouse applications.
ISI’s clients have traditionally been large financial institutions that want to unify all customer data on an in-house system, either for a marketing database or a company-wide data warehouse. More recently, ISI has expanded into non-financial markets and internationally. The firm’s consulting affiliate, SMS Consulting, Ltd., offers project justification, data modeling, database design, and implementation for corporate data warehouses.
Most ISI products run on mainframes, although there are some PC-based data maintenance programs. ISI is considering client/server Unix versions, but has not announced specific plans to add them.
ISI projects typically include a mix of consulting and software. Costs range upward from $25,000, and can reach $500,000 or more.
Omnidex (Dynamic Information Systems Corporation, 303-444-4000) is an indexing and data access tool that gives very fast access to very large databases. The product was launched in 1981 on Hewlett-Packard midrange computers, and DISC added a version for the Digital Equipment VAX in 1991. It now supports the HP-3000 and HP-9000 lines, as well as DEC VAX and Alpha. There are over 4,000 installations.
Omnidex performance is determined, not by the number of records in a file, but by the number of unique values (“keys”) in fields within those records. The system can process up to 15 million keys per minute, depending on hardware. This can translate to well over 100 million records per minute, depending on how many records share each key. Actual performance is much faster still, since most queries process only a few keys.
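The relationship between key throughput and record throughput is simple multiplication, and a small sketch makes it concrete. The 15 million keys per minute is the figure quoted above; the average number of records per key is a hypothetical input chosen only for illustration, and the function name is an assumption of this sketch.

```python
# Figure quoted in the column: up to 15 million keys per minute.
KEYS_PER_MINUTE = 15_000_000


def records_per_minute(avg_records_per_key, keys_per_minute=KEYS_PER_MINUTE):
    """Records covered per minute when each key is shared by this many records."""
    return keys_per_minute * avg_records_per_key


# Hypothetical averages, chosen only for illustration.
for avg in (1, 7, 20):
    print(f"{avg:>2} records per key -> {records_per_minute(avg):,} records per minute")
```

At roughly seven records per key the rate passes 100 million records per minute. At one record per key, the case where almost every value is unique, the record rate is no better than the key rate.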
The system works by creating a separate index entry for each “key”. Each entry contains the number of records having the key value, plus a list of the records themselves. Since the system has already counted the records, queries that require only the number of records with a particular value in a single field can be returned almost instantaneously. Queries on multiple fields take a little longer, because the system must read and combine the record lists associated with the selected keys. Still, the record lists are much smaller than the entire database, so working with them is much faster than scanning the entire file. Working with record lists also means the number of fields in a query has less impact on Omnidex than on systems using an inverted file structure. Inverted file systems read all data for each field in a query, so adding a new field always adds a significant processing burden.
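The index structure described above can be sketched in a few lines. This is a toy model under stated assumptions, not DISC's implementation: the class names, the in-memory dictionaries, and the sample customer records are all hypothetical. It shows only why a single-field count needs no record access and why a multi-field query touches just the record lists.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One index entry: a stored count plus the list of matching record numbers."""
    count: int = 0
    records: list = field(default_factory=list)


class KeyIndex:
    """Hypothetical per-field key index, loosely modeled on the description above."""

    def __init__(self):
        # field name -> key value -> Entry
        self.entries = defaultdict(dict)

    def add(self, record_no, record):
        for name, value in record.items():
            entry = self.entries[name].setdefault(value, Entry())
            entry.count += 1
            entry.records.append(record_no)

    def count(self, name, value):
        # Single-field count: read the stored number, never the record list.
        entry = self.entries[name].get(value)
        return entry.count if entry is not None else 0

    def select(self, **criteria):
        # Multi-field query: combine only the record lists for the chosen keys.
        record_sets = []
        for name, value in criteria.items():
            entry = self.entries[name].get(value)
            record_sets.append(set(entry.records) if entry is not None else set())
        return sorted(set.intersection(*record_sets)) if record_sets else []


# Hypothetical sample data.
customers = [
    {"state": "CA", "policy": "earthquake"},
    {"state": "CA", "policy": "auto"},
    {"state": "NY", "policy": "earthquake"},
    {"state": "CA", "policy": "earthquake"},
]

index = KeyIndex()
for record_no, customer in enumerate(customers):
    index.add(record_no, customer)

ca_count = index.count("state", "CA")
ca_quake = index.select(state="CA", policy="earthquake")
print(ca_count, ca_quake)
```

The cost of `select` depends on the length of the record lists for the selected keys, not on the size of the customer file, which is the point the paragraph above makes.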
The catch with Omnidex is that a single query cannot compare data in different fields. This means that queries cannot do calculations: for example, a standard query cannot divide total sales by total orders to get average order value. Either this value would have to be precalculated and stored in a field in the database, or DISC consultants would have to write special procedures to handle the calculation. In fact, DISC regularly creates such procedures for its clients; this is not necessarily difficult or expensive.
Omnidex also suffers when a field has few repeated values, since the number of keys will be almost as high as the number of records. This can happen with numeric variables such as lifetime purchases, where no two customers may have the same exact total. Again, DISC consultants have methods to reduce the impact of such conditions.
Within its limits, Omnidex is very flexible. Even though it cannot compare multiple fields, it can still reference them in a query–say, to find everyone in California who purchased an earthquake insurance policy. It can also find fragments of words, find several values within the same field, find the same value in several fields, and find a combination of values in one or more fields. Indexes can combine data from several fields, even when these fields originate in different databases. The system can also build indexes on fragments of a field (say, the third through sixth characters) or on all individual words within a field (for text search applications).
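Indexing on individual words or on a fragment of a field uses the same key-to-record-list idea, with the key derived from part of the field rather than its whole value. The sketch below is a rough illustration only; the function names, the lowercase whitespace split, and the sample records are assumptions, not Omnidex behavior.

```python
from collections import defaultdict


def build_word_index(records, field_name):
    """Map each word in a text field to the set of record numbers containing it."""
    index = defaultdict(set)
    for record_no, record in enumerate(records):
        for word in record[field_name].lower().split():
            index[word].add(record_no)
    return index


def build_fragment_index(records, field_name, start, end):
    """Index characters start..end of a field (1-based, inclusive)."""
    index = defaultdict(set)
    for record_no, record in enumerate(records):
        fragment = record[field_name][start - 1:end]
        index[fragment].add(record_no)
    return index


# Hypothetical sample data.
records = [
    {"name": "Pacific Earthquake Insurance", "account": "CA1994X001"},
    {"name": "Pacific Auto Insurance", "account": "NY1994X002"},
    {"name": "Atlantic Earthquake Cover", "account": "CA1993X003"},
]

words = build_word_index(records, "name")
fragments = build_fragment_index(records, "account", 3, 6)

print(sorted(words["earthquake"]))
print(sorted(fragments["1994"]))
```

Here the fragment index covers the third through sixth characters of the account number, matching the example in the text.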
The system can work with C-ISAM sequential files and with relational databases including Oracle, HP TurboImage, DEC Rdb and RMS. DISC is working to add new databases such as Sybase and Informix. Tables can be related through predefined joins when the indexes are built, or the relations can be established for a single query. Unlike conventional relational systems, Omnidex pays little performance penalty for queries across multiple tables.
Indexes are built at rates up to 500,000 keys (not records) per minute. On Hewlett-Packard systems, Omnidex can adjust its indexes incrementally as new records are inserted, allowing the technology to be embedded directly in transaction processing applications. DEC systems must rebuild an entire table to incorporate any change, so they are typically updated on a daily or less frequent basis. Omnidex indexes take ten to twenty percent as much space as the original data. The system has been tested on databases up to several hundred million records, although most existing installations are much smaller.
Omnidex technology is used in several ways. Users can write queries directly, with a proprietary programming language. Standard report writing tools can be modified to call Omnidex functions when they select the records for a report. PC-based, client/server “executive information systems” can interactively query particular sets of records. Outside the data warehouse, transaction processing or customer service systems can search for records with key words or last names when a unique ID number is not available. Omnidex can also export ASCII files into other applications.
These functions are embedded in several different modules. Pricing is based on the size of the user’s computer and on which modules are purchased; it starts as low as $7,000, although a typical installation costs $20,000 to $50,000 and a major project can reach six figures. Most projects involve consulting by DISC staff to design the data structures and help build applications that take advantage of Omnidex’s capabilities.
MKS Marketing Information System (Customer Focus International, 909-869-0083) is a part of a larger set of tools designed to build and exploit customer data repositories on standard relational databases. These modules can link records that belong to the same household or individual, manage telemarketing and field sales activities, calculate customer profitability, evaluate credit risks, and track personal identification numbers for automated teller machine cards.
Options also include a “customer relationship management” module that provides data entry screens for activities such as account openings, service requests, credit applications, and telemarketing. The screens accept transactions, store them, and later post them to the relevant operational systems and to the central data repository. A common entry system makes it easier to control the quality and completeness of the original input, and allows the system to present relevant marketing information to users as they are speaking with customers.
MKS offers sophisticated functions to manage single-promotion campaigns. It can randomly distribute a group of names into as many as twenty cells, each having different creative versions or promotion dates. The system can take multiple “snapshots” of customer information such as cumulative purchases or account balances, allowing reports that show changes over time and that compare the performance of the different cells. The system can also be set to repeat the same campaign at a fixed interval, to automate programs such as mailing to everyone with a birthday in the coming month.
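Randomly distributing names into test cells is commonly done by shuffling the list and dealing it out in turn. The sketch below shows that general technique; it is not a description of how MKS does it. The function name, the seed parameter, and the sample names are hypothetical, and the limit of twenty cells simply mirrors the figure quoted above.

```python
import random

MAX_CELLS = 20  # limit quoted in the column


def assign_cells(names, cell_count, seed=None):
    """Shuffle the names, then deal them round-robin into near-equal cells."""
    if not 1 <= cell_count <= MAX_CELLS:
        raise ValueError(f"cell_count must be between 1 and {MAX_CELLS}")
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    shuffled = list(names)
    rng.shuffle(shuffled)
    cells = [[] for _ in range(cell_count)]
    for position, name in enumerate(shuffled):
        cells[position % cell_count].append(name)
    return cells


# Hypothetical mailing list of 100 customer IDs split into four cells.
mailing_list = [f"customer-{n:03d}" for n in range(100)]
cells = assign_cells(mailing_list, 4, seed=1994)
print([len(cell) for cell in cells])
```

Each name lands in exactly one cell, and cell sizes differ by at most one, so results from different creative versions or promotion dates can be compared on equal footing.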
But MKS is less adept at multi-promotion campaigns. One campaign cannot send three consecutive letters to the same group at regular intervals, although this could be done through separate batch jobs. Nor can a single campaign address multiple, independently-defined file segments.
MKS also lacks user-defined calculations, user-defined reports, integrated mapping or statistical modeling. But this is a less notable absence, since these tasks can be done with third-party tools that read the underlying data directly. Customer Focus plans to add these functions eventually, and has already integrated a drill-down, graphical analysis of standard reports and a serviceable point-and-shoot query interface.
Speed in MKS is determined by the hardware and software used in a particular installation. The system originally ran on IBM mainframes with the DB2 database. Customer Focus recently became a strategic partner with AT&T GIS (formerly NCR) and is now adding support for that firm’s Unix and Teradata hardware on databases including Oracle and Sybase. OS/2 and Windows NT versions are also planned before the end of 1994. Users can work on “dumb” terminals or PC hardware with Microsoft Windows.
MKS databases are built and updated on the client’s own computer, although Customer Focus can provide staff to run the system if the client desires. Updates can be run in either a traditional batch mode or incrementally, with transactions and households added to the system as they are entered. The system offers consolidation at both the individual and household levels, but does not include special logic for business accounts.
MKS is built around a standard data model that carries about 3,000 bytes of data per customer, which is five to ten times the size of a traditional bank marketing database. This model includes business rules and processes as well as the physical data structure, and is customized for each client.
Mainframe modules of the Customer Focus system cost from $300,000 to $600,000, depending on the module and how many modules are purchased. Unix pricing is expected to vary with the size of the system and the amount of customization required. MKS can be tested through a “business assessment” project, which includes file analysis, repository design, batch programs for data extraction and consolidation, and a three month license for MKS marketing tools. This costs about $100,000 to $125,000.
Customer Focus International was founded in 1987, and provides consulting on repository development, business applications, process modeling and relational database operations. The company introduced MKS in 1992 and currently has one installation. The “customer relationship management” module has three installations. Initial marketing is aimed primarily at financial institutions, although Customer Focus hopes eventually to apply the system in other areas as well.
David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.