2001 Aug 01
DataPulse Inc. DataPulse and MailWizard
David M. Raab
DM News
August, 2001

Database experts have long recognized that analyzing large sets of records–such as marketing databases–is fundamentally different from processing individual transactions. The logical conclusion is that different database engines should be used for each purpose. But while many specialized analytical databases have been developed over the years, none has found wide acceptance.

Instead, most firms have built analytical systems with the same relational database engines they use for transaction processing: products like DB2, Oracle and SQL Server. This has been made practical by advances in processing power, by the addition of analysis-oriented features such as specialized indexes, and by the use of analytical data structures such as star schemas. The result is like putting an air spoiler on a pickup truck: you get better performance, but nothing like a true race car. Still, most IT departments are the technical equivalent of a one-car garage, lacking the resources or desire to maintain two different vehicles. And since they spend most of their time racing pickup trucks against each other–that is, evaluating ways to use transaction databases for analysis–they rarely even recognize that vastly faster technologies exist.

But the advantages of specialized analytical databases are so great, and the apparent market is so huge, that developers keep introducing new ones. Some vendors are painfully familiar with past failures, while others are blissfully ignorant. All argue–rightly–that their products should be evaluated on their own merits.

DataPulse and MailWizard (DataPulse Inc., 732-577-8115, www.datapulsedb.com or www.lmlabs.com) are, respectively, data analysis and campaign management applications that run on DataPulse’s DP DBMS database engine. DP DBMS applies data compression, column-oriented data storage, and entity-oriented data mapping, each of which addresses a specific performance bottleneck in conventional database systems. Compression, by storing data in about 10% of its original space, reduces disk requirements and data movement. Column-oriented data storage, by allowing a query to read only the specific data elements it needs, further reduces the amount of data processed. Entity-oriented mapping, by defining data relationships in terms of business entities rather than conventional tables, reduces the effort to join information from different levels (such as customers, transactions and locations).
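To make the column-oriented idea concrete, here is a minimal Python sketch. It is not DP DBMS's actual storage format, which the vendor has not published here; the field names, sample records and function names are all hypothetical. It simply shows how pivoting records into one list per field lets a query touch only the fields it needs.

```python
# Hypothetical toy data: three customer records stored row by row.
rows = [
    {"id": 1, "state": "NJ", "age": 34, "spend": 120.0},
    {"id": 2, "state": "NY", "age": 51, "spend": 75.5},
    {"id": 3, "state": "NJ", "age": 42, "spend": 210.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list per field."""
    columns = {}
    for record in records:
        for field_name, value in record.items():
            columns.setdefault(field_name, []).append(value)
    return columns


def total_spend_for_state(cols, state):
    """Read only the 'state' and 'spend' columns; 'id' and 'age' are never touched."""
    return sum(
        spend
        for record_state, spend in zip(cols["state"], cols["spend"])
        if record_state == state
    )


columns = to_columns(rows)
print(total_spend_for_state(columns, "NJ"))  # 330.0
```

In a row-oriented layout the same query would have to pass over every field of every record, which is the extra data movement that column storage avoids.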

DP DBMS’s approach yields particular strengths in resource consumption and complex query capabilities. The system runs well on a low-end Pentium server with limited memory. By contrast, some analytical databases are highly dependent on fast processors and large amounts of memory. Compression reduces the need for costly data storage.

For complex queries, DP DBMS’s entity-mapping approach allows it to join data across multiple levels with relatively little reduction in speed. In many other analytical systems, joins have a major impact and there are limits on the allowable data structures and numbers of joins. DP DBMS also allows calculations within queries, as well as “virtual” calculated fields that are not physically stored in the database. Not all analytical systems offer such calculations, which are critical for some complex analyses.

On the other hand, the system is not the world’s fastest query engine. A query involving multiple fields on 100 million records might take 30 seconds to two minutes in DP DBMS. This is fast, but the quickest analytical products could return the same result in one-tenth the time–particularly if the data is all on the same level. Still, the truly relevant comparison is a conventional relational database, which could take several hours for the same query, and might not return a complex multi-level query at all.

Like other analytical systems, DP DBMS must load data into its own format. The system does this at five to ten gigabytes per hour, depending mostly on the speed of the disk drives. Other analytical systems are also limited by drive speed, so they load at similar rates, or more slowly if extensive processing is needed.

Even at a relatively brisk ten gigabytes per hour, load time can be a problem: one terabyte would take about 100 hours, or more than four days. Fortunately, DP DBMS can load incremental changes or additions to an existing file, rather than starting from scratch after each update. Not all analytical systems share this ability. The load process can also incorporate data cleaning and transformations, and can even flag data that has changed since the prior version. This makes it easier to identify significant events such as a new child or change of address.
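The load-time arithmetic is easy to check. The sketch below assumes a decimal terabyte (1,000 gigabytes) and uses the five and ten gigabyte per hour rates quoted above; the 20-gigabyte incremental update is a hypothetical figure chosen only to show why incremental loading matters.

```python
def load_hours(data_gb, rate_gb_per_hour):
    """Hours needed to load data_gb at a steady rate."""
    return data_gb / rate_gb_per_hour


ONE_TERABYTE_GB = 1000  # assumption: decimal terabyte

for rate in (5, 10):
    hours = load_hours(ONE_TERABYTE_GB, rate)
    print(f"{rate} GB/hour: {hours:.0f} hours, about {hours / 24:.1f} days")

# Hypothetical incremental update of 20 GB at the faster rate.
print(f"20 GB update: {load_hours(20, 10):.0f} hours")
```

At the faster rate a full terabyte takes roughly 100 hours, while the hypothetical 20-gigabyte update finishes in about two.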

Once the data is loaded into DP DBMS, it is accessed primarily through the system’s own interfaces. The system does have an ODBC interface to accept SQL queries, but performance is much slower. The system’s point-and-click query builder lets users construct expressions by selecting data elements and operators. Several expressions can be combined in a group, and groups themselves can be combined and nested. The result is a collapsible tree that lets non-technical users develop complex selections. The system also shows counts for each data element and for each expression, helping users to understand what caused their results.
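The nested structure of expressions and groups can be pictured as a small tree. The following Python sketch is only an illustration of that idea, not the product's implementation; the class names, field names and sample customers are hypothetical. It evaluates a nested selection and reports a count at every node, in the way the query builder is described as doing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Expression:
    """A single test on one record, such as 'spend > 100'."""
    label: str
    test: Callable[[dict], bool]

    def matches(self, record):
        return self.test(record)


@dataclass
class Group:
    """Expressions or other groups combined with AND or OR."""
    operator: str  # "AND" or "OR"
    children: list

    @property
    def label(self):
        return self.operator

    def matches(self, record):
        results = (child.matches(record) for child in self.children)
        return all(results) if self.operator == "AND" else any(results)


def describe(node, records, depth=0):
    """Return one indented line per node, each with its matching-record count."""
    count = sum(1 for record in records if node.matches(record))
    lines = [f"{'  ' * depth}{node.label}: {count}"]
    for child in getattr(node, "children", []):
        lines.extend(describe(child, records, depth + 1))
    return lines


customers = [
    {"state": "NJ", "age": 34, "spend": 120.0},
    {"state": "NY", "age": 51, "spend": 75.5},
    {"state": "NJ", "age": 42, "spend": 210.0},
    {"state": "CA", "age": 29, "spend": 300.0},
]

query = Group("AND", [
    Expression("spend > 100", lambda r: r["spend"] > 100),
    Group("OR", [
        Expression("state = NJ", lambda r: r["state"] == "NJ"),
        Expression("age > 50", lambda r: r["age"] > 50),
    ]),
])

tree_lines = describe(query, customers)
print("\n".join(tree_lines))
```

Seeing a count beside each branch shows at a glance which condition narrowed the selection, which is the benefit the per-expression counts are meant to provide.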

The same query interface is used in the DataPulse analysis tool and MailWizard campaign manager. DataPulse also provides descriptive statistics and multi-level cross tabs. The cross tabs are comparable to those in multidimensional analysis systems, with nuances including multiple attributes per cell and interactive features to sort data, pivot or hide columns, and expand or collapse rows. But the real advantage is that users need not predefine which elements, measures or summaries to view, as they must in conventional multidimensional tools.

Users can also select any set of cross tab cells and transfer the underlying records as a group to the MailWizard campaign manager. Like DataPulse, MailWizard has a plain interface that manages to offer a solid set of capabilities. Users can construct multi-segment campaigns, with each segment defined by its own query. Each segment has a budget, unit cost, target quantity, and option to be scheduled for one or more repetitions. Segments can be split into random, Nth or sequential samples based on a fixed quantity or percentage.
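For readers unfamiliar with the three sampling methods, the Python sketch below shows one common interpretation of each. The function names are hypothetical and MailWizard's own rules may differ in detail, for example in how an Nth sample handles a segment that does not divide evenly.

```python
import random


def sequential_sample(records, quantity):
    """Take the first `quantity` records in their existing order."""
    return records[:quantity]


def nth_sample(records, quantity):
    """Take every Nth record, with N chosen so the picks spread across the segment."""
    if quantity <= 0:
        return []
    step = max(len(records) // quantity, 1)
    return records[::step][:quantity]


def random_sample(records, quantity, seed=None):
    """Take `quantity` records at random, without repeats."""
    rng = random.Random(seed)
    return rng.sample(records, min(quantity, len(records)))


def quantity_from_percent(records, percent):
    """Convert a percentage of the segment into a fixed quantity."""
    return round(len(records) * percent / 100)


segment = list(range(1, 101))  # hypothetical segment of 100 customer IDs

print(sequential_sample(segment, 5))        # [1, 2, 3, 4, 5]
print(nth_sample(segment, 5))               # [1, 21, 41, 61, 81]
print(sorted(random_sample(segment, 5, seed=1)))
print(quantity_from_percent(segment, 25))   # 25
```

A sequential sample is the simplest but inherits any ordering in the file, whereas Nth and random samples spread the selection across the whole segment.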

When a campaign is ready for execution, its segments become available to an administrator who builds a mail tape from one or more campaigns. The administrator picks which segments to include, and can define global exclusion groups, select only one name per household, eliminate duplicates across segments, and exclude customers who have received previous promotions within a specified time. Output format and rules for assigning key codes are defined during system setup. Once the tape is generated, the system updates the campaign and segment statistics with actual mail quantities. It also stores promotion history records, which are available for future queries and analysis. A standard function is available to post responses and match these against mailing history, although reports would be custom-developed for each client.
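The mail-tape rules described above amount to a series of filters applied in segment order. The Python sketch below is one plausible way to express them, offered as an illustration and not as the product's logic; the function name, record layout and sample data are all hypothetical.

```python
from collections import Counter
from datetime import date


def build_mail_list(segments, excluded_ids, last_promoted, today, min_days_since_promo):
    """Apply exclusion, de-duplication, household and recency rules in segment order."""
    seen_customers = set()
    seen_households = set()
    output = []
    for segment_name, records in segments:
        for record in records:
            customer = record["customer_id"]
            household = record["household_id"]
            if customer in excluded_ids:
                continue  # global exclusion group
            if customer in seen_customers:
                continue  # duplicate across segments
            if household in seen_households:
                continue  # only one name per household
            last = last_promoted.get(customer)
            if last is not None and (today - last).days < min_days_since_promo:
                continue  # promoted too recently
            seen_customers.add(customer)
            seen_households.add(household)
            output.append((segment_name, customer))
    return output


segments = [
    ("A", [
        {"customer_id": 1, "household_id": 10},
        {"customer_id": 2, "household_id": 10},  # same household as customer 1
        {"customer_id": 3, "household_id": 11},
    ]),
    ("B", [
        {"customer_id": 3, "household_id": 11},  # already selected in segment A
        {"customer_id": 4, "household_id": 12},  # promoted 12 days ago
        {"customer_id": 5, "household_id": 13},  # in the exclusion group
        {"customer_id": 6, "household_id": 14},
    ]),
]

mail_list = build_mail_list(
    segments,
    excluded_ids={5},
    last_promoted={4: date(2001, 7, 20)},
    today=date(2001, 8, 1),
    min_days_since_promo=30,
)
print(mail_list)  # [('A', 1), ('A', 3), ('B', 6)]

# Actual mail quantities per segment, as would be posted back to campaign statistics.
actual_quantities = Counter(segment_name for segment_name, _ in mail_list)
print(dict(actual_quantities))  # {'A': 2, 'B': 1}
```

Because earlier segments claim names first, the order in which segments are processed determines which segment gets credit for a customer who qualifies for several.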

DataPulse, MailWizard and DP DBMS all run on a Windows 95 or NT server and use Windows workstations. The system was originally created by L&M Technologies, an application developer, for a large direct marketer that has been using it since 1997. L&M created DataPulse Inc. to market the products to other users, and is just beginning its sales efforts. Pricing begins at $250,000 for the complete system.

* * *

David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.
