1994 Mar 01

MegaPlex Software Fast-Count DBMS
Expressway Technologies EXPRESSWAY
HOPS International Heuristic Optimized Processing System

David M. Raab
DM News
March, 1994

How fast is fast?

In desktop marketing, the speed champions have been proprietary flat file systems that query up to five million records per minute on a PC. Relational products such as Sybase or Oracle were much slower, though they might reach one million records per minute on powerful Unix hardware. Users needing still greater performance were pretty much forced to rely on the mainframes or multi-processor systems.

But several software-based alternatives now offer more speed than flat-file PC systems, the flexibility of a relational database, and the ability to run on low-cost hardware. Earlier columns have covered products from Red Brick Systems, Mercantile Software and Cross/Z International. This column lists three more.

First a note of caution: speed is a difficult thing to measure. Figures in this column are based on actual vendor tests, and have been confirmed in conversations with users. But performance in any particular situation will vary with the nature of the query, database design, hardware configuration, data characteristics, number of records found, and many other factors. Any performance figures should be considered only approximations, whose main point is that these systems are much faster than conventional database engines. Their advantage is especially clear at the core task of a marketing database: complex, ad hoc queries against large volumes of data.

Fast-Count DBMS (MegaPlex Software, 908-647-3273) is the only one of the current group to be embedded in a functional marketing database package. The product has been developed over the past three years in conjunction with David Shepard Associates, a leading database marketing consultancy, and since 1992 has been installed at about seven of the firm’s clients. Running on a Unix server, the system added a Windows-based graphic user interface in mid-1993.

Fast-Count provides speeds of 30 to 100 million records per minute for a count on a single field, on a server such as an RS/6000. Performance drops linearly as the number of fields in the query increases–that is, a second field would double the required time, a third field would triple the time, and so on. A profile report–showing counts for several combinations of variables–runs at about 15 to 50 million records per minute, regardless of how many fields are included.

Although the system does not meet the strict definition of being “relational”, it can report on data from several related files. When the files are already sorted on a common field, the join will be 15 to 50 million records per minute. The sort itself runs at a 500,000 to 1.5 million records per minute.

Like many very fast systems, Fast-Count relies on proprietary data compression techniques that reduce the amount of data that must be read. Fast-Count files typically occupy one-half to one-third the space of the original data. Because the system does not rely on precompiled indexes, performance does not depend on the user’s ability to predict which fields will be queried against.

To further improve performance, the system uses an extreme form of client/server architecture that passes only the results of a query–say, the actual count–back to the user’s machine. This generates less network traffic than traditional systems, which send back to the end-user all records that match the query, to be summarized on the end-user’s system. The benefit is especially great for very slow connections such as telephone modems. There is also an advantage to shifting as much processing as possible onto the server, since this is usually a more powerful computer than the end-user’s system.

Fast-Count data is manipulated with “Fast Count Language” (FCL), a proprietary programming language that allows the user to define databases, import data, make changes, do calculations, execute queries, generate reports and produce extracts for other systems. The language can be used to create entire applications as well as SQL-like queries. In addition, the system offers a separate “User Interface Language” that creates Windows-compatible graphical interface screens.

But users are not required to learn either of the system’s languages. Fast-Count is delivered with a full point-and-shoot graphical interface that allows the user to import data, develop queries and run reports without writing FCL code directly. It provides an adequate set of promotion selection facilities, including the ability to tag records selected for a given promotion and to extract a control group using Nth name or random selection. The system lacks specialized marketing features such as campaign profitability analysis or multi-segment selections from a single screen. These and other functions are typically embedded in custom screens written for each client.

Available reports include counts, profiles, field listings (which can combine information from multiple tables), and three-way cross tabs. Cells in the cross tab function can provide summary statistics on any field in database. Reports can include complex calculations drawing on mathematical, statistical and logical functions, and results can be permanently stored on the database. While there is no “forms painter” to create full-page reports, this could be done by writing an FCL program or exporting data to other Windows software. Advanced statistical, mapping or graphical analysis would also be done by exporting data to other systems.

Fast-Count offers on-line help that allows users to enter their own descriptions of fields in the database, as well as standard help screens that explain most menu options. Users can also enter and modify their own private data, which is stored locally. Shared system data is stored separately and protected from change.

Fast-Count is priced at $35,000 to $95,000, depending on the type of computer it will run on. A full installation, including customization, database design, and data loading programs as well as the software itself, usually runs from $100,000 to $200,000 and takes one to three months to complete. Annual maintenance is 15% of the software license, and includes toll-paid technical support from 9 a.m. to 5 p.m. Eastern time.

EXPRESSWAY (Expressway Technologies, 617-890-8670) is a specialized indexing technology rather than a full-blown database system. As such, the system reads data in existing relational or flat-file formats, creates indexes, and then resolves queries by accessing the indexes rather than the underlying data. Of course, this is just what any conventional indexing technique does: what’s special about EXPRESSWAY is that it builds the indexes much faster, uses much less space to store the indexes, and is especially efficient at handling complex queries that return large numbers of records.

Perhaps even more special, the system can be installed as “middleware” that processes standard SQL queries without any special adaptation of the system that generated the query or of the actual database. This means that EXPRESSWAY can easily be fit into existing systems. It also means that when EXPRESSWAY is presented with a query it cannot handle–say, because it contains a field that has not been indexed–it can pass the query straight through to the underlying database, rather than failing to process it altogether.

On a reasonably powerful Unix system, performance ranges from 120 million to 30 million records per minute for a simple query that returns only a count, and from a half million to six million records per minute for much more demanding selections that include comparisons and summations. On these demanding selections, an indexed conventional relational database would typically run at 100,000 records per minute or less.

The system achieves its performance through an enhanced version of “bit map” indexes, which very quickly read all the data in the relevant fields of the database. The result is that performance barely deteriorates in situations where many records are selected, there are many conditions to consider, or the query asks for “anything but” a specific value. These are exactly the situations where conventional indexing systems perform worst, because they are built mostly for transaction processing, where the goal is to be very fast at finding a single record when given specific information such as an account number. When faced with a query that is not easily narrowed to a few records, the conventional systems are often forced to read the entire file, record by record, which is extremely slow.

The other major performance hurdle for conventional relational databases is the processing needed to “join” related files as a query is executed. EXPRESSWAY can avoid this by creating precompiled indexes that hold these relations. Once this is done, a query that accesses multiple files runs at virtually the same speed as a single-file query.

EXPRESSWAY does have its limits. Most obviously, the indexes are not updated automatically as data is changed in the underlying databases. This means query results are not up-to-the-minute, although this is often not essential in marketing database applications. It also means that selecting the actual records requires a separate pass against the underlying data files. (Actually, the system has the ability to recreate the actual data from the indexes alone, but this is a slow process and not recommended for anything more than a few thousand records at a time). Index building is relatively fast–about a half-million records per field per minute on a Unix workstation. In April, the company will speed this further by adding the ability to update the indexes by passing only the records that have been added, deleted or changed, rather than the entire file.

The system also lacks support for the full range of SQL functions. Most notably, it cannot do comparisons or calculations on individual rows. However, it can do aggregations (count, sum, average, etc.), grouping, sorting and other standard functions. A query that contains one of the unsupported functions would be handled by the underlying database itself. EXPRESSWAY has added one function not found in standard SQL, which is the ability to limit the results of a query to a specified number of records, such as the top 100.

EXPRESSWAY currently runs on Sun, NCR, Data General and IBM RS/6000 hardware, and is being ported to others. The system can be accessed via a gateway that accepts Sybase commands–making it immediately usable by any querying software that is Sybase compatible. It can be linked to Sybase and Oracle, with Informix, Ingres and other databases soon to be added.

EXPRESSWAY was officially introduced in late 1993 and now has five installations. Pricing ranges from $25,000 to $100,000 per server, based on the server type and amount of data.

Heuristic Optimized Processing System (HOPS International, 305-672-2361) is the only one of these systems to support transaction processing as well as database queries and analysis. The system runs on a MacIntosh Quadra server, typically linked to a very fast RAID (Redundant Array of Independent Disks) hard drive subsystem. At one installation, HOPS handles over 250 gigabytes of data.

At the heart of HOPS is a proprietary sorting algorithm that runs at about 400,000 records per minute with 100-byte records. This is complemented by other proprietary techniques for data indexing, compression and disk management. Combined, these ensure that nearly any record can be found with a single disk access, which provides almost instant response. In relational or other multi-file situations, there would be a separate access for each file, but the system is still very fast assuming the relations between the files were embedded in a pre-existing index.

HOPS must import data before it can work with it. The data load runs very quickly–20 megabytes per minute or better–and compresses numerical data by placing it in a binary format. The system then builds highly compact indexes at a rate of around 400,000 records per minute for a 100-byte record. Unlike many systems built especially for data analysis–but essential for transaction processing–HOPS can add or change a record without rebuilding the entire file. Indexes can be automatically updated at the same time. The system supports other transaction processing requirements such as record locking, redundant servers, and commit/rollback processing.

Performance on database queries is always fast, although the actual rate depends on the situation. One extreme is a query on an indexed field to find a single record: response would be measured in milliseconds, pretty much regardless of file size. The other extreme would be to scan an entire file, which runs between two and four million records per minute. Most marketing queries would fall somewhere in between: using indexes to identify ranges of records that might meet a query, and then scanning the records within each range. Built-in optimization tools help HOPS to process each request in the most the most effective way, although HOPS International also sometimes does custom programming to improve results for a particular user’s application.

In fact, each HOPS installation requires some degree of customization. The system includes a large set of tools, including a ‘programmer interface’ for file management and calculations, and a ‘user interface’ to build point-and-shoot screens that let end-users generate complex queries. The system also includes a set of graphics tools to generate charts, plus a very powerful geographic information system. Complex procedures might be programmed by HOPS International in C or C++.

HOPS also has an end-user report writer with a columnar format–that is, the user defines one item for each column on the page, and each row shows the value of that item for a single record or group of records. The report writer offers considerable power, including sorts, subtotals, cumulations, mathematical and statistical functions, and formulas that incorporate multiple data fields. Logical constraints can be applied to each column to create a cross-tab report. Non-columnar outputs–such as mailing labels or personalized letters–would typically be created outside the system.

HOPS is now being ported to DEC Alpha processors, which are expected to yield a five-fold performance increase over the MacIntosh Quadra. Most tools are currently limited to MacIntosh workstations, although some applications have been built for Microsoft Windows and others will be added for Windows NT.

HOPS can be leased for $20,000 per month, or purchased outright for $500,000 plus $50,000 per year for maintenance and support. Customization and other consulting are additional, as are special modules such as the geographic information system. There are about fifteen current installations, plus several more on the way, with particular concentrations in insurance, retail and scientific data analysis. The typical sales approach is to take a two or three gigabyte set of sample data, load it into HOPS, and then demonstrate the system’s capabilities in a live session at the prospect’s office. HOPS International does this for the cost of the transportation and lodging.

* * *

David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.

Leave a Reply

You must be logged in to post a comment.