David M. Raab
DM News
July, 2000
A few years back, marketing technology experts often debated the virtues of “open” vs. “proprietary” database management systems. Open databases, meaning products using the industry-standard Structured Query Language (SQL), were considered good because they didn’t lock users into a single vendor’s applications. Proprietary databases, which use their own private query language, often were significantly faster and more flexible than their open competitors.
Today, the dust has settled. The open databases have won: it’s just about impossible to sell a system built on a proprietary database. Vendors who do use such technologies position them as “query accelerators” that supplement rather than replace an open database. Or they bury the technologies deep within their systems and don’t talk about them at all.
But it turns out that the “open” road has some pretty high exit tolls. While SQL does allow the same application software to run on different databases, corporate IT departments still find it painful to switch from one database to another. This is partly because database vendors add non-standard features to SQL, so applications tuned for optimal performance on one database must be revised to run well on another. But the chief reason is that databases are run by experts who usually specialize in a single product. This means running multiple databases requires hiring more than one set of experts–a difficult and costly endeavor.
The practical result is extreme bias in favor not just of open databases in general, but of whatever specific database is dominant within a particular installation. Even companies that use several different databases are reluctant to further complicate their lives by adding a new one.
What this implies for database software vendors is that SQL compatibility doesn’t truly eliminate the barriers to adoption. Even demonstrably superior performance is no guarantee of success. In recent years, quite a few technically impressive, SQL-compatible products–including Red Brick, HOPS, Sybase IQ and Mercantile IRE–have failed to win a broad following.
Freedom Intelligence (Freedom Intelligence, 519-884-4491, www.freedomintelligence.com) faces this same challenge. The system builds a highly compressed, fully indexed database that can accept standard SQL queries and return results five to ten times faster than conventional relational databases. It does this with data compression, sorting and indexing techniques that are often applied in such systems, plus some special tricks the vendor will not reveal.
Freedom Intelligence imports data from a conventional data source, either a relational database or a comma-separated flat file. The connection is made through standard ODBC (Open Database Connectivity) drivers, which are available for almost any likely source. The system automatically displays the ODBC data dictionary and lets the user pick which elements to import and index. Optimal indexes are built automatically, so setting up the system requires little specialized technical support.
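For readers who want to see what this step looks like in practice, here is a minimal sketch of browsing an ODBC source's data dictionary, the same information Freedom Intelligence presents at import time. It uses Python's pyodbc library as a generic ODBC client; the DSN and table names are hypothetical, not taken from the product.

    # A minimal sketch: list the tables and columns an ODBC source exposes,
    # using Python's pyodbc library as a generic ODBC client. The DSN
    # "CustomerWarehouse" and the table name "customers" are hypothetical.
    import pyodbc

    conn = pyodbc.connect("DSN=CustomerWarehouse")
    cursor = conn.cursor()

    # The tables the source's ODBC driver advertises.
    for table in cursor.tables(tableType="TABLE"):
        print(table.table_name)

    # The columns of one table: the "data dictionary" a user would browse
    # when picking which elements to import and index.
    for col in cursor.columns(table="customers"):
        print(col.column_name, col.type_name)

    conn.close()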
During the import process, the system builds the specified indexes and creates its own compressed copy of the original data. Having this copy available lets Freedom Intelligence output any data element, including items–say, first names–that are not necessarily indexed. It also lets Freedom Intelligence operate independently of the source systems themselves, so queries and selections do not interfere with other activities. But working from a frozen copy also means the data is not up-to-the-minute, as most operational systems and a growing number of marketing applications require. Data is loaded at about one gigabyte per hour. This should rise to two gigabytes per hour in the next release, due by September 2000. Even at the faster rate, load time could pose problems for very large databases: a 100-gigabyte extract would take roughly 50 hours to rebuild from scratch.
Load time can sometimes be reduced by using incremental updates rather than building each new database from scratch. This is somewhat limited, since the system can add and delete records but not modify existing ones; a change to an existing record must therefore be expressed as a delete plus an add. Incremental updates process the new and existing data together at five to ten gigabytes per hour. The vendor says there is no degradation in performance after incremental loads.
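To make that delete-plus-add pattern concrete, here is a minimal sketch, not taken from the product, of comparing two extracts so that each changed record becomes one delete and one add. The record layout and keys are hypothetical.

    # A minimal sketch of preparing an incremental load when existing
    # records cannot be modified: a changed record becomes a delete of the
    # old version plus an add of the new one. Keys and records are
    # hypothetical.
    def diff_extracts(old, new):
        """old and new map a record key to its full record (a tuple)."""
        deletes = [key for key in old
                   if key not in new or old[key] != new[key]]
        adds = [(key, rec) for key, rec in new.items()
                if key not in old or old[key] != rec]
        return deletes, adds

    old = {1: ("Smith", "silver"), 2: ("Jones", "gold")}
    new = {1: ("Smith", "gold"), 3: ("Brown", "silver")}  # 1 changed, 2 dropped, 3 added

    deletes, adds = diff_extracts(old, new)
    print(deletes)  # [1, 2]: the changed record is deleted and re-added
    print(adds)     # [(1, ("Smith", "gold")), (3, ("Brown", "silver"))]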
The Freedom Intelligence data set takes one to three times as much space as the original raw data, so a 100-gigabyte source could occupy up to 300 gigabytes once loaded. This is efficient compared with conventional database indexes, but still a problem for very large installations. It has not yet been a critical issue, since the largest current installation holds just under 20 gigabytes of data. The next release is expected to handle several hundred gigabytes. This is still much smaller than today’s multi-terabyte enterprise data warehouses, however. Mindful of this limit, the vendor positions Freedom Intelligence as a tool to build subsidiary data marts, not a replacement for the central warehouse itself.
Once the data is loaded, standard SQL-based tools can query it through ODBC. The current version of Freedom Intelligence supports most SQL functions but is missing some features for complicated subqueries and a few string functions. The next version is expected to support the full 1992 ANSI SQL standard. Still, the existing capabilities already make Freedom Intelligence more powerful than some index-based systems, which cannot do calculations on their compressed data and may have problems handling fields with large numbers of unique values. Freedom Intelligence also overcomes the limits that some specialized databases place on data structures: it can handle large numbers of tables, can join many tables in the same query without performance problems, and need not prespecify joins when the data is loaded. In addition, the system provides extensive text-search capabilities, including the ability to look for substrings and for groups of words in context. It also does well with hundreds of simultaneous users.
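As an illustration, a query like the following, again through pyodbc with hypothetical DSN, table and column names, exercises both the unconstrained joins and the substring search described above. It is a sketch of standard SQL sent through ODBC, not an example supplied by the vendor.

    # A hedged sketch: a join plus a substring search, sent as standard SQL
    # through ODBC. The DSN "FreedomMart" and all table and column names
    # are hypothetical.
    import pyodbc

    conn = pyodbc.connect("DSN=FreedomMart")
    cursor = conn.cursor()

    # Join two tables that were loaded without prespecified joins, and
    # filter on a substring within a free-text field.
    cursor.execute("""
        SELECT c.last_name, SUM(o.amount) AS total_spent
        FROM customers c
        JOIN orders o ON o.customer_id = c.customer_id
        WHERE c.comments LIKE '%catalog request%'
        GROUP BY c.last_name
    """)
    for row in cursor.fetchall():
        print(row.last_name, row.total_spent)

    conn.close()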
These features may make Freedom Intelligence the most capable and easiest-to-administer high-performance SQL database yet. Whether this is enough to overcome scalability limits, batch-only updates and IT department resistance remains to be seen.
The initial version of Freedom Intelligence was released in late 1998. It runs on Windows NT/2000 servers and has two live installations plus several pilot sites. The next release will add support for Unix servers, which should help improve scalability and performance. Pricing begins at $50,000 for a few users and five to ten gigabytes of data, and can reach $500,000 to $1 million for a large enterprise installation.
* * *
David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.