2008 Oct 01
How to Judge a Columnar Database, Revisited
David M. Raab
DM Review
October 2008

Last December, this column ran a piece on “How to Judge a Columnar Database.” When someone quoted it to me recently, I realized it had already become outdated. The reason is that a new generation of vendors, including Vertica, ParAccel, Calpont, and InfoBright, has joined older columnar systems from Sybase IQ, Alterian, SmartFocus, Kx Systems, and 1010Data.

In general, the new systems assign dedicated disk drives to each processor (“shared-nothing”) while the older systems apply multiple processors to a shared storage pool. Each approach has its own strengths and weaknesses, which introduces some new differences to consider. In addition, the broader adoption of multiple processors and 64-bit memory removes some of the performance constraints that impacted earlier systems.

Let’s first revisit the original list of items to see which are still relevant. Then we’ll add a few new ones.

- load time, incremental loads, and data compression. These all reflect the need for a columnar database to restructure data originally stored in another format. They can be critical bottlenecks at large data volumes, and older systems varied widely in their performance. As a result, these were probably the most important considerations when comparing older columnar systems.

Today, multiple processors, larger memory space and more scalable disk storage have greatly improved load and compression rates in nearly all columnar systems. Substantial differences still exist, but performance of even the slower systems is likely to be adequate. As a frame of reference, leading columnar databases several years ago loaded around 10 gigabytes per hour, while today’s best products load 150 to 200 gigabytes per hour. Many can reach whatever load rates are needed by simply adding more processors. Additional processing power also allows greater compression of stored data, since systems can decompress it more quickly after it is read from disk. (Decompression is not always needed: many operations run on the compressed data directly.)

Bottom line: you still need to consider load and compression performance, particularly if you’re dealing with terabytes of data (and aren’t we all?) or need quick incremental updates. But these issues no longer head the list.

- structural limitations: some early columnar databases imposed significant constraints on data structure, such as requiring that all tables use the same primary key. These crude limitations are largely gone. However, some of the newer systems do have more subtle limits, such as performing better on star schemas than normalized architectures. If you expect to use a star schema anyway, you probably won’t have a problem with any modern columnar system. But if you use other structures, check carefully how well a given product will perform on them.

- access techniques: while many early columnar systems were not SQL-compatible, all of today’s products offer some level of SQL access. (Some still offer their own language for functions that SQL handles poorly, such as time series analysis.) Still, there are many levels of SQL compatibility. You’ll want to dig into the details for each system, particularly if you want to reuse existing SQL queries or SQL-based reporting tools.

- performance: this is one issue that hasn’t changed. Columnar databases are all fast, but performance on particular tasks can vary substantially from system to system. Performance may also depend on system configuration, so it is especially difficult to test. But performance is probably why you’re considering a columnar system in the first place, so you’ll certainly want to be sure you know what you’re getting.

- scalability: any columnar database you’re likely to consider will handle a couple of terabytes of data. But not all are proven at the fifty- or hundred-terabyte level. In addition, some systems are significantly better than others at handling mixed query types and large numbers of simultaneous users. If you have needs like these, make sure your chosen vendor has similar installations in production, or that they can demonstrate the necessary performance in a realistic test.
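
To make the earlier point about compression concrete, here is a minimal Python sketch of one common columnar technique, run-length encoding, along with two operations that run on the compressed runs without expanding them. It is an illustration only: the function names and sample rows are invented for this example, and real products use their own, more elaborate encodings.

```python
from itertools import groupby


def rle_encode(values):
    """Compress a column into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_sum(runs):
    """Sum a numeric column by working on the runs, not the expanded rows."""
    return sum(value * length for value, length in runs)


def rle_count_equal(runs, target):
    """Count the rows equal to target, again without decompressing."""
    return sum(length for value, length in runs if value == target)


# Rows as they might arrive from a row-oriented source (hypothetical data).
rows = [
    ("east", 10), ("east", 10), ("east", 10),
    ("west", 25), ("west", 25), ("east", 10),
]

# Restructure into one list per column, then compress each column separately.
region_column = [region for region, _ in rows]
amount_column = [amount for _, amount in rows]
region_runs = rle_encode(region_column)
amount_runs = rle_encode(amount_column)

print(region_runs)
print(rle_sum(amount_runs))
print(rle_count_equal(region_runs, "east"))
```

Six rows collapse to three runs per column here, and both the sum and the count touch only those runs. This is also why sort sequence matters in such systems: sorted columns produce longer runs.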

So much for the old issues. None has vanished but the frame of reference has shifted for many. In addition, here are some new considerations:

- fault tolerance: many of the newer systems store data redundantly, either within or across nodes. This is done largely for performance purposes, but can have side benefits of easy—possibly even interruption-free—recovery from hardware failures. Many columnar databases are used for analytical work where some downtime is acceptable. But if it matters for you, be aware that products differ substantially in this area.

- data types: columnar databases have traditionally analyzed conventional structured data. But a few also support XML, text analysis and even binary objects. As with fault tolerance, you may not need this, but should know that it’s available if you do.

- database administration: one of the traditional benefits of columnar databases was their simplicity. Basically users dumped in the data and the system organized it for them. It’s still possible to work this way, but many systems now provide options such as multiple index types or sort sequences. This lets users tune the system to their requirements, but also means database administrators must make the right decisions. It’s still true that any columnar system should be easier to manage for a given analytical application than a relational database. But you’ll want to assess the differences in administrative workload among the columnar products themselves.

* * *

David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.

2008 Sep 01

Demand Generation vs. Marketing Automation: What’s the Difference?
David M. Raab
DM Review
September 2008

Last month’s column described the importance of lead scoring within “demand generation” systems. But perhaps we should step back to describe those systems in general. Many people still confuse them with “marketing automation” or “campaign management” products.

It’s an easy mistake. Both sets of systems maintain a contact database used to drive outbound marketing campaigns. Both provide reporting and analysis tools to understand promotion results. Both sometimes include marketing planning, content management, and project management.

The obvious difference is that demand generation software is nearly always used by business marketers, while marketing automation and campaign management are used primarily to reach consumers. But this distinction is less useful than it seems. Many traditional marketing automation systems are also used for business marketing. And many small consumer marketers use lower-end demand generation software.

The more meaningful distinction is probably between companies that market directly to their customers and those that sell through sales people. Traditional marketing automation systems are used primarily in financial services, travel, retail and communications companies. Their campaigns sell specific products, even though the sale may be completed at a retail store, bank branch or sales agent. By contrast, demand generation systems attract and nurture leads which will be handed to sales departments when they are ready to buy. The salespeople will then identify needs, select appropriate offers, and close the deal.

Another, even simpler difference is that demand generation systems work with leads—that is, people who have not yet made their first purchase—while marketing automation systems focus on existing customers.

The fundamental distinction between nurturing leads and managing customers drives the major differences between the two sets of products. These include:

- focus on Internet behavior. Demand generation systems drive prospects to the company Web site, monitor their behavior, and infer when they are ready to buy. Most of the herding is done with emails, which themselves can report whether they have been received, opened, clicked on, etc. Demand generation systems track Internet behavior in great detail because it’s one of their two main information sources. (The other is user-provided information such as surveys.) By contrast, marketing automation systems work primarily with promotion and purchase histories. Non-purchase behaviors such as Web site visits are given much less weight if they are considered at all.

- integrated Web pages and analytics. Demand generation systems provide tools to build Web surveys and microsites and to capture data from these directly. This reflects their focus on online media. Marketing automation systems can sometimes build Web pages, but they largely assume this will be done externally. Similarly, they usually rely on third-party Web analytics systems to capture information about visitor behaviors.

- tracking of anonymous visitors. Tagging anonymous Web site visitors with cookies, building a history of their behavior, and later merging that history with the visitors’ identities are central features of demand generation systems. Marketing automation systems may not even track anonymous visitors, and certainly do not consider this a core capability. Have I mentioned that they are primarily interested in communicating with known customers?

- multi-step, highly reactive campaigns. Treatments within a demand generation campaign can vary quickly and significantly in response to an individual’s behavior. Marketing automation systems consider this an advanced feature that only their most sophisticated users are expected to deploy. In contrast, this is a fundamental capability for even basic demand generation products. In fact, finding ways to simplify deployment of multi-step campaigns is one of the main competitive battlegrounds in the industry.

- limited segmentation. This is the flip side of campaign complexity. Demand generation systems start with limited information about their targets, so they build campaigns that adjust treatments as information is gathered during execution. Marketing automation systems begin with a much richer customer history, so they select treatments using complex segmentations when the campaign is set up.

- lead scoring. Demand generation systems support elaborate scoring calculations to measure when a lead is ready for sales. Although marketing automation systems often support user-defined calculations and predictive modeling, they lack specialized lead-scoring functions such as depreciating the points allocated to older events or capping the points generated by a particular type of event. This is another competitive arena for demand generation vendors.

- simple database structure. Both demand generation and marketing automation systems maintain databases with information about individuals. But the base structure in demand generation systems is usually just a lead table and contact history. A modern marketing automation system nearly always includes purchases, and often additional information such as account balances and customer service interactions. The theory among demand generation vendors is that detailed information will be kept in the company’s customer management systems. However, most demand generation systems do let their clients extend the data structure through custom tables. For example, some version of purchase history is needed to measure campaign return on investment.
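
The specialized lead-scoring functions mentioned above, depreciating the points for older events and capping the points from any one event type, can be sketched in a few lines of Python. This is a simplified illustration rather than any vendor's actual method: the rule table, the point values, and the half-life form of depreciation are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical rules: points per event, a cap per event type, and a half-life
# (in days) that controls how fast older events lose value.
RULES = {
    "page_view": {"points": 2.0, "cap": 10.0, "half_life_days": 30},
    "download": {"points": 10.0, "cap": 30.0, "half_life_days": 60},
    "email_open": {"points": 1.0, "cap": 5.0, "half_life_days": 14},
}


def score_lead(events, today, rules=RULES):
    """Score a lead from (event_type, event_date) pairs."""
    per_type = defaultdict(float)
    for event_type, event_date in events:
        rule = rules.get(event_type)
        if rule is None:
            continue  # event types without a rule earn no points
        age_days = max(0, (today - event_date).days)
        # Depreciation: an event loses half its points every half-life.
        decay = 0.5 ** (age_days / rule["half_life_days"])
        per_type[event_type] += rule["points"] * decay
    # Capping: no single event type may contribute more than its cap.
    return sum(min(total, rules[name]["cap"]) for name, total in per_type.items())


today = date(2008, 9, 1)
events = [("page_view", today)] * 10 + [("download", today - timedelta(days=60))]
print(score_lead(events, today))  # 15.0
```

Ten page views today would earn 20 points but are capped at 10, while a download from 60 days ago has lost half of its 10 points, for a total of 15.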

Many of these differences are more a matter of emphasis than fundamental technology. The products within each group also vary widely. So you still need to identify your own business requirements and assess how each system would meet them. But understanding the distinction between the two categories should make it easier to narrow in on the products best suited to your needs.

* * *


2008 Aug 01
Lead Scoring Takes Center Stage
David M. Raab
DM Review
August 2008
In case you haven’t noticed, the Internet has fundamentally changed how people gather information. This has affected business marketers in particular. Because Web sites now provide so much information that previously came from salespeople, marketers stay engaged with prospects for much longer. This means they must do a better job of understanding and responding to prospect interests, and of deciding when it’s finally time to turn them over to sales.

The change from simply generating leads to actively nurturing them is probably the main engine propelling growth of “demand generation” vendors like Eloqua, Vtrenz, Manticore, Market2Lead and Marketo. Their products, and at least a dozen competitors, manage traditional lead generation campaigns. But the goal is no longer just getting a name and handing it to sales. Instead, it’s to draw people to the company Web site, where they will join equally anonymous visitors from print ads, Web ads, trade shows, search engines, and other sources.

The real work of the demand generation system starts with that first Web visit. It begins tracking visitors’ behavior, trying to deliver the information they need at the moment they need it, and convincing them to surrender information about themselves in return. If this sounds like a seduction, that’s because it is one.

The moment of truth comes when marketing sends the lead over to sales. If the lead isn’t ready, then sales will complain about low quality. If marketing waits too long, opportunities may be missed. Like Goldilocks’ porridge, the leads must be not too cold or too hot, but just right.

The instrument used to measure their temperature is lead scoring. Demand generation vendors recognize how important this is and are rapidly improving their scoring systems in response. Typical enhancements include increasing the scope of data that can be scored, adding precision to the score calculations—for example, by reducing the value assigned to each event based on recency—and making it easier to set up the scoring rules.

But these efforts face a fundamental problem. Traditional lead scores were built by marketing and sales experts deciding what weight to assign to each attribute. This worked well when not much information was available: typically little more than source, company, job title, and BANT (budget, authority, needs, timeline), gathered at the start of the process. In fact, jointly defining the scoring rules was one of the best ways for marketing and sales to align their understanding of lead quality.

Today, the volume of data has exploded. Demand generation systems track each page view, document download, and email open. They combine information about different visitors from the same company, based on a shared Web domain. And they look at the timing of these events to understand when prospects’ interest is reaching a peak.
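
As a rough illustration of the company roll-up just described, the Python sketch below groups individual visitors' events by domain. It assumes the domain is taken from each visitor's email address, which is only one way a system might identify a shared domain. The sample addresses and the short list of consumer mail domains to ignore are invented for the example.

```python
from collections import defaultdict

# Hypothetical list of consumer mail domains that say nothing about the employer.
FREE_MAIL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}


def domain_of(email):
    """Return the lower-cased domain part of an email address."""
    return email.rsplit("@", 1)[-1].strip().lower()


def rollup_by_domain(events):
    """Group (email, event_type) pairs into per-domain activity summaries."""
    summary = defaultdict(lambda: {"visitors": set(), "events": defaultdict(int)})
    for email, event_type in events:
        domain = domain_of(email)
        if domain in FREE_MAIL_DOMAINS:
            continue  # cannot be tied to a particular company
        summary[domain]["visitors"].add(email.lower())
        summary[domain]["events"][event_type] += 1
    return summary


events = [
    ("ann@acme.example", "page_view"),
    ("bob@ACME.example", "download"),
    ("ann@acme.example", "page_view"),
    ("carl@gmail.com", "page_view"),
]
activity = rollup_by_domain(events)
acme = activity["acme.example"]
print(len(acme["visitors"]), dict(acme["events"]))
```

Two visitors from the same company now appear as one account with three events, which is the kind of combined view a scoring rule could then weigh.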

Rules of thumb collapse under so much detail. Marketers need formal data mining projects to identify the most important events and behavior patterns. These projects correlate prospect attributes and behaviors from the demand generation system with results captured in the company’s sales automation applications.

Assembling this data is relatively easy, since the demand generation systems are designed for tight integration with sales automation systems, and Salesforce.com in particular. But these systems do not provide data mining and predictive modeling tools.

This is no problem for data mining, where most work is done by statisticians who prefer their favorite systems anyway. But for predictive modeling, most scoring formulas are too complex to replicate manually in other systems. The demand generation systems will eventually need to import scoring formulas from external modeling systems, or to call those systems to generate the scores and return them.
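
One way to picture importing a scoring formula is sketched below in Python: a model built elsewhere is exported as an intercept plus a set of coefficients, and the receiving system simply applies them to each lead. The JSON layout, the feature names, and the choice of a logistic regression are all assumptions made for the illustration; no particular vendor's format is implied.

```python
import json
import math

# Hypothetical export from an external modeling tool.
MODEL_JSON = """
{
  "intercept": -3.0,
  "coefficients": {
    "page_views": 0.1,
    "downloads": 0.5,
    "is_director_or_above": 1.0
  }
}
"""


def load_model(text):
    """Parse the exported formula into an intercept and a coefficient dict."""
    model = json.loads(text)
    return model["intercept"], model["coefficients"]


def predict_probability(intercept, coefficients, features):
    """Apply a logistic regression formula to one lead's features."""
    z = intercept + sum(
        weight * features.get(name, 0.0) for name, weight in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


intercept, coefficients = load_model(MODEL_JSON)
lead = {"page_views": 10, "downloads": 2, "is_director_or_above": 1}
print(predict_probability(intercept, coefficients, lead))
```

The point of the design is that the weights live in data rather than in hand-built rules, so a statistician can refit the model in a preferred tool and the marketing system only needs to reload the file.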

Other lead scoring enhancements will follow. Current systems require marketers to manually assign a weight to each event or class of events. The work involved limits how precisely the weights can be tuned to each item. But content analysis systems already exist that could automatically classify the actual message within each item, allowing more precise weighting with no manual effort. Similarly, existing systems that search the Web and assemble information about a company or individual could easily enrich the prospect profile with new scoring inputs.

Content classification and Web searches will initially be provided by third party systems. The demand generation vendors may eventually build these directly into their products, but a better solution in most cases will be to simplify integration with external specialists through APIs or Web services. This will let the demand generation vendors focus on their core products and let their clients benefit from continued progress in other fields.

These enhancements will be valuable even if they are not immediately integrated with lead scoring. Salespeople already use demand generation systems to generate automated alerts based on customer and lead behaviors, and then to list those behaviors for manual review. Better content classification and automated external search could make the alert rules more powerful and better organize the data presented for review.

The fundamental challenge for demand generation vendors will be to add these and other capabilities without making their systems too hard for marketers to use. This is a painfully common dynamic in the software industry: competitive pressures force vendors to add features, and complexity grows as a result. Demand generation vendors face an unusual counter-pressure from systems targeted at small businesses, which are purposely kept simple. We’ll see if this keeps them from following the usual path.

* * *
