2009 Sep 01

Business Intelligence and the New Analytical Databases
David M. Raab
Information Management
September / October 2009

Business intelligence software has evolved to fit an environment dominated by relational databases. This mostly meant overcoming the inherent inefficiencies of executing analytical queries against normalized data models, typically by reorganizing the data into star schemas, cubes, indexes or other specialized structures. Sometimes the restructured data itself resides in a relational database and sometimes it doesn’t.

Because so many enterprises have invested heavily in these business intelligence applications, new database products must be compatible with them. But products built specifically for analytical use (ParAccel, Vertica, Netezza, Sybase IQ, etc.) often don’t face the constraints that the BI applications were designed to overcome. Attaching one of those engines to standard BI software is like towing a farm wage with a dump truck: you get better performance but are not taking full advantage of the technology.

So let’s do a little thought experiment and imagine a BI system designed with analytical databases in mind. Probably the most important feature of these databases is that they are relatively insensitive to data structure. (Caveat: each product is different; some are more sensitive to structure than others.) In many cases this means the system can efficiently execute queries against data in its original table structure, without the denormalization, indexing and aggregation used by conventional BI tools. This offers tremendous savings in time and effort, since much of the work in conventional BI systems is designing and populating these alternative structures. What’s worse, at least from an end-user viewpoint, this restructuring must be done by technical specialists. This means that any project must go through a time-consuming cycle of specification, development, review and modification as users struggle to understand their needs and the technicians struggle to understand the users.

But just because the analytical systems can query complex data structures, it doesn’t follow that end-users can understand those structures. Only a sadist would copy hundreds of database tables from an enterprise system and ask an end-user to query them directly. To take advantage of the analytical databases, a BI system needs a metadata layer to present the underlying tables in a way that makes sense to non-specialists. Indeed, one of the best reasons to retain existing BI software is that at least some of the products already contain extensive facilities to decouple the underlying physical data structure from the structures that are presented to users. To the extent that existing BI systems are not dependent on particular physical structures, they can provide excellent models for features that make end-users more productive.

One area where existing BI systems won’t provide much help is allowing end-users to connect to new data sources. In the current systems, this is a task for technicians, so the interfaces are designed with technical users in mind. But a major advantage of the new databases is how easily they can integrate new data with existing tables. A BI system designed to take advantage of this would allow end-users to do it for themselves. This implies easy features to physically load the data, explore it, define relationships between tables, and set rules for how key values are matched.

Data exploration is a particularly important opportunity, because the analytical databases let end users examine the new inputs immediately. This means users can quickly judge the data in each field, uncovering good and bad surprises that might otherwise remain hidden until weeks of technical effort had been invested in deploying them. A BI interface has to support this by providing appropriate data profiling and analysis tools. It must also support the next step, relationship definition, by making it simple to define trial matches between different tables and see how closely they correspond.

Value matching is particularly important as systems incorporate increasingly heterogeneous data sources. Exact matches between two columns are easy, but something as simple as a customer number stored as a numeric field in one system and a text field in another can cause trouble with conventional databases. Alternative date formats are another trivial difference that can still be difficult to overcome. Fuzzy matches, such as variant forms of names and addresses, are much more challenging but very important as users incorporate a wider range of inputs. In the conventional BI world, removing these discrepancies was another task handled by the technical experts as they set up the database. If the technical experts themselves are removed from the process, this matching must be handled automatically by the database or the BI system, or end-users must have tools to manage it for themselves.

Let’s add one more requirement. Although some analytical databases allow only a single logical data structure for each physical set of tables, most resemble SQL in letting users define table relationships within each query. Databases that allow this flexibility need interfaces that make it available to end-users, either when they are formulating queries or when they are setting up the presentation layer.

All of these features support the fundamental goal of letting end-users ask unanticipated questions without needing a technician to build a new cube, reorganize the data tables, or expose details that were lost in a previous aggregation. Today’s analytical databases make this possible, but new business intelligence systems are needed for them to deliver.

Next month’s column will look at some existing BI systems that offer these capabilities today.

******************************

Last month’s column described how business intelligence software could be adapted to take full advantage of today’s analytical databases. The premise was that current BI systems are designed to overcome weaknesses that applied to conventional relational databases but not the newer analytical engines.

It wasn’t very hard to come up with the list of features for these next-generation BI systems because they already exist. The pioneers are vendors like QlikTech, illuminate Solutions, ADVIZOR Solutions, Tableau Software, TIBCO Spotfire, Lyzasoft, smartFocus and Alterian who did not originally constrain themselves to be compatible with SQL. Instead, they developed interfaces that let them take advantage of their proprietary database engines in ways that SQL wouldn’t allow. Some of these vendors provide APIs to let external systems access their data, and some of these do support SQL queries. But SQL front-ends cannot take full advantage of their database features.

The main advantage these systems have over conventional BI software is the power they provide to business users. Adding a new source or building a new report doesn’t require a technician to modify the data structure or generate a new cube. Most of these products also provide at least some data preparation capability through a scripting language or process flow. This won’t replace the integration needed to build an enterprise data warehouse but it lets users outside of IT create applications that go well beyond simple reporting. Some technical skill is needed, but it’s on par with an Excel power user, not a database architect.

Of the systems listed above, only QlikView – with more than 11,000 installations and over $120 million revenue in 2008 – can really be considered a major force in the industry. The other products have much smaller bases, even though several have been around for more than a decade. This raises an obvious question: if the systems are really so great, then why haven’t they been more widely adopted?

One constraint is scalability. Many of these products originally ran on 32 bit architectures that limited the in-memory systems to two gigabytes of data and the disk-based systems to 2 billion rows. Load speeds were also originally measured in tens of gigabytes per hour. Those specs are more than adequate for many purposes, and probably larger than the data cubes underlying many conventional BI systems. Moreover, many of these constraints are now removed by 64 bit architectures and parallel processing. But handling the hundreds of terabytes in even a small enterprise data warehouse is still an issue. So, at best, these systems must coexist with a conventional database rather than simply replacing it.

The more important reason is probably IT department resistance. IT departments are generally conservative when it comes to unfamiliar vendors. But, beyond that, these vendors’ decision not to match SQL standards makes their databases seem dangerously isolated from the rest of the corporate infrastructure. Any data loaded into the systems can only be accessed via the vendors’ own tools or APIs, which raises a red flag for IT departments wary of reliance on a single product. In addition, the unique technology of these tools means that special training is needed to use them – an investment that IT departments seek to avoid. The small installed base of these products also means that IT can’t plan to hire experienced staff that someone else has trained already.

Of course, IT departments do sometimes embrace new technologies, particularly when they appear to solve a pressing problem for the IT department. But remember that the main advantage of these tools is that they empower business users to do more for themselves. In theory this would be an advantage to IT departments, because it would reduce their workload or let them shift resources to other tasks. But in practice, if an IT department has already invested in conventional databases and BI applications, these systems offer few new capabilities. Asked to examine the new systems, IT people often honestly shrug, “I can already do that”. Giving similar capabilities to other people doesn’t strike them as particularly important. And, no one likes to make their existing skills obsolete.

As a result, adoption of these new tools has largely been restricted to situations where IT isn’t involved or IT itself lacks an incumbent alternative. For the lower cost systems, this often means they are purchased by business departments for their own use, outside the IT budget and with minimal IT support. Another sweet spot is small to mid-size companies with few IT resources and little existing business intelligence infrastructure. On the relatively rare occasions when a large IT shop purchases one of these tools directly, it is usually because its technology solves a particular problem that has proven impermeable to conventional methods.

The SQL-compatible analytical databases avoid most of these obstacles because they mesh more easily with existing infrastructure. But, precisely because they continue to rely on conventional, IT-centric BI applications, these systems offer fewer benefits to business users than the more radical alternatives. The vendors selling these systems and the business people eager to use them share the challenge of convincing IT departments and senior management to approve them. Next month’s column will discuss approaches to help make this happen.

******************************

The story so far: analytical databases and new business intelligence interfaces deliver radical improvements in BI price, performance and ease of use. But IT departments often fail to appreciate the benefits of systems that reduce the need for skilled technical support. Far-sighted business and technology managers need ways to present the advantages of these systems more effectively.

Before going further, it’s important to be clear that IT departments are not a villain in this tale. They have sound reasons to be cautious in accepting any new technology, since they will be held responsible if it fails. Systems that promise to give more control to end-users are legitimately worrisome because end-users will make mistakes that IT professionals would avoid and may undertake projects they do not realize are beyond their true competence. IT managers worry, often correctly, that they will still be blamed for any problems that result.

Of course, IT departments make mistakes too, including some that end-users would not. This is why the groups must work together. The goal is finding the right balance of tasks for the two groups. New technologies call for an adjustment of this balance. That is precisely the challenge posed by the new analytical systems.

Enough philosophizing. Companies don’t assess technologies in terms of the proper balance between end-users and IT. Most have a much simpler approach that boils down to assessing benefits vs. costs.

In the case of business intelligence systems, the benefits are often somewhat vague – things like “better decisions” – and particularly difficult to quantify in advance. When the question is one BI technology vs. another, companies can easily assume that the outputs, and therefore the benefits, will be the same regardless of which technology is chosen. Even though they may recognize this is not completely true, it’s a helpful simplifying assumption because it lets them focus solely on comparing the costs of the competing solutions.

One approach to promoting the new analytical systems, therefore, is showing that they have a substantially lower total cost of ownership (TCO).

As with any TCO calculation, the trick is in knowing which elements to include. The main benefit of TCO is that it helps buyers look beyond the out-of-pocket investment – typically the license fee in the case of software – to include the cost of services and on-going support. But this definition itself is still too narrow, because it doesn’t capture the time end-users spend working with the system. One of the main benefits of the new analytical technologies is giving end-users a greater ability to do things for themselves. Although this might seem to increase the amount of time they spend, in practice it eliminates the effort devoted to explaining their needs to technical specialists, reviewing what those specialists produce, and then often revising the requirements based on the results. The result may well be a net decrease in the time that business people spend to complete a given project.

There is also an intangible, but quite real, benefit from letting business people work directly with the data. Doing this provides an opportunity to gain insights that they miss if they only review the results of someone else’s labor. Even though the value may be hard to calculate, it’s intuitively clear that an hour spent analyzing data is a more valuable use of a business analysts’ time than an hour spent explaining his specifications to IT.

In fact, it’s possible to argue that a business analyst generates value only while doing analysis and generates cost while doing anything else. By this logic, a system that lets analysts spend more time on analysis and less on meetings simultaneously increases value and reduces cost. However, that’s a rather extreme position related to the analytical systems because much of the new “analysis” time is really spent doing system development. Still, most analysts would probably feel working through development issues is still more productive, in terms of learning new information, than writing up specs and explaining them.

Finally, although it should be obvious that shifting work from IT to business users will reduce IT costs, a conventional TCO analysis may not show this. It’s quite easy, and in many ways only fair, to prepare an “apples to apples” comparison that assumes IT will continue to provide the services it does today. But an experienced BI specialist can often perform a given task almost as quickly with the existing systems as with the new ones. Comparing those two times ignores the fact that with the new system the task would be performed by a business user instead.

In short, to properly capture the benefits of the new analytical systems, traditional TCO analysis must be extended to include the costs of end-user time and to capture the shift in labor to end-users from IT.

Conventional TCO analyses also ignore the cost of waiting for an answer. In cases like inventory and price optimization, the company can calculate the precise penalty for every second of delay. Most BI applications are less time-sensitive, but there is still a real difference between getting an answer today and getting that same answer one month from now. Next month’s column will look at ways to factor this time element into the system evaluation.

******************************

Last month’s column described how total cost of ownership (TCO) analyses must be expanded to properly capture the savings of new analytical technologies. It’s tempting to measure cost alone because quantifying the value from competing BI solutions is so difficult. But value is too important to ignore completely. Even if you can’t measure it directly, it’s worth finding an indirect approach.

One method is to consider time: even if we don’t know what an answer is worth, we are pretty certain its value is higher if we get it sooner. Phrases like “time to answer” or “time to insight” to clarify that they represent an end-to-end measure rather than individual components such as response time.

This is important for the new technologies because many of their advantages come from eliminating non-computer tasks such as design time or meetings. These are not captured in traditional benchmarks or processing specifications. Traditional analyses also often ignore the time spent waiting for technical staff to deliver a solution or the delays from several rounds of rework as users react to initial results. A good “time to result” analysis might provide two broad measures:

- time to first query. This measures elapsed time from the decision to use a technology to when the system can process its first query. It include contract negotiation, hardware acquisition, software installation, database design, data loading and indexing, front-end tool integration, and training for both technical and business staff. Although it may seem odd to include something as non-technical as contract talks, bear in mind that some cloud-based solutions can literally reduce these from months to minutes. That is a difference worth capturing.

- time to answer. This measures the time to complete an actual BI project, including problem specification, data preparation, result analysis, and subsequent rounds of revisions.

Both of these analyses would include conventional measures of labor hours and machine processing time. But they also incorporate the waiting times – for hardware, training, meetings, approvals, and fitting tasks into a busy schedule – that are often ignored but determine when a project is really completed.

Like any benchmark, each “time to result” analysis must be tailored to a company’s specific situation. Analysts will need to define scenarios with the particular data sources, staff skills, and questions involved. One practical way to develop these scenarios is to build a narrative based on previous projects, starting with the original specifications and tracking how these evolved. This narrative can then be converted into specific tasks, such as adding new sources, refreshing existing data, redesigning queries, and creating new reports. Wait times in particular are difficult to assess, because they depend on the priority assigned to any one project. But records of past service levels should provide an estimate of what can be expected on average.

The “time to result” analysis must also distinguish set-up tasks from repetitions. Hardware for a new system is only ordered once, while data will probably be refreshed at regular intervals. The theoretical ideal would be to simulate a year’s worth of projects, taking into account the frequency of different tasks as well as the time for each task. In practice, few companies are likely to be so thorough. But they should still keep these distinctions at least informally in mind.

Set-up tasks also highlight the difficulty of comparing incumbent technologies with new solutions. The first project with a new technology will incur training costs and delays, while doing the project with an existing technology will not. This is a real difference, so it’s wrong to simply ignore the benefits of using existing resources. But it’s also only a one-time cost, which can legitimately be amortized over several years of projects. Again, this requires envisioning a project portfolio of some sort so you can estimate how many projects to include.

Note that set-up costs can sometimes work in favor of a new technology. If the company has steadily expanding or rapidly fluctuating needs, then a technology that makes it easy to add new capacity or train new users will eventually be more economical than an existing technology that is harder to expand. Cloud-based systems, with their low set-up costs and brief waiting periods, are the extreme example of how set-up costs can provide an advantage over the incumbent.

Of the two measures proposed above, “time to first query” applies more to database engines and “time to answer” applies more to BI applications. Even though the new analytical database engines are broadly similar in using columnar data structures on “shared nothing” parallel hardware, there are still substantial differences in how they deal with database design, compression, indexing, incremental updates, new data sources and other tasks. These can impact both the processing and waiting components of “time to first query”. The impact of these differences also varies greatly depending on the volumes and complexity of the data being processed.

“Time to answer” is generally more sensitive to the amount of work that end-users can perform for themselves. This depends primarily on the interface built into the BI application. Today, the most user-empowering interfaces are attached to the non-SQL analytical databases like QlikView, iLuminate, and ADVIZOR. But as BI systems take advantage of the flexibility provided by the SQL-based analytical engines, time to answer will be an increasingly important measure for those databases as well.

Summary

This series started with a simple question: what would business intelligence applications look like if they weren’t designed around the constraints of conventional relational databases? We’ve seen that the systems would be much easier and cheaper to operate and that they would make end-users more productive because they could do much more for themselves. We also described some existing systems that match this description and looked at some of the obstacles faced by advocates trying to get them adopted. Finally, we suggested how two evaluation methods – expanded Total Cost of Ownership and new “time to result” measures – can highlight the new systems’ advantages.

That’s a lot of ground to cover, but consider the topic: reinvention of business intelligence as we know it. Companies must make many changes to adapt to the new technologies but the results – a huge improvement in the value delivered by business intelligence systems and a huge reduction in cost – is well worth the effort.

*      *      *

David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.

2009 Jun 01

When Does On-Demand Business Intelligence Make Sense?
David M. Raab
Information Management
June 2009

Back in February, this column asked “Does On-Demand Business Intelligence Make Sense?” and answered a tentative “Yes”.    The gist of the argument was that the major obstacle to BI success is the skilled labor needed to build the systems, and on-demand vendors can reduce this through strategies including automation, end-user-driven interfaces, advanced database engines, and pre-built solutions to specialized problems.

True as this may be, it is coming at the issue backwards.  On-demand BI should not be a solution in search of a problem.  The real goal is to identify situations where on-demand BI is a better choice than conventional solutions.  In other words, the right question is not whether on-demand BI makes sense, but when.

The general answer is that on-demand BI is the right choice when it meets two conditions.  First, it must solve problems that conventional solutions cannot.  Second, it must not face insurmountable new problems created by the on-demand model itself.

Regarding the first condition, we’ve already identified the required amount of skilled labor as the major problem faced by conventional BI solutions.  But labor isn’t the only potential roadblock.  A project might cost more than it’s worth, take longer than the time available, or require more hardware than the company can immediately deploy.  On-demand solutions generally offer advantages in these areas, so projects that face these issues are good candidates for an on-demand approach.

The second condition addresses on-demand weaknesses.  Common objections to on-demand systems include: the perceived security risk of moving data off-premise; difficulties if close integration is required with other company systems; the cost of outages if the on-demand system goes down or you can’t connect to it; the risk of becoming dependent on a system you don’t control; and the impossibility of enhancing or customizing the application if it doesn’t meet your present or future needs.

This is where picking the right application and the right on-demand vendor become more important.  Let’s start with security.  Most on-demand vendors meet high security standards because their customers demand it.  Still, particular applications or data sets may require even tighter security than a particular vendor can provide.  This might limit your choice of on-demand vendors to those who can prove they meet that standard, force you to load only less-sensitive data, or prevent you from using any on-demand system at all.

Similar considerations apply to the other issues.  Tight integration between an on-demand system and other systems is much easier than it used to be, because some on-demand vendors have provided APIs or otherwise engineered their systems to permit it.  But the details vary from vendor to vendor.  On-demand BI systems in particular are generally not built with real-time external integration in mind.  Nor, for that matter, do most BI applications require it.  You have to look carefully at what you need and what the systems you’re considering can do.

The cost of outages also relates primarily to applications that are closely coupled to other company systems.  For example, a BI system that made real-time credit decisions about new orders could cause major problems if it were unavailable even for a few minutes.  But most BI systems are used for statistical analysis rather than operational transactions.  Although you might be able to find an on-demand BI vendor who could guarantee to meet stringent availability standards, an application that required extreme reliability would probably be a poor choice for on-demand.

Vendor dependency is an issue with any externally-owned system, not just on-demand.  For that matter, reliance on internally-built systems also raises dependence issues.  But on-demand does compound the usual risks of vendor lock-in (unreasonable price increases, unresponsive service, deterioration in quality) with concerns that the system could become unavailable if the company went out of business or its system crashed.  Again, most BI applications are not mission-critical in the short term, so the worst-case scenario – permanent loss of access – is usually an acceptable risk.  Where continued access is truly essential, you may be able to negotiate appropriate back-up arrangements as part of the on-demand agreement.

The final issue is inability to enhance or customize the system.  Unlike the previous concerns, this applies to BI applications in particular.  While requirements for operational applications are usually well known and fairly stable, BI requirements are often poorly understood until users can load and analyze their data.  In addition, the requirements change frequently as new projects and questions arise.   This means that a BI system selected with a particular task in mind may not meet future needs.  The solution lies primarily in the vendor choice, and in particular in matching the vendor’s approach (automation, simple interface, flexible database, predefined solutions) to your situation.  For example, if you expect to work with many large files, a BI solution built around a flexible, scalable database is most likely to accommodate future needs even if you can’t define them precisely.  But if your major constraint is a lack of skilled staff, a highly automated system may be your best bet.

In the end, on-demand systems cannot solve every BI problem.  Some BI projects are best built in-house with conventional technology, and others should be outsourced completely.  You must assess the requirements for each project and match it against the alternatives to select the most suitable approach for every case.

*                            *                           *

David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics.  He can be reached at draab@raabassociates.com.

2009 May 01

Marketers May Shift from Analytical to Transactional Databases
David M. Raab
Information Management
May 2009

Marketing systems were early adopters of analytical database engines.  This was purely from necessity: list segmentation and analysis were the core application of most marketing systems, and standard relational databases require much tuning and considerable hardware to handle them effectively.  Since marketers rarely had the technology budget to pay for this, they were forced to seek the lower-cost alternatives provided by specialized databases, typically using columnar structures.  Corporate IT departments were rarely enthusiastic about these non-standard systems, but, since marketing was usually a low priority, IT found it easier to let the marketers buy what they wanted and take responsibility for the consequences than to step in and do the work themselves.  No one was quite happy, but an uneasy truce prevailed and everyone got their work done.

Today, specialized analytical databases are a hot fashion in corporate IT.  These include not just columnar structures but also in-memory and shared-nothing parallel processing systems.
So you’d think that marketing systems using of those databases would be the center of attention.

Not necessarily.  It’s true that big marketing systems with big analytical databases are more common than ever, and that IT is more comfortable with them.  But the classic list segmentation and selection functions are less important because today’s marketing is increasingly focused on managing real-time interactions with individuals.  Those interactions largely require conventional transaction processing technology.  As a result, the analytical databases are receding to the role of supporting systems that prepare and stage data for front-line interactions.  The really interesting business and technical challenges are making the front-line systems work better.

The implications of this run deep.  For marketers, it means the traditional divide between marketing and sales is now much less distinct.  Customers and prospects all interact with the same company systems (Web sites, call centers, email, etc.) and external social networks.  To ensure that each visitor is handled correctly, marketing (responsible for prospects) and sales (responsible for customers) must jointly define the business rules and treatments.  This is much more cooperation than those groups are used to.

The change also calls into question the fundamental architecture that has kept marketing databases separate from sales and customer service systems.  Even though Customer Relationship Management (CRM) vendors have long claimed to integrate all three, a look under the hood has usually found that sales and service ran on a transaction-oriented database while marketing ran on a separate, analytical database.  This was partly due to the technical requirements of marketing queries, which made the analytical data structure more effective.  But the separation between sales and marketing activities meant there was no particular penalty for running fundamentally independent systems.  Companies needed just a bit of data synchronization, which might be done in real time but could often run nightly without causing serious problems.

The equation changes drastically when marketing and sales do need to integrate.  Now the advantages of a separate, analytically-structured marketing database must be weighed against the significant impediments to sales and marketing coordination that database creates.  This gives companies more reason to consider alternatives.

The simplest approach is to just get rid of the separate analytical database altogether.  This is increasingly viable because today’s relational databases and hardware can more easily support transactional and analytical processing.  The very largest marketing databases will probably still run on separate analytical systems, but companies with smaller files will increasingly be able to get adequate performance for marketing applications from their transactional databases.  Even if this requires a bit of tweaking to add some extra indexes or summary tables, the overhead will usually cost less than maintaining a truly separate marketing database.

Another option is to keep the analytical database but shrink its role.  Under this scenario, the analytical database is truly used for analytics – that is, research and reporting – while the quasi-operational tasks such as list selection migrate to the transactional system.  This is arguably a more logical structure anyway: since some of the tasks were assigned to the analytical database because it was the only system the marketers had available, not because they required an analytical structure.  For example, many analytical systems work best with batch updates, but have been forced to take real-time updates so that marketers can work with current data.

The fundamental point to remember is that whatever happens to the analytical database, marketing will do an increasing portion of its work on the transactional systems.  That’s the inevitable result of managing individual interactions at touchpoints such as the Web site.  In theory, marketing systems could evolve to include their own transactional components.  But that makes little sense when most companies already have these in the sales and service components of their CRM systems.  It’s more likely that the long-standing promises of CRM systems to encompass marketing will finally be fulfilled, and separate marketing systems, like separate analytical databases, will play a smaller, more specialized role.

*                            *                           *

David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics.  He can be reached at draab@raabassociates.com.