New Trends in Predictive Analytics
David M. Raab
DM Review
September 2006

“Predictive analytics” is widely touted as one of the Next Big Things in business intelligence. Yet most discussions of the topic really describe the Same Old Things that statisticians and business analysts have talked about for years. These are models and scoring systems that predict the likely behavior of customers, prospects and (outside of marketing) other entities such as prices and demand. These always were, and still are, immensely valuable applications of business data. But they are not new.

The generic descriptions of Predictive Analytics do include a few hints of fresh approaches. Even these are not truly new ideas, but they are just now becoming practical for broad implementation. This is mostly due to the relatively recent availability of infrastructures such as data warehouses and integrated customer management systems needed to supply the necessary data and deliver the predictive analysis results.

These newer components include:

– real-time execution. Traditionally, predictions were made in batch processes such as scoring prospect pools for outbound marketing campaigns or assigning a customer rank to guide customer service treatments. Scores were static, calculated either once or at regular intervals such as monthly. This largely reflected the difficulty of assembling the underlying data—remember when most data warehouses were updated monthly?—and of building the predictive models and applying the scores.

Today, predictions are increasingly calculated as they are needed and as new data is generated during real-time interactions. Pricing is adjusted based on current inventory and continually-revised demand forecasts. Offers are optimized through constant testing of alternatives. A prospect’s most recent search request is factored into the choice of which products to offer next.

– front-line deployment. Real-time analysis is most important when it can change the course of an ongoing interaction. This means that analysis must be pushed to front-line customer interaction systems such as call centers and Web pages. These systems must be able to capture the interaction data, transmit it to the predictive analytics engine as part of a prediction request, and receive the engine’s output. Or, the engine must itself be somehow embedded within the front-line system, perhaps by having the front-line system do scoring calculations using a formula the analytics engine has generated. Service-oriented architectures (to mention another Next Big Thing) make this sort of integration much easier than it used to be. What’s new, and important, is the ability of the front-line system to choose which predictive analysis capabilities it will execute, rather than passively accepting whatever predictions were prepared in advance. As a result of this new ability, the front-line system can string together multiple predictions as needed to guide the course of an interaction.
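The embedded approach can be sketched in a few lines. This is an illustrative example only: the feature names, coefficients, and logistic scoring formula are all assumptions, standing in for whatever formula an analytics engine might export for local execution.

```python
import math

# Hypothetical coefficients "exported" by an analytics engine so the
# front-line system can score locally instead of calling the engine.
MODEL = {
    "intercept": -2.0,
    "weights": {"recent_searches": 0.8, "pages_viewed": 0.15, "prior_purchases": 1.1},
}

def score(interaction_data, model=MODEL):
    """Logistic-style score: estimated probability an offer is accepted."""
    z = model["intercept"]
    for feature, weight in model["weights"].items():
        z += weight * interaction_data.get(feature, 0.0)
    return 1.0 / (1.0 + math.exp(-z))  # logistic link maps z to a 0-1 range

def choose_offer(interaction_data, offers):
    """String predictions together: score each candidate offer, pick the best."""
    return max(offers, key=lambda o: score({**interaction_data, **o["features"]}))
```

Because the scoring formula lives in the front-line system, each interaction step can request a fresh score without a round trip to the analytics engine.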

– customer-level evaluation. Traditional scoring methods made choices about groups of customers, such as which groups to include in a promotion or which groups would be offered each product. Of course, the groups were still made up of individuals, but everyone within a group was treated the same and all analysis of results was conducted at the group level. A model that performed well on average was considered a success.

Newer predictive analytics applications make decisions about specific individuals in specific situations. New decisions are needed as a situation evolves, so several scores may be generated during the course of a single interaction. This means that customers cannot be treated as members of large, undifferentiated groups, because small differences in individual history become important factors in selecting treatments. Success is not measured in terms of how accurately a specific model predicts results but in the value attained from an interaction, which may have executed many models. Better still, though harder to achieve, success should really be measured across many interactions by tracking the long-term change in customer value.

These three changes—real-time execution, front-line deployment and customer-level evaluation—pose significant challenges for traditional predictive analytics techniques. When predictions were created by expert users and deployed in a small number of large projects, the users could be trusted to carefully monitor the quality of the incoming data, the models themselves, and the final applications. But in the new situation, many more models are used by much less sophisticated users without a simple way to measure the quality of the results. The only practical way to create the required number of models is to rely on automated methods, but this removes the expert model-builders who are most likely to identify problems or opportunities for improvement. At the same time, stringing together multiple predictions during an interaction makes it likely that certain combinations of outputs will yield poor results, and makes it hard to find those combinations among the majority that work correctly. Bad data on individual customer records, which has little impact when the customers are lumped into large groups for simple treatments, becomes more dangerous when it is used to generate specific recommendations during one-on-one interactions.

In an automated front-line system such as a Web site, there are no human users to notice when things go awry. Even in a call center, agents are likely to be trained not to question system recommendations and, once they have gained confidence based on the system’s initial success, are unlikely to examine them critically. Nor, when the ultimate measure of success is long-term customer value, can they intuitively judge whether the recommendations are anywhere near optimal. Making error detection still harder, the systems must be designed to function even if the models are failing—you don’t want to stop taking orders just because cross-sell predictions are unavailable. This typically means supplying default recommendations in place of the model-generated ones. Although this is operationally appropriate, it also means that end-users may not even be aware that a problem exists.

Addressing these challenges imposes new requirements on predictive analysis systems. The most obvious is the need for fast, efficient model building and deployment. This typically requires using non-traditional techniques to develop and validate the models, since traditional methods tend to be slow and labor-intensive. The resulting models may not be as accurate as those built with traditional methods, but it’s usually a worthwhile trade-off to give up some accuracy in return for getting more models more quickly.

Almost equally important, the new system must be able to monitor its own performance and adjust when old models stop working well. There are many reasons such a fall-off may occur, ranging from changes in source data, to temporary interruptions in data feeds, to competitive and environmental factors, to evolution of the underlying customer behavior itself. Different problems call for different solutions, but the fundamental requirement is always the same: the system must compare its results with a minimum acceptable performance level and react when that level is not met. This implies an automated feedback mechanism to capture results from front-line systems and then to determine whether those results are meeting expectations. When results are not satisfactory, the system needs at a minimum to inform its administrator and switch to a default mode, such as rule-based recommendations. Ideally, it will also run automated diagnostics to identify why the recommendations are not performing and take corrective action such as rebuilding the models with new data.
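The core of this feedback loop can be sketched simply. The class below is an assumption-laden illustration, not a prescribed design: the acceptance-rate floor, the rolling window, and the alert mechanism are all placeholders for whatever thresholds and channels a real system would use.

```python
class MonitoredModel:
    """Wraps a model with a minimum-performance check and a default fallback."""

    def __init__(self, model_fn, default_fn, min_acceptance=0.05, window=200):
        self.model_fn = model_fn          # the predictive model
        self.default_fn = default_fn      # rule-based default recommendations
        self.min_acceptance = min_acceptance  # minimum acceptable performance level
        self.window = window              # how many recent outcomes to judge by
        self.outcomes = []                # accept/reject feedback from the front line
        self.degraded = False

    def recommend(self, customer):
        fn = self.default_fn if self.degraded else self.model_fn
        return fn(customer)

    def record_outcome(self, accepted):
        """Capture a result from the front-line system and check performance."""
        self.outcomes.append(bool(accepted))
        self.outcomes = self.outcomes[-self.window:]
        if len(self.outcomes) == self.window:
            rate = sum(self.outcomes) / self.window
            if rate < self.min_acceptance and not self.degraded:
                # Below the floor: inform the administrator, switch to defaults.
                self.degraded = True
```

A fuller version would, as described above, also run diagnostics and trigger a model rebuild; this sketch shows only the compare-and-fall-back skeleton.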

The same fundamental ability to monitor results supports automated optimization. The simplest form of this is to create competing models, test them against each other, and adopt the winner as the new standard. If this is done continuously and automatically, such an approach naturally adjusts over time to whatever changes cause old models to fail. A more sophisticated approach to optimization moves beyond head-to-head testing of individual models, to find when a single model can be replaced with multiple models, each of which applies in a particular situation. This often makes sense in start-up situations where the amount of historical data is initially limited but will grow over time.
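The head-to-head testing described above is often called champion/challenger. The sketch below is illustrative only: the 10% test share, the minimum trial count, and the promote-on-higher-success-rate rule are assumptions chosen for simplicity, not requirements.

```python
import random

class ChampionChallenger:
    """Route a small share of traffic to a challenger; promote it if it wins."""

    def __init__(self, champion, challenger, test_share=0.1):
        self.models = {"champion": champion, "challenger": challenger}
        self.test_share = test_share
        self.stats = {"champion": [0, 0], "challenger": [0, 0]}  # [successes, trials]

    def assign(self, rng=random.random):
        """Pick which model handles this interaction."""
        return "challenger" if rng() < self.test_share else "champion"

    def record(self, arm, success):
        self.stats[arm][0] += int(success)
        self.stats[arm][1] += 1

    def promote_if_better(self, min_trials=100):
        """Adopt the challenger as the new standard if it outperforms."""
        c, ch = self.stats["champion"], self.stats["challenger"]
        if ch[1] >= min_trials and c[1] >= min_trials and ch[0] / ch[1] > c[0] / c[1]:
            self.models["champion"] = self.models["challenger"]
            self.stats = {"champion": [0, 0], "challenger": [0, 0]}
            return True
        return False
```

Run continuously, this loop gives the self-adjusting behavior the text describes: whenever environmental change degrades the incumbent model, a better challenger eventually replaces it.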

A third major requirement is that the system easily integrate individual models into the larger flow of customer interactions. A single interaction may itself involve multiple predictions, and a long-term customer relationship will involve many interactions. Part of this integration is technical: it must be easy for managers to define an interaction process and specify the points within this process where predictions will be requested, received and used. But the harder part of the integration comes back to response analysis: when assessing results, the system needs to be able to measure not simply the independent performance of individual models, but the ultimate result of several models working in concert. This means the response analysis function must identify and link prediction requests that are part of the same interaction and then relate these to an ultimate objective.

The linkage might involve nothing more than an interaction identifier similar to the session ID used to link different events during a Web site visit. A more powerful approach would actually import the structure of the interaction—that is, the rules governing the interaction flow—from the front-end system itself. This would allow the system to simulate the outcome of alternative decisions, which is necessary for true optimization.
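The interaction-identifier approach can be sketched as a small response-analysis step. The log field names (`interaction_id`, `model`, `revenue`) are hypothetical; a real system would use whatever identifiers its front-line platform emits.

```python
from collections import defaultdict

def summarize_interactions(prediction_log, outcome_log):
    """Link prediction requests by interaction ID, then credit the final
    outcome of each interaction to every model that participated in it."""
    models_by_interaction = defaultdict(set)
    for entry in prediction_log:
        models_by_interaction[entry["interaction_id"]].add(entry["model"])

    revenue_by_model = defaultdict(float)
    for outcome in outcome_log:
        for model in models_by_interaction.get(outcome["interaction_id"], ()):
            revenue_by_model[model] += outcome["revenue"]
    return dict(revenue_by_model)
```

This measures models working in concert rather than in isolation: a model is judged by the value of the interactions it touched, not by its standalone accuracy.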

Optimization also requires that outcomes be judged against an ultimate objective. This could be relatively simple, such as maximum revenue or profit contribution, or something more complicated such as an estimated change in long-term customer value. Of course, building a customer value model is a substantial predictive analytics project in itself, so this model might exist apart from the portions of the system that use predictive analytics to guide interactions.

It’s easy to get carried away with the vision of an autonomous, self-optimizing predictive analytics system. In reality, and for the foreseeable future, such systems will operate within the narrow frameworks provided by human-designed interaction flows and treatment alternatives. The potential for both benefit and harm is therefore somewhat lower than it might seem. Yet even within these constraints, real-time, front-line, customer-level predictive analytics offer substantial new opportunities to increase business value. So they are definitely worth doing—so long as you take the time and effort to ensure they are done right.

* * *

David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.
