David M. Raab
DM Review
January 2008
One of the bedrock goals of data warehouse projects is a “single version of the truth”. Yet truth is rarely so simple. In the classic example, a “customer” is one thing to a salesperson, another to the shipping department, and something else to accounts payable. Good data warehouse designers recognize this and build different definitions into their systems, so users can access whichever version they need whenever they need it.
Yet the notion of a single version of the truth persists—and warehouse teams invest huge resources negotiating shared business models to define it. Why?
The problem is often described as “dueling spreadsheets,” where managers argue over whose data is correct. This is apparently something to avoid at all costs.
Personally, I love a good debate over data. But if you do want to prevent those arguments, you have to understand what causes them. Just providing a “single version of the truth” won’t do the trick, precisely because any data warehouse rich enough to be useful will contain enough variations of the truth that it, too, can produce conflicting results.
Let’s start at the beginning. Managers rely on whatever data sources they have available. In the absence of a warehouse, these are usually their local operational systems. Managers use these systems not just because they are handy, but also because they understand their contents. Since learning about a data set is often the hardest part of an analytical project, it’s perfectly reasonable for managers to rely on the data they know.
“Dueling spreadsheets” happen because each manager’s local data set is an incomplete view of an entire problem. Call center managers can see call center information, and might do an analysis that shows how to minimize call center costs. But the service manager will see service costs that result from poor call center treatments, such as dispatching repair people for problems that could have been resolved over the phone. Each manager can analyze her own data correctly and reach opposite conclusions about the best course of action.
Putting all that data into one warehouse wouldn’t solve the problem. The single version of the truth (that is, a shared data model) will include both call center data and service data. If each manager simply extracts her own department’s information, she will still end up with conflicting results.
The only thing that will change this is if both managers pull both departments’ data. Indeed, each manager really needs all the relevant data, which probably comes from many departments. Here is where the central data warehouse truly adds value: it makes all that data accessible in an integrated format.
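To make the call center example concrete, here is a minimal sketch in Python. Every figure and name in it is invented for illustration (the two policies, the dollar amounts, the `best_policy` helper); none of it comes from a real system. It only shows how two correct departmental analyses can disagree, and how the answer changes once both data sets are combined.

```python
# Hypothetical monthly costs (in dollars) under two call-handling policies.
# All figures are invented for illustration only.
call_center_cost = {"short_calls": 40_000, "thorough_calls": 55_000}
service_cost = {"short_calls": 90_000, "thorough_calls": 60_000}


def best_policy(*cost_tables):
    """Return the policy with the lowest summed cost across the given tables."""
    policies = cost_tables[0].keys()
    return min(policies, key=lambda p: sum(table[p] for table in cost_tables))


# Each manager analyzes only her own department's data.
call_center_view = best_policy(call_center_cost)
service_view = best_policy(service_cost)

# An analysis that pulls both departments' data.
combined_view = best_policy(call_center_cost, service_cost)

print("Call center manager prefers:", call_center_view)
print("Service manager prefers:", service_view)
print("Combined data favors:", combined_view)
```

With these made-up numbers, the call center manager prefers short calls and the service manager prefers thorough ones. Only the combined totals show which policy costs the company less overall.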
But this brings us back to the original problem. Managers will use the data they find most familiar. Even if they have access to a comprehensive warehouse, pulling data from all the relevant departments requires understanding where to find that data and how to combine it. Managers are unlikely to take the time to learn how to do this. Instead, they’ll either go back to their familiar local sources or pull the equivalent data from the central warehouse. Either way, they get the same, incomplete result.
This problem can be mitigated but not really solved. It can’t be solved because managers don’t have the time or inclination to learn about proper warehouse procedures. Mitigation means making it easy for managers to see the data they really need, even if they didn’t think to look for it. It also means making it at least as easy to get that data from the warehouse as from local systems.
Waiting for the IT department to build a new data cube definitely does NOT count as easy, particularly if the manager could already pull it from the local system for herself.
It’s tempting to “solve” the problem by mandating use of warehouse information, but that probably won’t work. Many managers will just do without the information rather than invest major time in learning a new system. Or they’ll look at data from the local system and not show it to anyone else. Or they’ll ask an analyst to do the work for them, but only when it’s worth the extra time, cost and trouble. Remember we’re talking here about managers, who have considerable discretion in how they do their jobs.
Analysts are another story. Learning new systems is part of their job and, if they’ve made the right career choice, it’s something they enjoy. Moreover, they are part of a community of other analysts that should enforce its own standards for the quality of work. Therefore, it’s perfectly plausible to require that analysts draw their data from a warehouse. Not that this should be a problem: all possible data assembled in one place is an analyst’s fondest dream come true. Requiring them to use a warehouse should be about as hard as requiring a kid to visit a candy shop.
In short, the warehouse is an analyst’s tool. Getting managers to use it adds requirements which companies may or may not decide to meet. Achieving a “single version of the truth” takes more than a unified data model: it means a thorough change in how companies analyze their data. Warehouse teams that fail to recognize this can never meet expectations.
* * *
David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.