David M. Raab
DM News
November, 2005
Not so long ago, customer data integration referred to matching customer records from different sources when assembling a marketing database or data warehouse. It was a challenging task but one with limited implications. After all, the data warehouse was used primarily for analysis, not day-to-day customer interactions. So any consolidation errors were hidden from the customer and most company staff as well.
But customer data integration today extends beyond the traditional batch updates of data warehouses. Companies want to assemble and distribute up-to-the-minute customer data not only for marketing and customer service, but also to meet government requirements for surveillance, privacy, risk management and corporate reporting. With legal as well as financial issues at stake, accurate and efficient consolidation has become an urgent need.
The basic techniques for real-time customer data integration have been understood for some time. Operational systems need a single, central reference source that combines information from all inputs. The chief difference among competing solutions has been the amount of data stored within this repository. One option is to store all the information, either in a data warehouse or in one dominant operational system. The other extreme is to store almost nothing centrally, except possibly links between related records in different systems. Complete customer data is then assembled as needed by querying the source systems directly. Intermediate solutions store some data centrally and gather the rest on demand.
Each approach has its advantages. Centralized storage means all data is immediately available and allows more sophisticated consolidation processing since the work can be done in advance. But it requires conforming to the data model of the primary operational system, which can be constricting, or copying a great deal of data into a separate warehouse. Distributed storage always returns the freshest data and can access detailed information without the cost of moving and storing it. But on-demand access and consolidation can face performance issues.
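To make the distributed extreme concrete, here is a minimal Python sketch of a registry that stores only links and assembles the customer record on demand. The system names, record IDs, fields, and the "first linked source wins" rule are all hypothetical illustrations, not taken from any product described here.

```python
# Hypothetical source systems, each keyed by its own local record ID.
SOURCES = {
    "crm": {"c-17": {"name": "Pat Smith", "email": "pat@example.com"}},
    "billing": {"b-903": {"name": "P. Smith", "balance": 120.0}},
}

# The central registry holds nothing but links between related records.
LINKS = {"cust-1": [("crm", "c-17"), ("billing", "b-903")]}


def assemble(master_id):
    """Build a customer view at query time by reading each linked source."""
    record = {}
    for system, local_id in LINKS[master_id]:
        for field, value in SOURCES[system][local_id].items():
            # Simplistic assumption: the first linked source to supply
            # a field wins. Real systems need richer conflict rules.
            record.setdefault(field, value)
    return record


print(assemble("cust-1"))
```

Every call reads the sources again, so the result is always fresh, but the cost of those reads is paid on each request. That is the performance tradeoff noted above.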
Siperian Hub (Siperian Inc., 650-571-2400, www.siperian.com) takes the middle approach. The system builds a consolidated customer record, using sophisticated business rules to pick the most reliable information when different sources conflict. This then becomes available to other systems as a master customer record. Siperian can also capture and present multiple hierarchies used to classify customers, such as households, businesses, geographic regions, product lines, and industry groups. A further extension, due by the end of this year, will help synchronize data among the source systems themselves.
The heart of Siperian Hub is Master Reference Manager. This performs four main functions: importing data from multiple sources; matching records that refer to the same customer; selecting the best data as a customer master; and making the master available to other systems.
The import functions are fairly straightforward. A graphical interface lets users map external sources to the Hub’s data model. The model is based on Siperian templates but customized as necessary for each client. A key feature is that it keeps enough detail to reconstruct original input for audit and roll-back purposes. Mapping itself is largely manual, although Siperian does have prebuilt maps for common source systems such as Siebel. Data is loaded from source systems either in batch or in near real time through message queues, using Web services or Enterprise JavaBeans.
The imported data must be cleansed and matched. These are demanding specialties, so Siperian uses third-party software. Cleansing, such as ensuring postal codes match city and state names, can be handled by products such as Trillium or FirstLogic. For matching, which is more tightly linked with Siperian’s own processing, the vendor has integrated technology from Identity Systems (formerly Search Software America). Siperian provides its own graphical interface to let users review and resolve questionable matches.
Building the master customer record is Siperian’s own core expertise. Users set up a “trust framework” of rules that determine which version of each data element should be adopted. The rules generate a score for each element, based on its source, recency, and syntax (completeness, format, appropriate characters, etc.). The framework also applies a decay rate that reflects how quickly different values are expected to change: for example, email addresses decay much faster than names. Scores are calculated separately for each data element. Users can also treat a set of elements as a block to avoid inconsistencies such as mixing the street name from one address with the city from another.
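The scoring idea can be illustrated with a short Python sketch. This is not Siperian's actual algorithm, which the vendor does not publish here. The source weights, half-life values, syntax checks, and the choice to multiply the three factors are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    value: str
    source: str
    updated: date


# Hypothetical reliability weights per source system.
SOURCE_TRUST = {"crm": 0.9, "billing": 0.7, "web_form": 0.5}

# Hypothetical half-lives in days: emails go stale far faster than names.
HALF_LIFE_DAYS = {"email": 180, "name": 3650}


def syntax_score(field, value):
    """Crude completeness and format check (illustrative only)."""
    if not value.strip():
        return 0.0
    if field == "email":
        well_formed = "@" in value and "." in value.split("@")[-1]
        return 1.0 if well_formed else 0.3
    return 1.0


def trust_score(field, cand, today):
    """Combine source, syntax, and recency into one score for one element."""
    age_days = max((today - cand.updated).days, 0)
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS[field])
    return SOURCE_TRUST[cand.source] * syntax_score(field, cand.value) * decay


def pick_best(field, candidates, today):
    """Select the highest-scoring candidate for a single data element."""
    return max(candidates, key=lambda c: trust_score(field, c, today))
```

With these assumed values, a two-year-old email from a highly trusted source loses to a month-old email from a weaker source, while a two-year-old name from the trusted source still wins. That is the effect of giving each element its own decay rate.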
The rules are built during system implementation based on user judgments. Siperian does not provide formal statistical analysis of inputs or rule results. It does have a standard rule set for the pharmaceutical industry and is building sets for financial services and publishing.
Contents of the master customer record will change as new data appears. Siperian provides features to record these changes, trace the original source of each value, and roll back to earlier versions if necessary.
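A minimal sketch of how such change tracking might work, assuming an append-only history per data element. The class and method names are hypothetical and do not reflect Siperian's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    source: str  # which system supplied this value


class MasterField:
    """One element of a master record with an append-only history."""

    def __init__(self):
        self.history = []

    def set(self, value, source):
        self.history.append(Version(value, source))

    @property
    def current(self):
        return self.history[-1] if self.history else None

    def roll_back(self, steps=1):
        """Restore an earlier version by re-appending it to the history.

        Nothing is deleted, so the rollback itself remains auditable.
        """
        target = len(self.history) - 1 - steps
        if target < 0:
            raise ValueError("no earlier version to roll back to")
        self.history.append(self.history[target])


email = MasterField()
email.set("pat@example.com", "crm")
email.set("psmith@example.net", "web_form")
email.roll_back()
print(email.current)
```

Because each version carries its source, tracing where the current value came from is a single lookup on `current.source`.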
The master customer record is stored in a conventional relational database table, usually Oracle. External systems can query the table directly or access it through Web services XML APIs or Java APIs. Siperian can also push changes to message queues where external systems can read them.
Siperian supplements Master Reference Manager with Hierarchy Manager. This links records in multiple, independent hierarchies imported from source data. Starting from a single record, users can navigate its different hierarchies using the data steward interface. Other systems can also access the hierarchies via calls to Siperian APIs.
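The notion of one record sitting in several independent hierarchies can be sketched as a set of child-to-parent maps, one per hierarchy. The hierarchy names and IDs below are invented for illustration, and the sketch assumes each hierarchy is free of cycles.

```python
# Each hierarchy is an independent child -> parent map (hypothetical data).
HIERARCHIES = {
    "household": {"cust-1": "hh-9", "cust-2": "hh-9"},
    "region": {"cust-1": "ca-north", "ca-north": "ca", "ca": "us"},
}


def ancestors(hierarchy, node):
    """Walk upward from a record within one hierarchy, nearest parent first."""
    parents = HIERARCHIES[hierarchy]
    path = []
    while node in parents:  # assumes no cycles in the parent map
        node = parents[node]
        path.append(node)
    return path


# Starting from a single record, navigate each hierarchy separately.
for name in HIERARCHIES:
    print(name, ancestors(name, "cust-1"))
```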
An integrated version of Activity Manager, due for release in early 2006, will transmit customer activity data from one source system to another. It will manage complex processes, such as creating a new account in one system when a transaction in a different system indicates it is required. Activity Manager will incorporate a mix of Siperian and third-party technology.
Siperian Hub runs on Unix, Linux or Windows servers and integrates with all major application servers. Pricing is based on project scope and complexity and ranges upwards of $600,000 for a perpetual license. Siperian released its initial product in 2002 and has 18 clients for Siperian Hub.
* * *
David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.