That giant sucking sound you hear (well, one of them, anyway) is matching software vendors being vacuumed into larger companies. Within the past year, Nokia purchased Intellisync, Business Objects acquired Firstlogic, Informatica bought Similarity Systems, and IBM alone has added Ascential, SRD, and Language Analysis Systems. Go back a bit further and you’ll see Pitney Bowes buying Group 1 Software (2004) and SAS Institute acquiring DataFlux (2000). There are only a few independent matching software vendors left and they are not exactly household names: Data Mentors, Innovative Systems, Data Lever, Intelligent Search Technology, Choicemaker Technologies, Netrics. Trillium Software seems a prominent exception, but even Trillium is owned by a larger firm, Harte-Hanks.
In one way, big software companies’ interest in matching software is a positive development. It shows that they fully recognize the importance of sophisticated matching in building enterprise systems, something that was not always the case. Given that many companies still rely on primitive, home-grown matching techniques, wider deployment of high-quality matching software should result in major improvements in the overall quality of enterprise customer data integration.
But vendors’ decisions to buy and incorporate their own matching systems do reduce the pressure to produce the best matching systems possible. It’s yet another installment in the endless soap opera of “suite vs. best-of-breed”: most buyers of a major business intelligence or data integration solution will accept whatever matching product the vendor includes, whether or not it is really the best tool for their particular needs. With a built-in market assured, the in-house developers supporting the integrated matching systems have little incentive to increase quality. Most will also face pressure to reduce costs and to focus their remaining resources on integration with their new parents, not on product innovation.
Where does this leave potential buyers? As with any suite vs. best-of-breed decision, they must decide whether the extra value provided by a best-of-breed product is worth the extra cost of integrating it. In the world of matching systems, this is a particularly difficult judgement because it’s so hard to compare the quality of matching systems to begin with. Few buyers know how to set up a proper test, and even they can make valid comparisons only after substantial investment in tuning the different tools for their particular data. Unless a project is specifically focused on finding the best matching system available, it’s hard to carve out the schedule and technical resources to make a thorough competitive assessment.
So let’s assume you end up with whatever tool your suite vendor has provided. First of all, don’t be too concerned: the products the vendors have purchased are all pretty good. But this doesn’t mean you should just label the problem as solved and walk away.
In fact, your work has just begun. You need to be sure you get the most out of the tool you’re using. Here are a few tips:
– Tune, tune, tune. Some systems are more automated than others, but all matching systems must be adapted to the particular data they are working with. Sometimes this is just a matter of running sample files through the software so it can build a statistical profile of likely values. More often, you will be looking at specific matching rules, tweaking them, and assessing the results. In all situations, you need to ensure that your test files contain a good sample of the actual data your system will be matching. It’s a painfully common error to work with the most easily available data, not the most representative. If your system doesn’t provide much in the way of tuning assistance, take a look at software from DataDelta, which specializes in this sort of analysis.
– Take a broad view. A rented mailing list will contain just names and addresses, but your internal source files are likely to contain other data elements that can be useful for matching, such as telephone and account numbers. Default matching logic probably won’t take these into consideration, but any decent system can expand its scope to include them. These extra fields can be tremendously powerful in helping find otherwise unidentifiable matches.
– Allow multiple answers. It’s easy to think that matching decisions have one right answer: either two records refer to the same person or they don’t. But most matching deals in degrees of certainty, and different levels of accuracy are appropriate for different purposes. Sending two catalogs to the same person is pretty cheap; failing to link related accounts could cost you a million-dollar customer. So don’t be afraid to deploy different matching rules for different applications. Where larger groups such as households are concerned, there’s even more reason to employ multiple definitions.
– Rely on local knowledge. No statistical algorithm or generic rule set can accurately capture national and cultural idiosyncrasies in treatments of names and addresses. Software that deals seriously with international matching will have local editions with relevant rules and reference tables. Be sure to employ such add-ons or, if necessary, have them created.
– Recognize the system’s limits. Software built for name and address matching can sometimes be used to match other types of data. But how well it does this depends on the particular technology. Look carefully at how your matching system works and, above all, test it with live data before assuming it can be applied to tasks outside of its primary domain.
– Go outside if necessary. You may find yourself with a matching problem that your default system just can’t handle. If so, be aware that most modern matching software is designed to integrate with other systems via APIs and, increasingly, through service calls. This means that integration of a better-suited external system will probably be easier than you think. Given the importance of matching correctly, don’t be afraid to look for a solution that truly meets your needs.
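To make the "take a broad view" and "allow multiple answers" tips a little more concrete, here is a minimal Python sketch. It is not how any of the products named above work. The field names, weights, thresholds, and application names are all hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for whatever comparison logic a real matching tool would apply. The idea is simply that one score, built from more fields than name and address alone, can be read against a different cutoff for each application.

```python
from difflib import SequenceMatcher

# Hypothetical weights and thresholds, for illustration only.
# A real deployment would tune these against representative data.
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}
THRESHOLDS = {
    "catalog_mailing": 0.90,   # strict: only merge near-certain duplicates
    "account_linking": 0.70,   # looser: a missed link is the costlier error
}


def similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1]; a stand-in for real match rules."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_score(rec_a: dict, rec_b: dict) -> float:
    """Weighted average over the fields that both records actually contain."""
    total = 0.0
    weight_sum = 0.0
    for field, weight in WEIGHTS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a and b:
            total += weight * similarity(a, b)
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def is_match(rec_a: dict, rec_b: dict, application: str) -> bool:
    """Apply the threshold that belongs to the given application."""
    return match_score(rec_a, rec_b) >= THRESHOLDS[application]


if __name__ == "__main__":
    a = {"name": "Robert Smith", "address": "12 Elm St", "phone": "555-0100"}
    b = {"name": "Bob Smith", "address": "12 Elm St", "phone": "555-0100"}
    for app in THRESHOLDS:
        print(app, is_match(a, b, app))
```

With these made-up numbers, the "Robert Smith" and "Bob Smith" records score between the two cutoffs. The pair is therefore linked for account purposes but left unmerged for the catalog mailing. The shared phone number contributes to that outcome, and default name-and-address logic would not have considered it.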