2007 Jun 01
Netrics Netrics Matching Engine
by David M. Raab
DM News
June, 2007
Most direct marketers think of data matching in terms of merge/purge: a way to identify and remove duplicate names across multiple lists. But merge/purge is rarely a concern in the larger world of data processing. There, matching is a component of customer data integration (identifying data in different systems that belong to the same customer) and master data management (consolidating data relating to all kinds of entities). Matching is also part of search applications that help users find people, products, documents, locations and other entities even when they don’t have complete or fully accurate information.

These are complex applications with many moving parts: multi-table data structures, relationship hierarchies, data acquisition, indexing, ranking and display. But matching remains a critical core function.

The specific purpose of matching is to find records that refer to the same entity, even though the records themselves are different. In a strict sense, matching involves direct comparisons of data strings. But in the real world, this is often supplemented by external reference data such as a list of all known products or all the names used by a business. This external data often allows connections that could never be inferred from strings alone, such as the fact that the John Jones who used to live in Chicago is the same John Jones who now lives in San Diego. For names and addresses, external knowledge allows parsing of data into elements such as first name, last name, and street number, so the same elements can be compared across different records. This external knowledge includes information about specific words (“David” is likely to be a first name, “Nebraska” usually is a state, “Bob” is a nickname for “Robert”) and information about common formats (“the final line in an address is likely to be in the order of city, state, and postal code, unless the first word is ‘attention’”). As that last example suggests, external knowledge implies rules as well as simple lists, and can get very complex.
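The nickname and format knowledge described above can be sketched in a few lines. The lookup tables below are tiny illustrative stand-ins for the much larger reference data a real matching engine would license or build:

```python
# Toy knowledge-based standardization. NICKNAMES and STATES are
# illustrative fragments, not real reference tables.
NICKNAMES = {"bob": "robert", "bill": "william", "dave": "david"}
STATES = {"nebraska": "NE", "illinois": "IL", "california": "CA"}

def standardize_name(first, last):
    """Map nicknames to canonical first names and normalize case."""
    first = first.strip().lower()
    return (NICKNAMES.get(first, first).title(), last.strip().title())

def parse_last_line(line):
    """Split a 'City, State Postal' address line into elements."""
    city, _, rest = line.partition(",")
    state, _, postal = rest.strip().partition(" ")
    return {"city": city.strip().title(),
            "state": STATES.get(state.strip().lower(), state.strip().upper()),
            "postal": postal.strip()}

print(standardize_name("Bob", "jones"))         # ('Robert', 'Jones')
print(parse_last_line("Omaha, Nebraska 68102"))
```

With standardized elements in hand, the same pieces of two records can be compared directly, which is the point of the parsing step.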

In practice, parsing and standardization based on external knowledge are critical to successful name and address matching. But even the most sophisticated knowledge-based processing cannot remove all errors in a set of data. In fact, standardization and parsing can introduce errors of their own. To make matters worse, external knowledge may not be available once you move beyond well-understood structures like mailing addresses. So, in the end, there is always a need to compare two strings and decide whether they are similar enough to call them a match.

What differentiates matching engines is how they make this comparison. Simple matching systems often create a “match key” by extracting a few significant digits (say, first name initial, first three consonants in the last name, house number, city and state) and allowing a match if these are the same. Other systems use phonetic standardization such as Soundex to compensate for spelling errors. Some allow a match if strings have no more than a specified number or percentage of differences among the characters. Still others apply statistical techniques that take into account not only the similarity of the strings, but how common they are: so a common name like David Jones would not be considered a likely match for David James, while an unusual name like Zydrunas Ilgauskas might match with Sid Iglakis. Often the systems assign separate match scores for different elements and then use weights or rules to assign a match score for the record as a whole.
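A match key of the kind described can be built in a few lines. The key layout here follows the example in the text and is purely illustrative, not any particular vendor’s scheme:

```python
def match_key(first, last, house_no, city, state):
    """Crude match key: first initial, first three consonants of the
    last name, house number, city, and state (illustrative only)."""
    consonants = [c for c in last.lower() if c.isalpha() and c not in "aeiou"]
    return "|".join([first[0].lower(), "".join(consonants[:3]),
                     str(house_no), city.lower(), state.lower()])

# Two variant spellings collide onto the same key, so they "match":
a = match_key("David", "Johnson", 42, "Omaha", "NE")
b = match_key("Dave",  "Jhonson", 42, "Omaha", "NE")
print(a == b)  # True
```

The same collision behavior is also the weakness of match keys: they tolerate only the errors the key layout happens to ignore.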

Netrics Matching Engine (Netrics, 609-683-4002, www.netrics.com) applies a mathematical technique called “bipartite graph matching” to measure the similarity of strings. The general idea is to mimic human decisions by finding similar sequences of letters, even if they occur at different locations within two strings. This can compensate for data entry errors and deal with information that has not been parsed into separate fields. It also means the method can be applied to problems other than name and address matching. Netrics says its approach is more accurate than simpler methods such as match keys and Soundex, and more efficient than character-difference comparisons.

Like other matching engines, the Netrics engine returns a score that shows the similarity of the strings it compares. The system can also highlight matching blocks of text, making it easier for people to review why the system found a similarity.
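This interface, a similarity score plus the matching blocks behind it, can be mimicked with Python’s difflib. Netrics’ bipartite-graph method and its scoring are proprietary, so the sketch below only illustrates the inputs and outputs, not the actual algorithm:

```python
from difflib import SequenceMatcher

def compare(a, b):
    """Return a 0..1 similarity score and the shared blocks of text,
    using difflib as a stand-in for a real matching engine."""
    m = SequenceMatcher(None, a.lower(), b.lower())
    blocks = [a[blk.a:blk.a + blk.size]
              for blk in m.get_matching_blocks() if blk.size]
    return m.ratio(), blocks

score, blocks = compare("Zydrunas Ilgauskas", "Sid Iglakis")
print(round(score, 2), blocks)
```

Exposing the matching blocks alongside the score is what lets a human reviewer see why two strings were judged similar.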

Netrics also provides a Decision Engine that can use similarity scores to decide whether a pair of records is considered a match. The Decision Engine starts with examples of known matches and non-matches. With name and address records, these would typically be parsed into separate elements, although they could also be unparsed text blocks. The sample records are run through the Matching Engine and then the Decision Engine, which infers the decision rules (basically, weights and cut-off ranges for element similarity scores) that distinguish matches from non-matches. The system automatically adjusts its rules until its own decisions are acceptably consistent with the “correct” answers provided as part of the input. Users can provide additional examples of particular types of matches if the system performs poorly at identifying them. A couple thousand sample pairs are typically required for training. The Netrics approach is considerably easier than having users specify the rules explicitly.
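The training loop can be pictured as a search for a cutoff that reproduces the labeled answers. Real systems infer per-element weights as well; this toy version uses equal weights and shows only the general idea, with invented example scores:

```python
def train_cutoff(examples):
    """examples: list of (element_scores, is_match) pairs.
    Grid-search a cutoff on the averaged element scores until the
    decisions best agree with the known answers."""
    def overall(scores):  # equal weights for simplicity
        return sum(scores) / len(scores)
    best = (None, -1)
    for cut in [i / 100 for i in range(101)]:
        correct = sum((overall(s) >= cut) == label for s, label in examples)
        if correct > best[1]:
            best = (cut, correct)
    return best[0]

examples = [([0.9, 0.95, 0.8], True), ([0.4, 0.5, 0.3], False),
            ([0.85, 0.7, 0.9], True), ([0.6, 0.2, 0.4], False)]
cutoff = train_cutoff(examples)
print(cutoff)
```

A real training set would need far more pairs (the article cites a couple thousand), but the principle is the same: the system tunes its own parameters until its decisions match the examples.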

Netrics is used both to search for individual records in a reference file and for batch deduplication such as merge/purge. It loads the data into system memory, which allows quick performance. The system has been tested on databases with hundreds of millions of records, returning as many as 25 matches per second. The product was released in 2000 and has more than 100 installations, mostly in healthcare and government agencies. About half the installations involve name and address matching, while the balance involve other types of data. The software is usually purchased through business partners, such as applications providers and systems integrators, who incorporate it into products they deliver to their clients. Pricing is based on the number of processors in the host computer, starting at $50,000 for a two-processor server.

* * *
David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.

2007 May 01
Knotice Concentri
by David M. Raab
DM News
May, 2007
Broadly speaking, there are two kinds of customer management systems. Campaign managers generate lists for outbound direct mail, email or telemarketing. Real-time interaction managers react to individual customers during a Web site visit or telephone call.

The technologies needed for the two approaches are significantly different. Most interaction managers use a single profile table to ensure quick performance, while most campaign managers access multi-level customer databases for complex segmentation and in-depth analysis. Even systems that do both interaction management and campaign management run different internal processes for each. Many companies deploy entirely separate systems for each approach in each channel.

Actual message delivery is usually handled by yet another set of systems, such as digital printers and Web server software. Email and text messages are exceptions: many customer management systems can transmit these directly.

This fragmentation has organizational as well as technical roots. But the costs are severe: beyond the considerable expense of supporting multiple systems, businesses must rely on administrative processes to coordinate the treatments received by individual customers. Inconsistencies that slip through can reduce value and sometimes do serious harm to important customer relationships.

Concentri (Knotice, 800-801-4194, www.knotice.com) attempts to unify customer management systems for email, Web, mobile phones, and interactive TV. Structurally, it maintains a shared customer database with levels for profiles, activity history, and responses to forms such as surveys. This falls between the simple profile table of interaction managers and the complex marketing database of the campaign managers. Data matching, transformations and consolidation must be done outside the system before the tables are loaded.

Functionally, Concentri supports both batch selections for outbound campaigns and real-time responses for interaction management. These are managed through a single campaign interface, which lets users define selection rules that can be attached to email campaigns, SMS mobile text messages, Web pages and WAP mobile web content. The system delivers the email, mobile messages and Web pages directly.

Concentri’s use of segmentation to both select lists and control real-time content is the specific trick that lets it combine outbound and interactive marketing. The segment definitions themselves are fairly conventional: users select from pull-down lists of data elements, operators and values to build logical expressions which can be combined into multi-condition queries. The data elements can draw from the customer profile, activity history, and form responses. Concentri provides a standard database structure. Non-technical users can add custom elements through an administrative interface.
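A segment rule of this kind reduces to a list of (element, operator, value) conditions evaluated against a customer record. The field names below are hypothetical and Concentri’s own rule engine is not public; this is only a sketch of the pattern:

```python
import operator

# Operators a pull-down rule builder might expose.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt,
       "contains": lambda a, b: b in a}

def matches(customer, conditions):
    """All conditions must hold (AND logic) for the segment to apply."""
    return all(OPS[op](customer.get(field), value)
               for field, op, value in conditions)

rule = [("diet", "=", "vegetarian"), ("visits", ">", 3)]
print(matches({"diet": "vegetarian", "visits": 5}, rule))  # True
```

Because the same rule object can drive either a batch selection or a real-time content decision, one definition serves both outbound and interactive use, which is the unification the article describes.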

For outbound campaigns, the segments can create lists which are either frozen to store a specific set of customers or reselected each time the list is chosen. Email and mobile messages can be sent to the list on a regular schedule or when triggered by behaviors captured in the system.

For interactive campaigns, the segment can be treated as a “content display rule” which is attached to a specific piece of Concentri-created content. For example, content displaying vegetarian products could be linked with a display rule that selects only vegetarians. What makes this interactive is that the data determining who matches a particular display rule is updated in real time by the system’s activity tracking and form capture components. So a customer who answered a particular question a particular way could automatically be shown Web content that reflects this information, as well as sent an email with suitably customized contents.

This approach takes some getting used to. Conventional interaction management systems work a bit differently, defining a sequence of customer actions and system responses such as a telemarketing script. Concentri’s approach is more similar to Web advertising systems that use business rules or call an external recommendation engine to pick content relevant to a particular customer. Either way, Concentri should provide the practical benefit of real time interaction management—that is, the ability to react to customer behaviors as they occur.

Concentri also differs from conventional interaction management systems in providing functions to create the content it delivers. The system includes base templates for email, Web pages, Web forms, text messages, WAP pages and WAP forms. Users can modify these with Concentri design tools and attach specific contents to regions within the templates. Display rules are attached to the content, not the template.

Content elements can either be created for a specific channel or defined as “master elements” used in templates across multiple channels. Sharing these master elements provides a consistent customer experience and makes it easier to understand which messages have been sent to each customer. Concentri automatically adjusts how master elements are displayed in each template to accommodate different channel formats.

Content management includes practical refinements such as Web page previews, sending test messages, and simulation of how emails would be displayed in different email clients. Web contents can either be embedded within an external Web page as “live zones” that call Concentri when the page is rendered, or displayed with a Concentri-built page that is reached through a link on an email or other Web page. Either way, the content resides on a Concentri server.

Concentri automatically adds the contents displayed to each customer to its activity history. At present, it uses third-party cookies to track Web visitors, although the vendor is exploring use of the more reliable first-party cookies. The system can also use standard interfaces to import data on customer behavior captured by Web analytics systems like Omniture and WebSideStory. It can insert those systems’ tags in Concentri content, allowing them to capture Concentri activity.

Reporting includes conventional Web, email and mobile measures such as page views, messages delivered, and email opens, clicks and conversions. Concentri also reports on total impressions for each piece of content, in total and by campaign.

Concentri is offered as a hosted service by Knotice. Pricing depends on channels used, activity levels, and level of support. A system used across all channels would cost $350,000 to $500,000 or more per year, depending on volume. Knotice was founded in 2003 with a focus on cable TV and broadband companies. Concentri was introduced in 2006 and has about 20 installations.

* * *
David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.

2007 Apr 01
ClickFox, Inc. ClickFox
by David M. Raab
DM News
April, 2007

Companies can save a lot of money if customers use self-service systems like Web sites, kiosks and interactive voice response (IVR). But they can lose even more money if the systems are so annoying that customers take their business elsewhere. Since the systems run unattended, special efforts are needed to understand how customers interact with them and identify potential problems.

The key tool in such efforts is a “funnel analysis” that tracks how customers enter and exit particular paths such as a service request or checkout process. Funnel reports are built by identifying the sequence of stages a customer passes through during a process. Analysts pay special attention to transition points where customers drop out or take undesired actions such as asking for live help. They must also look at the process as a whole to ensure that changes which appear to be improvements at one stage do not harm performance somewhere else.

Most self-service systems maintain some type of log file that records the individual events and associates them with customers. But the Structured Query Language (SQL) used with conventional relational databases does not easily identify the pairs of records that indicate movement from one funnel stage to the next. This means that special analytical tools are needed to convert the log files into meaningful information.

For Web sites, vendors like Coremetrics, ClickTracks, Omniture, WebSideStory and WebTrends have built such systems. They capture the required data either by reading the Web server log files or by tagging selected Web pages with small JavaScript programs that notify a data collection server when the pages are rendered. Either way, page views are tied to individual customers with cookies (small files placed on the user’s PC by the Web server) that send a session ID and/or customer ID with each page request. The analysis system uses these to knit the page views into a picture of how customers moved from page to page during the session. To create a funnel report, users identify the specific pages or groups of pages that represent each stage. The system then extracts movements among those pages from the larger set of data. Page views are often supplemented by additional information, such as purchase amounts or customer attributes, which is captured in other systems and matched back using the customer or transaction IDs.
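The core funnel computation just described, knitting page views into sessions and counting movements between the chosen stages, can be sketched in a few lines. The log format here is invented for illustration:

```python
from collections import Counter, defaultdict

def funnel_transitions(log, stages):
    """log: (session_id, timestamp, page) tuples.
    Group views by session, order them by time, then count movements
    between the pages designated as funnel stages."""
    sessions = defaultdict(list)
    for session_id, ts, page in log:
        sessions[session_id].append((ts, page))
    counts = Counter()
    for views in sessions.values():
        path = [p for _, p in sorted(views) if p in stages]
        counts.update(zip(path, path[1:]))
    return counts

log = [("s1", 1, "cart"), ("s1", 2, "checkout"), ("s1", 3, "confirm"),
       ("s2", 1, "cart"), ("s2", 2, "checkout")]  # s2 drops out
print(funnel_transitions(log, {"cart", "checkout", "confirm"}))
```

The drop-off is visible in the counts: two sessions move from cart to checkout, but only one continues to confirm, exactly the kind of transition analysts inspect for problems.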

ClickFox (ClickFox, Inc., 877-256-3761, www.clickfox.com) provides customer path analysis across many types of self-service systems, although its particular focus has been IVRs. It classifies the events captured in the system’s log file, using a model that shows how each event fits into the general flow of the system. ClickFox then imports the system logs which show the events experienced by individual customers. It maps these onto the model and displays a visualization of their path through the system.

The key to this is building the model. ClickFox reads the log files to build an initial map of the application, which it presents to users for clarification and fine-tuning. ClickFox engineers then create the actual models. Models can also be imported from third-party flow design tools such as Cisco Audium.

ClickFox can display the path for a single customer interaction, paths for similar customers, or paths for similar interactions. Users define customer segments and interaction types with a combination of event log data and information imported from other sources. Such information might include customer attributes, transactions, revenues, or costs.

Users can also identify a set of events that make up a particular funnel or task, such as opening a new account or placing an order. Other events can be labeled as task outcomes. This lets users analyze particular customer activities and identify problem areas.

One ClickFox model can track customer activities across different systems, so long as events within the individual systems have been mapped and interactions relating to the customer can be linked. If the different systems use different customer IDs, ClickFox can maintain a cross reference table that captures the relationship. Creating the table itself—that is, identifying which IDs in different systems relate to the same customer—must be done externally.

Analyses can combine segmentation and task definitions to examine whether different sets of customers react differently in particular situations.

ClickFox reports can show any behaviors captured in a model, including task success rates, distributions of outcomes, and behavior by segment. Many of these are unavailable from the interaction systems themselves, because the behaviors are defined by the model’s categories. The system can generate financial evaluations such as return on investment models, although this requires some custom development. It can also issue email alerts when specified conditions occur. Although log files can be uploaded frequently, the system does not report on results in real time.

The system can compare tests of different interaction system rules, so long as the test cases can be separated into distinct segments. But, because its reports are limited to actual log data, it cannot perform “what if” analysis to estimate the impact of proposed rule changes.

ClickFox also has some “automated intelligence” that flags behavior patterns which appear to indicate problems. These might be frequently skipped stages in a standard process or frequent cycling between two stages. But the vendor reports these features are used less often than data visualization to determine which situations to explore.

The system holds the log data in a proprietary file format for better performance. It has processed IVR records from more than 20 million calls per month. Although the primary focus has been on IVR data, ClickFox says about half its clients now combine logs from more than one channel.

The company was founded in 2000 but until recently has worked mostly on consulting projects. It is now expanding aggressively and has about 20 active customers.

Most customers use a hosted version of the system, although the software can also be licensed and run in-house. Annual fees can range from $150,000 plus services to several million dollars, based on the number of sessions tracked.

* * *
David M. Raab is a Principal at Raab Associates Inc., a consultancy specializing in marketing technology and analytics. He can be reached at draab@raabassociates.com.