Comparison is an integral part of data analysis. Effective marketers compare changes in their data over time (e.g. how campaigns progress or customer sentiment evolves), and they conduct side-by-side comparisons of similar data elements (such as conversion rates for one offer versus another). But comparative analysis is only effective when you understand the data you are comparing and the relationships among different data elements. Every data-driven marketer should consider two questions when conducting data analysis: “What, exactly, is my data measuring?” and “Am I making valid comparisons?”
Previously I talked about the importance of knowing the relationship between your data and the real-world things your data describes. In this post I’m building on the concept of knowing what your data actually measures to help you make better decisions. The same principles apply whether you are a marketing practitioner using data to improve a specific function or a data scientist responsible for transforming data on behalf of an entire organization.
Each piece of data in your database represents a real-world concept. Oftentimes, these concepts are not well defined and their meaning is implied only through basic naming conventions. For example, it’s easy to assume that a “visitor” to your website is a living, breathing human being; but even that simple assumption is often wrong. Layered on top of the conceptual disparities are differences in measurement tools used to capture data in the first place — differences that are usually invisible to the end data consumer. Measurement technology is ultimately what defines the boundary between what is counted as a particular data element and what is not. For example, you could define a “video impression” as the press of a play button, the delivery of a video asset, initiation of video rendering in a player, or an arbitrary milestone through which a video asset has played. Each of these definitions yields a different outcome in the data that impacts your analysis and conclusions.
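To make this concrete, here is a minimal Python sketch of the video impression example. The event log and event names are invented for illustration; the point is that each definition of an “impression” produces a different count from the same underlying activity:

```python
# Hypothetical event log: each record is (viewer_id, event), where the event
# names correspond to the four possible "video impression" definitions.
# These names are illustrative, not from any real measurement vendor.
events = [
    ("u1", "play_pressed"), ("u1", "asset_delivered"),
    ("u1", "render_started"), ("u1", "milestone_50pct"),
    ("u2", "play_pressed"), ("u2", "asset_delivered"),
    ("u3", "play_pressed"),
]

def impressions(log, definition):
    """Count 'impressions' under a given event definition."""
    return sum(1 for _, event in log if event == definition)

for definition in ("play_pressed", "asset_delivered",
                   "render_started", "milestone_50pct"):
    print(definition, impressions(events, definition))
```

Running this yields four different totals for the “same” metric, which is exactly the discrepancy that surfaces when two vendors count impressions at different moments.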
Establishing clear definitions for your data should be the primary pursuit, but given that much of the data you’ll work with is ambiguous, here are a few common pitfalls to avoid. Each of these mistakes leads to bad comparisons and, ultimately, poor decisions.
You’re treating different things as if they were the same
This may be the most common mistake when it comes to using data. Too often, marketers make poor assumptions about what their data really represents. A good example is the treatment of browser cookies as individual consumers — a common marketing practice for years. Marketers tolerated this technique, despite its inaccuracies, because better solutions were hard to come by. Fortunately, better measurement solutions are emerging as personal consumer devices proliferate and marketers provide stronger incentives for consumers to share identity information.
Digital ad impressions are another example of ill-defined data elements. The “impression” label is misleading since ad impressions have historically been counted whether or not an ad is visible to a human being. Better data is emerging in the form of “viewable” ad metrics to account for the considerable differences between viewable and non-viewable impressions, but unless your data consistently distinguishes the two, your analyses may be skewed. The industry is rallying around standard definitions of viewability for different ad types, but marketers should remain vigilant about whether different data and technology vendors actually adhere to those standards.
Marketers should also be mindful of differences in measurement granularity for the same or similar data elements. For example, adding weekly unique visitors throughout a month will not produce a count of monthly unique visitors. They’re not the same thing. You also cannot presume that your customer on ‘Digital Drive’ is a certain age and gender because your demographic-by-zip-code database suggests specific values given the address. Granularity plays a principal role in the application of time and location data but also applies to other data types.
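The unique-visitor pitfall is easy to demonstrate. This sketch uses a made-up set of visitor IDs per week; summing weekly unique counts overstates the monthly figure because repeat visitors are counted once per week:

```python
# Hypothetical visitor IDs seen in each week of a month. The same visitor
# can (and does) appear in several weeks.
weekly_visitors = [
    {"a", "b", "c"},   # week 1
    {"b", "c", "d"},   # week 2
    {"a", "d", "e"},   # week 3
    {"e", "f"},        # week 4
]

# Wrong: summing weekly unique counts double-counts repeat visitors.
summed_weeklies = sum(len(week) for week in weekly_visitors)

# Right: monthly uniques require deduplicating across the whole month.
monthly_uniques = len(set().union(*weekly_visitors))

print(summed_weeklies, monthly_uniques)  # 11 vs. 6
```

Six real people become eleven “visitors” when granularities are mixed — the two numbers measure genuinely different things.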
You’re measuring the same things in different ways
There’s a difference between measuring the same thing and measuring the same way. Even if all your marketing technology vendors use the same definition for a data element in concept, each one will produce varying results due to differences in measurement capabilities. Differences in data collection code or technology, location and availability of data collection centers, and compatibility with consumer devices all play a role in determining how complete and accurate the data will be.
Aside from differences in measurement technology, it’s also important to recognize when you’re measuring different events altogether. In the earlier video impression example, each of the four definitions corresponds to a distinct event that should be understood separately. Variances in these metrics might be more valuable for technical diagnostics than for marketing analysis, but when combining data across sources it’s still important to confirm you’re comparing the same events. Even events that occur milliseconds apart can produce sizable variances; Interactive Advertising Bureau (IAB) measurement guidelines treat a 10-percent variance between advertiser and publisher metrics as acceptable.
You’re analyzing incomplete or overlapping data
If you’re pulling data together across many sources, the likelihood of having incomplete, orphaned, overlapping, or redundant data is high. Before making comparisons, ask questions like, “Does my campaign data include results from every DSP and media source used in my campaign?” and “Is my cost data aligned against the right conversion revenue, or is there a mismatch?” Also, if you’re filtering results using metadata that is not 100-percent available or consistent, you may be missing critical data for your analysis. The next time you filter products on a retail site like Amazon, compare the number of results in a given metadata category against the overall set of product results. When metadata is not available for every item, a filter silently removes instances you assumed were part of your analysis.
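The metadata-filtering pitfall can be sketched in a few lines of Python, using a hypothetical product catalog in which one record is missing its “color” field:

```python
# Hypothetical product catalog; one record is missing the "color" metadata.
products = [
    {"sku": "P1", "color": "red"},
    {"sku": "P2", "color": "blue"},
    {"sku": "P3"},                  # color unknown: silently excluded below
    {"sku": "P4", "color": "red"},
]

total = len(products)

# Group products by color, exactly as a faceted filter would.
by_color = {}
for p in products:
    if "color" in p:
        by_color.setdefault(p["color"], []).append(p["sku"])

filtered_total = sum(len(skus) for skus in by_color.values())
missing = total - filtered_total  # products invisible to any color filter

print(total, filtered_total, missing)
```

Any analysis built on the color facets covers only three of the four products, even though no filter ever explicitly excluded P3.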
Hierarchies and classifications appear throughout most complex databases. These structures can improve your understanding of data and even help you make better comparisons in your analyses. However, hierarchies and classifications can also cause confusion when misunderstood or misapplied. Suppose your company reports sales figures for its western sales territory. You might compare your company’s performance against general economic data for the ‘US West’ region; but if your source of economic data uses a different combination of states in its definition of “US West” you might draw all the wrong conclusions. You should always understand whether your data classifications are mutually exclusive or overlap in some areas, and whether the classifications used by different data sources are the same.
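A quick sketch, with two invented state lists, shows how easily two “US West” definitions can diverge:

```python
# Two hypothetical definitions of "US West" from different data sources.
# The state lists are illustrative, not taken from any real provider.
sales_region_west = {"CA", "OR", "WA", "NV", "AZ"}
econ_region_west = {"CA", "OR", "WA", "ID", "MT", "WY", "CO", "NM",
                    "UT", "NV", "AZ", "AK", "HI"}

# States counted by one source but not the other (symmetric difference):
# any comparison built on these two "US West" definitions mixes different
# underlying populations.
mismatch = econ_region_west ^ sales_region_west
print(sorted(mismatch))
```

Before comparing your sales figures against third-party benchmarks, it is worth computing exactly this kind of difference between the two classifications.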
Poor data hygiene is another cause of data mismatch. If your company defines standard naming conventions for objects like campaigns, but practitioners don’t follow those conventions when creating them, it’s easy to lose results within the large expanse of enterprise data. Diligence in properly naming and classifying data often makes the difference between working with complete and incomplete data in your analyses.
There’s a lot to think about when it comes to using data effectively. One simple piece of advice that will always help: seek to really understand what your data represents.