Understanding Causation vs. Correlation in Marketing

How to Infer Causation 

A key component of marketing success is the ability to tell causation apart from correlation.

First, let’s define the two terms:

Correlation is a relationship between two or more variables or attributes. From a statistics perspective, correlation (commonly measured by the correlation coefficient, a number between -1 and 1) describes both the magnitude and the direction of the relationship between two variables.
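To make the definition concrete, here is a minimal sketch of the Pearson correlation coefficient, one common way to compute that number. The data (monthly email sends and orders) and the function name pearson_r are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-movement of the two variables around their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: emails sent per month and orders received per month
emails_sent = [10, 20, 30, 40, 50]
orders = [12, 25, 29, 41, 55]
r = pearson_r(emails_sent, orders)
print(f"r = {r:.3f}")
```

The sign of r gives the direction of the relationship and its absolute value gives the strength. Nothing in the calculation says which variable, if either, drives the other.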

Causation indicates that one event is actually the direct result of the other(s). It is the basic notion of “cause and effect” – in which one event is identified as a consequence of the other. Essentially, causation is the “why” for any given outcome from a marketing action.

The ability to properly gauge causation supplies you with the fact base necessary to make informed, sound marketing decisions. In practice, that means finding the exact “marketing genetic code”: the data elements of specific customers that are predictive of future behaviors.

Meanwhile, understanding buyer behaviors and attributes helps you predict their future relationship with your company and decide how best to manage that relationship.

Causation can be established through rigorous experiments and testing. A well-run test lets you conclude that one variable actually drives the other, rather than the two merely moving together.

If those causal relationships drive positive customer behaviors, explore them further and take advantage of them.

Correlation Does Not Always Indicate Causation

While the definitions themselves are relatively straightforward, improper use of exploratory data analysis techniques can lead to a wide range of inaccurate conclusions. This happens when events are correlated, but the correlation is not due to a causal relationship.

In marketing, simply assuming that correlation implies causation without rigorous testing and experimentation can prove to be problematic, and ultimately lead to costly mistakes.

A famous example involves two factors that are correlated without any causation between them: ice cream consumption and educational performance scores, compared across countries.

While a simple glance at the country-level correlation coefficient indicates a high correlation between ice cream consumption and educational performance scores, simple logic indicates that the two things have nothing to do with each other.

Nonetheless, if you were to hypothetically test these two variables against each other, how would you do so? The best way to prove (or disprove) causation is by setting up a scientific experiment.

  • Tell a randomly chosen half of the subjects in each country to eat ice cream every day for the duration of the experiment.
  • Tell the other half of the subjects that they cannot eat any ice cream at all.
  • Run this experiment for a calculated period of time. At the end, have all of the subjects take the same exam.
  • Examine the results of those students who consumed ice cream, versus those who did not.

From there, you will have the opportunity to answer the question – did the consumption of ice cream make a difference for the children enrolled in the study during this particular time period?
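The difference between observing and experimenting can be sketched with a small simulation. Everything here is an assumption made for illustration: a hidden factor called resources drives both ice cream eating and exam scores, ice cream itself has no effect on scores, and the numbers (score scale, noise levels, sample size) are invented.

```python
import random
import statistics

def simulate(n=10_000, seed=42):
    """Return (observational_gap, experimental_gap) in mean exam score."""
    rng = random.Random(seed)
    obs_eaters, obs_non = [], []
    rct_eaters, rct_non = [], []
    for _ in range(n):
        resources = rng.gauss(0, 1)  # hidden driver, not visible to the analyst
        # Ice cream does NOT appear in the score formula: no causal effect
        score = 500 + 50 * resources + rng.gauss(0, 20)

        # Observational data: better-resourced students tend to eat ice cream
        eats_by_choice = resources + rng.gauss(0, 1) > 0
        (obs_eaters if eats_by_choice else obs_non).append(score)

        # Experiment: a coin flip decides who eats ice cream
        eats_by_assignment = rng.random() < 0.5
        (rct_eaters if eats_by_assignment else rct_non).append(score)

    obs_gap = statistics.mean(obs_eaters) - statistics.mean(obs_non)
    rct_gap = statistics.mean(rct_eaters) - statistics.mean(rct_non)
    return obs_gap, rct_gap

obs_gap, rct_gap = simulate()
print(f"observational gap: {obs_gap:.1f} points")
print(f"experimental gap:  {rct_gap:.1f} points")
```

In the observational data, ice cream eaters score noticeably higher even though ice cream does nothing, because the hidden factor sits behind both. Under random assignment the gap shrinks to roughly zero, which is the answer the experiment is designed to reveal.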

Another example of correlation not being causation is the idea that smoking is correlated with alcoholism, but does not cause alcoholism.

The same approach applies to marketing examples. The following is a real-life instance in which we implemented testing to prove causation for one of our clients at BuyerGenomics.

A Marketing Example

A question that many specialty retailers ask is whether or not they should send catalogs to their customers, in addition to their other marketing efforts.

One of our clients, a direct-to-consumer retailer with mass distribution, had a recurring program of sending catalogs to its customers. While they knew that there was a correlation between the catalog program and sales, they wanted to see whether the catalog actually caused those sales, and whether it was delivering the desired return on investment.

In order to do that, they drew a hold-out sample from the universe of customers eligible to receive the catalog and used it as their control cell. Customers in this group were not mailed any catalogs at all. Meanwhile, the remainder of the customer base received catalogs just as they normally did.

Keep in mind, by administering this test, the company was willing to potentially sacrifice some short-term sales for the sake of information that would prove useful in the long run.
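Drawing such a hold-out cell can be sketched as a random split of the eligible customers. This is not the client's actual procedure; the function name split_holdout, the customer IDs, and the 5% hold-out fraction are all hypothetical.

```python
import random

def split_holdout(customer_ids, holdout_fraction, seed=None):
    """Randomly split eligible customers into (control, test) lists."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    shuffled = list(customer_ids)
    rng.shuffle(shuffled)
    n_control = max(1, round(len(shuffled) * holdout_fraction))
    # The first n_control customers get no catalog; the rest are mailed as usual
    return shuffled[:n_control], shuffled[n_control:]

# Hypothetical universe of 10,000 catalog-eligible customers
eligible = [f"C{i:05d}" for i in range(1, 10_001)]
control, test = split_holdout(eligible, holdout_fraction=0.05, seed=7)
print(len(control), len(test))
```

Because every eligible customer has the same chance of landing in the control cell, the two groups should differ only in whether they received the catalog.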

In other words, the test required a small investment that allowed their marketing manager to determine the value of their marketing catalog. Through this, they were able to understand how much leverage they had for future catalog campaigns.

In turn, it was key for the sample group to be representative of their customer base as a whole. In addition, that control cell had to be as small as possible (to minimize lost sales) while still providing statistically significant results.
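One way to size a control cell is a standard power calculation for comparing two proportions. The sketch below assumes the outcome is a purchase rate (the share of customers who buy during the test window), which is a simplification of total sales. The rates, the 5% significance level, and the 80% power are hypothetical inputs, not figures from the client test.

```python
import math
from statistics import NormalDist

def control_cell_size(p_control, p_test, ratio=1.0, alpha=0.05, power=0.8):
    """Approximate control-cell size for a two-sided two-proportion z-test.

    ratio is (test cell size) / (control cell size).
    """
    if p_control == p_test:
        raise ValueError("the two rates must differ to size a test")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired power
    # Variance contributed by each cell, scaled to the control-cell size
    variance = p_control * (1 - p_control) + p_test * (1 - p_test) / ratio
    n = (z_alpha + z_beta) ** 2 * variance / (p_test - p_control) ** 2
    return math.ceil(n)

# Hypothetical rates: 2% buy without a catalog, 3% buy with one
equal_cells = control_cell_size(0.02, 0.03, ratio=1)
small_holdout = control_cell_size(0.02, 0.03, ratio=19)  # 95% mailed, 5% held out
print(equal_cells, small_holdout)
```

When the mailed group is much larger than the hold-out, the control cell can be smaller than in an evenly split test, which fits the goal of minimizing lost sales. The smaller the expected difference between the two rates, the larger the control cell has to be.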

Therefore, a relatively small sample group was selected, while catalogs were mailed out to the remaining customers.

What was the result?

Customers who received the catalog delivered sales several times higher than those in the control cell, a validating experience for all involved!

Having seen how effective their catalog campaign is, they now have catalog attribution evidence to support future campaign strategies.

Using a PMA to Infer Causation

Predictive Marketing Automation (PMA) platforms provide marketers with the tools necessary to perform tests and experiments similar to those described above.

With a PMA, you can perform these five key steps to design well-conceived experiments that tease out causal variables:

  • Create test and control cells.
  • Target each cell differently with marketing communications.
  • Match the sales back to the test and control groups.
  • Compare the results.
  • Based upon the results, roll out the winning strategy to the entire population.
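The match-back and comparison steps above can be sketched in a few lines. This is a generic illustration, not the API of any PMA platform; the function name compare_cells, the customer IDs, and the order amounts are hypothetical.

```python
from collections import defaultdict

def compare_cells(assignments, orders):
    """Match orders back to cells and return average sales per customer by cell.

    assignments: dict of customer_id -> cell name ("test" or "control")
    orders: iterable of (customer_id, amount) pairs
    """
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for cell in assignments.values():
        sizes[cell] += 1
    for customer_id, amount in orders:
        cell = assignments.get(customer_id)
        if cell is not None:  # skip orders from customers outside the experiment
            totals[cell] += amount
    # Divide by everyone in the cell, buyers and non-buyers alike
    return {cell: totals[cell] / sizes[cell] for cell in sizes}

# Hypothetical experiment: three mailed customers, two held out
assignments = {"A": "test", "B": "test", "C": "test", "D": "control", "E": "control"}
orders = [("A", 60.0), ("B", 30.0), ("D", 20.0), ("Z", 99.0)]

per_customer = compare_cells(assignments, orders)
lift = per_customer["test"] / per_customer["control"]
print(per_customer, lift)
```

Dividing by the full cell size rather than by the number of buyers matters: it keeps customers who bought nothing in the comparison, so the two cells are measured on the same basis.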

While this is a relatively straightforward process, following a consistent procedure to create and track your random samples is vital. Maintaining the integrity of your samples and marketing treatments will ensure that your experiments deliver the intended insights.

It is also important to work from a stable database environment and to have your database administrator keep you informed of changes to your systems environment. This is key because if the way data is collected changes, or a certain variable’s values change, over the course of a test, the accuracy of the whole experiment can be thrown off.

Make no mistake – unanticipated new data or system aberrations can wreak havoc on the whole exercise.


When implementing marketing plans, you always want to have the best information possible.

Determining causation allows you to understand the levers at your disposal to impact customer behaviors. Equipped with this knowledge, you can better plan, develop, target, and implement your marketing budgets.

While correlation on its own offers clues, marketers should not base their plans on correlation alone. Instead, plans should be based upon experiments designed to determine causation.

Ultimately, continuous marketing tests serve to evolve the knowledge of your organization. A major benefit of a PMA platform is that it provides the data ecosystem that allows you to perform and even automate these tests more rapidly and efficiently than ever before.

From there, you can learn and evolve more quickly – basing your marketing decisions on data-supported facts instead of mere guesses, hearsay, or gut feelings.

There has never been a time in recent history when this capability has been more pivotal. Our digital devices allow marketers to collect more data about consumers than ever before. Meanwhile, customer profiles are also richer than ever. Therefore, the companies that best understand these resources and how they relate to their customers will be the clear winners in the years to come.

About the Author:

Gary’s background includes over 30 years of analytics & database innovation for several leading Fortune 500 companies and Madison Avenue advertising agencies. Gary has been a frequent lecturer and author on the topics of database marketing and applied statistics. His articles have been published in DM News, Direct Marketing and the Journal of Direct Marketing. He recently was President of the Direct Marketing Idea Exchange and served on their Board. Gary received his M.S. in Industrial Administration from Carnegie Mellon University.

Any further questions or insight? Email Gary at gbeck@buyergenomics.com.