The Best Type of Customer Data
- Purchase behavior is king. It is the simplest form of behavior, but it is the most powerful, i.e.:
- What they bought.
- How much they paid for it?
- When did it happen?
- Through what channel?
- Was it in store or online?
- The best kind of data all depends on what you are trying to predict. But in general past behavior is the best predictor of future behavior.
- Income is a very popular type of data but it is a dangerous one. For example “One may spend everything they have and then borrow some. Someone else may hoard it in the bank, and then a third person may donate it all to charity. There couldn’t be three more different people.”
- Attitudinal data is the hardest data to get but is incredibly powerful.
- Different data becomes more important in different parts of the Customer Journey or the Buyer Lifecycle.
Below is a lightly edited transcript of Episode 22 of the Inevitable Success Podcast with Damian and special guest Stephen Yu.
Damian: Today I’m going to be having a discussion with Stephen Yu and we’re going to be talking about Data, particularly, ‘what is the best type of data’. In the age of Google and Facebook where everybody is trying to sell their data as the absolute best data that you could buy, there seems to be this premise that there is a best type of data. So I would ask you Stephen, what do you think is the best type of data out there?
Stephen: That’s an interesting question, because there’s no such thing as the best type of data; it depends on what kind of problem you’re trying to solve. I say this all the time: when you get into data and analytics planning, you’ve got to define the goal first. You have to ask yourself, ‘What are you trying to solve here? What is your problem statement?’ Is it about acquiring customers, expanding your footprint, or acquiring the best customers or best potential customers? Or it could be that I already have these guys. So how do I milk more money out of them? How do I sell more? How do I convert one-timers to two-timers? How do I cross-sell or upsell? If they try to leave, how do I hold them back?
Each stage of the customer’s journey yields different data, like little bread crumbs. When you try to face those different problems throughout the customer journey, it calls for different kinds of data. So, for me to say, ‘Yeah, this is the most powerful data,’ it really depends on when you need it and for what purpose. In all my career, for example, the most useful data when trying to predict what a person’s next move will be has been their past behavior, but of course people change.
Damian: Right, that almost makes me want to ask, “what kind of behavior?”
Stephen: Purchase behavior. It’s the simplest form of behavior that I use but it’s the most powerful one.
What they bought. How much they paid for it? When did it happen? Through what channel? Was it in store or online? That’s very powerful. And the joke is that, you know, the whole notion of big data happened because some companies wanted to collect every single click, ‘every breath you take, every move you make, every night you stay, I’ll be watching you,’ that type of data collection. Hence the birth of Big Data. It goes back about 10 to 20 years, and the whole big data hype was about 10 years ago, maybe 8 years ago. I mean, nobody talks about big data now; it is kind of out of fashion to even talk about it.
So, what was that big data thing even about? Let’s talk about that for a second. At that time they said, ‘Well, bigger is better. If you have everything, you can predict anything.’ That’s like saying, ‘I don’t know what problem I’m going to solve, but I’m going to hoard the data (I’m using the word ‘hoard’ purposefully here) and hopefully I’ll find a way to figure out everything.’ And you know what? It didn’t work out that way. We all know it, because at the time the whole definition of big data was, remember the 3 Vs of Big Data? The first V stood for Volume: it’s got to be large; anything and everything. The second was Velocity: it should move really fast; fast collection. The third was Variety: let’s collect all kinds of data.
Now this is exactly what led to your first question today: what kind of data is the best data? Well, it depends on what you are trying to predict. But in general, I’ll say past behavior is the best predictor of future behavior. In that case, if I may break down behavioral data, I would choose transactional data, browser data, movement data. Those are all behaviors. They are more powerful than simple demographics and two-dimensional data, which are like, you know, ‘What do they look like?’ ‘What is your income?’ ‘Where do you live?’ ‘Do you have kids?’ ‘How big is your house?’ ‘Do you rent or own?’ You know, all that kind of stuff; ‘What kind of car do you drive?’ That’s what they look like. Those are less predictive, but they’re more available.
So, in other words, behavioral data is more powerful. But you cannot know what everybody did last night. Therefore, you have to backfill it. Yeah, I don’t know what he did last night or what he bought last month but you can use his income as a proxy, for example.
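As a rough illustration of the backfill idea above, here is a minimal Python sketch. It assumes a made-up customer list in which `annual_spend` is missing for some people, and it uses the average spend of known customers in the same `income_band` as the proxy. The field names, the bands, and the numbers are all hypothetical, and this is only one simple way to build a proxy, not the method Stephen describes using in practice.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: annual_spend is None when no purchase history exists.
customers = [
    {"id": "A", "income_band": "high", "annual_spend": 1200.0},
    {"id": "B", "income_band": "high", "annual_spend": 800.0},
    {"id": "C", "income_band": "low", "annual_spend": 300.0},
    {"id": "D", "income_band": "low", "annual_spend": 100.0},
    {"id": "E", "income_band": "high", "annual_spend": None},
    {"id": "F", "income_band": "low", "annual_spend": None},
]


def backfill_spend(records):
    """Fill missing spend with the average spend of the same income band."""
    spend_by_band = defaultdict(list)
    for record in records:
        if record["annual_spend"] is not None:
            spend_by_band[record["income_band"]].append(record["annual_spend"])
    band_average = {band: mean(values) for band, values in spend_by_band.items()}

    filled = []
    for record in records:
        known = record["annual_spend"] is not None
        estimate = record["annual_spend"] if known else band_average.get(record["income_band"])
        # Flag proxy values so they are never mistaken for observed behavior.
        filled.append({**record, "spend_estimate": estimate, "is_proxy": not known})
    return filled


filled = backfill_spend(customers)
for row in filled:
    print(row["id"], row["spend_estimate"], "proxy" if row["is_proxy"] else "observed")
```

Keeping the `is_proxy` flag next to the estimate matters because, as the conversation goes on to stress, income is only a stand-in for behavior and can mislead badly for any one person.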
So, therefore, there’s no such thing as one best kind of data; they all work together, and depending on where you are and in what stage, the answer is different.
Damian: Incomes are an interesting one. You can take two different people that make the exact same income and what they choose to spend their money on is different because they have completely different value sets. One may spend everything they have and then borrow some. Someone else may hoard it in the bank, and then a third person may donate it all to charity. There couldn’t be three more different people.
Stephen: That’s why income is one of the most popular types of data people use, but it’s the most dangerous predictor. And the joke that I make is, ‘Do you think that the difference between a Mercedes buyer and a Tesla buyer is income?’
Damian: I couldn’t. You’re so right.
Stephen: If you just go out and try to buy demographic data from all the reputable collectors and vendors, right, they sell close to 300 to 400 variables per household.
Damian: And those are the reputable collectors.
Stephen: I used to be one of those guys too, by the way. And guess what: having to pick income to predict anything, and thinking that that’s the most popular predictor for all, that’s not true, because like I said, you’re absolutely right, depending on what they are about [their values], the way they spend money is very different. And also, when you have 300 to 400 variables and you’re only using age and income, well, I used to make an analogy about that. You know how you take kids to a diner, and they give you about three different color crayons? That’s all they give you, and that’s quite enough for two- or three-year-old toddlers. You buy 64 color crayons for these kids and they’re using about five colors anyway. So you wonder, ‘What did I buy all these different colors for?’ I felt the same way when I sold 300 variables and people used maybe ‘homeownership,’ maybe ‘income,’ maybe ‘age,’ and that’s it; it breaks my heart. Why does that happen? Because people are relying on their gut feelings when they do these things. They don’t know what the most powerful variable is; they’re just making it up.
Damian: Yeah a lot of these things are kind of almost the same variable too. I can’t think of anything off the top of my head, but like for example – if you’re a renter and you live in a city, you don’t own a lawn mower.
Stephen: So that’s intuitive selection. You know, I used to say, ‘Yes, we talk about predictive analytics and how statistical modeling is really helpful to predict all kinds of things.’ One of the best reasons why we build models these days is that we have too many variables; when you have too many variables, you cannot really say one thing in a simple way. Take simple things like, ‘OK, I have this magazine subscription base; who is most likely to cancel?’ Well, you need some payment history, you need the delinquency history, you need the historical data now, because that’s a different problem statement. But let’s make it really simple: who’s more likely to be a luxury cruiser? Well, income could be it, right? Age could be it, because the people who go on a ship are mostly older. But how do you measure wealth? Are you trying to use income or their net value or net worth? Those are very different indicators, right? So if you ask a human being, they just guess. But the whole notion of analytics is that you don’t want to guess. You want the mathematics to take over and let it pick the best predictor. So, our job is to get all these things, numbers and figures, into the best shape so that the mathematics can happen seamlessly. That should be the goal; not to decide, ‘What is the most powerful predictor of all time?’ because it depends.
Damian: You know, it’s funny. One of your ideas made me think of a story from a client meeting, and the idea here is that you can’t project your view of a metric onto the customer. So for example, we were in a meeting, and there was a conversation when we looked at a model and it came back with the net worth of the individual and the income that they were making. The marketer made a comment like, ‘Well, nobody who makes that kind of income could have anywhere near that net worth.’ And actually, I thought that said a lot about that person, because they were projecting their own saving habits at that income onto the customer. Meanwhile, what it really said is that maybe that customer has different spending habits. Maybe they live in a part of the country where there’s more affordable housing or something. It could be a million different things. The bottom line is you cannot take your own situation and project it onto the customer.
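To make "let the mathematics pick the best predictor" concrete, here is a small Python sketch built on the luxury cruise example. It ranks candidate variables by the absolute Pearson correlation between each variable and the outcome. The six rows are invented for illustration, the field names (`income`, `age`, `net_worth`, `took_cruise`) are assumptions, and a single-variable correlation is a deliberately crude stand-in for real statistical modeling, which would weigh many variables together.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def rank_predictors(rows, candidates, target):
    """Sort candidate variables by |correlation| with the target, strongest first."""
    ys = [row[target] for row in rows]
    scores = {c: abs(pearson([row[c] for row in rows], ys)) for c in candidates}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy data (income and net worth in thousands); not real customers.
rows = [
    {"income": 150, "age": 35, "net_worth": 100, "took_cruise": 0},
    {"income": 60, "age": 70, "net_worth": 900, "took_cruise": 1},
    {"income": 120, "age": 45, "net_worth": 200, "took_cruise": 0},
    {"income": 80, "age": 66, "net_worth": 700, "took_cruise": 1},
    {"income": 200, "age": 50, "net_worth": 300, "took_cruise": 0},
    {"income": 70, "age": 62, "net_worth": 800, "took_cruise": 1},
]

ranking = rank_predictors(rows, ["income", "age", "net_worth"], "took_cruise")
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The point of the sketch is the workflow rather than the numbers: every candidate variable is scored the same way against a stated goal, instead of someone choosing income or age by gut feeling.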
Stephen: That’s why you have to let math take over.
Damian: Yeah exactly.
Stephen: Because what you talked about is kind of interesting. Let’s just go further. Now, we talked about what the most powerful data is, but let’s define data types first. We talked about behavioral data. Well, that’s powerful, I get that, because when you build a model you realize that behavioral data takes up about 70 to 80 percent of the predictive power of any kind of targeting. So, you know it is powerful. Why do we need all this demographic data: income, age, number of children, cars that they drive, all that? Because it is more widely available. It may not have the uniqueness and depth of the behavioral data, but you don’t have a behavior that everybody does all the time. So it works together. All that demographic data fills in the gaps nicely. So, that’s the second most powerful type. But what you talked about is a little interesting because that’s a third type, which is attitudinal data. This is an example that I’ve been using for years: let’s just say that an individual you’re targeting makes a lot of money, which is a demographic variable, and he has a Porsche. So that’s behavioral data, he bought an expensive sports car, and he lives in a really nice area with a lot of highly educated people.
Damian: This guy has to be single. I can tell already. You don’t know. Maybe he is having a midlife crisis. See, I’m doing it. I’m projecting.
Stephen: Exactly. You’re projecting already. Let me ask you this now. Do you think this guy is conservative politically or liberal?
Damian: I have no idea.
Stephen: That’s exactly the right answer. You don’t know. So this third category, called attitudinal data, is the hardest thing to get. But when you get it, it’s really powerful; just because one plays golf, and lives in a nice town, and bought the most expensive golf club last month doesn’t mean that he’s automatically conservative. Sometimes you have to stop and ask. That’s why it’s called attitudinal data. So, I bring my three fingers in this three dimensional way. If you know all the three dimensions, that’s like having a superpower. You could predict anything about that person. Unfortunately, even during the election period a lot of politicians run deep analytics about tendency to vote – sometimes on a household level, especially in swing states. That’s what they do. But they cannot stop and ask everybody. So, they create a proxy of their attitudes based on other factors like income, age, where they live, what is their race, what is their educational level; all that kind of stuff. But it’s a proxy.
So what is the most powerful variable? It depends. People are now going further than that. People sometimes go after everything. People act like, ‘I have to know everything to act.’ So let’s stop right there. Politicians don’t stop at not knowing the attitudinal data. They create a proxy and say, ‘You know what? It’s good enough that I know that somebody is not hardcore Republican or hardcore Democrat, because I’m going after the middle guys anyway. That’s good enough.’ So all we should think about is, ‘What am I going to do with this data?’ instead of asking, ‘Am I getting all the complete data?’ Because you’ll never get it. Or you might ask, ‘Do I have the most powerful predictor? It would be great if I knew somebody’s attitude.’ But you can’t stop and ask everybody. Of course, with all the social media it got better, instead of having all these focus groups. I say that Twitter and Facebook are one gigantic focus group. They can study that better now.
However, you still don’t know everything about everybody, and that goes back to why we model. Well, it became a scandal only a few months ago: Cambridge Analytica and Facebook and all those guys. And you have to wonder, why does Facebook need Cambridge Analytica, really? Why do they even need them if they have everything about everybody? What did Cambridge Analytica do? They said, ‘Well, give me the target of X, Y, Z and I’m going to find the best way to mimic those guys.’ Right or wrong, because an 80 percent right answer is still better than no answer, or even a 60 percent right answer is better than no answer. That’s the whole attitude that they had, right or wrong. And people are still upset about this whole Facebook fiasco, and that’s a different story, and Facebook stock just tanked last week and all that. Aside from all this, the joke is this: it proved that analytics works. If it didn’t work, then we would have nothing to talk about.
It’s kind of ironic, isn’t it? So I’m here; I’m trying to be impartial about the value judgment here. I’m just here to talk about data and the fact that it’s not always complete. Sometimes you need to make it up, but with an educated guess using mathematics, not your gut feelings. And when you do that, you cannot discriminate between which data is better, worse, or ugly, because you don’t know. It depends on what you’re trying to do, when you’re trying to do it, and at what stage of the customer journey.
Damian: I’m glad you brought that back around. So you know, you kind of mentioned that the best type of data that you could have depends on the stage. So that makes me think, what are the stages? Maybe it’s a customer journey or the buyer lifecycle, and what are typically the best types of data to use in each?
Stephen: There are a lot of steps, and B2B is quite different. In B2B marketing we use terms like ‘lead’, ‘qualified leads’, ‘conversion’, and ‘retain customer’. That journey is all a little more like a funnel. In B2B there’s sales involved, so it’s a little weird because a lot of human touch is happening in between. I guess that’s why Salesforce makes a lot of data. They have to keep track of all this, but let’s make it simple; it’s stages, right.
A lot of digital folks think that the customer journey starts on a digital channel because that’s when a lot of people become visible. However, for a person to type a keyword into a Google box, or any search box on a merchant website, I dare say that some good old marketers created demand for it. I’ll give you an example. I don’t play golf now, but I used to play golf a lot.
And when you play golf a lot, you know and I know that you don’t blame yourself; you blame the clubs, and merchants know that. So how do they touch upon that golfer’s feelings, if you will?
What they do is promote all kinds of things to you, saying that if you buy this driver you’re going to hit 10 more yards. Now that’s a joke, because if I gained 10 more yards every time I bought a new driver, everybody would be hitting 300 yards right now. But somebody creates a need.
That’s what I’m trying to get back to. In other words, at that stage you may have very little data. Maybe you could buy somebody’s Golf magazine subscription data. Or if you don’t have that, then you may have to guess. You know, we know that cigar data is related to golf, so maybe you buy cigar data, or it could be just income, age, the regions that they live in, all that kind of stuff. You’re not wrong, but it’s good enough when you find some prospect to send some e-mail or catalog or whatever. Or it could be a TV commercial. What I’m trying to say is that before digital there was a stage like, ‘I want to expand my customer base; I want to talk to somebody that I don’t know’. That is the acquisition stage, and that kind of activity results in a response. The response can be an immediate purchase, but it could be a lot of research. Somebody gets a catalog but says, ‘I am going to do a lot more research about it.’ Then you go to Google and you decide, ‘Am I going to buy this in an online store or buy it in a store?’ That’s a decision, isn’t it?
Each stage of my behavior yields different breadcrumbs. With digital, it’s easy. That’s what people hop on to, and they say, ‘all digital is king’. No, I dare say you’re only looking at some parts of the human behavior, not the whole journey. What if you do a lot of Google searches and then go to the store? If you do not have the store transaction data, you lose that connection.
So again, going back to the theme that you cannot worship just one type of data, you’ve got to look at it in the context of where they are, how that data is generated, and most importantly what you are going to use it for. You don’t have to collect everything. So I just want to leave it out there in terms of what kind of breadcrumbs people leave behind. But yet another way of looking at this is not just from a customer-centric view but from a merchant-centric view. What do businesses do, really? There are only like four major stages of marketing, really. In the beginning, you don’t have a customer. You have to acquire new customers. Now that calls for a different strategy. But unfortunately, you don’t have much data to play with. You could buy data: the demographic data, all the subscription data; you could get a list from a list exchange and all that. So you’re limited to that.
Now you start getting more customers. Oh! Now you have a customer list; you have your own customers. So you can send some of them to Facebook or some other vendors and make a lookalike model. You can do that now. You have their behavioral data: what they clicked, what they responded to, what they bought, when that happened, how much they paid for it, how many times they did all that. In fact, that’s what our company does. BuyerGenomics is about that kind of data. Okay great, I have this customer; I have more data about them now. I’m going to use all these things to sell more. What is ‘sell more’? Cross-sell, upsell, repeat sales, upgrades to VIP status, and making them buy more expensive options. Whatever it is, right.
And then of course, you try to retain those guys. This is the retention stage now. But if all those things fail, if the retention stage fails, then they will fall into what we call dormant customers or inactive customers. And what do we do? We have to wake them up. We need to kick some butt around and say, ‘Hey guys, it’s been a while. Do you know we are still in business? You want to buy more golf clubs, or a guitar, or whatever?’ You’ve got to win them back. So, in the win-back stage, the data is kind of limited, like in the acquisition stage. But you still have what they bought a long time ago. That’s better than nothing.
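The behavioral fields listed above (what they bought, when, how much they paid, how many times, through what channel) can be rolled up into a per-customer summary. The Python sketch below shows one simple way to do that. The transaction records, field names, and the `as_of` date are hypothetical, and this is a generic recency, frequency, and spend roll-up, not a description of how BuyerGenomics actually processes data.

```python
from datetime import date

# Hypothetical transaction log: one row per purchase.
transactions = [
    {"customer": "c1", "date": date(2018, 1, 5), "amount": 120.0, "channel": "online", "item": "driver"},
    {"customer": "c1", "date": date(2018, 6, 20), "amount": 40.0, "channel": "store", "item": "golf balls"},
    {"customer": "c2", "date": date(2017, 3, 14), "amount": 300.0, "channel": "store", "item": "guitar"},
]


def summarize(transactions, as_of):
    """Roll transactions up into per-customer recency, frequency, and spend."""
    summary = {}
    for t in transactions:
        s = summary.setdefault(
            t["customer"],
            {"last_purchase": t["date"], "orders": 0, "total_spend": 0.0, "channels": set()},
        )
        s["last_purchase"] = max(s["last_purchase"], t["date"])  # when did it happen
        s["orders"] += 1                                         # how many times
        s["total_spend"] += t["amount"]                          # how much they paid
        s["channels"].add(t["channel"])                          # through what channel
    for s in summary.values():
        s["days_since_last"] = (as_of - s["last_purchase"]).days
        s["avg_order"] = s["total_spend"] / s["orders"]
    return summary


profiles = summarize(transactions, as_of=date(2018, 7, 1))
for customer, profile in profiles.items():
    print(customer, profile["orders"], profile["total_spend"], profile["days_since_last"])
```

A summary like this is the kind of input a cross-sell or upsell model would start from; the item-level detail (what they bought) stays in the raw transactions for anything that needs it.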
Damian: Right, it’s limited because maybe it is stale, very old.
Stephen: Right. We use the term data atrophy, right.
It’s not a hotline name anymore, but hey you know what – he’s not buying anything anymore but when he did, he used to buy a lot of expensive things or he used to be a bargain seeker or he used to be a very sporadic buyer of consumable items or whatever it is. All those things are there; it’s not like having nothing.
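The four merchant-side stages described above (acquisition, selling more, retention, and win-back) can be sketched as a simple rule that places a customer in a stage from their order count and recency. In the Python sketch below, the 180-day and 365-day cutoffs are purely illustrative assumptions, since the conversation gives no thresholds, and the notes on available data paraphrase what is said in this episode.

```python
# Data typically on hand at each stage, paraphrased from the conversation.
STAGE_DATA = {
    "acquisition": "purchased demographic, subscription, or list data",
    "sell-more": "your own behavioral data: clicks, responses, purchases",
    "retention": "your own behavioral data, including payment history",
    "win-back": "old purchase history, stale but better than nothing",
}


def lifecycle_stage(orders, days_since_last, at_risk_after_days=180, dormant_after_days=365):
    """Place a customer in a marketing stage.

    The cutoffs are hypothetical defaults; a real business would tune them
    to its own purchase cycle.
    """
    if orders == 0 or days_since_last is None:
        return "acquisition"  # no purchase yet: still a prospect
    if days_since_last >= dormant_after_days:
        return "win-back"     # dormant or inactive customer
    if days_since_last >= at_risk_after_days:
        return "retention"    # slipping away, try to hold them back
    return "sell-more"        # active: cross-sell, upsell, repeat sales


for orders, days in [(0, None), (3, 30), (3, 200), (5, 474)]:
    stage = lifecycle_stage(orders, days)
    print(orders, days, stage, "->", STAGE_DATA[stage])
```

The rule is intentionally simple. Its purpose is to show that the stage determines which data is available, which in turn shapes the analytics that are possible at that stage.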
So, you’ve got to also look at it from the merchant stage, because depending on where you are, the kind of analytics that you use is different. Why is it different? Because the available data are different. So you’ve got to look at it in a multi-dimensional way, and not say, ‘Oh well, because I’m bound by all this digital data, I cannot do X, Y, or Z.’ For example, try to cross-sell or upsell using just digital data. It’s nearly impossible. How are you going to do that with just clicks and opens? You can’t do it. You need to get some more types of data.
This, by the way, could go on and on, because there are so many kinds of data. And you know, there are so many things you can do with it. But again, going back to the theme that we’ve been touching on every single week: yeah, this whole world of data is huge, but it gets simpler when you know what to do with it first, instead of saying, ‘I’m going to just collect everything and one day somebody will figure this out for us.’
And by the way, that kind of leap of faith is moving into the machine. People say, ‘I’m waiting for machine learning.’ I’m sorry to say this, but machines will not define goals for you. You still have to tell the machine what you’re trying to do. So there. We will just leave it out there, and then maybe we’ll talk about the future of analytics next time.
Damian: Sounds like a plan. Thanks, Stephen.