Data Governance

What is Data Governance?

Data Governance may be defined as the overall process used to manage data in an organization. Management activities include acquiring and organizing data, data cleansing and integrity, database management, data security, and communication within the organization regarding these functions.

What follows is a lightly edited transcript of Episode 3 of the Inevitable Success Podcast with Damian Bergamaschi and special guest Gary Beck. 


Damian: Today we cover how to make sense of your customer data. As marketers, every stimulus causes either action or inaction; even the absence of data is data. Let’s dig in with resident customer data expert Gary Beck.

Gary: Damian, thanks again for having me today. It’s always a pleasure.

Damian: So Gary, probably the best place to start is this: I’m a brand, I’m getting transactions, I have email databases. Where else can all this data come from? What are the sources? Where does it all come from, Gary?

Gary: Well, data comes from all sorts of places. There are times when I will start to work with clients and ask them that very same question: “What kinds of data do you have?” “Where do you keep it?” There have been times when a person will start the conversation by saying, “Well, we have warranty cards,” and I will then ask the logical question, “Can we get access to the warranty cards?” and there have been times when clients have pulled a box out from under their desk and handed me a box of warranty cards.

Gary: “Here are the latest people that have filed or submitted these warranty cards” and that box of warranty cards is the beginnings of a database. We can’t retrieve information electronically from it yet. But it is a collection of customers.

Gary: So any collection of information that we retrieve for the purposes of tracking customers is a database of sorts. But of course, in our business today we want to digitize all that information and make that information retrievable and extract intelligence from it. Sometimes customer databases are maintained in Excel files for example. Then, of course, there are the times when we have sophisticated database management systems that organize the data, allow us to query it, and give us the ability to categorize information in a way that is relevant to our business.

Gary: There are many different types of data. We can talk about it from a business perspective and then we can touch on some of the technology associated with it. From a business perspective, there’s personal information or information that allows us to address customers directly. So, things like your email address, your postal address, or your telephone number. All of that is personal, addressable information.

Damian: Like P.I.I. type data.

Gary: You have personally identifiable information. Absolutely. There is transaction information. What have you purchased from us? When did you purchase it? How much did you pay for it? Did you have to return it for any reason? What are the dates for all those events? That transaction information is very valuable information.

Gary: We have promotion history information, and promotion history information might be the most valuable information of all, yet it is frequently the information we have the least complete data about.

Damian: Can you give me an example of that type of data?

Gary: Promotion history information includes the type of communications that were sent to you, perhaps via e-mail or perhaps by direct mail. It includes the date and time if we can get that. It includes the creative treatment. So, what was the actual collateral that was exposed to the customer? It will include whether the customer responded to that information. So, if it was an email, did they click through, did they open the email? Did they convert? Did they purchase? All those different outcomes are recorded and that pretty much sums up what you would expect to see in their promotion history facility.
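A promotion-history record like the one Gary describes could be sketched as a simple structure. This is only an illustration; the field names below are invented for the example, not a standard schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class PromotionEvent:
    """One row of promotion history: what was sent, when, and the outcome."""
    customer_id: str
    channel: str                   # e.g. "email" or "direct_mail"
    sent_at: datetime              # date and time the piece went out
    creative_id: str               # which creative treatment was exposed
    opened: Optional[bool] = None  # email only; None if unknown
    clicked: Optional[bool] = None
    converted: bool = False        # did the contact lead to a purchase?

# Example: an email that was opened and clicked but did not convert.
event = PromotionEvent(
    customer_id="C1001",
    channel="email",
    sent_at=datetime(2024, 3, 5, 9, 30),
    creative_id="spring-sale-v2",
    opened=True,
    clicked=True,
    converted=False,
)
```

Note that non-events (no open, no click) are recorded too; as the conversation goes on to stress, inaction is data as well.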

Damian: So tell me more. You said it was one of the most important types of data you could have. It seems like a lot of storage as well, and I can understand why maybe a lot of brands don’t have it or don’t even know how to store it and tie it to their customers. But let’s say you did all that. How does it monetize? How does it become valuable as a data set in your database?

Gary: It becomes valuable when we monetize that data by understanding the right offers to target to customers, at the right time, and with the right message. It gives us the intelligence necessary to optimize our contact strategies.

Gary: If somebody responds to our communications that’s information. If somebody does not respond to our communications that’s information as well. If somebody clicks on a link that we expose them to that’s information and if they don’t convert, of course, that’s information as well.

Damian: It’s all information.

Gary: Really the goal in database marketing and customer relationship management is to create a closed system if we can, which is to say that every stimulus creates action or inaction. From a data perspective, we want to understand those actions or inactions and find ways of creating the behaviors that we’re looking for. That will ultimately maximize the lifetime value of a customer. So, by collecting these actions or inactions and tying them to the actual collateral that was created, we gain the ability to dive into the data and tweak our future strategies in terms of testing.

Gary: It gives us the ability to evolve our strategies over time and to optimize current promotions. For example, you learn the right time and the right day of the week to send out communications. Does that make a difference? It absolutely does. We have seen that over time. But if we don’t collect the information, we’ll have no way of doing anything other than really guessing as to what’s working and what isn’t.

Damian: Is there ever a situation that you’ve seen where people are just collecting too much and they’re just building like a warehouse of data and it’s not valuable? Or maybe that’s not the case?

Gary: Well, it’s a great question. In the world of big data, we hear the term “data lake.” A data lake is a very broad term, and it frequently refers to a storage repository that holds a vast amount of raw data. When we’re looking at this vast amount of raw data, there are many stories today of companies that place data into the lake but then have trouble getting meaningful information out of it.

So, in the old days, data warehouses were basically what we’re calling data lakes today, although data lakes today, typically built for data science purposes, hold even more raw data than the data warehouses of yesterday used to. Speaking purely as a direct marketer, as I look at data lakes, when all this information is put into an unstructured environment, a disciplined and experienced data science approach is required to yield benefits.

Damian: What are the types of personnel or skill sets that are required to do that well? What kinds of roles, or even titles, would somebody who does that type of work have?

Gary: Well, in terms of the big data lakes that are being formed today, what we’re looking for are people with data science degrees. So, either a master’s or a Ph.D. in data science. People who know how to work with vast amounts of data and use the techniques and software that are available to make sense out of those giant pools of data.

Again I think that the results are perhaps mixed today in terms of the very large data lakes that are being formed out there. I’m sure over time the science and the experience base will prove to yield huge benefits for companies. It just might be early days for direct marketers today. I would love to be proven wrong on that, because I think the promise is there. I really do. And I may just not have had the opportunity to really experience that myself.

Damian: So I’ve also heard this term thrown around, and I don’t know if it’s basically mocking a data lake or if it’s really its own thing, but that is the “data pond.” Have you ever heard of that, and care to elaborate?

Gary: I have heard of data ponds, and a data pond, as you might guess, can be thought of as a small data lake. When we think about the need to get value out of giant data lakes, and the difficulty with that, one can quickly see the logic of pulling specific data sets down from the lake and putting them into a smaller database management system or environment that allows easier processing and easier access to that information.

So, putting it in data warehousing terms, the data pond is very much like a data mart. The data warehouse would be where all information is stored. The data mart would typically be a subset of that data, simply because it was easier to access. It was perhaps focused on one particular component of the business, and in terms of resources, or response time, or reporting time, it made more sense to have it in its own separate environment for business users. So typically, the data pond or data mart was customized for a specific universe of business users.

Damian: Which may traditionally be called the iron database.

Gary: The big iron database right. Yes absolutely.

Damian: So now we have a better understanding of where this data lives. What’s involved with the maintenance and upkeep of that data, what must be done and what should be done?

Gary: That is a key point to success in any database marketing program. One of the obvious things in working with customers is that customers are not static; they live their lives and they move residences. I think the mobility rate in the United States is somewhere around 18 percent a year, so once every five or six years you can expect everybody to have moved. Moving is a major life event. It certainly is an indicator of change, but it also makes tracking customers a little bit more difficult.

So from a maintenance perspective, what we want to do at the very least is understand and record when customers have moved from one location to another. That becomes very important if we’re trying to merge transaction information over time. Consider John Smith, who once lived at 123 Main Street and has now moved to 456 Main Street. Those addresses will not match, and if John Smith has not provided you with his customer number or some other identifying information, we are now tracking two customers in a database where we really want to be tracking only one.

So how do we handle movers? Well, there are many ways of doing that. Certainly, one is to ask customers to share their new address information. Frequently they are delighted to do that, but sometimes they forget, sometimes they don’t know how to change their address, or they just don’t have the time to do so.

So from that perspective, there are services that companies can purchase, such as the National Change of Address (NCOA) system. That service takes address-change information that customers provide at the post office and allows NCOA licensees to process their customer data against that post-office database of recent address changes. So, one thing that we must do in terms of maintaining data is to maintain address changes, and there are services that provide that.
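As a toy illustration of why applying address changes matters for deduplication (the record layout, field names, and change-of-address table here are hypothetical, not the actual NCOA file format):

```python
# Two records for the same person: one at the old address, one post-move.
customers = [
    {"id": 1, "name": "John Smith", "address": "123 Main St"},
    {"id": 2, "name": "John Smith", "address": "456 Main St"},
]

# Change-of-address data: (name, old address) -> new address.
ncoa_updates = {("John Smith", "123 Main St"): "456 Main St"}

def apply_moves(records, updates):
    """Rewrite each record's address using the change-of-address table."""
    out = []
    for r in records:
        new_addr = updates.get((r["name"], r["address"]), r["address"])
        out.append({**r, "address": new_addr})
    return out

def dedupe(records):
    """Keep one record per (name, address) pair, preferring the lowest id."""
    seen = {}
    for r in sorted(records, key=lambda r: r["id"]):
        seen.setdefault((r["name"], r["address"]), r)
    return list(seen.values())

merged = dedupe(apply_moves(customers, ncoa_updates))
# After the update both records resolve to 456 Main St, so only one survives.
```

Real matching is much fuzzier than an exact (name, address) key, of course; this just shows the mechanics of collapsing the two John Smiths into one customer.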

Another source of data that we should talk about is external demographic and psychographic data and the importance of maintaining that data over time. As customers go through their lives, their personal situations change. There are marriages, there are divorces, there are the births of children.

There are lifestyle changes: people take up skiing, they take up tennis, they take up other sports, and they take up other hobbies as well. So, there are these events that provide us with insights about who the customer is and what might work with them from a marketing perspective in the future.

While we are interested in maintaining information about where a customer lives, we’re also interested in maintaining information about their demographic profile, about their lifestyles, to the extent that we can understand them. We are also interested in other transactions that might in some way influence our marketing strategies towards them.

Damian: So one of the questions that I know I hear a lot, and think about a lot too, is that you can’t get the same data for every single customer in your database. Sometimes you have a full profile for one part of your database, and for other parts you have very little or none, just bits and pieces of that outside data. When you’re going to do database marketing, how do you deal with the fact that you don’t have perfect information on every single customer, while for some segments you have near-perfect information?

Gary: So that’s a great question. In some ways, missing data is also data.

Damian: Right. I should have expected that.

Gary: So that’s one piece of that. If we don’t have that information, it tells us that we couldn’t get that information for some reason and that’s a piece of data.

There’s also the benefit of taking advantage of the data that we have when we have it. And when we don’t have it, sometimes we can infer it.

So it really depends on the type of data. The good news is that there are typically statistical methods that can be used to help us fill in the blanks to the extent possible. Obviously, having the data is better than not having it.
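A minimal sketch of both ideas Gary mentions: filling in a blank with a simple statistical estimate (here, the mean of the observed values), while keeping the absence itself as its own variable. The field and the numbers are made up for illustration:

```python
# A numeric attribute with gaps; None marks a missing value.
incomes = [52000, None, 61000, None, 48000]

# Impute missing values with the mean of the observed ones.
observed = [x for x in incomes if x is not None]
mean_income = sum(observed) / len(observed)
filled = [x if x is not None else mean_income for x in incomes]

# "Missing data is also data": keep an explicit indicator so a model
# can learn from the absence itself.
was_missing = [1 if x is None else 0 for x in incomes]
```

The `filled` list is what a statistical model would consume, and `was_missing` preserves the signal that the value was absent in the first place.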

Damian: Yes.

Gary: But all data is data.

Damian: Even the absence of it.

Gary: Even the absence of it, yes.

Damian: You know this data is coming in all the time. How do I manage it to make sure that it’s staying up to date and that I can trust it? Because there’s decay or entropy to each data point, and they’re probably not all aging at the same rate. Maybe speak to that a little bit. How do you keep your database healthy so that you can use it to drive the decisions that you make?

Gary: Great question. I’ll talk to a very practical application of a model that is commonly used: response models. Response models will decay over time, just as data decays over time.

One of the things that operations groups will do is look at the results of models, and look at the distributions of the scores from those models, to see if things are changing. If things are changing versus the original distributions that we saw, then we need to go back and do basically two things. One is that we will rebuild the models; we’ll go through the process of developing a new model to fit the data. But even before we do that, the first question we must ask ourselves is: what has changed?

So by looking at how the universe of data is changing, particularly in the algorithms that we have used to help guide the business, we can look at how those distributions are changing. We can sense that certain elements may be out of date and that can help us address the concerns that we have. Ideally if we are maintaining our data effectively, if we are updating our demographic and psychographic data, if we are continuing to process transaction data in the same way that we always have, and if we are applying new mover information regularly and following data governance best practices, our data will decay at the lowest possible rate.
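One common way operations groups quantify this kind of shift in score distributions is the Population Stability Index (PSI). Gary doesn’t name a specific metric, so this is just one illustrative sketch, with made-up decile proportions:

```python
import math

def psi(expected, actual):
    """Population Stability Index between two score distributions,
    each given as a list of bin proportions summing to 1."""
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, 1e-6)  # guard against empty bins
        a = max(a, 1e-6)
        total += (a - e) * math.log(a / e)
    return total

# Score-decile proportions at model build time vs. today (invented numbers).
baseline = [0.10] * 10
current = [0.06, 0.08, 0.09, 0.10, 0.11, 0.11, 0.11, 0.12, 0.11, 0.11]

drift = psi(baseline, current)
# A common rule of thumb: PSI < 0.1 is stable, 0.1-0.25 warrants a look,
# and > 0.25 suggests investigating what changed and rebuilding the model.
```

If `drift` creeps past the comfort threshold, that is the cue to ask Gary’s first question: what has changed in the data or the marketplace?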

We talk about data and we talk about updating it, but one thing that can really cause models to decay is when there is a major change in either our operational environment or when there’s a change in the marketplace environment.

In the marketplace, there may be some dynamic that is changing response. I think we gave the example in an earlier podcast of somebody dropping their price precipitously, which changes the dynamics of the attractiveness of our product. Well, the offer that we developed the model for in the beginning is no longer viable. The model is basically not going to help us. That’s a case where you’ll see decay and reduced effectiveness of the tools that we’ve created.

Then, moving on to the operational considerations, I have seen cases where there have been changes in the way that data has been cataloged by an external service provider, for example. If some of those variables have been re-coded and we didn’t anticipate that, and then it goes into a segmentation scheme or into a model that we’re using, it’s basically bad data, and it will cause our models to decay.

Damian: And it doesn’t even have to be bad, necessarily; it can just be different, just the measurement of it. I’ve seen that, and not just with marketing data; there are so many cases where how you record data can change, and with it the context of the data and how you use it. So that’s something to be mindful of. Maybe that’s almost a case for keeping your own custom clusters, because you probably have more control over what you do versus an external force out there.

Gary: Absolutely, it is one benefit of having your own proprietary scheme using your own information. You have control over all those elements and as a result, there should be no surprises in terms of how that data is processed.

Damian: So this is a question that is going to have a big range, and I’m not sure if you have these numbers on hand, but I was always amazed, in some of those cases with our clients, at how quickly data decayed. I’ll give one example. I was working with a B2B client on a prospecting campaign, and it was all focused on finding people at certain-sized organizations, in certain departments, who had certain roles, titles, and responsibilities in that organization. We curated a database of all the players we wanted to target. After we finished, it was almost a “we did it, we’re done” kind of feeling.

Lo and behold, about three months in we realized that we were getting a lot of “I don’t work here” e-mails back from people. Then we factored in that we were also getting a lot of “I’m retired” e-mails, and people who had moved up in the organization. We needed to keep adding to that database. The universe wasn’t getting bigger or smaller; it was just changing. In that case, it was about a 4 to 5 percent turnover in the data. Every month it would decay about 4 percent, which to me was remarkably quick, because a little bit past a year there’s almost a complete turnover of that database.

Now, that might be an extreme example, but do you have ranges, from all the clients you’ve worked with, to set marketers’ expectations around how much maintenance and updating must be done to keep a database healthy and usable?

Gary: You know I don’t think anybody’s asked me that question before. That’s a good question.

The level of maintenance is directly tied to the number of different systems you’re interfacing with and the number of variables or attributes that you are receiving from those systems. So, I don’t think there is a rule of thumb on that, although I’d like to create one; that would be a good stat to have.

Damian: Right. Maybe we’ll work on that we’ll put it in the show notes.

I think that would be that would be a good one.

Epic intern project. (The interns are hard at work)

Gary: That would be a very good intern project.

You know, organizations have tried to instill the idea of data owners taking full responsibility for content. Particularly large organizations will try to make a single individual responsible for a specific source of data, for that very reason: to make sure that the data is not changing in any way. If there’s an arrangement for payment for data services, for example, that person is on top of that contract. If there are new variables coming into play, perhaps replacing old variables in a service, again, the data owner would be responsible for that. So that role is important. Many smaller companies don’t have the luxury of really diving into all that information, but from our perspective, if we can create that rule of thumb for the notes, I think we should do that.

Damian: So, in closing, any strong closing thoughts on data, data hygiene, and the storing of data?

Gary: I think I’ve used the phrase before that information is power and information is only as good as the data that we have. So, the notion of data hygiene, making sure that your data is as clean as possible, and perhaps equally important if not more important, is really understanding the data that you have and where it comes from and how you’re using it. It’s fundamental, but if you don’t understand the data that you have and don’t understand how you’re using it today you are potentially at risk and potentially missing out on huge opportunities. From that perspective, data is perhaps every company’s most valuable asset and it is rare that investments in data do not pay dividends.

Damian: Thanks, Gary. That was a great way to end.

Gary: Damian it’s a pleasure, as always, thank you.

Host: Damian Bergamaschi

Special Guest: Gary Beck

Gary’s background includes over 30 years of analytics & database innovation for several leading Fortune 500 companies and Madison Avenue advertising agencies. Gary has been a frequent lecturer and author on the topics of database marketing and applied statistics. His articles have been published in DM News, Direct Marketing and the Journal of Direct Marketing. He recently was President of the Direct Marketing Idea Exchange and currently serves on their Board. Gary received his M.S. in Industrial Administration from Carnegie Mellon University.