Out of the Weeds, Part I
Dan Ariely, a Professor of Psychology and Behavioral Economics at Duke University, once famously stated,
“Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.”
We argue the same has held true with the latest generation of buzzwords. Do artificial intelligence (AI), machine learning (ML), or natural language processing (NLP) sound familiar?
To kick off 2019, we are starting a series of articles that will shed some light on these and other related terms, which can sound confusing and whose explanations can be a little overwhelming.
An in-depth review of all branches of AI and the algorithms they rely on is beyond the scope of this series. For the first installment, we are going to focus on machine learning. Machine learning is a branch of AI that automates analytical model building, allowing systems to learn from data and identify patterns.
Our two primary goals:
- Demystify the meaning of machine learning
- Show its relevance and practical applications to consumer insights
Let’s begin with a business case. As a consumer insights or market research professional, one of your goals for the year may be to improve your consumer segmentation – the practice of dividing a customer base into groups based on shared characteristics.
How do you go about it?
There are four primary types of consumer segmentation:
- Demographic: groups based on variables such as age, gender, sexual orientation, family size, marital status, ethnicity, etc.;
- Behavioral: groups of behaviors such as product preferences, shopping patterns and frequencies, types of purchases and consumption;
- Psychographic: psychological profiling and understanding of consumers to include their lifestyle, values, motivations, interests, and opinions;
- Geographic: geography splits of country, state, city, etc.
There are two possible techniques for analysis. The first deals with well-defined variables. Out of the four types of segmentation, demographic and geographic segments tend to be well-defined. For example, a distinct group of males in the U.S. is pretty easy to organize. Comparing and contrasting across gender, ethnic groups, or age brackets becomes straightforward.
However, oftentimes variables are not that well-defined. Asking your potential customer, “On a scale of 1 to 10, how likely are you to purchase this?”, is likely to result in something that looks more like a scatter plot than a clean, organized group of responses.
Behavioral and psychographic segmentations tend to be slightly less defined because the data typically falls along a scale. Ever used an NPS score?
The statistical problem that presents itself is this: how do you segment those types of data points?
One way to go about it is to introduce your own cut-offs to the data (e.g. low, medium, high). The issue with that approach is that you have just projected your own assumptions about how the data should behave, rather than analyzing its actual behavior, especially in relation to other variables.
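To make the cut-off issue concrete, here is a minimal sketch in Python. The scores, the threshold values, and the function name `label_by_cutoff` are all made up for illustration; the point is only that the analyst, not the data, decides where one group ends and the next begins.

```python
# Hypothetical 1-10 purchase-intent scores; the values are illustrative only.
scores = [1, 2, 3, 4, 4, 5, 6, 7, 7, 8, 9, 10]

def label_by_cutoff(score, low_max=3, high_min=8):
    """Label a score using analyst-chosen thresholds (an assumption, not data-driven)."""
    if score <= low_max:
        return "low"
    if score >= high_min:
        return "high"
    return "medium"

labels = [label_by_cutoff(s) for s in scores]

# Count how many respondents land in each manually defined group.
counts = {name: labels.count(name) for name in ("low", "medium", "high")}
print(counts)
```

Moving `low_max` from 3 to 4 would shift two respondents from "medium" to "low" without anything about those respondents changing, which is exactly the assumption problem described above.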
Great, so now what?
Here comes machine learning! In this case, we would use what is called unsupervised learning. One type of unsupervised learning is k-means clustering, whose premise is to iteratively group widespread data points into a set number of clusters that are as well organized and accurate as possible.
For those technically minded people out there, k-means starts by choosing a number of clusters, k, and placing k initial centroids among the data points (often at randomly chosen points). A centroid is the center of a cluster: the mean position of all the points assigned to it. The algorithm tends to find clusters of comparable spatial extent, meaning groups of points that sit close together.
After defining these centroids, the algorithm repeats two steps until the assignments stop changing:
- Assign each data point to the closest corresponding centroid;
- For each centroid, calculate the mean of the values of all the points belonging to it, and move the centroid to that mean.
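The two steps above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not a production implementation: the function names, the six sample respondents (each a pair of 1 to 10 ratings), and the choice to pass in the starting centroids by hand are all ours. Libraries such as scikit-learn offer tested versions with smarter initialization.

```python
import math

def assign(points, centroids):
    """Step 1: for each point, the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]

def update(points, labels, centroids):
    """Step 2: move each centroid to the mean of the points assigned to it."""
    moved = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            moved.append(old)  # an empty cluster keeps its previous centroid
            continue
        moved.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
    return moved

def k_means(points, initial_centroids, max_iter=100):
    """Repeat the two steps until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = assign(points, centroids)
    for _ in range(max_iter):
        updated = update(points, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
        labels = assign(points, centroids)
    return labels, centroids

# Hypothetical respondents: (purchase intent, satisfaction), both on a 1-10 scale.
points = [(1, 2), (2, 1), (2, 3), (8, 9), (9, 8), (9, 10)]
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])
print(labels)
print(centroids)
```

Here the starting centroids are simply two of the respondents; in practice the starting positions are usually chosen at random, and the algorithm is often run several times because different starts can produce different clusters.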
The goal of this iterative process of calculations is to group various points of data in the most accurate “clusters” or “segments” available in the dataset. Note that we didn’t say anything about assumptions around who these groups of consumers were.
The result is a set of cleanly organized groups of consumers. But they aren’t organized around a well-defined variable such as age or gender. They are organized around how they as individuals responded to your questions.
If you’ve collected the right types of data, you can then take a segment of consumers who are clustered together based on their preferences or opinions, and then view the resulting breakdown of the demographic variables attributed to that group.
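As a rough illustration of that last step, the sketch below takes respondents who already carry a cluster label and reports the share of each age bracket within every cluster. The records, the field names `cluster` and `age_bracket`, and the function `demographic_breakdown` are hypothetical and exist only for this example.

```python
from collections import Counter, defaultdict

# Hypothetical respondents: a cluster label (e.g. from k-means) plus one demographic field.
respondents = [
    {"cluster": 0, "age_bracket": "18-34"},
    {"cluster": 0, "age_bracket": "18-34"},
    {"cluster": 0, "age_bracket": "35-54"},
    {"cluster": 1, "age_bracket": "35-54"},
    {"cluster": 1, "age_bracket": "55+"},
    {"cluster": 1, "age_bracket": "55+"},
]

def demographic_breakdown(rows, field):
    """Share of each value of `field` within every cluster."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["cluster"]][row[field]] += 1
    return {
        cluster: {value: n / sum(tally.values()) for value, n in tally.items()}
        for cluster, tally in counts.items()
    }

breakdown = demographic_breakdown(respondents, "age_bracket")
for cluster, shares in sorted(breakdown.items()):
    print(cluster, shares)
```

The clusters were formed from opinions or preferences alone; the demographic field is only consulted afterwards to describe who ended up in each group.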
You can use this newly defined segment, created using machine learning, to target and message consumers much more efficiently than by relying on a single defined variable.
It behooves all types of organizations to find ways of getting a deeper, more thorough understanding of their consumers beyond simple demographic variables.
Deploying a thoughtful research strategy, coupled with machine learning techniques like the ones above, can lead to powerful results.