THE GOAL:
When segmenting markets the objective is to find distinct groups of consumers who are similar to each other on multiple variables of “interest”. If that lofty goal is realized, then products and marketing programs can be designed to appeal to desirable consumer groups so that when CONSUMERS select between you and your competitors, what your company offers will be a better fit in their eyes.
WHAT CLUSTER ANALYSIS DOES FOR YOU:
– Cluster Analysis groups observations based on their similarity across multiple variables. (That sounds “kind of” like the goal.)
WHAT CLUSTER ANALYSIS DOESN’T DO FOR YOU:
– Determine what similarities are useful.
– Adjust for different measurement scales. (Prices from $12,000 to $25,000 span 13,000 measurement units, while a 1–7 agree/disagree survey question spans only 6.)
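To see how much the raw scale matters, here is a minimal Python sketch with made-up numbers (the two respondents and their values are illustrative assumptions, not data from any study). It splits the squared Euclidean distance between two consumers into the part that comes from price and the part that comes from the survey question.

```python
# Hypothetical respondents: (price paid in dollars, 1-7 agreement score).
respondent_a = (12000.0, 1.0)
respondent_b = (25000.0, 7.0)

def squared_distance_parts(a, b):
    """Return each variable's contribution to the squared Euclidean distance."""
    return [(x - y) ** 2 for x, y in zip(a, b)]

parts = squared_distance_parts(respondent_a, respondent_b)
price_share = parts[0] / sum(parts)

print(parts)        # contribution of price, then of the survey item
print(price_share)  # fraction of the squared distance due to price
```

Even though these two respondents sit at opposite ends of the survey scale, nearly all of the distance a clustering routine would see comes from price.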
TRAP 1
A lot of people’s knee-jerk reaction to different measurement scales is to standardize the data to mean zero and standard deviation one. That changes things, but does it fix anything? Now the original spread of $12,000–$25,000 counts about the same as a seven-point survey question. Is that appropriate? Maybe, but I could also imagine scenarios where the willingness to pay double the price might be worth 9 or 10 times more than agreement with a survey question. In effect, the scale problem and the usefulness problem are inextricably linked.
The second problem with standardization is that the data that WAS on the same scale is now corrupted. A survey question showing large consumer differences across the whole 7-point range is deflated and a question that all respondents answered as either 6 or 7 is inflated so that they both end up with a standard deviation of one. Consequently, the procedure blurs a lot of the consumer distinctiveness you set out to look for.
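A small sketch of that deflation and inflation, using two invented 7-point questions (the answers are assumptions chosen only to make the effect visible). One question has answers spread over the whole scale; on the other, everyone answered 6 or 7.

```python
import statistics

# Hypothetical answers from eight respondents on two 7-point questions.
spread_out = [1, 2, 3, 4, 4, 5, 6, 7]   # real disagreement across the scale
top_heavy = [6, 6, 6, 6, 7, 7, 7, 7]    # everyone answered 6 or 7

def standardize(values):
    """Rescale to mean zero and (population) standard deviation one."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

z_spread = standardize(spread_out)
z_top = standardize(top_heavy)

# Gap between the lowest and highest answer, before and after standardizing.
raw_gaps = (max(spread_out) - min(spread_out), max(top_heavy) - min(top_heavy))
z_gaps = (max(z_spread) - min(z_spread), max(z_top) - min(z_top))
print(raw_gaps)
print(z_gaps)
```

Before standardizing, the first question separates its extreme respondents six times as much as the second. Afterwards the ratio is much closer to even, which is the blurring described above.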
TRAP 2
Another common practice is to run a Factor or Correspondence Analysis prior to the Cluster Analysis. Many of these procedures standardize the data automatically (in SAS it’s automatically standardized unless you specify otherwise), but apart from that there is a second problem. Some of the factors represent large sources of variation (a.k.a. differences between consumers), while others represent trivial ones. Cluster Analysis just sees the units of measurement that you give it, so if you do not rescale the factors in accordance with their importance, this procedure will also wipe away data patterns you’d like to identify.
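One way to put the importance back is sketched below. It assumes the factor scores arrive standardized and that "importance" is taken to be the variation each factor explains, so each factor is multiplied by the square root of its eigenvalue. That choice, the scores, and the eigenvalues are all assumptions for illustration; you may well prefer weights based on business judgment instead.

```python
import math

# Hypothetical standardized factor scores for three respondents
# (each factor has roughly unit variance after extraction).
factor_scores = [
    [1.2, -0.5],
    [-0.3, 1.1],
    [-0.9, -0.6],
]
# Hypothetical eigenvalues: factor 1 explains far more variation than factor 2.
eigenvalues = [4.0, 0.25]

def rescale_by_eigenvalue(scores, eigvals):
    """Multiply each factor by sqrt(eigenvalue) so its variance matches
    the amount of variation the factor represents."""
    roots = [math.sqrt(ev) for ev in eigvals]
    return [[s * r for s, r in zip(row, roots)] for row in scores]

rescaled = rescale_by_eigenvalue(factor_scores, eigenvalues)
print(rescaled)
```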
TRAP 3
Unlike many other statistical techniques, there is nothing “unbiased” about using all the data you have. Let’s say you’ve got one pricing variable from your database, and twelve survey questions about different advertising themes. Cluster Analysis measures the distances you give it, so 12 measures of advertising implicitly carry about 12 times more distance than one. In addition, irrelevant data sums together just as easily as relevant data, so including everything available is likely to reduce the separation of more important concepts.
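The arithmetic behind the "12 times" figure can be sketched as follows, assuming all variables are already on a common scale and two respondents differ by the same amount on every one of them (both assumptions are for illustration only). The factor of 12 applies to squared Euclidean distance, which is what K-Means and Ward-style methods work with; in plain Euclidean terms the ratio is the square root of 12.

```python
import math

def block_contribution(per_variable_gap, n_variables):
    """Squared Euclidean distance contributed by a block of variables
    that each differ by the same gap between two respondents."""
    return n_variables * per_variable_gap ** 2

price_sq = block_contribution(1.0, 1)          # one pricing variable
advertising_sq = block_contribution(1.0, 12)   # twelve advertising items

ratio_squared = advertising_sq / price_sq   # ratio in squared-distance terms
ratio_plain = math.sqrt(ratio_squared)      # ratio in plain Euclidean terms
print(ratio_squared, ratio_plain)
```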
SOLUTION
Although there are several issues to watch for, all of the above problems can be solved by scaling the data based on its importance to your goals. It’s very much akin to creating a dependent variable. You could over- or under-weight something, by accident or on purpose, but if the scaling is done with a good faith effort, the results should move you a lot closer to your goals.
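As a rough sketch of what importance-based scaling could look like in practice: put every variable on a 0-1 range, then multiply it by the square root of an importance weight, so that its share of squared Euclidean distance follows the weight. The function name, the data, and the weights below are hypothetical, and this is only one possible way to implement the idea, not a prescribed procedure.

```python
import math

def scale_by_importance(rows, ranges, weights):
    """Put each variable on a 0-1 range, then multiply by sqrt(weight) so
    that its share of squared Euclidean distance follows the weight."""
    scaled = []
    for row in rows:
        scaled.append([
            math.sqrt(w) * (value - low) / (high - low)
            for value, (low, high), w in zip(row, ranges, weights)
        ])
    return scaled

# Hypothetical data: price in dollars, then two 1-7 advertising items.
rows = [
    [12000.0, 1.0, 2.0],
    [25000.0, 7.0, 6.0],
]
ranges = [(12000.0, 25000.0), (1.0, 7.0), (1.0, 7.0)]
# Analyst's judgment (an assumption): price matters as much as the two
# advertising items combined, so the weights are 0.5, 0.25 and 0.25.
weights = [0.5, 0.25, 0.25]

scaled = scale_by_importance(rows, ranges, weights)
print(scaled)
```

The scaled rows would then be handed to whatever clustering routine you use. The weights are where the analyst's judgment, and accountability, enter.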
In Cluster Analysis, the science takes you part way, but it reminds me of one of those “mathematical word problems” from the 5th grade. It’s up to you to pick which numbers have to be added together to get the desired answer. I call the approach described here “Concept Availability Scaled Segmentation” (CASS). I hope you find it useful.
Tim Johnson
November 20, 2010
David, I really like this post, as it captures many of the things I have learnt using clustering techniques with marketing and research data. I find it amazing how many people just want to put all of their data into defining a segmentation solution, without first identifying what they want their segments to look like! Another helpful step you have not mentioned is one of deciding which variables should define the segmentation and which should be used to profile the resulting groups. All in all this is a great article, cheers!
David Young
March 11, 2011
REPOSTED WITH PERMISSION OF THE CONTRIBUTOR
Group: Statistical Consultants
Discussion: TIPS FOR USING CLUSTER ANALYSIS TO SEGMENT MARKETS (CASS)
Thank you for the nice summary. There is just one thing I would like to know. After doing a cluster analysis, say once a dendrogram has been produced, I want to look at why certain groups were formed based on the original variables. Because end users do not understand statistical terms, how could I explain the cluster analysis results to them so that they can take policy decisions? Any suggestions will be helpful.
Posted by Jaynal Abedin
David Young
March 11, 2011
Jaynal,
I wouldn’t try explaining how the clusters were formed beyond a simple statement like “They were grouped statistically to be as similar as possible on the variables of interest and as different as possible from each other.” Later I’d use various graphics and tables of the main distinguishing variables and focus the report on how the segments could be used to improve business performance.
That’s what businessmen will care about anyway: how they can use the analysis results to improve their business, with enough straightforward charts to provide credibility and understanding about the segments. On a more technical level, you’ll probably only need a program that identifies the segments based on the information in their database if they intend to target or track the segments. Businessmen won’t be able, or want, to judge the segmentation on its technical merits. The only thing that will matter is whether your conclusions make sense and offer a valuable way to improve business performance. The technical advantages will only be important in so far as they facilitate your ability to show paths to business success, so I wouldn’t talk extensively about the techniques to non-statisticians.
David Young
March 11, 2011
REPOSTED WITH PERMISSION OF THE CONTRIBUTOR
Group: The Marketing Modelers Group
Discussion: TIPS FOR USING CLUSTER ANALYSIS TO SEGMENT MARKETS (CASS)
A better way to segment nowadays is by using latent class models – preferably latent class regression models so that the segments are based on predictive relationships.
Posted by Marco Vriens
David Young
March 11, 2011
Marco,
Depending upon your objective I’d agree with you, but one technique is not a substitute for the other. Latent Class Regression, as well as the various Regression Tree algorithms, needs a single dependent variable. Many times it might be fine to use sales volume as that dependent, but it’s also fair to point out that important relationships that do not correlate with sales volume won’t be picked up by the regression-style methods. Segments differing in receptiveness to advertising, sales channels, propensity to refer others, costs to service, etc., but with equal sales volume, would be missed.
David Young
March 11, 2011
REPOSTED WITH PERMISSION OF THE CONTRIBUTOR
Group: American Association for Public Opinion Research (AAPOR)
Discussion: TIPS FOR USING CLUSTER ANALYSIS TO SEGMENT MARKETS (CASS)
There are four standard requirements for using clustering to identify functional segments. These are based on fundamental principles no matter what type of hierarchical or K-Means clustering you use:
1. The need to balance variables in order to avoid inherent weighting.
2. The need to normalize across variables by respondent, since clustering tends to focus on the average values unless they are made equal. Standardization can also be used, but you lose the variation between cases.
3. The use of factor analysis to handle categorical variables.
4. The need for variable selection.
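One reading of requirement 2, sketched here as an assumption rather than as the exact procedure meant above, is to subtract each respondent's own mean from their answers, so that clusters form on the pattern of answers rather than on how generously a person uses the scale. The ratings are invented.

```python
import statistics

def center_by_respondent(rows):
    """Subtract each respondent's own mean from their answers."""
    centered = []
    for row in rows:
        row_mean = statistics.mean(row)
        centered.append([value - row_mean for value in row])
    return centered

# Hypothetical 1-7 ratings: same pattern, different overall level.
ratings = [
    [5, 6, 7],   # a generous rater
    [1, 2, 3],   # a harsh rater with the same relative preferences
]
centered = center_by_respondent(ratings)
print(centered)
```

After centering, the two respondents look identical, which is the intent. The cost is that any real difference in overall level is removed as well.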
I agree that they are often missed, resulting in the identification of inappropriate, non-distinct, and invalid segments. I also agree that cluster analysis and other forms of pattern recognition are as much an “art form” as a science. They are useful exploratory tools but they need care.
Posted by Gene Lieb
David Young
March 11, 2011
REPOSTED WITH PERMISSION OF THE CONTRIBUTOR
Group: American Association for Public Opinion Research (AAPOR)
Discussion: TIPS FOR USING CLUSTER ANALYSIS TO SEGMENT MARKETS (CASS)
Let me agree with David on this. There are multiple methods to identify segments; they are not the same. They have different advantages and disadvantages. Regression clustering, for example, tends to produce “weak” or non-distinct assignments. It should be noted that there is no guarantee that any statistical clustering tool will produce quality segments. In the case of regression clustering, poor segments are more often the case than not. Furthermore, the identity profiles (logit regression models) that are typically generated with cluster structures to predict assignments tend, in the case of regression clusters, to be unreliable.
The real question is what is useful. What types of segments and segmentation schemes are meaningful for marketing planning and execution? The choice of method should follow the search process, not the other way around. If you wish to identify groups of respondents that have differing decision processes, then regression clustering is the tool of interest. If on the other hand you need to look at benefit values, then hierarchical or K-Means clustering is probably better. And if you are dealing with profiling product feature choice, you need some type of categorical clustering (latent class analysis or latent variable clustering). The method should follow the need, not the other way around.
Posted by Gene Lieb
David Young
March 11, 2011
REPOSTED WITH PERMISSION OF THE CONTRIBUTOR
Group: American Association for Public Opinion Research (AAPOR)
Discussion: TIPS FOR USING CLUSTER ANALYSIS TO SEGMENT MARKETS (CASS)
Regarding Jaynal Abedin’s comment about explaining segmentation to users, I agree that that is not really the issue. The problem is usually the need to better understand who these groups of customers are, not how the segments were identified. Typically, we (my clients and I) have found it useful to profile the segments, first in terms of the defining variables and then in terms of other characteristics. Sometimes a tool to classify potential customers (not respondents) into these segments is needed. An identity profile (logit regression model) is used. The coefficients of these models can act as an explanation of the segments.
For consumer product segments, it is often traditional to develop segment personas. This is particularly useful if the segmentation scheme is going to be used as sub-markets rather than as targets for various strategic activities. Basically these personas are extensions of the profiles, trying to provide a tangible image of the participants in the segments.
Posted by Gene Lieb
David Young
March 11, 2011
REPOSTED WITH PERMISSION OF THE CONTRIBUTOR
Group: Predictive Modeling, Data Mining, Actuary / Actuarial and Statistics Group
Discussion: TIPS FOR USING CLUSTER ANALYSIS TO SEGMENT MARKETS (CASS)
Nice to see it all in one place.
Posted by Anil K. Shukla
David Young
March 11, 2011
REPOSTED WITH PERMISSION OF THE CONTRIBUTOR
Group: Advanced Analytics, Predictive Modeling & Statistical Analyses Professionals Group
Discussion: TIPS FOR USING CLUSTER ANALYSIS TO SEGMENT MARKETS (CASS)
An understanding of the functional area in which we want to apply “cluster analysis” is essential to come up with good clusters. It is more an “art” than a “science”.
Posted by Alamelu N
David Young
March 11, 2011
Alamelu,
I agree. In fact the approach I’ve described which requires the analyst to recognise and take responsibility for the implicit weighting in the cluster analysis draws heavily on his functional area expertise. (Marketing in this example). I’d also reluctantly agree with the more “Art” than “Science”. I’m reluctant about expressing it that way, because although it is clearly true, many people take that kind of statement as permission to do a poor job on the Science part because they can’t be held accountable to a clear standard.
David Young
March 11, 2011
REPOSTED WITH PERMISSION OF THE CONTRIBUTOR
Group: Dallas R Users Group
Discussion: TIPS FOR USING CLUSTER ANALYSIS TO SEGMENT MARKETS (CASS)
David, these are great discussions. Thanks for sharing your experience.
Posted by Larry D’Agostino, P.E.
David Young
March 11, 2011
REPOSTED WITH PERMISSION OF THE CONTRIBUTOR
Group: Market Research Bulletin
Discussion: TIPS FOR USING CLUSTER ANALYSIS TO SEGMENT MARKETS (CASS)
I’m with Marco Vriens, when he says “A better way to segment nowadays is by using latent class models”
And whilst it is true that segmenting on (say) a regression model with LC means there is only one dependent, the maths actually works (as I understand it) to differentiate the coefficients, not just the single dependent.
But more importantly, LC can also be used to segment on just about anything as input data. I have used it with great effect on ‘standard’ segmentation inputs, maxdiff results, ‘pick any’ questions (up to 40 items where the respondent is asked to select one or more) and even on choice models.
Here is a useful summary of pros and cons I put together after talking with the guys at http://www.q-researchsoftware.com:
Traditional approaches suffer a number of disadvantages:
• The number of clusters generally needs to be determined beforehand, or independently from the clustering process
• Individuals in the sample are absolutely assigned to clusters, i.e. they are assumed to be either a member of a particular class, or they are not
• Often appropriate only for continuous (or pseudo-continuous) data
• Need to standardise input variables, and this can be fraught with difficulty (as described elsewhere in this topic)
• K-means (the most commonly used technique for cluster analysis) makes a number of implicit statistical assumptions (e.g. clusters are a priori of equal size)
• Cases (respondents) with missing data need to be deleted prior to analysis, or missing values need to be imputed (itself a debateable procedure)
• Classification algorithms group cases (respondents) that are ‘near’ to each other, according to some ad hoc and arbitrary definition of ‘distance’
Latent Class advantages:
• Probability-based classification – cases (respondents) are assigned to clusters based on membership probabilities estimated directly from the underpinning model, a much more honest approach than the ‘all-or-nothing’ assignment of traditional methods
• Range of input variables – predictors may be continuous, categorical (nominal or ordinal), or counts, or any combination of these, or even rankings and utilities from MaxDiff or choice experiments
• Being based on a statistical ‘mixture’ model, Latent Class Analysis provides information criteria that allow the optimal determination of the number of classes to be extracted (e.g. the number which minimises AIC/BIC type measures)
• Latent Class Analysis specifically utilises cases (respondents) with some missing data; i.e. they are still included in the computations used to determine the classes and are also allocated to classes, on the assumption that the data are MAR (‘missing at random’) as opposed to the “strong and nearly impossible to justify” assumption of MCAR (‘missing completely at random’)
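To make the information-criterion point concrete, here is a minimal sketch of choosing the number of classes by BIC. The log-likelihoods and parameter counts are invented for illustration; in practice they would come from whatever latent class software fitted the models.

```python
import math

def bic(log_likelihood, n_parameters, n_cases):
    """Bayesian information criterion: lower is better."""
    return n_parameters * math.log(n_cases) - 2.0 * log_likelihood

# Hypothetical fits: (number of classes, log-likelihood, free parameters).
fits = [(2, -1250.0, 11), (3, -1205.0, 17), (4, -1198.0, 23)]
n_cases = 500

scores = {k: bic(ll, p, n_cases) for k, ll, p in fits}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

With these made-up numbers, the fourth class improves the likelihood only slightly, so the penalty for its extra parameters outweighs the gain and the three-class model has the lowest BIC.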
Disclaimer: I have no formal relationship with the Q software people. I just love the package, and the more people use it, the more the authors will invest in it.
Posted by Scott MacLean
David Young
March 11, 2011
Scott,
That’s a well-thought-out contribution to the discussion, and with only some minor caveats I agree with the gist of your list, although it clearly takes a side in that it lists only the disadvantages of one technique and the advantages of the other.
I’d say that LC is an excellent technique and can be applied to a great many things. I’d also concur that, as you say, LC will differentiate between coefficient groupings, although that might depend upon the implementation. But the goal of Latent Class Regression is prediction of the dependent. So while you find different coefficient groupings, and that flexibility could contribute to your ability to predict, what you won’t find is a balance of multiple goals, because the regression has the goal of predicting the dependent. From that goal you derive some of the LC advantages, like a non-arbitrary definition of distance. On the other hand, the last disadvantage you listed for the clustering-type techniques, “Classification algorithms group cases (respondents) that are ‘near’ to each other, according to some ad hoc and arbitrary definition of ‘distance’”, is also their advantage over regression-style techniques. “Ad hoc and arbitrary” makes it sound poorly constructed, but it could be poorly done or well done. Because the analyst has constructed a new distance measure based on multiple goals, the clustering-type segmentation embodies the “balanced scorecard” idea. If your goal is to find different groups related to your dependent, then go with LC, but if your goal is to identify consumer groups who differ on multiple strategic issues, then you need a technique that counterbalances the importance of those multiple issues.
I’m a fan of both LC and Clustering. Which one is better, I think depends upon your goal.
David
David Young
March 11, 2011
REPOSTED WITH PERMISSION OF THE CONTRIBUTOR
Group: American Association for Public Opinion Research (AAPOR)
Discussion: TIPS FOR USING CLUSTER ANALYSIS TO SEGMENT MARKETS (CASS)
There is a little confusion here. Latent Class (LC) is used in two types of cluster analysis: one based on regression, which I refer to as regression clustering and which is also called Latent Class Regression, and a second, more basic one based on substitution. These are very different methods, and in this discussion they seem to be merged. Almost all of the issues raised for traditional clustering apply to Latent Class Regression. Latent Class Analysis, on the other hand, has some unique features.
Regarding the advantages of LC over traditional models, I think there is some confusion here. Several of the disadvantages of traditional methods are not inevitable, and they also tend to apply to LC methods.
In all methods that I know, including LC, the number of clusters sought needs to be identified prior to the application of the technique. In traditional methods as well as some LC procedures, tools are available to help in the selection of the number of clusters, and there are analytical approaches to verify whether the number of clusters is reasonable or potentially optimal.
Hard (identification) or soft (likelihood of assignment) clustering can be done using any of the available methods. Usually probabilistic assignments are obtained by applying logit models to the assignments. This is true both for traditional clustering and for LCA (Latent Class Analysis). Some methods, such as regression clustering, use the fit to make the likelihood estimates. In this case, the hard clusters are assigned based on the highest probability.
There are techniques (particularly factor analysis) that allow categorical and ordinal data to be used alongside metric data with both traditional clustering and Latent Class Regression. However, categorical data does produce some problems, and Latent Class Analysis is often a preferred approach as long as the data is valid.
I’ve not found that K-Means results in equal-sized segments. It would be a useful property, but I’ve not found it to be the case. Please give me a citation on this one.
In some way, all clustering methods, or more generally pattern recognition procedures, require some measure of fit. In fact, even those that are highly specified can be modified in practice. In the case of hierarchical clustering, the ability to select various measures of distance and linkage is its greatest flexibility. While I tend to use a single set of conditions, I don’t view the broader capability as a disadvantage but the opposite. The fundamental principle of clustering is finding groups of similar cases. The ability to adjust the definition of fit is a powerful aid.
While Latent Class Regression is based on having a mixed sample, Latent Class Analysis is not. LCA is an assignment procedure similar to Bayesian analysis and the EM algorithm for missing data, and that is the reason why it can handle missing data. It is a powerful tool, but not particularly reliable. While some traditional methods have been shown to reproduce embedded cluster structures, LCA has not. Furthermore, LCA does not produce unique solutions, the resulting clusters are not inherently explainable by the underlying variables, and it may be unstable. Typically I consider LCA a heroic procedure: useful in many cases, but I would not prefer to use it as a primary tool.
As I had previously noted in the thread, Latent Class Regression tends to produce “weak” and non-distinct assignments. It is basically a means to improve the underlying regression structure of the data. While I have found it extremely useful in understanding the nature of mixture data, it is not particularly useful in the assignment of cases into specific groups, which is usually the function of clustering.
Gene
Posted by Gene Lieb
David Young
March 11, 2011
Gene,
Thanks for the informative post. It certainly added to my knowledge of LC.
Dave
David Young
March 11, 2011
Due to the general advocacy for LC, I decided to investigate it further in terms of its use for clustering without regression. I’ll confess that I’d thought of it as a predictive tool and was unaware that it could be used without a dependent. Here’s what I found.
LCA vs K-Means
LCA forms clusters based on the density distribution, including the variance, as opposed to just the mean, as is done by K-Means. Here’s an analogy. If you had two piles of sand representing the density of observations, LC would divide the two piles by cutting them at the lowest point in the valley formed between the piles. If one pile is bigger than the other, then the valley would be farther from the ‘middle’ of the big pile than from the ‘middle’ of the small pile because the big pile is wider. Since K-Means only looks at the means, it would cut the two piles at the mid-point between the two pile centers, which would be part way up the side of the bigger pile of sand. (As an aside, this is why K-Means tends to form more equally sized segments than LCA, but neither necessarily identifies equal or unequal groups.) Based on that, it sounds like LCA is generally better than K-Means, and in many cases I’d say that’s true. Let’s look at an example where LCA would be superior.
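The sand pile picture can be put into numbers with a one-dimensional sketch. It treats each pile as a normal distribution and takes the "valley" cut to be the point where the two weighted densities are equal, which is where a two-component mixture model would switch its assignment. The K-Means cut is approximated by the midpoint between the two pile centers. The pile sizes, means, and spreads are invented, and none of this is taken from a particular LCA package.

```python
import math

def weighted_log_density(x, weight, mean, sd):
    """Log of weight * Normal(mean, sd) density at x."""
    return (math.log(weight) - math.log(sd) - 0.5 * math.log(2 * math.pi)
            - (x - mean) ** 2 / (2 * sd ** 2))

def mixture_boundary(big, small, tol=1e-10):
    """Bisection for the point between the two means where the weighted
    densities are equal (assumes the big pile's mean is the smaller one)."""
    lo, hi = big[1], small[1]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if weighted_log_density(mid, *big) > weighted_log_density(mid, *small):
            lo = mid   # still inside the big pile
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical piles as (share of cases, mean, standard deviation).
big_pile = (0.8, 0.0, 2.0)
small_pile = (0.2, 6.0, 0.5)

kmeans_cut = (big_pile[1] + small_pile[1]) / 2
mixture_cut = mixture_boundary(big_pile, small_pile)
print(kmeans_cut, mixture_cut)
```

With these numbers the midpoint cut falls at 3.0, part way up the side of the big pile, while the density-based cut lands at about 4.8, much closer to the small pile.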
WHEN LCA WINS
Fisher’s Iris data: Since the distributions of the petal lengths and widths come from three distinct genetic strains of Iris, it makes sense to split the observations along those strains. In pattern recognition the variance of the petals is arguably just as important as the means. Here’s a link to an example showing what I’ve described: http://en.wikipedia.org/wiki/K-means_clustering. The graphic of the “mouse data”, below the Iris data, shows a comparison of K-Means versus the EM algorithm which is used in Latent Gold’s implementation of LCA.
WHEN K-MEANS WINS
Now that we’ve given LCA its due, let’s look at when it might not be the best choice. Let’s say your goal is to identify consumer markets for knives and you’ve conducted a survey that contained information about preferences for the type of serration on the blade. Like before, let’s say there is a small group of people with a preference for a pointy serration and a large group preferring a smoother serration. If YOU prefer a knife with a pointier serration and have to choose between one that’s “a little” too pointy and one that’s “a lot” too smooth, should we include you in the smooth serration group because a lot of other people want a smoother serration? Maybe it makes more sense in this case to just look at how many people are close to each mean? In this type of scenario, where the “structure” of the data doesn’t necessarily represent anything concrete, it probably should not override the data value, in this case preference for serration.
BACK TO THE ORIGINAL POST
My first post was premised on the idea that your goal in segmenting was to identify market segments along multiple issues of interest so that you could organize to be attractive to key consumer groups. The tips advocated taking an active role in determining the importance of different strategic issues and offered some cautions regarding standardization, factor analysis, and variables being included, that might all have unintended effects on the implied importance of different issues.
LCA’s use of the density distribution effectively takes out the scale and makes all variables more equally important in cluster formation. For the goal I premised, I probably wouldn’t choose it, for the reasons I cited in the first post.
On the other hand if I wanted to identify some unknown phenomena “causing” the data to fall into specific patterns, it does seem that LCA does a superior job of recognizing these patterns. If I suspected that one or more yet unknown diseases were being lumped together, LCA could be a good tool for identifying distinct patterns in the data.
Dave
David Young
March 11, 2011
REPOSTED WITH PERMISSION OF THE CONTRIBUTOR
Group: American Association for Public Opinion Research (AAPOR)
Discussion: TIPS FOR USING CLUSTER ANALYSIS TO SEGMENT MARKETS (CASS)
David, nice discussion, very good thoughts. The issue of focusing on the criteria for clustering is critical. The real statistical workhorse for segmentation, however, is Hierarchical Clustering, which provides any number of methods of grouping cases based on distance between and within clusters. Hierarchical Clustering represents far more powerful approaches to clustering than either K-Means or LCA and provides a broad range of variations of methods, including the consideration of distribution properties. The traditional downside to Hierarchical Clustering has been that it is computationally intensive (it takes a lot of computing resources), with a cost that increases quadratically with the number of cases being analyzed. K-Means has usually been used for large samples. I don’t know how LCA scales, but I assume that it is more forgiving than Hierarchical Clustering. However, with the advent of fast computers with huge memories, the limitations on Hierarchical Clustering are relegated to very large data mining applications.
The advantage of LCA over Hierarchical Clustering is that it handles categorical and missing data well. The downside still is that it does not provide unique solutions or an inherent explanation of the formation of clusters, and it tends to produce solutions that are not distinct.
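The quadratic growth comes from the pairwise distance matrix that hierarchical methods typically start from. A quick sketch of how fast the number of distances grows (the sample sizes are arbitrary examples):

```python
def pairwise_distance_count(n_cases):
    """Number of distinct case pairs, i.e. entries in the distance matrix."""
    return n_cases * (n_cases - 1) // 2

for n in (1_000, 10_000, 100_000):
    print(n, pairwise_distance_count(n))
```

Ten times as many cases means roughly a hundred times as many distances to compute and store.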
I have been able to reproduce cluster structures using K-Means and some forms of Hierarchical Clustering but have not been able to do so with LCA. This involves creating synthetic data-sets with inherent cluster structures. The various methods of clustering are then tested to try to reproduce that structure. Both K-Means and Hierarchical Clustering (using Ward’s linkage of Euclidean distances) were able to reproduce the structures over a range of underlying conditions. I was unable to do so with LCA. However, the tests of LCA were using categorical variables, which is a much more difficult task.
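A recovery test of this kind can be sketched in a few lines. The version below is a simplified illustration, not the tests described above: it builds three well-separated synthetic groups, runs a plain K-Means, and checks whether each true group ends up under a single label. The group centers, the noise level, and starting from one case in each true group are all simplifying assumptions; a real test would use random or multiple starts and harder conditions.

```python
import random

def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])))

def kmeans(points, centers, n_iter=25):
    """Plain Lloyd's algorithm from the given starting centers."""
    centers = [list(c) for c in centers]
    for _ in range(n_iter):
        labels = [nearest(p, centers) for p in points]
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centers) for p in points]

# Synthetic data with a known structure: three well-separated groups.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points, truth = [], []
for group, (cx, cy) in enumerate(true_centers):
    for _ in range(50):
        points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
        truth.append(group)

# Simplification: start from the first case of each true group.
found = kmeans(points, [points[0], points[50], points[100]])

# Recovery check: each true group maps to one label, and all labels are used.
recovered = all(
    len({f for f, t in zip(found, truth) if t == g}) == 1 for g in range(3)
) and len(set(found)) == 3
print(recovered)
```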
Gene
Posted by Gene Lieb
David Young
March 11, 2011
Gene,
Thanks for sharing the results of your comparisons. I’ve not done all the clustering comparisons that it sounds like you have, but the Cluster procedure offered by SAS includes eleven or so hierarchical clustering choices, of which the two-stage one looks pretty good based on the comparison in their examples. I’d not really meant to focus solely on K-Means, but I didn’t want to get drawn into comparing all the possibilities, and the LinkedIn 4,000 character limit on posts had me cutting back as it was.
I’d also like to take this chance to thank Marco Vriens for his initially sparking this debate and for his private mails to me, and Scott MacLean for picking up and carrying forward the LC torch. Without them pushing me forward I wouldn’t have been aware of LC as a non-regression clustering choice, researched it, or thought through when it might be a good option.
David
David Young
March 11, 2011
REPOSTED WITH PERMISSION OF THE CONTRIBUTOR
Group: Advanced Business Analytics, Data Mining and Predictive Modeling
Discussion: TIPS FOR USING CLUSTER ANALYSIS TO SEGMENT MARKETS (CASS)
The article is a relevant influence inasmuch as it tries to address certain conceptual blurs, so to speak.
Trap-1 as highlighted in the article:
Fundamental apportioning of probabilistic weights to the strength of each response to a poser is important as well as powerfully effective in solving scaling problems and eliminates the simplistic approaches to a problem by equalizing to a mean zero and a variance 1. FOR INSTANCE, A RESPONSE SHIFTS TO A MORE LIKELY ASSERTION THAN A PROBABLE ASSERTION. THIS COULD BE DIFFERENTIATED BY ATTACHING A STRENGTH WEIGHT OF SAY 0.85 TO THE FORMER RESPONSE WHILE GIVING 0.65 TO THE LATTER’S TENTATIVE RESPONSE. Scaling on a probabilistic reference frame solves the problem of normalization more effectively.
Trap-2: Responses for different posers need to find inter-dependence linkages and a combined weight that explains the covariance between posers and the corresponding response sets is vital in understanding the nuances of the complexity in explaining the anomalies that would creep up in a survey. I call it a neural number that explains the inter-dependence between influences and the strength or otherwise is explained by the absolute value in a scale of -1 to 1 through 0. That usually evades the trap-2.
Trap-3: More often than not, rare outliers classified as peculiarly innate traits characterize decision-making in a maze of similar sounding data output. The ruling is to treat the distribution as one of a Poisson distribution to highlight rare occurrences that would explain deciding characteristics.
Posted by Debashish Banerjee
David Young
March 11, 2011
REPOSTED WITH PERMISSION OF THE CONTRIBUTOR
Group: Retention , CRM, Customer Insights & Loyalty Marketing professionals
Discussion: TIPS FOR USING CLUSTER ANALYSIS TO SEGMENT MARKETS (CASS)
My experience? Make sure you involve marketing strategy teams in the process and define clear objectives for what you will do with the cluster segmentation. Often, this causes some confusion with other marketing segmentation exercises (the outcome is never the same!). I lost a whole year explaining afterwards that there may be different types of segmentation! 🙂
Posted by Bert Van Driessche