Generating Reproducible Clusters and Evaluating Their Utility in Market Research

Segmenting survey respondents into actionable groups is a familiar analysis in market research. Identifying distinct subsets of consumers informs better decision-making in a given product space. Standard industry practice is to use k-means and hierarchical cluster analysis methods, and in some cases ensembles thereof, to find these segments. Clustering is widely recognized as an ill-defined problem. Because the data contain no response variable or labels, unsupervised learning methods are difficult to evaluate. Furthermore, cluster analysis will always produce a clustering solution, whether or not there are underlying relationships in the data to justify those clusters. With no ground truth, accuracy measure, or error signal against which to assess a solution, it is difficult for a market researcher to know whether a segmentation has utility. Beyond utility, a segmentation is often judged by the story that can be told about a particular clustering of the respondents. This thesis considers a variety of ways segmentations can be appraised objectively and how doing so affects the story a market researcher can draw from the data. It defines criteria with which to evaluate the output of clustering methods; these criteria measure a segmentation’s usefulness to domain experts as a representation of an actionable marketplace. An array of clustering methods and distance measures is then evaluated against these criteria to identify best-practice algorithms and dissimilarity measures for market research cluster analyses. The study uses five public datasets consisting of respondent-level survey data on Likert-scale variables.