Customer Segmentation

  • Problem: we don't know whether we have different types of customers or how to approach each of them
  • Goals:
    • We want to understand our customers better
    • We want clear criteria for segmenting our customers
  • Why? To take targeted actions that improve the customer experience

Technique to solve the business problem

We need a formal definition

Customer segmentation is the practice of dividing a customer base into groups of individuals that are similar in specific ways relevant to marketing, such as age, gender, interests and spending habits.

The most common forms of customer segmentation are:

  • Geographic segmentation: based on customer location; considered the first step in international marketing, followed by demographic and psychographic segmentation.
  • Demographic segmentation: based on variables such as age, sex, generation, religion, occupation and education level.
  • Firmographic segmentation: based on features such as company size (in terms of either revenue or number of employees), industry sector or location (country and/or region).
  • Behavioral segmentation: based on customers' knowledge of, attitude towards, use of, or response to a product, including usage rate, loyalty status and readiness stage.
  • Psychographic segmentation: based on the study of activities, interests, and opinions (AIOs) of customers.
  • Occasional segmentation: based on the analysis of occasions (such as being thirsty).
  • Segmentation by benefits: based on RFM (recency, frequency, monetary value), CLV (customer lifetime value), etc.
  • Cultural segmentation: based on cultural origin.
  • Multi-variable segmentation: based on the combination of several techniques.
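
As an illustration of the RFM criterion mentioned in the list above, the following is a minimal sketch in base R. The transaction table, its column names (`customer`, `date`, `amount`), the reference date and the median-based rule are all invented for illustration; a real project would choose its own scoring scheme.

```r
# Hypothetical transactions (invented for illustration)
tx <- data.frame(
  customer = c("A", "A", "B", "C", "C", "C"),
  date     = as.Date(c("2023-01-10", "2023-03-05", "2022-11-20",
                       "2023-02-01", "2023-02-15", "2023-03-20")),
  amount   = c(120, 80, 40, 200, 150, 300)
)
ref_date <- as.Date("2023-04-01")  # assumed analysis date

# Days elapsed between each transaction and the analysis date
tx$days_ago <- as.numeric(ref_date - tx$date)

# One row per customer: recency (days since last purchase),
# frequency (number of purchases) and monetary value (total spend)
customers <- sort(unique(tx$customer))
rfm <- data.frame(
  customer  = customers,
  recency   = as.vector(tapply(tx$days_ago, tx$customer, min)[customers]),
  frequency = as.vector(tapply(tx$amount,   tx$customer, length)[customers]),
  monetary  = as.vector(tapply(tx$amount,   tx$customer, sum)[customers])
)

# Assumed rule: recent buyers with at least median spend form one segment
rfm$segment <- ifelse(rfm$monetary >= median(rfm$monetary) &
                        rfm$recency <= median(rfm$recency),
                      "high value", "other")
print(rfm)
```

A single-variable rule like this one is easy to explain to business users, which is often the reason to try it before moving to multi-variable techniques.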

Main Concepts

Customer Segmentation Techniques

  • Single discrete variable (CLV, RFM, CHURN)
  • Clustering: K-means, Hierarchical
  • Latent Class Analysis (LCA)
  • Finite mixture modelling (e.g. Gaussian mixture modelling)
  • Self-organizing maps
  • Topological Data Analysis
  • PCA
  • Spectral Embedding
  • Locally-linear embedding (LLE)
  • Hessian LLE
  • Local Tangent Space Alignment (LTSA)
  • Random forests, Decision Trees
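
Several of the techniques listed above are available in base R. The sketch below compares hierarchical clustering with k-means and uses PCA to obtain a two-dimensional view; the data are synthetic (two well-separated groups), and the choice of Ward linkage and of two clusters is an assumption made only for this example.

```r
set.seed(42)
# Synthetic customers: two well-separated groups in three features (invented data)
x <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 3),
  matrix(rnorm(60, mean = 6), ncol = 3)
)
colnames(x) <- c("f1", "f2", "f3")
x <- scale(x)

# Hierarchical clustering: Ward linkage on Euclidean distances, cut into 2 groups
hc <- hclust(dist(x), method = "ward.D2")
hc_groups <- cutree(hc, k = 2)

# K-means with the same number of groups, for comparison
km <- kmeans(x, centers = 2, nstart = 25)

# PCA: project the observations onto the principal components
pca <- prcomp(x)
head(pca$x[, 1:2])

# Cross-tabulate the two partitions (cluster labels are arbitrary,
# so agreement may show up on either diagonal)
table(hierarchical = hc_groups, kmeans = km$cluster)
```

Plotting `pca$x[, 1:2]` coloured by `hc_groups` gives a quick visual check of how well the segments separate.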

Implementation Process

  • [BU] Determine business needs
  • [DU] Sourcing, Cleaning & Exploration
  • [DP] Feature Creation (Extract additional information to enrich the set)
  • [DP] Feature Selection (Reduce to a smaller dataset to speed up computation)
  • [M] Select Customer Segmentation Technique (test and compare some of them)
  • [M] Apply Selected Customer Segmentation Technique
  • [E] Analyze results and adjust parameters
  • [D] Present and explain the results

Benefits

This technique provides the following benefits:

  • Customer profiling
  • Targeted marketing actions
  • Targeted operations

Use cases

This technique is used in different use cases:

  • Reporting
  • Commercial actions: Retention offers, Product promotions, Loyalty rewards
  • Operations: Optimise stock levels, store layout
  • Pricing: price elasticity
  • Strategy: M&A, new products,...

How to implement this algorithm using R

K-means

Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k (≤ n) sets S = {S1, S2, …, Sk} so as to minimize the within-cluster sum of squares (WCSS), that is, the sum of squared Euclidean distances between each point and the center of its cluster. In other words, its objective is to find:

$$ \underset{\mathbf{S}}{\operatorname{arg\,min}} \sum_{i=1}^{k} \sum_{\mathbf x \in S_i} \left\| \mathbf x - \boldsymbol\mu_i \right\|^2 $$

where $$\boldsymbol\mu_i$$ is the mean of the points in $$S_i$$.
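
To make the objective concrete, the sketch below computes the WCSS by hand and compares it with the value reported by `kmeans()` in its `tot.withinss` component. The observations are synthetic and the choice of two clusters is an assumption for illustration.

```r
set.seed(1)
# Synthetic two-dimensional observations (invented data)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))

fit <- kmeans(x, centers = 2, nstart = 10)

# Row i of mu holds the center of the cluster that x[i, ] was assigned to
mu <- fit$centers[fit$cluster, ]

# WCSS: sum of squared Euclidean distances from each point to its cluster center
wcss <- sum((x - mu)^2)

# Compare with the value computed by kmeans()
all.equal(wcss, fit$tot.withinss)
```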

Case

We consider the Wholesale customers Data Set, which originates from: Abreu, N. (2011). Analise do perfil do cliente Recheio e desenvolvimento de um sistema promocional. Mestrado em Marketing, ISCTE-IUL, Lisbon.

This dataset has the following attributes:

  • FRESH: annual spending (m.u.) on fresh products (Continuous)
  • MILK: annual spending (m.u.) on milk products (Continuous)
  • GROCERY: annual spending (m.u.) on grocery products (Continuous)
  • FROZEN: annual spending (m.u.) on frozen products (Continuous)
  • DETERGENTS_PAPER: annual spending (m.u.) on detergents and paper products (Continuous)
  • DELICATESSEN: annual spending (m.u.) on delicatessen products (Continuous)
  • CHANNEL: customer channel, either Horeca (Hotel/Restaurant/Café) or Retail (Nominal)
  • REGION: customer region, one of Lisbon, Oporto or Other (Nominal)

# Install packages (only needed once)
install.packages("NbClust")

# Load packages
library(NbClust)

# Load data
data <- read.csv("data/chapter7.csv", header = TRUE, sep = ",")

# Review data structure
str(data)

# Review data
summary(data)

# Scale data (standardize each column to mean 0 and standard deviation 1)
testdata <- scale(data)

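# Determine number of clusters. Option 1: visual rule (elbow plot)
set.seed(123)  # k-means starts from random centers; fix the seed for reproducibility
wss <- (nrow(testdata) - 1) * sum(apply(testdata, 2, var))  # WSS for a single cluster
for (i in 2:15) wss[i] <- sum(kmeans(testdata, centers = i)$withinss)
plot(1:15, wss, type = "b", xlab = "Number of Clusters",
     ylab = "Within groups sum of squares")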

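# Determine number of clusters. Option 2: number proposed most often across the NbClust indices
# (run on the scaled data so that it matches the input used for k-means)
res <- NbClust(testdata, diss = NULL, distance = "euclidean", min.nc = 2, max.nc = 12,
               method = "kmeans", index = "all")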

# More information
res$All.index
res$Best.nc
res$All.CriticalValues
res$Best.partition

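# K-Means Cluster Analysis (k = 3 here; use the number proposed by NbClust)
set.seed(123)  # fix the random starting centers so the result is reproducible
fit <- kmeans(testdata, centers = 3, nstart = 25)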

# Calculate the average of each variable per cluster (original units)
aggregate(data, by = list(cluster = fit$cluster), FUN = mean)

# Add segmentation to dataset
data$cluster <- fit$cluster
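
When interpreting the fitted segments, note that the cluster centers stored in the k-means result are expressed on the standardized scale. The sketch below shows one way to inspect segment sizes and convert the centers back to monetary units. It runs on an invented two-column stand-in for the wholesale data (`toy`, `toy_scaled` and `toy_fit` are hypothetical names), so the numbers it prints are not results for the real dataset.

```r
set.seed(123)
# Invented stand-in for the wholesale data (two spending columns only)
toy <- data.frame(
  Fresh   = c(rnorm(30, 3000, 300), rnorm(30, 12000, 800)),
  Grocery = c(rnorm(30, 9000, 600), rnorm(30, 2500, 300))
)
toy_scaled <- scale(toy)
toy_fit <- kmeans(toy_scaled, centers = 2, nstart = 25)

# Number of customers per segment
toy_fit$size

# Centers are on the standardized scale; undo scale() to read them
# in monetary units: x = z * sd + mean
centers_orig <- sweep(toy_fit$centers, 2, attr(toy_scaled, "scaled:scale"), "*")
centers_orig <- sweep(centers_orig, 2, attr(toy_scaled, "scaled:center"), "+")
round(centers_orig)
```

The back-transformed centers coincide with the per-cluster means of the unscaled data, so either view can be used when describing segments to business users.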
