RFM - Customer Level Data
RFM (recency, frequency, monetary) analysis is a behavior-based technique used to segment customers by examining their transaction history, specifically:
- how recently a customer has purchased (recency)
- how often they purchase (frequency)
- how much the customer spends (monetary)
It is based on the marketing axiom that 80% of your business comes from 20% of your customers. RFM helps to identify customers who are more likely to respond to promotions by segmenting them into various categories.
To calculate the RFM score for each customer we need transaction data which should include the following:
- a unique customer id
- number of transactions/orders
- total revenue from the customer
- number of days since the last visit
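If only order-level records are available, these four fields can be derived by collapsing the orders to one row per customer. The following is a minimal base R sketch, not part of the rfm package; the `orders` data frame and its values are hypothetical and exist only for illustration.

```r
# Hypothetical order-level records: one row per order (illustrative values only)
orders <- data.frame(
  customer_id = c(1, 1, 2, 3, 3, 3),
  order_date  = as.Date(c("2006-11-02", "2006-12-15", "2006-06-30",
                          "2006-09-01", "2006-10-10", "2006-12-20")),
  revenue     = c(120, 80, 45, 60, 75, 90)
)
analysis_date <- as.Date("2007-01-01")

# Most recent order per customer, kept as a plain number of days since 1970-01-01
last_visit <- tapply(as.numeric(orders$order_date), orders$customer_id, max)

# Collapse to one row per customer
customers <- data.frame(
  customer_id       = as.numeric(names(last_visit)),
  revenue           = as.vector(tapply(orders$revenue, orders$customer_id, sum)),
  most_recent_visit = as.Date(as.vector(last_visit), origin = "1970-01-01"),
  number_of_orders  = as.vector(tapply(orders$revenue, orders$customer_id, length)),
  recency_days      = as.numeric(analysis_date) - as.vector(last_visit)
)
customers
```

The column names mirror those of the sample data set so that the result has the same shape as customer-level data.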
rfm includes a sample data set, rfm_data_customer, which includes the above details:
## # A tibble: 39,999 x 5
##    customer_id revenue most_recent_visit number_of_orders recency_days
##          <dbl>   <dbl> <date>                       <dbl>        <dbl>
##  1       22086     777 2006-05-14                       9          232
##  2        2290    1555 2006-09-08                      16          115
##  3       26377     336 2006-11-19                       5           43
##  4       24650    1189 2006-10-29                      12           64
##  5       12883    1229 2006-12-09                      12           23
##  6        2119     929 2006-10-21                      11           72
##  7       31283    1569 2006-09-11                      17          112
##  8       33815     778 2006-08-12                      11          142
##  9       15972     641 2006-11-19                       9           43
## 10       27650     970 2006-08-23                      10          131
## # i 39,989 more rows
So how is the RFM score computed for each customer? The below steps explain the process:
A recency score is assigned to each customer based on date of most recent purchase. The score is generated by binning the recency values into a number of categories (default is 5). For example, if you use four categories, the customers with the most recent purchase dates receive a recency ranking of 4, and those with purchase dates in the distant past receive a recency ranking of 1.
A frequency score is assigned in a similar way. Customers with high purchase frequency are assigned a higher score (4 or 5) and those with the lowest frequency are assigned a score of 1.
A monetary score is assigned on the basis of the total revenue generated by the customer in the period under consideration for the analysis. Customers with the highest revenue/order amount are assigned a higher score, while those with the lowest revenue are assigned a score of 1.
A fourth score, the RFM score, is generated by concatenating the three individual scores into a single value.
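The steps above can be sketched in base R using the first ten customers from the sample output. This is only an illustration under stated assumptions: `bin_score()` is a hypothetical helper that bins by rank, which is not necessarily how the rfm package chooses its bin boundaries, and scores computed on ten rows will differ from scores computed on the full data set.

```r
# First ten customers from the sample output (values copied from the table)
customers <- data.frame(
  customer_id      = c(22086, 2290, 26377, 24650, 12883, 2119, 31283, 33815, 15972, 27650),
  revenue          = c(777, 1555, 336, 1189, 1229, 929, 1569, 778, 641, 970),
  number_of_orders = c(9, 16, 5, 12, 12, 11, 17, 11, 9, 10),
  recency_days     = c(232, 115, 43, 64, 23, 72, 112, 142, 43, 131)
)

# Hypothetical helper: rank-based binning into `bins` groups of roughly equal size.
# Tied values share a rank, so they always land in the same bin.
bin_score <- function(x, bins = 5) {
  ceiling(rank(x, ties.method = "min") * bins / length(x))
}

# Fewer days since the last visit means a higher recency score, hence the sign flip
customers$recency_score   <- bin_score(-customers$recency_days)
customers$frequency_score <- bin_score(customers$number_of_orders)
customers$monetary_score  <- bin_score(customers$revenue)

# Concatenate the three digits into one value, e.g. R = 5, F = 4, M = 4 gives 544
customers$rfm_score <- customers$recency_score * 100 +
  customers$frequency_score * 10 +
  customers$monetary_score

customers[, c("customer_id", "recency_score", "frequency_score",
              "monetary_score", "rfm_score")]
```

Multiplying by 100 and 10 is one simple way to concatenate single-digit scores; it only works while each score stays below 10.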
The customers with the highest RFM scores are most likely to respond to an offer. Now that we have understood how the RFM score is computed, it is time to put it into practice. Use rfm_table_customer() to generate the score for each customer from the sample data set.
rfm_table_customer() takes 9 inputs:
data: a data set with
- unique customer id
- number of transactions
- days since the most recent visit
- and total revenue
customer_id: name of the customer id column
n_transactions: name of the transaction count column
recency_days: name of the recency days column
total_revenue: name of the total revenue column
analysis_date: date of analysis
recency_bins: number of rankings for recency score (default is 5)
frequency_bins: number of rankings for frequency score (default is 5)
monetary_bins: number of rankings for monetary score (default is 5)
analysis_date <- as.Date('2007-01-01')
rfm_result <- rfm_table_customer(rfm_data_customer, customer_id, number_of_orders, recency_days, revenue, analysis_date)
rfm_result
rfm_table_customer() will return the following columns, as seen in the above table:
customer_id: unique customer id
date_most_recent: date of most recent visit
recency_days: days since the most recent visit
transaction_count: number of transactions of the customer
amount: total revenue generated by the customer
recency_score: recency score of the customer
frequency_score: frequency score of the customer
monetary_score: monetary score of the customer
rfm_score: RFM score of the customer
The heat map shows the average monetary value for different categories of recency and frequency scores. Higher scores of frequency and recency are characterized by higher average monetary value as indicated by the darker areas in the heatmap.
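The numbers behind such a heat map are a table of mean monetary value for each combination of recency and frequency score. A minimal base R sketch of that aggregation follows; the `scored` data frame is hypothetical and simply reuses the column names returned by rfm_table_customer().

```r
# Hypothetical scored customers (illustrative values only)
scored <- data.frame(
  recency_score   = c(5, 5, 4, 4, 1, 1, 2),
  frequency_score = c(5, 5, 4, 4, 1, 1, 1),
  amount          = c(1500, 1300, 900, 1100, 200, 300, 250)
)

# Mean amount for every recency x frequency combination;
# combinations with no customers come back as NA
avg_amount <- tapply(
  scored$amount,
  list(recency = scored$recency_score, frequency = scored$frequency_score),
  mean
)
avg_amount
```

Each cell of `avg_amount` corresponds to one tile of the heat map, with larger values drawn darker.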
Use rfm_bar_chart() to generate the distribution of monetary scores for the different combinations of frequency and recency scores.
Use rfm_histograms() to examine the relative distribution of:
- monetary value (total revenue generated by each customer)
- recency days (days since the most recent visit for each customer)
- frequency (transaction count for each customer)
Visualize the distribution of customers across orders.
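The quantity being plotted here is simply the number of customers at each order count. As a rough base R illustration (not the package's plotting function), using the order counts of the first ten customers from the sample output:

```r
# Order counts of the first ten customers in the sample output
number_of_orders <- c(9, 16, 5, 12, 12, 11, 17, 11, 9, 10)

# Number of customers at each order count
order_dist <- table(number_of_orders)
order_dist
```

Passing `order_dist` to `barplot()` would draw the corresponding bar chart.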
Let us classify our customers based on the individual recency, frequency and monetary scores.
|Segment|Description|R|F|M|
|---|---|---|---|---|
|Champions|Bought recently, buy often and spend the most|5|5|5|
|Potential Loyalist|Recent customers, spent good amount, bought more than once|3 - 5|3 - 5|2 - 5|
|Loyal Customers|Spend good money. Responsive to promotions|2 - 4|2 - 4|2 - 4|
|Promising|Recent shoppers, but haven’t spent much|3 - 4|1 - 3|3 - 5|
|New Customers|Bought more recently, but not often|4 - 5|1 - 3|1 - 5|
|Can’t Lose Them|Made big purchases and often, but long time ago|1 - 2|3 - 4|4 - 5|
|At Risk|Spent big money, purchased often but long time ago|1 - 2|2 - 5|4 - 5|
|Need Attention|Above average recency, frequency & monetary values|1 - 3|3 - 5|3 - 5|
|About To Sleep|Below average recency, frequency & monetary values|2 - 3|1 - 3|1 - 4|
|Lost|Bought a long time ago, average amount spent|1 - 1|1 - 5|1 - 5|
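Because several of these score ranges overlap, a rule for breaking ties is needed. The sketch below is one possible approach in base R, assuming that the first matching row of the table wins and that customers matching no row are labelled "Others". Both `segments` and `assign_segment()` are hypothetical names introduced for illustration and are not the package's own segmentation routine.

```r
# Score ranges copied from the table above, in the same order
segments <- data.frame(
  segment = c("Champions", "Potential Loyalist", "Loyal Customers", "Promising",
              "New Customers", "Can't Lose Them", "At Risk", "Need Attention",
              "About To Sleep", "Lost"),
  r_lo = c(5, 3, 2, 3, 4, 1, 1, 1, 2, 1),
  r_hi = c(5, 5, 4, 4, 5, 2, 2, 3, 3, 1),
  f_lo = c(5, 3, 2, 1, 1, 3, 2, 3, 1, 1),
  f_hi = c(5, 5, 4, 3, 3, 4, 5, 5, 3, 5),
  m_lo = c(5, 2, 2, 3, 1, 4, 4, 3, 1, 1),
  m_hi = c(5, 5, 4, 5, 5, 5, 5, 5, 4, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label each customer with the first segment whose
# recency, frequency and monetary ranges all contain the customer's scores
assign_segment <- function(r, f, m, rules = segments) {
  out <- rep("Others", length(r))
  # Walk the rules from last to first so that earlier rows overwrite later ones
  for (i in rev(seq_len(nrow(rules)))) {
    hit <- r >= rules$r_lo[i] & r <= rules$r_hi[i] &
      f >= rules$f_lo[i] & f <= rules$f_hi[i] &
      m >= rules$m_lo[i] & m <= rules$m_hi[i]
    out[hit] <- rules$segment[i]
  }
  out
}

# Five example customers given as (recency, frequency, monetary) score vectors
labels <- assign_segment(r = c(5, 1, 4, 2, 2),
                         f = c(5, 1, 1, 2, 5),
                         m = c(5, 1, 1, 2, 1))
labels
```

A different ordering of the rows would assign overlapping cases differently, so the order should reflect which segments matter most to the business.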
We can use the segmented data to identify
- best customers
- loyal customers
- at risk customers
- and lost customers
Once we have classified a customer into a particular segment, we can take appropriate action to increase their lifetime value.