Customer Behavior Modelling

Tags: machine learning, LLM, NLP

Customer Analytics / Churn and Retention Analysis

This project uses a free customer dataset from Kaggle to explore, predict, and visualize customer behavior.

The main goal is to understand churn patterns, segment customers using behavioral metrics, and communicate insights through interactive Power BI dashboards, along with my thought process behind every decision and model used.

Our data from Kaggle was relatively clean, requiring minimal preprocessing.

Males make up approximately 46.64% of the dataset, and 38.58% of them churned. Among females, 18,911 of 34,353 (about 55.05%) churned.
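As a rough illustration of how per-gender churn rates like these can be computed, here is a minimal pandas sketch. The column names `Gender` and `Churn` and the four-row toy frame are assumptions for the example, not the project's actual schema; only the two female counts come from the text above.

```python
import pandas as pd


def churn_rate_by(df: pd.DataFrame, column: str, target: str = "Churn") -> pd.Series:
    """Share of churned customers (target == 1) within each value of `column`."""
    return df.groupby(column)[target].mean()


# Toy data with assumed column names, only to show the call.
toy = pd.DataFrame(
    {
        "Gender": ["Female", "Female", "Male", "Male"],
        "Churn": [1, 0, 0, 0],
    }
)
toy_rates = churn_rate_by(toy, "Gender")

# The female figures quoted above, expressed as a rate.
female_churn_rate = 18_911 / 34_353
print(f"Female churn rate: {female_churn_rate:.2%}")  # prints "Female churn rate: 55.05%"
```

Putting both groups on the same scale (percent churned) makes the gap between them easier to read than mixing percentages with raw counts.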

I engineered new features to enhance the analysis:

  • Engagement Ratio: Usage frequency divided by tenure.
  • Age Groups: Customers grouped by age ranges.
  • New Customers: Customers with less than 6 years of tenure.
  • Frequent Caller: Based on the assumption that frequent support calls may be related to churn.
  • Payment Issue: Derived from the "Payment Delay" variable, assuming customers with delays may be at risk due to financial constraints or contract neglect.
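The features above could be derived along these lines. This is a sketch under stated assumptions: the column names (`Age`, `Tenure`, `Usage Frequency`, `Support Calls`, `Payment Delay`), the age bins, and the thresholds for Frequent Caller and Payment Issue are all hypothetical, since the write-up does not state the exact cut-offs.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` with the engineered features added."""
    out = df.copy()

    # Engagement Ratio: usage frequency divided by tenure.
    # A tenure of 0 becomes NaN so the division never hits zero.
    tenure = out["Tenure"].where(out["Tenure"] > 0)
    out["Engagement Ratio"] = out["Usage Frequency"] / tenure

    # Age Groups: the bin edges here are assumed for illustration.
    out["Age Group"] = pd.cut(
        out["Age"],
        bins=[17, 30, 45, 60, 120],
        labels=["18-30", "31-45", "46-60", "61+"],
    )

    # New Customers: tenure below 6, in whatever unit the Tenure column uses.
    out["New Customer"] = out["Tenure"] < 6

    # Frequent Caller: more than 5 support calls (assumed threshold).
    out["Frequent Caller"] = out["Support Calls"] > 5

    # Payment Issue: any recorded payment delay (assumed threshold).
    out["Payment Issue"] = out["Payment Delay"] > 0
    return out


demo = pd.DataFrame(
    {
        "Age": [22, 41, 67],
        "Tenure": [3, 24, 60],
        "Usage Frequency": [15, 12, 30],
        "Support Calls": [1, 7, 0],
        "Payment Delay": [0, 12, 3],
    }
)
features = add_features(demo)
print(features)
```

Keeping the thresholds in one function makes it easy to revisit them later, for example if the churn analysis suggests a different cut-off for frequent callers.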

For encoding:

  • OrdinalEncoder was used for categorical variables with an inherent order.
  • LabelEncoder was used for nominal variables such as gender.
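A minimal sketch of the two encoders with scikit-learn follows. The `Subscription Type` column and its Basic < Standard < Premium ordering are assumptions chosen to illustrate an ordered variable; the write-up does not say which columns were treated as ordinal.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

demo = pd.DataFrame(
    {
        "Subscription Type": ["Basic", "Premium", "Standard", "Basic"],
        "Gender": ["Female", "Male", "Male", "Female"],
    }
)

# OrdinalEncoder: pass the category order explicitly so the integer codes
# follow the intended ranking rather than alphabetical order.
ordinal = OrdinalEncoder(categories=[["Basic", "Standard", "Premium"]])
demo["Subscription Type (encoded)"] = ordinal.fit_transform(
    demo[["Subscription Type"]]
).ravel()

# LabelEncoder: assigns integer codes in sorted order of the class names.
label = LabelEncoder()
demo["Gender (encoded)"] = label.fit_transform(demo["Gender"])

print(demo)
```

One design note: scikit-learn documents `LabelEncoder` as intended for target labels. For a two-level column such as gender the integer codes are harmless, but for nominal features with more than two levels a one-hot encoding avoids implying an order.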

The dataset was balanced and did not require SMOTE or other resampling techniques.
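A quick way to check class balance is to look at the share of each target class. The sketch below assumes a binary `Churn` column and uses a toy series; the second part only combines the gender figures quoted earlier, which imply an overall churn rate of roughly 47%.

```python
import pandas as pd


def class_balance(target: pd.Series) -> pd.Series:
    """Proportion of rows in each class, ordered by class label."""
    return target.value_counts(normalize=True).sort_index()


# Toy target with an assumed name, only to show the call.
demo_target = pd.Series([0, 1, 1, 0, 1, 0, 1, 0], name="Churn")
balance = class_balance(demo_target)
imbalance_ratio = balance.max() / balance.min()  # 1.0 means perfectly balanced

# Overall churn implied by the gender breakdown quoted in the text:
# 46.64% male with 38.58% churn, the rest female with 18,911 / 34,353 churn.
male_share = 0.4664
overall_churn = male_share * 0.3858 + (1 - male_share) * (18_911 / 34_353)
print(f"Implied overall churn rate: {overall_churn:.1%}")
```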

A major goal was to compare several classification models for churn prediction. The results were:

  • Random Forest Classifier with an accuracy of 0.99844 and an F1 score of 1.0.
  • This slightly outperformed Gradient Boosting by 0.001.
  • KNN, SVM, and Logistic Regression also performed reasonably well.
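A comparison like this can be sketched with scikit-learn as follows. The data here is synthetic (`make_classification`), and the split, hyperparameters, and scaling choices are assumptions, so the scores it prints say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the churn data.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Distance- and margin-based models get a scaler; tree ensembles do not need one.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }

# Report both metrics at the same precision, best F1 first.
for name, scores in sorted(results.items(), key=lambda kv: kv[1]["f1"], reverse=True):
    print(f"{name:<20} accuracy={scores['accuracy']:.5f}  f1={scores['f1']:.5f}")
```

Printing accuracy and F1 at the same precision makes small gaps between models, such as the 0.001 difference mentioned above, easier to compare.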

Customer Segmentation:

We applied RFM (Recency, Frequency, Monetary) analysis and K-Means clustering for customer segmentation. Key insights include:

  • Customers with higher recency scores tend to spend more, though the pattern is not strictly linear.
  • Customers with moderate to high frequency also tend to be higher spenders, though not always.

The clustering produced four distinct segments:

  • Loyal Moderate Spenders
  • At-Risk Low Spenders
  • High-Value Dormant
  • New/Recent Customers

| Cluster | Suggested Segment | Characteristics | Suggested Action |
|---|---|---|---|
| 0 | Loyal Moderate Spenders | Recent, frequent, moderate spend | Upselling, loyalty programs |
| 1 | At-Risk Low Spenders | Inactive, moderate past interaction, low spend | Reactivation campaigns |
| 2 | High-Value Dormant | Inactive, some past interaction, high spend | High-priority win-back |
| 3 | New/Recent Customers | Very recent, low interaction, moderate spend | Nurturing and engagement |
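The RFM plus K-Means step that produced these four segments can be sketched roughly as below. The `Recency`, `Frequency`, and `Monetary` columns and the randomly generated customers are assumptions for illustration; the real project would derive these values from the Kaggle data, and its cluster numbering would not match this toy run.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Random stand-in customers with assumed RFM columns.
rng = np.random.default_rng(0)
n = 200
customers = pd.DataFrame(
    {
        "Recency": rng.integers(1, 31, size=n),      # e.g. days since last interaction
        "Frequency": rng.integers(1, 31, size=n),    # e.g. usage count
        "Monetary": rng.uniform(100, 1000, size=n).round(2),  # e.g. total spend
    }
)

rfm_columns = ["Recency", "Frequency", "Monetary"]

# Scale first so no single metric dominates the Euclidean distances.
scaled = StandardScaler().fit_transform(customers[rfm_columns])

# Four clusters, matching the number of segments described in the text.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
customers["Cluster"] = kmeans.fit_predict(scaled)

# Per-cluster averages are what the segment names are read from.
profile = customers.groupby("Cluster")[rfm_columns].mean().round(1)
profile["Customers"] = customers["Cluster"].value_counts().sort_index()
print(profile)
```

The segment names are a manual interpretation of the per-cluster averages in `profile`; K-Means itself only returns the integer labels.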

We then asked whether churn rates align with these segments. The analysis showed that the segmentation holds well overall. However, Cluster 2 is better labeled High-Value Steady Customers, which reflects its consistent behavior and value.

| Cluster | Adjusted Segment Name | Characteristics | Business Strategy |
|---|---|---|---|
| 0 | Loyal Moderate Spenders | Recent, frequent, moderate spend, low churn | Retention & Upselling: Loyalty rewards, personalized offers to increase spend, referral incentives |
| 1 | At-Risk Low Spenders | Inactive, moderate past interaction, higher churn | Reactivation Campaigns: Targeted discounts, reminders, win-back emails, understand reasons for churn |
| 2 | High-Value Steady Customers | Moderate activity, high spend, low churn | Protect & Reward: VIP treatment, premium support, early access to offers, prevent churn at all costs |
| 3 | New/Uncertain Customers | Recent interaction, low interaction frequency, moderate churn | Onboarding & Engagement: Educational content, welcome offers, early satisfaction surveys to build loyalty |
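Checking whether churn lines up with the segments comes down to a churn rate per cluster. A minimal sketch, assuming a customer table with `Cluster` and binary `Churn` columns; the twelve rows below are made up purely to show the mechanics and are not the project's results.

```python
import pandas as pd

# Made-up rows with assumed column names, three customers per cluster.
segments = pd.DataFrame(
    {
        "Cluster": [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
        "Churn":   [0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0],
    }
)

# Size and churn rate of each cluster, side by side.
churn_by_cluster = segments.groupby("Cluster")["Churn"].agg(
    customers="size", churn_rate="mean"
)
print(churn_by_cluster)
```

A cluster whose churn rate contradicts its label is the signal to rename it, which is the kind of adjustment made for Cluster 2 above.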

I also built a Power BI dashboard to present the KPIs and supporting analysis.
