Customer Loyalty Prediction
Challenge: Predict the customer loyalty score for recent clients
Solution: Create a machine learning model that can accurately predict every customer’s loyalty score well in advance
Technologies Used: Python, Keras, Scikit-learn, Light GBM Model
Client: A payment processing company
Challenge: Predict customer loyalty scores
Our client, a payment processing company, was looking for ways to predict new customers’ loyalty scores early on by using their spending pattern and demographic data. The customer loyalty scores are valuable indicators for help companies refine their marketing strategy. In addition, companies can enhance their brand equity by analyzing the driving factors in customer satisfaction and loyalty, and investing in them.
A new customer’s purchasing pattern is a key predictor of future purchases. It is noteworthy that purchasing patterns include several data points, such as transaction amount, frequency of transactions, number of transactions, geographies of the transactions, merchant categories, new merchant purchases, etc. Initially, we identified and studied several customer and transaction features based on historical data in order to find the most relevant features. After cleaning the data and preprocessing, we developed a model that accurately predicts new customers’ loyalty score.
At the initial phase, we had to set up a big data system in order to explore, manipulate and analyze millions of data points. During the data exploration phase of the project we utilized Google’s BigQuery platform to identify and engineer relevant features and overcome the big data challenges posed by the client’s large dataset.
Based on our initial analyses, we realized that the transaction data is more predictive of loyalty if analyzed as a dynamic time series. In other words, absolute value of transaction features were not as powerful indicators as the relative changes in those features over time. For example if the transaction amounts or the number of transactions for a specific client are increasing or decreasing over time, or if the client is processing payments through a handful of merchants or if they are continually making transactions at new merchants that they have not visited in the past. What we ultimately realized was that a customer’s loyalty score is dynamically correlated with how basic dimensions change over time rather than a static snapshot of their status at every point in time.
After we finalized the relevant features, we were able to pre-process the payment processor’s data and engineer a number of new features to incorporate the time series nature of the problem into the model.
Due to the nature of the problem, computing resources available, and the size of the training data we opted for using a Light Gradient Boosting Model (LGBM) and utilized a Bayesian Optimization framework to tune the hyper parameters for this model to achieve optimum results.