Predicting Customer Satisfaction using H2O.ai AutoML running on Azure Data Science Virtual Machine

Dr. Caio Moreno
Feb 2, 2021

In this tutorial, you will learn how to quickly predict customer satisfaction using H2O.ai AutoML running on an Azure Data Science Virtual Machine.

Which customers are happy customers?

Happy or Unhappy customers

Around five years ago, Santander Bank launched a prediction competition on Kaggle to identify which customers are happy customers. The prize money was $60,000.

Overview of the use case, from the Kaggle competition page:

From frontline support teams to C-suites, customer satisfaction is a key measure of success. Unhappy customers don’t stick around. What’s more, unhappy customers rarely voice their dissatisfaction before leaving.

Santander Bank is asking Kagglers to help them identify dissatisfied customers early in their relationship. Doing so would allow Santander to take proactive steps to improve a customer’s happiness before it’s too late.

In this competition, you’ll work with hundreds of anonymized features to predict if a customer is satisfied or dissatisfied with their banking experience.

Competition link: Santander Customer Satisfaction | Kaggle

The dataset

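The dataset provided by Santander Bank is anonymized and contains 371 variables, all of them numeric.

A continuous variable is one that can take an infinite number of possible values. Many of the variables here are continuous, but as the counts listed under Variables show, others are constant or binary.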

The TARGET column is the variable to predict. It equals 1 (one) for unsatisfied customers and 0 for satisfied customers.

The objective of the Kaggle competition is to predict which clients are satisfied and which are unsatisfied.
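To make the setup concrete, here is a minimal sketch of how this binary target could be handed to H2O AutoML from Python. It is an illustration under stated assumptions, not the exact code used in this tutorial: the file path passed to run_automl, the "ID" column name, the max_models and seed values, and the choice of AUC as the ranking metric are all assumptions. The helper names feature_columns and run_automl are hypothetical.

```python
def feature_columns(columns, target="TARGET", id_column="ID"):
    """Return every column except the label and the row identifier.

    The "ID" column name is an assumption about the CSV layout.
    """
    return [c for c in columns if c not in (target, id_column)]


def run_automl(train_path, target="TARGET", max_models=10, seed=1):
    """Train H2O AutoML on a CSV file and return the fitted AutoML object."""
    # Imported lazily so feature_columns() works without H2O installed.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()  # starts or connects to a local H2O cluster (requires Java)
    train = h2o.import_file(train_path)

    # Cast the 0/1 label to a factor so AutoML treats this as classification
    # rather than regression.
    train[target] = train[target].asfactor()

    # max_models, seed and sort_metric are illustrative settings only.
    aml = H2OAutoML(max_models=max_models, seed=seed, sort_metric="AUC")
    aml.train(
        x=feature_columns(train.columns, target),
        y=target,
        training_frame=train,
    )
    return aml
```

Calling run_automl with the path to the downloaded training file and then inspecting aml.leaderboard would list the trained models ranked by the chosen metric. Excluding the identifier column keeps the models from fitting on a value that carries no information about satisfaction.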

Number of observations (rows):

  • Train: 76020 rows
  • Test: 75818 rows

Number of 1s (train): 3008 (about 3.96%), which makes this an imbalanced dataset problem.
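The share of positives can be checked with a few lines of Python. The sketch below rebuilds a stand-in label list from the row counts quoted above instead of reading the real file, so it only verifies the arithmetic. The function name class_balance is hypothetical.

```python
from collections import Counter


def class_balance(labels):
    """Map each label to (row count, percentage of all rows)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


# Stand-in for the TARGET column, built from the counts quoted in the text:
# 3008 ones out of 76020 training rows.
labels = [1] * 3008 + [0] * (76020 - 3008)

balance = class_balance(labels)
n_pos, pct_pos = balance[1]
print(f"TARGET=1: {n_pos} rows ({pct_pos:.2f}%)")
```

With so few positives, a model that always predicts 0 would already be right for roughly 96% of rows, so plain accuracy says little here. A ranking metric such as AUC is usually a more informative choice.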

Variables:

  • 34 variables with a single value (suggested action: delete all of them)
  • 100 variables with two unique values (binary variables)
  • 157 variables with between 3 and 101 unique values…
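A profile like the one above can be produced by counting distinct values per column. The following is a minimal sketch using only the Python standard library. The sample data and its column names are made up for illustration, and the function names unique_value_counts and constant_columns are hypothetical.

```python
import csv
import io


def unique_value_counts(csv_file):
    """Map each column name to its number of distinct values.

    Values are compared as raw strings, so "0" and "0.0" would count as
    two different values; normalise the data first if that matters.
    """
    reader = csv.DictReader(csv_file)
    seen = {name: set() for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            seen[name].add(value)
    return {name: len(values) for name, values in seen.items()}


def constant_columns(counts):
    """Columns with a single distinct value carry no signal and can be dropped."""
    return [name for name, n in counts.items() if n == 1]


# Tiny made-up sample; the column names are illustrative, not from the dataset.
sample = io.StringIO(
    "const_a,flag_b,amount_c\n"
    "0,0,10.5\n"
    "0,1,20.0\n"
    "0,0,31.2\n"
)

counts = unique_value_counts(sample)
print(counts)                    # distinct values per column
print(constant_columns(counts))  # candidates for deletion
```

To run this on the real training data, pass an open file handle for the CSV instead of the in-memory sample. Columns reported by constant_columns correspond to the single-value variables that the list above suggests deleting.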