A Portuguese banking institution ran a marketing campaign to convince potential customers to invest in a bank term deposit scheme.
The marketing campaign was based on phone calls. Often, the same customer was contacted more than once by phone to assess whether they would subscribe to the bank term deposit.
Our goal is to perform a marketing analysis of the data generated by this campaign.
The data fields are as follows:
- age- numeric
- job- type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
- marital- marital status (categorical: 'divorced', 'married', 'single', 'unknown'; note: 'divorced' means divorced or widowed)
- education- (categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')
- default- has credit in default? (categorical: 'no', 'yes', 'unknown')
- housing- has housing loan? (categorical: 'no', 'yes', 'unknown')
- loan- has a personal loan? (categorical: 'no', 'yes', 'unknown')
- contact- contact communication type (categorical: 'cellular', 'telephone')
- month- last contact month of the year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
- day- last contact day of the week (categorical: 'mon','tue','wed','thu','fri')
- duration- last contact duration, in seconds (numeric). Important note: this attribute strongly affects the output target (for example, if duration=0 then y='no'). However, the duration is not known before a call is made, and once the call ends y is known. This input should therefore be included only for benchmark purposes and discarded if the goal is a realistic predictive model.
Other attributes:
- campaign- number of times a customer was contacted during this campaign (numeric, includes last contact)
- pdays- number of days passed after the customer was last contacted from a previous campaign (numeric; 999 means customer was not previously contacted)
- previous- number of times the customer was contacted before this campaign (numeric)
- poutcome- outcome of the previous marketing campaign (categorical: 'failure', 'nonexistent', 'success')
Output variable (desired target):
- y- has the customer subscribed to a term deposit? (binary: 'yes', 'no')
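The notes on duration and pdays in the field list can be sketched as a small cleaning step. This is a minimal illustration in plain Python over dictionaries, not Spark code; the helper name `prepare_row`, the `realistic` flag, and the sample record are assumptions made for the example and do not come from the data set.

```python
PDAYS_NOT_CONTACTED = 999  # sentinel described in the field list


def prepare_row(row, realistic=True):
    """Return a cleaned copy of one record (hypothetical helper)."""
    cleaned = dict(row)  # copy so the input record is left untouched
    # 999 means "not previously contacted"; store it as None so it is
    # not mistaken for a real number of days.
    pdays = int(cleaned["pdays"])
    cleaned["pdays"] = None if pdays == PDAYS_NOT_CONTACTED else pdays
    # duration is only known after the call, so drop it when the goal
    # is a realistic predictive model.
    if realistic:
        cleaned.pop("duration", None)
    return cleaned


# Made-up record for illustration only.
sample = {"age": 41, "pdays": 999, "duration": 120, "y": "no"}
model_row = prepare_row(sample)                        # duration removed
benchmark_row = prepare_row(sample, realistic=False)   # duration kept
```

Keeping duration behind a flag makes it easy to produce both the benchmark variant and the realistic variant from the same input.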
The analysis tasks are as follows:
- Load the data and create a Spark data frame
- Give the marketing success rate (number of people subscribed / total number of entries)
- Give the marketing failure rate
- Give the maximum, mean, and minimum age of the targeted customers
- Check the quality of customers by computing their average and median balance
- Check if age matters for subscription to the deposit
- Check if marital status matters for subscription to the deposit
- Check if age and marital status together matter for subscription to the deposit scheme
- Do feature engineering on the bank data and find the true effect of age on the campaign.
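Several of these tasks reduce to counting and grouping, which can be sketched as follows. This is a plain Python illustration, not Spark code; in Spark the same aggregations would typically be written with `groupBy` and `agg` on the data frame. The sample rows are made up, and the age bins in `age_group` are an assumed example of feature engineering, not cut-offs given in the source.

```python
from collections import defaultdict
from statistics import mean

# Made-up records for illustration only; not taken from the real data set.
rows = [
    {"age": 25, "marital": "single", "y": "yes"},
    {"age": 34, "marital": "married", "y": "no"},
    {"age": 47, "marital": "married", "y": "no"},
    {"age": 52, "marital": "divorced", "y": "yes"},
    {"age": 63, "marital": "married", "y": "yes"},
    {"age": 29, "marital": "single", "y": "no"},
]


def success_rate(records):
    """Number of people subscribed divided by total number of entries."""
    records = list(records)
    subscribed = sum(1 for r in records if r["y"] == "yes")
    return subscribed / len(records)


def age_group(age):
    """Hypothetical age bins; the cut-offs are an assumption."""
    if age < 30:
        return "under_30"
    if age < 60:
        return "30_to_59"
    return "60_plus"


def rate_by(records, key_fn):
    """Success rate per group, where key_fn maps a record to its group."""
    groups = defaultdict(list)
    for r in records:
        groups[key_fn(r)].append(r)
    return {key: success_rate(members) for key, members in groups.items()}


overall = success_rate(rows)
failure = 1 - overall
ages = [r["age"] for r in rows]
age_stats = (max(ages), mean(ages), min(ages))
by_marital = rate_by(rows, lambda r: r["marital"])
by_age_group = rate_by(rows, lambda r: age_group(r["age"]))
by_both = rate_by(rows, lambda r: (age_group(r["age"]), r["marital"]))
```

Comparing `by_age_group`, `by_marital`, and `by_both` against `overall` gives a first indication of whether a feature matters: a group whose rate differs clearly from the overall rate is worth a closer look, subject to the group having enough records.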