• Understanding Customer Churn in Telecom

    using Logistic Regression in R

  • DATASET INFO

    The dataset contains a sample of 3,333 observations drawn from the customer portfolio of a telecommunications company.


    The dependent variable of the model is the Churn outcome observed during the previous period. Churn is a binary (categorical) variable with two possible outcomes: 0 = the customer left the company, 1 = the customer stayed with the company.
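    Because Churn is binary, a logistic regression models the log-odds of the outcome coded 1 as a linear function of the predictors. The equation below is the standard form of that model, where x_ij denotes the value of predictor j for customer i and p is the number of estimated slope coefficients. p may exceed 20 if categorical predictors are expanded into dummy variables. Under the coding stated above, π_i is the probability that customer i stayed.

```latex
\log\frac{\pi_i}{1-\pi_i} = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij},
\qquad
\pi_i = P(\text{Churn}_i = 1 \mid \mathbf{x}_i) = \frac{1}{1 + e^{-(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij})}}
```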


    The dataset also contains 20 independent variables, which describe each customer's usage during the previous period (behavioral data) along with some demographics.

    STACK

    R, Logistic Regression, LASSO, Variable Selection (AIC/BIC)

    PROJECT DESCRIPTION

    In this study, we set out to build a regression model that identifies the characteristics influencing whether a customer is likely to switch telecommunications providers (Churn). We started with a logistic regression model that used all variables, performed stepwise variable selection based on AIC and BIC, and then moved on to other models that used LASSO or a few simple (aggregate) transformations of certain predictor variables. We concluded that the best logistic regression model we found was obtained through an aggregate transformation of the variables that concern domestic charges at different times of day (Day Charges, Evening Charges, Night Charges).
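    The workflow described above can be sketched in base R roughly as follows. This is an illustration, not the project's actual code. It runs on SIMULATED data, and the column names (day_charge, eve_charge, night_charge, intl_charge, cust_serv_calls, churn) are hypothetical stand-ins for the real dataset's variables. It assumes the aggregate transformation is a simple sum of the three domestic charges. Stepwise selection uses stats::step(), where k = 2 gives AIC and k = log(n) gives the BIC penalty.

```r
# Minimal sketch on SIMULATED data; all column names are hypothetical.
set.seed(42)
n <- 3333

churn_df <- data.frame(
  day_charge      = rnorm(n, mean = 30,  sd = 9),
  eve_charge      = rnorm(n, mean = 17,  sd = 4),
  night_charge    = rnorm(n, mean = 9,   sd = 2),
  intl_charge     = rnorm(n, mean = 2.8, sd = 0.75),
  cust_serv_calls = rpois(n, lambda = 1.5)
)

# Simulated binary outcome from a logistic model (illustration only).
eta <- -8 + 0.12 * (churn_df$day_charge + churn_df$eve_charge +
                      churn_df$night_charge) +
  0.5 * churn_df$cust_serv_calls
churn_df$churn <- rbinom(n, size = 1, prob = plogis(eta))

# 1. Full logistic regression with every predictor.
full_model <- glm(churn ~ ., data = churn_df, family = binomial)

# 2. Stepwise selection in both directions: AIC (k = 2, the default) ...
aic_model <- step(full_model, direction = "both", trace = 0)
# ... and BIC (penalty k = log(n)).
bic_model <- step(full_model, direction = "both",
                  k = log(nrow(churn_df)), trace = 0)

# 3. Aggregate transformation: replace the three domestic charges
#    with their sum (assumed form of the aggregation).
agg_df <- churn_df
agg_df$total_domestic_charge <- with(agg_df,
                                     day_charge + eve_charge + night_charge)
agg_model <- glm(churn ~ total_domestic_charge + intl_charge + cust_serv_calls,
                 data = agg_df, family = binomial)

# 4. Compare the candidate models on AIC and BIC (lower is better).
models <- list(full = full_model, aic = aic_model,
               bic = bic_model, aggregate = agg_model)
comparison <- data.frame(
  AIC = sapply(models, AIC),
  BIC = sapply(models, BIC)
)
print(round(comparison, 1))
```

    The LASSO step is left out to keep the sketch dependency-free. One common option is cv.glmnet() from the glmnet package with family = "binomial", applied to a numeric predictor matrix such as one built with model.matrix().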

    CODE

    Code in R is available here

  • SHORT PRESENTATION