Real Time Loan Default Prediction


Bank loans carry many risks, both for the bank and for the borrower. Analyzing risk in bank loans first requires understanding what risk means.

 

Financial companies that issue personal loans face risk from customers who default. With predictive analytics, a company can reduce this risk by predicting the outcome of these loans. The personal data that companies collect during loan applications is a goldmine of information. Mining this data to create powerful features that predict the loan outcome can prevent write-offs for lenders. This increases profitability and can help maintain a healthy lending market. The same data can also be used to predict interest rates and loan grades, which is useful to businesses as well and allows faster loan processing for both customers and lenders.

Problem Statement

This is a multiclass classification problem: predicting loan status from a rich dataset. The problem can be reduced to binary classification to build a loan approval pre-check system for potential customers.
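As a minimal sketch of that reduction, the snippet below collapses a multiclass loan status into a binary label. The status strings and the helper name `to_binary_label` are assumptions for illustration only; the actual dataset may use different values.

```python
# Hypothetical status values; the real dataset's labels may differ.
DEFAULT_LIKE = {"Charged Off", "Default", "Late (31-120 days)"}
REPAID = {"Fully Paid"}


def to_binary_label(loan_status):
    """Map a multiclass loan status to 1 (bad outcome), 0 (repaid) or None."""
    if loan_status in DEFAULT_LIKE:
        return 1
    if loan_status in REPAID:
        return 0
    # Anything else (for example a loan still being repaid) has no final
    # outcome yet, so it is left unlabeled and can be dropped from training.
    return None


statuses = ["Fully Paid", "Charged Off", "Current", "Default"]
labels = [to_binary_label(s) for s in statuses]
print(labels)
```

Which statuses count as a bad outcome is a business decision, so the two sets above would need to be agreed with the lender before training a model.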

Exploratory data analysis

Some examples of exploratory data analysis are:

  1. Visualize the distribution of loan amounts to understand lending trends over time.
  2. Run an extensive data-driven study of why people take loans and ascertain the reasons for default.
  3. Study and visualize the demographic nature of loan defaults.
  4. Compute and validate correlations between loan status and indicators such as FICO score, credit score and other personal history in order to model interactions.
  5. Visualize the correlation between interest rates and loan grades. Correlating the interest rate with demographic data will also be an interesting task.
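The correlation step in the list above can be sketched with a plain Pearson coefficient. The FICO scores and default flags below are made-up toy values, not figures from the dataset, and `pearson` is a helper written for this illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Made-up toy data: FICO score and whether the loan defaulted (1) or not (0).
fico = [620, 640, 660, 700, 720, 760]
defaulted = [1, 1, 0, 1, 0, 0]

r = pearson(fico, defaulted)
print(round(r, 3))  # negative here: higher scores go with fewer defaults
```

With a 0/1 outcome this is the point-biserial correlation. It only captures a linear relationship, so it is a starting point for the analysis rather than a conclusion.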

Proposal

The interesting problems that an investigator might face with this type of dataset include:

  1. Designing a practical and usable cross-validation strategy.
  2. Choosing an evaluation metric for the dataset and the problem being solved.
  3. Handling missing data.
  4. Building features from the rich text data available in loan descriptions.

Example solutions include:

  • Predicting loan status from loan grades.
  • Using logistic regression to predict loan status (if the problem is reduced to binary classification).
  • Using LDA to parse the textual data and group loans into categories based on the available information.
  • Building a tree-based or neural network-based solution and identifying the most important features.
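For the cross-validation point, one possible approach is a stratified split, which keeps the share of defaults roughly equal in every fold. This is only a sketch under assumptions: `stratified_folds` is a hypothetical helper, and the 4-to-16 class mix is toy data.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Return k lists of row indices, each with roughly the same label mix."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal the rows of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Toy labels: 4 defaults (1) and 16 repaid loans (0).
labels = [1] * 4 + [0] * 16
folds = stratified_folds(labels, k=4)

for test_idx in folds:
    # Every fold other than the held-out one forms the training set.
    train_idx = [i for fold in folds if fold is not test_idx for i in fold]
    defaults_in_test = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), defaults_in_test)
```

If the loans span several years, a split by issue date may be a more realistic test of a model that has to score future applications.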

 

The four basic building blocks of the XORML architecture:

  1. Big Data Analytics
  2. Data Engineering
  3. Machine Learning
  4. Model Deployment and productization

 

1. Big Data Analytics:

        XORML helps you examine large data sets to uncover information, including patterns, unknown correlations, market trends and customer preferences, that can help organizations make informed business decisions.

2. Data Engineering:

        XORML facilitates data cleansing and enrichment through data de-duplication and construction.
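To make the de-duplication step concrete, here is a minimal sketch that drops repeated loan applications after normalizing a matching key. It does not show how XORML does this; the field names (`name`, `email`, `amount`) and both helper functions are assumptions made for illustration.

```python
def normalize(record):
    """Build a matching key that ignores case and surrounding whitespace."""
    return (record["name"].strip().lower(), record["email"].strip().lower())


def deduplicate(records):
    """Keep the first record seen for each normalized key."""
    seen = {}
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen[key] = record
    return list(seen.values())


# Hypothetical application records; the first two describe the same person.
applications = [
    {"name": "Ann Lee", "email": "ann@example.com", "amount": 5000},
    {"name": " ann lee ", "email": "ANN@example.com", "amount": 5000},
    {"name": "Bo Chen", "email": "bo@example.com", "amount": 12000},
]

unique = deduplicate(applications)
print(len(unique))
```

Real data usually needs fuzzier matching (typos, changed addresses), so exact matching on a normalized key is only a first pass.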

3. Machine Learning:

       XORML adapted advanced machine learning algorithms such as Lasso Regression, Ridge Regression and Random Forest, as well as clustering algorithms for customer segmentation.

4. Model Deployment and productization:

       XORML used Databricks to build its entire data ingest pipeline with Apache Spark in six weeks. Databricks not only provided high-performance, reliable Spark clusters instantly, it also democratized access to Spark for every XORML team.

 

Results obtained from the classification model: Confusion Matrix

[Figure: Confusion Matrix]

 

Classification Model Evaluation metrics

  Metric                    Value
  areaUnderPR               0.938782884769
  areaUnderROC              0.887388237246
  accuracy                  0.934006094632
  falsePositiveRate (0.0)   0.0249861596273
  falsePositiveRate (1.0)   0.20023736588
  truePositiveRate (0.0)    0.79976263412
  truePositiveRate (1.0)    0.975013840373
  precision (0.0)           0.907215369501
  precision (1.0)           0.940968708924
  recall (0.0)              0.79976263412
  recall (1.0)              0.975013840373
  fMeasure                  0.850106975119
  accuracy                  0.934006
  Weighted Precision        0.933071
  Weighted Recall           0.934006
  F1                        0.932515
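The reported metrics are related to one another, and a short check makes those relationships visible. The sketch below assumes that the 0.0 and 1.0 suffixes are class labels in a two-class problem; the source does not say so explicitly, but the numbers are consistent with that reading. All input values are copied from the results above, and the variable names are illustrative.

```python
# Per-label values copied from the reported results.
# Assumption: 0.0 and 1.0 are the two class labels.
precision = {0.0: 0.907215369501, 1.0: 0.940968708924}
recall = {0.0: 0.79976263412, 1.0: 0.975013840373}
accuracy = 0.934006094632


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# The reported fMeasure matches the F-measure of label 0.0.
f0 = f_measure(precision[0.0], recall[0.0])  # about 0.850107

# With two classes, the false positive rate of one label is
# one minus the recall (true positive rate) of the other label.
fpr0 = 1 - recall[1.0]  # about 0.024986
fpr1 = 1 - recall[0.0]  # about 0.200237

# Accuracy is the class-share-weighted average of the per-label recalls,
# so the share of label 0.0 in the evaluation data can be solved for.
share0 = (recall[1.0] - accuracy) / (recall[1.0] - recall[0.0])  # about 0.234

# The same shares reproduce the reported weighted precision.
weighted_precision = share0 * precision[0.0] + (1 - share0) * precision[1.0]

print(round(f0, 6), round(fpr0, 6), round(fpr1, 6))
print(round(share0, 3), round(weighted_precision, 6))
```

Under that assumption, label 0.0 makes up roughly 23% of the evaluation data. Because the classes are imbalanced, the per-label recall of 0.80 for label 0.0 is more informative than the overall accuracy of 0.93.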

 

 

 

 

 

