Building Better Credit Scores

Machine Learning and NLP for Optimized Risk Assessment

By Aman Kar, Daniel Mathew, Tracy Pham

Introduction

Traditional Credit Scoring Model (FICO Score)

Credit scores are pivotal in today’s financial landscape, influencing everything from rental eligibility to access to health insurance, yet the formula for calculating creditworthiness has long been opaque and often overlooks important nuances. Typically, a credit score is determined by five factors: payment history, amounts owed, length of credit history, new credit, and credit mix. This structure can place individuals with limited credit history, especially young adults who are just starting to build credit, at a compounded disadvantage, restricting their access to loans, credit cards, employment opportunities, and insurance. Our proposed Cash Score model aims to address these limitations by providing a more comprehensive measure of creditworthiness. The Cash Score model uses detailed account transaction data to predict the probability of defaulting on a loan (also known as loan delinquency). This approach highlights the potential for transaction-based credit evaluation to assess financial risk more accurately and improve access to credit, offering a fairer alternative to traditional credit scoring methods.

Cash Score Model Pipeline
We adopted an iterative approach to model development, refining features alongside model selection and performance evaluation. We began with logistic regression to establish a baseline and identify key features. As the process evolved, we integrated more advanced algorithms such as HistGradientBoosting (HistGB), CatBoost, LightGBM, and XGBoost, chosen for their ability to handle complex data patterns. Throughout the iterations, we refined feature generation and selected the most relevant features to improve the model’s predictive power.

Methods

Data Description

Our analysis uses four datasets that provide insights into consumer accounts, transaction histories, and credit scores. Because the datasets were prepared and preprocessed by Prism Data, little additional cleaning was needed. Our data cleaning focused on reviewing the data for consistency, addressing any remaining missing values, standardizing categorical variables, and structuring time-series data for modeling.

Below are the first five rows of each dataset used in this analysis:

prism_consumer_id  prism_account_id  account_type  balance_date  balance
3023               0                 SAVINGS       2021-08-31    90.57
3023               1                 CHECKING      2021-08-31    225.95
4416               2                 SAVINGS       2022-03-31    15,157.17
4416               3                 CHECKING      2022-03-31    66.42
4227               4                 CHECKING      2021-07-31    7,042.90
The acctDF.csv dataset provides detailed information about consumer financial accounts, such as account types, balances, and balance dates.

prism_consumer_id  evaluation_date  credit_score  DQ_TARGET
0                  2021-09-01       726           0
1                  2021-07-01       626           0
2                  2021-05-01       680           0
3                  2021-03-01       734           0
4                  2021-10-01       676           0
The consDF.csv dataset provides credit scores, evaluation dates, and delinquency targets for each consumer. This dataset is essential for building a model of credit risk, as it contains direct indicators of a consumer’s creditworthiness. The delinquency target (DQ_TARGET) serves as the dependent variable, enabling us to assess our model’s performance in predicting credit risk.

prism_consumer_id  prism_transaction_id  category  amount  credit_or_debit  posted_date
3023               0                     4         0.05    CREDIT           2021-04-16
3023               1                     12        481.56  CREDIT           2021-04-30
3023               2                     4         0.05    CREDIT           2021-05-16
3023               3                     4         0.07    CREDIT           2021-06-16
3023               4                     4         0.06    CREDIT           2021-07-16
The trxnDF.csv dataset records individual transactions, including transaction category IDs, amounts, and whether the transaction was a credit or debit. These transaction data are vital for modeling consumer behavior, such as income sources, spending habits, and cash flow.

category_id  category
0            SELF_TRANSFER
1            EXTERNAL_TRANSFER
2            DEPOSIT
3            PAYCHECK
4            MISCELLANEOUS
The cat_map.csv dataset maps transaction categories to their corresponding category IDs, allowing us to classify and interpret the transactions effectively.
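
As a rough illustration of that lookup, the sketch below attaches readable category names to a few transaction rows read from the previews above. It uses only the standard library (pandas would be the more usual tool), and the `UNKNOWN` placeholder for IDs that fall outside the five previewed map rows is an assumption, not something the source specifies.

```python
# Category map rows and transaction rows as read from the dataset previews.
CATEGORY_MAP = {
    0: "SELF_TRANSFER",
    1: "EXTERNAL_TRANSFER",
    2: "DEPOSIT",
    3: "PAYCHECK",
    4: "MISCELLANEOUS",
}

transactions = [
    {"prism_transaction_id": 0, "category": 4, "amount": 0.05},
    {"prism_transaction_id": 1, "category": 12, "amount": 481.56},
    {"prism_transaction_id": 2, "category": 4, "amount": 0.05},
]


def label_transactions(rows, category_map, missing="UNKNOWN"):
    """Return copies of the rows with a readable `category_name` added.

    IDs absent from the map get the placeholder `missing` (an assumption).
    """
    return [
        {**row, "category_name": category_map.get(row["category"], missing)}
        for row in rows
    ]


labeled = label_transactions(transactions, CATEGORY_MAP)
for row in labeled:
    print(row["prism_transaction_id"], row["category_name"], row["amount"])
```

Category 12 is not among the five map rows previewed here, so it falls back to the placeholder; with the full cat_map.csv it would presumably resolve to a real name.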

It is important to note that, in compliance with the Equal Credit Opportunity Act (ECOA), we excluded specific transaction categories that could introduce bias in credit decision-making. Categories related to child dependents, healthcare and medical expenses, unemployment benefits, education, and pensions have been removed to ensure that our model does not unintentionally discriminate based on protected attributes.
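
A minimal sketch of such an exclusion step is shown below. The category labels are hypothetical stand-ins, since the actual names in cat_map.csv for these categories are not shown, and the function name `drop_excluded` is ours.

```python
# Hypothetical labels for the excluded categories; the real names in
# cat_map.csv are not shown in the preview and may differ.
EXCLUDED_CATEGORIES = frozenset({
    "CHILD_DEPENDENTS",
    "HEALTHCARE_MEDICAL",
    "UNEMPLOYMENT_BENEFITS",
    "EDUCATION",
    "PENSION",
})


def drop_excluded(rows, excluded=EXCLUDED_CATEGORIES):
    """Keep only transactions whose category is not on the exclusion list."""
    return [row for row in rows if row["category_name"] not in excluded]


# Hypothetical labeled transactions, for illustration only.
rows = [
    {"category_name": "PAYCHECK", "amount": 481.56},
    {"category_name": "HEALTHCARE_MEDICAL", "amount": 60.00},
    {"category_name": "DEPOSIT", "amount": 25.00},
    {"category_name": "PENSION", "amount": 900.00},
]

kept = drop_excluded(rows)
print([row["category_name"] for row in kept])
```

Filtering before any feature is generated means no derived feature (counts, sums, ratios) can carry information from the excluded categories.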

Exploratory Data Analysis

Through exploratory data analysis (EDA), we examined consumer transaction trends and spending patterns to uncover insights that aid in identifying key factors for predicting credit risk. Below are a few examples of EDA conducted to look at temporal trends, transaction frequency, spending categories, and the impact of specific financial behaviors.

Feature Generation

We engineered features to capture financial behavior through transaction history, balance trends, spending patterns, and risk indicators. Our feature generation process included:
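
The feature names reported in the results (for example `sum_acct_balances`, `HAS_SAVINGS_ACCT`, and `DEPOSIT_last_14_days_count`) suggest balance aggregates and windowed transaction counts. The sketch below shows one way such features might be computed; it is not the authors' code. The window definition (the 14 days strictly before the evaluation date), the evaluation date, and the transaction rows are all assumptions, while the two account balances come from the acctDF.csv preview.

```python
from datetime import date, timedelta


def count_last_n_days(transactions, category, evaluation_date, days):
    """Count `category` transactions posted in the `days` days before
    `evaluation_date` (the evaluation date itself is excluded)."""
    start = evaluation_date - timedelta(days=days)
    return sum(
        1
        for t in transactions
        if t["category"] == category
        and start <= t["posted_date"] < evaluation_date
    )


def account_features(accounts):
    """Aggregate one consumer's accounts into balance-level features."""
    return {
        "sum_acct_balances": round(sum(a["balance"] for a in accounts), 2),
        "HAS_SAVINGS_ACCT": int(
            any(a["account_type"] == "SAVINGS" for a in accounts)
        ),
    }


# Balances for one consumer, taken from the acctDF.csv preview.
accounts = [
    {"account_type": "SAVINGS", "balance": 90.57},
    {"account_type": "CHECKING", "balance": 225.95},
]

# Hypothetical evaluation date and transactions, for illustration only.
evaluation_date = date(2021, 9, 1)
transactions = [
    {"category": "DEPOSIT", "posted_date": date(2021, 8, 20)},
    {"category": "DEPOSIT", "posted_date": date(2021, 8, 30)},
    {"category": "DEPOSIT", "posted_date": date(2021, 7, 1)},
    {"category": "PAYCHECK", "posted_date": date(2021, 8, 25)},
]

features = account_features(accounts)
features["DEPOSIT_last_14_days_count"] = count_last_n_days(
    transactions, "DEPOSIT", evaluation_date, days=14
)
print(features)
```

Restricting the window to dates before the evaluation date keeps information from after the credit decision out of the features.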

Feature Selection

The final dataset contained more than 2,000 features (dataframe shape: 15,000 rows × 2,430 columns). To refine the model input, we performed feature selection using the following techniques:

Models

We evaluated multiple machine learning models to predict credit risk. Below is a brief description of each model used in our analysis.

Modeling Approaches

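Each of these models (other than Logistic Regression) improves upon a single decision tree by using "boosting": trees are added one at a time, each trained to correct the errors of the trees before it, and their combined predictions are typically more accurate than any single tree.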
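
To make the boosting idea concrete, here is a deliberately tiny sketch: gradient boosting with one-split trees ("stumps") and squared-error loss on a single feature. It is a teaching toy under stated assumptions, not how HistGB, CatBoost, LightGBM, or XGBoost are implemented (those use logistic loss for classification, deeper trees, and many optimizations), and the data are hypothetical.

```python
def fit_stump(xs, targets):
    """Pick the single threshold split that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keeps both sides non-empty
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((t - left_mean) ** 2 for t in left)
        error += sum((t - right_mean) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, n_rounds, learning_rate=0.5):
    """Each round fits a stump to the residuals of the ensemble so far."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}


def predict(model, x):
    total = sum(stump_predict(s, x) for s in model["stumps"])
    return model["base"] + model["learning_rate"] * total


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: one feature and 0/1 delinquency-style labels.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 1, 1]

one_tree = fit_boosted(xs, ys, n_rounds=1)
many_trees = fit_boosted(xs, ys, n_rounds=30)
print(mse(one_tree, xs, ys), mse(many_trees, xs, ys))
```

The training error with 30 stumps is lower than with one, because each added stump removes part of the remaining residual.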

Model Evaluation

We used the following metrics to assess model performance:

Results

Feature Importance

Figure: Top SHAP Values

SHAP (SHapley Additive exPlanations) is a method used to explain model predictions by attributing each feature's contribution to the final prediction.
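
The underlying idea can be sketched with an exact Shapley computation on a toy function. Everything below is illustrative: `toy_risk` is a hypothetical score that borrows three feature names from this project, not the trained Cash Score model, and the brute-force enumeration over orderings is only feasible for a handful of features. In practice the `shap` library relies on much faster algorithms specialized for tree models.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over all orderings in which features switch from baseline to x."""
    names = list(x)
    totals = dict.fromkeys(names, 0.0)
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: totals[name] / len(orderings) for name in names}


def toy_risk(f):
    """Hypothetical risk score, NOT the trained Cash Score model."""
    return (0.30
            - 0.00002 * f["sum_acct_balances"]
            - 0.05 * f["HAS_SAVINGS_ACCT"]
            + 0.04 * f["OVERDRAFT"] * (1 - 0.5 * f["HAS_SAVINGS_ACCT"]))


# Hypothetical consumer and reference point.
x = {"sum_acct_balances": 5000.0, "HAS_SAVINGS_ACCT": 1, "OVERDRAFT": 2}
baseline = {"sum_acct_balances": 1000.0, "HAS_SAVINGS_ACCT": 0, "OVERDRAFT": 0}

phi = shapley_values(toy_risk, x, baseline)
print(phi)
print(sum(phi.values()), toy_risk(x) - toy_risk(baseline))
```

The contributions add up to the difference between the prediction for `x` and the prediction for the baseline, which is the "additive" property in the method's name.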

In our model, SHAP identified the following features to be important in predicting credit delinquency:

  • sum_acct_balances: Higher account balances suggest lower delinquency risk.
  • HAS_SAVINGS_ACCT: Having a savings account reduces delinquency risk.
  • DEPOSIT_last_14_days_count: Recent deposits indicate financial stability.
  • OVERDRAFT: Frequent overdrafts increase delinquency risk.
  • LOAN_last_14_days_count: Recent loans may signal financial stress.

Model Performance

The ROC curves below illustrate the trade-off between the true positive rate and the false positive rate for each model. The AUC scores indicate overall model performance, with higher values reflecting better predictive power.
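
One way to read an AUC value: it equals the probability that a randomly chosen delinquent consumer receives a higher predicted risk than a randomly chosen non-delinquent one, with ties counted as half. The sketch below computes it directly from that definition on hypothetical labels and scores; library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting tied scores as half a correct pair."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = delinquent) and predicted risk scores.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.30, 0.35, 0.60, 0.70, 0.90]

auc = roc_auc(labels, scores)
print(round(auc, 4))
```

Here 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9. A score of 0.5 corresponds to random ranking and 1.0 to perfect ranking.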

Model Performance Metrics Comparison
Model                                   ROC-AUC  Accuracy  Precision  Recall  F1-Score  Training Time  Prediction Time
Logistic Regression (w/o Credit Score)  0.7079   0.8445    0.2383     0.2785  0.2568    1.3368         0.4016
Logistic Regression (w/ Credit Score)   0.7241   0.8571    0.2674     0.3548  0.3050    1.7175         0.3315
LightGBM (w/o Credit Score)             0.7796   0.8991    0.3878     0.0802  0.1329    4.1249         0.0931
LightGBM (w/ Credit Score)              0.8162   0.9068    0.4167     0.1382  0.2076    3.9720         0.0859
CatBoost (w/o Credit Score)             0.7704   0.9019    0.4474     0.0717  0.1236    38.6703        0.0788
CatBoost (w/ Credit Score)              0.8260   0.9170    0.4681     0.1095  0.1774    40.9512        0.0960
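
The F1-Score column is the harmonic mean of precision and recall, F1 = 2PR / (P + R). As a quick consistency check, the sketch below recomputes it from the precision and recall values reported in the table; small differences in the fourth decimal are expected because the inputs are already rounded.

```python
# (precision, recall, reported F1) copied from the table above.
rows = {
    "Logistic Regression (w/o Credit Score)": (0.2383, 0.2785, 0.2568),
    "Logistic Regression (w/ Credit Score)": (0.2674, 0.3548, 0.3050),
    "LightGBM (w/o Credit Score)": (0.3878, 0.0802, 0.1329),
    "LightGBM (w/ Credit Score)": (0.4167, 0.1382, 0.2076),
    "CatBoost (w/o Credit Score)": (0.4474, 0.0717, 0.1236),
    "CatBoost (w/ Credit Score)": (0.4681, 0.1095, 0.1774),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f1_score(p, r):.4f}, reported {reported:.4f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the boosted models' low recall holds their F1 below that of logistic regression despite their higher precision.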

Key Insights:

Further Analysis: Confusion Matrices

Figure: CatBoost Confusion Matrix (w/o Credit Score)

Figure: CatBoost Confusion Matrix (w/ Credit Score)

Figure: LightGBM Confusion Matrix (w/o Credit Score)

Figure: LightGBM Confusion Matrix (w/ Credit Score)

From the confusion matrices, we observe that both CatBoost and LightGBM improve slightly when the credit score is included. However, they remain highly conservative, predicting very few positive cases. As the model performance table shows, this yields higher precision than logistic regression (roughly 0.39 to 0.47 versus 0.24 to 0.27) but much lower recall (roughly 0.07 to 0.14 versus 0.28 to 0.35).

Note: In these models, "positive" refers to delinquent cases, while "negative" represents non-delinquent cases.
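
How conservative a classifier looks depends on the probability cutoff used to call a case positive. The source does not state which cutoff was used, so the sketch below simply assumes a default of 0.5 and compares it with a lower one on hypothetical labels and probabilities, to show how confusion-matrix counts translate into precision and recall.

```python
def confusion_counts(labels, probabilities, threshold):
    """Return (tp, fp, fn, tn), flagging a case as delinquent when its
    predicted probability is at least `threshold`."""
    tp = fp = fn = tn = 0
    for y, p in zip(labels, probabilities):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical data: 4 delinquent and 8 non-delinquent consumers.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
probabilities = [0.70, 0.40, 0.30, 0.10,
                 0.60, 0.30, 0.25, 0.22, 0.10, 0.05, 0.05, 0.02]

for threshold in (0.5, 0.2):
    tp, fp, fn, tn = confusion_counts(labels, probabilities, threshold)
    precision, recall = precision_recall(tp, fp, fn)
    print(threshold, (tp, fp, fn, tn), round(precision, 3), round(recall, 3))
```

In this toy example, lowering the cutoff from 0.5 to 0.2 raises recall from 0.25 to 0.75 while precision falls from 0.5 to about 0.43. That is the usual trade-off when a model flags more cases as delinquent.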

Cash Score vs. Credit Score

Delinquency Rate Heatmap (Cash Score vs. Credit Score)

The heatmap visually represents delinquency rates using color intensity and numerical values, where darker regions indicate higher delinquency. The bottom-left region, where scores are lowest, shows delinquency reaching 100%, while the top-right region, representing higher scores, exhibits near-zero delinquency. This highlights both cash and credit scores as strong indicators of financial risk, with higher scores consistently associated with lower delinquency rates.
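
A grid like this can be built by binning consumers on both scores and taking the share of delinquent consumers in each cell. The sketch below is a minimal version under assumptions: the 100-point bin width, the score ranges, and all records are hypothetical, since the source does not give the Cash Score scale or the binning used.

```python
from collections import defaultdict


def delinquency_rate_grid(records, bin_width=100):
    """Map (cash_score_bin, credit_score_bin) to the delinquency rate.

    Each record is (cash_score, credit_score, dq_target); bins are labeled
    by their lower edge.
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [delinquent, total]
    for cash_score, credit_score, dq_target in records:
        cell = (
            cash_score // bin_width * bin_width,
            credit_score // bin_width * bin_width,
        )
        counts[cell][0] += dq_target
        counts[cell][1] += 1
    return {cell: dq / total for cell, (dq, total) in counts.items()}


# Hypothetical records: (cash_score, credit_score, DQ_TARGET).
records = [
    (350, 420, 1),
    (380, 450, 1),
    (610, 640, 0),
    (650, 690, 1),
    (760, 780, 0),
    (790, 720, 0),
]

grid = delinquency_rate_grid(records)
for cell in sorted(grid):
    print(cell, grid[cell])
```

Cells with very few consumers can reach 0% or 100% by chance, so it helps to read extreme cells alongside their counts.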

Conclusion

Our results suggest that credit scoring models built on detailed bank transaction data can perform comparably to traditional models without requiring credit history: without the credit score as an input, LightGBM and CatBoost reached ROC-AUC values of 0.7796 and 0.7704, compared with 0.8162 and 0.8260 when the credit score was included. This approach allows for a more comprehensive and nuanced assessment of an individual’s creditworthiness, providing a more holistic view of their financial behavior. By using transaction data, we aim to improve the accuracy of credit scoring and offer a more transparent and equitable evaluation process. The model addresses limitations of traditional credit scoring, especially for individuals with limited or no credit history, such as young adults or those from underrepresented groups. Ultimately, this approach seeks to enhance fairness and inclusivity within the financial system, increasing access to credit for those who have historically been overlooked or excluded by traditional lending practices.

Next Steps

Our Team

Aman Kar

akar@ucsd.edu

Daniel Mathew

drmathew@ucsd.edu

Tracy Pham

tnp003@ucsd.edu