Introduction

Credit scores are pivotal in today’s financial landscape, influencing everything from rental eligibility to access to health insurance, yet the formula for calculating creditworthiness has long been shrouded in mystery and often overlooks important nuances. Typically, a credit score is determined by five factors: payment history, amounts owed, length of credit history, new credit, and credit mix. This structure can place individuals with limited credit history, especially young adults who are just starting to build credit, at a compounded disadvantage, restricting their access to loans, credit cards, employment opportunities, and insurance. Our proposed Cash Score model aims to address these limitations by providing a more comprehensive measure of creditworthiness. The Cash Score model leverages detailed account transaction data to predict the probability of defaulting on a loan (also known as loan delinquency). This approach highlights the potential for transaction-based credit evaluation to assess financial risk more accurately and improve access to credit, offering a fairer alternative to traditional credit scoring methods.

Methods
Data Description
Our analysis leverages four key datasets that provide insights into consumer accounts, transaction histories, and credit scores. Because the datasets were prepared and preprocessed by Prism Data, little additional data cleaning was needed. Our cleaning focused on reviewing the data for consistency, addressing any remaining missing values, standardizing categorical variables, and structuring time-series data for modeling.
Below are the first five rows of each dataset, in this order: account balances, consumer credit scores with the delinquency target (DQ_TARGET), transactions, and the transaction category lookup.
prism_consumer_id | prism_account_id | account_type | balance_date | balance |
---|---|---|---|---|
3,023 | 0 | SAVINGS | 2021-08-31 | 90.57 |
3,023 | 1 | CHECKING | 2021-08-31 | 225.95 |
4,416 | 2 | SAVINGS | 2022-03-31 | 15,157.17 |
4,416 | 3 | CHECKING | 2022-03-31 | 66.42 |
4,227 | 4 | CHECKING | 2021-07-31 | 7,042.90 |
prism_consumer_id | evaluation_date | credit_score | DQ_TARGET |
---|---|---|---|
0 | 2021-09-01 | 726 | 0 |
1 | 2021-07-01 | 626 | 0 |
2 | 2021-05-01 | 680 | 0 |
3 | 2021-03-01 | 734 | 0 |
4 | 2021-10-01 | 676 | 0 |
prism_consumer_id | prism_transaction_id | category | amount | credit_or_debit | posted_date |
---|---|---|---|---|---|
3,023 | 0 | 4 | 0.05 | CREDIT | 2021-04-16 |
3,023 | 1 | 12 | 481.56 | CREDIT | 2021-04-30 |
3,023 | 2 | 4 | 0.05 | CREDIT | 2021-05-16 |
3,023 | 3 | 4 | 0.07 | CREDIT | 2021-06-16 |
3,023 | 4 | 4 | 0.06 | CREDIT | 2021-07-16 |
category_id | category |
---|---|
0 | SELF_TRANSFER |
1 | EXTERNAL_TRANSFER |
2 | DEPOSIT |
3 | PAYCHECK |
4 | MISCELLANEOUS |
It is important to note that, in compliance with the Equal Credit Opportunity Act (ECOA), we excluded specific transaction categories that could introduce bias in credit decision-making. Categories related to child dependents, healthcare and medical expenses, unemployment benefits, education, and pensions have been removed to ensure that our model does not unintentionally discriminate based on protected attributes.
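A minimal sketch of how such an exclusion could be applied, using only the Python standard library. The excluded category names, the id 90, and the sample rows are hypothetical placeholders; the actual labels in the category lookup table may differ.

```python
# Hypothetical category names for the excluded groups; the real labels
# in the category lookup table may be spelled differently.
EXCLUDED_CATEGORIES = {
    "CHILD_DEPENDENTS",
    "HEALTHCARE_MEDICAL",
    "UNEMPLOYMENT_BENEFITS",
    "EDUCATION",
    "PENSION",
}


def drop_excluded(transactions, category_names, excluded=EXCLUDED_CATEGORIES):
    """Return only the transactions whose category is not excluded.

    transactions: list of dicts with a numeric "category" id.
    category_names: dict mapping category id -> category name.
    Ids missing from the lookup are kept, since they cannot be matched
    to an excluded name.
    """
    return [
        txn
        for txn in transactions
        if category_names.get(txn["category"]) not in excluded
    ]


# Illustrative data only: ids 3 and 4 follow the lookup table above,
# id 90 is made up for this example.
category_names = {3: "PAYCHECK", 4: "MISCELLANEOUS", 90: "EDUCATION"}
transactions = [
    {"prism_transaction_id": 0, "category": 4, "amount": 0.05},
    {"prism_transaction_id": 1, "category": 90, "amount": 120.00},
    {"prism_transaction_id": 2, "category": 3, "amount": 481.56},
]

kept = drop_excluded(transactions, category_names)
print([txn["prism_transaction_id"] for txn in kept])
```

Filtering before feature generation means no aggregated feature can be derived from an excluded category.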
Exploratory Data Analysis
Through exploratory data analysis (EDA), we examined consumer transaction trends and spending patterns to uncover insights that aid in identifying key factors for predicting credit risk. Below are a few examples of EDA conducted to look at temporal trends, transaction frequency, spending categories, and the impact of specific financial behaviors.

Comparing bank balances over time between a randomly selected delinquent and non-delinquent consumer reveals distinct financial patterns. The delinquent consumer’s balance remains mostly stagnant, with a single large spike that quickly drops. In contrast, the non-delinquent consumer maintains a steady, positive balance with gradual growth, indicating stable income, controlled spending, and savings. This suggests that bank balance trends can serve as a strong predictor of creditworthiness.

The roughly normal distribution of credit scores among delinquent consumers, compared to the left-skewed distribution among non-delinquent consumers, shows that non-delinquent individuals typically have higher credit scores, while most delinquent individuals fall within the lower middle of the credit score range. This reinforces that credit scores are already a strong indicator of delinquency and provides a foundation for our model: by building on the credit score feature, we can potentially outperform traditional models at predicting delinquency.

Having identified "Buy Now, Pay Later" (BNPL) as a risky category, we analyzed it further. The figure reveals that a significantly higher proportion of non-delinquent consumers fall into the lowest bin for mean BNPL transaction amount. In contrast, delinquent consumers have higher proportions in the upper bins, indicating that they engage in larger BNPL transactions than non-delinquent consumers.

The plot reveals a wider range of tax transactions over the last two weeks for non-delinquent consumers, while delinquent consumers show little to no variation in their tax transaction frequency. This suggests that non-delinquent consumers are more active and consistent in handling their tax-related transactions, which could indicate better financial management and stability compared to delinquent consumers.
Feature Generation
We engineered features to capture financial behavior through transaction history, balance trends, spending patterns, and risk indicators. Our feature generation process included:
- Time Window Analysis: Transactions were analyzed across multiple time windows—14 days, 30 days, 3 months, 6 months, and 1 year—to capture short- and long-term trends.
- Aggregated Statistics: Summary statistics (minimum, maximum, mean, median, standard deviation, sum, count, percent of transactions) were calculated for transaction categories and balance trends.
Figure: Category-Based Feature Generation Process
This diagram showcases our process for generating category-based features. For example, one of the features created through this process is FOOD_BEVERAGES_last_14_days_mean, which represents the average transaction amount within the “Food & Beverages” category over the past 14 days. By analyzing these features, we aim to capture spending habits, identify fluctuations in financial stability, and differentiate between various financial behaviors.
- Risk Indicators: High-risk behaviors were identified through flagged transactions, such as gambling, using threshold-based indicators.
- Balance Features: Features that reflect balance fluctuations such as balance deltas, rolling averages, and recent trends were created.
- Income Features: Income-based features such as the number of income sources and income standard deviation were calculated to assess the diversity and variability of a consumer’s income.
- Standardization: Non-categorical features were standardized to ensure consistent scaling.
- Resampling: Our dataset was imbalanced, meaning one class had far more samples than the other. To address this, we used the Synthetic Minority Over-sampling Technique (SMOTE) to generate new samples for the minority class and undersampling to reduce the majority class. This produced a more balanced dataset, allowing the model to learn patterns without being dominated by the majority class.
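The time-window and aggregated-statistic steps above can be sketched as follows. This is an illustration under stated assumptions rather than the project's actual pipeline: transactions are assumed to already carry category names instead of ids, the month and year windows are approximated as 90, 180, and 365 days, all window labels other than `last_14_days` are hypothetical, and the sample transactions are made up.

```python
from datetime import date, timedelta
from statistics import mean

# Window lengths in days. Treating "3 months", "6 months", and "1 year"
# as 90, 180, and 365 days is an assumption of this sketch.
WINDOWS = {
    "last_14_days": 14,
    "last_30_days": 30,
    "last_3_months": 90,
    "last_6_months": 180,
    "last_year": 365,
}


def window_features(transactions, category, evaluation_date):
    """Build <category>_<window>_{count,sum,mean} features for one consumer."""
    features = {}
    for label, days in WINDOWS.items():
        window_start = evaluation_date - timedelta(days=days)
        # Keep amounts in this category posted inside [window_start, evaluation_date).
        amounts = [
            txn["amount"]
            for txn in transactions
            if txn["category"] == category
            and window_start <= txn["posted_date"] < evaluation_date
        ]
        prefix = f"{category}_{label}"
        features[f"{prefix}_count"] = len(amounts)
        features[f"{prefix}_sum"] = sum(amounts)
        # An empty window has no mean; 0.0 is a placeholder choice here.
        features[f"{prefix}_mean"] = mean(amounts) if amounts else 0.0
    return features


# Made-up transactions for a single consumer.
evaluation_date = date(2021, 9, 1)
transactions = [
    {"category": "FOOD_BEVERAGES", "amount": 10.0, "posted_date": date(2021, 8, 25)},
    {"category": "FOOD_BEVERAGES", "amount": 20.0, "posted_date": date(2021, 8, 20)},
    {"category": "FOOD_BEVERAGES", "amount": 30.0, "posted_date": date(2021, 8, 5)},
    {"category": "PAYCHECK", "amount": 500.0, "posted_date": date(2021, 8, 30)},
]

features = window_features(transactions, "FOOD_BEVERAGES", evaluation_date)
print(features["FOOD_BEVERAGES_last_14_days_mean"])
```

Excluding transactions posted on or after the evaluation date is a deliberate choice in this sketch: it keeps information from after the scoring date out of the features.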
Feature Selection
The final dataframe had 15,000 rows × 2,430 columns, i.e., more than 2,000 candidate features. To refine model input, we performed feature selection using the following techniques:
- Correlation Analysis: Selected top features most correlated with delinquency using Lasso (L1) Regularization.
- Mutual Information: Identified features with the highest mutual information score for predictive power.
- Embedded Method: Utilized Random Forest to rank and select the most relevant features.
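To illustrate the mutual information criterion, here is a minimal standard-library sketch for discrete features and a binary target. The feature names and values are invented for the example. Continuous features would first need binning or a dedicated estimator (for instance scikit-learn's `mutual_info_classif`), which this sketch does not attempt to reproduce.

```python
from collections import Counter
from math import log


def mutual_information(feature, target):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    feature_counts = Counter(feature)
    target_counts = Counter(target)
    # sum over observed (x, y) of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum(
        (count / n) * log(count * n / (feature_counts[x] * target_counts[y]))
        for (x, y), count in joint.items()
    )


# Made-up binary target (1 = delinquent) and three made-up binary features.
target = [0, 0, 0, 0, 1, 1, 1, 1]
candidates = {
    "perfect": [0, 0, 0, 0, 1, 1, 1, 1],  # identical to the target
    "partial": [0, 0, 0, 1, 1, 1, 1, 0],  # agrees with the target on 6 of 8 rows
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],    # statistically independent of the target
}

scores = {name: mutual_information(values, target) for name, values in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A feature independent of the target scores zero, so ranking by this score and keeping the top entries discards uninformative columns without assuming a linear relationship.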
Models
We evaluated multiple machine learning models to predict credit risk. Below is a brief description of each model used in our analysis.
Modeling Approaches
- Baseline Model: Logistic Regression: A simple yet effective linear model that serves as the starting point for comparison with more advanced models.
- Histogram-based Gradient Boosting (HistGB): Speeds up training by grouping data into bins, working well for large datasets and making the model faster and more memory-efficient.
- Categorical Boosting (CatBoost): A gradient boosting method designed to handle categorical features natively. It uses ordered boosting to reduce overfitting and builds symmetric (balanced) trees for more stable predictions.
- Light Gradient-Boosting Machine (LightGBM): Uses "leaf-wise" decision trees for faster learning and reduced memory usage, making it particularly effective for large datasets.
- Extreme Gradient Boosting (XGBoost): A popular gradient boosting method known for its speed and accuracy. It reduces errors through regularization and handles large datasets efficiently by running in parallel on multiple processors.
Each of these models (other than Logistic Regression) improves upon regular decision trees by using "boosting" to combine multiple trees, which enhances the model's accuracy.
Model Evaluation
We used the following metrics to assess model performance:
- ROC AUC: Shows how well the model can tell the difference between positive and negative outcomes. Higher values mean the model is better at making this distinction.
- Accuracy: Tells us the percentage of times the model made a correct prediction.
- Precision: Measures how many of the model's positive predictions were actually correct.
- Recall: Shows how many of the actual positive cases were correctly identified by the model.
- Confusion Matrix: A table that helps us see how many predictions were correct and how many were wrong, broken down by type of error (false positive, false negative).
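The metrics above can be made concrete with a small standard-library sketch. The labels and scores are invented, and the 0.5 threshold is an assumption of the example. ROC AUC is computed here from its pairwise-ranking definition (the probability that a randomly chosen positive is scored above a randomly chosen negative), which is easy to read but slow on large datasets.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating 1 as the positive (delinquent) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up imbalanced example: 8 non-delinquent and 2 delinquent consumers.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.45, 0.80]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

In this toy data, accuracy is well above recall because most consumers are non-delinquent, which is why accuracy alone is a weak guide on imbalanced data.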
Results
Feature Importance

Figure: Top SHAP Values
SHAP (SHapley Additive exPlanations) is a method used to explain model predictions by attributing each feature's contribution to the final prediction.
In our model, SHAP identified the following features to be important in predicting credit delinquency:
- sum_acct_balances: Higher account balances suggest lower delinquency risk.
- HAS_SAVINGS_ACCT: Having a savings account reduces delinquency risk.
- DEPOSIT_last_14_days_count: Recent deposits indicate financial stability.
- OVERDRAFT: Frequent overdrafts increase delinquency risk.
- LOAN_last_14_days_count: Recent loans may signal financial stress.
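To show what "attributing each feature's contribution" means, the sketch below computes exact Shapley values by brute force for a toy risk function. The function `toy_risk_model`, its coefficients, and the instance and baseline values are all made up for illustration; they are not the fitted model. The SHAP library itself relies on faster model-specific or approximate algorithms rather than this enumeration.

```python
from itertools import combinations
from math import factorial


def toy_risk_model(x):
    """Hypothetical linear risk score; coefficients are invented for this example."""
    return (
        0.10
        - 0.00001 * x["sum_acct_balances"]
        - 0.05 * x["HAS_SAVINGS_ACCT"]
        + 0.04 * x["OVERDRAFT"]
    )


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to `baseline`.

    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: instance[k] if k in coalition else baseline[k] for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Weight of every coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {name}) - value(coalition))
        phi[name] = total
    return phi


# Made-up consumer compared against a made-up "typical" baseline consumer.
instance = {"sum_acct_balances": 500.0, "HAS_SAVINGS_ACCT": 0, "OVERDRAFT": 3}
baseline = {"sum_acct_balances": 5000.0, "HAS_SAVINGS_ACCT": 1, "OVERDRAFT": 0}

phi = shapley_values(toy_risk_model, instance, baseline)
print(phi)
```

The contributions sum to the difference between the instance's score and the baseline's score, which is the additivity property that makes a SHAP plot readable as a decomposition of a single prediction.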
Model Performance


The ROC curves below illustrate the trade-off between the true positive rate and the false positive rate for each model. The AUC scores indicate overall model performance, with higher values reflecting better predictive power.
Model | ROC-AUC | Accuracy | Precision | Recall | F1-Score | Training Time | Prediction Time |
---|---|---|---|---|---|---|---|
Logistic Regression (w/o Credit Score) | 0.7079 | 0.8445 | 0.2383 | 0.2785 | 0.2568 | 1.3368 | 0.4016 |
Logistic Regression (w/ Credit Score) | 0.7241 | 0.8571 | 0.2674 | 0.3548 | 0.3050 | 1.7175 | 0.3315 |
LightGBM (w/o Credit Score) | 0.7796 | 0.8991 | 0.3878 | 0.0802 | 0.1329 | 4.1249 | 0.0931 |
LightGBM (w/ Credit Score) | 0.8162 | 0.9068 | 0.4167 | 0.1382 | 0.2076 | 3.9720 | 0.0859 |
CatBoost (w/o Credit Score) | 0.7704 | 0.9019 | 0.4474 | 0.0717 | 0.1236 | 38.6703 | 0.0788 |
CatBoost (w/ Credit Score) | 0.8260 | 0.9170 | 0.4681 | 0.1095 | 0.1774 | 40.9512 | 0.0960 |
Key Insights:
- Adding credit scores improves model performance, helping predict delinquency more accurately for all models.
- CatBoost (with credit score as a feature) achieves the highest ROC-AUC (0.8260) and accuracy (0.9170), but its recall is low (0.1095), meaning it detects few delinquent cases, and it takes roughly ten times longer to train than LightGBM.
- Without credit scores, LightGBM has the highest ROC-AUC (0.7796 vs. 0.7704 for CatBoost), and its training time is also significantly lower.
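As a quick consistency check on the table, the F1-Score column can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below copies the three columns from the model performance table and measures the largest gap between the recomputed and reported values.

```python
# (precision, recall, reported F1) copied from the model performance table.
REPORTED = {
    "Logistic Regression (w/o Credit Score)": (0.2383, 0.2785, 0.2568),
    "Logistic Regression (w/ Credit Score)": (0.2674, 0.3548, 0.3050),
    "LightGBM (w/o Credit Score)": (0.3878, 0.0802, 0.1329),
    "LightGBM (w/ Credit Score)": (0.4167, 0.1382, 0.2076),
    "CatBoost (w/o Credit Score)": (0.4474, 0.0717, 0.1236),
    "CatBoost (w/ Credit Score)": (0.4681, 0.1095, 0.1774),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


recomputed = {
    model: f1_score(precision, recall)
    for model, (precision, recall, _) in REPORTED.items()
}
# Largest absolute difference between recomputed and reported F1.
max_gap = max(
    abs(recomputed[model] - reported_f1)
    for model, (_, _, reported_f1) in REPORTED.items()
)
best_f1_model = max(recomputed, key=recomputed.get)
print(round(max_gap, 5), best_f1_model)
```

The recomputed values agree with the table to within rounding. The check also shows that the model with the best F1 is not the one with the best ROC-AUC, because the boosted models trade recall for precision at the default threshold.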
Further Analysis: Confusion Matrices

CatBoost (w/o Credit Score)

CatBoost (w/ Credit Score)

LightGBM (w/o Credit Score)

LightGBM (w/ Credit Score)
From the confusion matrices, we observe that both CatBoost and LightGBM improve slightly when the credit score is included. However, they remain highly conservative, predicting very few positive cases. This results in moderate precision (roughly 0.39 to 0.47) but very low recall (roughly 0.07 to 0.14), as seen in the model performance table.
Note: In these models, "positive" refers to delinquent cases, while "negative" represents non-delinquent cases.
Cash Score vs. Credit Score

Figure: Delinquency Rate Heatmap (Cash Score vs. Credit Score)
Conclusion
Our research highlights that incorporating detailed bank transaction data into credit scoring models results in performance that is comparable to traditional models, all without the necessity of credit history. This approach allows for a more comprehensive and nuanced assessment of an individual’s creditworthiness, providing a more holistic view of their financial behavior. By utilizing transactional data, we aim to improve the accuracy of credit scoring, offering a more transparent and equitable evaluation process. This model addresses existing biases and limitations in traditional credit scoring, especially for individuals with limited or no credit history, such as young adults or those from underrepresented groups. Ultimately, this approach seeks to enhance fairness and inclusivity within the financial system, increasing access to credit opportunities for those who have historically been overlooked or excluded from traditional lending practices.
Next Steps
- Feature Engineering: We aim to optimize aggregated feature metrics based on transaction categories and time windows. Additionally, we plan to implement clustering algorithms to identify and select the most relevant features for improved model performance.
- Model Refinement: We intend to explore deep learning models, incorporating extended hyperparameter tuning sessions to uncover more complex patterns in the data and improve predictive accuracy.
- Bias & Fairness: To ensure equitable credit assessments, we will evaluate the potential for biases in predictions across different demographic groups and implement fairness constraints to mitigate any identified disparities.