To balance the trade-off between the decrease in revenue and the decrease in cost, an optimization problem has to be solved: adjust the classification threshold and search for the optimum.

March 26, 2021

If “Settled” is defined as positive and “Past Due” as negative, then, following the layout of the confusion matrix plotted in Figure 6, the four quadrants are True Positive (TP), False Positive (FP), False Negative (FN) and True Negative (TN). Consistent with the confusion matrices plotted in Figure 5, TP is the good loans correctly hit, and FP is the defaults missed. These are the two quadrants we care about. To normalize the values, two commonly used metrics are defined: True Positive Rate (TPR) and False Positive Rate (FPR). Their equations are shown below:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$
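To make these definitions concrete, here is a minimal pure-Python sketch that counts the four quadrants and computes TPR and FPR. The labels are made up, encoding 1 as “Settled” and 0 as “Past Due” is our assumption, and the helpers `confusion_counts` and `rates` are illustrative, not part of the original analysis.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN, treating 1 ("Settled") as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # good loans hit
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # defaults missed
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # good loans rejected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # defaults caught
    return tp, fp, fn, tn


def rates(tp, fp, fn, tn):
    """TPR = TP / (TP + FN), FPR = FP / (FP + TN)."""
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical labels: 1 = Settled, 0 = Past Due
y_true = [1, 1, 1, 1, 0, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 1, 1, 0, 0, 1, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
tpr, fpr = rates(tp, fp, fn, tn)
print(tp, fp, fn, tn)                    # 5 1 1 3
print(f"TPR={tpr:.2f}, FPR={fpr:.2f}")   # TPR=0.83, FPR=0.25
```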

In this application, TPR is the hit rate of good loans, and it represents the capacity to earn money from loan interest; FPR is the rate of missed defaults, and it represents the probability of losing money.

The Receiver Operating Characteristic (ROC) curve is one of the most widely used plots for visualizing the performance of a classification model across all thresholds. In Figure 7 left, the ROC curve of the Random Forest model is plotted. This plot essentially shows the relationship between TPR and FPR, where the two always move in the same direction, from 0 to 1. A good classification model has its ROC curve above the red baseline, which corresponds to a “random classifier”. The Area Under the Curve (AUC) is another metric for evaluating a classification model besides accuracy. The AUC of this Random Forest model is 0.82 out of 1, which is decent.
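The ROC curve and its AUC can be reproduced by sweeping the threshold over the model scores. The sketch below is a minimal version under stated assumptions: `scores` stands in for the model's predicted probability of “Settled”, the data is made up, both classes are assumed present, and the area is integrated with the trapezoidal rule. In practice, scikit-learn's `roc_curve` and `roc_auc_score` would typically be used instead.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs from the strictest to the loosest threshold.

    Assumes y_true contains at least one 1 and one 0.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points  # lowest threshold predicts everything positive: ends at (1, 1)


def auc(points):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = Settled) and predicted probabilities
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

points = roc_points(y_true, scores)
print(f"AUC = {auc(points):.2f}")      # AUC = 0.75
print(auc([(0.0, 0.0), (1.0, 1.0)]))   # random-classifier diagonal: 0.5
```

The diagonal from (0, 0) to (1, 1) always has an area of 0.5, which is why a useful model's curve should sit above that baseline.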

Although the ROC curve clearly shows the relationship between TPR and FPR, the threshold is an implicit variable, so the optimization task cannot be done with the ROC curve alone. Consequently, another plot that includes the threshold explicitly is introduced, as plotted in Figure 7 right. Since the orange TPR curve represents the ability to earn money and FPR represents the chance of losing it, the intuition is to find the threshold that widens the gap between the two curves as much as possible. The sweet spot is around 0.7 in this case.
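The “widen the gap” intuition can be sketched as a grid search that picks the threshold with the largest TPR - FPR, a quantity also known as Youden's J statistic. The data, the 0.1-step grid, and the helper names below are hypothetical; the article's own data put the sweet spot around 0.7, while this toy example lands elsewhere.

```python
def rates_at(y_true, scores, threshold):
    """TPR and FPR when scores >= threshold are predicted "Settled" (1)."""
    pos = sum(y_true)            # number of Settled loans
    neg = len(y_true) - pos      # number of Past Due loans
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    return tp / pos, fp / neg


def best_gap_threshold(y_true, scores, thresholds):
    """Return the threshold with the largest TPR - FPR, and that gap."""
    best_thr, best_gap = None, float("-inf")
    for thr in thresholds:
        tpr, fpr = rates_at(y_true, scores, thr)
        if tpr - fpr > best_gap:  # strict '>' keeps the first of any ties
            best_thr, best_gap = thr, tpr - fpr
    return best_thr, best_gap


# Hypothetical labels (1 = Settled) and predicted probabilities
y_true = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.95, 0.9, 0.85, 0.75, 0.72, 0.65, 0.55, 0.4, 0.3, 0.1]
thresholds = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

thr, gap = best_gap_threshold(y_true, scores, thresholds)
print(f"threshold = {thr:.1f}, TPR - FPR = {gap:.2f}")  # threshold = 0.6, TPR - FPR = 0.80
```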

There are limitations to this approach: FPR and TPR are ratios. Even though they are good at visualizing the effect of the classification threshold on the predictions, we still cannot infer the exact revenue that different thresholds produce. The FPR/TPR vs. threshold approach assumes that all loans are equal (loan amount, interest due, etc.), but in reality they are not. People who default on loans may have larger loan amounts and more interest to repay, which adds uncertainty to the modeling results.
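A small worked example shows why ratios alone are not enough. In the two hypothetical portfolios below, the model approves exactly one of four defaults, so FPR is 0.25 in both, yet the money lost differs by a factor of 25. The simplifying assumption that a missed default loses its full loan amount is ours, and `fpr_and_loss` is an illustrative helper.

```python
def fpr_and_loss(defaults):
    """FPR and dollar loss over loans that actually went Past Due.

    `defaults` holds (loan_amount, approved) pairs. An approved default is a
    false positive; we assume its full loan amount is lost.
    """
    missed = [amount for amount, approved in defaults if approved]
    return len(missed) / len(defaults), sum(missed)


# Hypothetical portfolios: each model approves exactly 1 of 4 defaults.
portfolio_a = [(1_000, True), (5_000, False), (8_000, False), (12_000, False)]
portfolio_b = [(25_000, True), (2_000, False), (3_000, False), (4_000, False)]

print(fpr_and_loss(portfolio_a))  # (0.25, 1000)
print(fpr_and_loss(portfolio_b))  # (0.25, 25000)
```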

Fortunately, the detailed loan amount and interest due are available in the dataset itself.

The only thing remaining is to find a way to link them with the threshold and the model predictions. It is not hard to write an expression for revenue and cost. Assuming that revenue comes solely from the interest collected on settled loans and that cost comes solely from the total loan amount that customers default on, these two terms can be calculated using 5 known variables, as shown in Table 2.
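A minimal sketch of how revenue and cost could be tied to the threshold under those assumptions: a loan is approved when its predicted probability of “Settled” meets the threshold, revenue is the interest on approved loans that settle (TP), and cost is the principal of approved loans that default (FP). The `Loan` fields, the toy portfolio, and the threshold grid are hypothetical stand-ins and may differ from the 5 variables listed in Table 2.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    p_settled: float  # model's predicted probability that the loan is "Settled"
    settled: bool     # actual outcome: True = Settled, False = Past Due
    amount: float     # loan principal (hypothetical field)
    interest: float   # interest due (hypothetical field)


def revenue_cost(loans, threshold):
    """Revenue and cost when every loan with p_settled >= threshold is approved.

    Assumption: revenue is only the interest on approved loans that settle (TP),
    and cost is only the principal of approved loans that default (FP).
    """
    approved = [loan for loan in loans if loan.p_settled >= threshold]
    revenue = sum(loan.interest for loan in approved if loan.settled)
    cost = sum(loan.amount for loan in approved if not loan.settled)
    return revenue, cost


def best_profit_threshold(loans, thresholds):
    """Grid-search the threshold that maximizes revenue - cost."""
    def profit(t):
        revenue, cost = revenue_cost(loans, t)
        return revenue - cost
    best = max(thresholds, key=profit)  # max() keeps the first threshold on ties
    return best, profit(best)


# Hypothetical portfolio: (p_settled, settled, amount, interest)
loans = [
    Loan(0.95, True, 10_000, 1_500),
    Loan(0.90, True, 8_000, 1_200),
    Loan(0.85, True, 6_000, 900),
    Loan(0.80, True, 12_000, 1_800),
    Loan(0.72, False, 2_000, 300),
    Loan(0.68, True, 15_000, 2_400),
    Loan(0.65, True, 10_000, 1_500),
    Loan(0.55, False, 9_000, 1_300),
    Loan(0.45, True, 5_000, 700),
    Loan(0.30, False, 7_000, 1_000),
]
thresholds = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

best_thr, best_profit = best_profit_threshold(loans, thresholds)
print(f"best threshold = {best_thr:.1f}, profit = {best_profit}")  # best threshold = 0.6, profit = 7300
```

Unlike the TPR/FPR gap, this objective weights each loan by its dollar amounts, so a single large default can move the optimal threshold.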