Assessment of the structure of the loan portfolio on R

During the discussions, a "small" task arose - to build the dynamics of the structure of the loan portfolio (dynamics of a credit card, for example). There is an important specificity - it is necessary to apply the FIFO method to repay loans. Those. when repaying, the earliest loans must be repaid first. This imposes certain requirements on calculating the status of each individual loan and determining its maturity date.







Consider it as an Olympiad problem. No “ bloody energy prizes ” and code pedaling, the approach is exclusively “ think first ”. No more than one screen of code per prototype and no loops (embedded for performance and readability). Below is the R code with a prototype approach.







It is a continuation of a series of previous publications .







Decomposition



Since we do everything from scratch, we divide the task into three steps:







  1. Formation of test data.
  2. Calculation of the maturity date of each loan.
  3. Calculation and visualization of dynamics for a given time window.


Assumptions and provisions for the prototype:







  1. Granularity up to date. Only one transaction on one date. If there are several transactions in one day, then their order will need to be established (to comply with the FIFO principle). You can use add. indexes, you can use unixtimestamp, you can come up with something else. This is irrelevant for the prototype.
  2. There for



    shouldn't be any explicit loops . There should be no unnecessary copies. Focus on minimum memory consumption and maximum performance.
  3. We will consider the following groups of delays: "<0", "0-30", "31-60", "61-90", "90+".


Step 1. Generating the dataset



Just a test dataset, all matches are random. For each user, we will generate ~ 10 records. For calculations, we assume that the loan is a positive value, the repayment is negative. And the entire life cycle for each user should start with a loan.







Dataset generation
library(tidyverse)
library(lubridate)
library(magrittr)
library(tictoc)
library(data.table)

total_users <- 100

events_dt <- tibble(
  date = sample(
    seq.Date(as.Date("2021-01-01"), as.Date("2021-04-30"), by = "1 day"),
    total_users * 10,
    replace = TRUE)
  ) %>%
  #    50 .
  mutate(amount = (runif(n(), -2000, 1000)) %/% 50 * 50) %>%
  #   
  mutate(user_id = sample(!!total_users, n(), replace = TRUE)) %>%
  setDT(key = "date") %>%
  #     
  .[.[, .I[1L], by = user_id]$V1, amount := abs(amount)] %>%
  #        , 
  #          
  #       
  unique(by = c("user_id", "date"))
      
      





Step 2. Calculate the maturity date of each loan



data.table



allows you to change objects by reference even inside functions, we will actively use this.







Maturity date calculation
#  
accu_dt <- events_dt[amount < 0, .(accu = cumsum(amount), date), by = user_id]

ff <- function(dt){
  #           
  #   
  accu_dt[dt, amount := i.amount, on = "user_id"]
  accu_dt[is.na(amount) == FALSE, accu := accu + amount][accu > 0, accu := NA, by = user_id]
  calc_dt <- accu_dt[!is.na(accu), head(date, 1), by = user_id]

  #     data.frame,   
  calc_dt[dt, on = "user_id"]$V1
}

repay_dt <- events_dt[amount > 0] %>%
  .[, repayment_date := ff(.SD), by = date] %>%
  .[order(user_id, date)]
      
      





Step 3. Calculation of the dynamics of the structure for the period



Dynamics calculation
calcDebt <- function(report_date){
  as_tibble(repay_dt) %>%
    #  ,      
    filter(is.na(repayment_date) | repayment_date > !! report_date) %>%
    mutate(delay = as.numeric(!!report_date - date)) %>%
    #  
    mutate(tag = santoku::chop(delay, breaks = c(0, 31, 61, 90),
                               labels = c("< 0", "0-30", "31-60", "61-90", "90+"),
                               extend = TRUE, drop = FALSE)) %>%
    #  
    group_by(tag) %>%
    summarise(amount = sum(amount)) %>%
    mutate_at("tag", as.character)
}

#   
df <- seq.Date(as.Date("2021-04-01"), as.Date("2021-04-30"), by = "1 day") %>%
  tibble(date = ., tbl = purrr::map(., calcDebt)) %>%
  unnest(tbl)

#  
ggplot(df, aes(date, amount, colour = tag)) +
  geom_point(alpha = 0.5, size = 3) +
  geom_line() +
  ggthemes::scale_colour_tableau("Tableau 10") +
  theme_minimal()
      
      





We can get something like this.







One screen of code, as required.







Previous post - "Storytelling R Report vs. BI, A Pragmatic Approach . "








All Articles