Product analytics in practice with R

Problematic

The modern product management meta is about data driven management.





Everyone wants an analytical approach and the ability to make decisions on the heap of product data that is available. But in reality, there is a lack of information on exactly how to do this. What tools to use, how to make decisions, and how to explore the data. I would like to share exactly the practical aspect of this issue in the format of one case.





Introductory part

The beginning of 2020, you are an ordinary product manager who was offered to develop a loan product in another country. The offer has been accepted, the documents have been completed, it's time to get to work. 





The first thing that comes to mind is to see what is going on with the economy of the product. And how the product generally behaves.





After another couple of days, a meeting where you are asked to answer a few questions:





  1. Now we pay for attracting a client 695494. What is the acceptable cost of attracting for us? Does it make sense to increase the cost per customer to get more volume?





  2. How healthy is the portfolio economy and what dynamics are here and now?





  3. We recently changed the approach to the size of the issue and began to issue smaller checks in the first loans. How did this affect the product?





In general, clear questions that any product owner should be able to answer. 





. , , : LTV, CAC . , , , .





. ( ), ( ) ( ).





, , . .





- , .





. , , . .





, , : R ,Rstudio,dbeaver( ). , .





, .   select * from transactions t .





Rows: 2,226,532
Columns: 10
$ borrower_id          2, 2, 2, 6, 6, 12, 12, 12, 12, 16, 20, 20, 20, 20, 22, 23, 23, 33, 33, 39, 39, 36, 36
$ con_id               1, 1, 1, 2, 2, 4, 4, 4, 4, 5, 7, 7, 7, 7, 8, 9, 9, 10, 10, 11, 11, 12, 12, 12, 12, 13
$ disbursement_date    2017-11-23, 2017-11-23, 2017-11-23, 2017-11-24, 2017-11-24, 2017-11-27, 2017-11-27, …
$ prolongations_count  1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1
$ loan_type            "pdl", "pdl", "pdl", "pdl", "pdl", "pdl", "pdl", "pdl", "pdl", "pdl", "pdl", "pdl", "…
$ date                 2017-12-15, 2017-11-23, 2017-12-06, 2017-11-24, 2017-12-25, 2017-12-29, 2017-11-27, …
$ type                 "Payments::Transaction::ContractAddTransaction", "Payments::Transaction::DisburseTran…
$ amount               250000, 1000000, 1200000, 2500000, 3500000, 1040000, 1500000, 10000, 1470000, 1500000
$ id                   325, 2, 127, 5, 587, 557500, 557499, 557504, 557507, 17, 182865, 182874, 182869, 1828
$ deleted_at           NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
      
      



. ( , - ).





, , . (disbursment_date). (prolongations_count) - . . "" .





- (ContractAdd- , DisburseTransaction - ).





. ( ): , -> -> -> .





: .





, .





( , 0.00003 ).





  • : , , , , .





  • R, , 2 . data table. tidyverse( )





.





. ( z_type am):





lk %>% data.table()->lk1 #    data table

lk1[,':='
(z_type=z_type<-fifelse(
#     -     
 type=='Payments::Transaction::ContractAddTransaction','add','disb'),
am=amount*fifelse(z_type=='add',1,-1))][1:20,c(-1,-11,-8,-6)] #  
lk1[disbursement_date<'2020-01-01' & date<='2020-03-01',c(-1,-11,-8,-6)]->lk1
glimpse(lk1) #   
      
      



$ borrower_id          2, 2, 2, 6, 6, 12, 12, 12, 12, 16, 20, 20, 20, 20, 22, 23, 23, 33, 33, 39, 39, 36, 36$ con_id               1, 1, 1, 2, 2, 4, 4, 4, 4, 5, 7, 7, 7, 7, 8, 9, 9, 10, 10, 11, 11, 12, 12, 12, 12, 13$ disbursement_date    2017-11-23, 2017-11-23, 2017-11-23, 2017-11-24, 2017-11-24, 2017-11-27, 2017-11-27, …
$ prolongations_count  1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1$ date                 2017-12-15, 2017-11-23, 2017-12-06, 2017-11-24, 2017-12-25, 2017-12-29, 2017-11-27, …
$ amount               250000, 1000000, 1200000, 2500000, 3500000, 1040000, 1500000, 10000, 1470000, 1500000$ id                   325, 2, 127, 5, 587, 557500, 557499, 557504, 557507, 17, 182865, 182874, 182869, 1828$ z_type               "add", "disb", "add", "disb", "add", "add", "disb", "add", "add", "disb", "disb", "ad…
$ am                   250000, -1000000, 1200000, -2500000, 3500000, 1040000, -1500000, 10000, 1470000, -150…
      
      



, .





. , :





lk1[,.(total_in_mln=sum(am*for_ex)/1e6),.(z_type)]
      
      



total_in_mln





add





58.33176





disb





-45.49114





: (add) (disb). .





, :





lk1[!is.na(disbursement_date)&z_type=='disb',
.(sum=sum(am*for_ex*-1,na.rm = T)/1e6),
.(date=floor_date(disbursement_date,'month',))][,
ggplot(.SD,aes(date,sum,label=round(sum,2)))+
geom_col(fill=polar_night[2])+ff+tt+
  geom_text(aes(y=sum+0.1),col=aurora[1])+
  labs(x=' ',y='   ',
  title='   ')]
      
      



, . .





, .





- , . 





. , ( ) . 





, , :





lk1[][,':='( min_date=min(disbursement_date)),.(borrower_id)][,#     
c("gen",'dif'):=.(floor_date(min_date,'1 month'),as.numeric(date-min_date))][, #               
n:=uniqueN(borrower_id),][, #  
  .(sum=sum(am*for_ex),n=unique(n)),
  .(dif)
][order(dif)][,":="(bal=bal<-sum/n,cum=cumsum(bal))][,
    ggplot(.SD,aes(dif,cum))+ # 
   geom_line()+
   tt+ff+
        labs(x='  ',y='  USD',col='',
        title='LTV      ')+
   scale_y_continuous(breaks = seq(-200,200,20),
   labels =paste0('$',seq(-200,200,20),'k' ))+
   scale_x_continuous(breaks = seq(0,1000,20))  ]
      
      



LTV c . .





, , , . , , .





, 24 . ( ):





lk1[,':='( min_date=min(disbursement_date)),.(borrower_id)][,
c("gen",'dif'):=.(floor_date(min_date,'1 month'),as.numeric(date-min_date))][,
n:=uniqueN(borrower_id),.(gen)][,
  .(sum=sum(am*0.00003),n=unique(n)),
  .(gen,dif)#     
][order(gen,dif)][,
":="(bal=bal<-sum/n)][,':='(cum=cumsum(bal),m_dif=max(dif)),
.(gen)][dif<=m_dif-30 & gen %between% c('2018-01-01','2021-03-31')][,
 ggplot(.SD,aes(dif,cum,col=factor(gen)))+
 geom_line()+
 facet_wrap(~factor(year(gen),levels = c(2019,2018)),nrow=2)+tt+ff+
 labs(x='  ',y='  USD',col='',
 title = 'LTV      ')+
 scale_y_continuous(breaks = seq(-200,200,20),
 labels =paste0('$',seq(-200,200,20),'k' ))+
 scale_x_continuous(breaks = seq(0,1000,20))+
 geom_hline(yintercept = 695494*for_ex,color='red',size=1)+
 geom_hline(yintercept = 0,color='dark red',linetype='dashed')+
 geom_text(inherit.aes = F,aes(x=as.Date(600),
 y=695494*for_ex+3,group=1),label='   = $20.86',
 col='red',size=6)+tt+ff+
 theme(legend.text = element_text(size=20),
       legend.title = element_text(size=25))+
  guides(colour = guide_legend(override.aes = list(size=10)))]

      
      



- ( ), :





  1. , - .





  2. , .





  3. 0, .





  4. , , .









  1. . , (150 180).





  2. . , 2018 , . “” 10-40 . 2019 : .





  3. 40 60 . . , , 120-150 3





  4. 90 , .





,





1. 695494. ? , ?





- , - . - 40-60 . $20.8 (695494* 0.00003)





- , . - . 





- - . 





2. ?





, . - 100+ . 





.





3. . ? 





- . - .





The product manager's job is to make decisions. 2.2 million lines talked about what was happening. Such analysis takes from 30 minutes to a couple of hours, depending on the knowledge of the subject area and the dirt in the data. This analysis requires nothing but raw data and open source software.





Several dozen lines of code, a little common sense and the economics of the product is clear, calculated and conclusions are drawn.





From the same data, several more important assessments and conclusions are easily collected, but about them in an arc time.








All Articles