HMM: catching fraudulent transactions

For three years I worked in Serbia as an iOS evangelist. Two of my projects there were in my specialty, and one was in machine learning.





If you are interested, welcome to the world of HMM.





Problem statement

An Austrian bank has many clients, and each client holds an account with the bank. Over the course of a year a client spends money from the account: shopping, paying utility bills, and so on. Each withdrawal from the account is called a transaction. We are given a client's sequence of transactions over some period (say, a year). The task is to train a model so that it classifies each new transaction as valid or suspicious and issues a warning in the latter case. The solution is required to use a Hidden Markov Model (HMM).





Introduction to HMM

Every year I come down with coronavirus for 10 days in a row. For the rest of the year I am as healthy as a bull.





Let's represent this sequence of 365 characters as an array. h means healthy, l means sick.





days{365} = {hhhhhhhhhhllllllllllhhhhhhhhhhhhhhhhhhhhhhhh...hhhhh}
      
      



Question - What is the probability that I am sick today?





$\frac{10}{365} \approx 0.03$, that is, about 3 percent.










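Another question: I was sick yesterday. What is the probability that I am sick today?

The illness lasts 10 days, so the answer is $\frac{9}{10} = 90$ percent, and with probability 10 percent I am healthy today.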





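And if I was healthy yesterday? Then the probability that I am sick today is $\frac{1}{355} \approx 0.3$ percent, and with probability 99.7 percent I stay healthy.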





So after a sick day, the next day is healthy with probability 10% and sick with probability 90%.





We end up with 4 numbers, a 2 by 2 matrix! Each entry is a number between 0 and 1, a probability.





















yesterday \ today    h        l
h                    0.997    0.003
l                    0.10     0.90





That is, if I was healthy yesterday, then I am healthy today with probability 0.997 and sick with probability 0.003.
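The four numbers can also be counted directly from the days array. Below is a minimal Python sketch, assuming the 10 sick days follow the first 10 healthy days as in the array above; the helper name `transition_matrix` is made up for illustration. Counting consecutive pairs gives 1/354 rather than 1/355 for the healthy-to-sick step, because the last day of the year has no successor, but both round to 0.003.

```python
from collections import Counter

# Assumed layout of the year: 10 healthy days, 10 sick days, 345 healthy days.
days = "h" * 10 + "l" * 10 + "h" * 345

def transition_matrix(seq, states="hl"):
    """Estimate P(today | yesterday) by counting consecutive pairs."""
    pairs = Counter(zip(seq, seq[1:]))
    matrix = {}
    for prev in states:
        total = sum(pairs[(prev, cur)] for cur in states)
        matrix[prev] = {cur: pairs[(prev, cur)] / total for cur in states}
    return matrix

p_sick = days.count("l") / len(days)   # unconditional probability of a sick day
matrix = transition_matrix(days)

print(round(p_sick, 3))
for prev, row in matrix.items():
    print(prev, {cur: round(p, 3) for cur, p in row.items()})
```

The same counting works for any alphabet of states, which is how a matrix over transaction classes could be estimated later.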










27.10.2020 00:00 GAZPROMNEFT AZS 219    2507,43 118 753,95 28.10.2020 / 298380 
 26.10.2020 14:45 SPAR 77                319,73 121 261,38 27.10.2020 / 220146 
 26.10.2020 14:38 ATM 60006475           4800,00 121 581,11 26.10.2020 / 213074  
 25.10.2020 17:52 EUROSPAR 18            970,02 126 381,11 26.10.2020 / 259110 
 25.10.2020 00:00 Tinkoff Card2Card      20000,00 127 351,13 26.10.2020 / 253237   
 22.10.2020 14:22 SBOL  4276      7000,00 147 351,13 22.10.2020 / 276951   
 22.10.2020 12:18 STOLOVAYA              185,00 154 351,13 23.10.2020 / 279502   
 21.10.2020 16:46 MEGAFON R9290499831    500,00 154 536,13 21.10.2020 / 224592
 21.10.2020 14:17 SPAR 77                987,03 155 036,13 22.10.2020 / 219015 
 21.10.2020 13:42 PYATEROCHKA 646        289,93 156 023,16 22.10.2020 / 294539 
 21.10.2020 00:00 MEBEL                  75,00 156 313,09 22.10.2020 / 279935  
 19.10.2020 14:54 SPAR 77                552,92 132 044,80 20.10.2020 / 208987 
 19.10.2020 00:00 MOBILE FEE             60,00 132 597,72 20.10.2020 / -  
 16.10.2020 14:19 SPAR 77                579,39 132 657,72 17.10.2020 / 229627 
 12.10.2020 13:33 STOLOVAYA              185,00 133 237,11 13.10.2020 / 261374   
 12.10.2020 00:00 OOO MASTERHOST         1000,00 133 422,11 13.10.2020 / 268065  
 11.10.2020 12:09 SPAR 77                782,87 134 422,11 12.10.2020 / 275816 
 10.10.2020 14:52 SBOL            400,00 135 204,98 10.10.2020 / 276925   
 09.10.2020 13:29 SBOL  5484*     1000,00 135 604,98 09.10.2020 / 229184   
 09.10.2020 11:55 MAGNIT MK KRYUCHYA     274,00 136 604,98 10.10.2020 / 209914 

      
      



Let's extract the transaction amounts from the statement and plot them.





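import re

import pandas as pd

def readtrans():
    # Read the raw bank statement exported as plain text.
    with open("assets/trans.txt", "r") as file:
        statement = file.read()
    # Capture the integer part of every "NNN,NN" amount. Each statement line
    # yields two matches: the transaction amount, then the tail of the balance.
    pattern = r'(\d{2,5}),\d\d'
    result = re.findall(pattern, statement)
    # Keep every other match, i.e. the transaction amounts only.
    return list(map(int, result[0::2]))

data = readtrans()
t = list(range(len(data)))
df = pd.DataFrame({'number': t, 'amount': data})
# Bar chart of the amounts in statement order.
ax1 = df.plot.bar(x='number', y='amount', rot=0, width=1.5)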


      
      



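Let's label each transaction by size: small amounts (under 10$) are l, amounts over 100$ are h, and everything in between is m.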
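The labelling step can be sketched in a few lines of Python. The function name `classify` and the sample amounts are made up for illustration; the thresholds 10 and 100 come from the text and would need adjusting to the currency and scale of the real data.

```python
def classify(amount, low=10.0, high=100.0):
    """Map an amount to 'l', 'm' or 'h' using the two thresholds."""
    if amount < low:
        return "l"
    if amount > high:
        return "h"
    return "m"

amounts = [4.5, 32.0, 250.0, 10.0, 100.0]   # made-up sample amounts
observations = [classify(a) for a in amounts]
print(observations)   # ['l', 'm', 'h', 'm', 'm']
```

Amounts exactly on a threshold fall into m here; that is an arbitrary choice of this sketch.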









print(observations[:20])
trans[] = ['m', 'm', 'm', 'l', 'm', 'm', 'h', 'm', 'l', 'l', 'm', 'l', 'l', 'l', 'l', 'l', 'l', 'm', 'l', 'l']
      
      



Now let's build the transition matrix. It is 3 by 3, because there are 3 possible values = {l,m,h}.





[[0.5 0.3 0.2]
 [0.6 0.3 0.1]
 [0.7 0.3 0.0]]
      
      



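The last row, for example, says that after a large transaction the next one is small with probability 0.7 and medium with probability 0.3.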
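As an illustration of how such a matrix could be used on its own, here is a hedged Python sketch that flags steps the matrix considers rare. It assumes the rows and columns are ordered l, m, h, and the 0.05 threshold and the function name `unlikely_transitions` are arbitrary choices for this example. This is a plain Markov chain check on the observed symbols, not yet a hidden Markov model.

```python
import numpy as np

STATES = ["l", "m", "h"]            # assumed row/column order of the matrix
P = np.array([[0.5, 0.3, 0.2],
              [0.6, 0.3, 0.1],
              [0.7, 0.3, 0.0]])

def unlikely_transitions(seq, matrix=P, threshold=0.05):
    """Return positions i where the step seq[i-1] -> seq[i] is rarer than threshold."""
    index = {s: k for k, s in enumerate(STATES)}
    flagged = []
    for i in range(1, len(seq)):
        p = matrix[index[seq[i - 1]], index[seq[i]]]
        if p < threshold:
            flagged.append(i)
    return flagged

print(unlikely_transitions(["m", "l", "h", "h", "l"]))   # [3]
```

In the sample sequence only the step from h to h is flagged, since the matrix gives it probability 0.0.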










Suppose there are 5 hidden states. The transition matrix between them is then 5 by 5, with 20 unknown parameters.





[[a1 a2 a3 a4 a5]
 [b1 b2 b3 b4 b5]
 [c1 c2 c3 c4 c5]
 [x1 x2 x3 x4 x5]
 [y1 y2 y3 y4 y5]]
      
      



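Why 20 and not 25 (the number of entries)? Each row must sum to 1, so one entry per row, 5 in total, is determined by the others.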





The second matrix, the emission matrix, is 5 by 3: 5 hidden states by 3 observable symbols.





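What does a row of this matrix look like? It gives the probabilities that a hidden state, call it a, produces an l, m or h transaction: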





[0.96 0.04 0.0]
      
      



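Each row again sums to 1, so the emission matrix contributes 10 independent parameters on top of the 20 from the transition matrix.

That makes 20+10 parameters to find!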
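The 20+10 count can be checked with a short Python sketch. It assumes 5 hidden states and 3 observable symbols, and that every row of both matrices sums to 1, so each row loses one free parameter. The helper names are made up, and the random matrices only illustrate the shapes a learning procedure would start from; they are not trained values.

```python
import numpy as np

def free_parameters(n_states, n_symbols):
    """Independent entries of the transition and emission matrices."""
    transition = n_states * (n_states - 1)   # each of the rows sums to 1
    emission = n_states * (n_symbols - 1)
    return transition, emission

def random_stochastic(rows, cols, rng):
    """Random matrix with non-negative entries and rows summing to 1."""
    m = rng.random((rows, cols))
    return m / m.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
A = random_stochastic(5, 5, rng)   # hidden state -> hidden state
B = random_stochastic(5, 3, rng)   # hidden state -> observed l, m, h

print(free_parameters(5, 3))       # (20, 10)
print(A.shape, B.shape)            # (5, 5) (5, 3)
```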










Accord C#





using Accord.MachineLearning;
using Accord.Statistics.Models.Markov;
using Accord.Statistics.Models.Markov.Learning;
using Accord.Statistics.Models.Markov.Topology;
using Comtrade.FMS.Common;

      
      








I will give just one line of code, the one that encapsulates the learning method.





var teacher = new BaumWelchLearning(hmm);





You can learn the details of the Baum-Welch method from the relevant literature, once you have tuned your brain to statistical processes.
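For readers who want to see the moving parts, here is a compact sketch of the textbook Baum-Welch iteration for a discrete HMM in Python with NumPy. It is not the Accord.NET implementation and makes several assumptions: 5 hidden states, symbols encoded as l=0, m=1, h=2, random starting matrices, a fixed number of iterations, and a single training sequence (the 20 labels printed earlier). All function names are made up for this example.

```python
import numpy as np

def forward(obs, pi, A, B):
    """Scaled forward pass: scaled alphas and per-step scale factors."""
    T, N = len(obs), len(pi)
    alpha, scale = np.zeros((T, N)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    return alpha, scale

def backward(obs, A, B, scale):
    """Backward pass using the same scale factors as the forward pass."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]
    return beta

def baum_welch(obs, n_states, n_symbols, n_iter=20, seed=0):
    """Re-estimate pi, A, B from one sequence; also return log-likelihoods."""
    obs = np.asarray(obs)
    rng = np.random.default_rng(seed)
    pi = np.full(n_states, 1.0 / n_states)
    A = rng.random((n_states, n_states))
    A /= A.sum(axis=1, keepdims=True)
    B = rng.random((n_states, n_symbols))
    B /= B.sum(axis=1, keepdims=True)
    log_likelihoods = []
    for _ in range(n_iter):
        # E-step: posteriors over states (gamma) and state pairs (xi).
        alpha, scale = forward(obs, pi, A, B)
        beta = backward(obs, A, B, scale)
        log_likelihoods.append(np.log(scale).sum())
        gamma = alpha * beta
        xi = (alpha[:-1, :, None] * A[None, :, :]
              * (B[:, obs[1:]].T * beta[1:])[:, None, :]
              / scale[1:, None, None])
        # M-step: expected counts turned into probabilities.
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        B = np.stack([gamma[obs == k].sum(axis=0) for k in range(n_symbols)],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return pi, A, B, log_likelihoods

symbols = {"l": 0, "m": 1, "h": 2}
trans = ['m', 'm', 'm', 'l', 'm', 'm', 'h', 'm', 'l', 'l',
         'm', 'l', 'l', 'l', 'l', 'l', 'l', 'm', 'l', 'l']
obs = [symbols[s] for s in trans]
pi, A, B, log_likelihoods = baum_welch(obs, n_states=5, n_symbols=3)
print(log_likelihoods[0], log_likelihoods[-1])
```

The log-likelihood should not decrease from one iteration to the next, which is a handy sanity check. With only 20 observations and 30 parameters the fit is badly underdetermined, so a real model would need far more data.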





I wish you success and a good career in banking IT structures!







