In recent years, programs have become widespread that try to predict which objects will be of interest to the user, having certain information about his profile. Until 2006, such algorithms were not popular. But that all changed in the fall of 2006 when Netflix offered the developers $ 1,000,000 for the best prediction algorithm. The competition lasted 3 years.
Let's talk today about our experience in building a recommendation system in personnel training.
.
?
– IT- . , . . , , .
. . , .
1. Content-based filtering ( )
, . , .
2. Collaborative filtering ( )
, .
3. ,
– . , .
?
. Users. , , Users.
. ( ). , , . …
features ( ) Users.
Users :
/ ( );
/ ;
;
(, Data Analist, Data Engineer, Data Scientist);
( );
( ).
Users .
MVP , . . Users :
(-1, +2);
– ;
– ;
– Data Scientist;
– 5 ( 20 65);
- 5 .
Users – 3 .
– 6 ( 2 User).
– Python.
(DataSet), , , Users.
User 3 Users .
# DataSet
for row in df:
corrMatr = df.corrwith(df[row]) #
corrMatr = pd.DataFrame(corrMatr)
tempMatr = corrMatr #
tempMatr = tempMatr.drop([row], axis=0)
li = list()
li2 = list()
print(row)
k = 0
while k < 6:
if len(tempMatr) == 0: # tempMatr 0, while
break
name = tempMatr.idxmax().item() #
dp = df3[df3['Tab'] == name].set_index('Tab') # ,
# Tab name
if name not in li2 and ((df[name]['pos'] <= df[row]['pos'] + 2 and df[name]['pos'] >= df[row]['pos'])):
#
li2.append(name)
col_dp = dp.columns.tolist() # DataFrame
random.shuffle(col_dp) #
for yy in col_dp: #
if pd.DataFrame(df3[df3['Tab'] == name][yy]).reset_index()[yy][0] == 1 and \
pd.DataFrame(df3[df3['Tab'] == row][yy]).reset_index()[yy][0] == 0 and \
yy not in li and yy in df777[''].tolist():
#
recList.append([row, name, yy,
pd.DataFrame(df4[df4['Tab'] == row]['TB']).reset_index()['TB'][0], \
pd.DataFrame(df4[df4['Tab'] == name]['TB']).reset_index()['TB'][0], \
pd.DataFrame(df4[df4['Tab'] == row]['FIO']).reset_index()['FIO'][0], \
pd.DataFrame(df4[df4['Tab'] == name]['FIO']).reset_index()['FIO'][0]])
k += 1
li.append(yy)
# tempMatr
tempMatr = tempMatr.drop([tempMatr.idxmax().item()], axis=0)
break # for
else: # tempMatr
tempMatr = tempMatr.drop([tempMatr.idxmax().item()], axis=0)
# DataFrame Excel
recomendations = recomendations.append(recList, ignore_index=True)
recomendations.to_excel('.xlsx')
.
. :
(, );
.
.
This recommendation algorithm was implemented in a pilot mode (during one quarter). The created MVP has reached the target conversion rate of 25% set by the management, which allows us to recognize it as successful and ready for implementation in the industry.