Clustering traces for assessing the quality of processes

In process analysis there are cases where there is little data and the actions within the processes are chaotic. What do we do then? Analyze, of course. For this we will use familiar tools: Python and Excel, and occasionally Google.





Before touching the laptop, always look at the data with your own eyes. What do we have? The original set has more than 1,000,000 rows and 19 columns. Impressive. We clean it and extract the data we need: after applying a few filters, about 36,000 rows remain. Quite a difference! From what is left, we select the columns 'case_id', 'activity' and 'timestamp'.
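A minimal sketch of this preparation step with pandas, assuming the raw export is already loaded into a DataFrame with columns named `case_id`, `activity` and `timestamp`. The actual filters are not described, so none are applied here, and the small `raw` frame is made-up illustration data. The columns are renamed to the XES-style keys `case:concept:name`, `concept:name` and `time:timestamp`, which are the names the later code in this article works with.

```python
import pandas as pd

# Hypothetical raw export: in practice this would be the filtered ~36,000-row set
raw = pd.DataFrame({
    "case_id": ["c1", "c1", "c2", "c3", "c3", "c3"],
    "activity": ["create", "close", "create", "create", "edit", "close"],
    "timestamp": ["2021-01-01 10:00", "2021-01-01 12:00", "2021-01-02 09:00",
                  "2021-01-03 08:00", "2021-01-03 08:30", "2021-01-04 08:00"],
    "comment": ["", "", "", "", "", ""],  # an extra column we do not need
})

# Keep only the three columns needed for process analysis
df = raw[["case_id", "activity", "timestamp"]].copy()

# Rename to the standard XES keys used by pm4py and by the code below
df = df.rename(columns={
    "case_id": "case:concept:name",
    "activity": "concept:name",
    "timestamp": "time:timestamp",
})

# Parse timestamps and order events by case, then by time
df["time:timestamp"] = pd.to_datetime(df["time:timestamp"])
df = df.sort_values(["case:concept:name", "time:timestamp"]).reset_index(drop=True)

print(df.shape)  # (6, 3)
```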





. 80% 1. . , , . : , , . .





The process model obtained with the Inductive miner (pm4py):









. , 80% 1 .





, , . , . .





, .





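import pandas as pd

# Count how often each activity occurs in each case (rows: cases, columns: activities)
unique_actions = df.pivot_table(index='case:concept:name', columns='concept:name', aggfunc='size', fill_value=0)

# Total number of events per case
unique_actions['sum'] = unique_actions.sum(axis=1)

# Show the cases with the most events first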
unique_actions.sort_values(by='sum', ascending=False)
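The `sum` column is simply the number of events per case, so its distribution (how many cases consist of one event, two events, and so on) can also be computed directly from the event table. A small sketch, with a made-up `df` that uses the same column names as above:

```python
import pandas as pd

# Made-up event table using the same column names as above
df = pd.DataFrame({
    "case:concept:name": ["c1", "c2", "c3", "c4", "c4"],
    "concept:name": ["create", "create", "create", "create", "close"],
})

# Number of events per case (the same values as the 'sum' column)
events_per_case = df.groupby("case:concept:name").size()

# Share of cases for each trace length, shortest traces first
length_share = events_per_case.value_counts(normalize=True).sort_index()
print(length_share)
```

On the real data, the first row of `length_share` gives the share of single-event cases.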
      
      



, , .





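import pandas as pd
# pm4py 1.x import path; newer pm4py releases provide get_all_case_durations
# in pm4py.statistics.traces.generic.log.case_statistics instead
from pm4py.statistics.traces.log import case_statistics

# Per-case mean and median of the activity counts, taken over the activity
# columns only: including the 'sum' column added above would skew both values
activity_cols = unique_actions.columns.drop('sum')
unique_actions['mean'] = unique_actions[activity_cols].mean(axis=1)
unique_actions['median'] = unique_actions[activity_cols].median(axis=1)

# Case durations in seconds; event_log is the pm4py event log built from df.
# The function returns a sorted list, not a mapping from case id to duration
case_durations = case_statistics.get_all_casedurations(
    event_log, parameters={case_statistics.Parameters.TIMESTAMP_KEY: 'time:timestamp'})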
durations = pd.DataFrame(case_durations)
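Because pm4py's case-duration helper returns only a sorted list of durations, those values cannot be matched back to the rows of `unique_actions`. One way to keep the case id, sketched here with plain pandas rather than pm4py, is to take the difference between the last and the first timestamp of each case. The small `df` is made-up data, and the name `duration_s` is a hypothetical choice.

```python
import pandas as pd

# Made-up event table using the same column names as above
df = pd.DataFrame({
    "case:concept:name": ["c1", "c1", "c2", "c3", "c3"],
    "time:timestamp": pd.to_datetime([
        "2021-01-01 10:00", "2021-01-01 12:00",
        "2021-01-02 09:00",
        "2021-01-03 08:00", "2021-01-04 08:00",
    ]),
})

# First and last event time for every case
bounds = df.groupby("case:concept:name")["time:timestamp"].agg(["min", "max"])

# Duration in seconds, indexed by case id so it can be joined to the count table
case_duration = (bounds["max"] - bounds["min"]).dt.total_seconds().rename("duration_s")
print(case_duration)
```

Since both tables are indexed by case id, `unique_actions.join(case_duration)` would attach a duration to every case.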
      
      



We will use 3 methods: PCA, tSNE and DBSCAN.





PCA





, . , ( ), .





, , .





, : , 90%.





2





The data are standardized with StandardScaler.
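A minimal sketch of scaling followed by PCA with scikit-learn, assuming the features are the per-case columns of the count table; random counts stand in for the real data. Two components are kept for a 2-D plot, and `explained_variance_ratio_` shows how much of the variance they retain. Passing a float such as `n_components=0.9` instead would make PCA keep as many components as needed to explain 90% of the variance.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in for the per-case feature table (activity counts and similar columns)
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.poisson(2.0, size=(200, 6)),
                        columns=[f"activity_{i}" for i in range(6)])

# Put every feature on the same scale (zero mean, unit variance)
X = StandardScaler().fit_transform(features)

# Project onto the first two principal components for plotting
pca = PCA(n_components=2, random_state=0)
X_pca = pca.fit_transform(X)

print(X_pca.shape)                          # (200, 2)
print(pca.explained_variance_ratio_.sum())  # share of variance kept by 2 components
```

Scaling first matters because PCA follows variance: without it, a column with large values (such as a total count or a duration) would dominate the components.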





:













, . . PCA . . , tSNE.

tSNE





. , , PCA.
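A sketch of a tSNE embedding with scikit-learn's `TSNE`, again on random stand-in data. The values `perplexity=30` and `random_state=0` are illustrative choices, not the article's settings. Unlike PCA, the layout depends on these parameters and on the random seed, so distances between far-apart groups on a tSNE plot should not be over-interpreted.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Stand-in for the scaled per-case feature matrix
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.poisson(2.0, size=(200, 6)).astype(float))

# Non-linear 2-D embedding; perplexity must be smaller than the number of samples
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
X_tsne = tsne.fit_transform(X)

print(X_tsne.shape)  # (200, 2)
```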





:





PCA .





, – (, 80%). , , «» .





, . . . . , 2, 3 ( ).





DBSCAN





, . .





. , , 25 000 . tSNE, DBSCAN 1 . 2000 (, «-1»). .
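A sketch of DBSCAN applied to a 2-D embedding, with synthetic blobs from `make_blobs` standing in for the tSNE output; `eps` and `min_samples` are illustrative values that would need tuning on real data. DBSCAN gives the label `-1` to points it cannot attach to any dense region, so the noise points can be counted directly from the labels.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Stand-in for a 2-D tSNE embedding: three dense groups of points
X_emb, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

# eps and min_samples are illustrative and usually need tuning on the real embedding
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_emb)

# Label -1 marks noise; every other label is a cluster id
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(n_clusters, n_noise)
```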





:





, 3 , . , . . , , , .
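Once the labels are known, each cluster can be described by averaging its features, which makes it easier to say what distinguishes one group of cases from another. A sketch with made-up features and labels; the column names `sum` and `duration_s` are hypothetical, and DBSCAN noise points (label `-1`) are left out of the profile.

```python
import numpy as np
import pandas as pd

# Made-up per-case features and cluster labels (-1 = DBSCAN noise)
features = pd.DataFrame({
    "sum": [1, 1, 2, 8, 9, 7, 3],
    "duration_s": [60.0, 90.0, 600.0, 86400.0, 90000.0, 80000.0, 5000.0],
})
labels = np.array([0, 0, 0, 1, 1, 1, -1])

# Size and average feature values of each cluster, noise excluded
profile = (features.assign(cluster=labels)
           .query("cluster != -1")
           .groupby("cluster")
           .agg(cases=("sum", "size"),
                mean_events=("sum", "mean"),
                mean_duration_s=("duration_s", "mean")))
print(profile)
```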





, :





–





– : , ( 1-5- ).





. , . , – . , . , . , - . , , ? PCA tSNE – , : , , . « ». ( ) .







