About cats and process mining





"Will the cat survive in my house?" I wondered before picking up my furry friend from the pet shelter. I decided to test this hypothesis with Process Mining, a relatively new but actively developing branch of process analysis. There are plenty of software products in this area, among them Celonis, Disco, ProM and Apromore. I chose to try a Python library, PM4PY (Process Mining for Python), whose first version appeared on GitHub a little over a year ago, at the end of 2018. Its distinguishing feature is that it is free software with no restrictions on either the size of the files you load or the number of events in the event log. PM4PY also has extensive documentation covering the basic functionality; code examples and reference information can be found on the official site, pm4py.org.



First things first... no, not airplanes, but data! PM4PY supports several input formats, including CSV (comma-separated values), XES (eXtensible Event Stream) and Parquet. The simplest of them, both to understand and in terms of the functionality available for it, is CSV.







The morning-routine data used in the example below came in exactly this format. A significant new step, "feed the cat", was added to the usual series of morning events such as "getting up", "breakfast" and "brushing your teeth". The example was generated artificially in an MS Excel spreadsheet and then saved as CSV (39 simulated days, 250 events in total). Importing the data takes two lines of code: the first loads the so-called import factory (the corresponding Python class of the library), and the second passes the name of the data file to this "factory" and assigns the result to a variable.
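The original two-line PM4PY import is not reproduced here. As a rough, library-independent sketch of what loading such a file amounts to, the snippet below groups CSV rows into one time-ordered list of events per day using only the standard library. The column names (`case_id`, `activity`, `timestamp`), the sample rows and the `load_event_log` helper are assumptions made for illustration, not the article's real file or PM4PY's importer.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample in the spirit of the article's log; the real file
# and its column names are not shown in the article.
SAMPLE_CSV = """case_id,activity,timestamp
day1,getting up,2020-01-01 07:00:00
day1,feed the cat,2020-01-01 07:05:00
day1,breakfast,2020-01-01 07:20:00
day2,getting up,2020-01-02 07:10:00
day2,breakfast,2020-01-02 07:25:00
"""


def load_event_log(csv_file):
    """Group CSV rows into traces: one time-ordered event list per case."""
    traces = defaultdict(list)
    for row in csv.DictReader(csv_file):
        ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
        traces[row["case_id"]].append((ts, row["activity"]))
    # Sort each case by timestamp so the order of steps is reliable.
    return {case: sorted(events) for case, events in traces.items()}


log = load_event_log(io.StringIO(SAMPLE_CSV))
for case, events in log.items():
    print(case, [activity for _, activity in events])
```

To read a real file, `open("morning.csv", newline="")` (a hypothetical file name) could be passed in place of the `io.StringIO` object.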



A log loaded in any of these formats can be passed to "miners": algorithms that analyze the in-memory event log and try to reconstruct a presumed process model from it in the form of a Petri net. Examples of the algorithms available in PM4PY are Alpha, IMDFb and Heuristics Miner. But, as you may remember, I was interested not in an academically rigorous theoretical graph of the process model but in the purely practical question of the cat's survival.



So we move on to a much more interesting practical analysis of the process using a DFG (Directly-Follows Graph), in which the vertices are the activities from the loaded event log and the directed edges connect pairs of events that occurred one directly after the other at least once. The advantage of this view is that it shows every possible transition in detail. The downside is that the picture gets cluttered with connecting lines, whose number grows sharply as the number of activities in the log increases and as the actual paths taken through the observed process become more varied.
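The idea behind a directly-follows graph fits in a few lines. The sketch below is plain Python rather than PM4PY's own implementation, and the three traces are hypothetical. It counts how often each activity is immediately followed by another, which gives the edges of the graph with their frequencies.

```python
from collections import Counter

# Hypothetical traces: one list of activities per simulated morning.
traces = [
    ["getting up", "feed the cat", "breakfast", "brushing teeth"],
    ["getting up", "breakfast", "brushing teeth"],
    ["getting up", "feed the cat", "breakfast", "brushing teeth"],
]


def directly_follows(traces):
    """Count every pair (a, b) where b comes immediately after a in a trace."""
    dfg = Counter()
    for trace in traces:
        for src, dst in zip(trace, trace[1:]):
            dfg[(src, dst)] += 1
    return dfg


dfg = directly_follows(traces)
for (src, dst), count in dfg.most_common():
    print(f"{src} -> {dst}: {count}")
```

Every distinct pair becomes an edge, so a log with many activities and many different orderings quickly produces the cluttered picture described above.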











You can see that the already complicated morning routine became even more complicated once the single extra step of feeding the pet was added. A DFG can be built in terms of frequency, i.e. how many transitions took place from one vertex to another. It can also be built in terms of time performance, using the average time between events as the metric; the minimum, maximum or median can be chosen instead.
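A time-based DFG can be sketched in the same spirit: instead of counting transitions, collect the time elapsed on each one and aggregate it. The snippet below is a standard-library illustration, not PM4PY code. The timestamps and the `minutes` and `performance_dfg` helpers are hypothetical, and the aggregate function is passed as a parameter so that mean, median, minimum or maximum can be swapped in.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median

# Hypothetical traces with a clock time (HH:MM) for every event.
traces = [
    [("getting up", "07:00"), ("feed the cat", "07:05"), ("breakfast", "07:20")],
    [("getting up", "07:10"), ("feed the cat", "07:25"), ("breakfast", "07:35")],
]


def minutes(hhmm):
    """Convert an HH:MM string to minutes since midnight."""
    t = datetime.strptime(hhmm, "%H:%M")
    return t.hour * 60 + t.minute


def performance_dfg(traces, aggregate=mean):
    """Aggregate the minutes elapsed on each directly-follows edge."""
    durations = defaultdict(list)
    for trace in traces:
        for (src, t1), (dst, t2) in zip(trace, trace[1:]):
            durations[(src, dst)].append(minutes(t2) - minutes(t1))
    return {edge: aggregate(values) for edge, values in durations.items()}


mean_dfg = performance_dfg(traces, mean)
median_dfg = performance_dfg(traces, median)
min_dfg = performance_dfg(traces, min)
max_dfg = performance_dfg(traces, max)
print(mean_dfg)
```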



To narrow down the data under consideration, PM4PY lets you work with filters (which can be set on columns, much like in the pandas library) and with variants of the log (typical sequences of steps). The figures above, for example, show DFGs with all variants and no restrictions. But you can select, say, the 3 most frequent sequences of steps, and the picture becomes much simpler.
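Filtering by variants can be illustrated without the library as well. In the sketch below a variant is simply the exact sequence of activities in a trace, and only the traces belonging to the k most frequent variants are kept. The traces and the `top_variants` helper are hypothetical and stand in for PM4PY's own variants filter.

```python
from collections import Counter

# Hypothetical log: three variants with frequencies 3, 2 and 1.
traces = (
    [["getting up", "feed the cat", "breakfast"]] * 3
    + [["getting up", "breakfast"]] * 2
    + [["getting up", "breakfast", "feed the cat"]]
)


def top_variants(traces, k):
    """Keep only the traces that follow one of the k most frequent variants."""
    variants = Counter(tuple(trace) for trace in traces)
    keep = {variant for variant, _ in variants.most_common(k)}
    return [trace for trace in traces if tuple(trace) in keep]


filtered = top_variants(traces, k=2)
print(len(traces), "traces before,", len(filtered), "after filtering")
```

A DFG built from `filtered` has fewer edges than one built from the full log, at the price of dropping the rare sequences entirely.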



Keep in mind that the simplicity of the picture on the right comes from discarding the rarer variants, which include both atypical but perfectly acceptable sequences that do not interest us and important deviations from the standard process that interest us very much.







As a result, we see that even in the most typical sequences of events the "feed the cat" step took place in only 4 cases out of 8; in the other half (4 cases) it was skipped and never returned to. In other words, for now it is better not to rush: rather than take on additional responsibility, work on your own discipline and commitment first, and do not promise the animal care that will not actually be there.


