The Alpha algorithm was the first process mining technique able to discover so-called Workflow nets from event logs. It was developed in the early 2000s by the founder of the process mining discipline, Professor Wil M.P. van der Aalst.
A Workflow net (hereinafter WF) is a model built on top of Petri nets. Importantly, a WF based on Petri nets lets you represent workflows and analyze them further.
Distinctive features of WF are:
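- There is exactly one start place (a place with no incoming arcs).
- There is exactly one end place (a place with no outgoing arcs).
- Every action lies on a path from the start to the end.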
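As a rough illustration, these structural conditions can be checked mechanically. The sketch below assumes the usual reading of a WF: exactly one start place, exactly one end place, and every node on a path between them. The function names and the two tiny nets are hypothetical and chosen only for this example.

```python
from collections import deque


def reachable(start, arcs):
    """Return every node reachable from `start` by following `arcs`."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for source, target in arcs:
            if source == node and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def is_workflow_net(places, transitions, arcs):
    """Check the three structural conditions on a small Petri net."""
    # A start place has no incoming arcs; an end place has no outgoing arcs.
    starts = [p for p in places if all(target != p for _, target in arcs)]
    ends = [p for p in places if all(source != p for source, _ in arcs)]
    if len(starts) != 1 or len(ends) != 1:
        return False
    nodes = places | transitions
    reversed_arcs = {(target, source) for source, target in arcs}
    # Every node must be reachable from the start and must be able to reach the end.
    return (reachable(starts[0], arcs) == nodes
            and reachable(ends[0], reversed_arcs) == nodes)


# Hypothetical nets: start -> a -> p1 -> b -> end,
# then the same net with an extra action d that leads nowhere.
places = {"start", "p1", "end"}
good_arcs = {("start", "a"), ("a", "p1"), ("p1", "b"), ("b", "end")}
bad_arcs = good_arcs | {("p1", "d")}

good_result = is_workflow_net(places, {"a", "b"}, good_arcs)
bad_result = is_workflow_net(places, {"a", "b", "d"}, bad_arcs)
print(good_result, bad_result)
```

The second net fails the check because action d can be reached from the start but never reaches the end.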
Let's look at a couple of examples:
These nets are not WFs. Why? The first one has no start and no end (these are drawn as circles). In the second one, action d does not lead to the end.
Below is an example of a correct WF: it has a start and an end, and all actions lie between them and run to completion.
Now that we have a rough idea of what a WF is, let's move on to the Alpha algorithm. To obtain a WF with the Alpha algorithm, we first need to bring some order to our log. To do this, we define the following relations between transitions in the event log (they will be needed later to build the model):
1. Direct succession.
Event A > Event B. Somewhere in the log, Event B comes immediately after Event A.
In a real event log, it would look like this:
2. Causality.
Event A → Event B.
This means that the log contains the transition Event A > Event B but never the transition Event B > Event A. Therefore, on the diagram we put the symbol →.
3. Parallel events.
Event A || Event B.
The log contains both transitions, Event A > Event B and Event B > Event A.
4. No direct succession.
Event A # Event B and vice versa. These events never directly follow one another in the log.
The event log as a whole, from which all of these transitions are read, is denoted L.
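To make the four relations concrete, here is a minimal sketch that derives them from a log given as a list of traces. The log below is hypothetical (it is not the log from this article), and the name `footprint` is just a label chosen for this example.

```python
from itertools import product


def footprint(log):
    """Return the direct-succession pairs and the relation for every pair of activities."""
    activities = sorted({activity for trace in log for activity in trace})
    # A > B: B comes immediately after A in at least one trace.
    direct = {(x, y) for trace in log for x, y in zip(trace, trace[1:])}
    relations = {}
    for a, b in product(activities, repeat=2):
        if (a, b) in direct and (b, a) in direct:
            relations[a, b] = "||"   # parallel: both orders occur
        elif (a, b) in direct:
            relations[a, b] = "->"   # causality: only A > B occurs
        elif (b, a) in direct:
            relations[a, b] = "<-"   # reverse causality: only B > A occurs
        else:
            relations[a, b] = "#"    # the two never directly follow each other
    return direct, relations


# Hypothetical log of three cases.
log = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["a", "e", "d"],
]
direct, relations = footprint(log)
for pair in [("a", "b"), ("b", "c"), ("b", "e")]:
    print(pair, relations[pair])
```

In this log, b and c occur in both orders, so they come out as parallel, while b and e never follow one another and are therefore unrelated.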
Let's look at a small example. Below is a log of three cases.
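Let's write down the relations from our log that the Alpha algorithm uses:
- Direct succession (>): every pair of actions in which the second directly follows the first in at least one case.
- Causality (→): the pairs that occur in one order only.
- Parallel events (||): the pairs that occur in both orders.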
Based on these relations, we draw the WF.
The resulting model covers all the actions in our log and is easy to analyze.
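For readers who want to see how the relations turn into a net, below is a compact sketch of the core step: finding the pairs of action sets (A, B) that become places. It is an illustrative brute-force version for tiny logs only, run on a hypothetical log. The names `alpha`, `start` and `end` are assumptions made for this example, and every trace is assumed to be non-empty.

```python
from itertools import chain, combinations


def alpha(log):
    """Return the maximal (A, B) pairs and the arcs of the discovered net."""
    activities = sorted({activity for trace in log for activity in trace})
    direct = {(x, y) for trace in log for x, y in zip(trace, trace[1:])}
    causal = {(x, y) for x, y in direct if (y, x) not in direct}

    def unrelated(x, y):
        return (x, y) not in direct and (y, x) not in direct

    # Non-empty sets whose members are pairwise unrelated (#), each also to itself.
    all_subsets = chain.from_iterable(
        combinations(activities, size) for size in range(1, len(activities) + 1))
    independent = [frozenset(s) for s in all_subsets
                   if all(unrelated(x, y) for x in s for y in s)]

    # Candidate places: every action in A causally precedes every action in B.
    candidates = [(A, B) for A in independent for B in independent
                  if all((a, b) in causal for a in A for b in B)]
    # Keep only the pairs that are not contained in a larger candidate.
    maximal = [(A, B) for A, B in candidates
               if not any((A, B) != (C, D) and A <= C and B <= D
                          for C, D in candidates)]

    arcs = set()
    for A, B in maximal:
        place = (tuple(sorted(A)), tuple(sorted(B)))
        arcs |= {(a, place) for a in A} | {(place, b) for b in B}
    # Connect the first and last actions of the traces to the start and end places.
    arcs |= {("start", trace[0]) for trace in log}
    arcs |= {(trace[-1], "end") for trace in log}
    return maximal, arcs


# Hypothetical log of three cases.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]]
maximal, arcs = alpha(log)
for A, B in maximal:
    print(sorted(A), "->", sorted(B))
```

The subset enumeration is exponential in the number of actions, so this form is only suitable for small teaching examples.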
Limitations of the alpha algorithm.
If your log contains loops of length one or two (a single action repeated, or two actions repeated in turn), the algorithm misinterprets them and may generate a model that differs from the expected one. Let's go back to our earlier log and add repetitions to it:
The expected model will look like this:
But the alpha algorithm will give us a completely different picture:
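What is the reason? For the algorithm, the action "Processing an application" has no beginning and no end. While generating the model, the algorithm builds pairs of sets: a set A (the actions that lead into a place) and a set B (the actions that come out of it). The actions inside each set must be unrelated (#) to one another, and each action must also be unrelated to itself. A repeated action directly follows itself, so it cannot be put into any such set and the algorithm cannot connect it to anything. Accordingly, this action falls out of the general model.
The same situation occurs with two actions that repeat one after the other. Both orders appear in the log, so the Alpha algorithm treats the two actions as parallel instead of as a loop. It will keep only one of them, the second will drop out, and we will not be able to interpret the model.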
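The effect can be seen directly in the relations. The sketch below uses two hypothetical one-trace logs; apart from "Processing an application", the action names are invented for the example.

```python
def direct_succession(log):
    """A > B: B comes immediately after A in at least one trace."""
    return {(x, y) for trace in log for x, y in zip(trace, trace[1:])}


def causal(direct):
    """A -> B: A > B occurs but B > A does not."""
    return {(x, y) for x, y in direct if (y, x) not in direct}


# Loop of length one: the same action is repeated.
repeated = "Processing an application"
loop_one = [["Receive", repeated, repeated, "Close"]]
direct_one = direct_succession(loop_one)
# The action directly follows itself, so it is not unrelated (#) to itself
# and cannot be a member of any set A or B.
follows_itself = (repeated, repeated) in direct_one

# Loop of length two: b and c alternate.
loop_two = [["a", "b", "c", "b", "d"]]
causal_two = causal(direct_succession(loop_two))
# Both b > c and c > b occur, so neither direction counts as causal,
# and c ends up with no causal relation at all.
c_is_connected = any("c" in pair for pair in causal_two)

print(follows_itself, sorted(causal_two), c_is_connected)
```

In the second log only a → b and b → d remain as causal relations, which is why action c disappears from the discovered model.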
How can this problem be solved? Take the specifics of the system you are analyzing into account as far as possible. If your system logs not only the main steps but also actions that are generated automatically (for example, while a text is being written, the system may autosave every 5 seconds and record each save in the log), then it makes sense to combine these actions into a single element.
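One simple way to do this kind of merging is to collapse runs of the same automatic action before the log is mined. The sketch below is a minimal example under that assumption; the action name "autosave" and the function `collapse_repeats` are hypothetical.

```python
from itertools import groupby


def collapse_repeats(trace, automatic=frozenset({"autosave"})):
    """Replace each run of consecutive identical automatic actions with one event."""
    collapsed = []
    for activity, run in groupby(trace):
        if activity in automatic:
            collapsed.append(activity)   # keep a single event for the whole run
        else:
            collapsed.extend(run)        # leave all other actions untouched
    return collapsed


# Hypothetical trace with autosaves recorded every few seconds.
trace = ["open", "autosave", "autosave", "autosave", "submit"]
cleaned = collapse_repeats(trace)
print(cleaned)
```

Only the actions listed in `automatic` are merged, so genuine repetitions of business steps stay visible in the log.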