Insertion sort

Hello. Today we continue the series of articles that I wrote specifically for the launch of the course "Algorithms and Data Structures" from OTUS.








Introduction



Sorting an array is one of the first serious problems studied in a classic "Algorithms and Data Structures" course in computer science. Because of this, tasks on writing sorts and related questions often come up in interviews for intern and junior developer positions.



Formulation of the problem



Traditionally, it is worth starting the presentation of solutions with the problem statement. Usually the sorting task is to order some array of integers in ascending order. In fact, this is a bit of an oversimplification: the algorithms described in this section can be used to order an array of any objects between which an ordering relation is defined (that is, for any two elements we can say that the first is greater than the second, the second is greater than the first, or they are equal). You can sort in both ascending and descending order. We will use the standard simplification.



Insertion sort



Last time we talked about one of the simplest sorts - selection sort. Today we will focus on a slightly more complex algorithm - insertion sort.



Algorithm Description



Insertion sort works as follows: just as in selection sort, the array is divided into two parts, a sorted part and an unsorted part. The algorithm traverses the entire array until the length of the sorted part becomes equal to the length of the whole array. Within each iteration, we take the first element of the unsorted part and do the following with it: while our element is strictly less than the previous one, we swap them. Then we increase the length of the sorted part by one. By successively moving the element under consideration in this way, we ensure that it falls into place. An example of one iteration is shown below:

1 3 5 | 2 9 6 -> 1 3 2 5 9 6 -> 1 2 3 5 9 6 -> 1 2 3 5 | 9 6



Implementation



I suggest looking at the implementation of this algorithm in C:



void insertionSortFunction(double array[], int size) {
    int i, j;
    double temp;
    // i starts at 1: the one-element prefix array[0..0] is already sorted
    for (i = 1; i < size; i++) {
        temp = array[i];
        // Move array[i] to the left while it is strictly less than its predecessor
        for (j = i - 1; j >= 0; j--) {
            if (array[j] <= temp) {
                break;
            }
            array[j + 1] = array[j];
            array[j] = temp;
        }
    }
}
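
A minimal usage sketch (the main function, the forward declaration, and the sample values below are my own additions for illustration, not part of the original article):

#include <stdio.h>

void insertionSortFunction(double array[], int size);

int main(void) {
    double data[] = {1.0, 3.0, 5.0, 2.0, 9.0, 6.0};
    int n = sizeof(data) / sizeof(data[0]);

    insertionSortFunction(data, n);

    // Expected output: 1 2 3 5 6 9
    for (int i = 0; i < n; i++) {
        printf("%g ", data[i]);
    }
    printf("\n");
    return 0;
}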




Analysis



Let's analyze this algorithm.



The easiest place to start the analysis is with the memory asymptotics. Regardless of the length and structure of the array being sorted, memory is allocated only for two loop counters and one auxiliary variable used to exchange two values. Thus, it is always true that:

M(n) = O(1)




With running time, things are somewhat more interesting. The body of the inner loop itself takes O(1), that is, it does not depend on the size of the array being sorted. This means that to understand the asymptotics of the algorithm, we need to count how many times this body is executed. But the number of iterations of the inner loop depends on how well ordered (or unordered) the elements of the array being sorted are. Several cases need to be considered in the analysis.



The minimum number of iterations is reached if the array to be sorted is already sorted. Indeed, for each iteration of the outer for loop there is exactly one iteration of the inner loop. This is the so-called best case.

T(n) = (n - 1) · O(1) = O(n)

Thus, sorting is done in linear time.



In the worst case, the number of iterations is the largest, that is, break never fires. At the first iteration of the outer loop, one iteration of the inner loop is performed. At the second iteration of the outer loop, 2 iterations of the inner loop are performed. Continuing this reasoning, we conclude that at the last, (n - 1)-th, iteration of the outer loop, (n - 1) iterations of the inner loop will be executed. We get:

T(n) = O(1) + 2·O(1) + 3·O(1) + ... + (n - 1)·O(1) = O(1 + 2 + 3 + ... + (n - 1)) = O(n(n - 1)/2) = O(n²)

To carry out the calculations, we used the properties of O-notation and the formula for the sum of an arithmetic progression.



In the average case, the number of inner-loop iterations for a particular iteration of the outer loop is taken to be its average value, that is, the mathematical expectation. Suppose that all admissible numbers of inner-loop iterations are equally probable. Then for the i-th iteration of the outer loop the expected number of inner-loop iterations is about i/2. To calculate the asymptotics, we again count how many times the body of the inner loop is executed, this time on average: T(n) = O(1/2 + 2/2 + ... + (n - 1)/2) = O(n(n - 1)/4) = O(n²).
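
To make these estimates more tangible, here is a small counting experiment of my own (the function insertionSortCount and the test arrays are illustrative assumptions, not part of the article): the same algorithm, instrumented with a counter of inner-loop executions, run on an already sorted array and on a reverse-sorted one.

#include <stdio.h>

// Same insertion sort, but returns how many times the inner-loop body runs.
long insertionSortCount(double array[], int size) {
    long count = 0;
    for (int i = 1; i < size; i++) {
        double temp = array[i];
        for (int j = i - 1; j >= 0; j--) {
            count++;
            if (array[j] <= temp) {
                break;
            }
            array[j + 1] = array[j];
            array[j] = temp;
        }
    }
    return count;
}

int main(void) {
    enum { N = 1000 };
    double sorted[N], reversed[N];
    for (int i = 0; i < N; i++) {
        sorted[i] = i;          // best case: already ordered
        reversed[i] = N - i;    // worst case: reverse ordered
    }
    printf("sorted:   %ld\n", insertionSortCount(sorted, N));   // about n - 1
    printf("reversed: %ld\n", insertionSortCount(reversed, N)); // about n(n - 1)/2
    return 0;
}

On the sorted array the counter stays at n - 1, on the reversed one it reaches n(n - 1)/2, in line with the estimates above.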



To summarize: the memory asymptotics of the algorithm is

O(1)

its running time in the best case is

O(n)

and in the average and worst cases

O(n²)

Therefore, this sort belongs to the class of quadratic sorts.



It is also important to note that insertion sort is stable in this implementation. Let me remind you that a sort is called stable if it does not change the relative order of equal elements. This property is not very important for such an educational task as sorting an array of numbers, but if we were sorting more complex objects with an established ordering relation, it could matter. We will consider a similar example next time, when we talk about radix sort.
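
To illustrate what stability means in practice, here is a small sketch of mine (the Item struct, the field names, and the sample data are illustrative assumptions): the same insertion logic applied to records ordered by a key field; the two records with key 3 keep their original relative order.

#include <stdio.h>

typedef struct {
    int key;     // the field we sort by
    char label;  // payload that lets us see the original order
} Item;

// Insertion sort by key; the <= in the break condition is what keeps it stable.
void insertionSortItems(Item array[], int size) {
    for (int i = 1; i < size; i++) {
        Item temp = array[i];
        for (int j = i - 1; j >= 0; j--) {
            if (array[j].key <= temp.key) {
                break;
            }
            array[j + 1] = array[j];
            array[j] = temp;
        }
    }
}

int main(void) {
    Item items[] = {{3, 'a'}, {1, 'b'}, {3, 'c'}, {2, 'd'}};
    insertionSortItems(items, 4);
    // Prints: 1b 2d 3a 3c  -- the two key-3 items stay in their original order
    for (int i = 0; i < 4; i++) {
        printf("%d%c ", items[i].key, items[i].label);
    }
    printf("\n");
    return 0;
}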



Conclusion



We have looked at another quadratic sort, insertion sort, and examined a stable implementation of it. This sort is mostly of educational value, although in practice it can be useful thanks to its good best-case asymptotics: if you need to add new data to a sufficiently large block of already ordered data so that all of the data is ordered again, the inner for loop can come in handy. In this way the ordering of the data can be maintained at a cost of

O(n)

per insertion.
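
As a rough sketch of that use case (the function insertSorted and its contract below are my own illustrative assumptions, not taken from the article): if the first size elements of the array are already ordered and there is room for one more, a single new value can be placed in its position in O(n) using exactly the inner-loop idea.

// Inserts value into array[0..size-1], assumed sorted in ascending order;
// the array must have room for at least size + 1 elements.
// Returns the new number of elements.
int insertSorted(double array[], int size, double value) {
    int j = size - 1;
    // Shift every element greater than value one position to the right...
    while (j >= 0 && array[j] > value) {
        array[j + 1] = array[j];
        j--;
    }
    // ...and drop the new value into the freed slot.
    array[j + 1] = value;
    return size + 1;
}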






Learn more about the course.






