Lexorangs - what they are and how to use them to effectively sort lists

In this article, I will explain what Lexorangs are, how they are used in Jira, and how we used them to effectively sort lists and drag and drop items in our mobile app.







Dragging and dropping items in the list is a popular feature in modern applications, the presence of which will only delight users. However, when implementing such functionality, you need to try not to step on the rake of bad optimization: a large number of elements, recalculation of the position every time, and if there are several sections in the list, then when dragging between sections, most likely, you need to implement additional logic. How not to get hit in the forehead, reduce the number of calculations, and how lexorangs will help us with this - read under the cut.



Let's define the problem



So, you've decided to add drag and drop functionality to your application. So, you need to sort the elements somehow, otherwise there is no point in dragging and dropping. And what's the first thing that comes to mind?



Positions



The most common unremarkable positions. Those same numbers from 1 to infinity (not really). It is easy and convenient to work with them, the elements are sorted without problems. At first glance, everything is fine. So good that for most applications, this is what you need.



What, then, is wrong with the numeric position?



The problem lies in the associated operations. Need to inject an element between the second and third elements? We shift everything forward by one, starting from the third element, while not forgetting to update the data in the database. Performing such an operation once does not look difficult, but this operation will be performed quite often.



Another problematic operation is updating data on the server. Updated the task - you need to send an update of all affected tasks to the server. The server, in turn, should send this update to everyone who is “subscribed” to the task list. The more often users change the order of tasks in the list, the more data must be sent to the server, and the more data the server must send to clients.



It turns out that when dragging one task, we will not only change the positions of a large number of elements, but also send them to the server, which will then send them to other users.



Conclusion: I want something more optimal



Solution options



When we in the company faced a similar problem, the first possible solution was the following algorithm: we will place all elements with some standard positions at equal intervals (steps). So, the first element will have a position of 1, and the second one - 1000. When the user wants to drag something between these two elements, we calculate the average position - (1000 + 1) / 2 = ~ 500. And so on and so forth.



Why this option is bad, I think you guessed right away. We are limited in the number of steps that can be taken. Those. between 1 and 1000 - 500. Between 1 and 500 - 250. Then 125 ... and eventually there will be no room. Increasing the step does not solve this problem.



Can we use fractional numbers?



No, fractional numbers do not fix the problem, but only delay the moment of its appearance.



After a little thought and googling, we came across a report on how lexorangs are used in Jira (Lexorank, report ).

They are based on three things:



1 - it is easy to sort the strings in alphabetical order

2 - you can find the middle string between two strings (not always, and this is not so easy anymore)

3 - can't you find the middle one? Let's use a bucket (sounds strange, yes)



With sorting everything is clear, we go directly to point number 2.



There are letters in the English alphabet in "a" and "c", and between them, obviously, "b". But how to find this "b" mathematically?



Let's just subtract from the code of the letter "c" the code of the letter "a", we get 2 ("c" = 143, "a" = 141). It remains to divide the result in half. Got 1. Indeed, if we add one to the code "a", we get the code of the letter "b".



A combination of English letters is called a lexorang.



Situations when there is no space between two lines, there is also a place to be, and I already wrote that buckets are used to solve them.



The bucket is the label before the rank , it looks like this: "0 | aaa". Here 0 is the bucket number. When there is no room left, the elements are moved from one bucket to another, and the labels are re-arranged in order. That's all the magic!



How we took advantage of it

It is not said exactly (rather, we just did not find) how easy and painless to find the middle line between the two. So we tensed and came up with this. Let's immediately plunge into an example.



Let's take two lines: "aa" and "cc" and find the middle line between them.



After subtraction by symbol, as above, we get the number 11. But what to do next? You might think you just need to add the result to the "aa" line. Here the string "bb" will really turn out, located between "aa" and "cc", but the algorithm will be incorrect, and now we will see why.



Let's think about what it looks like? "Aa", "cc", "11". Some kind of number system. On what? And on the 26th! Why? Because the English alphabet has 26 letters. So that's it.

It is necessary to translate the result, "11", from the 26-ary number system into the usual 10-ary number system.



The formula is quite simple:



X = y 0 + y 1 * size + y 2 * size ^ 2 ... y n * size ^ (n-1)



Here size is the "size" of the number system (in this case, size = 26)

y n - n-th number on the right



Let us remember this formula as, for example, formula 1 , it will still be useful to us.



We substitute our numbers and this is what we get: 2 + 2 * 26 = 54. Now we know how many characters are between the lines "aa" and "cc". But we need to take the average between the two. We divide 54 by 2, we get 27. It remains only to correctly add our result to the “aa” codes.

How to do it? First, we find out how much to add to the first (right) character. To do this, we get the remainder of dividing 27 by 26. We get 1. Add 1 to "a" - we get the letter "b"



Now we need to think about how to find out how many characters to shift the second character.

Here the following formula will help us:



X = Y / size ^ (n-1) % size



By this formula, we can find out how much needs to be added to a certain place (character, specified using n).



Substituting everything there, we get (n = 2): (27/26)% 26 = 1. Add. We get the final result "bb".



It is not so difficult to implement an algorithm in any programming language when you know exactly how it works. Below I have added the implementation of the algorithm in Dart (the application in which this problem occurred is written in Flutter).



Our implementation of finding the 'middle' row
String getRankBetween({@required String firstRank, @required String secondRank}) {
  assert(firstRank.compareTo(secondRank) < 0, "First position must be lower than second. Got firstRank $firstRank and second rank $secondRank");

  /// Make positions equal
  while (firstRank.length != secondRank.length) {
    if (firstRank.length > secondRank.length)
      secondRank += "a";
    else
      firstRank += "a";
  }

  var firstPositionCodes = [];
  firstPositionCodes.addAll(firstRank.codeUnits);

  var secondPositionCodes = [];
  secondPositionCodes.addAll(secondRank.codeUnits);

  var difference = 0;

  for (int index = firstPositionCodes.length - 1; index >= 0; index--) {
    /// Codes of the elements of positions
    var firstCode = firstPositionCodes[index];
    var secondCode = secondPositionCodes[index];

    /// i.e. ' a < b '
    if (secondCode < firstCode) {
      /// ALPHABET_SIZE = 26 for now
      secondCode += ALPHABET_SIZE;
      secondPositionCodes[index - 1] -= 1;
    }

    /// formula: x = a * size^0 + b * size^1 + c * size^2
    final powRes = pow(ALPHABET_SIZE, firstRank.length - index - 1);
    difference += (secondCode - firstCode) * powRes;
  }

  var newElement = "";
  if (difference <= 1) {
    /// add middle char from alphabet
    newElement = firstRank +
        String.fromCharCode('a'.codeUnits.first + ALPHABET_SIZE ~/ 2);
  } else {
    difference ~/= 2;

    var offset = 0;
    for (int index = 0; index < firstRank.length; index++) {
      /// formula: x = difference / (size^place - 1) % size;
      /// i.e. difference = 110, size = 10, we want place 2 (middle),
      /// then x = 100 / 10^(2 - 1) % 10 = 100 / 10 % 10 = 11 % 10 = 1
      final diffInSymbols = difference ~/ pow(ALPHABET_SIZE, index) % (ALPHABET_SIZE);

      var newElementCode = firstRank.codeUnitAt(
          secondRank.length - index - 1) + diffInSymbols + offset;
      offset = 0;

      /// if newElement is greater then 'z'
      if (newElementCode > 'z'.codeUnits.first) {
        offset++;
        newElementCode -= ALPHABET_SIZE;
      }

      newElement += String.fromCharCode(newElementCode);
    }

    newElement = newElement
        .split('')
        .reversed
        .join();
  }

  return newElement;
}




But that is not all



In any case, it was not all for us. We added this feature to an already released application, so a migration was needed. Writing migrations for SQL is no problem, but calculating the standard ranks is no longer so easy. But, knowing how the middle row is located, it becomes not difficult to do this. The algorithm will be as follows:



  • set the beginning and end of the interval (for us these are "aaa" and "zzz", respectively)
  • we count how many combinations of different characters between the lines, here formula 1 will help us
  • now we divide the result by the maximum possible number of elements in the list
  • so, we have a step, there is an initial position, it remains only to add a step to the initial position, get a rank, then add a step to this rank, get a new rank, then add a step again, and so on


It's the same on Dart. the forNumOfTasks parameter is responsible for how many positions you get. If you put down positions for a list where there are only three elements now, it makes no sense to calculate positions for the entire list (by 50, 100 or something else)



Our implementation of finding 'default' ranks
/// modify field forNumOfTasks to get certain number of positions
List‹String› getDefaultRank({int forNumOfTasks = TASK_FOR_PROJECT_LIMIT_TOTAL}) {
	final startPos = START_POSITION;
	final endPos = END_POSITION;

	final startCode = startPos.codeUnits.first;
	final endCode = endPos.codeUnits.first;

	final diffInOneSymb = endCode - startCode;

	/// x = a + b * size + c * size^2
	final totalDiff = diffInOneSymb + diffInOneSymb * ALPHABET_SIZE + diffInOneSymb * ALPHABET_SIZE * ALPHABET_SIZE;
	/// '~/' – div without remainder
	final diffForOneItem = totalDiff ~/ (TASK_FOR_PROJECT_LIMIT_TOTAL + 1);

	/// x = difference / size^(place - 1) % size
	final List‹int› diffForSymbols = [
		diffForOneItem % ALPHABET_SIZE,
		diffForOneItem ~/ ALPHABET_SIZE % ALPHABET_SIZE,
		diffForOneItem ~/ (pow(ALPHABET_SIZE, 2)) % ALPHABET_SIZE
	];

	List‹String› positions = [];
	var lastAddedElement = startPos;
	for (int ind = 0; ind < forNumOfTasks; ind++) {
		var offset = 0;
		var newElement = "";
		for (int index = 0; index < 3; index++) {
			final diffInSymbols = diffForSymbols[index];

			var newElementCode = lastAddedElement.codeUnitAt(2 - index) + diffInSymbols;
			if (offset != 0) {
				newElementCode += 1;
				offset = 0;
			}
			/// 'z' code is 122 if 'll be needed
			if (newElementCode > 'z'.codeUnitAt(0)) {
				offset += 1;
				newElementCode -= ALPHABET_SIZE;
			}
			final symbol = String.fromCharCode(newElementCode);
			newElement += symbol;
		}

		/// reverse element cuz we are calculating from the end
		newElement = newElement.split('').reversed.join();
		positions.add(newElement);
		lastAddedElement = newElement;
	}

	positions.sort();
	positions.forEach((p) => print(p));
	return positions;
}





Fuuuuh, are you tired? The hardest part is over, very little is left!



We didn't really like the bucket idea. Objectively, she is good. But we didn’t like the very idea of ​​having a “recovery” algorithm: if positions are over, recover with buckets! So, no buckets. However, the ranks are not infinite, which means that something must be invented.



And we came up with



If there was no space left between the lines, then we decided to simply add the middle letter of the English alphabet ("n") to the bottom border. Those. if we want to stick an element between "aa" and "ab", we get "aa", "aan" and "ab". Due to the fact that strings are sorted element-wise from left to right, lengthening the string will not spoil the sort. But we have a place for new elements, and this is without any recovery algorithms.



This piece of code can also be found in the algorithm for finding the middle line.



A piece of code with the addition of a 'middle' character
if (difference <= 1) {
    /// add middle char from alphabet
    newElement = firstRank +
        String.fromCharCode('a'.codeUnits.first + ALPHABET_SIZE ~/ 2);
  }




Summary



Lexorangs seemed to us an excellent indexing tool, the use of which optimizes work with the database and the server: when changing the order of tasks, only one changed task needs to be updated.



Share your opinion on lexorangs and what thoughts you have on solving such problems.



Well, for all the readers of Habr, we propose to evaluate the result that we got. And also pick up a useful list of "Habr's Authors' Code" .



Thanks for your attention!



All Articles