Tensors for C#. Matrices, vectors, custom types, and relatively fast

Hello!



I needed tensors (generalizations of matrices) for my project. I googled and found a number of libraries of all sorts that circled around the problem, but none did exactly what I needed. So I had to sit down and implement it myself. This is a short note on working with tensors and on optimization tricks.







So what do we need?



  1. N-dimensional matrices (tensors)
  2. Implementation of a basic set of methods for working with a tensor as a data structure
  3. Implementation of a basic set of mathematical functions (for matrices and vectors)
  4. Generic element types, i.e. any type, with custom operators


And what has already been written before us?
I found, for example, Towel, but it did not fit either:



  1. Its Transpose works by copying, i.e. in O(V), where V is the "volume" of the tensor (the total number of elements).
  2. Beyond that, it simply lacked most of the operations and the customization described above.


There is also System.Numerics.Tensor, but it too turned out not to fit: it covers only a small part of what is needed here.



MathSharp, NumSharp, Torch.NET, TensorFlow and the other numeric/ML libraries do not solve the problem either: they target fixed numeric types rather than arbitrary custom ones.



Element storage, transposition, subtensor



The elements are stored in a one-dimensional array. To get an element from a set of indices, we multiply the indices by certain coefficients and add the results. Say we have a tensor of shape [3 x 4 x 5]. Then we form an array of three coefficients, blocks (a name I made up myself). The last element is 1, the second-to-last is 5, and the first is 5 * 4 = 20, so blocks = [20, 5, 1]. For example, accessing index [1, 0, 4] gives the array index 1 * 20 + 0 * 5 + 4 * 1 = 24. So far, so good.
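As a quick sketch (the names here are made up for illustration, this is not the library's code), computing blocks from a shape and flattening an index looks like this:

```csharp
using System;

class StrideDemo
{
    // Each coefficient is the product of all dimensions to its right
    public static int[] ComputeBlocks(int[] shape)
    {
        var blocks = new int[shape.Length];
        blocks[shape.Length - 1] = 1;
        for (int i = shape.Length - 2; i >= 0; i--)
            blocks[i] = blocks[i + 1] * shape[i + 1];
        return blocks;
    }

    // Dot product of indices and blocks gives the position in the flat array
    public static int Flatten(int[] blocks, int[] indices)
    {
        var res = 0;
        for (int i = 0; i < indices.Length; i++)
            res += indices[i] * blocks[i];
        return res;
    }

    static void Main()
    {
        var blocks = ComputeBlocks(new[] { 3, 4, 5 });
        Console.WriteLine(string.Join(", ", blocks));           // 20, 5, 1
        Console.WriteLine(Flatten(blocks, new[] { 1, 0, 4 }));  // 24
    }
}
```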



Transpose



... that is, changing the order of the axes. The dumb and easy way is to create a new array and copy the elements over in the new order. But it is often convenient to transpose, work with the desired axis order, and then change the axis order back. In that case we must not touch the linear array itself; instead, when resolving a set of indices, we simply change the order of the coefficients.



Consider the function:



// Converts a set of indices into a position in the flat array
private int GetFlattenedIndexSilent(int[] indices)
{
    var res = 0;
    for (int i = 0; i < indices.Length; i++)
        res += indices[i] * blocks[i];
    return res + LinOffset; // LinOffset is the subtensor shift (see below)
}


As you can see, if you swap two elements of blocks, it will look as if the corresponding axes were swapped. So we write this:



public void Transpose(int axis1, int axis2)
{
    (blocks[axis1], blocks[axis2]) = (blocks[axis2], blocks[axis1]);
    (Shape[axis1], Shape[axis2]) = (Shape[axis2], Shape[axis1]);
}


We just swap the coefficients and the lengths of the two axes.
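A quick sanity check of the trick (again a made-up sketch, not library code): after swapping the coefficients, index [i, j] of the view resolves to element [j, i] of the original:

```csharp
using System;

class TransposeDemo
{
    // Element [i, j] of the transposed view of a [2 x 3] tensor, obtained
    // purely by swapping the coefficients, without touching the data
    public static int TransposedAt(int[] data, int i, int j)
    {
        int[] blocks = { 3, 1 };                          // strides of the original [2 x 3]
        (blocks[0], blocks[1]) = (blocks[1], blocks[0]);  // the whole transpose
        return data[i * blocks[0] + j * blocks[1]];
    }

    static void Main()
    {
        var data = new[] { 0, 1, 2, 3, 4, 5 };  // [[0,1,2],[3,4,5]] flattened
        // element [2, 1] of the [3 x 2] view equals element [1, 2] of the original
        Console.WriteLine(TransposedAt(data, 2, 1)); // 5
    }
}
```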



Subtensor



A subtensor of an N-dimensional tensor is an M-dimensional tensor (M < N) that is part of the original. For example, element zero of a tensor of shape [2 x 3 x 4] is a tensor of shape [3 x 4]. We get it simply by shifting.



Suppose we take the subtensor at index n. Its first element is at n * blocks[0] + 0 * blocks[1] + 0 * blocks[2] + ... , so the shift is n * blocks[0]. Accordingly, instead of copying the original tensor, we remember this shift and create a new tensor that references the same data, but with the shift applied. We also need to throw out the first element of blocks, namely blocks[0]: the first axis is gone, and nothing will ever index into it.
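The shift arithmetic can be sketched like this (made-up names, not the library's code):

```csharp
using System;

class SubtensorDemo
{
    // Flat index of element [i, j] of subtensor n, for a parent tensor
    // with the given blocks: the shift is n * blocks[0], and blocks[0]
    // itself is thrown out, since the first axis is gone
    public static int FlatIndex(int[] blocks, int n, int i, int j)
    {
        int linOffset = n * blocks[0];               // remembered shift; data is shared, not copied
        int[] subBlocks = { blocks[1], blocks[2] };  // blocks without the first element
        return linOffset + i * subBlocks[0] + j * subBlocks[1];
    }

    static void Main()
    {
        int[] blocks = { 12, 4, 1 }; // blocks of a [2 x 3 x 4] tensor
        // element [2, 3] of subtensor 1 is element [1, 2, 3] of the parent
        Console.WriteLine(FlatIndex(blocks, 1, 2, 3)); // 23
    }
}
```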

Other Composition Operations



All others already follow from these.



  1. SetSubtensor copies the elements into the desired subtensor
  2. Concat creates a new tensor and copies the elements of both operands into it (the length of the first axis is the sum of the two tensors' first-axis lengths)
  3. Stack groups a number of tensors into one with an additional axis (for example, stack ([3 x 4], [3 x 4]) -> [2 x 3 x 4])
  4. Slice returns a Stack of certain subtensors


All the composition operations that I have defined can be found here.



Mathematical operations



Everything is already simple here.



1) Pointwise operations: for two tensors, the operation is applied to each pair of corresponding elements (those with the same coordinates). The implementation is here (an explanation of why the code is so ugly is below).



2) Operations on matrices. Product, inverse, and the other simple operations need no explanation, I think. The determinant, though, deserves a story.



The Determinant's Tale
The naive method is Laplace expansion, which runs in O(N!). It is fine for 3x3 matrices (and similar small cases), but not in general.



The faster method is Gaussian elimination. But there is a catch: it requires division, which is fine for float but breaks for int, since integer division throws away the remainder.



So for a tensor of integers the Gaussian determinant may simply come out wrong, even though the true determinant of an integer matrix is an integer.



The workaround: wrap the elements in a type that postpones division, keeping each value as a numerator/denominator pair, so that nothing is lost until the very end. That is what SafeDivisionWrapper does.



As a result, there are two Gaussian determinant methods: a plain one, and a SafeDivision one (slower, but exact for types with lossy division).
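To illustrate the idea (this is a toy sketch, not the library's actual SafeDivisionWrapper), here is a 2 x 2 Gaussian determinant computed over numerator/denominator pairs, so that integer division happens only once, at the very end:

```csharp
using System;

// Values are kept as numerator/denominator pairs; division never loses anything
struct Frac
{
    public long Num, Den;
    public Frac(long num, long den) { Num = num; Den = den; }
    public static Frac operator *(Frac a, Frac b) => new Frac(a.Num * b.Num, a.Den * b.Den);
    public static Frac operator /(Frac a, Frac b) => new Frac(a.Num * b.Den, a.Den * b.Num);
    public static Frac operator -(Frac a, Frac b)
        => new Frac(a.Num * b.Den - b.Num * a.Den, a.Den * b.Den);
    public long Eval() => Num / Den; // exact at the end, when Den divides Num
}

class DetDemo
{
    // Plain Gaussian elimination for a 2 x 2 matrix, but over Frac instead of long
    public static long Det2x2(long a, long b, long c, long d)
    {
        var m = new[,] { { new Frac(a, 1), new Frac(b, 1) },
                         { new Frac(c, 1), new Frac(d, 1) } };
        var k = m[1, 0] / m[0, 0];         // the division that would break plain ints
        m[1, 1] = m[1, 1] - k * m[0, 1];   // eliminate the lower-left element
        return (m[0, 0] * m[1, 1]).Eval(); // determinant = product of the diagonal
    }

    static void Main() => Console.WriteLine(Det2x2(2, 3, 4, 5)); // 2*5 - 3*4 = -2
}
```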



3) Operations on vectors (dot and cross product).



Optimization



Templates?


C# has no templates. Therefore, you have to use crutches. Some people have come up with runtime compilation into a delegate; that is one way, for example, to implement the sum of two generic values.



However, I wanted custom operators, so I made an interface from which the user derives a struct. The primitive itself is stored in the linear array, and addition, subtraction, and the other functions are called as



var c = default(TWrapper).Addition(a, b);


which is inlined into your method. Here is an example of the implementation of such a structure.
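A minimal sketch of the pattern (the interface and struct names here are invented for the example, not the library's actual API):

```csharp
using System;

// The user implements arithmetic for T in a struct; calling it through
// default(TWrapper) lets the JIT devirtualize and inline the call
interface IOperations<T>
{
    T Addition(T a, T b);
}

struct IntOperations : IOperations<int>
{
    public int Addition(int a, int b) => a + b;
}

class SumDemo<T, TWrapper> where TWrapper : struct, IOperations<T>
{
    public T Sum(T[] items)
    {
        var res = items[0];
        for (int i = 1; i < items.Length; i++)
            res = default(TWrapper).Addition(res, items[i]); // no virtual dispatch, no boxing
        return res;
    }
}

class Program
{
    static void Main()
        => Console.WriteLine(new SumDemo<int, IntOperations>().Sum(new[] { 1, 2, 3 })); // 6
}
```

The struct constraint is what makes this fast: the runtime instantiates the generic code per value type, so the call target is known statically.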



Indexing


Further, although it seems that it is logical to use params in the indexer, that is, something like this:



public T this[params int[] indices]


In practice, every such call allocates an array, so you have to create a lot of fixed-arity overloads instead. The same goes for the other functions that work with indices.
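For illustration, here is a toy tensor with both a fixed-arity indexer and a params fallback (a sketch, not the library's code); with three literal indices the compiler picks the allocation-free overload:

```csharp
using System;

class Tensor3
{
    private readonly int[] data;
    private readonly int[] blocks;

    public Tensor3(int a, int b, int c)
    {
        data = new int[a * b * c];
        blocks = new[] { b * c, c, 1 };
    }

    // Fast path: no allocation per access
    public int this[int i, int j, int k]
    {
        get => data[i * blocks[0] + j * blocks[1] + k * blocks[2]];
        set => data[i * blocks[0] + j * blocks[1] + k * blocks[2]] = value;
    }

    // Fallback: allocates an int[] on every call
    public int this[params int[] indices]
    {
        get
        {
            var res = 0;
            for (int i = 0; i < indices.Length; i++)
                res += indices[i] * blocks[i];
            return data[res];
        }
    }

    static void Main()
    {
        var t = new Tensor3(3, 4, 5);
        t[1, 0, 4] = 42;                         // fixed-arity overload, no allocation
        Console.WriteLine(t[new[] { 1, 0, 4 }]); // params overload reads the same cell
    }
}
```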



Exceptions


I also put all the exceptions and checks inside an #if ALLOW_EXCEPTIONS block, for the case when you definitely need speed and are certain there are no problems with the indices. This gives a slight performance gain.



This is not just a micro-optimization bought at a high cost in safety. Suppose a query reaches your tensor after you have already, for your own reasons, validated the data. Why check it again? Checks are not free, especially when we are saving even single integer arithmetic operations.



Multithreading


Thanks to Billy, this turned out to be very simple: it is implemented via Parallel.For.



Multithreading is not a panacea, though, and must be enabled carefully. I ran a benchmark for pointwise addition of tensors on an i7-7700HQ:







[Benchmark chart] The Y-axis shows the time (in microseconds) for a single operation on two integer tensors; the X-axis shows the tensor size.



That is, there is a certain threshold above which multithreading makes sense. To avoid having to think about it, I added a Threading.Auto flag and simply hardcoded the volume constants above which multithreading is enabled (is there a smarter automatic way?).
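The Auto idea can be sketched like this (the threshold constant is invented for the example):

```csharp
using System;
using System.Threading.Tasks;

class AddDemo
{
    // Invented constant: below this volume, spinning up threads costs more than it saves
    const int ParallelThreshold = 100_000;

    public static int[] Add(int[] a, int[] b)
    {
        var res = new int[a.Length];
        if (a.Length < ParallelThreshold)
        {
            // Small tensors: a plain loop is faster
            for (int i = 0; i < a.Length; i++)
                res[i] = a[i] + b[i];
        }
        else
        {
            // Large tensors: each flat index is independent, so Parallel.For is safe
            Parallel.For(0, a.Length, i => res[i] = a[i] + b[i]);
        }
        return res;
    }

    static void Main()
        => Console.WriteLine(string.Join(",", Add(new[] { 1, 2 }, new[] { 3, 4 }))); // 4,6
}
```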



That said, the library is still no faster than game-math matrices, let alone anything computed on CUDA. But why would it be? Those have already been written; our main point is the custom element type.



Something like that



That's the short note; thanks for reading. The link to the project's GitHub is here. Its main user is the AngouriMath symbolic algebra library.



A little about our tensors in AngouriMath
In AngouriMath, a matrix is a tensor of Entity, so its elements can be arbitrary symbolic expressions. For example,



var t = MathS.Matrices.Matrix(3, 3,
              "A", "B", "C",   // instead of a variable, any expression works, e.g. "A * 3" or "sqrt(sin(x) + 5)"
              "D", "E", "F",
              "G", "H", "J");
Console.WriteLine(t.Determinant().Simplify());






A * (E * J - F * H) + C * (D * H - E * G) - B * (D * J - F * G)







