Programmers need mathematics, or a problem I had to solve

Hello!

I am working on WebRTC - a framework for audio-video conferencing (or calls? In other words - real time communication). In this article I want to describe an interesting problem and how it was solved. In the problem, in fact, it was required to minimize the lcm of several real numbers with additional restrictions. I had to apply quite a bit of number theory or at least logic.

If you are only interested in the problem, then you can safely skip to the section "Formulation of the problem". The next section explains where it came from and what its meaning is.

Introduction

Clients can configure WebRTC to encode the incoming stream in multiple resolutions at once. For example, this can be useful in video conferencing: each client sends several streams to the server with different resolutions and bit rates, and the server sends to everyone else only the stream that fits in the bandwidth to the client.

But you can't just set the desired permissions, no — that would be too easy. The fact is that a source (for example, a camera in chrome) can produce video of any resolution. And there is also a feedback mechanism, and with a high load on the CPU, the incoming resolution decreases. In short, the user sets the scaling factors S_i \ ge 1.0. Then the incoming frame is compressed a specified number of times, encoded and sent over the network to recipients.

The problem is that some encoders don't work with arbitrary images - they definitely need even sizes. And there are also all sorts of optimizations when encoding, if the resolution ratio for different images is whole. And most importantly, if different streams have different aspect ratios, then when switching between them there will be a very noticeable jerk. Therefore, it is necessary that the incoming resolution is completely divided by all coefficients.

, , : alignment. , {1.0, 2.0, 4.0} , alignment=8. - . , . , , 8 1, 2 4 , .

, {1, 1.7, 2.3}? , "" - 391. , 782. , , 782. , VGA (640x480) . - , , , -, , -, .

, , , ? , {1, 1.6, 2.4} {1, 1.7, 2.3} 48 ( 782). , .

: d \ in \ mathbb {N}, \ S_i \ ge 1, S_i \ in \ mathbb {R}, i = 1..n

: A \ in \ mathbb {N}, \ A \ le MaxA, \ S'_i \ in \ mathbb {R}, \ S'_i \ ge 1, \ i = 1..n

:

\ sum_ {i = 1} ^ n \ left (S_i -S'_i \ right) ^ 2 \ rightarrow min\ frac {A} {S'_i \ cdot d} \ in \ mathbb {N}, i = 1..n \ \ \ \ \ \ \ \ \ \ (1)

: - , , .

, - -. A . MaxA MaxA ( 16). - A , .

- , (1), . i- .

A / (S'_i \ cdot d), A, d \ in \ mathbb {N}, , S'_i \ in \ mathbb {Q}S'_i = N_i / D_i. .

, : GCD (N_i, D_i) = 1

(1) \ frac {A \ cdot D_i} {N_i \ cdot d} \ in \ mathbb {N} ,

N_i \ cdot d \ vert A \ cdot D_i \ \ \ \ \ \ \ (2)

( : ).

. N_i D_i , . N_i N_i \ vert A,

A = N_i \ cdot f, \ f \ in \ mathbb {N} \ \ \ \ \ \ (3)

, (2) f:

f \ cdot N_i \ cdot d \ vert f \ cdot A \ cdot D_i

(3) A:

A \ cdot d \ vert f \ cdot A \ cdot D_i

A

d \ vert f \ cdot D_i

, :

f \ cdot D_i = k \ cdot d, k \ in \ mathbb {N} \ \ \ \ \ \ \ \ \ \ (4)

S'_i f (3) (4):

S'_i = \ frac {N_i \ cdot f} {D_i \ cdot f} = \ frac {A} {f \ cdot D_i} = \ frac {A} {k \ cdot d}, \ \ k \ in \ mathbb {N} \ \ \ \ \ \ \ \ (5)

, S'_i 1 ( ) :

k \ le \ frac {A} {d} \ \ \ \ \ \ \ (6)

, (1) (5) (6), , A, d . . (6) , .

. , k  , 0, k ^ * = \ frac {A} {S_i \ cdot d}  . , 2 k = min \ {\ lfloor k ^ * \ rfloor, \ \ lceil k ^ * \ rceil \}   . , - , . . , (6).

, ( ):

const int kMaxAlignment = 16;

//    scale_factor (S_i)   
//   (d)     (A).
//     error_acc.
float GetApprox(int encoder_alignment, int requested_alignment, 
                float scale_factor, float *error_acc) {
  int k = static_cast<int> ((requested_alignment + 0.0) / 
                            (encoder_alignment * scale_factor));
  float best_error = 1e90;
  float best_approx = 1.0;
  for (int i = 0; i < 2; i++, k++) {
    if (k == 0 || k * encoder_alignment > requested_alignment) continue;
    float approx = (requested_alignment +0.0) / (k * encoder_alignment);
    float error = (approx - scale_factor) * (approx - scale_factor);
    if (error < best_error) {
      best_error = error;
      best_approx = approx;
    }
  }
  *error_acc += best_error;
  return best_approx;
}

//  .    (S'_i)
//    (A)   requested_alignment.
std::vector<float> CalulateAlignmentAndScaleFactors(
    int encoder_alignment, std::vector<float> scale_factors, 
    int *requested_alignment) {
  float best_error = 1e90;
  int best_alignment = 1;
  std::vector<float> best_factors;
  std::vector<float> cur_factors;
  
  for (int a = 1; a <= kMaxAlignment; ++a) {
    float cur_error = 0;
    cur_factors.clear();
    for (float factor: scale_factors) {
      float approx = GetApprox(encoder_alignment, a, factor, &cur_error);
      cur_factors.push_back(approx);
    }
    if (cur_error < best_error) {
      best_error = cur_error;
      best_factors = cur_factors;
      best_alignment = a;
    }
  }
  *requested_alignment = best_alignment;
  return best_factors;
}

, , . , . , .

Yes, without mathematics, you can still convince yourself that the coefficients issued by this code will fit the condition of the problem (the numerator divides the calculated alignment, so share everything entirely, and the denominator gives divisibility by the required alignment for the encoder). But without the chain of reasoning (1) => (4), (5) it is generally unclear how this code finds the optimal solution.




All Articles