Why it is important to conduct static analysis of open source libraries that you add to your project

PVS-Studio and Awesome header-only C++ libraries


Modern applications are built from third-party libraries like a house from bricks. This is normal, and it is the only way to complete a project within a reasonable time and budget. However, grabbing every brick indiscriminately may not be such a good idea. If there are several options, it is worth taking the time to analyze the open libraries in order to pick the highest-quality one.



Collection "Awesome header-only C ++ libraries"



The story of this article began with the CppCast podcast episode "Cross Platform Mobile Telephony". From it, I learned about the "awesome-hpp" list, which collects a large number of open C++ libraries consisting only of header files.



This list interested me for two reasons. First, it is an opportunity to expand the pool of projects on which we test our PVS-Studio analyzer with modern code: many of these projects are written in C++11, C++14, and C++17. Second, it is an opportunity to write an article about checking them.



The projects are small, so each of them contains few bugs individually. There are also few warnings, because some bugs can only be detected once template classes or functions are instantiated in user code; until then, it is often impossible to tell whether there is an error or not. Nevertheless, in total there turned out to be quite a few bugs, and I will write about them in the next article. This article is not about those bugs, but about a word of caution.
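To illustrate the point about templates, here is a contrived sketch (not taken from any of the checked libraries). Whether the check below makes sense depends entirely on the type the template is instantiated with, so an analyzer has little to report until user code actually uses the function:


template <typename Index>
bool is_valid(Index index, Index size)
{
  // Sensible for signed index types, but 'index >= 0' is always true
  // (and therefore suspicious) if Index turns out to be an unsigned type.
  return index >= 0 && index < size;
}


An instantiation such as is_valid<size_t>(i, n) is what makes the meaningless comparison concrete enough to flag.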



Why analyze



By using third-party libraries, you unconditionally trust them with part of the work and computations. The danger is that programmers sometimes pick a library without considering that errors may lurk not only in their own code but also in the code of the library itself. As a result, non-obvious, puzzling errors appear that can manifest themselves in the most unexpected ways.



The code of well-known open source libraries is well debugged, and the probability of running into an error there is much lower than in similar code written by yourself. The trouble is that not all libraries are widely used and debugged, and this is where the question of assessing their quality arises.



To make this clearer, let's look at an example. Let's take the JSONCONS library.
JSONCONS is a C++ header-only library for constructing JSON and JSON-like data formats such as CBOR.
A specialized library for specialized tasks. It may work well on the whole, and you might never see a bug in it. But woe betide you if you ever need to use its overloaded operator <<=.



static constexpr uint64_t basic_type_bits = sizeof(uint64_t) * 8;
....
uint64_t* data() 
{
  return is_dynamic() ? dynamic_stor_.data_ : short_stor_.values_;
}
....
basic_bigint& operator<<=( uint64_t k )
{
  size_type q = (size_type)(k / basic_type_bits);
  if ( q ) // Increase common_stor_.length_ by q:
  {
    resize(length() + q);
    for (size_type i = length(); i-- > 0; )
      data()[i] = ( i < q ? 0 : data()[i - q]);
    k %= basic_type_bits;
  }
  if ( k )  // 0 < k < basic_type_bits:
  {
    uint64_t k1 = basic_type_bits - k;
    uint64_t mask = (1 << k) - 1;             // <=
    resize( length() + 1 );
    for (size_type i = length(); i-- > 0; )
    {
      data()[i] <<= k;
      if ( i > 0 )
        data()[i] |= (data()[i-1] >> k1) & mask;
    }
  }
  reduce();
  return *this;
}


PVS-Studio analyzer warning: V629 Consider inspecting the '1 << k' expression. Bit shifting of the 32-bit value with a subsequent expansion to the 64-bit type. bigint.hpp 744



As I understand it, the function works with large numbers that are stored as an array of 64-bit elements. To work with particular bits, a 64-bit mask has to be formed:



uint64_t mask = (1 << k) - 1;


But this mask is formed incorrectly. Since the numeric literal 1 is of type int, shifting it by more than 31 bits results in undefined behavior.
From the standard:

shift-expression << additive-expression

...

2. The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 * 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 * 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
The value of the mask variable can turn out to be anything. Yes, I know that, theoretically, anything at all can happen because of UB. But in practice, we are most likely simply dealing with an incorrect result of the expression.
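For reference, here is a minimal sketch of what the mask was presumably meant to look like; this is my assumption, not the library's actual patch. At this point k is guaranteed to be in the range 1..63 (either it was below basic_type_bits to begin with, or it has been reduced by k %= basic_type_bits), so a shift of an unsigned 64-bit value is well-defined:


// A sketch of the presumable intent, not the library's actual fix:
// promote the literal to 64 bits before shifting.
uint64_t mask = (uint64_t{1} << k) - 1;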



So, we have a function that cannot be used safely. Or rather, it will work only for certain values of the input argument. This is a potential trap a programmer can fall into: the program may run and pass various tests, and then unexpectedly fail on other input data.



You can also see another error of the same kind in operator >>=.



A rhetorical question: should I trust this library?



Perhaps it is. After all, there are bugs in any project. However, it is worth considering: if these errors exist, are there others that could lead to nasty data corruption? Wouldn't it be better to prefer a more popular, better-tested library if there are several to choose from?



An unconvincing example? OK, let's take another one: the Universal math library. The library is expected to provide the ability to operate on vectors, for example, to multiply and divide a vector by a scalar value. Let's see how these operations are implemented. Multiplication:



template<typename Scalar>
vector<Scalar> operator*(double scalar, const vector<Scalar>& v) {
  vector<Scalar> scaledVector(v);
  scaledVector *= scalar;
  return v;
}


PVS-Studio analyzer warning: V1001 The 'scaledVector' variable is assigned but is not used by the end of the function. vector.hpp 124



Due to a typo, it is not the new scaledVector container that is returned, but the original vector. The same error occurs in the division operator. Facepalm.
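The fix is presumably a one-liner (my sketch; the actual upstream patch may differ): return the scaled copy rather than the argument.


template<typename Scalar>
vector<Scalar> operator*(double scalar, const vector<Scalar>& v) {
  vector<Scalar> scaledVector(v);
  scaledVector *= scalar;
  return scaledVector;   // return the scaled copy, not the original vector
}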



Again, these errors do not mean much on their own. Although, no, they are a hint that this library is little used and that there is a high probability of other serious bugs in it that simply haven't been noticed yet.



The takeaway: if several libraries provide the same functionality, it is worth carrying out a preliminary analysis of their quality and choosing the most tested and reliable one.



How to analyze



OK, so we want to assess the quality of a library's code, but how do we do that? It's not easy. You can't just go and review the code. Or rather, you can look through it, but that will tell you little, and such a review is unlikely to help you estimate the error density in the project.



Let's return to the previously mentioned Universal math library. Try to find the error in the code of the following function. Actually, given the accompanying comment, I simply can't walk past this spot :).



// subtract module using SUBTRACTOR: CURRENTLY BROKEN FOR UNKNOWN REASON




template<size_t fbits, size_t abits>
void module_subtract_BROKEN(const value<fbits>& lhs, const value<fbits>& rhs,
                            value<abits + 1>& result) {
  if (lhs.isinf() || rhs.isinf()) {
    result.setinf();
    return;
  }
  int lhs_scale = lhs.scale(),
      rhs_scale = rhs.scale(),
      scale_of_result = std::max(lhs_scale, rhs_scale);

  // align the fractions
  bitblock<abits> r1 = lhs.template nshift<abits>(lhs_scale-scale_of_result+3);
  bitblock<abits> r2 = rhs.template nshift<abits>(rhs_scale-scale_of_result+3);
  bool r1_sign = lhs.sign(), r2_sign = rhs.sign();

  if (r1_sign) r1 = twos_complement(r1);
  if (r1_sign) r2 = twos_complement(r2);

  if (_trace_value_sub) {
    std::cout << (r1_sign ? "sign -1" : "sign  1") << " scale "
      << std::setw(3) << scale_of_result << " r1       " << r1 << std::endl;
    std::cout << (r2_sign ? "sign -1" : "sign  1") << " scale "
      << std::setw(3) << scale_of_result << " r2       " << r2 << std::endl;
  }

  bitblock<abits + 1> difference;
  const bool borrow = subtract_unsigned(r1, r2, difference);

  if (_trace_value_sub) std::cout << (r1_sign ? "sign -1" : "sign  1")
    << " borrow" << std::setw(3) << (borrow ? 1 : 0) << " diff    "
    << difference << std::endl;

  long shift = 0;
  if (borrow) {   // we have a negative value result
    difference = twos_complement(difference);
  }
  // find hidden bit
  for (int i = abits - 1; i >= 0 && difference[i]; i--) {
    shift++;
  }
  assert(shift >= -1);

  if (shift >= long(abits)) {            // we have actual 0 
    difference.reset();
    result.set(false, 0, difference, true, false, false);
    return;
  }

  scale_of_result -= shift;
  const int hpos = abits - 1 - shift;         // position of the hidden bit
  difference <<= abits - hpos + 1;
  if (_trace_value_sub) std::cout << (borrow ? "sign -1" : "sign  1")
    << " scale " << std::setw(3) << scale_of_result << " result  "
    << difference << std::endl;
  result.set(borrow, scale_of_result, difference, false, false, false);
}


I'm sure that, even though I've told you there is an error in this code, it is not easy to find.



If you haven't found it, here it is. PVS-Studio warning: V581 The conditional expressions of the 'if' statements situated alongside each other are identical. Check lines: 789, 790. value.hpp 790



if (r1_sign) r1 = twos_complement(r1);
if (r1_sign) r2 = twos_complement(r2);


A classic typo. The second condition should check the r2_sign variable.
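Presumably the intended code looks like this (a guess based on the variable names; the maintainers may fix it differently):


if (r1_sign) r1 = twos_complement(r1);
if (r2_sign) r2 = twos_complement(r2);   // each operand is negated according to its own sign flag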



In general, you can forget about "manual" code review. Yes, such an approach is possible, but it is unreasonably time-consuming.



What do I suggest? Something very simple: use static code analysis.



Check the libraries you intend to use. Start looking through the reports, and things will become clear quickly enough.



You don't even need a deep, thorough analysis, and you don't need to filter out false positives. Just go through the report and examine the warnings. Be patient with the false positives caused by the lack of configuration and concentrate on the errors.



However, false positives can also be taken into account, if indirectly. The more of them there are, the messier the code is. In other words, the code contains a lot of tricks that confuse the analyzer. They confuse the people maintaining the project as well and, as a result, negatively affect its quality.



Note: don't forget about project size. A big project will always have more bugs, but the number of errors is not the same as the error density. For example, 100 warnings in a million-line project indicate a lower warning density than 20 warnings in a ten-thousand-line one. Keep this in mind when comparing projects of different sizes and make adjustments.



What to use



There are many static code analysis tools. I naturally suggest using the PVS-Studio analyzer. It works well both for a one-time assessment of code quality and for regularly finding and fixing bugs.



You can check the code of projects written in C, C++, C#, and Java. The product is proprietary; however, a free trial license will be more than enough to evaluate the quality of several open source libraries.



I also remind you that there are several free licensing options for the analyzer:





Conclusion



The methodology of static code analysis is still undeservedly underestimated by many programmers. A likely reason is experience with simple, noisy tools of the "linter" class, which perform very basic and, unfortunately, often not very useful checks.



For those who doubt whether it is worth introducing a static analyzer into the development process, here are two publications:





Thank you for your attention, and I wish you fewer bugs both in your own code and in the code of the libraries you use :).







