Data flow analysis is an integral part of any modern static code analyzer. However, from the outside it is not very clear what it is and, most importantly, why it is needed. Some people still associate static analysis with searching the code for certain patterns. That is why, from time to time, we write notes demonstrating how a particular technology used in the PVS-Studio analyzer helps reveal yet another interesting error. Today's article is one of them: we will look at a bug in one implementation of the Base64 binary data encoding standard.
It all started with checking the latest version of the Qt 6 library. That check got its own classic article, where I described 77 errors I had found. At first, I decided to skim through the report without hiding the warnings related to third-party libraries. In other words, I had not excluded \src\3rdparty in the settings. I immediately came across an interesting error in the Open Asset Import Library, and I decided to describe it in this separate short note.
This defect demonstrates why data flow analysis is useful in tools such as PVS-Studio. Without it, many errors simply cannot be found. By the way, if you would like to learn more about data flow analysis and other aspects of how the tool works, I recommend the article "Technologies used in the PVS-Studio code analyzer to find errors and potential vulnerabilities".
Now let's move on to the error found in the Open Asset Import Library (assimp). File: \src\3rdparty\assimp\src\code\FBX\FBXUtil.cpp.
std::string EncodeBase64(const char* data, size_t length)
{
    // calculate extra bytes needed to get a multiple of 3
    size_t extraBytes = 3 - length % 3;
    // number of base64 bytes
    size_t encodedBytes = 4 * (length + extraBytes) / 3;
    std::string encoded_string(encodedBytes, '=');
    // read blocks of 3 bytes
    for (size_t ib3 = 0; ib3 < length / 3; ib3++)
    {
        const size_t iByte = ib3 * 3;
        const size_t iEncodedByte = ib3 * 4;
        const char* currData = &data[iByte];
        EncodeByteBlock(currData, encoded_string, iEncodedByte);
    }
    // if size of data is not a multiple of 3,
    // also encode the final bytes (and add zeros where needed)
    if (extraBytes > 0)
    {
        char finalBytes[4] = { 0,0,0,0 };
        memcpy(&finalBytes[0], &data[length - length % 3], length % 3);
        const size_t iEncodedByte = encodedBytes - 4;
        EncodeByteBlock(&finalBytes[0], encoded_string, iEncodedByte);
        // add '=' at the end
        for (size_t i = 0; i < 4 * extraBytes / 3; i++)
            encoded_string[encodedBytes - i - 1] = '=';
    }
    return encoded_string;
}
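If you like, you can first try to find the error yourself. So that you don't accidentally read the answer right away, let me briefly explain what Base64 is :).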
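Ok, let's move on. The function above implements an algorithm that encodes a byte string in Base64. Base64 is a standard for encoding binary data using only 64 characters. Its alphabet consists of the Latin letters A-Z and a-z and the digits 0-9 (62 characters), plus 2 additional characters that vary between implementations. Every 3 source bytes are encoded as 4 characters.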
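If only one or two bytes remain at the end, they produce only the first two or three characters of a group, and the output is padded with one or two "=" characters. The padding character "=" prevents extra bits from being added to the decoded data. This is exactly the part that the function under discussion implements incorrectly.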
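Found the error? Well done. If not, that's fine too. You need to dig into the code to notice that something is wrong. The analyzer reports this "something wrong" with the following warning: V547 [CWE-571] Expression 'extraBytes > 0' is always true. FBXUtil.cpp 224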
To understand what the analyzer is concerned about, let's look at how the extraBytes variable is initialized:
// calculate extra bytes needed to get a multiple of 3
size_t extraBytes = 3 - length % 3;
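The programmer intended to calculate how many trailing bytes of input must be processed separately when their total number is not a multiple of 3. To get that, it is enough to take the number of bytes modulo 3. A correct initialization looks like this: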
size_t extraBytes = length % 3;
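Then, if, say, 5 bytes are processed, we get 5 % 3 = 2, so 2 more bytes need to be processed separately. If the input is 6 bytes, nothing needs to be processed separately, since 6 % 3 = 0.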
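However, the programmer may have meant the number of bytes missing to reach a multiple of three. In that case, the correct code looks like this: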
size_t extraBytes = (3 - length % 3) % 3;
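I won't try to guess which variant was intended. In any case, the programmer wrote a meaningless in-between version: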
size_t extraBytes = 3 - length % 3;
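When analyzing the initialization of extraBytes, the analyzer applies data flow analysis. Whatever value the length variable holds, the result of the modulo operation lies in the range [0..2]. The PVS-Studio analyzer can work with ranges, exact values, and sets of values. In other words, this is Value Range Analysis. In this case, it is a range of values that gets used.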
Let's continue the calculation:
size_t extraBytes = 3 - [0..2];
It turns out that the extraBytes variable can never be zero. The analyzer computes its possible range of values as [1..3].
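The variable is not modified anywhere before the check. So the analyzer warns us that the check result is always true, and the tool is absolutely right: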
if (extraBytes > 0)
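This is a simple but nice example. It shows how data flow analysis made it possible to compute the range of a variable's values, confirm that the variable does not change, and finally conclude that the condition is always true.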
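Of course, the function's misbehavior is not limited to executing a piece of code that should not run. Everything goes wrong there. Imagine you want to encode 6 bytes. The output string should then contain 8 characters. Let's quickly estimate how the function behaves: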
// calculate extra bytes needed to get a multiple of 3
size_t extraBytes = 3 - length % 3; // 3-6%3 = 3
// number of base64 bytes
size_t encodedBytes = 4 * (length + extraBytes) / 3; // 4*(6+3)/3 = 12
std::string encoded_string(encodedBytes, '=');
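The output string ends up with 12 characters instead of 8. From there on, everything goes wrong as well, and there is no point in going into details.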
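That's how easily static analysis found the error in the code. Just imagine how painful it would be to debug this and figure out why the Base64 encoding goes wrong. By the way, this raises the question of third-party code quality, which I have discussed in a separate article.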
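If you want to share this article with an English-speaking audience, please use the link to the translation: Andrey Karpov. Why PVS-Studio Uses Data Flow Analysis: Based on Gripping Error in Open Asset Import Library.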