Checking Clang 11 with PVS-Studio

PVS-Studio: Still worthy!


From time to time we have to write articles about checking the next version of a compiler. It is not interesting. However, as practice shows, if this is not done for a long time, people begin to doubt whether the PVS-Studio analyzer deserves the title of a good catcher of bugs and potential vulnerabilities. Perhaps the new compiler already knows how to do this? Yes, compilers do not stand still. However, PVS-Studio is also developing, again and again demonstrating the ability to find errors even in the code of such high-quality projects as compilers.



Time to double-check the Clang code



To be honest, I took the article " Checking the GCC 10 Compiler Using PVS-Studio " as the basis for this text . So if it seems to you that you've already read some paragraphs somewhere, then you don't think :).



It's no secret that compilers have their own built-in static code analyzers, and they are also being developed. Therefore, from time to time we write articles about how the PVS-Studio static analyzer can find errors even inside compilers and that we eat bread for a reason :).



In fact, you cannot compare classical static analyzers with compilers. Static analyzers are not only about finding errors in the code, but also a developed infrastructure. For example, this is integration with systems such as SonarQube, PlatformIO, Azure DevOps, Travis CI, CircleCI, GitLab CI / CD, Jenkins, Visual Studio. These are advanced mechanisms for mass suppression of warnings, which allows you to quickly start using PVS-Studio even in a large old project. This is a notification distribution. And so on and so forth. However, it is still the first question that is asked: "Can PVS-Studio find something that compilers cannot find?" This means that we will again and again write articles about checking these compilers themselves.



Let's get back to the topic of checking the Clang project. There is no need to dwell on this project and tell what it is. In fact, the code of not only Clang 11 itself was tested, but also the LLVM 11 library on which it is built. From the point of view of this article, it makes no difference whether a defect is found in the compiler or library code.



The Clang / LLVM code seemed to me much clearer than the GCC code. At least all these terrible macros are missing, and the modern features of the C ++ language are actively used.



Despite this, the project is large, and without preliminary settings of the analyzer it is still very tedious to view the report. Basically, half-false positives interfere. By "semi-false" positives, I mean situations when the analyzer is formally right, but there is no sense in warning. For example, a lot of such positives are issued for unit tests and generated code.



Example for tests:



Spaces.SpacesInParentheses = false;               // <=
Spaces.SpacesInCStyleCastParentheses = true;      // <=
verifyFormat("Type *A = ( Type * )P;", Spaces);
verifyFormat("Type *A = ( vector<Type *, int *> )P;", Spaces);
verifyFormat("x = ( int32 )y;", Spaces);
verifyFormat("int a = ( int )(2.0f);", Spaces);
verifyFormat("#define AA(X) sizeof((( X * )NULL)->a)", Spaces);
verifyFormat("my_int a = ( my_int )sizeof(int);", Spaces);
verifyFormat("#define x (( int )-1)", Spaces);

// Run the first set of tests again with:
Spaces.SpacesInParentheses = false;               // <=
Spaces.SpaceInEmptyParentheses = true;
Spaces.SpacesInCStyleCastParentheses = true;      // <=
verifyFormat("call(x, y, z);", Spaces);
verifyFormat("call( );", Spaces);


The analyzer warns that the variables are assigned the same values ​​that they already contain:



  • V1048 The 'Spaces.SpacesInParentheses' variable was assigned the same value. FormatTest.cpp 11554
  • V1048 The 'Spaces.SpacesInCStyleCastParentheses' variable was assigned the same value. FormatTest.cpp 11556


Formally, the analyzer returned the correct response, and this is the code fragment that should be simplified or corrected. At the same time, it is clear that, in fact, everything is fine and that there is no point in editing something either.



Another example: the analyzer issues a huge number of warnings to the auto-generated Options.inc file. You can see the "code sheet" in the file:



The code


And PVS-Studio sends warnings to all this:



  • V501 There are identical sub-expressions to the left and to the right of the '==' operator: nullptr == nullptr Options.inc 26
  • V501 There are identical sub-expressions to the left and to the right of the '==' operator: nullptr == nullptr Options.inc 27
  • V501 There are identical sub-expressions to the left and to the right of the '==' operator: nullptr == nullptr Options.inc 28
  • and so on for each line ...


All this is not scary. All this can be defeated: disable checking of unnecessary files, mark up some macros and functions, suppress certain types of alarms, and so on. It is possible, but doing this as part of the task of writing an article is not interesting. Therefore, I did exactly the same as in the article about the GCC compiler. I studied the report until I had 11 interesting code examples to write an article. Why 11? I thought that since the Clang version is 11, then let the fragments be 11 :).



11 suspicious code snippets



Fragment N1, modulo division by one



% 1 - modulo division by 1


Cool mistake! I love these!



void Act() override {
  ....
  // If the value type is a vector, and we allow vector select, then in 50%
  // of the cases generate a vector select.
  if (isa<FixedVectorType>(Val0->getType()) && (getRandom() % 1)) {
    unsigned NumElem =
        cast<FixedVectorType>(Val0->getType())->getNumElements();
    CondTy = FixedVectorType::get(CondTy, NumElem);
  }
  ....
}


PVS-Studio warning: V1063 The modulo by 1 operation is meaningless. The result will always be zero. llvm-stress.cpp 631 Modulo



division is used to get a random value of 0 or 1. But, apparently, this value of 1 is confusing, and people make the classic pattern of error, dividing by one, although it is necessary to divide by two. The operation X% 1 is meaningless, since the result is always 0 . Correct code variant:



if (isa<FixedVectorType>(Val0->getType()) && (getRandom() % 2)) {


Diagnostics V1063, which recently appeared in PVS-Studio, is outrageously simple, but, as you can see, it works.



As we know, compiler developers look at what we are doing and borrow our best practices. There is nothing wrong with that. We are pleased that PVS-Studio is the engine of progress . We set the time after how much the same diagnostics will appear in Clang and GCC :).



Fragment N2, typo in the condition



class ReturnValueSlot {
  ....
  bool isNull() const { return !Addr.isValid(); }
  ....
};

static bool haveSameParameterTypes(ASTContext &Context, const FunctionDecl *F1,
                                   const FunctionDecl *F2, unsigned NumParams) {
  ....
  unsigned I1 = 0, I2 = 0;
  for (unsigned I = 0; I != NumParams; ++I) {
    QualType T1 = NextParam(F1, I1, I == 0);
    QualType T2 = NextParam(F2, I2, I == 0);
    if (!T1.isNull() && !T1.isNull() && !Context.hasSameUnqualifiedType(T1, T2))
      return false;
  }
  return true;
}


PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the '&&' operator:! T1.isNull () &&! T1.isNull () SemaOverload.cpp 9493 Checking



twice ! T1.isNull () . This is an obvious typo, and the second part of the condition should check the T2 variable .



Fragment N3, potential out of bounds of the array



std::vector<Decl *> DeclsLoaded;

SourceLocation ASTReader::getSourceLocationForDeclID(GlobalDeclID ID) {
  ....
  unsigned Index = ID - NUM_PREDEF_DECL_IDS;

  if (Index > DeclsLoaded.size()) {
    Error("declaration ID out-of-range for AST file");
    return SourceLocation();
  }

  if (Decl *D = DeclsLoaded[Index])
    return D->getLocation();
  ....
}


PVS-Studio warning: V557 Array overrun is possible. The 'Index' index is pointing beyond array bound. ASTReader.cpp 7318



Suppose the array contains one element, and the Index variable is also one. The condition (1> 1) is false, and as a result the array will be overrun. Correct check:



if (Index >= DeclsLoaded.size()) {


Fragment N4, the order of evaluation of arguments



void IHexELFBuilder::addDataSections() {
  ....
  uint32_t SecNo = 1;
  ....
  Section = &Obj->addSection<OwnedDataSection>(
      ".sec" + std::to_string(SecNo++), RecAddr,
      ELF::SHF_ALLOC | ELF::SHF_WRITE, SecNo);
  ....
}


PVS-Studio warning: V567 Unspecified behavior. The order of argument evaluation is not defined for 'addSection' function. Consider inspecting the 'SecNo' variable. Object.cpp 1223



Note that the SecNo argument is used twice and incremented along the way. It's just impossible to say in what order the arguments will be evaluated. Therefore, the result will vary depending on the compiler version or compilation settings.



Let me explain this with a synthetic example:



#include <cstdio>
int main()
{
  int i = 1;
  printf("%d, %d\n", i, i++);
  return 0;
}


Depending on the compiler, both "1, 2" and "2, 1" can be printed. Using the Compiler Explorer, I get the following results:



  • a program compiled with Clang 11.0.0 produces : 1, 1.
  • a program compiled with GCC 10.2 produces : 2, 1.


Interestingly, for this simple case, the Clang compiler issues a warning:



<source>:6:26: warning:
unsequenced modification and access to 'i' [-Wunsequenced]
printf("%d, %d\n", i, i++);


Apparently, in real conditions, this warning did not help. Diagnostics is either disabled, as it is inconvenient for practical use, or the compiler was unable to warn about a more complex case.



Fragment N5, strange retest



template <class ELFT>
void GNUStyle<ELFT>::printVersionSymbolSection(const ELFFile<ELFT> *Obj,
                                               const Elf_Shdr *Sec) {

  ....
  Expected<StringRef> NameOrErr =
      this->dumper()->getSymbolVersionByIndex(Ndx, IsDefault);
  if (!NameOrErr) {
    if (!NameOrErr) {
      unsigned SecNdx = Sec - &cantFail(Obj->sections()).front();
      this->reportUniqueWarning(createError(
          "unable to get a version for entry " + Twine(I) +
          " of SHT_GNU_versym section with index " + Twine(SecNdx) + ": " +
          toString(NameOrErr.takeError())));
    }
    Versions.emplace_back("<corrupt>");
    continue;
  }
  ....
}


PVS-Studio warning: V571 Recurring check. The 'if (! NameOrErr)' condition was already verified in line 4666. ELFDumper.cpp 4667



The second check duplicates the first and is redundant. Perhaps the second check can simply be removed. But this is most likely a typo, and a different variable must be used in the second condition.



Snippet N6, dereferencing a potentially null pointer



void RewriteObjCFragileABI::RewriteObjCClassMetaData(
  ObjCImplementationDecl *IDecl, std::string &Result)
{
  ObjCInterfaceDecl *CDecl = IDecl->getClassInterface();

  if (CDecl->isImplicitInterfaceDecl()) {
    RewriteObjCInternalStruct(CDecl, Result);
  }

  unsigned NumIvars = !IDecl->ivar_empty()
  ? IDecl->ivar_size()
  : (CDecl ? CDecl->ivar_size() : 0);
  ....
}


PVS-Studio warning: V595 The 'CDecl' pointer was utilized before it was verified against nullptr. Check lines: 5275, 5284. RewriteObjC.cpp 5275



During the first check, the CDecl pointer is always boldly dereferenced:



if (CDecl->isImplicitInterfaceDecl())


And only from the code written below it becomes clear that this pointer can be null:



(CDecl ? CDecl->ivar_size() : 0)


Most likely, the first check should look like this:



if (CDecl && CDecl->isImplicitInterfaceDecl())


Snippet N7, dereferencing a potentially null pointer



bool
Sema::InstantiateClass(....)
{
  ....
  NamedDecl *ND = dyn_cast<NamedDecl>(I->NewDecl);
  CXXRecordDecl *ThisContext =
      dyn_cast_or_null<CXXRecordDecl>(ND->getDeclContext());
  CXXThisScopeRAII ThisScope(*this, ThisContext, Qualifiers(),
                              ND && ND->isCXXInstanceMember());
  ....
}


PVS-Studio warning: V595 The 'ND' pointer was utilized before it was verified against nullptr. Check lines: 2803, 2805. SemaTemplateInstantiate.cpp 2803



A variation of the previous error. It is dangerous to dereference a pointer without first checking it if its value is obtained using a dynamic cast. Moreover, from the code below, it is clear that such a check is needed.



Fragment N8, the function continues execution despite an error state



bool VerifyObject(llvm::yaml::Node &N,
                  std::map<std::string, std::string> Expected) {
  ....
  auto *V = llvm::dyn_cast_or_null<llvm::yaml::ScalarNode>(Prop.getValue());
  if (!V) {
    ADD_FAILURE() << KS << " is not a string";
    Match = false;
  }
  std::string VS = V->getValue(Tmp).str();
  ....
}


PVS-Studio warning: V1004 The 'V' pointer was used unsafely after it was verified against nullptr. Check lines: 61, 65. TraceTests.cpp 65



Pointer V may be null. This is clearly an erroneous condition, which is even reported. However, after that, the function continues execution as if nothing had happened, which will lead to dereferencing of this very null pointer. Perhaps they forgot to interrupt the function, and the correct option should look like this:



auto *V = llvm::dyn_cast_or_null<llvm::yaml::ScalarNode>(Prop.getValue());
if (!V) {
  ADD_FAILURE() << KS << " is not a string";
  Match = false;
  return false;
}
std::string VS = V->getValue(Tmp).str();


Snippet N9, typo



const char *tools::SplitDebugName(const ArgList &Args, const InputInfo &Input,
                                  const InputInfo &Output) {
  if (Arg *A = Args.getLastArg(options::OPT_gsplit_dwarf_EQ))
    if (StringRef(A->getValue()) == "single")
      return Args.MakeArgString(Output.getFilename());

  Arg *FinalOutput = Args.getLastArg(options::OPT_o);
  if (FinalOutput && Args.hasArg(options::OPT_c)) {
    SmallString<128> T(FinalOutput->getValue());
    llvm::sys::path::replace_extension(T, "dwo");
    return Args.MakeArgString(T);
  } else {
    // Use the compilation dir.
    SmallString<128> T(
        Args.getLastArgValue(options::OPT_fdebug_compilation_dir));
    SmallString<128> F(llvm::sys::path::stem(Input.getBaseInput()));
    llvm::sys::path::replace_extension(F, "dwo");
    T += F;
    return Args.MakeArgString(F);       // <=
  }
}


PVS-Studio warning: V1001 The 'T' variable is assigned but is not used by the end of the function. CommonArgs.cpp 873



Notice the end of the function. The local variable T is changed but not used in any way. Most likely, this is a typo and the function should end with the following lines of code:



T += F;
return Args.MakeArgString(T);


Fragment N10, divisor is zero



typedef int32_t si_int;
typedef uint32_t su_int;

typedef union {
  du_int all;
  struct {
#if _YUGA_LITTLE_ENDIAN
    su_int low;
    su_int high;
#else
    su_int high;
    su_int low;
#endif // _YUGA_LITTLE_ENDIAN
  } s;
} udwords;

COMPILER_RT_ABI du_int __udivmoddi4(du_int a, du_int b, du_int *rem) {
  ....
  if (d.s.low == 0) {
    if (d.s.high == 0) {
      // K X
      // ---
      // 0 0
      if (rem)
        *rem = n.s.high % d.s.low;
      return n.s.high / d.s.low;
    }
  ....
}


PVS-Studio warnings:



  • V609 Mod by zero. Denominator 'dslow' == 0.udivmoddi4.c 61
  • V609 Divide by zero. Denominator 'dslow' == 0.udivmoddi4.c 62


I don’t know if this is a mistake or some tricky idea, but the code is very strange. There are two ordinary integer variables, and one is divisible by the other. Interestingly, this only happens if both variables are zero. What does all this mean?



Fragment N11, Copy-Paste



bool MallocChecker::mayFreeAnyEscapedMemoryOrIsModeledExplicitly(....)
{
  ....
  StringRef FName = II->getName();
  ....
  if (FName == "postEvent" &&
      FD->getQualifiedNameAsString() == "QCoreApplication::postEvent") {
    return true;
  }

  if (FName == "postEvent" &&
      FD->getQualifiedNameAsString() == "QCoreApplication::postEvent") {
    return true;
  }
  ....
}


PVS-Studio warning: V581 The conditional expressions of the 'if' statements situated alongside each other are identical. Check lines: 3108, 3113. MallocChecker.cpp 3113



The code snippet was copied but not changed in any way. The second snippet should either be removed or modified to begin performing some useful check.



Conclusion





Let me remind you that you can use this free license option to check open source projects . By the way, there are other options for free licensing PVS-Studio, including even for closed projects. They are listed here: " Free PVS-Studio licensing options ". Thank you for attention.



Our other articles about compiler checking









If you want to share this article with an English-speaking audience, please use the translation link: Andrey Karpov. Checking Clang 11 with PVS-Studio .



All Articles