PVS-Studio for Java under the hood: diagnostics development



For a change, today we will tell you a little about the process of developing and finalizing diagnostic rules for PVS-Studio Java. Let's see why the old analyzer triggers don't float too much from release to release, and the new ones are not too crazy. And we will spoil a little more "what are the plans of the javists" and show a couple of beautiful (and not so) errors found with the help of diagnostics from the next release.



Diagnostics and SelfTester Development Process



Naturally, each new diagnostic rule starts with an idea. And since the Java analyzer is the youngest direction in PVS-Studio's development, we basically steal these ideas from the C / C ++ and C # departments. But not everything is so bad: we also add rules that were invented by ourselves (including by users - thanks!), So that later the same departments would steal them from us. The cycle, as they say.



The very implementation of the rules in the code in most cases turns out to be quite a pipeline task. You create a file with several synthetic examples, mark with your hands where errors should be, and with a debugger at the ready you run through the syntax tree until you get bored and cover all the invented cases. Sometimes the rules turn out to be absurdly simple (for example, V6063consists of literally a few lines), and sometimes you have to think long enough about the logic.



However, this is just the beginning. As you know, we do not particularly like synthetic examples, since they very poorly reflect the type of analyzer triggers in real projects. By the way, a significant part of these examples in our unit tests are drawn from real projects - it's almost impossible to invent all possible cases on your own. And unit tests also allow us not to lose triggers on examples from the documentation. There were precedents, yes, only shh.



So, positives in real projects must be found in some way first. And you also need to somehow check that:



  • The rule will not fall on open-source madness, where "interesting" solutions are common;
  • ( - , );
  • data-flow ( ) - ;
  • open-source ;
  • over 9000%;
  • "" , ;
  • .


In general, here, like a knight on a horse (a little limping, but we are working on it), SelfTester comes to the fore. Its main and only task is to automatically check a bunch of projects and show which triggers have been added, disappeared or changed relative to the "reference" in the version control system. Provide diffs for the analyzer report and show the corresponding code in projects, in short. Currently, SelfTester for Java is testing 62 open-source projects of bearded versions, among which, for example, DBeaver, Hibernate and Spring are listed. One full run of all projects takes 2-2.5 hours, which is undoubtedly painful, but nothing can be done.





In the screenshot above, the "green" projects are those in which nothing has changed. Each diff in "red" projects is reviewed manually and, if correct, is confirmed by the same "Approve" button. By the way, the analyzer distribution kit will only be compiled if SelfTester gives a purely green result. In general, this is how we maintain the consistency of results between different versions.



In addition to maintaining consistency of results, SelfTester allows us to get rid of a huge number of false positives even before the release of diagnostics. A typical development pattern looks like this:



  • , . , " double-checked locking" ;
  • SelfTester-, ;
  • , -;
  • SelfTester- , ;
  • 3-4, ;
  • , , ( , );
  • , master.


Fortunately, full SelfTester runs are rare enough, and you don't have to wait "2-2.5 hours" very often. From time to time, luck bypasses, and triggers turn out to be on large projects like Sakai and Apache Hive - it's time to drink coffee, drink coffee and drink coffee. You can also study the documentation, but this is not for everybody.



"Why do we need unit tests, since there is such a magic tool?"



And then that tests are significantly faster. A few minutes - and there is already a result. They also allow you to see exactly which part of the rule fell off. And, besides, not always all permissible triggerings of any rule are caught in SelfTester projects, but their operability must also be checked.



New problems in old acquaintances



Initially, this section of the article began with the words "The versions of projects in SelfTester are quite old, so most of the errors presented have most likely already been fixed." However, when I decided to make sure of this, I was in for a surprise. Every single mistake remained in place. Everything. This means two things:



  • These errors are not super critical for the application to function. Many of them, by the way, are in the test code, and it is difficult to call incorrect tests consistent.
  • These errors are found in infrequently used files of large projects, which developers hardly go to. Because of this, the incorrect code is doomed to lie there for a very long time: most likely, until some critical bug occurs due to it.


For those wishing to dig deeper, there will be links to specific versions that we are checking.



PS The above does not mean that static analysis catches only harmless errors in unused code. We check the release (and almost release) versions of projects - in which the developers and testers (and sometimes, unfortunately, users) found the most relevant bugs by hand, which is long, expensive and painful. You can read more about this in our article " Errors that static code analysis cannot find, because it is not used ".



Apache Dubbo and blank menu



GitHub



Diagnostics " V6080 Consider checking for misprints. It's possible that an assigned variable should be checked in the next condition " was already released in version 7.08, but has not appeared in our articles yet, so it's time to fix it.



Menu.java:40



public class Menu
{
  private Map<String, List<String>> menus = new HashMap<String, List<String>>();

  public void putMenuItem(String menu, String item)
  {
    List<String> items = menus.get(menu);
    if (item == null)                      // <=
    {
      items = new ArrayList<String>();
      menus.put(menu, items);
    }
    items.add(item);
  }
  ....
}


A classic example of a "key-collection" dictionary and an equally classic typo. The developer wanted to create a collection corresponding to the key, if it does not already exist, but he mixed up the name of the variable and got not only an incorrect operation of the method, but also a NullPointerException in the last line. For Java 8 and later, to implement such dictionaries, you should use the computeIfAbsent method :



public class Menu
{
  private Map<String, List<String>> menus = new HashMap<String, List<String>>();

  public void putMenuItem(String menu, String item)
  {
    List<String> items = menus.computeIfAbsent(menu, key -> new ArrayList<>());
    items.add(item);
  }
  ....
}


Glassfish and double-checked locking



GitHub



One of the diagnostics that will be included in the next release is checking the correct implementation of the "double-checked locking" pattern. Glassfish turned out to be the record holder for detections from SelfTester projects: in total, PVS-Studio found 10 problem areas in the project using this rule. I suggest the reader to have some fun and look for two of them in the code snippet below. For help, refer to the documentation: " V6082 Unsafe double-checked locking ". Well, or, if you don't want to at all, at the end of the article.



EjbComponentAnnotationScanner.java



public class EjbComponentAnnotationScanner
{
  private Set<String> annotations = null;

  public boolean isAnnotation(String value)
  {
    if (annotations == null)
    {
      synchronized (EjbComponentAnnotationScanner.class)
      {
        if (annotations == null)
        {
          init();
        }
      }
    }
    return annotations.contains(value);
  }

  private void init()
  {
    annotations = new HashSet();
    annotations.add("Ljavax/ejb/Stateless;");
    annotations.add("Ljavax/ejb/Stateful;");
    annotations.add("Ljavax/ejb/MessageDriven;");
    annotations.add("Ljavax/ejb/Singleton;");
  }

  ....
}


SonarQube and data-flow



GitHub



Improving diagnostics is not only about directly modifying their code in order to catch more suspicious places or remove false positives. Manual marking of methods for data-flow also plays an important role in the development of the analyzer - for example, you can write that such and such a library method always returns non-null. When writing a new diagnostic, we accidentally discovered that the Map # clear () method was not marked up . Apart from the obviously stupid code that the " V6009 Collection is empty. The call of the 'clear' function is senseless " diagnostics began to catch , we were able to find a great typo.



MetricRepositoryRule.java:90



protected void after()
{
  this.metricsById.clear();
  this.metricsById.clear();
}


At first glance, clearing the dictionary again is not a mistake. And we would even think that this is a randomly duplicated line, if our gaze had not dropped a little lower - literally to the next method.



protected void after()
{
  this.metricsById.clear();
  this.metricsById.clear();
}
public Metric getByKey(String key)
{
  Metric res = metricsByKey.get(key);
  ....
}


Exactly. The class has two fields with similar names metricsById and metricsByKey . I'm sure that in the after method the developer wanted to clear both dictionaries, but either the auto-completion failed him, or he inertia entered the same name. Thus, the two dictionaries that hold related data will be out of sync after the call to after .



Sakai and empty collections



GitHub



Another new diagnostic that will be included in the next release is " V6084 Suspicious return of an always empty collection ". It's easy enough to forget to add items to the collection, especially when each item needs to be initialized first. From personal experience, such errors most often lead not to a crash of the application, but to strange behavior or the absence of any functionality.



DateModel.java:361



public List getDaySelectItems()
{
  List selectDays = new ArrayList();
  Integer[] d = this.getDays();
  for (int i = 0; i < d.length; i++)
  {
    SelectItem selectDay = new SelectItem(d[i], d[i].toString());
  }
  return selectDays;
}


By the way, the same class contains very similar methods without the same error. For instance:



public List getMonthSelectItems()
{
  List selectMonths = new ArrayList();
  Integer[] m = this.getMonths();
  for (int i = 0; i < m.length; i++)
  {
    SelectItem selectMonth = new SelectItem(m[i], m[i].toString());
    selectMonths.add(selectMonth);
  }
  return selectMonths;
}


Plans for the future



Apart from various not very interesting internal things, we are thinking about adding diagnostics for the Spring Framework to the Java analyzer. It is not only the main bread for Javists, but also contains a lot of non-obvious moments on which one can stumble. We are not yet very sure in what form these diagnostics will eventually appear, when it will happen and whether it will happen at all. But we are sure that we need ideas for them and open-source projects using Spring for SelfTester. So, if you have something in mind, suggest it (in comments or private messages, you can also)! And the more of this goodness we collect, the more priority will shift to it.



And finally, there are errors in the double-checked locking implementation from Glassfish:



  • The field is not declared 'volatile'.
  • The object is first published and then initialized.


Why all this is bad - again, you can see in the documentation .





If you want to share this article with an English-speaking audience, please use the translation link: Nikita Lazeba. Under the Hood of PVS-Studio for Java: How We Develop Diagnostics .



All Articles