How well do you remember nullable value types? Let's look under the hood



Nullable reference types have been a hot topic lately. However, the good old nullable value types haven't gone anywhere and are still in active use. How well do you remember the nuances of working with them? I suggest refreshing or testing your knowledge by reading this article. C# and IL code samples, along with references to the CLI specification and the CoreCLR source code, are included. Let's start with an interesting puzzle.



Note. If you are interested in nullable reference types, you can check out some of my colleagues' articles: "Nullable Reference Types in C# 8.0 and Static Analysis" and "Nullable References Do Not Protect, and Here's the Proof".



Take a look at the sample code below and answer: what will be printed to the console? And, just as importantly, why? Let's agree right away that you'll answer as is: without compiler hints, documentation, reading up on the topic, or anything like that. :)



static void NullableTest()
{
  int? a = null;
  object aObj = a;

  int? b = new int?();
  object bObj = b;

  Console.WriteLine(Object.ReferenceEquals(aObj, bObj)); // True or False?
}




Well, let's think a little. Let's go through a few main lines of reasoning that, it seems to me, might come up.



1. Assume that int? is a reference type.



Let's reason as follows: int? is a reference type. In that case, null is written to a, and it will also end up in aObj after the assignment. A reference to some object is written to b, and it will also be written to bObj after the assignment. As a result, Object.ReferenceEquals receives null and a non-null object reference as arguments, so...



Obviously, the answer is False!



2. Assume that int? is a value type.



Or maybe you doubt that int? is a reference type? And you're sure of that despite the expression int? a = null? Well, let's approach it from the other side and assume that int? is a value type.



In this case, the expression int? a = null looks a little strange, but let's suppose that C# has sprinkled some syntactic sugar on top again. It turns out that a stores some object, and b also stores some object. When the variables aObj and bObj are initialized, the objects stored in a and b are boxed, so different references are written to aObj and bObj. In the end, Object.ReferenceEquals receives references to different objects as arguments, therefore...



Obviously, the answer is False!



3. Assume that Nullable<T> is used here.



Let's say you didn't like the options above, because you know perfectly well that there is no int? as such; there is the value type Nullable<T>, and in this case Nullable<int> is used. You also understand that a and b will in fact hold identical objects. At the same time, you haven't forgotten that writing the values to aObj and bObj involves boxing, which yields references to different objects. Since Object.ReferenceEquals receives references to different objects, then...



Obviously, the answer is False!



4. ;)



For those who started from value types: if you suddenly have any doubts about comparing references, you can look at the documentation for Object.ReferenceEquals at docs.microsoft.com. In particular, it touches on value types and boxing/unboxing. True, it describes the case where value type instances are passed directly to the method, whereas we did the boxing separately, but the essence is the same.



When comparing value types. If objA and objB are value types, they are boxed before they are passed to the ReferenceEquals method. This means that if both objA and objB represent the same instance of a value type, the ReferenceEquals method nevertheless returns false, as the following example shows.



It would seem that the article could end here, except that... the correct answer is True.



Well, let's figure it out.



Understanding



There are two ways: the easy one and the interesting one.



The easy way



int? is Nullable<int>. Open the Nullable<T> documentation and look at the "Boxing and Unboxing" section. In principle, that's all: the behavior is described there. But if you want more details, I invite you down the interesting path. ;)



The interesting way



Documentation alone won't be enough on this path. It describes the behavior but doesn't answer the question "why?".



What actually are int? and null in this context? Why does it work this way? Does the IL code use different instructions or not? Does the behavior differ at the CLR level? Is there some other magic?



Let's start by dissecting int? to recall the basics, and gradually work our way to the original case. Since C# is a rather "sugary" language, we will periodically turn to the IL code to see what is really going on (yes, the C# documentation is not our way today).



int?, Nullable<T>



Here we will look at the basics of nullable value types in general (what they are, what they compile to in IL, and so on). The answer to the opening puzzle is discussed in the next section.



Let's look at a piece of code.



int? aVal = null;
int? bVal = new int?();
Nullable<int> cVal = null;
Nullable<int> dVal = new Nullable<int>();


Although the initialization of these variables looks different in C#, the same IL code is generated for all of them.



.locals init (valuetype [System.Runtime]System.Nullable`1<int32> V_0,
              valuetype [System.Runtime]System.Nullable`1<int32> V_1,
              valuetype [System.Runtime]System.Nullable`1<int32> V_2,
              valuetype [System.Runtime]System.Nullable`1<int32> V_3)

// aVal
ldloca.s V_0
initobj  valuetype [System.Runtime]System.Nullable`1<int32>

// bVal
ldloca.s V_1
initobj  valuetype [System.Runtime]System.Nullable`1<int32>

// cVal
ldloca.s V_2
initobj  valuetype [System.Runtime]System.Nullable`1<int32>

// dVal
ldloca.s V_3
initobj  valuetype [System.Runtime]System.Nullable`1<int32>


As you can see, C# generously sprinkles everything with syntactic sugar to make our lives easier. In fact:



  • int? is a value type.
  • int? is the same as Nullable<int>. The IL code works with Nullable<int32>.
  • int? aVal = null is the same as Nullable<int> aVal = new Nullable<int>(). In IL, this expands into an initobj instruction that performs default initialization at the loaded address.
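
The points above can also be checked from C# itself, without opening the IL. Below is a minimal sketch; the class and method names (NullableIdentityDemo, SameType, and so on) are made up for illustration and do not come from the article.

```csharp
using System;

public static class NullableIdentityDemo
{
    // int? is only shorthand: both spellings denote the same constructed type.
    public static bool SameType() => typeof(int?) == typeof(Nullable<int>);

    // Nullable<T> is declared as a struct, so int? is a value type.
    public static bool IsValueType() => typeof(int?).IsValueType;

    // "= null" and "new int?()" both produce the default instance.
    public static bool DefaultsAreEqual()
    {
        int? a = null;
        int? b = new int?();
        return !a.HasValue && !b.HasValue && a.Equals(b);
    }

    public static void Main()
    {
        Console.WriteLine(SameType());         // True
        Console.WriteLine(IsValueType());      // True
        Console.WriteLine(DefaultsAreEqual()); // True
    }
}
```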


Consider the following piece of code:



int? aVal = 62;


We figured out the default initialization - we saw the corresponding IL code above. What happens here when we want to initialize aVal to 62?



Let's take a look at the IL code:



.locals init (valuetype [System.Runtime]System.Nullable`1<int32> V_0)
ldloca.s   V_0
ldc.i4.s   62
call       instance void valuetype 
           [System.Runtime]System.Nullable`1<int32>::.ctor(!0)


Again, nothing complicated: the address of aVal is loaded onto the evaluation stack, followed by the value 62, and then the constructor with the signature Nullable<T>(T) is called. That is, the following two statements are completely identical:



int? aVal = 62;
Nullable<int> bVal = new Nullable<int>(62);


You can see the same by looking at the IL code again:



// int? aVal;
// Nullable<int> bVal;
.locals init (valuetype [System.Runtime]System.Nullable`1<int32> V_0,
              valuetype [System.Runtime]System.Nullable`1<int32> V_1)

// aVal = 62
ldloca.s   V_0
ldc.i4.s   62
call       instance void valuetype                           
           [System.Runtime]System.Nullable`1<int32>::.ctor(!0)

// bVal = new Nullable<int>(62)
ldloca.s   V_1
ldc.i4.s   62
call       instance void valuetype                             
           [System.Runtime]System.Nullable`1<int32>::.ctor(!0)


What about checks? For example, what does the following code actually look like?



bool IsDefault(int? value) => value == null;


That's right: to understand it, let's turn to the corresponding IL code again.



.method private hidebysig instance bool
IsDefault(valuetype [System.Runtime]System.Nullable`1<int32> 'value')
cil managed
{
  .maxstack  8
  ldarga.s   'value'
  call       instance bool valuetype 
             [System.Runtime]System.Nullable`1<int32>::get_HasValue()
  ldc.i4.0
  ceq
  ret
}


As you might have guessed, there is actually no null: all that happens is a call to the Nullable<T>.HasValue property. In other words, the same logic can be written in C# more explicitly, in terms of the entities actually used, as follows.



bool IsDefaultVerbose(Nullable<int> value) => !value.HasValue;


IL code:



.method private hidebysig instance bool 
IsDefaultVerbose(valuetype [System.Runtime]System.Nullable`1<int32> 'value')
cil managed
{
  .maxstack  8
  ldarga.s   'value'
  call       instance bool valuetype 
             [System.Runtime]System.Nullable`1<int32>::get_HasValue()
  ldc.i4.0
  ceq
  ret
}




Let's summarize:



  • Nullable value types are implemented by means of the Nullable<T> type;
  • int? is actually a constructed type of the generic value type Nullable<T>;
  • int? a = null initializes an object of type Nullable<int> with the default value; there is actually no null here;
  • if (a == null): again, there is no null, just a call to the Nullable<T>.HasValue property.
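
As a quick sanity check of the last point, the two forms of the check can be compared directly. This is only a sketch: IsDefault and IsDefaultVerbose repeat the methods shown above, while NullCheckDemo and ChecksAgree are illustrative names.

```csharp
using System;

public static class NullCheckDemo
{
    // The "sugared" check from the article.
    public static bool IsDefault(int? value) => value == null;

    // The same check written against Nullable<T> directly.
    public static bool IsDefaultVerbose(Nullable<int> value) => !value.HasValue;

    // Both forms must agree for any input.
    public static bool ChecksAgree(int? value) =>
        IsDefault(value) == IsDefaultVerbose(value);

    public static void Main()
    {
        Console.WriteLine(ChecksAgree(null));       // True
        Console.WriteLine(ChecksAgree(62));         // True
        Console.WriteLine(ChecksAgree(new int?())); // True
    }
}
```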


The source code of the Nullable<T> type can be viewed, for example, on GitHub in the dotnet/runtime repository (a direct link to the source code file). There isn't much code, so I recommend skimming through it out of curiosity. From there, you can learn (or recall) the following facts.



For convenience, the Nullable<T> type defines:



  • an implicit conversion operator from T to Nullable<T>;
  • an explicit conversion operator from Nullable<T> to T.
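
Both conversions are easy to try out. The sketch below uses made-up names (ConversionDemo, Wrap, Unwrap); the only assumption about the library is that converting an empty Nullable<int> to int throws InvalidOperationException, which is what reading the Value property does.

```csharp
using System;

public static class ConversionDemo
{
    // Implicit conversion: T -> Nullable<T>, no cast required.
    public static int? Wrap(int value) => value;

    // Explicit conversion: Nullable<T> -> T, a cast is required
    // because the conversion can fail when there is no value.
    public static int Unwrap(int? value) => (int)value;

    public static bool UnwrapOfEmptyThrows()
    {
        try
        {
            Unwrap(null);
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Wrap(62).HasValue);    // True
        Console.WriteLine(Unwrap(62));           // 62
        Console.WriteLine(UnwrapOfEmptyThrows()); // True
    }
}
```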


The core logic is implemented with two fields (and the corresponding properties):



  • T value: the value itself, which Nullable<T> wraps;
  • bool hasValue: a flag indicating whether the wrapper "contains" a value. The quotation marks are there because Nullable<T> in fact always contains a value of type T.
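
The remark that a value of type T is always physically present can be observed through GetValueOrDefault(), which returns the stored value without checking the flag. A small sketch with illustrative names (StoredValueDemo and its methods are not from the article):

```csharp
using System;

public static class StoredValueDemo
{
    // An "empty" Nullable<int> still stores an int: the default one.
    public static int StoredValueOfEmpty()
    {
        int? empty = null;
        return empty.GetValueOrDefault();
    }

    // A zero that was explicitly assigned is a real value;
    // only the hasValue flag tells the two cases apart.
    public static bool FlagDistinguishesZeroFromEmpty()
    {
        int? empty = null;
        int? zero = 0;
        return empty.GetValueOrDefault() == zero.GetValueOrDefault()
               && !empty.HasValue
               && zero.HasValue;
    }

    public static void Main()
    {
        Console.WriteLine(StoredValueOfEmpty());             // 0
        Console.WriteLine(FlagDistinguishesZeroFromEmpty()); // True
    }
}
```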


Now that we've refreshed our memory of nullable value types, let's see what happens with boxing.



Boxing Nullable<T>



Let me remind you that boxing a value type instance creates a new object on the heap. The following code snippet illustrates this behavior:



int aVal = 62;
object obj1 = aVal;
object obj2 = aVal;

Console.WriteLine(Object.ReferenceEquals(obj1, obj2));


The result of comparing the references is, as expected, false, since two boxing operations took place and two objects were created, references to which were written to obj1 and obj2.
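
To make the contrast explicit, here is a small sketch (BoxingDemo and its methods are illustrative names) showing that the two boxes are distinct objects even though they hold equal values:

```csharp
using System;

public static class BoxingDemo
{
    // Each assignment to object performs its own boxing operation.
    public static bool SameReference()
    {
        int aVal = 62;
        object obj1 = aVal;
        object obj2 = aVal;
        return Object.ReferenceEquals(obj1, obj2);
    }

    // Value equality still holds: Int32.Equals compares the boxed values.
    public static bool SameValue()
    {
        int aVal = 62;
        object obj1 = aVal;
        object obj2 = aVal;
        return obj1.Equals(obj2);
    }

    public static void Main()
    {
        Console.WriteLine(SameReference()); // False
        Console.WriteLine(SameValue());     // True
    }
}
```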



Now change int to Nullable<int>.



Nullable<int> aVal = 62;
object obj1 = aVal;
object obj2 = aVal;

Console.WriteLine(Object.ReferenceEquals(obj1, obj2));


The result is still as expected: false.



And now, instead of 62, we write the default value.



Nullable<int> aVal = new Nullable<int>();
object obj1 = aVal;
object obj2 = aVal;

Console.WriteLine(Object.ReferenceEquals(obj1, obj2));


And... the result is suddenly true. It would seem that we have the same two boxing operations, two created objects, and references to two different objects, yet the result is true!



Yeah, it's probably sugar again, and something has changed at the IL code level! Let's see.



Example N1.



C# code:



int aVal = 62;
object aObj = aVal;


IL code:



.locals init (int32 V_0,
              object V_1)

// aVal = 62
ldc.i4.s   62
stloc.0

// box aVal
ldloc.0
box        [System.Runtime]System.Int32

// store into aObj
stloc.1


Example N2.



C# code:



Nullable<int> aVal = 62;
object aObj = aVal;


IL code:



.locals init (valuetype [System.Runtime]System.Nullable`1<int32> V_0,
              object V_1)

// aVal = new Nullable<int>(62)
ldloca.s   V_0
ldc.i4.s   62
call       instance void
           valuetype [System.Runtime]System.Nullable`1<int32>::.ctor(!0)

// box aVal
ldloc.0
box        valuetype [System.Runtime]System.Nullable`1<int32>

// store into aObj
stloc.1


Example N3.



C# code:



Nullable<int> aVal = new Nullable<int>();
object aObj = aVal;


IL code:



.locals init (valuetype [System.Runtime]System.Nullable`1<int32> V_0,
              object V_1)

// aVal = new Nullable<int>()
ldloca.s   V_0
initobj    valuetype [System.Runtime]System.Nullable`1<int32>

// box aVal
ldloc.0
box        valuetype [System.Runtime]System.Nullable`1<int32>

// store into aObj
stloc.1


As we can see, boxing is done the same way everywhere: the values of the local variables are loaded onto the evaluation stack (the ldloc instruction), and then the boxing itself is performed by the box instruction, which is given the type to be boxed.



Let's turn to the Common Language Infrastructure specification, look at the description of the box instruction, and find an interesting note about nullable types:



If typeTok is a value type, the box instruction converts val to its boxed form. ...If it is a nullable type, this is done by inspecting val's HasValue property; if it is false, a null reference is pushed onto the stack; otherwise, the result of boxing val's Value property is pushed onto the stack.



Several conclusions follow from this that dot the i's:



  • the state of the Nullable<T> object is taken into account (the HasValue flag we discussed earlier is checked). If Nullable<T> does not contain a value (HasValue is false), box produces null;
  • if Nullable<T> contains a value (HasValue is true), it is not the Nullable<T> object that gets boxed but the instance of type T stored in the value field of Nullable<T>;
  • the special handling of Nullable<T> boxing is implemented neither at the C# level nor even at the IL level: it is implemented in the CLR.
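
These conclusions can be observed from C# by asking the boxed object for its type. A minimal sketch with illustrative names (BoxedNullableDemo, BoxedTypeOf, RoundTrip); the round trip relies on the fact that unboxing to int? accepts both a boxed int and a null reference:

```csharp
using System;

public static class BoxedNullableDemo
{
    // Returns the runtime type of the box, or null if boxing produced no object.
    public static Type BoxedTypeOf(int? value)
    {
        object boxed = value;    // compiles to: box Nullable`1<int32>
        return boxed?.GetType();
    }

    // Boxing followed by unboxing back to int?.
    public static int? RoundTrip(int? value)
    {
        object boxed = value;
        return (int?)boxed;
    }

    public static void Main()
    {
        Console.WriteLine(BoxedTypeOf(62));           // System.Int32
        Console.WriteLine(BoxedTypeOf(null) == null); // True
        Console.WriteLine(RoundTrip(62));             // 62
        Console.WriteLine(RoundTrip(null).HasValue);  // False
    }
}
```

Note that there is no such thing as a boxed Nullable<int> on the heap: the box either holds a plain int or does not exist at all.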


Let's go back to the Nullable<T> examples discussed above.



First:



Nullable<int> aVal = 62;
object obj1 = aVal;
object obj2 = aVal;

Console.WriteLine(Object.ReferenceEquals(obj1, obj2));


State of the instance before boxing:



  • T -> int ;
  • value -> 62 ;
  • hasValue -> true .


The value 62 is boxed twice (remember that in this case instances of type int are boxed, not Nullable<int>), two new objects are created, and two references to different objects are obtained, so comparing them gives false.



Second:



Nullable<int> aVal = new Nullable<int>();
object obj1 = aVal;
object obj2 = aVal;

Console.WriteLine(Object.ReferenceEquals(obj1, obj2));


State of the instance before boxing:



  • T -> int ;
  • value -> default (in this case, 0 is the default value for int );
  • hasValue -> false .


Since hasValue is false, no objects are created on the heap, and the box operation returns null, which is written to the variables obj1 and obj2. Comparing these values, as expected, gives true.



In the original example, which was at the very beginning of the article, exactly the same thing happens:



static void NullableTest()
{
  int? a = null;       // default value of Nullable<int>
  object aObj = a;     // null

  int? b = new int?(); // default value of Nullable<int>
  object bObj = b;     // null

  Console.WriteLine(Object.ReferenceEquals(aObj, bObj)); // null == null
}


For fun, let's take a look at the CoreCLR source code in the dotnet/runtime repository mentioned earlier. We are interested in the object.cpp file, specifically the Nullable::Box method, which contains the logic we need:



OBJECTREF Nullable::Box(void* srcPtr, MethodTable* nullableMT)
{
  CONTRACTL
  {
    THROWS;
    GC_TRIGGERS;
    MODE_COOPERATIVE;
  }
  CONTRACTL_END;

  FAULT_NOT_FATAL();      // FIX_NOW: why do we need this?

  Nullable* src = (Nullable*) srcPtr;

  _ASSERTE(IsNullableType(nullableMT));
  // We better have a concrete instantiation, 
  // or our field offset asserts are not useful
  _ASSERTE(!nullableMT->ContainsGenericVariables());

  if (!*src->HasValueAddr(nullableMT))
    return NULL;

  OBJECTREF obj = 0;
  GCPROTECT_BEGININTERIOR (src);
  MethodTable* argMT = nullableMT->GetInstantiation()[0].AsMethodTable();
  obj = argMT->Allocate();
  CopyValueClass(obj->UnBox(), src->ValueAddr(nullableMT), argMT);
  GCPROTECT_END ();

  return obj;
}


Everything we talked about above is here. If there is no stored value, NULL is returned:



if (!*src->HasValueAddr(nullableMT))
    return NULL;


Otherwise, the boxing is performed:



OBJECTREF obj = 0;
GCPROTECT_BEGININTERIOR (src);
MethodTable* argMT = nullableMT->GetInstantiation()[0].AsMethodTable();
obj = argMT->Allocate();
CopyValueClass(obj->UnBox(), src->ValueAddr(nullableMT), argMT);
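
For readers who find the native code hard to follow, the same two branches can be modelled in C#. This is only a rough managed approximation for illustration, not what the runtime executes; BoxLikeRuntime and NullableBoxModel are made-up names.

```csharp
using System;

public static class NullableBoxModel
{
    // Hypothetical managed model of Nullable::Box.
    public static object BoxLikeRuntime<T>(T? source) where T : struct
    {
        // Mirrors: if (!*src->HasValueAddr(nullableMT)) return NULL;
        if (!source.HasValue)
            return null;

        // Mirrors: allocate an object of type T and copy the value field into it.
        // Converting T to object boxes the underlying T, not the Nullable<T>.
        return source.GetValueOrDefault();
    }

    public static void Main()
    {
        object a = BoxLikeRuntime<int>(null);
        object b = BoxLikeRuntime<int>(new int?());
        object c = BoxLikeRuntime<int>(62);

        Console.WriteLine(Object.ReferenceEquals(a, b)); // True
        Console.WriteLine(c.GetType());                  // System.Int32
    }
}
```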


Conclusion



Out of curiosity, I suggest showing the example from the beginning of the article to your colleagues and friends. Will they be able to give the correct answer and justify it? If not, invite them to read the article. If they can, well, my respect!



I hope it was a small but fun adventure. :)



P.S. Someone might wonder how this deep dive began. We were writing a new diagnostic rule for PVS-Studio concerning calls to Object.ReferenceEquals where one of the arguments is of a value type. Suddenly it turned out that boxing Nullable<T> behaves in an unexpected way. We looked at the IL code: box is just box. We looked at the CLI specification: yeah, there it is! It seemed like a rather interesting case worth writing about, and so the article is now in front of you.





If you want to share this article with an English-speaking audience, please use the translation link: Sergey Vasiliev. Check how you remember nullable value types. Let's peek under the hood .



P.P.S. By the way, I've recently become a little more active on Twitter, where I post interesting code snippets, retweet interesting news from the .NET world, and so on. Take a look and, if you're interested, follow me (link to the profile).


