Sunday, 18 November 2012

Nullable Types

In this post, I'll talk about nullable types in .NET. The material here is basically what is available in the book, and I will try to do as good a job as Jon has. Okay then, let's dive in.

Why do we need null?

Firstly, let's see why we need a value that represents the absence of a value. In almost every field of software development, there comes a time when a piece of data is missing at some point in the application's execution and only gets filled in later. This could be a field representing the closing date of an account, the time of death of an individual, or the mobile phone number of a person who does not have a cell phone yet. Databases handle this absence quite easily: in a DBMS, a field of any type can hold the value NULL.
That is not the case with strongly typed programming languages like C# or Java. In C#, we have already seen that references can be null. That just means that the reference does not point to any object on the heap at the moment. But value types are another story: a value type variable holds its data directly, so every possible bit pattern is already a legitimate value. In order for a value type variable to represent null, a special value has to be designated as null. This is known as the "magic number" solution, in which one sacrifices one value to be able to represent missing data (e.g. DateTime.MinValue). This cannot always be done, though. For example, which value of a byte should be chosen to represent null? There are 256 possible values, and all of them are valid data, so reserving any one of them for null would mean that we can no longer read byte data properly. The need for null values has therefore led to all sorts of patterns for making them available.
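To make the magic number idea concrete, here is a minimal sketch of it. The Account class and its members are made up for this illustration; only DateTime.MinValue comes from the framework.

```csharp
using System;

// Hypothetical account type, used only to illustrate the magic-value pattern.
public class Account
{
    // DateTime.MinValue is sacrificed to mean "not closed yet".
    public DateTime ClosedOn = DateTime.MinValue;

    public bool IsClosed
    {
        get { return ClosedOn != DateTime.MinValue; }
    }
}

public static class MagicValueDemo
{
    public static void Main()
    {
        Account account = new Account();
        Console.WriteLine(account.IsClosed);      // False

        // Nothing stops a caller from forgetting the convention and
        // treating the magic value as a real date.
        Console.WriteLine(account.ClosedOn.Year); // 1

        account.ClosedOn = new DateTime(2012, 11, 18);
        Console.WriteLine(account.IsClosed);      // True
    }
}
```

The weak spot is the middle call: the compiler has no idea that DateTime.MinValue is special, so the convention lives only in the programmer's head.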

Attempting to solve the problem in C# 1.0

One approach, which we have already discussed and which is available in C# 1.0, is the magic number pattern. As stated before, this is suboptimal: it sacrifices a legitimate value, and every piece of code that touches the variable has to remember to check for the magic value. Another possible approach is wrapping the value in an object. The reference to that object can then be null, and the value itself lives on the heap. This solution rather goes against the reason for using value types in the first place. Value types should not have to involve heap allocation, garbage collection, etc. Not only would this approach cause a lot of casting, and consequently performance problems, but it would also waste memory, since each object in a 32-bit CLR, for example, carries 8 bytes of overhead.
Yet another solution is to store a boolean flag alongside the value we want to make nullable. The flag is true when the value is valid and false when it is not (i.e. null). But this would mean that you have to create a type (a struct) for every value type you want to make nullable! If you have been following the generics posts, you might be thinking: why not create a generic type? Aaand you would be right! That's actually how the language designers of C# went about this. There is more to what they did, though, and we'll see the differences by the end of this post.
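A hand-rolled version of the flag-plus-value idea could look roughly like the sketch below. The name Maybe<T> is made up for this post, and this is only my guess at what such a do-it-yourself type would look like, not the framework's actual implementation.

```csharp
using System;

// A hypothetical hand-rolled wrapper; the name Maybe<T> is invented for this sketch.
public struct Maybe<T> where T : struct
{
    private readonly bool hasValue;
    private readonly T value;

    public Maybe(T value)
    {
        this.hasValue = true;
        this.value = value;
    }

    public bool HasValue { get { return hasValue; } }

    public T Value
    {
        get
        {
            if (!hasValue)
                throw new InvalidOperationException("No value present.");
            return value;
        }
    }
}

public static class MaybeDemo
{
    public static void Main()
    {
        Maybe<byte> missing = new Maybe<byte>();    // default: flag is false
        Maybe<byte> present = new Maybe<byte>(255); // all 256 byte values stay usable

        Console.WriteLine(missing.HasValue); // False
        Console.WriteLine(present.Value);    // 255
    }
}
```

Because the flag is a separate field, no byte value has to be sacrificed, and because the type is generic, it only has to be written once.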

Nullable<T> and System.Nullable

The Nullable<T> generic type represents a value type that can also be null. This is where what we learned about generics comes in handy. How can the type arguments supplied to the generic type be limited to value types? With a type constraint, of course (the struct constraint). Also remember that any value type can be supplied as the type argument for a parameter with a "struct" constraint EXCEPT Nullable<T> itself. So you can't have nested nullable types. The Nullable<T> type has two important properties:
  • HasValue
  • Value
You can probably guess what each of these properties does. HasValue tells you whether there is a real value. The Value property returns the real, non-nullable value (its type, T, is known as the "underlying type"), or, if there is no real value, it throws an InvalidOperationException. How does it know whether the value is real or not? Internally it holds a boolean flag which is set when the value is non-null. This is where we get into the differences between the Nullable<T> type and a structure that we could define ourselves. Firstly, if we had created our own structure, boxing it would always produce a reference to a boxed copy of the whole struct on the heap. With Nullable<T>, however, this is not the case. The CLR boxes a Nullable<T> either to a null reference, if it has no value, or to a boxed T, if it has one; there is no such thing as a boxed Nullable<T>. Unboxing is likewise special: a null reference or a boxed T can be unboxed to Nullable<T>. So boxing and unboxing do not follow the regular rules. Also, an implicit conversion exists from a value type to its nullable type, and an explicit conversion exists from a nullable type to its value type. The explicit conversion is basically the same as using the Value property, meaning you'll get the real value if it exists and an exception if it doesn't.
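Here is a small sketch that pokes at the two properties, the conversions and the boxing behaviour described above. The variable names are mine; everything else is plain Nullable<T>.

```csharp
using System;

public static class NullableBasics
{
    public static void Main()
    {
        Nullable<int> none = new Nullable<int>(); // HasValue is false
        Nullable<int> five = 5;                   // implicit conversion from int

        Console.WriteLine(none.HasValue); // False
        Console.WriteLine(five.HasValue); // True
        Console.WriteLine(five.Value);    // 5

        int plain = (int)five;            // explicit conversion back to int
        Console.WriteLine(plain);         // 5

        try
        {
            int oops = (int)none;         // same as reading none.Value
            Console.WriteLine(oops);
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("no value");
        }

        // Boxing: an empty nullable becomes a null reference,
        // a non-empty one becomes a boxed int (not a boxed Nullable<int>).
        object boxedNone = none;
        object boxedFive = five;
        Console.WriteLine(boxedNone == null);   // True
        Console.WriteLine(boxedFive.GetType()); // System.Int32

        // Unboxing works to either int or Nullable<int>.
        int unboxed = (int)boxedFive;
        Nullable<int> unboxedNullable = (Nullable<int>)boxedNone;
        Console.WriteLine(unboxed);                  // 5
        Console.WriteLine(unboxedNullable.HasValue); // False
    }
}
```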
Nullable<T> has a peculiar cousin: the System.Nullable static class. This class holds only three methods: two generic methods, for equality checking and comparison, and one method for getting the underlying type of a nullable type. I really don't understand why these three methods couldn't be in the nullable type itself.
So far we have seen how the nullable type is implemented in .NET. But all this use of generics means that we have to wrap the value type inside another type before using it, which kind of takes the attention away from the type itself. This is why C# 2.0 has a specific syntax for these types. The syntax is basically putting a ? after the type's name:
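For reference, the three methods are Nullable.Equals<T>, Nullable.Compare<T> and Nullable.GetUnderlyingType. A quick sketch of how they behave (the variable names are just for illustration):

```csharp
using System;

public static class NullableHelpersDemo
{
    public static void Main()
    {
        Nullable<int> none = new Nullable<int>();
        Nullable<int> three = 3;
        Nullable<int> seven = 7;

        // Nullable.Equals<T>: two empty values are equal to each other.
        Console.WriteLine(Nullable.Equals<int>(none, none));   // True
        Console.WriteLine(Nullable.Equals<int>(three, seven)); // False

        // Nullable.Compare<T>: an empty value sorts before any real value.
        Console.WriteLine(Nullable.Compare<int>(none, three) < 0);  // True
        Console.WriteLine(Nullable.Compare<int>(seven, three) > 0); // True

        // Nullable.GetUnderlyingType returns T for Nullable<T>, or null otherwise.
        Console.WriteLine(Nullable.GetUnderlyingType(typeof(Nullable<int>))); // System.Int32
        Console.WriteLine(Nullable.GetUnderlyingType(typeof(int)) == null);   // True
    }
}
```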

int? nullableInteger; 
nullableInteger = new int?();
nullableInteger = 2;
nullableInteger = null;
if(nullableInteger == null)
    nullableInteger = new int?(2);
All of the statements in the snippet are valid, and any occurrence of int? can be replaced by Nullable<int>, since both produce exactly the same IL.
Comparison between nullable and non-nullable value types is quite similar to ordinary comparison. As mentioned before, the conversion from a value type to its nullable counterpart is implicit and the opposite is explicit. What about operators? This is another difference between the framework's implementation of nullables and a hypothetical custom nullable type written by a user. Most operators (unary and binary) work the same way with nullable types. The difference is in the return type: for binary operators like +, -, * and /, the return type changes to Nullable<T> instead of T if any of the operands is nullable, and the result is null if any operand is null. Operators that return bool, however, keep their return type. The relational operators (<, >, <=, >=) return false if either operand is null, while == treats two nulls as equal and a null as different from any real value (and != gives the opposite answer). These operators that are adapted for nullables are called lifted operators. The rule of thumb is to just expect normal behaviour, but if your code breaks, be aware that the culprit may lie in the differences introduced by nullable conversions and lifted operators.
Lastly, there is another operator introduced alongside nullables in C# 2.0: the null coalescing operator ("??"). It can be very useful when dealing with nulls:
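A short sketch of lifted operators in action, using int? throughout (the variable names are only for illustration):

```csharp
using System;

public static class LiftedOperatorsDemo
{
    public static void Main()
    {
        int? none = null;
        int? two = 2;
        int? three = 3;

        // Arithmetic operators are lifted: the result type is int?,
        // and it is null as soon as one operand is null.
        int? sum = two + three;
        int? noSum = two + none;
        Console.WriteLine(sum);            // 5
        Console.WriteLine(noSum.HasValue); // False

        // Equality: null equals null, and null differs from any real value.
        Console.WriteLine(none == null);   // True
        Console.WriteLine(none == two);    // False
        Console.WriteLine(none != two);    // True

        // Relational operators return false whenever an operand is null.
        Console.WriteLine(none < two);     // False
        Console.WriteLine(none >= two);    // False
        Console.WriteLine(two < three);    // True
    }
}
```

Note the slightly surprising consequence: none < two and none >= two are both false, so with nullables you cannot assume that one of the two always holds.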

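int? a = null;
int? b = 5;
int? c = a ?? b;
int? d = b ?? a;
int? e = c ?? b ?? a;
The value of c in the snippet above is 5, because a is null and the operator falls back to b. The value of d is also 5, because b is not null and a is never looked at, and the value of e is 5 as well. Note that the left operand of ?? has to be of a nullable (or reference) type, which is why b is declared as int? here. This operator can be used in a chain as shown above. The result is the first non-null operand, or null if all of the operands are null!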
Some new design patterns become possible with the introduction of these operators as well. For example, returning a value from a function along with "succeeded"/"failed" logic. This can be implemented by having the function return a nullable type, instead of using an output parameter plus a boolean return value.
That's it for nullable types. I really wanted to keep this post short. It is shorter than generics anyway! In the next post we will start looking into some more interesting stuff: delegates, anonymous methods and iterator blocks. Until then, happy coding :)
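As a rough sketch of that pattern, here is a made-up helper, ParseOrNull, that wraps the framework's int.TryParse and reports failure by returning null. The helper name and the fallback value 8080 are assumptions for illustration only.

```csharp
using System;

public static class ParsingDemo
{
    // Hypothetical helper: returns null on failure instead of using
    // an out parameter plus a bool return value.
    public static int? ParseOrNull(string text)
    {
        int parsed;
        if (int.TryParse(text, out parsed))
            return parsed;
        return null;
    }

    public static void Main()
    {
        int? good = ParseOrNull("42");
        int? bad = ParseOrNull("forty-two");

        Console.WriteLine(good.HasValue); // True
        Console.WriteLine(good.Value);    // 42
        Console.WriteLine(bad.HasValue);  // False

        // Combined with ??, a fallback value is a one-liner.
        int port = ParseOrNull("not a number") ?? 8080;
        Console.WriteLine(port);          // 8080
    }
}
```

The caller gets one return value to inspect, and the ?? operator turns "use a default when parsing fails" into a single expression.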
