Sunday, 18 November 2012

Nullable Types

In this post, I'll talk about nullable types in .NET. The material here is basically what is available in the book, and I will try to do as good a job as Jon has. Okay then, let's dive in.

Why do we need null ?

Firstly, let's see why we need a value that represents the absence of a value. In almost every field of software development, there comes a time when a piece of data is missing at a certain point in the application's execution and gets filled in later. This can be a field representing the closing date of an account, the time of death of an individual, or the mobile phone number of a person who does not have a cell phone yet. Databases handle this absence quite easily: in a DBMS, a field of any type can hold the value NULL. That is not the case in strongly typed programming languages like C# or Java. In C#, we have already seen that references can be null. That just means that the reference does not point to any object on the heap at the moment. Value types are another story. For a value type variable to represent null, a special value has to be designated as null. This is known as the "magic number" solution, in which one sacrifices one value to represent missing data (e.g. DateTime.MinValue). This cannot always be done, though. For example, which value of a byte should be chosen to represent null? There are 256 possible values, and reserving any of them for null would mean that we can no longer read arbitrary byte data properly. The need for null values has therefore led to all sorts of patterns for making them available.

Attempting to solve the problem in C# 1.0

One way, which we have already discussed and which is available in C# 1.0, is the magic number pattern. As stated before, this is suboptimal: it sacrifices a perfectly valid value and produces all sorts of problems. Another possible method is wrapping the value type in an object. The reference to that object can then be null, and the value itself lives on the heap. This solution kind of goes against the reason for using value types in the first place. Value types should not require heap allocation, garbage collection, and so on. Not only would this approach cause a lot of casting and consequently performance problems, but it would also waste memory, since each object on a 32-bit CLR, for example, has 8 bytes of overhead.
Yet another solution is to store a boolean flag alongside the value we want to make nullable. The flag is true when the value is valid and false when it is not (null). But this would mean that you have to create a type (a struct) for each value type you want to make nullable! If you have been following the generics posts, you might be thinking: why not create a generic type? Aaand you would be right! That's actually how the C# language designers have gone about this. There is more to what they did, though, and we'll see the differences by the end of this post.
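The flag-plus-value idea can be sketched as a small generic struct. This is only an illustration of the pattern: the name Maybe<T> and its members are made up for this example, and it is not how Nullable<T> is actually implemented.

```csharp
using System;

// Hypothetical hand-rolled wrapper: a value plus a validity flag.
public struct Maybe<T> where T : struct
{
    private readonly T value;
    private readonly bool hasValue;

    public Maybe(T value)
    {
        this.value = value;
        this.hasValue = true;
    }

    // True only when the struct was created through the constructor above.
    public bool HasValue { get { return hasValue; } }

    public T Value
    {
        get
        {
            if (!hasValue)
                throw new InvalidOperationException("No value present.");
            return value;
        }
    }
}

public static class MaybeDemo
{
    public static void Main()
    {
        Maybe<DateTime> closingDate = new Maybe<DateTime>(); // default: flag is false
        Maybe<int> age = new Maybe<int>(30);

        Console.WriteLine(closingDate.HasValue); // False
        Console.WriteLine(age.HasValue);         // True
        Console.WriteLine(age.Value);            // 30
    }
}
```

One generic struct covers every value type, which is exactly the saving the generic approach buys over writing one wrapper per type.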

Nullable<T> and System.Nullable

The Nullable<T> generic type represents a value type that can also be null. This is where what we learned about generics comes in handy. How can we limit the type arguments supplied to the generic type to value types? With a type constraint, of course (the struct constraint). Also remember that any value type can be supplied for a type parameter with a "struct" constraint EXCEPT Nullable<T> itself, so you can't have nested nullable types. The Nullable<T> type has two important properties:
  • HasValue
  • Value
You can probably guess what each of these properties does. The Value property returns the real non-nullable value (its type, T, is known as the "underlying type"), or throws an InvalidOperationException if there is no "real" value. How does it know whether the value is real? Internally it holds a boolean flag, exposed through HasValue, which is set when the value is non-null. This is where we get into the differences between Nullable<T> and a structure that we could define ourselves. Firstly, if we had created our own structure, boxing it would produce a reference to a boxed copy of the whole struct. With Nullable<T>, however, this is not the case. The CLR strips off the wrapper and produces either a null reference or the underlying value boxed on its own. So boxing and unboxing do not follow the regular rules. Also, an implicit conversion exists from a value type to its nullable type, and an explicit conversion exists from a nullable type to its value type. The explicit conversion is basically the same as using the Value property, meaning you'll get the real value if it exists and an exception if it doesn't.
Nullable<T> has a peculiar cousin: the System.Nullable static class. This class holds only three methods: two generic methods for comparison and equality checking (Compare<T> and Equals<T>), and one method for getting the underlying type of a nullable type (GetUnderlyingType). I really don't understand why these three methods couldn't be in the nullable type itself.
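To make the behaviour described above concrete, here is a small sketch that exercises HasValue, Value, the two conversions, the special boxing rule and the helpers on the static Nullable class. The class name NullableBasics and the helper ValueOrDefault are my own, chosen only for this example.

```csharp
using System;

public static class NullableBasics
{
    // Hypothetical helper: returns the value or a fallback, using only HasValue and Value.
    public static int ValueOrDefault(Nullable<int> input, int fallback)
    {
        return input.HasValue ? input.Value : fallback;
    }

    public static void Main()
    {
        Nullable<int> none = new Nullable<int>();
        Nullable<int> five = 5;                        // implicit conversion from int

        Console.WriteLine(ValueOrDefault(none, -1));   // -1
        Console.WriteLine(ValueOrDefault(five, -1));   // 5

        int unwrapped = (int)five;                     // explicit conversion, same as five.Value
        Console.WriteLine(unwrapped);                  // 5

        // Boxing: an empty nullable boxes to a null reference,
        // a non-empty one boxes to a plain boxed int.
        object boxedNone = none;
        object boxedFive = five;
        Console.WriteLine(boxedNone == null);          // True
        Console.WriteLine(boxedFive.GetType());        // System.Int32

        // Helpers from the static System.Nullable class.
        Console.WriteLine(Nullable.Compare<int>(none, five) < 0);              // True: null sorts first
        Console.WriteLine(Nullable.Equals<int>(five, 5));                      // True
        Console.WriteLine(Nullable.GetUnderlyingType(typeof(Nullable<int>)));  // System.Int32

        try
        {
            int bad = (int)none;                       // throws: there is no value
            Console.WriteLine(bad);
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("no value");
        }
    }
}
```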
So far we have seen how the nullable type is implemented in .NET. But all this use of generics means that we have to wrap the value type inside another type before using it, which kind of takes the attention away from the type itself. This is why C# 2.0 has a specific syntax for these types. The syntax is basically putting a ? after the type's name:

int? nullableInteger; 
nullableInteger = new int?();
nullableInteger = 2;
nullableInteger = null;
if(nullableInteger == null)
    nullableInteger = new int?(2);
All of the statements in the snippet are valid, and any occurrence of int? can be replaced by Nullable<int>, since both produce exactly the same IL.
Comparison between nullable value types and non-nullable value types is quite similar to ordinary comparison. As mentioned before, the conversion from a value type to its nullable counterpart is implicit and the opposite is explicit. What about operators? This is another difference between the framework's implementation of nullables and a hypothetical custom nullable type written by a user. Most operators (unary and binary) work the same way with nullable types. The difference is in the return type: for binary operators like +, -, * and /, the result becomes Nullable<T> instead of T if any of the operands is nullable, and it is null if any operand is null. Operators that return bool keep that return type. The relational operators (<, >, <=, >=) return false if either operand is null, while == treats two null values as equal and a null and a non-null value as different (!= is its opposite). The operators that are made available for nullables in this way are called lifted operators. The rule of thumb is to expect normal behaviour, but if your code breaks, keep in mind that the culprit may lie in the differences introduced by nullable conversions and lifted operators.
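A quick sketch of the lifted operators in action may help here. The class and method names are made up for the example; the behaviour shown is that of the built-in operators on int?.

```csharp
using System;

public static class LiftedOperators
{
    public static int? Add(int? x, int? y) { return x + y; }    // lifted +
    public static bool Less(int? x, int? y) { return x < y; }   // lifted <
    public static bool Same(int? x, int? y) { return x == y; }  // lifted ==

    public static void Main()
    {
        int? two = 2;
        int? none = null;

        Console.WriteLine(Add(two, 3));              // 5
        Console.WriteLine(Add(two, none).HasValue);  // False: a null operand gives a null result
        Console.WriteLine(Less(none, two));          // False
        Console.WriteLine(Less(two, none));          // False
        Console.WriteLine(Same(none, none));         // True
        Console.WriteLine(Same(two, none));          // False
    }
}
```

Note that both Less(none, two) and Less(two, none) are false, so "not less than" does not imply "greater than or equal" once nulls are involved.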
Lastly, there is another operator introduced with nullables in C# 2.0: the null coalescing operator ("??"). This operator can be very useful when dealing with nulls:

int? a = null;
int? b = 5;
int? c = a ?? b;
int? d = b ?? a;
int? e = c ?? b ?? a;
The values of c, d and e in the snippet above would all be 5. Note that b is declared as int? here: the left operand of ?? must be of a nullable or reference type, so a plain int on the left would not compile. The operator can be used in a chain as shown above, and the result is the first non-null operand!
Some new design patterns become possible with the introduction of these types and operators as well. One example is returning a value from a function together with a "succeeded"/"failed" indication. This can be implemented by having the function return a nullable type instead of using an output parameter plus a boolean return value.
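As a rough sketch of that pattern, the helper below wraps the familiar bool-plus-out-parameter shape of int.TryParse in a method that returns int?. The names ParsingDemo and TryParseInt are hypothetical, picked just for this example.

```csharp
using System;

public static class ParsingDemo
{
    // Hypothetical helper: null means "failed", any other value means "succeeded".
    public static int? TryParseInt(string text)
    {
        int parsed;
        if (int.TryParse(text, out parsed))
            return parsed;
        return null;
    }

    public static void Main()
    {
        int? good = TryParseInt("42");
        int? bad = TryParseInt("forty-two");

        Console.WriteLine(good.HasValue);  // True
        Console.WriteLine(bad.HasValue);   // False
        Console.WriteLine(bad ?? -1);      // -1: ?? supplies a default for the failed case
    }
}
```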
That is it for nullable types. I really wanted to keep this post short. It is shorter than generics anyway! In the next post we will start looking into some more interesting stuff: delegates, anonymous methods and iterator blocks. Until then, happy coding :)

Saturday, 10 November 2012

Advanced Generics and Comparison with C++ and Java

In this post I will touch on some aspects of advanced generics mentioned in the book. The book covers some other aspects as well, like reflection, which I don't consider important enough to cover here.
Firstly, we will discuss static fields and generics. Then we will look at generic iteration, and finally we'll close with a comparison between generics in C# and their counterparts in C++ and Java, including how the JIT handles generics.

Static fields defined in Generics

Static fields defined in a class do not belong to any specific object of the class but to the type itself. This means that no matter how many objects you create, there is only one instance of each static field, and it is shared between all of those objects.
Now the question is: if we have a generic type that has some static fields, should these fields be shared between the different types T, or should each type get its own?
The answer becomes obvious if we look at the class we used in the previous post:

public class CustomList<T> : IEquatable<CustomList<T>> where T : IEquatable<T>
{
    private T[] list;
    private static IEqualityComparer<T> comparer = EqualityComparer<T>.Default;
    // ... rest of the class as in the previous post
}

If the static member comparer were shared between types, all sorts of problems would occur: EqualityComparer<T>.Default is a different object for each T, so a single shared field could not even be typed correctly. Independent static members are therefore the right choice. This means that for Class<T>, every closed type has its own set of static members: Class<string> and Class<int> have different and independent static fields.
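A tiny experiment shows the independence of static fields per closed type. Counter<T> is a made-up class for this sketch; it simply counts how many instances of each closed type were created.

```csharp
using System;

// Hypothetical generic class with one static field.
public class Counter<T>
{
    public static int Instances;

    public Counter() { Instances++; }
}

public static class StaticFieldDemo
{
    public static void Main()
    {
        new Counter<int>();
        new Counter<int>();
        new Counter<string>();

        Console.WriteLine(Counter<int>.Instances);     // 2
        Console.WriteLine(Counter<string>.Instances);  // 1
        Console.WriteLine(Counter<double>.Instances);  // 0: never instantiated
    }
}
```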

Generic iteration (IEnumerator<T>)

Chances are you have used this interface many times without knowing it, for example when you traversed a generic collection in a foreach statement. In order for a type to be iterable in a foreach statement, it should implement the IEnumerable interface. Its GetEnumerator() method in turn returns an IEnumerator, which provides MoveNext() and Current. Now what does this mean for generics? What if we want to be able to iterate over a generic class? If we implement the IEnumerable interface, we can successfully iterate over our generic type. But there is a problem with all this. A closer look at the members we have to implement shows that IEnumerator requires this property:

Object Current { get; }

This is where the problem arises. This property has to return the current element in the collection being iterated. Of course, our generic type holds a collection of T elements, and returning one of them as Object requires a conversion to Object (boxing, for value types), plus a cast back on the caller's side! Isn't that why we got generics? No extra boxing/unboxing, strong typing?
Well, the answer is the IEnumerator<T> interface. Most of the collection interfaces have been given generic counterparts. IEnumerator<T> adds a strongly typed Current to allow strong typing when working with generics. This is not as smooth as you'd think, though, since due to a design decision in the framework IEnumerator<T> extends the non-generic IEnumerator interface. This means that any class implementing IEnumerator<T> also has to implement IEnumerator! So in order to implement the IEnumerator<T> interface we have to implement:

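// Explicit implementation of the non-generic property; it forwards to the generic one.
object IEnumerator.Current
{
    get { return Current; }
}
// Strongly typed property required by IEnumerator<T>.
public T Current
{
    get { return current; } // current: a private field of type T (assumed for this excerpt)
}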

This design decision seems to have been made for backward compatibility with code written for C# 1.0. As you can see, having two properties with the same name and different return types is possible by implementing the non-generic one explicitly and having it forward to the generic version.
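For completeness, here is a minimal sketch of a full hand-written enumerator, assuming a simple array-backed collection. The types ArrayEnumerator<T> and ArrayWrapper<T> are hypothetical names for this example; the interfaces and their members are the real ones from System.Collections and System.Collections.Generic.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical enumerator over an array, written by hand for illustration.
public class ArrayEnumerator<T> : IEnumerator<T>
{
    private readonly T[] items;
    private int index = -1;   // positioned before the first element

    public ArrayEnumerator(T[] items) { this.items = items; }

    public T Current { get { return items[index]; } }          // strongly typed
    object IEnumerator.Current { get { return Current; } }     // non-generic, forwards

    public bool MoveNext() { return ++index < items.Length; }
    public void Reset() { index = -1; }
    public void Dispose() { }   // required because IEnumerator<T> extends IDisposable
}

public class ArrayWrapper<T> : IEnumerable<T>
{
    private readonly T[] items;

    public ArrayWrapper(params T[] items) { this.items = items; }

    public IEnumerator<T> GetEnumerator() { return new ArrayEnumerator<T>(items); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class EnumeratorDemo
{
    public static int Sum(IEnumerable<int> values)
    {
        int total = 0;
        foreach (int v in values)   // uses the generic Current: no boxing
            total += v;
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(Sum(new ArrayWrapper<int>(1, 2, 3)));  // 6
    }
}
```

The same explicit-implementation trick appears twice: once for Current and once for GetEnumerator, since IEnumerable<T> extends IEnumerable in the same way.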

C# Generics vs. Counterparts in C++ and Java

In C++, generics exist as templates. These templates act somewhat like placeholders in a macro definition. Basically, after a template is defined with a type parameter, the compiler generates a separate copy of the code at compile time for each type the template is used with. For example, in the code below T is replaced once by long and once by int.


template<class T>
T min(T first, T second)
{
    return first<second ? first:second;
}

int main(int argc, char *argv[])
{
    long l1 = 2, l2 = 4;
    int i1 = 3, i2 = 5;
    long min1 = min<long>(l1, l2);
    int min2 = min<int>(i1, i2);
}

This of course means that there is no need to add constraints to the type parameter so that compile-time checks can be conducted: any type can be used in that position, and when the template is instantiated the compiler checks whether every operation used is available for that type. This adds much more flexibility. It allows, let's say, the use of operators on values of the type parameter. Doing so is not possible in the versions of C# discussed here, where there is no constraint to enforce the availability of a certain operator overload. It also means that the C++ compiler can optimize based on the concrete types used.
There is, however, an optimization done in .NET that is not done in C++, and that is the sharing of code for generics. If a generic is used in 5 different places in the code with different type arguments, the IL does not contain 5 different variations but only references to one common definition. The JIT then creates as many native variations as needed at execution time. The native code is shared between closed types whose type arguments are reference types and is not shared for value types. The reason is that a reference always has the same size (4 bytes on a 32-bit CLR), whereas value types can have various sizes (int, long, structs, ...).
Lastly, C++ also allows template arguments that are not types at all: values such as integers, as well as pointers to functions, can be passed as template arguments.
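For comparison, here is roughly what the C++ min template looks like when written in C#. Since first < second would not compile for an unconstrained T, this sketch goes through an IComparable<T> constraint instead. The class name MinDemo is made up for the example.

```csharp
using System;

public static class MinDemo
{
    // The constraint is what lets the compiler accept the CompareTo call.
    public static T Min<T>(T first, T second) where T : IComparable<T>
    {
        return first.CompareTo(second) < 0 ? first : second;
    }

    public static void Main()
    {
        long l1 = 2, l2 = 4;
        int i1 = 3, i2 = 5;

        Console.WriteLine(Min<long>(l1, l2));  // 2
        Console.WriteLine(Min<int>(i1, i2));   // 3
        Console.WriteLine(Min('b', 'a'));      // a (type argument inferred as char)
    }
}
```

The price of the constraint is that only types implementing IComparable<T> can be used, whereas the C++ version accepts any type that happens to have a suitable operator<.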
I did mention the concept of "variance" in .NET in previous posts, although I have not gone into detail as to what it is. I may do an entire post on the subject later on, but for now I'll assume that you know about this concept. Before C# 4.0, generics are strictly invariant, meaning that, for example, you cannot take a List<String> and handle it as a List<Object>. The same limitation exists in C++.
Java's generics, unlike C++ templates, are less powerful than those of C#. Basically, Java bytecode does not know about generics at all! Generic types in Java are converted by the compiler to their non-generic equivalents, with the necessary casts inserted (type erasure). We still get compile-time checking with them, though. Another very annoying restriction is that Java's primitive types cannot be used as type arguments! So you end up having to use their boxed versions (List<Integer> rather than List<int>), which is inefficient. One feature that Java has and C# lacks, however, is generic variance (C# 4.0 introduced some variance for generics, which we'll discuss later on). Java allows generic variance using wildcards.
With this, we're finally done with generics! This has been a very long topic and we have barely scratched the surface. Although not everything was covered here, you now know enough to delve into the details of the language specification if you're so inclined. If you have had enough already, don't worry: you will rarely need to go any further than what has been discussed here.

Friday, 2 November 2012

Generics contd. Declaring constraint on type parameters

In this post I will talk about constraining the type parameters of a generic type or generic method. General-purpose generic types like List<T> don't constrain their type arguments. This is because they are library types: they should be applicable in general cases and make as few assumptions as possible. Custom generic types, however, may only work with certain types. There should be a way for the compiler to guarantee that the type arguments meet certain requirements. This way we get more compile-time type checking, and the members guaranteed by the constraint can be called on values of the type parameter (and show up in IntelliSense). There are 4 different kinds of constraints:
  • Reference types(class): 
    • This type of constraint restricts the type argument to be a reference type. For example classes, interfaces, delegates or arrays or any other type known to be a reference type.
  • Value types(struct): 
    • This type of constraint restricts the type argument to be a value type, for example structs, built-in types such as int, or enums. This excludes nullable types.
  • Parameterless public constructor types(new): 
    • This type of constraint restricts the type argument to have a public parameterless constructor. Notice that this excludes abstract classes, static classes and classes that declare constructors but no public parameterless one. Value types are okay, since they all have a default public parameterless constructor. This constraint allows the generic type to create new instances of the type argument.
  • Conversion types(<interface, base class>):
    • This type of constraint restricts the type argument to be convertible to the types specified. A type specified could be an interface or a base class. There could be more than one interface but naturally only one class, since no class can inherit from more than one class anyway, and specifying several classes from a single inheritance hierarchy would be redundant.
In order to apply any of these constraints to a generic declaration, the "where" keyword is used after the type's declaration header:

class GenericType<T> where T : <constraint>
{
}

A mixture of these constraints may be applied. Of course some mixtures are not valid as in "where T : class, struct".
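The four kinds of constraints, plus one valid mixture, can be sketched on generic methods like this. All names in ConstraintDemo are hypothetical and exist only for this example.

```csharp
using System;
using System.Collections.Generic;

public static class ConstraintDemo
{
    // new(): the method may create instances of T.
    public static T Create<T>() where T : new()
    {
        return new T();
    }

    // struct: T is a non-nullable value type, so T? is allowed.
    public static T? Wrap<T>(T value) where T : struct
    {
        return value;   // implicit conversion from T to T?
    }

    // class: T is a reference type, so comparing with null is allowed.
    public static bool IsNull<T>(T value) where T : class
    {
        return value == null;
    }

    // A mixture: value type constraint plus a conversion (interface) constraint.
    public static T Clamp<T>(T value, T min, T max) where T : struct, IComparable<T>
    {
        if (value.CompareTo(min) < 0) return min;
        if (value.CompareTo(max) > 0) return max;
        return value;
    }

    public static void Main()
    {
        Console.WriteLine(Create<List<int>>().Count);  // 0
        Console.WriteLine(Wrap(7).HasValue);           // True
        Console.WriteLine(IsNull<string>(null));       // True
        Console.WriteLine(Clamp(15, 0, 10));           // 10
    }
}
```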

Declaring your own generic types

Now we'll move on to declaring custom generic types. In this section we will try to create a custom list class which is capable of comparing its elements for equality. Why do we need to implement equality ourselves, you might ask. Why can't we just use == or != on the type? Well, the reason is that the type may not have overloaded those operators. You may ask: what about the scenario where we have constrained the type to something specific which has actually overloaded those operators? Well, let's look at the answer exhaustively:

  • Reference type constraint: With this type of constraint, the only assumption being made is that the two values are references to objects on the heap. So the only possible and "correct" action is to compare the references, and that is what == and != do. This may seem counterintuitive if you don't know how overloading is resolved in C#, since you may wonder why == and != wouldn't use the overload of a reference type that provides one. Let's say we pass in a string, which is a reference type and already overloads the operators. Can't the JIT just use the overloaded == and != then? To understand overload resolution in .NET, I suggest reading this article by Jon Skeet. Basically, the C# compiler always performs overload resolution at compile time, when all it knows about T is that it is a reference type. This decision was made to prevent a set of problems like the "brittle base class" problem, blogged about extensively by Eric Lippert (link). The brittle base class problem happens due to the use of forwarding rather than delegation when a subclass calls a base class's method.
  • Value type constraint: When the type is constrained this way, the use of == and != is prohibited. Why? Because value types include structs, which may or may not overload == and !=. Again, we can't wait until runtime to see which value type is being passed, so any use of these operators is prohibited.
  • Conversion type constraint: This is where we can guarantee that the type argument will always either overload the operators directly or be a subclass of a class that has overloaded them. The compiler checks whether this is true, and if it is, the operators may be used.
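The reference type case is easiest to believe once you see it run. In this sketch, two distinct string objects with the same contents are compared three ways; the class and method names are my own for the example.

```csharp
using System;

public static class EqualityDemo
{
    // With only a class constraint, == is resolved at compile time
    // to a plain reference comparison, even when T is string.
    public static bool AreReferencesEqual<T>(T first, T second) where T : class
    {
        return first == second;
    }

    // With an IEquatable<T> constraint we can call the strongly typed Equals(T).
    public static bool AreEqual<T>(T first, T second) where T : IEquatable<T>
    {
        return first.Equals(second);
    }

    public static void Main()
    {
        string name = "Jon";
        string intro1 = "My name is " + name;   // built at run time:
        string intro2 = "My name is " + name;   // two distinct string objects

        Console.WriteLine(intro1 == intro2);                    // True: string's overloaded ==
        Console.WriteLine(AreReferencesEqual(intro1, intro2));  // False: different objects
        Console.WriteLine(AreEqual(intro1, intro2));            // True: same contents
    }
}
```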
Okay, now after this rather long digression, let's get to the point. We want to create a custom list class. This class should allow comparison between its elements. In order to do this, we are going to constrain our type argument using a conversion type constraint, as we see below:

public class CustomList<T> : IEquatable<CustomList<T>> where T : IEquatable<T>
{
    private T[] list;
    private static IEqualityComparer<T> comparer = EqualityComparer<T>.Default;

    public int ListSize {get; set;}

    public T this[int i ]
    {
        // for brevity we'll ignore the check for a valid index.
        get { return list[i]; }
    }

    public void Add(T newItem)
    {
        T[] temp;

        if (ListSize == 0)
            list = new T[2];
        else if(ListSize == list.Length)
        {
            temp = list;
            list = new T[ListSize * 2];
            Array.Copy(temp, list, ListSize);
        }
        list[ListSize++] = newItem;
    }

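    public bool Equals(CustomList<T> other)
    {
        // A null list or a list of a different length is never equal to this one.
        if (other == null || this.ListSize != other.ListSize)
            return false;
        // Compare the two lists element by element.
        for (int i = 0; i < this.ListSize; i++)
            if (!this[i].Equals(other[i]))
                return false;
        return true;
    }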
}

Okay. So let's see what we have done here. We have a class that wraps an array. Being generic, this class can hold any type of element. There is one constraint on the elements, though! The elements have to implement the IEquatable<T> generic interface. This interface forces the element type to implement a strongly typed Equals(T) method. We did this because we wanted to make sure that the element comparison inside CustomList<T>.Equals (the call this[i].Equals(other[i])) is a strongly typed comparison rather than a comparison of references.
This concludes my post on intermediate generics. In my next post I'm going to move on to advanced generics, and we're going to see how the JIT handles generics.