Saturday, 15 December 2012

Iterator Blocks

Hi Again,

The Iterator design pattern used in object-oriented design aims to separate the container to be iterated over from the specific algorithm used for the iteration. The pattern designates the classes to use to access elements of different containers through a common interface. The implementation of foreach and the IEnumerable/IEnumerator pair (and their generic counterparts) is an example of this pattern in C#. Most programming languages out there support this functionality in one form or another: C#, Python, C++, Java, etc.
Implementing an iterator in C# 1.0 required creating a type to implement the IEnumerable interface, which in turn would create another type implementing the IEnumerator interface. This means implementing GetEnumerator(), MoveNext() and Current. That, of course, was too much of a hassle to go through just to allow lazy access to your container class's elements. C# 2.0 introduced a new concept called iterator blocks which makes it quite easy to implement this pattern. I must admit that I think this is a rather bizarre implementation of the pattern and goes against some of the prior knowledge of the average developer. Closures and anonymous methods were rather natural to me but these are just weird! Even so, soon you'll find that all these pieces fall into place and make the implementation of powerful libraries like LINQ so easy.

Simple Iterator blocks

Let's assume that I want to implement the IEnumerable interface for a class that contains the numbers in a certain range. I want to be able to access these numbers lazily with an iterator. In C# 1.0 we needed to add another type to supply the enumerator for us, implement all of the methods mentioned above, take care of the state of the iterator manually and increment the position manually as well. In C# 2.0 this is all you need to write to implement the iterator model:

public IEnumerator GetEnumerator()
{
    for(int i = 0; i < <Collection>.Length; i++)
        yield return <Collection>[i];
}
In this example I have assumed that the container class implementing this interface holds its collection as an array. <Collection> is the placeholder for the array's name. The only difference we see here from the old C# 1.0 syntax is the yield return statement. Effectively, this method returns the ith element of the collection each time MoveNext() is called on the enumerator. But you may ask: where is this enumerator? Where is this state saved? Who knows how far we've gone into the collection, and how? Yes, I know. It seems rather bizarre, but this method is special now that an iterator block is implemented inside it. The method is no longer executed sequentially! When you create an iterator block in a method, the compiler creates a custom nested type to hold all the information (current position, last value yielded, whether we reached the end of the collection, etc.). This nested type is actually a state machine which keeps advancing until you reach the end of the collection, after which MoveNext() simply keeps returning false (and Current keeps reporting the last value yielded). This solution works because in C# a nested type has access to even the private members of the enclosing type.
In order to visualize how this state machine translates into the execution of your code, you have to think of it like so:

  • The method body is only executed when the first call to MoveNext() is made on the iterator, not when the enumerator is created.
  • Once called, execution continues from the top to the first yield return statement (remember that the only allowed return types for a method implementing an iterator block are IEnumerator, IEnumerable and their generic counterparts).
  • From this point the method freezes, meaning execution halts until the next call to MoveNext(). When that call is made, execution resumes right after the yield return statement.
The iteration stops when the loop ends and the method terminates normally, or when a yield break statement is executed. It is important to remind you again that you cannot return any type other than those mentioned from such a method. Also, the allowed type after yield return is object when we're implementing the non-generic interfaces, and T when we are implementing IEnumerable<T> or IEnumerator<T>.
A better way to understand this flow is to implement one yourself and print before and after calls to MoveNext() and Current to see which part of the code above is executed. If you do so, you'll see that there are two very important things to remember when working with iterator blocks:
  • Firstly, as mentioned before, none of our code is executed until the first call to MoveNext(), so never add input validation or code that has to be executed immediately to the method implementing the iterator block. You may not always be implementing the IEnumerable interface on a class: you can use an iterator block to return an IEnumerable from an ordinary method that accepts input parameters, and there it may seem perfectly fine to do input validation. But that causes big debugging problems, since the code doesn't get called right after the iterator is created.
  • Secondly, it is important to know that none of our code will ever be executed when Current is accessed on the iterator. That value is simply stored in the nested type created for us by the compiler and doesn't require any execution of our code.
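Following the suggestion above, here is a minimal, self-contained sketch (the type and method names are mine) that prints messages at each step so you can watch when the iterator body actually runs:

```csharp
using System;
using System.Collections.Generic;

class IteratorDemo
{
    static IEnumerable<int> Numbers()
    {
        Console.WriteLine("  [inside] before first yield");
        yield return 1;
        Console.WriteLine("  [inside] between yields");
        yield return 2;
    }

    static void Main()
    {
        Console.WriteLine("Creating the enumerator...");
        IEnumerator<int> e = Numbers().GetEnumerator(); // nothing printed yet!

        Console.WriteLine("First MoveNext():");
        e.MoveNext();                                   // only now does "[inside] before first yield" appear
        Console.WriteLine("Current = " + e.Current);    // accessing Current runs none of our code

        Console.WriteLine("Second MoveNext():");
        e.MoveNext();                                   // resumes right after the first yield return
        Console.WriteLine("Current = " + e.Current);
    }
}
```

Running this makes both points above concrete: no "[inside]" line appears before the first MoveNext(), and reading Current never triggers any of the method's statements.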

Finally Blocks

A yield return cannot appear inside a try block that is paired with a catch block, but it can appear in a try region paired with a finally block. It is important for an iterator class to have a Dispose method to release any allocated resources after its execution. To ensure resources are released whether a yield break statement was met or the iterator exited normally, we can pair the iterator body with a finally block that gets executed no matter what once we are done with the iterator.
The foreach construct already has this mechanism built in: it calls Dispose() on the enumerator it is using, and calling Dispose() on an enumerator implemented with an iterator block executes its pending finally block.
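A small sketch (names are mine) showing this in action: even when we break out of the foreach early, the iterator's finally block still runs, because foreach calls Dispose() on the enumerator:

```csharp
using System;
using System.Collections.Generic;

class FinallyDemo
{
    static IEnumerable<int> Numbers()
    {
        try
        {
            yield return 1;
            yield return 2;
            yield return 3;
        }
        finally
        {
            // Runs when iteration finishes normally OR when the
            // enumerator is disposed before reaching the end.
            Console.WriteLine("finally block executed");
        }
    }

    static void Main()
    {
        foreach (int n in Numbers())
        {
            Console.WriteLine(n);
            if (n == 2)
                break; // foreach calls Dispose(), which triggers the finally block
        }
    }
}
```

This prints 1, 2 and then "finally block executed", even though we never reached the end of the sequence.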

Example From the Book

Although I didn't want to use examples exactly as they are in the book, I have no choice but to do so in this case. This example is just too cool to leave behind. No worries though, this chapter is a free sample chapter anyway.

public static IEnumerable<string> ReadLines(Func<TextReader> provider)
{
    using(TextReader reader = provider())
    {
        string line;
        while((line = reader.ReadLine()) != null)
            yield return line;
    }
}

So in the above example, we are receiving a generic delegate as an argument. Func<TResult> is a generic delegate that takes no parameters and returns a value of type TResult. Here the provider delegate points to the method to call to get the proper text reader with the right encoding. We also own the reader, and the using statement makes sure we dispose of it ourselves. Also, the lines in the file are iterated lazily, which matters if we are working with big files. This example combines the use of delegates and iterator blocks. There is actually an easy way to create different providers in a different method and have them call this method; that example would involve anonymous methods and the concept of closures as well, which is everything we have been talking about. I'll leave that to you as an exercise.
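To see the laziness in practice, here is a minimal, self-contained usage sketch (I repeat the book's method so the snippet compiles on its own, and use a StringReader so no file on disk is assumed):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class ReadLinesDemo
{
    // The book's method, repeated here so the sketch is self-contained.
    public static IEnumerable<string> ReadLines(Func<TextReader> provider)
    {
        using (TextReader reader = provider())
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    static void Main()
    {
        // The provider is an anonymous method; no reader exists yet.
        IEnumerable<string> lines =
            ReadLines(delegate { return new StringReader("first\nsecond"); });

        // The reader is only created when iteration actually begins here.
        foreach (string line in lines)
            Console.WriteLine(line);
    }
}
```

Swapping the StringReader for File.OpenText with some path would give the file-based version the text describes, with the file opened only at iteration time.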
As seen from the previous example, this is where everything starts to fall into place. We now have the power of delegates and anonymous methods, and we can also use iterator blocks to access containers lazily with much less effort. In the next post I will look at chapter 7 of the book, which concludes C# 2.0's features and paves the road into the world of C# 3.0. Stay tuned!

Sunday, 2 December 2012

Delegates and Anonymous Methods

Hi Again !

I have to tell you that today's topic is super cool. It is the stepping stone to anonymous methods and the idiomatic C# 3.0 constructs. In this post I'll cover chapter 5 from the book, which is titled "Fast-tracked delegates". I absolutely love the new functional approach in C#. It doesn't always produce the most readable code (if you put everything on one line), but I love the fact that you get so much flexibility and power with just a few lines of code, and we haven't even gotten to LINQ yet!

Let's start with C# 1.0 yet again. In C# 1.0, whenever you want to create a delegate you first have to define a delegate type, which consists of the name of the new type and the signature of the methods that can be called through it. Then you have to instantiate that type, as seen below:

public delegate void DelegateType();
...
DelegateType d = new DelegateType(SomeMethod);

This is all good. Everything seems normal, since we are declaring a new type and then instantiating it. But sometimes C# 1.0's approach to delegates is both restrictive and hard to read. This shows when we have a lot of event declarations in our code. In that case, for example, we have to keep instantiating the EventHandler delegate type and assigning a method group to it. Can't the compiler just infer the delegate type on its own from the event handler it is assigned to? This can be seen below:
this.checkBox1.CheckedChanged += new System.EventHandler(this.checkBox1_CheckedChanged);
this.button1.Click += new System.EventHandler(this.button1_Click);
this.checkBox1.KeyPress += new System.Windows.Forms.KeyPressEventHandler(this.checkBox1_KeyPress);
The part of the code where the actual name of the method to be executed is mentioned is called a method group. It is called so because of possible overloads of the method. Now in C# 1.0 there is no guesswork about which of these overloads is going to be used for the delegate, since not only does the type have to be mentioned but the signatures must match exactly (no delegate variance). As can be seen from the example, each of these events is explicitly given a delegate of type KeyPressEventHandler or EventHandler. The issue is that the KeyPress event of the CheckBox control is already declared to accept only delegates of type KeyPressEventHandler, so mentioning the type to create the delegate is redundant. Indeed, in C# 2.0 we can omit the delegate type and have the compiler decide which delegate type it is. This is an implicit conversion from a method group to a delegate:
this.checkBox1.KeyPress += this.checkBox1_KeyPress;
This implicit conversion also comes with the added capability of variance. Just as in method overload resolution, the argument types are checked to choose a compatible method. We have talked about variance and its presence or absence in various parts of the language before. In this case we get parameter contravariance and return type covariance with our delegates. This means the delegate may have been declared with a derived parameter type and we can use a method that takes a less derived type (a base class) as its parameter. For the return type, we can use a method that returns a more derived type than the delegate's signature stipulates. Now, what happens to the method's return value after you use such a delegate, or to the parameter that is now less derived? You basically lose the information associated with the derived type and you're stuck with the base class. An example of contravariance can be seen below:
static void DoSameThing(Object sender, EventArgs e)
{
    Console.WriteLine("I'm not doing anything useful");
}

Form form = new Form();
form.Click += DoSameThing;
form.KeyPress += DoSameThing;
form.MouseClick += DoSameThing;
As can be seen above, we are now able to take a method whose parameter is less derived than the parameter the event handler defines and assign it to the event handlers. This can be useful if you want to perform general-purpose tasks no matter which event fires, since with this method you're deliberately giving up the specific information that the derived type carries. The covariance example could look like so:
    public delegate A sampleDelegate();
    public class A
    {
        public void Hi()
        {
            Console.WriteLine("A");
        }
    }
    public class B : A
    {
        new public void Hi()
        {
            Console.WriteLine("B");
        }
    }

    public class RunExample
    {
        public B getSomeB()
        {
            return new B();
        }

        public void run()
        {
            sampleDelegate ourDelegate = getSomeB;
            getSomeB().Hi();
            ourDelegate().Hi();
        }
    }
    /*
    Outputs:
    B
    A
    */
As we noted earlier, although the method getSomeB() returns a B object, when we call it through the covariant delegate the call is legal but the static type of the result is A, so we no longer have access to B: the hidden A.Hi() is the one that runs.
The addition of delegate variance in C# 2.0 was a breaking change since some previous code would no longer work. An example scenario is shown below:

public delegate void generalDelegate(BufferedStream sr);

public class parentClass
{
    public void DoSomethingWithBuffer(BufferedStream sr)
    {
        Console.WriteLine("Did something in parentClass");
    }
}
public class derivedClass : parentClass
{
    public void DoSomethingWithBuffer(Stream sr)
    {
        Console.WriteLine("Did something in derivedClass");
    }
}
...
derivedClass c = new derivedClass();
generalDelegate gd = new generalDelegate(c.DoSomethingWithBuffer);
gd(new BufferedStream(new MemoryStream()));

In the above example, the method called by gd would be the parent's in C# 1.0 and the derived class's in C# 2.0. Although this is a breaking change, I would say that it is unusual for a derived class to take a more general parameter than its parent anyway; the derived class is there to specialize the base class's methods.

Anonymous Methods

Okay, so here is where the fun begins. Anonymous methods are a way of inlining the use of delegates. Let's say we have a list of student objects (List<Student>). Each student has a name, and let's say we want to get the students whose names start with 'A'. There are many ways to go about this, of course. The more straightforward way is to iterate through the list and just filter it according to the predicate, but we can do this more elegantly using delegates. The generic List class in .Net supplies a FindAll method with the following signature:

public List<T> FindAll(
 Predicate<T> match
)
This method accepts a Predicate<T> generic delegate and returns a List of all the elements that matched the predicate. One way to use this method is to write a method with the Predicate<T> delegate's signature and just pass the method's name to FindAll using an implicit method group conversion. This, although doable, is not very elegant: the method could be doing something very trivial, and introducing a new method that only makes sense in this scope adds a lot of noise in IntelliSense and basically gets in the way. Fortunately, anonymous methods come to the rescue here:

List<Student> filteredList = studentList.FindAll(
                                delegate(Student std){ 
                                    return std.Name.StartsWith("A"); 
                                });
This code is so readable, concise and consequently appealing to me that I have actually had to go to great lengths to keep myself from using it in every single scenario where it is applicable. Okay, so let's get into the details. What exactly is an anonymous method? Is it really a method? What does it mean to return from an anonymous method?
The answer to the questions above is almost yes. You can do almost anything you can do in normal methods in anonymous methods as well. For example, you can have loops and local variables, you can return, etc. Behind the scenes, the compiler creates a method and sets it as the target of a delegate instance. Usually this method is created inside the same class and is given a compiler-generated name containing characters that are illegal in C# identifiers (something like <Main>b__0), which is why these are called unspeakable names; they are made like this so that there can be no name conflicts with your own code. You can use ILSpy to see your anonymous methods in the IL after compilation (.Net Reflector is not free anymore and should be avoided).
Note that when you write a return statement in an anonymous method you are truly returning from that anonymous method, not from the enclosing method. It is easy to get those two mixed up.
Okay, there are two things that remain in this section and I will just mention them without getting into details. Firstly, anonymous methods are not variant: the parameters you declare must exactly match the delegate type's signature. Secondly, the parameter list after the delegate keyword can be omitted entirely if there is no ambiguity in resolving the delegate type and the body doesn't use the parameters.
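A quick sketch of that second point (the event is just an example of mine): when the body ignores its parameters, the parameter list can be dropped, and the same anonymous method shape then fits any delegate type:

```csharp
using System;

class ParamlessDemo
{
    static event EventHandler SomethingHappened;

    static void Main()
    {
        // Full form: parameters declared even though they are unused.
        SomethingHappened += delegate(object sender, EventArgs e)
        {
            Console.WriteLine("handled (full form)");
        };

        // Short form: no parameter list at all; the compiler accepts this
        // because there is no ambiguity about the delegate type here.
        SomethingHappened += delegate
        {
            Console.WriteLine("handled (short form)");
        };

        SomethingHappened(null, EventArgs.Empty);
    }
}
```

Both handlers run when the event fires; the short form is handy for "I don't care about the arguments" handlers.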

Closures

This is where the true power of anonymous methods is revealed. Closures can be confusing for some and second nature to others. I found them quite straightforward, so hopefully you will too. Closures are absolutely crucial to lambda expressions and LINQ. Jon warns readers to make sure they are awake and have some time to spend on the section, since it could get confusing. But don't be alarmed: there are countless other articles on the internet about them if you find the topic hard to grasp here.
A closure, put in simple terms, lets a function interact with an environment beyond the parameters supplied to it. Let's make this abstract definition a little more concrete, but before that we need to define two types of variables:

  • Outer Variables: These are variables that have an anonymous method declared in their scope. 
  • Captured [Outer] Variables: These types of variables are outer variables that are used inside the anonymous method.
To go back to the definition of closures, the anonymous method is the function and the captured variables are the environment beyond its own that they interact with. 

void SampleMethod()
{
    string capturedVariable = "test";
    int outerVariable = 3;

    MethodInvoker ourDelegate = delegate()
    {
        string variable = "amazing ";
        Console.WriteLine("This is an " + variable + capturedVariable);
    };
    ourDelegate();
}
This code should be blowing your mind right now! Or maybe not? The fact that we were able to just use the captured variable as if it were declared inside the anonymous method seems really strange, and it should go against your previous knowledge of methods. After all, methods are only allowed to interact with the parameters that are passed to them, and maybe the this reference in an instance method, but surely not with an environment beyond their own.
It is important to note two things at this point. The anonymous method is not called when it is defined: when we are assigning the MethodInvoker delegate above we are not executing the anonymous method, so any captured variable that is changed inside the method is not touched until the delegate is executed. Also, the captured variable used inside the anonymous method is the very same variable that is used anywhere else in the enclosing method.
Now, why should we use captured variables and why are they useful? Well, remember our example with the student names above? We had to hard-code the character that the name started with. With captured variables we can now have a method that accepts this character as a parameter and captures it in an anonymous method. I will leave the details of this approach to you.
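If you want to check your approach afterwards, here is one possible sketch (the Student class and all names are mine): the method's parameter is the captured variable used inside the anonymous method.

```csharp
using System;
using System.Collections.Generic;

class Student
{
    public string Name;
    public Student(string name) { Name = name; }
}

class FilterDemo
{
    // 'start' is a parameter of the enclosing method, captured by the
    // anonymous method passed to FindAll.
    static List<Student> StartingWith(List<Student> students, string start)
    {
        return students.FindAll(delegate(Student std)
        {
            return std.Name.StartsWith(start); // uses the captured parameter
        });
    }

    static void Main()
    {
        List<Student> students = new List<Student>();
        students.Add(new Student("Alice"));
        students.Add(new Student("Bob"));
        students.Add(new Student("Anna"));

        foreach (Student s in StartingWith(students, "A"))
            Console.WriteLine(s.Name);
    }
}
```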
So far so good. You may be wondering at this point: well, this seems okay, there is nothing terribly complex about captured variables so far. But here is where things get more than a little strange! What if I told you that local variables that are captured by an anonymous method can live on after the enclosing method has returned? Pretty crazy, isn't it! The sentence is hard enough to parse, let alone understand. I will give an example of what this means, and the repercussions of such behavior shall be evident by the time we are through.

public MethodInvoker GetDelegate()
{
    int localVar = 10;

    MethodInvoker returningDelegate = delegate
    {
        Console.WriteLine(localVar);
        localVar++;
    };
    return returningDelegate;
}
...
MethodInvoker y = GetDelegate();
y();

So what will happen in the above snippet after we call y()? If localVar were not a captured variable, we would expect it to be destroyed when the method returned: local variables normally live on the stack, and when the function returns and the stack frame is popped the variable is gone. But in fact localVar is not on the stack at all; it is stored in a compiler-generated class that lives on the heap. The GetDelegate method and the anonymous method both have a reference to an instance of that class and access the variable through it. Captured variables live at least as long as the delegate instance that references them.
There is more: captured variables can actually be shared among the different delegates that reference them! The key thing to remember is that a captured variable is captured each time it is instantiated, and a variable is said to be instantiated whenever execution enters the scope in which it is declared. So in the example below, the index variable (i) of the for statement and the list variable are shared, while the counter variable is not, since it is declared inside the for statement's body and is hence instantiated on each pass through the loop.
List<MethodInvoker> list = new List<MethodInvoker>();

for(int i = 0; i < 10; i++)
{
    int counter = i * 4;
    list.Add(delegate
             {
                Console.WriteLine("Counter: " + counter + " Index: " + i);
                counter++;
             });
}
foreach(MethodInvoker method in list)
{
    method();
}
...
list[0]();
list[0]();

If you run the code above you'll see that each anonymous method has its own counter variable, while i and list are shared among all of them.
Finally, to sum up: the rule of thumb in using captured variables is to avoid scenarios that make the code too complex to understand. Mixing shared and distinct variables can make the code very unreadable and the results unpredictable. But as I showed you before, closures can be a powerful tool when used properly. Hopefully this post has impressed upon you the power and beauty of anonymous methods and gives you the push to use them every now and then when the circumstances are right.
The next topic will be iterator blocks. I will try to get around to doing a post on them during the next week. For now, be well and try to stay warm!

Sunday, 18 November 2012

Nullable Types

In this post, I'll talk about nullable types in .Net. The material here is basically what is available in the book, and I will try to do as good a job as Jon has. Okay then, let's dive in.

Why do we need null ?

Firstly, let's see why we need a value representing the non-existence of a value. In pretty much every field of development, there comes a time when a piece of data is missing at a certain point in the application's execution and gets filled in later on. This could be a field representing the date of an account's closing, the time of death of an individual, or the mobile phone number of a person who does not have a cell phone yet. Databases handle this non-existence quite easily: in DBMSs we can have the value NULL for any type of field. That is not the case with strongly typed programming languages like C# or Java. In C#, we have already seen that references can be null, which just means that the reference does not point to anything on the heap at the moment. But value types are another story. For a value type variable to hold null, a special value has to be designated as null. This is known as the "Magic Number" solution, in which one sacrifices one special value to represent missing data (e.g. DateTime.MinValue). This cannot always be done, though: for example, which value of a byte should be chosen to represent null? There are 256 possible values, and designating any of them as null would mean that we can no longer represent the full range of byte input properly. The need for null values, then, has led to all sorts of patterns to make them available.

Attempting to solve the problem in C# 1.0

One of the approaches we have already discussed and which is already available in C# 1.0 is the magic number pattern. This, as stated before, is suboptimal: it sacrifices a legal value and produces all sorts of problems. Another possibility is wrapping the value type in an object. This effectively allows the reference to hold null, with the value itself living on the heap. But this solution goes against the reason for using value types in the first place: value types should not require garbage collection, memory management, etc. Not only would this approach cause a lot of casting and consequently performance problems, it would also waste memory, since each object in a 32-bit CLR, for example, carries 8 bytes of overhead.
Yet another solution is to store a boolean flag alongside the value type we want to make nullable. The flag would be true when the value is valid and false when it is not (null). But this would mean creating a type (a struct) for every value type you want to make nullable! If you have been following along with the generics posts, you might be thinking: well, why not create a generic type? Aaand you would be right! That's actually how the language designers of C# have gone about this. There is more to what they did, though, and we'll know the differences by the end of this post.
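The flag-per-type idea could be sketched like this (the type and member names are mine); the point is that you would need one such struct for every value type, which is exactly the duplication generics avoid:

```csharp
using System;

// A hand-rolled "nullable int": a bool flag paired with the value.
public struct NullableInt
{
    private readonly bool hasValue;
    private readonly int value;

    public NullableInt(int value)
    {
        this.value = value;
        this.hasValue = true;
    }

    public bool HasValue { get { return hasValue; } }

    public int Value
    {
        get
        {
            if (!hasValue)
                throw new InvalidOperationException("No value present.");
            return value;
        }
    }
}
```

A default(NullableInt) has its flag unset and so plays the role of null, while new NullableInt(5) is a "real" value; to do the same for long, byte, DateTime and so on you would have to copy this struct over and over.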

Nullable<T> and System.Nullable

The Nullable<T> generic type represents a value type that can be made null. This is where the information we learned on generics comes in handy. How can we limit the type arguments supplied to the generic type to value types? With type constraints, of course (the struct constraint). Also remember that we can supply any value type as the argument of a generic with a "struct" constraint EXCEPT Nullable<T> itself, so you can't have nested nullable types. The Nullable<T> type has two important properties:
  • HasValue
  • Value
You can probably guess what each of these properties does. The Value property returns the real, non-nullable value, whose type is also known as the "underlying type"; if there is no "real" value, it throws an InvalidOperationException. How does it know whether the value is real or not? It internally holds a boolean flag which is set when the value is non-null. This is where we get into the differences between Nullable<T> and a structure we could define ourselves. Firstly, if we had created our own structure, boxing it would produce a reference pointing to a boxed copy of the whole struct in memory. With Nullable<T>, however, this is not the case: the runtime strips away the wrapper and boxes either a null reference or the underlying value itself wrapped in an object. So boxing and unboxing do not follow the regular rules. Also, an implicit conversion exists from a value type to its nullable type, and an explicit conversion exists from the nullable type back to the value type. The conversion behaves the same as using the Value property, meaning you'll get an exception if there is no real value, or the real value if it exists.
Nullable<T> has a peculiar cousin: the static System.Nullable class. This class holds only three methods: two generic methods for checking equality and comparison, and one method for getting the underlying type of a nullable type. I really don't understand why these three methods couldn't be in the nullable type itself.
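Here are those three members in action (a small sketch of mine; note that GetUnderlyingType works on Type objects, not on values):

```csharp
using System;

class NullableHelpersDemo
{
    static void Main()
    {
        int? a = null;
        int? b = 5;

        // Compare: a null value sorts before any real value.
        Console.WriteLine(Nullable.Compare(a, b) < 0);               // True

        // Equals: two nulls are considered equal.
        Console.WriteLine(Nullable.Equals(a, (int?)null));           // True

        // GetUnderlyingType: int? unwraps to int; a plain int gives null.
        Console.WriteLine(Nullable.GetUnderlyingType(typeof(int?))); // System.Int32
        Console.WriteLine(Nullable.GetUnderlyingType(typeof(int)) == null); // True
    }
}
```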
So far we saw how the nullable type is implemented in .Net. But all this use of generics means we have to wrap the value type inside another type and then use that, which takes the attention away from the type itself. This is why C# 2.0 has a specific syntax for these types: you simply put a ? after the type's name:

int? nullableInteger; 
nullableInteger = new int?();
nullableInteger = 2;
nullableInteger = null;
if(nullableInteger == null)
    nullableInteger = new int?(2);
All of the statements in the snippet are valid, and any occurrence of int? can be replaced by Nullable<int>, since they both produce exactly the same IL.
Comparison between nullable value types and non-nullable value types is quite similar to ordinary comparison. As mentioned before, the conversion from the value type to its nullable counterpart is implicit and the opposite is explicit. What about operators? This is another difference between the framework's implementation of nullables and a hypothetical custom nullable type written by a user. Most operators (unary and binary) work the same way with nullable types. The difference is in the result: binary operators like +, -, * and / have their return type lifted to Nullable<T> instead of T if either operand is nullable, and they produce null if either operand is null. Comparison operators like < and > keep their bool return type and return false if either operand is null (== is the exception: two nulls compare equal). The operators that are automatically overloaded for nullables like this are called lifted operators. The rule of thumb is to expect normal behaviour, but if your code breaks, be aware that the culprit may lie in the differences introduced by nullable conversions and lifted operators.
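A short sketch of lifted operators at work: arithmetic propagates null, while comparisons with a null operand yield false (and == treats two nulls as equal):

```csharp
using System;

class LiftedDemo
{
    static void Main()
    {
        int? a = 5;
        int? b = null;

        int? sum = a + b;                // lifted +: any null operand => null result
        Console.WriteLine(sum.HasValue); // False

        Console.WriteLine(a < b);        // False: comparison against null
        Console.WriteLine(a > b);        // False as well

        Console.WriteLine(b == null);    // True: == is the special case
    }
}
```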
Lastly, there is another operator introduced with nullables in C# 2.0. This operator is called the null coalescing operator("??"). This operator could be very useful when dealing with nulls: 

int? a = null;
int b = 5;
int? c = a ?? b;
int? e = a ?? c ?? b;
The value of c in the snippet above is 5, since a is null and the operator falls back to b; e is 5 as well. As shown, the operator can be chained: the result is the first non-null operand!
Some new design patterns become possible with this operator and nullable types as well. For example, returning a value from a function along with "succeeded"/"failed" logic: this can be implemented by having the function return a nullable type instead of using an output parameter plus a boolean return value.
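A sketch of that pattern (the method name is mine): the nullable return value carries both the result and the success flag in one place, and combines nicely with ??:

```csharp
using System;

class ParseDemo
{
    // Returns the parsed value, or null on failure: no out parameter needed.
    static int? TryParseInt32(string text)
    {
        int result;
        if (int.TryParse(text, out result))
            return result;
        return null;
    }

    static void Main()
    {
        Console.WriteLine(TryParseInt32("42"));           // 42
        Console.WriteLine(TryParseInt32("oops") == null); // True

        // The null coalescing operator supplies a default on failure.
        int value = TryParseInt32("oops") ?? -1;
        Console.WriteLine(value);                         // -1
    }
}
```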
This is it for nullable types. I really wanted to keep this post short; it is shorter than the generics ones, anyway! In the next posts we'll start looking into some more interesting stuff: delegates, anonymous methods and iterator blocks. Until then, happy coding :)

Saturday, 10 November 2012

Advanced Generics and Comparison with C++ and Java

In this post I will touch on some aspects of advanced generics mentioned in the book. The book covers some other aspects as well, like reflection, which I don't see as very important to cover here.
Firstly, we will discuss static fields and generics. Then we'll continue with how the JIT handles generics, and finally we'll close with a comparison of generics in C++, Java and C#.

Static fields defined in Generics

Static fields defined in a class do not belong to any specific object of the class but to the type itself. This means that when you declare a static field, no matter how many objects you create, there is only one instance of that field and it is shared between all instances of the class.
Now the question is: if we have a generic type that has some static fields, should these fields be shared between the different types T, or should each constructed type get its own?
The answer would be obvious if we look at the class we used in the previous post :

public class CustomList<T> : IEquatable<CustomList<T>> where T : IEquatable<T>
{
    private T[] list;
    private static IEqualityComparer<T> comparer = EqualityComparer<T>.Default;

If the static member comparer were shared between types, all sorts of problems would happen; of course, independent static members is the right choice. This means that for Class<T>, any closed constructed type has its own set of static fields: Class<string> and Class<int> have different and independent static members.
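This is easy to verify with a small sketch (the Counter class is my own example):

```csharp
using System;

class Counter<T>
{
    public static int Count; // one copy per closed constructed type
}

class StaticFieldsDemo
{
    static void Main()
    {
        Counter<string>.Count = 10;
        Counter<int>.Count = 99;

        // Each closed type has its own independent static field:
        Console.WriteLine(Counter<string>.Count); // 10
        Console.WriteLine(Counter<int>.Count);    // 99
    }
}
```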

Generic iteration (IEnumerator<T>)

Chances are you have used this interface many times without knowing, for example when you traversed a generic collection in a foreach statement. In order for a type to be iterable in a foreach statement, that type should implement the IEnumerable interface. This interface in turn returns an IEnumerator, which implements MoveNext() and Current. Now what does this mean for generics? What if we want to be able to iterate over a generic class? If we implement the IEnumerable interface, we can successfully iterate over our generic type. But there is a problem. A closer look at the IEnumerable interface shows that the IEnumerator we return has to implement this property:

Object Current { get; }

This is where the problem arises. This property has to return the current element of the collection being iterated. Our generic type holds a collection of T objects, and to return one we need a conversion to Object, which means boxing for value types! Isn't that why we got generics in the first place? No extra boxing/unboxing, strong typing?
Well, the answer is the IEnumerator<T> interface. Most interfaces have been extended with generic counterparts, and IEnumerator<T> extends IEnumerator to allow strong typing when working with generics. This, though, is not as smooth as you'd think: due to a design decision in the framework, IEnumerator<T> extends the non-generic IEnumerator interface. This means that any class implementing IEnumerator<T> also has to implement IEnumerator! So in order to implement the IEnumerator<T> interface we have to implement both:

object IEnumerator.Current
{
    get { return Current; } // explicitly implemented: forwards to the generic property
}

public T Current
{
    get { /* return the strongly typed current element */ }
}

This design decision was made for backward compatibility with C# 1.0. As you can see, implementing two properties with the same name but different return types is possible through explicit interface implementation of the non-generic one, having it call the generic version.
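To make this concrete, here is a minimal hand-written enumerator over an array (a sketch; the class name ArrayEnumerator is my own), showing the explicit implementation in context:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// A minimal enumerator over an array, showing how the generic Current
// hides the non-generic one via explicit interface implementation.
class ArrayEnumerator<T> : IEnumerator<T>
{
    private readonly T[] items;
    private int position = -1;

    public ArrayEnumerator(T[] items) { this.items = items; }

    // The strongly typed version: no boxing for value types.
    public T Current { get { return items[position]; } }

    // The non-generic version forwards to the generic one (boxing happens here if used).
    object IEnumerator.Current { get { return Current; } }

    public bool MoveNext() { return ++position < items.Length; }
    public void Reset() { position = -1; }
    public void Dispose() { }
}
```

A consumer that works through IEnumerator<T> only ever touches the strongly typed Current; the non-generic property exists purely to satisfy the older interface.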

C# Generics vs. Counterparts in C++ and Java

In C++, generics exist as templates. These act somewhat like type-checked macros: after the definition of a template type (type parameter), the compiler replaces the type parameter with each concrete type the template is instantiated with, at compile time. For example, in the code below T is once replaced by int and once by long.


template<class T>
T min(T first, T second)
{
    return first<second ? first:second;
}

int main(int argc, char *argv[])
{
    long l1 = 2, l2 = 4;
    int i1 = 3, i2 = 5;
    long min1 = min<long>(l1, l2);
    int min2 = min<int>(i1, i2);
}

This, of course, means there is no need to add constraints to the type parameter so that compile-time checks can be conducted: any type can be used as the argument, and at instantiation the compiler checks whether every operation used is available for that type. This adds much more flexibility. It allows, for example, the use of operators on the type parameters. Doing so is not possible in C#, where there is no constraint to enforce the availability of a certain operator overload. It also means that the C++ compiler can optimize based on the concrete types used.

There is, however, an optimization done in .NET that is not done in C++: sharing of generated code for generics. If a generic is used in 5 different places in the code with different type arguments, the IL does not contain 5 different variations but a single generic definition. The JIT then creates as many variations as needed at execution time. The native code is shared between instantiations whose type arguments are reference types, and not shared for value types. The reason is that a reference always has the same size (4 bytes on a 32-bit CLR), while value types come in various sizes (int, long, structs, ...).

Lastly, C++ also allows template arguments that are not types at all: values of intrinsic data types (and even functions) can be passed as template arguments.
I talked about the concept of "variance" in .NET in previous posts, although I have not gone into detail about what it is. I may do an entire post on the subject later on, but for now I'll assume that you know the concept. Up to C# 4.0, generics are strictly invariant: you cannot, for example, take a List<String> and handle it as a List<Object>. The same limitation exists for C++ templates.
Contrary to C++, Java's generics are less capable than C#'s. Basically, Java bytecode does not know about generics at all! Generic types in Java are erased by the compiler and converted to their non-generic equivalents, with the necessary casts inserted. We do still get some compile-time checking out of them. Another very annoying limitation is that Java's built-in value types cannot be used as type arguments in generics, so you end up having to use their boxed versions (List<Integer> vs List<int>), which is inefficient. One feature that Java has and C# lacks, however, is generic variance (C# 4.0 introduced some variance for generics, which we'll discuss later on). Java allows generic variance using wildcards.
With this, we're finally done with generics! This has been a very long topic and we're barely scratching the surface. Although not everything was covered here, you now know enough to delve into the details of the language specification if you're so inclined. And if you have had enough already, don't worry: you will rarely need to go deeper than what has been discussed here.

Friday, 2 November 2012

Generics contd. Declaring constraint on type parameters

In this post I'll talk about constraining the type parameter of a generic type or generic method. General-purpose generic types like List<T> don't constrain their type arguments. This is because they are library generics: they should be applicable in general cases and should make the fewest assumptions possible. Custom generic types, however, may only work with certain types, so there should be a way for the compiler to guarantee that the type arguments are of a certain kind. This way we get more compile-time type checking, and the members of the constrained type can be called on the type argument (and show up in IntelliSense). There are 4 different kinds of constraints:
  • Reference types(class): 
    • This type of constraint restricts the type argument to be a reference type: classes, interfaces, delegates, arrays or any other type known to be a reference type.
  • Value types(struct): 
    • This type of constraint restricts the type argument to be a value type: structs, primitive numeric types and enums. This excludes nullable types.
  • Parameterless public constructor types(new): 
    • This constraint restricts the type argument to have a parameterless public constructor. Notice that this excludes abstract classes, static classes and classes whose constructors all take parameters (leaving no public parameterless constructor). Value types always satisfy it, since they all have a default public parameterless constructor. This constraint allows the generic type to create new instances of the type argument.
  • Conversion types(<interface, base class>):
    • This constraint restricts the type argument to be convertible to the types specified, which can be interfaces or a base class. There can be more than one interface but naturally only one class, since no class can inherit from more than one class anyway, and specifying several classes in a single inheritance hierarchy would be redundant.
In order to apply any of these constraints to a generic declaration, the "where" keyword should be used after the type definition:

class GenericType<T> where T : <constraint>
{
}

A mixture of these constraints may be applied, although some combinations are invalid, such as "where T : class, struct".
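As a sketch of a valid combination (the Factory class name is my own): a class/struct constraint must come first and new() must come last in the constraint list.

```csharp
using System;

// T must be a reference type, comparable to itself,
// and must have a public parameterless constructor.
class Factory<T> where T : class, IComparable<T>, new()
{
    public T CreateDefault()
    {
        return new T(); // allowed only because of the new() constraint
    }
}
```

For example, Factory<Version> compiles: Version is a class, implements IComparable<Version> and has a public parameterless constructor. Factory<string> does not, since string has no parameterless constructor.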

Declaring your own generic types

Now we'll move on to declaring custom generic types. In this section we'll create a custom list class which is capable of comparing its elements for equality. Why do we need a dedicated equality mechanism, you might ask. Why can't we just use == or != on the type? The reason is that the type may not have overloaded those operators. You might then ask: what about the scenario where we have constrained the type to something specific which has actually overloaded them? Let's look at the answer exhaustively:

  • Reference type constraint: With this constraint, the only assumption made is that the values of the two operands are references to objects on the heap. So the only guaranteed-correct action is to compare the references: == and != simply compare references. This may seem counter-intuitive if you don't know how overload resolution works in C#: you may wonder why == and != aren't used for reference types that already overload those operators. Say we pass in a string, which is a reference type that overloads them; can't the JIT just use the string versions of == and !=? To understand overload resolution in .NET, I suggest reading this article by Jon Skeet. Basically, the C# compiler always performs overload resolution at compile time. This decision prevents a set of problems like the "brittle base class" problem blogged about extensively by Eric Lippert(link). The brittle base class problem happens due to the use of forwarding rather than delegation when a subclass calls a base class's method.
  • Value type constraint: When the type is constrained to a value type, the use of == and != is prohibited entirely. Why? Because value types include structs, which may or may not overload == and !=, and we can't wait until runtime to see which value type is being passed; so any use of these operators is prohibited.
  • Conversion type constraint: This is where we can guarantee that the type argument always either overloads the operators directly or is a subclass of a class that overloads them. The compiler checks whether this is true, and if it is, the operators may be used.
Okay, after this rather long digression, let's get to the point. We want to create a custom list class that allows element-by-element comparison between lists. In order to do this we are going to constrain our type argument using a conversion type constraint, as seen below:

public class CustomList<T> : IEquatable<CustomList<T>> where T : IEquatable<T>
{
    private T[] list;
    private static IEqualityComparer<T> comparer = EqualityComparer<T>.Default;

    public int ListSize {get; set;}

    public T this[int i]
    {
        // for brevity we'll ignore the check for a valid index.
        get { return list[i]; }
    }

    public void Add(T newItem)
    {
        T[] temp;

        if (ListSize == 0)
            list = new T[2];
        else if(ListSize == list.Length)
        {
            temp = list;
            list = new T[ListSize * 2];
            Array.Copy(temp, list, ListSize);
        }
        list[ListSize++] = newItem;
    }

    public bool Equals(CustomList<T> other)
    {
        if (other == null || this.ListSize != other.ListSize)
            return false; // lists of different lengths can't be equal
        for (int i = 0; i < ListSize; i++)
            if (!this[i].Equals(other[i]))
                return false;
        return true;
    }
}

Okay, so let's see what we have done here. We have a class that wraps an array. This class, being generic, can hold elements of any type, with one constraint: the elements have to implement the IEquatable<T> generic interface. This interface forces the element type to provide a strongly typed Equals(T) method. We did this to make sure that the element comparison inside our Equals method is a strongly typed comparison rather than a comparison of references.
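A quick usage sketch, assuming the constraint is written as `where T : IEquatable<T>` (as described above); string qualifies since it implements IEquatable<string>:

```csharp
CustomList<string> a = new CustomList<string>();
a.Add("one");
a.Add("two");

CustomList<string> b = new CustomList<string>();
b.Add("one");
b.Add("two");

// Element-by-element comparison via the strongly typed Equals(T),
// with no boxing and no reference comparison:
Console.WriteLine(a.Equals(b)); // True
```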
This concludes my post on intermediate generics. In my next post I'm going to move on to advanced generics, and we're going to see how the JIT handles generics.

Tuesday, 30 October 2012

Generics(The basics)

I talked about generics in this blog's first post and how they are, I think, the biggest contribution of C# 2. In this post I'll try to summarize what I learned from Chapter 3. This chapter is quite long (50 pages) and it gets complicated at some points. Even so, it gives the reader a good explanation of how certain generics mechanisms work in the framework.
We saw how C# 1's collections namespace lacked the strong typing needed for good compile-time type checking and, in a lot of scenarios, created performance issues through excessive explicit and implicit (foreach) casts.
We are going to continue on the same front. Generics in C# can be thought of as coming in two kinds:

  • Generic Types: These are basically wrapper classes around a group of elements (classes, delegates, interfaces and structs). Depending on the generic wrapper used, the elements may be held in an actual array (List<T>), a doubly linked list (LinkedList<T>), a hash table (Dictionary<TKey, TValue>) or a red-black tree (SortedDictionary<TKey, TValue>).
  • Generic Methods: These are methods that introduce type parameters of their own. They are more complicated to understand; we'll get back to them after we cover generic types.
As you saw above, generics are written as a name followed by angle brackets containing one or more comma-delimited names. The names within the angle brackets are called type parameters: placeholders for any type the generic can accept. When we work with a certain generic type we have to declare the real types it will work with; these real types are called type arguments. It's the same idea as parameters/arguments in functions. As an example, if we want an actual dictionary from strings to strings, we declare our Dictionary<TKey, TValue> with type arguments of string. The type parameters are replaced by the type arguments in the declaration:

    Dictionary<string, string> dic;

A generic type still written in terms of its type parameters is called an unbound generic type, since its type arguments are not yet known (Dictionary<TKey, TValue>). When a type's type arguments are specified, it becomes a constructed type. Although the terminology is confusing at first, we'll see why it is needed to understand some concepts as we move towards advanced generics. The dic variable in the above snippet, for example, is of the constructed type Dictionary<string, string>: a dictionary of strings to strings.
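The distinction is visible through reflection; a small sketch (class name is mine):

```csharp
using System;
using System.Collections.Generic;

class TypeNamesDemo
{
    static void Main()
    {
        // typeof is the one place C# lets you name an unbound generic type:
        Type unbound = typeof(Dictionary<,>);
        Type constructed = typeof(Dictionary<string, string>);

        Console.WriteLine(unbound.IsGenericTypeDefinition);     // True
        Console.WriteLine(constructed.IsGenericTypeDefinition); // False

        // The constructed type points back at its generic definition:
        Console.WriteLine(constructed.GetGenericTypeDefinition() == unbound); // True
    }
}
```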
Generic methods are methods that introduce type parameters of their own. Notice that methods that merely accept or return a constructed generic type are not generic methods; for example, the function below is not generic:

public List<string> GetStrings(string[] stringArray);

The best way to understand generic methods is seeing examples of them. The ConvertAll<> method of the Array and List<T> types is a good example. This method has the following signature:

public List<TOutput> ConvertAll<TOutput> (Converter<TInput, TOutput> converter);

This declaration states that ConvertAll returns a List of some type TOutput; the type is unspecified, since TOutput is a type parameter rather than a real type or type argument. The method also takes a generic delegate as input: a Converter that receives a TInput (for List<T>, that input type is the list's own element type T) and returns a TOutput. The TOutput returned by the delegate is the same type parameter declared on the method.
Whenever declarations of generic types or methods get too complicated to understand abstractly, it is easier to mentally replace the type parameters with concrete type arguments. Try this technique with the above generic method and things will start to make sense.
To see this method in action the snippet below uses this method to convert elements of a list from strings to integers:
 
// Produces an integer from a string by summing its character codes.
public static int convert(string s)
{
    int num = 0;
    for (int i = 0; i < s.Length; i++)
        num += s[i];
    return num;
}
...
List<string> strings = new List<string>();
strings.Add("Hello");
strings.Add("This is a test");
strings.Add("Hey hey hey");
List<int> integers = strings.ConvertAll<int>(convert);
for (int i = 0; i < integers.Count; i++)
    Console.WriteLine(integers[i]);
Console.ReadKey();
 
Unless you have prior experience with generic methods, this may still seem confusing. But don't worry, it is just going to get even more confusing! I'm joking of course... or am I? Anyway, the best thing to do now is to look at some of the generic types available throughout the library. I'll continue this series of posts on generics with type constraints, which let you restrict the types a generic can wrap; then we'll move to creating our own generic types, then to advanced generics, and finally we'll close with a comparison between generics in .NET and their counterparts in C++ and Java.

Until then, happy coding.

Wednesday, 24 October 2012

Struct vs Class

Struct vs Class(improving the definition in the previous post)

There is a very important and commonly misunderstood difference between these two in .NET. The confusion often stems from another source: not knowing what value types and reference types actually are. Given that we have already gone through that, I will shed more light on this matter.
The most accurate way of differentiating between a struct and a class is that structs are value types and classes are reference types. Another very well-known answer is that structs are allocated on the stack whereas classes are allocated on the heap. This is inaccurate, as we showed in the previous section: value types may at times be allocated on the heap, and even a local variable captured by an anonymous method ends up on the heap. As Eric Lippert puts it in his article on the subject, placing value types on the stack is just an implementation detail. We should not base our choice on where the CLR happens to put the type; the real decision is in the semantics. Is the type a value or a reference?
An article on MSDN elucidates this even further. There are some properties a type should have in order to be a struct; if it doesn't have them, just define a class. The properties are as follows:
  • The type represents a single logical value like other primitive values.
  • It is immutable. 
  • It needs less than 16 bytes of memory.
  • It is not boxed frequently.
Now let's see why each of these conditions has to apply. Firstly, the type should represent a single logical value, like the primitive types do. This is what we mean by it being semantically a value type: if the type logically represents a single value, it is a candidate for being a value type.
Also, the type should be immutable. There is a very well-known immutable class we work with extensively: string. Ever wondered why strings are immutable, even though they are a reference type rather than a value type? The answer comes down to essentially the same reasons we want our value types to be immutable. So why aren't strings structs, given that they have some of the properties of one (they represent a single logical value)? There is another side to this.
Strings can easily get over 16 bytes. They are passed around a lot, they are compared very often (they are often keys in dictionaries), they are frequently repeated, and operations like copying and taking ranges are common. All of these properties would bring lots of headaches if strings were mutable. Being passed around creates all sorts of problems for threading, since mutable objects have to be locked to avoid race conditions. Being compared often means we would have to walk through two strings to find out if they are the same; with immutability we can use interning, making every occurrence of a string like "hello" point to one memory location so we can just compare the references. This also helps with repeated strings, as all of them become references to the same immutable location. Copy and range operations become cheap as well: copying can simply return "this", and a range could perhaps be represented as a reference plus a start and end position. Simply put, what I want to impress upon you is the amount of optimization achieved by making strings immutable. This is why a lot of languages, like C#, Java and Python, have opted for this design.
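The interning behaviour described above can be observed directly; a small sketch (class name is mine):

```csharp
using System;

class InternDemo
{
    static void Main()
    {
        string a = "hello";
        string b = "hello";                           // literals are interned: same object
        string c = new string("hello".ToCharArray()); // a distinct object with equal contents

        Console.WriteLine(object.ReferenceEquals(a, b)); // True
        Console.WriteLine(object.ReferenceEquals(a, c)); // False
        // Interning c yields the shared "hello" instance:
        Console.WriteLine(object.ReferenceEquals(a, string.Intern(c))); // True
    }
}
```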
Okay, so after this rather lengthy digression, let's get back to the task at hand: the reasons a type may be defined as a struct. The next one on the list is being less than 16 bytes. I would say the reason for this is simply that stacks are not that big! On Windows, for example, each thread is given a 1 MB stack by default, and this space is shared with lots of other things, so it makes sense to use it sparingly. Copying large structs around is also more expensive than copying a reference.
Last but not least is the boxing issue. This is the most obvious property, as boxing/unboxing operations are expensive and have to be avoided if they would occur in abundance.

Value Types vs Reference Types and Parameter Passing

To put simply what's happening in C#: unless you declare otherwise, everything is passed by value. Firstly, it is important to differentiate between value types/reference types on one hand and parameters being passed as out/ref on the other. A variable of a value type holds the actual value the type represents. For example, the statement "int a = 2;" creates a variable of type int, which is a value type, and assigns the value 2 to it. Reference types, on the other hand, are types like arrays, interfaces, classes and delegates. Variables of these types do not actually contain the object but hold a reference to it. For example, the statement "StringBuilder sb = new StringBuilder();" creates a StringBuilder object and then assigns its reference to the variable sb. In other words, sb does not hold a StringBuilder but a reference to where the object actually lives.
Another misconception about value types and reference types is the saying "reference types live on the heap while value types live on the stack". This statement is inaccurate. A variable's position in managed memory depends on the context in which it is declared. Local variables (variables declared inside methods) are stored on the stack; even if the variable is of a reference type, the reference itself is stored on the stack. Instance variables are located on the heap, inside the containing object. So a value type lives wherever the variable declaring it resides: if it is a local variable or a parameter it lives on the stack, and if it is an instance member it lives on the heap. Static variables always live on the heap.
Now that we know what reference types and value types are, we can continue with the different kinds of parameter passing. By default, all parameters in C# are passed by value. What does this mean for value types? It means a new variable is created in the callee and the value is copied into it. Reference types are handled the same way: a new variable is created in the callee and the reference is copied into it. Although both variables now provide access to the same object, notice that they are independent of one another. For example, in the following code, what is the value of obj1 after the final line executes?
   
public void Foo(StringBuilder obj)
{
    obj = new StringBuilder("Good Bye");
}
public void Bar()
{
    StringBuilder obj1 = new StringBuilder("Hello");
    Foo(obj1);
    Console.WriteLine(obj1);
}  

If you didn't get the answer "Hello", try again and see why the answer is not "Good Bye". The reason, as I said before, is that the value passed to the method Foo is a copy of the reference to a StringBuilder object; since the variable obj in Foo and the variable obj1 are two independent variables, changes to obj are not noticed through obj1. If Foo had made changes to the actual object, however, the changes would have been visible in Bar, as seen below:
   
public void Foo(StringBuilder obj)
{
    obj.Append(" There");
}
public void Bar()
{
    StringBuilder obj1 = new StringBuilder("Hello");
    Foo(obj1);
    Console.WriteLine(obj1);
}  

The answer is "Hello There". If we had used a struct instead of the StringBuilder class, however, Bar would not have seen any changes at all, since a copy of the struct's value is what gets passed to Foo.
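The struct behaviour can be demonstrated with a small sketch (the Point struct is my own example):

```csharp
using System;

struct Point
{
    public int X;
}

class CopyDemo
{
    static void Modify(Point p)
    {
        p.X = 99; // modifies the callee's local copy only
    }

    static void Main()
    {
        Point point = new Point();
        point.X = 1;
        Modify(point);
        Console.WriteLine(point.X); // 1 — the caller's struct is untouched
    }
}
```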
Now, since we can already pass references of objects by value and then manipulate the objects directly, why would we want to use the ref modifier?
In order to use the ref modifier, one has to put it before the argument both at the call site and in the function definition. With this mechanism, instead of a new variable being allocated for the callee, the same storage location is used for both the argument and the parameter. This means that the following variant really does result in "Good Bye":
   
public void Foo(ref StringBuilder obj)
{
    obj = new StringBuilder("Good Bye");
}
public void Bar()
{
    StringBuilder obj1 = new StringBuilder("Hello");
    Foo(ref obj1);
    Console.WriteLine(obj1);
}  

The out modifier works like ref, with the difference that the variable being passed does not have to be initialized beforehand, since it is assumed it will be assigned in the callee. The parameter is considered unassigned inside the callee and is required to be definitely assigned before the method returns.
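The classic use is a try-style method; a small sketch (the TryDivide method is my own example):

```csharp
using System;

class OutDemo
{
    static bool TryDivide(int a, int b, out int result)
    {
        if (b == 0)
        {
            result = 0; // out parameters must be assigned on every path
            return false;
        }
        result = a / b;
        return true;
    }

    static void Main()
    {
        int quotient; // no initialization required for an out argument
        if (TryDivide(10, 2, out quotient))
            Console.WriteLine(quotient); // 5
    }
}
```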

Type System in C#

In order to understand what the type system in C# actually is, we first need to define what we mean by static/dynamic typing, implicit/explicit typing and safe/unsafe typing.
C# 1.0 is a statically typed language, meaning that the type of any variable is known at compile time. There is no implicit typing either: the compiler doesn't have to infer the type of an expression from the code, since the type is given by the programmer. C# actually remained completely statically typed up to C# 4.0, when some dynamic typing was added; this will be discussed in later posts. Starting from C# 3.0, some implicit typing was added to the language as well, driven by LINQ (the var keyword). In those scenarios, although the explicit type is not written by the programmer, the compiler still infers it at compile time and we still get compile-time type checking.
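Implicit typing with var can be sketched as follows (C# 3.0 or later; the class name is mine):

```csharp
using System.Collections.Generic;

class VarDemo
{
    static void Main()
    {
        var number = 5;                           // inferred as int at compile time
        var map = new Dictionary<string, int>();  // inferred as Dictionary<string, int>

        // number = "text"; // compile error: var is still statically typed

        map["answer"] = number;
    }
}
```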
Type safety is a very interesting issue. In C++, for example, we have all sorts of freedom when dealing with types: we can cast almost any type to another and the C++ compiler won't complain. For example, we can cast a char* to an int* and then, with a dereference, get a number out of the first four bytes of the string (with 32-bit integers and a little-endian architecture). This gives us a lot of flexibility in our code, but it also hands us quite a lot of rope to hang ourselves with. In C# this is no longer the case: C# is type safe, meaning that although you can convert between types using casts, only compatible conversions are allowed.
If we define strongly typed languages as those that don't allow any type conversions, then C# is definitely not strongly typed. Allowing implicit type conversions creates certain complex scenarios that require runtime type checking. A very interesting example is the result of array covariance. Arrays are reference types, and an implicit conversion from one array type to another is allowed as long as the element types are convertible. This does not mean, however, that the language doesn't check for proper covariance:

   
    string[] strArray = new string[10];
    object[] objArray = strArray;   // legal: array covariance
    objArray[0] = new Button();     // compiles, but throws ArrayTypeMismatchException at runtime

In the above example we define an array of strings and then assign it, by reference, to an object array variable. This requires an implicit conversion that is checked at compile time: the compiler verifies that string[] and its elements are compatible with object[] and its elements. If so, the conversion is allowed (covariance); otherwise, a compile error is produced. In the next line we try to assign a Button to an element of the object array. As far as the compiler is concerned this is perfectly legal, since an object array can hold elements of type object; the compiler cannot know what array objArray actually references until runtime. At runtime it fails: both objArray and strArray reference the same array of strings, and although objArray is allowed to reference that array, it cannot change the array's actual element type, which is statically fixed as string. Consequently, the runtime, knowing the real type of the array, disallows storing anything other than strings in it and throws an exception.

Delegate Type vs Delegate Instance

In the previous post I talked about the evolution of C# as a language. In this post I'm going to continue the same trend, but the focus is on the elements of C# that are usually misunderstood. Jon has had a lot of experience on Stack Overflow and, thanks to it, has seen much confusion in the community regarding the concepts covered here. I will summarize the key concepts, putting each section into its own post; I've noticed that the posts have been too long for any interested reader!

Delegates

A delegate can be defined as an entity encapsulating a behaviour with certain parameters and a return type. The forms of a delegate's definition have already been covered in my previous post, but there seems to be confusion around the word "delegate", since it is used both for the "delegate type" and the "delegate instance". The code below shows the difference between the two:

   
    delegate int DelegateType(int a, int b);

    public int Add(int a, int b)
    {
        return a + b;
    }
...
    DelegateType delegateInstance = Add;
    Console.WriteLine(delegateInstance(3, 4));

As seen in the above code snippet, in order to create a delegate instance a delegate type must first have been declared. With the first line of code we are actually creating a new reference type called DelegateType. This type can then be instantiated, passed to functions and basically used wherever a reference type can be used. Also remember that delegates, like strings, are immutable and thus thread safe. In the next lines we create an instance of the delegate type and then invoke it. We could have used the instance's Invoke() method, but a C# shortcut lets us call it like a normal method. Each delegate instance has an invocation list: when the delegate is invoked, all the methods in the invocation list are called in order. If those methods have return values, all but the last are thrown away; the return value of the delegate invocation is the return value of the last method in its invocation list.
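The invocation list behaviour can be sketched like this (the Calculation delegate and method names are my own examples):

```csharp
using System;

delegate int Calculation(int x);

class MulticastDemo
{
    static int Double(int x) { Console.WriteLine("Double"); return x * 2; }
    static int Square(int x) { Console.WriteLine("Square"); return x * x; }

    static void Main()
    {
        Calculation calc = Double;
        calc += Square; // both methods now sit in the invocation list

        int result = calc(3); // prints "Double", then "Square"

        // Only the last method's return value survives:
        Console.WriteLine(result); // 9, not 6
    }
}
```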
In C# 1.0 a delegate instance accepts only methods with exactly the signature defined by the delegate type. Say we have a parent class Human with two sub-classes Man and Woman, and a delegate type with the signature "void Run(Man)"; one would expect to be able to add a method with the signature "void Run(Human)" to the invocation list, since any Man is a Human. This, however, cannot be done in C# 1.0. It is known as parameter contravariance. The mirror-image scenario, adding a method whose return type is derived from the return type in the delegate's signature, is known as covariance. Both became possible for delegates in C# 2.0. The same flexibility is still not available when overriding methods or implementing the methods of an interface, even in C# 4.0.
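A minimal sketch of both kinds of variance, using the Human/Man classes from the text (the delegate and method names are my own):

```csharp
using System;

class Human { }
class Man : Human { }

class VarianceDemo
{
    delegate void ManHandler(Man m);   // a delegate that takes a Man
    delegate Human HumanFactory();     // a delegate that returns a Human

    // Takes a MORE GENERAL parameter than the delegate declares: contravariance.
    static void HandleHuman(Human h)
    {
        Console.WriteLine("handled: " + h.GetType().Name);
    }

    // Returns a MORE SPECIFIC type than the delegate declares: covariance.
    static Man CreateMan()
    {
        return new Man();
    }

    static void Main()
    {
        // Both assignments are illegal in C# 1.0 but legal from C# 2.0 on.
        ManHandler handler = HandleHuman;
        HumanFactory factory = CreateMan;

        handler(new Man());          // prints "handled: Man"
        Human h = factory();
        Console.WriteLine(h is Man); // True
    }
}
```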
Another misconception is thinking of "events" as delegates. Although events in the .NET framework rely heavily on delegates to function, they are not delegates themselves; the way the framework is implemented just makes them look like delegates. Just as properties look like fields to the world outside the object but actually have getters and setters on the inside, events are backed by a field of the delegate type inside their class. From the outside they look like fields too, but they are actually a pair of add and remove accessors that add or remove methods from the invocation list of that delegate field.
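The property analogy can be made explicit: any field-like event can be rewritten with explicit add/remove accessors over a private delegate field. A sketch (the Button/Clicked names are my own):

```csharp
using System;

class Button
{
    // The backing field: a plain delegate instance.
    private EventHandler clicked;

    // To the outside world this looks like a field, but it is really a
    // pair of add/remove accessors, just as a property is a get/set pair.
    public event EventHandler Clicked
    {
        add    { clicked += value; }
        remove { clicked -= value; }
    }

    public void SimulateClick()
    {
        EventHandler handler = clicked;
        if (handler != null)
            handler(this, EventArgs.Empty);
    }
}

class Program
{
    static void Main()
    {
        Button b = new Button();
        b.Clicked += delegate { Console.WriteLine("clicked!"); };
        b.SimulateClick();   // prints "clicked!"
    }
}
```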

Friday, 19 October 2012

Chapter 1 - C#'s Evolution (1.0 to 4.0)

Starting Simple

I really like the way Jon has started the book. This book is mostly for those who already have some experience with C# and want to know how things work under the hood. He starts with a simple program of the kind often used when teaching e-commerce site design: it contains a Product class. I don't know whether I have permission to post code from the book here, so I guess it's better practice anyway to come up with my own code, and to add material I know here and there. I have also tried to point out some design patterns used in the .NET framework as we get to each subject.
What Jon tries to convey in the first chapter is how C# has evolved as a language, from its humble beginnings in version 1.0 up to version 4.0. The chapter aims to impress the reader rather than educate, showing how the language's evolution has benefited the programmer.
Okay so let's get right to it. Let's say we have a class "Letter" as follows:

public class Letter
{
    public string letterNo;
    public DateTime letterDate;

    public DateTime LetterDate
    {
        get { return letterDate; }
    }

    public string LetterNo
    {
        get { return letterNo; }
    }

    public Letter(string ltrNo, DateTime ltrDate)
    {
        this.letterNo = ltrNo;
        this.letterDate = ltrDate;
    }

    public static ArrayList GetSampleLetters()
    {
        ArrayList list = new ArrayList();
        list.Add(new Letter("1", DateTime.MinValue));
        list.Add(new Letter("2", DateTime.MaxValue));
        list.Add(new Letter("3", DateTime.Now));
        return list;
    }

    public override string ToString()
    {
        return string.Format("Letter {0}({1})", letterNo, letterDate);
    }
}
This class encapsulates the functionality of a letter. There are some limitations here though:
  • If we want to have a matching setter for our properties, it has to be public.
  • The ArrayList class has no compile-time information about the object it contains.
  • We have gone through a lot of code just to encapsulate two members, namely: letterNo and letterDate.
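The second limitation is worth demonstrating: ArrayList happily accepts anything, and the mistake only surfaces at run time when we cast (a minimal sketch using int and string for brevity):

```csharp
using System;
using System.Collections;

class ArrayListPitfall
{
    static void Main()
    {
        ArrayList list = new ArrayList();
        list.Add(1);
        list.Add("two");   // the compiler has no complaints

        foreach (object o in list)
        {
            // Blows up with an InvalidCastException on the second
            // element, and only at run time.
            int n = (int)o;
            Console.WriteLine(n);
        }
    }
}
```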
In C# 2.0 we can improve this a little. C# 2.0 introduces the concept of private setters, and also one of the biggest improvements to C# in my opinion: generics. So we can now change the program like so:
public class Letter
{
    private string letterNo;
    private DateTime letterDate;

    public DateTime LetterDate
    {
        get { return letterDate; }
        private set { letterDate = value; }
    }

    public string LetterNo
    {
        get { return letterNo; }
        private set { letterNo = value; }
    }

    public Letter(string ltrNo, DateTime ltrDate)
    {
        LetterNo = ltrNo;
        LetterDate = ltrDate;
    }

    public static List<Letter> GetSampleLetters()
    {
        List<Letter> list = new List<Letter>();
        list.Add(new Letter("1", DateTime.MinValue));
        list.Add(new Letter("2", DateTime.MaxValue));
        list.Add(new Letter("3", DateTime.Now));
        return list;
    }

    public override string ToString()
    {
        return string.Format("Letter {0}({1})", letterNo, letterDate);
    }
}
As seen above, we can now use generics and private setters in our code. Generics help greatly here, since we now get a compile error if an object of the wrong type is added to the list. Also, after fetching an element from the list we don't have to cast it any more, since the elements are statically typed. One problem we still haven't addressed is the sheer amount of code needed just to encapsulate the class members. Here C# 3.0 comes to our aid with automatic properties:
public class Letter
{
    public DateTime LetterDate {get; private set;}

    public string LetterNo { get; private set; }
   
    public Letter() {}

    public static List<Letter> GetSampleLetters()
    {
        return new List<Letter>() 
        {
            new Letter {LetterNo = "1", LetterDate = DateTime.MinValue},
            new Letter {LetterNo = "2", LetterDate = DateTime.MaxValue},
            new Letter {LetterNo = "3", LetterDate = DateTime.Now}
        };
    }

    public override string ToString()
    {
        return string.Format("Letter {0}({1})", LetterNo, LetterDate);
    }
}
Now this is more like it: much less code with more functionality. Notice that we no longer have the fields, which forces us to use the properties everywhere, adding to consistency. The list initialization is also different here, using a collection initializer. There is still a subtle problem: although giving a property a private setter effectively makes it read-only for code outside the class, it is not really read-only on the inside. To make that explicit we can use the "readonly" keyword. Note that readonly applies to fields and has been available since C# 1.0; it cannot be combined with an automatic property, so we go back to explicit backing fields and a constructor:
public class Letter
{
    private readonly string letterNo;
    private readonly DateTime letterDate;

    public DateTime LetterDate { get { return letterDate; } }

    public string LetterNo { get { return letterNo; } }

    public Letter(string letterNo, DateTime letterDate)
    {
        this.letterNo = letterNo;
        this.letterDate = letterDate;
    }

    public static List<Letter> GetSampleLetters()
    {
        return new List<Letter>() 
        {
            new Letter("1", DateTime.MinValue),
            new Letter("2", DateTime.MaxValue),
            new Letter("3", DateTime.Now)
        };
    }

    public override string ToString()
    {
        return string.Format("Letter {0}({1})", letterNo, letterDate);
    }
}

Sorting and Filtering

In the previous section we looked at the evolution of C# from an encapsulation point of view, but let's try another angle. If we want to sort our letters by the "LetterDate" property in C# 1.0, we have to add a type implementing the "IComparer" interface and its Compare(object, object) method. Does this remind us of a known design pattern, by the way? Yes, this is the "Strategy Pattern" (for more design patterns used in .NET you can take a look at this article on MSDN). With this pattern we allow more than one strategy for sorting the ArrayList: say we sometimes want to sort by the letter date and other times by the letter number. The solution is to add two types that implement the IComparer interface, each implementing the Compare method differently, as we see below:


public class LetterComparer : IComparer
{
    public int Compare(object x, object y)
    {
        Letter first = (Letter)x;
        Letter second = (Letter)y;
        return first.LetterDate.CompareTo(second.LetterDate);
    }
}
...
    ArrayList letters = Letter.GetSampleLetters();
    letters.Sort(new LetterComparer());
    foreach(Letter letter in letters)
        Console.WriteLine(letter.ToString());

There are, however, a couple of things we don't like about this setup. Firstly, we have to add an extra type for each new strategy. Secondly, we see a lot of casts: some explicit, as in the Compare method, and some implicit, as in the foreach for the printout. Not only is that a performance issue, it also presumes that only Letters are ever passed to the method. You may say that we could type-check each element passed in, but isn't that yet another overhead? Fortunately, C# 2.0 comes to our rescue:

public class LetterComparer : IComparer<Letter>
{

    public int Compare(Letter x, Letter y)
    {
        return x.LetterDate.CompareTo(y.LetterDate);
    }
}
...
    List<Letter> letters = Letter.GetSampleLetters();
    letters.Sort(new LetterComparer());
    foreach(Letter letter in letters)
        Console.WriteLine(letter.ToString());

So we have fixed the ArrayList problem. But we are still creating a new type for a simple task of comparing a member. C# 2.0 solves this problem with "Anonymous Methods":

    List<Letter> letters = Letter.GetSampleLetters();
    letters.Sort(delegate(Letter x, Letter y) { return x.LetterDate.CompareTo(y.LetterDate); });
    foreach(Letter letter in letters)
        Console.WriteLine(letter.ToString());

Wow, now that's a lot of change! Notice that this is not always necessarily the right thing to do: if the class requires a more complex approach to comparing objects, we would definitely stick with the Strategy Pattern.
Here we have made the code quite compact. But can C# 3.0 do better? Of course it can, with the use of lambda expressions:

    List<Letter> letters = Letter.GetSampleLetters();
    letters.Sort((x,y)=> x.LetterDate.CompareTo(y.LetterDate));
    foreach(Letter letter in letters)
        Console.WriteLine(letter.ToString());


The "(x,y) => x.LetterDate.CompareTo(y.LetterDate)" argument is a "lambda expression". This still creates a delegate, but this time we haven't even specified the types of its parameters. As it turns out, C# 3.0 can make this even easier by using "extension methods":

    List<Letter> letters = Letter.GetSampleLetters();
    foreach(Letter letter in letters.OrderBy(p => p.LetterDate))
        Console.WriteLine(letter.ToString());

In the above snippet we are calling a method on the letters object that does not actually exist among the List<T> members; we'll talk about extension methods later. Note also that this sorting operation is not in-place: we are sorting the elements as we enumerate them, and the list itself remains unsorted.
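Just as a preview of how extension methods work, here is a simplified sketch in the spirit of OrderBy; the SortedBy name is my own invention and this is not the real LINQ implementation:

```csharp
using System;
using System.Collections.Generic;

static class MyExtensions
{
    // The "this" modifier on the first parameter is what turns an
    // ordinary static method into an extension method.
    public static List<T> SortedBy<T, TKey>(this List<T> source, Func<T, TKey> keySelector)
        where TKey : IComparable<TKey>
    {
        List<T> copy = new List<T>(source);   // leave the original untouched
        copy.Sort((x, y) => keySelector(x).CompareTo(keySelector(y)));
        return copy;
    }
}

class Demo
{
    static void Main()
    {
        List<string> words = new List<string> { "pear", "fig", "banana" };
        // Reads as if List<T> had a SortedBy method of its own.
        foreach (string w in words.SortedBy(s => s.Length))
            Console.WriteLine(w);   // fig, pear, banana
    }
}
```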

Querying Collections

If we wanted to query a collection and find all letters with a LetterNo higher than a certain value, this is how we would have done it in C# 1.0:


    ArrayList letters = Letter.GetSampleLetters();
    foreach(Letter letter in letters)
        if(int.Parse(letter.LetterNo) > 20)
            Console.WriteLine(letter.ToString());

Here is how we can do the same task with C# 2.0:
    
    List<Letter> letters = Letter.GetSampleLetters();
    Predicate<Letter> test = delegate(Letter l) { return int.Parse(l.LetterNo) > 20; };
    List<Letter> matches = letters.FindAll(test);
    Action<Letter> print = Console.WriteLine;
    matches.ForEach(print);

This is by no means less complicated than the C# 1.0 version; in fact the C# 1.0 code is arguably more readable, even though what we have written in C# 2.0 reads almost like English! What matters here is not the number of lines of code but the power we gain over the operations: we can now keep the action and the predicate in variables. This immediately suggests a possible use of the "Template Method Pattern".
This pattern is really close to the Strategy Pattern, the difference being that in Strategy the algorithms are usually radically different from one another, whereas in Template Method the sub-classes all share the structure of the parent algorithm with some steps overridden in each child. The Template Method Pattern is also known as the "fill in the blanks" pattern. In this case, for example, the sub-classes could each override methods like "Predicate<Letter> GetPredicate()" and "Action<Letter> GetAction()" and effectively change the predicate and the action applied to the collection.
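A sketch of how that could look, assuming the Letter class from earlier (the LetterProcessor and PrintRecentLetters names are my own invention):

```csharp
using System;
using System.Collections.Generic;

abstract class LetterProcessor
{
    // The template method: the algorithm's skeleton is fixed here...
    public void Process(List<Letter> letters)
    {
        Predicate<Letter> test = GetPredicate();
        Action<Letter> action = GetAction();
        foreach (Letter letter in letters)
            if (test(letter))
                action(letter);
    }

    // ...and the blanks are filled in by sub-classes.
    protected abstract Predicate<Letter> GetPredicate();
    protected abstract Action<Letter> GetAction();
}

class PrintRecentLetters : LetterProcessor
{
    protected override Predicate<Letter> GetPredicate()
    {
        return l => l.LetterDate > DateTime.Now.AddDays(-7);
    }

    protected override Action<Letter> GetAction()
    {
        return Console.WriteLine;
    }
}
```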
The above code can of course be written in a single line, simply by chaining the calls together, since each method returns a list on which we can call the next method:

    
    Letter.GetSampleLetters().FindAll(delegate(Letter l) { return int.Parse(l.LetterNo) > 20; }).ForEach(Console.WriteLine);
There is still some, as Jon puts it, "fluff" around the definition of the delegate. We can make the code shorter still with lambda expressions:
    
    foreach(Letter letter in Letter.GetSampleLetters().Where(l => int.Parse(l.LetterNo) > 20))
        Console.WriteLine(letter);

To recap, C# 2.0 improved separation of concerns by decoupling the predicate from the action, and C# 3.0 let us further shorten the code and make it more readable. C# 4.0 does not add any benefits in this area.

Handling missing data

In C# 1.0 there were three common options for handling missing data. Say that in our Letter class some letters are still waiting to be numbered and don't have a letterNo yet. Can we set a DateTime field to null? No: it is not a nullable type. The three workarounds were: use a magic value in the field (such as DateTime.MinValue), wrap the value in a sentinel object, or add a boolean field recording whether the value has been set. None of these is straightforward. In C# 2.0 the Nullable<T> structure was introduced; this structure, coupled with the syntactic sugar shown below, gives us the flexibility we need:
    
    int? number;
    public int? Number
    {
        get { return number; }
        private set { number = value; }
    }
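The behaviour of such a nullable value can be seen in a short sketch:

```csharp
using System;

class NullableDemo
{
    static void Main()
    {
        int? number = null;

        Console.WriteLine(number.HasValue);   // False

        // The null-coalescing operator supplies a fallback for missing data.
        int effective = number ?? -1;
        Console.WriteLine(effective);         // -1

        number = 42;
        if (number.HasValue)
            Console.WriteLine(number.Value);  // 42
    }
}
```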

Optional Parameters and Default Values in C# 4.0

So far we have not talked much about C# 4.0. One new feature added to this version, in my opinion long overdue, is support for optional parameters with default values in methods (note that optional parameters must come after all required ones):

    
    // Assumes letterNo is now declared as "int?", as in the previous section.
    public Letter(DateTime letterDate, int? letterNo = null)
    {
        this.letterNo = letterNo;
        this.letterDate = letterDate;
    }
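Calling such a constructor shows off both halves of the feature; named arguments (also new in C# 4.0) keep the call sites readable. A sketch, assuming a constructor of the form Letter(DateTime letterDate, int? letterNo = null):

```csharp
using System;

// The optional parameter can simply be omitted...
Letter a = new Letter(DateTime.Now);

// ...or supplied by name, which documents the call at a glance.
Letter b = new Letter(letterDate: DateTime.Now, letterNo: 21);
```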

Introducing LINQ

LINQ, or Language Integrated Query, is what C# 3.0 is all about. C# 2.0 introduced generics and was basically fixing up the shortcomings of its ancestor; with C# 3.0, a very powerful way of querying different data sources was introduced. The idea is a common, expressive query language that can be used to talk to any kind of data source: from databases and collections of objects to XML documents and COM interoperability.
We have already worked with some aspects of LINQ, but we have not yet shown actual "query expressions". For example, if we want to find the letters with LetterNo greater than 20 using an explicit query expression, it looks like this:

    List<Letter> letters = Letter.GetSampleLetters();
    var result = from Letter l in letters
                 where int.Parse(l.LetterNo) > 20
                 select l;
    foreach (Letter l in result)
        Console.WriteLine(l.ToString());
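Incidentally, a query expression is pure syntactic sugar: the compiler translates it into ordinary method calls. Assuming the letters list above (and parsing LetterNo, since it is declared as a string), the two forms below are equivalent:

```csharp
using System.Linq;

// Query syntax...
var result1 = from l in letters
              where int.Parse(l.LetterNo) > 20
              select l;

// ...is translated by the compiler into method-call syntax:
var result2 = letters.Where(l => int.Parse(l.LetterNo) > 20);
```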

As you can see, the expression looks very similar to an SQL statement. In the data-driven world of today most developers already know SQL, so the choice of an SQL-like syntax for LINQ is, in my opinion, an apt one. One might ask why we go to such lengths to find letters with a certain LetterNo when we could just iterate through the list; another might take issue with the performance of pulling all the data down from the database, wrapping it up in objects and then querying those objects. The former concern is valid, and a statement like the above makes little sense on a small in-memory list; the latter, however, couldn't be further from the truth. In the case of LINQ to SQL, the above query expression is translated into a SQL statement and executed directly on the database, so the data is filtered before it is ever pulled down.
So far we have talked about querying collections of objects and querying databases. We can also use LINQ to query XML documents: if our data is stored as XML, we can use an expression very similar to the one above to search it. We can even write our own LINQ providers. All these and other features that make LINQ so flexible will be discussed later on.

Thursday, 18 October 2012

Why am I doing this?

Okay, so as it happens, I have realized after graduating from my MSc that I really need to hone my programming skills if I want to work as a software developer. Although I have programmed in C# professionally before, I feel the need to level up my understanding of the tools I use. To this end, and also to improve my writing skills and keep a journal, or a notebook if you will, I have decided to blog my way through everything I learn from now on.
The book I have chosen, without much research, is "C# in Depth" by Jon Skeet. The book was very engaging from the moment I started reading, and I guess that comes from Jon's great passion for programming and for C#. I think I will have a lot of fun going through the chapters, and I hope they will be of interest to others.