Saturday, 15 December 2012

Iterator Blocks

Hi Again,

The Iterator pattern in object-oriented design separates a container from the code that traverses it: callers step through the elements through a common interface without knowing how the container stores them. In C#, foreach and the IEnumerable/IEnumerator pair (and their generic counterparts) are an implementation of this pattern. Most mainstream languages, including C#, Python, C++ and Java, support it in some form.
Implementing iterators in C# 1.0 required creating a type that implements the IEnumerable interface, which in turn creates another type that implements the IEnumerator interface. That means writing GetEnumerator(), MoveNext(), Current and Reset() by hand. This was too much of a hassle to go through just to allow lazy access to your container class's elements. C# 2.0 introduced a new concept called iterator blocks, which makes it quite easy to implement this pattern. I must admit that I find this a rather bizarre implementation of the pattern, and it goes against what the average developer already knows about how methods run. Closures and anonymous methods felt rather natural to me, but these are just weird! Even so, you'll soon find that all these pieces fall into place and make powerful libraries like LINQ much easier to implement.

Simple Iterator blocks

Let's assume that I want to implement the IEnumerable interface for a class that contains the numbers in a certain range. I want to be able to access these numbers lazily with an iterator. In C# 1.0 we needed to add another type to supply the enumerator, implement all the methods mentioned above, keep track of the iterator's state by hand and advance the position by hand as well. In C# 2.0 all this can be done with the following piece of code. In other words, this is all you have to write in order to implement the iterator model:

public IEnumerator GetEnumerator()
{
    // <Collection> is a placeholder for the name of the array field
    for(int i = 0; i < <Collection>.Length; i++)
        yield return <Collection>[i];
}
In this example I have assumed that the container class implementing this interface holds its collection as an array. <Collection> is the placeholder for the array's name. The only thing here that would look unfamiliar in C# 1.0 is the yield return statement. Effectively, this method hands back the ith element of the collection each time MoveNext() is called on the enumerator. But you may ask: where is this enumerator? Where is the state saved? Who knows how far we've gone in the collection, and how? Yes, I know, it seems rather bizarre, but this method is now a special method because it contains an iterator block. It is no longer executed sequentially from start to finish! When the compiler sees an iterator block in a method, it creates a custom nested type to hold all the information (current position, last value yielded, whether the end of the collection was reached, etc.). This nested type is actually a state machine: each call to MoveNext() advances it to the next yield return, and once the end of the collection is reached MoveNext() keeps returning false. This solution works because in C# a nested type has access to even the private members of the enclosing type.
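To appreciate what the compiler takes off our hands, here is a rough sketch of the same thing written by hand in the C# 1.0 style. NumberBox and BoxEnumerator are names I made up for illustration, and the real compiler-generated type is more involved than this; it only shows the idea of a nested type that tracks the position.

```csharp
using System;
using System.Collections;

// Hypothetical container; the names are illustrative only.
public class NumberBox : IEnumerable
{
    private readonly int[] values;

    public NumberBox(int[] values) { this.values = values; }

    public IEnumerator GetEnumerator()
    {
        return new BoxEnumerator(this);
    }

    // Nested type: it can read the private 'values' field of NumberBox.
    private class BoxEnumerator : IEnumerator
    {
        private readonly NumberBox parent;
        private int position = -1;   // -1 means "before the first element"

        public BoxEnumerator(NumberBox parent) { this.parent = parent; }

        public bool MoveNext()
        {
            // Advance until we are one past the last element, then stay there.
            if (position < parent.values.Length)
                position++;
            return position < parent.values.Length;
        }

        public object Current
        {
            get
            {
                if (position < 0 || position >= parent.values.Length)
                    throw new InvalidOperationException();
                return parent.values[position];
            }
        }

        public void Reset() { position = -1; }
    }
}
```

All of the bookkeeping in BoxEnumerator is what the five-line iterator block replaces.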
To visualize how this state machine's execution maps onto your code, think of it like so:

  • The body of the method only starts running when the first call to MoveNext() is made on the iterator, not when the enumerator is created.
  • Once it starts, execution continues from the top to the first yield return statement (remember that the only return types allowed for a method that contains an iterator block are IEnumerator, IEnumerable and their generic counterparts).
  • At this point the method freezes, meaning that execution halts until the next call to MoveNext(). When that call is made on the iterator, execution resumes from just after the yield return statement.
The iteration stops when the loop ends and the method terminates normally, or when a yield break statement is executed. It is important to remind you again that the method cannot have any return type other than those mentioned. Also, the value that follows yield return must be convertible to object when we're implementing a non-generic iterator, and to the type T when we are implementing IEnumerable<T> or IEnumerator<T>.
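As a small sketch of yield break, here is a method that stops the sequence early. UntilNegative is a name I invented for this illustration; it is not from the book.

```csharp
using System.Collections.Generic;

public static class YieldBreakDemo
{
    // Yields values from 'source' until the first negative number is seen.
    public static IEnumerable<int> UntilNegative(IEnumerable<int> source)
    {
        foreach (int value in source)
        {
            if (value < 0)
                yield break;     // ends the sequence; MoveNext() returns false
            yield return value;  // 'value' must be convertible to int here
        }
    }
}
```

Given the input 1, 2, -1, 3 this yields only 1 and 2; the 3 is never reached.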
A better way to understand this flow is to implement one yourself and print something before and after the calls to MoveNext() and Current, to see which part of the code above is executed. If you do so you'll see that there are two very important things to remember when working with iterator blocks:
  • Firstly, as mentioned before, none of our code is executed until the first call to MoveNext(), so never put input validation, or any other code that has to run immediately, in the method that contains the iterator block. This doesn't come up when you are implementing IEnumerable for a class, because GetEnumerator() takes no parameters. But you can also use an iterator block in an ordinary method that returns IEnumerable, without the class implementing the interface at all. In that case the method may accept parameters, and it may seem perfectly OK to validate them right there. That would cause big debugging problems, since the check doesn't run when the iterator is created but only when someone starts iterating.
  • Secondly, it is important to know that none of our code will ever be executed when Current is read from the iterator. That value is simply stored in the nested type the compiler created for us, so returning it doesn't need any of our code to run.
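Both points can be seen in a small sketch. It assumes a made-up CountTo method and records events in a list instead of printing them. The public method contains no yield statement, so its argument check runs immediately; the iterator block lives in a private helper.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
    // Records what happened, so the order of events can be inspected.
    public static readonly List<string> Log = new List<string>();

    // NOT an iterator block (no yield here), so the check runs at once.
    public static IEnumerable<int> CountTo(int limit)
    {
        if (limit < 0)
            throw new ArgumentOutOfRangeException("limit");
        return CountToImpl(limit);
    }

    // The iterator block: nothing in here runs until the first MoveNext().
    private static IEnumerable<int> CountToImpl(int limit)
    {
        Log.Add("start");
        for (int i = 1; i <= limit; i++)
        {
            Log.Add("yielding " + i);
            yield return i;
        }
        Log.Add("end");
    }
}
```

Calling DeferredDemo.CountTo(2).GetEnumerator() leaves the log empty. The first MoveNext() adds "start" and "yielding 1", reading Current adds nothing, and CountTo(-1) throws straight away instead of at the first MoveNext().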

Finally Blocks

A yield return statement cannot appear inside a try block that has a catch clause, but it can appear inside a try block that is paired only with a finally block. It is important for an iterator to have a Dispose() method so that it can release any resources it allocated. To release those resources whether the iterator hits a yield break, finishes normally, or is abandoned halfway through, we can wrap the body of the iterator block in a try/finally; the finally block runs when the iterator completes or when it is disposed.
The foreach construct already has this mechanism built in, meaning that it calls Dispose() on the iterator it is using. Calling Dispose() on an iterator that was implemented with an iterator block runs the finally blocks that are pending at the point where it is paused.
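Here is a minimal sketch of that behavior, with made-up names and a list used as a log. The loop leaves early, and the finally block still runs because foreach disposes the iterator.

```csharp
using System;
using System.Collections.Generic;

public static class FinallyDemo
{
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Numbers()
    {
        try
        {
            yield return 1;
            yield return 2;
            yield return 3;
        }
        finally
        {
            // Runs when the sequence completes OR when the iterator is disposed.
            Log.Add("cleanup");
        }
    }

    public static void StopEarly()
    {
        foreach (int n in Numbers())
        {
            Log.Add("got " + n);
            if (n == 2)
                break;   // foreach calls Dispose(), which runs the finally block
        }
    }
}
```

After StopEarly() the log holds "got 1", "got 2", "cleanup". If you drive the iterator by hand with MoveNext() and never call Dispose(), the cleanup does not happen until the sequence is exhausted.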

Example From the Book

Although I didn't want to use examples exactly as they appear in the book, I have no choice but to do so in this case. This example is just too cool to leave behind. No worries though, this chapter is a free sample chapter anyway.

public static IEnumerable<string> ReadLines(Func<TextReader> provider)
{
    using(TextReader reader = provider())
    {
        string line;
        while((line = reader.ReadLine()) != null)
            yield return line;
    }
}

So in the above example, we receive a generic delegate as an argument. Func<TResult> is a generic delegate that takes no parameters and returns a value of type TResult. Here the provider delegate points to the method to call to get the proper text reader with the right encoding. Because the method obtains the reader itself, it also owns the reader and can dispose of it itself; the using statement is a try/finally in disguise, which is why it is allowed in an iterator block. The lines of the file are also iterated lazily, which matters if we are working with big files. This example brings together delegates and iterator blocks. There is also an easy way to create different providers in other methods and have them call this method. That would involve anonymous methods and closures as well, which is all we have been talking about. I'll leave that to you as an exercise.
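If you want to check your answer to that exercise, here is one possible sketch, not necessarily how the book does it. The LineReader class name, the two extra overloads and the UTF-8 default are my own assumptions; ReadLines(Func<TextReader>) is repeated so the snippet stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineReader
{
    public static IEnumerable<string> ReadLines(Func<TextReader> provider)
    {
        using (TextReader reader = provider())
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    // Assumed convenience overload: defaults to UTF-8.
    public static IEnumerable<string> ReadLines(string filename)
    {
        return ReadLines(filename, Encoding.UTF8);
    }

    public static IEnumerable<string> ReadLines(string filename, Encoding encoding)
    {
        // The anonymous method captures 'filename' and 'encoding' (a closure).
        // The file is not opened here; it is opened on the first MoveNext().
        return ReadLines(delegate
        {
            return (TextReader)new StreamReader(filename, encoding);
        });
    }
}
```

Since the provider is only invoked inside the iterator block, a missing file is reported when iteration starts, not when ReadLines is called.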
As the previous example shows, this is where everything starts to fall into place. We now have the power of delegates and anonymous methods, and we can also use iterator blocks to access containers lazily with much less effort. In the next post I will look at chapter 7 of the book, which wraps up the C# 2.0 features and paves the road into the world of C# 3.0. Stay tuned!

Sunday, 2 December 2012

Delegates and Anonymous Methods

Hi Again!

I have to tell you that today's topic is super cool. It is the stepping stone to anonymous methods and the idiomatic C# 3.0 constructs. In this post I'll cover chapter 5 of the book, which is titled "Fast-tracked delegates". I absolutely love the new functional approach in C#. It doesn't always produce the most readable code (if you put everything on one line), but I love the fact that you get so much flexibility and power with just a few lines of code, and we have not even gotten to LINQ yet!

Let's start with C# 1.0 yet again. In C# 1.0, whenever you want to create a delegate you first have to declare a delegate type, which consists of the signature of the methods that can be called through the delegate and the name of the new type. Then you have to instantiate that type explicitly, for example with new DelegateType(SomeMethod). The declaration looks like this:

public delegate void DelegateType();
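A minimal sketch of the whole C# 1.0 routine (declare, instantiate, invoke) could look like this. Greeter and SayHello are hypothetical names, and the delegate declaration is repeated so the snippet compiles on its own.

```csharp
using System;

public delegate void DelegateType();

public class Greeter
{
    public static int Calls = 0;

    public static void SayHello()
    {
        Calls++;
        Console.WriteLine("Hello");
    }

    public static void Run()
    {
        // C# 1.0 style: the delegate type is named explicitly at creation.
        DelegateType d = new DelegateType(SayHello);
        d();   // shorthand for d.Invoke()
    }
}
```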

This is all good. Everything seems normal, since we are declaring a new type and then instantiating it. But sometimes the C# 1.0 approach to delegates is both restrictive and hard to read. This shows when we have a lot of event subscriptions in our code. In that case we have to keep instantiating, for example, the EventHandler delegate type and passing a method group name to it. Can't the compiler just infer the type on its own from the event the handler is assigned to? This can be seen below:
this.checkBox1.CheckedChanged += new System.EventHandler(this.checkBox1_CheckedChanged);
this.button1.Click += new System.EventHandler(this.button1_Click);
this.checkBox1.KeyPress += new System.Windows.Forms.KeyPressEventHandler(this.checkBox1_KeyPress);
The part of the code where the actual name of the method to be executed is mentioned is called a method group. It is called so because the method may have several overloads. In C# 1.0 there is no guesswork about which of these overloads is going to be used for the delegate, since not only does the delegate type have to be mentioned but the signatures have to match exactly (no delegate variance). As can be seen from the example, each of these handlers is explicitly created as a KeyPressEventHandler or an EventHandler. The thing is that the KeyPress event of the CheckBox control already accepts only delegates of type KeyPressEventHandler, so mentioning the type again to create the delegate is redundant. Indeed, in C# 2.0 we can omit the delegate type and have the compiler decide which delegate type it is. This is an implicit conversion from a method group to a delegate:
this.checkBox1.KeyPress += this.checkBox1_KeyPress;
This implicit conversion also comes with the added capability of variance. Just as in method overload resolution, the parameter types are checked to choose the proper overload. We have talked before about variance and where it does and doesn't exist in the language. In this case we get parameter contravariance and return type covariance for delegates. Contravariance means that the delegate may declare a parameter of a derived type while the method in the method group accepts a less derived type (a base class). Covariance means that the method may return a more derived type than the delegate's signature stipulates. Now what happens to the return value, or to the argument that the method sees as less derived? You basically lose the static information associated with the derived type and you're stuck with the base class, unless you cast. An example of contravariance can be seen below:
static void DoSameThing(Object sender, EventArgs e)
{
    Console.WriteLine("I'm not doing anything useful");
}

Form form = new Form();
form.Click += DoSameThing;
form.KeyPress += DoSameThing;
form.MouseClick += DoSameThing;
As can be seen above, a single method whose parameters are less derived than the ones the delegate types declare (EventArgs rather than KeyPressEventArgs or MouseEventArgs) can now be assigned to all three event handlers. This can be useful if you want to perform a general-purpose task no matter which event fired, since with this approach you lose the specific information that the derived type carries. A covariance example could look like so:
    public delegate A sampleDelegate();
    public class A
    {
        public void Hi()
        {
            Console.WriteLine("A");
        }
    }
    public class B : A
    {
        new public void Hi()
        {
            Console.WriteLine("B");
        }
    }

    public class RunExample
    {
        public B getSomeB()
        {
            return new B();
        }

        public void run()
        {
            sampleDelegate ourDelegate = getSomeB;
            getSomeB().Hi();
            ourDelegate().Hi();
        }
    }
    /*
    Outputs:
    B
    A
    */
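As we noted earlier, although getSomeB() returns a B object, calling it through the delegate is legal but gives us a reference typed as A. Because Hi() in B hides the method in A rather than overriding it, the call through the delegate prints A, and the members of B are no longer reachable without a cast.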
The addition of delegate variance in C# 2.0 was a breaking change since some previous code would no longer work. An example scenario is shown below:

public delegate void generalDelegate(BufferedStream sr);

public class parentClass
{
    public void DoSomethingWithBuffer(BufferedStream sr)
    {
        Console.WriteLine("Did something in parentClass");
    }
}
public class derivedClass : parentClass
{
    public void DoSomethingWithBuffer(Stream sr)
    {
        Console.WriteLine("Did something in derivedClass");
    }
}
...
derivedClass c = new derivedClass();
generalDelegate gd = new generalDelegate(c.DoSomethingWithBuffer);
gd(new BufferedStream(new MemoryStream())); // BufferedStream has no parameterless constructor

In the above example the method called by gd would be the parent class's in C# 1.0 and the derived class's in C# 2.0. Although this is a breaking change, I would say it is unusual for a derived class to declare a method with a more general parameter than its parent's anyway. The derived class is there to specialize the base class's methods.

Anonymous Methods

Okay, so here is where the fun begins. Anonymous methods are a way of writing the body of a delegate inline. Let's say we have a list of student objects (List<Student>). Each student has a name, and let's say we want to get the students whose name starts with 'A'. There are many ways to go about this, of course. The most straightforward way is to iterate through the list and filter it according to the predicate, but we can do this in a more elegant way using delegates. The generic List<T> class in .NET supplies a FindAll method with the following signature:

public List<T> FindAll(
 Predicate<T> match
)
This method accepts a Predicate<T> generic delegate and returns a List<T> of all the elements in the list that match the predicate. One way to use this method is to write a method that has the Predicate<T> delegate's signature and pass the method's name to FindAll using an implicit method group conversion. This, although doable, is not very elegant. The method could be doing something very trivial, and introducing a new method that only makes sense in this one scope adds a lot of noise to IntelliSense and basically gets in the way. Fortunately, anonymous methods come to the rescue here:

List<Student> filteredList = studentList.FindAll(
                                delegate(Student std){ 
                                    return std.Name.StartsWith("A"); 
                                });
This code is so readable and concise, and consequently so appealing to me, that I actually have to go to great lengths to keep myself from using it in every single scenario where it is applicable. Okay, so let's get into the details. What exactly is an anonymous method? Is it really a method? What does it mean to return from an anonymous method?
The answer to the questions above is: almost. You can do almost anything in an anonymous method that you can do in a normal method. For example you can have loops or local variables, you can return, etc. Behind the scenes the compiler actually creates a method and makes it the target of a delegate instance. Usually this method is created inside the same class and is given a name containing angle brackets, something like <Main>b__0 (the exact form depends on the compiler version), which is called an unspeakable name. The names are built that way because they are not valid C# identifiers, so they can never conflict with names in your own code. You can use ILSpy to see your anonymous methods in the IL after compilation (.NET Reflector is not free anymore).
Note that when you write a return statement in an anonymous method, you are returning from that anonymous method, not from the enclosing method. It is easy to get those two mixed up.
Okay, there are two things that remain in this section and I will just mention them without going into detail. Firstly, anonymous methods do not get parameter contravariance: if you write a parameter list, the parameter types must match those of the expected delegate type exactly. Secondly, the parameter list after the delegate keyword, parentheses included, can be left out entirely if the body doesn't use the parameters, as long as the delegate type has no out parameters and leaving it out doesn't make the choice of delegate type ambiguous.
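As a quick sketch of the second point, both forms can be attached to the same event. The Something event and the Fired counter are invented for this illustration.

```csharp
using System;

public static class ParameterListDemo
{
    public static int Fired = 0;

    public static event EventHandler Something;

    public static void Run()
    {
        // No parameter list: 'sender' and 'e' are not needed, so they are left out.
        Something += delegate { Fired++; };

        // Full parameter list: the types must match EventHandler exactly.
        Something += delegate(object sender, EventArgs e) { Fired++; };

        // Raise the event; both handlers run.
        Something(null, EventArgs.Empty);
    }
}
```

After one call to Run(), Fired is 2. Changing the second handler's parameter types to anything other than (object, EventArgs) is a compile-time error.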

Closures

This is where the true power of anonymous methods is revealed. Closures can be confusing for some and second nature to others. I found them quite straightforward, so hopefully you will too. Closures are absolutely crucial to lambda expressions and LINQ. Jon warns readers to make sure they are awake and have some time to spend on the section, since it can get confusing. But don't be alarmed, there are countless other articles on the internet about closures if you find the topic hard to grasp here.
Put in simple terms, a closure is a function that is able to interact with an environment beyond the parameters supplied to it. Let's make this abstract definition a little more concrete, but before that we need to define two kinds of variables:

  • Outer Variables: These are local variables or parameters that have an anonymous method declared within their scope.
  • Captured [Outer] Variables: These are outer variables that are actually used inside the anonymous method.
To go back to the definition of closures, the anonymous method is the function, and the captured variables are the environment beyond its own parameters that it interacts with.

void SampleMethod()
{
    string capturedVariable = "test";
    int outerVariable = 3;   // outer variable, but never captured

    MethodInvoker ourDelegate = delegate()
    {
        string variable = "amazing";
        Console.WriteLine("This is an " + variable + " " + capturedVariable);
    };
    ourDelegate();
}
This code should be blowing your mind right now! Or maybe not? The fact that we were able to use the captured variable as if it had been declared inside the anonymous method seems really strange, and it should go against your previous knowledge of methods. After all, methods are normally only allowed to interact with the parameters that are passed to them, and maybe the this reference in an instance method, but surely not with an environment beyond their own.
It is important to note two things at this point. First, the anonymous method is not called when it is defined. When we create the MethodInvoker delegate above we are not executing the anonymous method, so any captured variable that the method changes is not touched until the delegate is invoked. Second, the captured variable used inside the anonymous method is the very same variable that is used everywhere else in the enclosing method, not a copy of it.
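Both points show up in the following sketch. It uses System.Action in place of MethodInvoker so that it doesn't need Windows Forms, and it collects its output in a list; the names are made up.

```csharp
using System;
using System.Collections.Generic;

public static class CaptureDemo
{
    public static List<string> Run()
    {
        List<string> output = new List<string>();
        string text = "before";

        // Creating the delegate does not run its body.
        Action show = delegate
        {
            output.Add(text);
            text = "changed inside";
        };

        text = "after";      // changes the same variable the delegate sees
        show();              // adds "after", not "before"
        output.Add(text);    // adds "changed inside"
        return output;
    }
}
```

Run() returns "after" followed by "changed inside": the delegate saw the later assignment, and the enclosing method saw the delegate's assignment.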
Now why should we use captured variables, and why are they useful? Well, remember our example with the student names above? We had to hard-code the character that the name started with. With captured variables we can now write a method that accepts this character as a parameter and captures it in an anonymous method. I will leave the details of this approach to you.
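In case you want something to compare your version against, here is one possible sketch. The Student class, the FindByPrefix name and the ordinal comparison are my own assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical Student type with just a name.
public class Student
{
    public string Name;
    public Student(string name) { Name = name; }
}

public static class StudentFilter
{
    public static List<Student> FindByPrefix(List<Student> students, string prefix)
    {
        // 'prefix' is a parameter of the enclosing method that the
        // anonymous method captures; nothing is hard-coded any more.
        return students.FindAll(delegate(Student std)
        {
            return std.Name.StartsWith(prefix, StringComparison.Ordinal);
        });
    }
}
```

Calling FindByPrefix(studentList, "A") gives the same result as the hard-coded version, and the same method now works for any prefix.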
So far so good. You may be wondering at this point what the fuss is about, since there is nothing terribly complex about captured variables so far. But here is where things get a little strange, to say the least! What if I told you that local variables in a method that are captured by an anonymous method can live on after the method has returned? Pretty crazy, isn't it! It's hard enough to parse that sentence, let alone understand it. I will give an example of what this means, and the repercussions of such behavior should be evident by the time we are through.

public MethodInvoker GetDelegate()
{
    int localVar = 10;

    MethodInvoker returningDelegate = delegate{
                                         Console.WriteLine(localVar);
                                         localVar ++;
                                               };
    return returningDelegate;
}
...
MethodInvoker y = GetDelegate();
y();

So what will happen in the above snippet when we call y()? If localVar were not a captured variable, we would expect it to be destroyed when the method returned. After all, local variables live on the stack, and when the function returns and the stack frame is popped the variable is gone. But the fact is that localVar is not actually on the stack; it is stored in an instance of a compiler-generated class that lives on the heap. The GetDelegate method and the anonymous method both hold a reference to that instance and access the variable through it. So y() prints 10 and leaves localVar at 11. Captured variables live at least as long as any delegate instance that references them.
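To make the effect observable, here is a small variation of the snippet. It returns a Func<int> instead of a MethodInvoker so the value can be checked rather than printed; CounterFactory and GetCounter are names I made up.

```csharp
using System;

public static class CounterFactory
{
    // Func<int> is used so the value can be inspected by the caller.
    public static Func<int> GetCounter()
    {
        int localVar = 10;   // captured, so it outlives this method call
        return delegate
        {
            int current = localVar;
            localVar++;
            return current;
        };
    }
}
```

Calling the returned delegate twice gives 10 and then 11, even though GetCounter() returned long ago. A second call to GetCounter() creates a fresh localVar, so its delegate starts at 10 again.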
There is more: captured variables can actually be shared among the different delegates that reference them! The key thing to remember is that a separate instance of a variable is captured each time the variable is instantiated. A variable is said to be instantiated whenever execution enters the scope in which it is declared. So in the example below the index variable (i) of the for statement and the list variable are each instantiated once and therefore shared, while the counter variable is not shared, since it is declared inside the body of the for statement and is hence instantiated anew in each iteration.
List<MethodInvoker> list = new List<MethodInvoker>();

for(int i = 0; i < 10; i++)
{
    int counter = i * 4;
    list.Add(delegate
             {
                Console.WriteLine("Counter: " + counter + " Index: " + i);
                counter++;
             });
}
foreach(MethodInvoker method in list)
{
    method();
}
...
list[0]();
list[0]();

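If you run the code above you'll see that each anonymous method gets its own counter variable, while the index variable is shared by all of them. The foreach loop prints counters 0, 4, 8 and so on up to 36, but every line shows Index: 10, because by the time the delegates run the single shared i has already reached 10. The two extra calls to list[0] then print counters 1 and 2, since that delegate keeps incrementing its own counter.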
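Sharing is perhaps easier to see with just two delegates over one variable. This is only a sketch with invented names; it uses System.Action and System.Func<int> instead of MethodInvoker so that the value can be read back.

```csharp
using System;

public static class SharedCaptureDemo
{
    // Hands out two delegates that both capture the same local variable.
    public static void Create(out Action increment, out Func<int> read)
    {
        int shared = 0;   // instantiated once per call to Create
        increment = delegate { shared++; };
        read = delegate { return shared; };
    }
}
```

After calling increment twice, read returns 2, because both delegates refer to the same instance of shared. A second call to Create produces a new, independent pair that starts at 0.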
Finally, to sum up, the rule of thumb for captured variables is to avoid scenarios that make the code too complex to understand. Mixing shared and distinct variables can make the code very hard to read and the results hard to predict. But as I showed you before, closures can be powerful tools when used properly. Hopefully this post has given you a feel for the power and beauty of anonymous methods and will push you to use them every now and then when the circumstances are right.
The next topic will be iterator blocks. I will try to get around to a post on them sometime next week. For now, be well and try to stay warm!