The Iterator design pattern in object-oriented design separates the container being iterated over from the algorithm used for the iteration. The pattern designates the classes used to access the elements of different containers through a common interface. In C#, foreach and the IEnumerable/IEnumerator pair (and their generic counterparts) are an implementation of this pattern. Most mainstream languages, including C#, Python, C++ and Java, support this functionality in some form.
Implementing iterators in C# 1.0 required creating a type that implemented the IEnumerable interface, which in turn created another type that implemented the IEnumerator interface. This meant implementing GetEnumerator(), MoveNext(), Reset() and Current by hand, which was too much of a hassle to go through just to allow lazy access to your container class's elements. C# 2.0 introduced a new concept called iterator blocks, which makes it quite easy to implement this pattern. I must admit that I think this is a rather bizarre implementation of the pattern, and it goes against some of what the average developer already knows. Closures and anonymous methods were rather natural to me, but these are just weird! Even so, you'll soon find that all these pieces fall into place and make the implementation of powerful libraries like LINQ so easy.
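To make the "hassle" concrete, here is a minimal sketch of what the C# 1.0 approach looks like for a container that exposes the whole numbers in a range. The names NumberRange and NumberRangeEnumerator are hypothetical and chosen only for this illustration; they are not from the book.

```csharp
using System;
using System.Collections;

// Hypothetical container: the whole numbers from start to end, inclusive.
public class NumberRange : IEnumerable
{
    private readonly int start;
    private readonly int end;

    public NumberRange(int start, int end)
    {
        this.start = start;
        this.end = end;
    }

    public IEnumerator GetEnumerator()
    {
        return new NumberRangeEnumerator(start, end);
    }

    // The second type that has to be written by hand without iterator blocks.
    private class NumberRangeEnumerator : IEnumerator
    {
        private readonly int start;
        private readonly int end;
        private long position; // long, so that start - 1 cannot overflow

        public NumberRangeEnumerator(int start, int end)
        {
            this.start = start;
            this.end = end;
            position = (long)start - 1; // positioned before the first element
        }

        public bool MoveNext()
        {
            if (position >= end)
                return false; // already at (or past) the last element
            position++;
            return true;
        }

        public object Current
        {
            get
            {
                if (position < start)
                    throw new InvalidOperationException("Call MoveNext() first.");
                return (int)position; // boxed as an int so foreach can unbox it
            }
        }

        public void Reset()
        {
            position = (long)start - 1;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Writes the numbers 3 through 6.
        foreach (int n in new NumberRange(3, 6))
            Console.WriteLine(n);
    }
}
```

All of the position tracking in NumberRangeEnumerator is bookkeeping that the compiler takes over once iterator blocks are available.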
Simple Iterator Blocks
Let's assume that I want to implement the IEnumerable interface for a class that contains the numbers within a certain range, and that I want to access these numbers lazily through an iterator. In C# 1.0 we had to add another type to supply the enumerator, implement all the members mentioned above, keep track of the iterator's state manually and advance the position manually as well. In C# 2.0 all of this can be done with the following piece of code. In other words, this is all you need to write in order to implement the iterator model:
public IEnumerator GetEnumerator()
{
    // <Collection> is a placeholder for the name of the backing array.
    for (int i = 0; i < <Collection>.Length; i++)
        yield return <Collection>[i];
}
In this example I have assumed that the container class implementing this interface holds its collection as an array; <Collection> is the placeholder for the array's name. The only difference from the old C# 1.0 syntax is the yield return statement. Effectively, this method returns the ith element of the collection each time MoveNext() is called on the enumerator.
But you may ask: where is this enumerator? Where is its state saved? Who knows how far we've gone in the collection, and how? Yes, I know, it seems rather bizarre, but this method is now a special method because it contains an iterator block. It is no longer executed sequentially! When the compiler sees an iterator block in a method, it creates a custom nested type to hold all the information (current position, last value yielded, whether the end of the collection was reached, etc). This nested type is actually a state machine that advances one step per call to MoveNext() until you reach the end of the collection; after that, MoveNext() returns false and Current keeps returning the last value yielded. This solution works because in C# a nested type has access to even the private members of the enclosing type. To visualize how the state machine's execution maps onto the code you wrote, think of it like so:
- The body of the method only starts running on the first call to MoveNext() on the iterator, not when the enumerator is created.
- Execution then continues from the top until the first yield return statement (remember that the only allowed return types for a method containing an iterator block are IEnumerator, IEnumerable and their generic counterparts).
- At this point the method freezes: execution halts until the next call to MoveNext(). When that call is made on the iterator, execution resumes from just after the yield return statement.
The iteration stops when the loop ends and the method terminates normally, or when a yield break statement is reached. It is worth repeating that this method cannot return any type other than those mentioned above. Also, the expression after yield return must be convertible to object when we are implementing a non-generic iterator, and to the type T when we are implementing IEnumerable<T>.
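The flow described above can be watched directly by printing from inside the iterator block and around each call to MoveNext(). The following is a small sketch for that purpose; IteratorTrace and CountTo are hypothetical names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorTrace
{
    // Hypothetical iterator: yields 1, 2, ..., limit and then hits yield break.
    public static IEnumerable<int> CountTo(int limit)
    {
        Console.WriteLine("  [iterator] body started");
        for (int i = 1; ; i++)
        {
            if (i > limit)
            {
                Console.WriteLine("  [iterator] reached yield break");
                yield break; // ends the iteration; MoveNext() returns false
            }
            Console.WriteLine("  [iterator] about to yield " + i);
            yield return i; // execution freezes here until the next MoveNext()
            Console.WriteLine("  [iterator] resumed after yielding " + i);
        }
    }

    public static void Main()
    {
        IEnumerable<int> sequence = CountTo(2);
        Console.WriteLine("Sequence created");

        using (IEnumerator<int> e = sequence.GetEnumerator())
        {
            Console.WriteLine("Enumerator created");
            while (true)
            {
                Console.WriteLine("Calling MoveNext()");
                bool moved = e.MoveNext();
                Console.WriteLine("MoveNext() returned " + moved);
                if (!moved)
                    break;
                Console.WriteLine("Current = " + e.Current);
            }
        }
    }
}
```

When this runs, "body started" appears only after the first "Calling MoveNext()" line, and each "resumed after yielding" line appears only once the following MoveNext() call is made.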
A better way to understand this flow is to implement one yourself and just print before and after calls to MoveNext() and Current to see which part of the code above is executed. If you do so you'll see that there are two very important things that we have to remember when working with iterator blocks:
- Firstly, as noted before, none of our code is executed until the first call to MoveNext(), so never put input validation, or any code that has to run immediately, in the method that contains the iterator block. This cannot come up when you implement the IEnumerable interface on a class, because GetEnumerator() takes no parameters, but you will not always be implementing that interface. You can also use an iterator block in any method that returns IEnumerable, without the class itself implementing IEnumerable. Such a method may accept input parameters, and it may seem perfectly fine to validate them there. That causes big debugging problems, because the validation does not run when the iterator is created, only when iteration begins.
- Secondly, none of our code is ever executed when Current is read from the iterator. That value is simply stored in the nested type created for us by the compiler, so reading it does not require running any of our code.
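A common way around the first point is to split the work in two: an ordinary public method that validates its arguments immediately, and a private method that contains the iterator block. The sketch below assumes a hypothetical Repeater class with Repeat and RepeatIterator methods; it is one possible arrangement, not something the book prescribes.

```csharp
using System;
using System.Collections.Generic;

public static class Repeater
{
    // Ordinary method (no yield inside), so the checks run at call time.
    public static IEnumerable<string> Repeat(string text, int count)
    {
        if (text == null)
            throw new ArgumentNullException("text");
        if (count < 0)
            throw new ArgumentOutOfRangeException("count");
        return RepeatIterator(text, count);
    }

    // The iterator block itself; its body only runs once iteration begins.
    private static IEnumerable<string> RepeatIterator(string text, int count)
    {
        for (int i = 0; i < count; i++)
            yield return text;
    }

    public static void Main()
    {
        try
        {
            Repeat(null, 3); // throws here, before anything is enumerated
        }
        catch (ArgumentNullException)
        {
            Console.WriteLine("Rejected at call time");
        }

        foreach (string s in Repeat("hi", 2))
            Console.WriteLine(s);
    }
}
```

Had the null check lived inside RepeatIterator, the call Repeat(null, 3) would have returned quietly and the exception would only have surfaced later, at the first MoveNext().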
Finally Blocks
A yield return statement cannot be used inside a try block that is paired with a catch block, but it can be used inside a try block that is paired only with a finally block. It is important for an iterator to have a Dispose() method so that it can release any resources it allocated. To make sure those resources are released whether a yield break statement was reached or the iterator ran to completion, we can wrap the body of the iterator in a try/finally; the finally block runs once we are done with the iterator.
The foreach construct already has this mechanism built in: it calls Dispose() on the iterator it is using when the loop exits, even if it exits early. Calling Dispose() on an iterator implemented with an iterator block runs any finally block the iterator is currently suspended inside.
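Here is a small sketch of that behaviour, where the foreach loop is abandoned early with break. FinallyDemo, Numbers and the CleanedUp flag are hypothetical names used only to make the effect observable.

```csharp
using System;
using System.Collections.Generic;

public static class FinallyDemo
{
    // Set by the finally block so callers can observe that cleanup happened.
    public static bool CleanedUp;

    public static IEnumerable<int> Numbers()
    {
        try
        {
            yield return 1;
            yield return 2;
            yield return 3;
        }
        finally
        {
            // Runs on normal completion, or when the iterator is disposed
            // while suspended at one of the yield return statements above.
            CleanedUp = true;
            Console.WriteLine("finally block executed");
        }
    }

    public static void Main()
    {
        foreach (int n in Numbers())
        {
            Console.WriteLine(n);
            if (n == 2)
                break; // foreach calls Dispose() on the iterator here
        }
        Console.WriteLine("after loop");
    }
}
```

The value 3 is never yielded, yet the finally block still runs before "after loop" is printed, because leaving the foreach loop disposes the iterator.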
Example From the Book
Although I didn't want to use examples exactly as they appear in the book, I have no choice but to do so in this case. This example is just too cool to leave behind. No worries though, this chapter is a free sample chapter anyway.
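// Needs: using System; using System.Collections.Generic; using System.IO;
public static IEnumerable<string> ReadLines(Func<TextReader> provider)
{
    // The reader is created lazily and disposed when iteration ends.
    using (TextReader reader = provider())
    {
        string line;
        while ((line = reader.ReadLine()) != null)
            yield return line;
    }
}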
In the above example we receive a generic delegate as an argument. Func<TResult> is a generic delegate that takes no parameters and returns a value of type TResult. Here the provider delegate points to the method to call to get the proper text reader with the right encoding. Because we obtain the reader from the provider ourselves, we own it and can dispose of it ourselves. The lines are also read lazily, which matters if we are working with big files. This example brings together delegates and iterator blocks. There is also an easy way to create different providers in other methods and have them call this method. That example would involve anonymous methods and closures as well, which is everything we have been talking about. I'll leave that to you as an exercise.
As the previous example shows, this is where everything starts to fall into place. We now have the power of delegates and anonymous methods, and we can also use iterator blocks to access containers lazily with much less effort. In the next post I will look at chapter 7 of the book, which wraps up the new features of C# 2.0 and paves the road into the world of C# 3.0. Stay tuned!
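For readers who want a starting point for that exercise, here is one possible sketch. The LineReader class and the ReadLinesFromString and ReadLinesFromFile helpers are hypothetical names of my own; each one captures its arguments in an anonymous method (a closure) and hands that to ReadLines as the provider.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class LineReader
{
    // The method from the text: asks the provider for a reader only when
    // iteration starts, and disposes the reader when iteration ends.
    public static IEnumerable<string> ReadLines(Func<TextReader> provider)
    {
        using (TextReader reader = provider())
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                yield return line;
        }
    }

    // Hypothetical helper: the anonymous method captures 'text'.
    public static IEnumerable<string> ReadLinesFromString(string text)
    {
        return ReadLines(delegate { return new StringReader(text); });
    }

    // Hypothetical helper: the anonymous method captures 'path' and 'encoding'.
    // The file is not opened until the first MoveNext() call.
    public static IEnumerable<string> ReadLinesFromFile(string path, Encoding encoding)
    {
        return ReadLines(delegate { return new StreamReader(path, encoding); });
    }

    public static void Main()
    {
        // Uses an in-memory string so the sketch runs without any file on disk.
        foreach (string line in ReadLinesFromString("first\nsecond"))
            Console.WriteLine(line);
    }
}
```

Because the provider is only invoked inside the iterator block, a bad path passed to ReadLinesFromFile would not fail until iteration begins, which is the same deferred-execution caveat discussed earlier.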