Tuesday, 17 September 2013

LINQ

In this post I will talk about LINQ as a whole. Whole books have been written on the subject, so one post can't explain it all; it would need its own series. I haven't looked into every nook and cranny of it either, and you don't have to in order to be productive with it. What you do need to know is how and when to use it and, more importantly, how to use it efficiently.
This is where everything we talked about in the previous posts comes together. Lambda expressions, delegates, deferred execution and extension methods all go hand in hand to give us what we will see below.

In order for us to get started, we need a set of classes to actually run the queries on. The following shows the classes that we will be using to run our queries. We will fill in the objects with some sample data using object initialization we learned in previous posts:

class Book
{
    public string Name { get; set; }
    public string Genre { get; set; }
    public decimal Price { get; set; }
    public override string ToString()
    {
        return Name;
    }
}
class Library
{
    public string Name { get; set; }
    public List<Book> AllBooks { get; set; }
    public List<Member> AllMembers { get; set; }
    public override string ToString()
    {
        return Name;
    }
}
class Member
{
    public string Name { get; set; }
    public string Address { get; set; }
    public short Age { get; set; }
    public override string ToString()
    {
        return Name;
    }
}
class Reservation
{
    public Library InLibrary { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
    public Book ReservedBook { get; set; }
    public Member MemberWhoTookIt { get; set; }
}
...
..
.
Library newLibrary = new Library
{
    Name = "Biggest Library Ever!!",
    AllBooks = new List<Book>
    {
        new Book { Name = "When I was at the gym", Genre = "Horror", Price=1000},
        new Book { Name = "Hills can run", Genre = "Comedy", Price = 200},
        new Book { Name = "Inspector Bandex", Genre = "Romantic", Price = 500}
    },
    AllMembers = new List<Member>
    {
        new Member { Name = "John", Address = "Saturn", Age = 23},
        new Member { Name = "Joon", Address = "Sun", Age = 1500},
        new Member { Name = "Jeen", Address = "Moon", Age = 2}
    }
};

List<Reservation> allReservations = new List<Reservation>();
allReservations.Add(new Reservation
{
    InLibrary = newLibrary,
    // DateTime.MinValue serves as a fixed start date that the queries below can match.
    StartDate = DateTime.MinValue,
    EndDate = DateTime.MinValue.Add(new TimeSpan(2, 0, 0, 0)),
    MemberWhoTookIt =
        newLibrary.AllMembers.First(member => member.Name.Equals("John")),
    ReservedBook =
        newLibrary.AllBooks.First(book => book.Name.Equals("Inspector Bandex"))
});

allReservations.Add(new Reservation
{
    InLibrary = newLibrary,
    StartDate = DateTime.MinValue,
    EndDate = DateTime.MinValue.Add(new TimeSpan(1, 0, 0, 0)),
    MemberWhoTookIt =
        newLibrary.AllMembers.First(member => member.Name.Equals("Jeen")),
    // Look the book up by genre; no book is named "Romantic".
    ReservedBook =
        newLibrary.AllBooks.First(book => book.Genre.Equals("Romantic"))
});

public static void PrintContent(object obj)
{
    Console.WriteLine("--------");
    PropertyInfo[] properties = 
        obj.GetType().GetProperties(BindingFlags.Instance | BindingFlags.Public);
    foreach(var property in properties)
        Console.WriteLine(String.Format("{0} : {1}", 
                          property.Name, 
                          property.GetValue(obj)));
}

As you can see above, I have created a Library class that holds members and books, and a Reservation class that records which member reserved which book in which library. I have also added a PrintContent method that uses reflection to print all the public properties of the object it receives. Don't worry about how PrintContent works, as this post is not really about reflection. Let's go ahead and write the first query on this setup:

var query = from reservation in allReservations
            where reservation.StartDate.Equals(DateTime.MinValue)
            select reservation;

foreach (var item in query)
    PrintContent(item);
            
Console.ReadKey();

The code above would have the following output:

--------
InLibrary : Biggest Library Ever!!
StartDate : 0001-01-01 12:00:00 AM
EndDate : 0001-01-03 12:00:00 AM
ReservedBook : Inspector Bandex
MemberWhoTookIt : John
--------
InLibrary : Biggest Library Ever!!
StartDate : 0001-01-01 12:00:00 AM
EndDate : 0001-01-02 12:00:00 AM
ReservedBook : Inspector Bandex
MemberWhoTookIt : Jeen

You can tell what the query is doing by looking at the output, and by simply reading it! Just read it as plain English: I'm trying to find all reservations in the allReservations collection whose StartDate has a certain value. Now you may wonder what the "select" clause is for. We will get to that soon. But first let's see what exactly happens when you write those lines of code.

Firstly, you will notice that I have used the "var" keyword. If you remember, this keyword asks the compiler to infer the type itself. Let's just say for now that the actual type of a LINQ query can sometimes be complicated to spell out and, more importantly, is usually not much of a concern. That is why you will usually see var used instead of the actual type the query returns.

Another point to remember is that the LINQ query we have written and assigned to the query local variable only describes operations to be performed later. In other words, by assigning it to the variable we are not doing any processing on the collection yet. For LINQ to Objects, the compiler turns the query into a chain of method calls that take our lambda expressions as delegates, and each call returns a lazy sequence that does its work only when it is enumerated (remember deferred execution and the yield return statement?). Expression trees come into play with IQueryable providers, which we will look at in later posts. The translation from query syntax to method calls is completely mechanical, in the sense that the compiler doesn't try to do any kind of optimization at this point. For example, the above statement would be translated to the following:

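var query = from reservation in allReservations
            where reservation.StartDate.Equals(DateTime.MinValue)
            select reservation;

// Conceptually equivalent method calls. The compiler actually drops a trailing
// Select that only returns the range variable when it follows another clause.
allReservations.Where(reservation 
                          => 
                      reservation.StartDate.Equals(DateTime.MinValue))
               .Select(reservation => reservation);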
               
Now if you are familiar with SQL and you had a nagging feeling as to why this query is written in reverse (instead of SELECT * FROM ...), here is the answer. As you can see, the order in which we write the query clauses is the same order in which they are translated into method calls.

Now you may be wondering when exactly the processing starts. It starts the first time the query is enumerated. That could be by calling MoveNext() and Current on an enumerator (either implicitly in a foreach statement or explicitly), or by calling a method that accepts an IEnumerable<T> and calls these members for us.
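To make the timing concrete, here is a minimal sketch. The class name DeferredExecutionDemo and the sample numbers are made up for illustration. It defines a query, changes the source list afterwards, and only then enumerates the query:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredExecutionDemo
{
    // Builds a query, modifies the source, and only then enumerates the query.
    public static List<int> Run()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // Nothing is filtered here; the query only remembers what to do.
        IEnumerable<int> query = numbers.Where(n => n > 1);

        // The source changes after the query was defined...
        numbers.Add(4);

        // ...and ToList() enumerates the query now, so it sees the new element.
        return query.ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "2, 3, 4"
    }
}
```

If the filtering had happened on the line where the query is defined, the 4 would be missing from the result.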

By now you have probably guessed that the return value of most LINQ operators (Where, Select, etc.) is IEnumerable<T>. That is true! There is another return type called IQueryable<T> which we will analyze in the next posts. For now, what we know is that LINQ to Objects receives a collection that implements IEnumerable<T> as input, and this input is passed along a sequence of operators from one to the next, always as some form of IEnumerable<T>. But let's delve a little deeper and check the order in which these operations run.

Deferred execution in LINQ has a behavior that may seem strange at first. Look at the translation from the query expression to the method calls above. Although the methods are written in that order, execution is driven from the last operator in the chain. When the consumer asks for the first element, the request goes to Select. Select asks the operator before it (here Where) for an element, and Where in turn asks its own source. That source is an object which implements IEnumerable<T>, so it just yields its first element. The element then passes through each operator in turn, all the way to the last one, which yields the output element to the consumer of the LINQ query. This is called streaming: elements flow through the chain one at a time. Most LINQ operators are streaming operators. Some, however, need the entire sequence before they can produce their first result. OrderBy and Reverse, for example, "buffer" the data instead of streaming it. This is also why you should be careful where in the query you use these operators. It is usually better to place them after a Where, so that the buffering is done on fewer items.
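The difference between streaming and buffering is easy to observe by logging from inside the lambdas. The following is only a sketch; StreamingDemo, Trace and the sortInTheMiddle flag are names I made up for it. It records the order in which Where, Select and the consumer see each element, with and without an OrderBy in the middle of the chain:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class StreamingDemo
{
    // Returns a log of which operator touched which element, in order.
    public static List<string> Trace(bool sortInTheMiddle)
    {
        var log = new List<string>();

        IEnumerable<int> filtered = Enumerable.Range(1, 3)
            .Where(n => { log.Add("where " + n); return n % 2 == 1; });

        // OrderBy has to read its whole input before it can yield anything.
        if (sortInTheMiddle)
            filtered = filtered.OrderBy(n => n);

        var query = filtered.Select(n => { log.Add("select " + n); return n * 10; });

        foreach (var item in query)
            log.Add("got " + item);

        return log;
    }

    static void Main()
    {
        // Streaming: where 1, select 1, got 10, where 2, where 3, select 3, got 30
        Console.WriteLine(string.Join(", ", Trace(false)));
        // Buffering: where 1, where 2, where 3, select 1, got 10, select 3, got 30
        Console.WriteLine(string.Join(", ", Trace(true)));
    }
}
```

In the streaming case each element travels through the whole chain before the next one is fetched. With OrderBy in the middle, every "where" entry is logged before the first "select".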

A Side Note

There are many operators in LINQ, of which I will name a few more in passing. These all make sense if you have some experience with SQL. These include:
  • OrderBy: This operator sorts the data in ascending order. If you are ordering the elements by more than one criterion, you can follow OrderBy with ThenBy. To sort in descending order, use the OrderByDescending and ThenByDescending operators.
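As a quick illustration of combining these operators, here is a small sketch. The OrderingDemo class and the member data are hypothetical and not the library data from earlier. It sorts by age in descending order and breaks ties by name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OrderingDemo
{
    public static List<string> SortedNames()
    {
        // Hypothetical sample data; two members share the same age on purpose.
        var members = new[]
        {
            new { Name = "Joon", Age = 23 },
            new { Name = "Jeen", Age = 2 },
            new { Name = "John", Age = 23 },
        };

        return members
            .OrderByDescending(m => m.Age)  // oldest first
            .ThenBy(m => m.Name)            // ties broken alphabetically
            .Select(m => m.Name)
            .ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", SortedNames())); // prints "John, Joon, Jeen"
    }
}
```

Note that ThenBy is only available on the result of an OrderBy-style call. Calling OrderBy a second time would discard the first ordering instead of refining it.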

Joins

LINQ gives us a structured query language that we can use to query objects. What is a query language without joins? Joins are mainly used in relational models, where the only way of reaching related data in another table is to join on it. In languages like C#, however, we usually follow references to reach that data. Even so, there are scenarios in which joins let us find objects of interest in a well-expressed way. In this section I will talk about three main join types: inner joins, outer joins and cross joins, all of which are available to us in LINQ.

Let's start with the inner join. In this type of join, both the left and the right sequence have to contain the value joined on in order for a row to be generated in the result. This means that if an element exists in the left sequence with no match in the right sequence, the result will not have a row for it, and the same is true the other way around. The syntax of an inner join in LINQ is as follows:

var joinQuery = from member in newLibrary.AllMembers
                where member.Age > 20
                join reservation in allReservations
                on member.Name equals reservation.MemberWhoTookIt.Name
                select member;
foreach (var item in joinQuery)
    PrintContent(item);

Here I've joined the AllMembers list with the allReservations list on the member's name. Note that the left (outer) sequence, AllMembers, is filtered before the join. Had we wanted to filter the sequence on the right as well, the query would have been more complicated:

var joinQuery = from member in newLibrary.AllMembers
                where member.Age > 20
                join reservation in (
                                    // The subquery uses its own range variable name to avoid confusion.
                                    from r in allReservations
                                    where r.StartDate == DateTime.MinValue
                                    select r
                                    )
                on member.Name equals reservation.MemberWhoTookIt.Name
                select member;

Pay attention to the subquery that is used here. It is also worth mentioning that since the left sequence is streamed and the right sequence is buffered for key lookups, it is a better idea to put the sequence with more elements on the left.

Outer joins are needed when every element of the left sequence must appear in the result. Regardless of whether a matching element exists in the right sequence, we want a row representing each element of the left sequence. In LINQ this is written with join ... into (a group join), which pairs each left element with the possibly empty sequence of its matches:

var joinQuery = from member in newLibrary.AllMembers
                join reservation in allReservations
                on member.Name equals reservation.MemberWhoTookIt.Name
                into joinedElementsFromRight
                select new { Member = member, 
                             MembersReservations = joinedElementsFromRight };
foreach (var item in joinQuery)
{
    PrintContent(item.Member);
    foreach(var reservation in item.MembersReservations)
        PrintContent(reservation);
}

The last type of join is the cross join. This is the same as the Cartesian product of the elements of the two sequences. There is no matching done. But since in this type of join the right sequence is streamed as well, it can be used quite elegantly to produce a product in which the elements of the right sequence depend on the current element of the left sequence. See the example below:

var query = from num1 in Enumerable.Range(1, 10)
            from num2 in Enumerable.Range(1, num1)
            select new { Left = num1, Right = num2 };

Run this query and check out the result. As you'd expect, just like a nested loop, the join yields a different number of right-hand elements for each item from the left sequence. It is worth repeating that in this join the right sequence is not buffered; it is streamed as well. This means that this join can be quite useful for unknown or endless streams of data, as only one item is fetched and processed at any given time.
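In method syntax, two consecutive from clauses correspond to a call to SelectMany. Here is a sketch of the same idea on a smaller range. CrossJoinDemo and Pairs are illustrative names, and the pairs are formatted as strings only to make the result easy to read:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CrossJoinDemo
{
    // For every left value, produce one pair per right value in 1..left.
    public static List<string> Pairs(int max)
    {
        return Enumerable.Range(1, max)
            .SelectMany(left => Enumerable.Range(1, left),        // right sequence depends on left
                        (left, right) => left + "-" + right)      // combine both values
            .ToList();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Pairs(3))); // prints "1-1, 2-1, 2-2, 3-1, 3-2, 3-3"
    }
}
```

The first lambda is evaluated once per left element, which is exactly what lets the right sequence depend on the left one.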

Group By

This is the last operator that Jon has covered in the book. It is a very important operator indeed, and in my experience it is used much more than Join. It groups the elements of a sequence by a key. Each group it returns is an IGrouping<TKey, TElement>, which extends IEnumerable<TElement> and adds a Key property shared by all elements of the group. Here is an example:

var groupQuery = from reservation in allReservations
                 group reservation by reservation.MemberWhoTookIt;
foreach (var item in groupQuery)
{
    PrintContent(item.Key);
    foreach(var reservation in item)
        PrintContent(reservation);
}

There you are! We have now covered LINQ's most used operators, and with them all the new features added in C# 3. Now it's time to delve deeper. In the next post I will talk about what is actually going on behind the scenes with the C# compiler. Until then, ciao.

Wednesday, 13 March 2013

Extension Methods

This post tells you all you have to know about extension methods. I'll look into why they were introduced and how to declare and use them, and finally we'll explore some of the extension methods that make LINQ possible. Let's get to it then.
I personally think that extension methods are a double-edged sword. They can make code more readable and more obscure at the same time. As the name suggests, they extend the functionality of a class. You may now be thinking: wait, didn't we use inheritance for that purpose?
The answer is: well, maybe... it all depends. Inheritance is used when there is some form of specialization going on and added state is required for all objects of the derived class. There are times when you don't really need to add the functionality to the class itself, since you're not specializing anything. Nonetheless, that would rarely keep you from extending the type if you own the code. But there are times when that is simply not an option.
For example, when you don't own the class's code, or when you are adding functionality to an interface, you can't really make changes to the existing code. In these cases you may add a static method and pass the object in as an argument. Although this technique is widely used, it doesn't feel very object oriented when you read the code, since the functionality is not called on the object itself.
In the case of interfaces, when you add a method to an interface you have basically broken all legacy code that implements that interface. Those classes won't build anymore until they implement the new method, because every implementation of an interface has to implement all the methods declared in it.
Finally, the changes that LINQ needed were mostly to interfaces. This is why extension methods were added to the language. Yes, the language only. As you have probably guessed by now, they are yet again only syntactic sugar added by the compiler. Behind the scenes, a call to an extension method is converted into a call to a static method that receives the object as its first argument. The syntax just makes it look like the type now has the functionality.
Declaring extension methods is almost too simple to forget! All you have to do is add the extended type as the first parameter of a static method and put the "this" keyword before the type:

    
public static int MultiplyBy(this int extendedType, int multiplyBy)
{
    return extendedType * multiplyBy;
}

Now every integer has a "MultiplyBy" method. It is truly as easy as that. There are some limitations as to where an extension method can be declared: the class in which it is implemented must be static, non-nested and non-generic.
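Put together, a complete declaration and two equivalent ways of calling it could look like the sketch below. The class names IntExtensions and ExtensionCallDemo are my own choice; any static, non-nested, non-generic class would do:

```csharp
using System;

// Extension methods must live in a static, non-nested, non-generic class.
static class IntExtensions
{
    public static int MultiplyBy(this int value, int factor)
    {
        return value * factor;
    }
}

static class ExtensionCallDemo
{
    static void Main()
    {
        int six = 3.MultiplyBy(2);                      // extension method syntax
        int alsoSix = IntExtensions.MultiplyBy(3, 2);   // the plain static call it stands for

        Console.WriteLine(six == alsoSix);              // prints "True"
    }
}
```

Both calls go to the same static method; the first form is only a different way of writing the second.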
Extension methods are resolved as follows:
The compiler first looks for an instance method that matches the call. If no applicable instance method is found in the type, the imported namespaces are searched for compatible extension methods. Compatible extension methods are those whose signature matches the call exactly or through implicit conversions of the arguments. Note that if an extension method has the same signature as an instance method, the extension method is never called. The compiler gives no warning when this happens.
Now we look at another piece of code:
    
public static bool NullSafeEquals(this object obj1, object obj2)
{
    if(obj1 == null)
        return (obj1 == obj2);
    else
        return obj1.Equals(obj2);
}

Object obj1 = null;
Object obj2 = new Object();

Console.WriteLine(obj1.NullSafeEquals(obj2));

What do you think the outcome of the code above is? Does it compile? Do we get a run-time error? Can we call methods on an object that is null (obj1)? We sure were not able to up to now. The code above compiles, runs without issue and prints False. The reason is that you can actually call extension methods on null references! If you think about it, it kind of makes sense. After all, we said that extension methods are syntactic sugar on top of the language. What actually happens is that the compiler calls the static method and passes the object as an argument. There is no problem with an argument being null, now is there? Note that the method is deliberately not named Equals: as described above, the instance method Object.Equals would always be chosen over the extension method, and calling it on a null reference would throw a NullReferenceException.
That's all there is to say on extension methods. Next we will look at the extension methods added in .NET 3.5.

IEnumerable<T> Extension Methods

This interface is one of those extended with extension methods in .NET 3.5. It received a host of extension methods that work on the sequence of Ts it yields. Filtering, aggregation, projection, search and grouping are just a few of the uses of these added methods. It is really fun to play around with them. I can't look at them all in this post, and I haven't even used all of them, but I'll go through filtering with Where, projection with Select and some grouping.

Where<T> extension method

If you're familiar with SQL you can think of this method as a counterpart to the where clause in a select statement. If you aren't familiar with SQL then an example should make it clear:

Enumerable.Range(1, 10).Where(x => x < 5);

In the example above, first the static method Range of the Enumerable static class is used to get the numbers from 1 to 10. Notice that the return value of this method is an IEnumerable<int>. Range produces its values lazily, in the manner of an iterator block, so it uses deferred execution. The Where extension method is then combined with a lambda expression to keep the numbers that are smaller than 5. This is really where everything we have talked about in the previous posts comes together. In that one line of code a lot of powerful concepts can be seen: iterator blocks, static classes, generics, two-phase type inference and lambda expressions. Together they give us lazy filtering of the elements of a collection without any effort on your part. You may be thinking that Where must have a very complicated implementation, but that is far from the truth. You can implement your own Where method with just a few lines of code, as below:



public static IEnumerable<T> Where<T>(this IEnumerable<T> sequence, Func<T, bool> predicate)
{
    // Because this is an iterator block, these checks only run once enumeration starts.
    if (sequence == null || predicate == null)
        throw new ArgumentNullException();

    // Lazily yield only the elements that satisfy the predicate.
    foreach (T element in sequence)
    {
        if(predicate(element))
            yield return element;
    }
}

There is nothing to stop us from passing a method group or a delegate instead of a lambda expression, but when you are chaining these methods together you rarely end up doing something that a lambda expression cannot handle.
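One subtlety of the implementation above is that an iterator block does not run any of its code until the first element is requested, so the null checks fire late. A common way around this is to split the method in two. The following is only a sketch: WhereEager, WhereIterator and SequenceExtensions are hypothetical names, chosen so they don't clash with the framework's own Where.

```csharp
using System;
using System.Collections.Generic;

static class SequenceExtensions
{
    // Not an iterator block, so the argument checks run as soon as it is called.
    public static IEnumerable<T> WhereEager<T>(this IEnumerable<T> sequence, Func<T, bool> predicate)
    {
        if (sequence == null) throw new ArgumentNullException("sequence");
        if (predicate == null) throw new ArgumentNullException("predicate");

        return WhereIterator(sequence, predicate);
    }

    // The lazy part: runs only while the result is being enumerated.
    private static IEnumerable<T> WhereIterator<T>(IEnumerable<T> sequence, Func<T, bool> predicate)
    {
        foreach (T element in sequence)
        {
            if (predicate(element))
                yield return element;
        }
    }
}

static class WhereEagerDemo
{
    static void Main()
    {
        var numbers = new[] { 1, 2, 3, 4, 5, 6 };
        foreach (int n in numbers.WhereEager(x => x < 3))
            Console.WriteLine(n); // prints 1 and 2 on separate lines
    }
}
```

The filtering itself is still deferred. Only the validation has moved to the moment of the call, so a bad argument is reported where the mistake was made.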

Select<T> extension method

This method is also called the projection method. It receives an input element and projects it into another object, either by selecting a part of it or by adding to it. For example, you may have an object that carries around a lot of information and want to populate a grid view with it. One easy way of populating a grid view is to assign a collection to its DataSource property. If you don't want to bind the entire object, you can select just the pieces you need into a new collection and assign that instead. In the inverse case, you can create an anonymous type that wraps the original value and adds to it:

// 1.0 / n forces floating-point division; 1 / n would be integer division.
var inverses = Enumerable.Range(1, 10).Select(n => new { Number = n, Inverse = 1.0 / n });

Here I have created a new type that contains both the number and its inverse.

GroupBy<T> extension method

The GroupBy extension method has many overloads and is a very powerful tool. It groups the elements of the sequence according to a key, which is designated by the first argument. The return value of this extension method is IEnumerable<IGrouping<TKey, TElement>>. The IGrouping<TKey, TElement> type extends IEnumerable<TElement> and adds a "Key" property. It is now kind of clear what this type is there for: GroupBy returns an enumerable sequence of IGroupings, each of which has a key and can itself be enumerated. The simplest form can be seen in the example below.

var persons = new [] {
                        new { Name = "John", Age = 23},
                        new { Name = "Mary", Age = 21},
                        new { Name = "Joan", Age = 24},
                        new { Name = "Tom", Age = 24},
                        new { Name = "Hank", Age = 22},
                        new { Name = "Steve", Age = 22},
                        new { Name = "Bella", Age = 22},
                    };

// Group by age, order the groups by their key, then print each group.
foreach (var ageGroup in persons.GroupBy(p => p.Age)
                                .OrderBy(g => g.Key))
{
    Console.WriteLine(ageGroup.Key);
    foreach (var person in ageGroup)
        Console.WriteLine("  " + person.Name);
}

To order by more than one field, you can add a .ThenBy() to the end of the chain. Something to note here, though, is that the order of the original sequence is not changed by these calls. Most LINQ operations are side-effect free: they leave the source untouched and return a new sequence that reflects the change.
There are some points to remember when deciding to write your own extension methods. Just like implicit typing, there are pros and cons to defining them. You have to realize that the code you write in a group development environment is different from code you write for yourself. You now have to know your audience and your maintainers. You don't want to surprise any of the people working with the code. Extension methods can be really confusing for someone who's not familiar with them. They can also cause all kinds of problems. For example, if you define an extension method and a method with the same signature is added to the type in the next version of the framework, the instance method silently takes over and your code may break, unless you're lucky enough that the added method behaves exactly the same. You can also cause a lot of problems if you don't define the extension methods in the right namespace. You definitely don't want all the extension methods of IEnumerable showing up in your IntelliSense suggestions if you're not using them, do you? Just think long and hard about defining your own extension methods, and put them in the right namespace with the right name. It may be worth standardizing the naming scheme of your extension methods so that every person in the group knows when they're calling one.
Well... that is about it! Guess what? You now know LINQ! You just don't know that you do yet. LINQ is just syntactic sugar over all we have learned so far. It only allows a more familiar syntax for queries. Everything will fall into place with the next post, which is officially about LINQ and covers query expressions and LINQ to Objects.
Stay tuned!

Monday, 11 March 2013

Lambda Expressions and Expression Trees

This chapter again is all about delegates. It might be a good idea to review delegates in previous posts if you think you don't remember much from them. Lambda expressions are here to make it easier and more straightforward than ever to create delegates.

Lambda Expressions

Lambda expressions are so called because lambda calculus, in computer science and mathematics, deals with the definition and manipulation of functions. Other than the name, I haven't seen lambda calculus come up anywhere else here, so don't get intimidated by it.
Since lambda expressions can be considered a special case of anonymous methods, and we have already covered those, I'm going to jump right into the syntax. There are many different forms of allowed syntax for lambda expressions, which is due to their wide use across different scenarios. As we get closer to LINQ, you'll see that they are used all over the place. Of course, as I said, they can always be replaced by the more general anonymous methods. Here's the general form of a lambda expression:

        (explicit or implicit list of input arguments) ⇒ {code block}

As you see, the most general form is quite similar to anonymous methods, with the small differences of the new ⇒ symbol and no mention of the delegate keyword. You can explicitly type the input arguments or have the compiler infer them. Another shortcut in the syntax, and the more important one, is that you can replace the {code block} with a single expression. Just like an anonymous method, the block needs to return a value if the delegate type has a return type. When there is only a single expression, the return value is the value of that expression. For example, in the lambda expression below, the return value is the result of comparing each person's name with "Joe":
    
    List<Person> persons = new List<Person>(); 
    persons.FindAll(p => p.Name == "Joe");

In the above example, the lambda expression has one input parameter (p), which is implicitly of type Person. The "=>" operator should be read as "goes to". So in this example, p goes to p.Name == "Joe", a comparison on the hypothetical person's name. What the statement accomplishes, then, is checking every element of the list and returning a new list containing the elements that match the criterion. As we know, FindAll expects the generic delegate Predicate<T>. The lambda expression is implicitly converted to a delegate instance, which FindAll calls for each element. It is important to know that lambda expressions by themselves don't have a type. This is because they can be converted to both delegate instances and expression trees, which we'll cover in the next section.
You can imagine all the other forms that a lambda expression can take by starting from an anonymous method and removing or adding types, parentheses, etc. But the most popular form, and the one that you will be using most, is the one we just covered.
Captured variables are handled the same way as in anonymous methods. You have to be careful about closures here, just as we discussed in the post about anonymous methods.
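To see the transformation step by step, here is a small sketch that writes the same predicate in four ways, from the C# 2 anonymous method down to the shortest lambda form. LambdaFormsDemo and AllAgree are illustrative names, and a plain string stands in for the Person type to keep the example self-contained:

```csharp
using System;

static class LambdaFormsDemo
{
    // Returns true when all four forms give the same answer for the given name.
    public static bool AllAgree(string name)
    {
        // C# 2 anonymous method.
        Predicate<string> anonymousMethod = delegate(string s) { return s == "Joe"; };

        // Lambda with an explicit parameter type and a block body.
        Predicate<string> fullLambda = (string s) => { return s == "Joe"; };

        // Explicit parameter type, single expression instead of a block.
        Predicate<string> typedExpression = (string s) => s == "Joe";

        // Inferred parameter type, no parentheses: the form you will use most.
        Predicate<string> shortest = s => s == "Joe";

        bool expected = anonymousMethod(name);
        return fullLambda(name) == expected
            && typedExpression(name) == expected
            && shortest(name) == expected;
    }

    static void Main()
    {
        Console.WriteLine(AllAgree("Joe") && AllAgree("Bob")); // prints "True"
    }
}
```

All four are converted to the same kind of delegate instance; only the amount of ceremony differs.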

Expression Trees

As their name suggests, expression trees are trees that have expressions as their nodes. Each node is an expression that, once evaluated, serves as an operand of its parent expression. Expression trees and lambda expressions are at the heart of LINQ. It is important to know why we need expression trees in the first place. 
LINQ is here to streamline the process of querying objects, databases, XML documents, etc. It provides one language and one set of operators to be applied uniformly across all these inherently different data sources. This is the same thing SQL does for databases and virtual machines do for cross-platform development. Without it, you have to define and use custom queries for each specific data source. For example, the syntax to query a database using SQL is different from the query language used for XML (XQuery). In order to use the same syntax for all these different platforms, one can define a common set of operations possible on each platform and then translate these common operations for each respective data source. To do so, the program that is written using the common operations has to be analyzed and translated. This means that the code itself is the data input to the translator.
The concept of using code as data, then, is key to creating such a mechanism. Expression trees do just that. They enable us to represent a piece of code in a tree data structure which can later be parsed and analyzed. In .NET 3.5, the Expression class provides the functionality to do this. It allows actual C# code to be represented in a data structure which can then be parsed in the same or another program (in-process vs. out-of-process). Expression trees in .NET 3.5 can only represent certain operations. In order to gain complete control over dynamically generated code, one still has to use the CodeDom (a library to create language-independent, dynamically generated code in .NET).

In the System.Linq.Expressions namespace, each class extending the abstract Expression base class has two main properties associated with it:
  • Type: This property is the actual .NET type of the expression's value. Kind of like the return type of a function.
  • NodeType: The node type is a value of the ExpressionType enumeration defined in the same namespace. As we said before, all the different kinds of expressions derive from the abstract Expression base class. This is all fine and jolly, but what should we do with all the different kinds of expressions that share a common structure? We definitely don't want to end up with a huge inheritance chain. Here the design decision is to extend the hierarchy not in depth, but in breadth. Each general kind of expression, such as a binary expression, is represented by one class (BinaryExpression), and the node type distinguishes the specific operation. All supported node types are defined in the ExpressionType enumeration.
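Both properties are easy to inspect when the compiler builds the tree for us from a lambda expression. In the sketch below, NodeTypeDemo and Describe are names I made up; the lambda's body is a binary expression whose node type says which operation it is and whose Type is the type of its result:

```csharp
using System;
using System.Linq.Expressions;

static class NodeTypeDemo
{
    public static string Describe()
    {
        // Assigning a lambda to Expression<TDelegate> makes the compiler build a tree
        // instead of a delegate.
        Expression<Func<int, int, bool>> lessThan = (a, b) => a < b;

        Expression body = lessThan.Body;
        bool isBinary = body is BinaryExpression;

        // NodeType tells which binary operation this is; Type is the type of its value.
        return isBinary + " / " + body.NodeType + " / " + body.Type.Name;
    }

    static void Main()
    {
        Console.WriteLine(Describe()); // prints "True / LessThan / Boolean"
    }
}
```

An addition such as (a, b) => a + b would also have a BinaryExpression as its body, only with ExpressionType.Add as the node type. That is the breadth-over-depth design in action.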
There is really no easy way to explain how expression trees work other than with an example. So here is an example of an expression tree that writes a line to the console.

    // Needs: using System; using System.Linq.Expressions; using System.Reflection;
    Type console = Type.GetType("System.Console");
    MethodInfo method = console.GetMethod("WriteLine", new[] { typeof(String) });

    // The parameter of the lambda, which is also the argument passed to WriteLine.
    ParameterExpression lambdaParam = Expression.Parameter(typeof(String));
    // The target is null since WriteLine is a static method.
    Expression methodCall = Expression.Call(null, method, new[] { lambdaParam });
         
    var rootExpression = Expression.Lambda<Action<string>>(methodCall, new[] { lambdaParam });
    var compiledMethod = rootExpression.Compile();
    compiledMethod("This was generated using an expression tree");
 
    Console.ReadKey(true);

I realize that the above code can be hard to grasp at first, but bear with me a little longer and soon you'll be able to come back to this code and understand it completely. In the above example, you can see how the Expression class provides factory methods to create expressions. Here we needed to create a single statement that outputs a string to the standard output. In .NET terms, this means calling the "WriteLine" method of the Console type.
In order to translate this into the expression tree world, you first have to dissect the statement, see what the elements of the code are and how they are represented in the System.Linq.Expressions namespace. Chances are that they may not be represented at all! After all, expression trees are not here to replace the CodeDom! At least not yet. Anyway, you need to specify a method call, the method used in this call, the target of the call (the instance object) and the parameters.
How do we provide information about a type or method? With the System.Reflection namespace of course. Discussing what reflection is and how it is used is outside the scope of this article. In short, we can specify the fully qualified name of a type or the name of a method and get the .NET framework to search for it and find it. This allows types to be specified with strings which are looked up at run-time. If you've never heard of reflection before, this may feel very messy and counter-intuitive from an object oriented point of view, but it adds a lot of flexibility and it's useful.
Getting back to the issue at hand, we use reflection to obtain information on the type and its method, both of which we name with a string. What's left is creating the right kind of expression for each part of the statement, and then we have described everything there is to know about it! In the end we wrap the tree in an Expression<TDelegate> and ask it to compile itself into a delegate.
Now things get a little bit confusing here due to the naming used in the inheritance chain. So far we know that different kinds of expressions inherit from a base class called Expression: classes like BinaryExpression, ConditionalExpression, LambdaExpression and so forth. There is also a special generic class that extends LambdaExpression, called Expression<TDelegate>. This class represents a strongly typed lambda expression and has a Compile() method which creates an actual delegate whose type is specified by the TDelegate type parameter. In the example the delegate type is Action<String>.
The .NET library provides some premade, ready to use delegate types. This is quite handy as they are generic and have all but abolished the need to declare your own delegate types. Here we needed a delegate type that accepts a string as a parameter and doesn't return anything. Action<T> and its brothers (Action<T1, T2>, Action<T1, T2, T3>, etc.) provide the types for void methods that receive 1 to 16 parameters (in .NET 4.0). What are we going to do if the method has a return type? There are 17 generic delegates declared for that too (Func), covering 0 to 16 parameters plus the return type. Anyway, we use the Action<String> delegate as the type parameter of Expression<TDelegate> and call the Compile method. This method returns an actual delegate that can now be called like any other! We have just dynamically created a delegate at run-time!
That looked like a lot of work for a rather simple task, and indeed it is! The good news is that the C# compiler converts lambda expressions, in their simplest form, to expression trees automatically. You just have to assign a lambda expression to an Expression<TDelegate> variable and the compiler does the rest. This can be seen in the example below:
    
    Expression<Action<String>> expression = (p) => Console.WriteLine(p);
    expression.Compile()("Hello");

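Lambda expressions with a statement body (loops, condition blocks or even a single return statement inside braces) cannot be converted this way; the C# compiler only converts lambdas whose body is a single expression. The expression tree API in .NET 3.5 has no node types for such statements either. In .NET 4.0 node types for blocks, loops, assignments and so on were added since they were needed by the DLR, but trees that use them still have to be built by hand with the factory methods.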
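As a rough illustration of what building such a statement-style tree by hand looks like with the .NET 4.0 API, here is a sketch that sums the numbers from n down to 1 with a loop. The class name LoopTreeDemo and the method BuildSumTo are invented for this example; the factory methods (Expression.Block, Expression.Loop, Expression.Break and friends) are the ones added in .NET 4.0.

```csharp
using System;
using System.Linq.Expressions;

public static class LoopTreeDemo
{
    // Builds and compiles the equivalent of:
    //   int total = 0;
    //   while (n > 0) { total += n; n--; }
    //   return total;
    public static Func<int, int> BuildSumTo()
    {
        ParameterExpression n = Expression.Parameter(typeof(int), "n");
        ParameterExpression total = Expression.Variable(typeof(int), "total");
        // The label the loop jumps to when it is done; it carries the int result.
        LabelTarget done = Expression.Label(typeof(int));

        BlockExpression body = Expression.Block(
            new[] { total },
            Expression.Assign(total, Expression.Constant(0)),
            Expression.Loop(
                Expression.IfThenElse(
                    Expression.GreaterThan(n, Expression.Constant(0)),
                    Expression.Block(
                        Expression.AddAssign(total, n),
                        Expression.PostDecrementAssign(n)),
                    Expression.Break(done, total)),
                done));

        return Expression.Lambda<Func<int, int>>(body, n).Compile();
    }

    public static void Main()
    {
        Func<int, int> sumTo = BuildSumTo();
        Console.WriteLine(sumTo(4)); // 10
    }
}
```

Compare this with the one-line lambda conversion above and it is easy to see why the automatic conversion is what you want whenever it is available.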
The rest of the chapter in the book deals with the new rules for overload resolution and type inference. In C# 2.0 each argument was considered independently and not much inference was going on. In C# 3.0, with the introduction of implicitly typed variables, lambda expressions and LINQ, type inference is basically going on everywhere. The rules are complicated but the gist is that in C# 3.0 type inference is a collaborative effort among the different arguments, meaning that each argument can now add some information to the inference process. This is needed to resolve some of the method group and lambda expression types. I will not delve into any details on this. In case you are interested you can check out the language specification, which contains the rules, or read the chapter on this in the book, which goes into more detail.
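A small sketch of what "collaborative" means in practice: in the call below the first argument fixes TInput, which in turn lets the compiler type the lambda parameter and work out TOutput from the lambda body. The names InferenceDemo and ConvertValue are made up for this example.

```csharp
using System;

public static class InferenceDemo
{
    // A generic method with two type parameters (hypothetical helper).
    public static TOutput ConvertValue<TInput, TOutput>(
        TInput input, Func<TInput, TOutput> converter)
    {
        return converter(input);
    }

    public static void Main()
    {
        // No type arguments are written out:
        //  1. "hello" tells the compiler that TInput is string.
        //  2. That makes s a string, so s.Length is an int.
        //  3. Therefore TOutput is int.
        int length = ConvertValue("hello", s => s.Length);
        Console.WriteLine(length); // 5
    }
}
```

The second argument could not have been resolved on its own, since a lambda has no type until its parameter types are known.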
Next up are extension methods and then we have basically covered the language side of LINQ.

Monday, 4 February 2013

Starting Coding in C# 3.0

From Chapter 8 onwards we will introduce all the bits and pieces that eventually go hand-in-hand to make LINQ possible. In this chapter we will look at Automatic Properties, Implicitly typed local variables, Object and collection initializers, Implicitly typed arrays and Anonymous Types.

In some ways, from this post on we will pave the way to LINQ. After we cover anonymous types (this post), expression trees and lambda expressions (next post) and extension methods (the post after next), we will have covered all the bits and pieces that enable LINQ. What is left is only introducing the syntax of query expressions and we're done! So think of this post and the later ones as learning the underpinnings of LINQ. What is really interesting about this approach is that not only will you learn LINQ and see its power, but when it is introduced you will know how it actually works underneath. This should clear up the confusion about how to optimize your queries, when to use LINQ and so on.

Automatic Properties

Automatic Properties is a feature that is easy to learn and use. It is one of those features that you will use a lot once you learn it. I have. Remember all that extra code you had to write just to enable encapsulated access to your private fields? Although it was only a few lines of code, it created unneeded clutter when you implemented a few trivial properties. By trivial I mean properties that lack any kind of validation or logging and are just simple access providers. You can now create this type of property with one line of code:

C# 2.0:
private int id;
public int ID
{
    get
    {
         return id;
    }
    set
    {
        id = value;
    }
}

C# 3.0:
public int ID { get; set; }

You can use access modifiers on the getter/setter and you can make the automatic property static. Wondering what a static automatic property may be used for? How about a public getter with a private setter to implement a Singleton object? You would need a static automatic property for that, as seen below:

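public class Singleton
{
    // A static constructor takes no access modifier.
    static Singleton()
    {
        SingletonInstance = new Singleton();
    }
    // The private instance constructor keeps others from creating more instances.
    private Singleton() { }
    // Only the more restrictive accessor carries a modifier.
    public static Singleton SingletonInstance { get; private set; }
}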

Notice that the automatic property itself does nothing for thread safety. In the above example the one-time initialization is safe only because the CLR runs a static constructor exactly once; anything beyond that is up to you. Actually automatic properties don't have anything to do with the CLR and are just some syntactic sugar added to the language by the nice people designing the C# compiler.

Implicitly typed local variables

In the first few posts on C# we covered the fact that the C# type system is static, explicit and safe. This is because each and every variable has a known, explicitly written type which remains the same throughout its lifetime. It is also safe, as opposed to the type system in C++. This has been entirely true up to C# 3.0. In C# 3.0 the concept of implicitly typed local variables is introduced. Notice that C#'s type system remains static and safe until C# 4.0, where the static part is also challenged.
In order to define an implicitly typed local variable in C# one has to use the "var" keyword in place of the type name in front of a local variable:

1) Type varWithType = new Type();
2) var varWithType = new Type();
3) varWithType = new AnotherType();
There are a few things to take away from the above example. The first line defines a variable with the explicit type "Type". In the second line the same variable is defined with the var keyword instead. The most important thing to realize here is that the compiled versions of the first and second lines are exactly the same. The variable is still statically typed; the compiler simply infers the type from the initialization expression. This means that we have been rather careless in saying that C# now allows implicitly typed local variables, as if the type were loose. If that were true then we could have line 3 right after line 2, assigning an object of an unrelated type to the same variable. But this results in a compile error. There are also some limitations on where the "var" keyword can be used. The limitations are as follows:

  • It can't be used for static or instance fields. The variable must be a local variable.
  • The variable must be initialized as part of the declaration.
  • The initialization expression cannot be the null literal.
  • You can't declare multiple variables in the same statement when using the var keyword.
  • The initialization expression cannot be an anonymous function (anonymous methods and lambda expressions) or a method group (can you guess why?).
In some of the cases above the "var" keyword can still be used if a cast tells the compiler which type we want, but this defeats the purpose of using var!
With great power comes great responsibility. It is important to know that although you can now define all your local variables as implicitly typed, it doesn't really mean that you should. As we'll see later on in this post, the var keyword is there to fit in the bigger picture of anonymous types, which are there to fit in the even bigger picture which is LINQ. So unless you are using an anonymous type, whose name you cannot write down, or are using LINQ, there is not much need to declare things implicitly, unless you have a good reason to do so. It all comes down to how you want to present your code to the people maintaining it. Do you want to emphasize the algorithm rather than the extra fluff of types and variables? Go ahead and use var a lot. Do you want them to be able to tell which types you're using because they're significant? Try not to use var at all. Balance is key.
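Here is a short sketch of the rules above in one place. The lines that would not compile are left in as comments; the class name VarDemo and the method InferredTypeName are just names I picked for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class VarDemo
{
    // Returns the name of the type the compiler inferred for a var declaration.
    public static string InferredTypeName()
    {
        var count = 10;   // compiled exactly as: int count = 10;
        return count.GetType().Name;
    }

    public static void Main()
    {
        var text = "hello";                              // string
        var lookup = new Dictionary<string, List<int>>(); // saves some typing

        // text = 42;              // error: the variable is still a string
        // var nothing = null;     // error: no type to infer from null
        // var later;              // error: must be initialized
        // var a = 1, b = 2;       // error: one variable per declaration
        // var square = x => x * x; // error: a lambda has no type by itself

        // A cast makes it compile, but then var no longer saves anything.
        var square = (Func<int, int>)(x => x * x);

        Console.WriteLine(InferredTypeName()); // Int32
        Console.WriteLine(text.Length + lookup.Count + square(3)); // 14
    }
}
```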

Object, Collection and Array Initializers

In C# 3.0 another set of features was introduced as syntactic sugar to further streamline the creation of objects. This again fits into the bigger picture of LINQ. I will give an example of their use all in one snippet as I think they are easy enough to be learned all at once.

   class Article
   {
      public Article(string title) { Title = title; }
      public Article() {}

      public string Title { get; set; }

      private readonly List<Line> lines = new List<Line>();
      public List<Line> Lines { get { return lines; } }
   }
   class Line
   {
      public string[] Words { get; set; }
      public Line() {}
   }

   ...

   // Object initializer for the Article, collection initializer for Lines,
   // and implicitly typed arrays for the words of each Line.
   Article sampleArticleObject = new Article
   {
      Title = "Banana Joe was announced as PM for Canada",
      Lines =
      {
         new Line { Words = new[] {"First", "line", "of", "article"} },
         new Line { Words = new[] {"Second", "line", "of", "article"} },
         new Line { Words = new[] {"Last", "line", "of", "article"} }
      }
   };

There are a couple of things to mention here about each of the new features. We'll go through them one by one:

  • Object initializers: These allow you to create and initialize an object in a single expression. The important enhancement here is not only the number of lines you'll save but the fact that creation and initialization are now one expression. This means that you can create an object, initialize it and pass it as an argument at the same time, although this is not really recommended since, as always, features that improve readability can have quite the opposite effect when used carelessly. The second thing to notice is the amount of extra fluff removed with this feature. The actual data is now much more emphasized compared to the situation where each member is assigned through a property access on a separate line. Another thing to note is the added flexibility of mixing and matching constructors and initializers. This effectively allows passing some values to the constructor (if one exists) and setting the rest in the initialization block. Object initializers allow the creation of "embedded objects" as well. This means that if an object contains a reference to another member object, one can initialize the child using the same syntax while the container object is being initialized.
  • Collection initializers: As seen above, collections like List can now be initialized inline with the declaration as well. The story here is a little more interesting, however, since different collections have Add() methods with different signatures. Here the C# team had to make a design decision between academic purity and flexibility. They went with flexibility. For a collection to be initialized inline it has to have an Add method that can be called for each element specified in the initialization block. One way to guarantee this is to force the class being initialized to implement the ICollection<T> interface. This was the case in the draft of the C# 3.0 language specification. On the other hand, this puts a lot of limitations on the implementing classes. For example, the Dictionary class would have needed an Add method with only one parameter. This led to a change of strategy. The team decided to allow any Add method with any signature, so the class just has to have a suitable method named "Add". To make sure that the class is indeed a collection, the only other requirement is that it implements IEnumerable. Interestingly enough, as Jon explains, the compiler checks for IEnumerable but never actually uses it in the code it generates for the initialization block.
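To see the "any Add method will do" rule in action, here is a minimal sketch of a home-made collection with a two-parameter Add method. The Inventory class is invented for this example; all it needs is IEnumerable and a method called Add.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// A hypothetical collection: implements IEnumerable and has a two-argument Add.
public class Inventory : IEnumerable<KeyValuePair<string, int>>
{
    private readonly Dictionary<string, int> items = new Dictionary<string, int>();

    public void Add(string name, int quantity)
    {
        items[name] = quantity;
    }

    public int this[string name]
    {
        get { return items[name]; }
    }

    public IEnumerator<KeyValuePair<string, int>> GetEnumerator()
    {
        return items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class InventoryDemo
{
    public static Inventory Build()
    {
        // Each inner pair of braces becomes one call to Add(name, quantity).
        return new Inventory { { "apples", 3 }, { "pears", 5 } };
    }

    public static void Main()
    {
        Inventory stock = Build();
        Console.WriteLine(stock["apples"] + stock["pears"]); // 8
    }
}
```

This is the same syntax Dictionary uses, which is exactly the case that the ICollection<T> requirement in the draft specification would have ruled out.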

Anonymous Types

It is rather pointless to talk about anonymous types without being able to show you how they make life so much easier when used in LINQ. But for now think of a scenario where you query a database for some data. You usually select certain columns from a table and return the results. What if the same scenario happened with objects?
As the name suggests, anonymous types allow you to create objects of a type without declaring that type yourself. Strange? Not really, here is an example:

    var anonType = new { Name = "Joe", Address="Banana Street" };

As you can see the syntax is somewhat similar to what we have seen so far with object and collection initializers. Of course this can be combined with them as well. Also, you don't have to initialize anonymous types using only constants. The value assigned to each property could be any expression or a method call too. Take note that we cannot write down the type of the anonType variable here, since the type has no name we can use! So the only keyword that we CAN use is var. Now in order to select only certain properties from an object we can do something like this:
    Person p = new Person();
    var anonType = new[] 
    {
       new { Name = p.Name, Address= p.Address },
       new { Name = p.Name + " Big Joe", Address= p.Address + " back door" }
    };

What I've done here is define an array of anonymous types. This shows that the two anonymous objects declared above must have the same actual type! Otherwise how could they be stored in the same array? Indeed, the C# compiler considers two anonymous types with the same number of properties, with the same names and types in the same order, to be the same type. If you change any of the order, type or name elements, the type is different. I really don't want to talk about anonymous types much longer as we will see them all over the place from now on and in LINQ. Even if you feel like you don't have a handle on them yet, they will become second nature as we move further into LINQ. The next post is on Expression Trees and Lambda Expressions. Those are my favorite additions in C# 3.0, followed closely by extension methods which are the post after next. Stay tuned.
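The "same names, types and order" rule is easy to check for yourself. Below is a minimal sketch (the class and method names are made up) that compares the runtime types of three anonymous objects declared in the same assembly.

```csharp
using System;

public static class AnonymousTypeDemo
{
    // True when two anonymous objects with identical property lists share a type.
    public static bool SameShapeSameType()
    {
        var a = new { Name = "Joe", Address = "Banana Street" };
        var b = new { Name = "Ann", Address = "Apple Road" };
        return a.GetType() == b.GetType();
    }

    // True when merely swapping the property order produces a different type.
    public static bool DifferentOrderDifferentType()
    {
        var a = new { Name = "Joe", Address = "Banana Street" };
        var c = new { Address = "Banana Street", Name = "Joe" };
        return a.GetType() != c.GetType();
    }

    public static void Main()
    {
        Console.WriteLine(SameShapeSameType());           // True
        Console.WriteLine(DifferentOrderDifferentType()); // True
    }
}
```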

Sunday, 27 January 2013

Finishing up on C# 2.0 and moving to 3.0

Hello again,

It's been a while since I posted. The new year's holidays are to blame ! With a rather late happy new year to everyone we'll begin.

In this post, I will be covering Chapter 7 of the book. Chapter 7 finishes up on all the new features in C# 2.0. I won't be covering Chapter 7 in detail since the features are not necessarily used that often. On the other hand, it's worth knowing they exist so that when you get into a situation where they can be useful, you can go and read in detail how to use them. So far we have introduced 4 major features of C# 2.0: generics, delegate improvements, nullable types and iterator blocks. Here is a list of some other features worth mentioning with a brief overview:

  • Partial Types: If you have ever used the VS designer to design forms, you have noticed a method in one of your project's classes called InitializeComponent(). You should never have changed the content of this method as it was auto-generated by the designer. Having an auto-generated part and manually entered code in the same file is dirty. It would be much better for the designer to write to its own file, and rewrite it whenever it wishes, and for the developer to write in a separate file and not have to worry about the auto-generated sections. This is where partial types come into play. They give us the ability to define a single type across separate files. At compile time, the compiler merges all these parts and creates one class. The list of uses goes on in the book: unit tests, dividing a bloated class into smaller functional units, etc. In C# 3.0 we get partial methods. These methods can be declared in the auto-generated part, and then the manually written part can implement that hook, and the implementation is executed as if it was written in the auto-generated class. Before partial methods the auto-generated class had to define an event and the manually written one would have to subscribe to it. Partial methods are much more elegant since the hooks that are not implemented by the manually written file are removed during compilation and you don't end up with a bunch of unused event publishers.
  • Static Classes: This feature was introduced to clarify the use of utility classes in a project. Utility classes are commonplace among projects. Before, you had to declare a utility class like so:
        public sealed class UtilityClass
        {
            private UtilityClass() {}
            public static void Method1()
            {
               ...
            }
            public static void Method2()
            {
               ...
            }
        }
        
    The private constructor was there to keep others from instantiating your class. Why put it there in the first place? Because the C# compiler by default supplies a public parameterless constructor for a class that doesn't declare a constructor. It wouldn't do that for a class that has declared one though. That's why we declare one but make it private and empty. The sealed keyword is there to keep others from inheriting from this class, since there is nothing to specialize when all its members are static. Of course this is all fine and it works. But defining an empty private parameterless constructor is ugly. Also, we can't really keep someone from using this class as a type, and a statement like UtilityClass a = null; would compile without any errors. We want the compiler to do some compile-time checking for us in this case and not allow such code to compile. This is exactly what the static keyword in the class definition does:
        public static class UtilityClass
        {
            public static void Method1()
            {
               ...
            }
            public static void Method2()
            {
               ...
            }
        }
        
    You no longer need to specify the constructor or the sealed keyword. The compiler would also make sure that the class is used properly.
  • Different access modifiers for property getter/setter: Before, we were forced to have the same access level for both the setter and getter of a property. It is not really uncommon to want to be able to change a property from inside the class itself but not from the world outside. This was not really possible, at least without circumventing it with an instance method. In C# 2.0 you can now declare a private setter and a public getter to accomplish the task.
  • Namespace aliases: This feature is very useful if you want to use different versions of a type or are using two types that have the same name. You can assign different aliases to their namespaces when importing them and use the aliases to reference them. For more details on this read section 7.4 of the book.
  • Pragma Directives: These are preprocessor directives just like #if, #elif, etc. (the ones that allow conditional compilation of the code), but they carry instructions for a specific compiler. There are two pragma directives supported by the Microsoft C# compiler (for other compilers like Mono consult the latest documentation): warning and checksum. Warning is used to suppress specific warnings generated by the compiler, and checksum is used in ASP.NET to identify the right source file to debug.
  • Fixed-size buffers in unsafe code: These are used to represent unsafe arrays of fixed size. They are mentioned here only for the sake of completeness.
  • Friend Assemblies: This feature allows one assembly to be declared a friend of another source assembly. This means that the friend assembly has access to all internal members of the source assembly. The only use that Jon thought of is in unit testing, where you usually have to set your members to public just to test them. This is kind of ugly because you may forget to set them back. By declaring the test assembly as a friend, the test classes are able to test the internal members of the source assembly (notice that they still wouldn't have access to private members).
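A couple of the items in this list are easier to see in code than in prose. The following sketch shows a partial class split into a "generated" half and a hand-written half, a partial method used as a hook (C# 3.0), and a property with a public getter and a private setter. The Customer class and the OnCreated hook are hypothetical names; in a real project the two halves would live in separate files.

```csharp
using System;

// Imagine this half is produced by a code generator.
public partial class Customer
{
    // Public getter, private setter: readable everywhere, writable only in here.
    public string Name { get; private set; }

    public Customer(string name)
    {
        Name = name;
        OnCreated(); // the call is removed by the compiler if nobody implements the hook
    }

    // Partial method declaration: the hook offered to the hand-written half.
    partial void OnCreated();
}

// This half is written by hand.
public partial class Customer
{
    public bool HookRan { get; private set; }

    // Implementing the hook; partial methods are always private and return void.
    partial void OnCreated()
    {
        HookRan = true;
    }
}

public static class PartialDemo
{
    public static void Main()
    {
        Customer c = new Customer("Joe");
        Console.WriteLine(c.Name + " " + c.HookRan); // Joe True
        // c.Name = "Bob"; // error: the setter is private
    }
}
```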
This concludes the coverage of Chapter 7. I will cover Chapter 8 in the next post.