Monday, 11 March 2013

Lambda Expressions and Expression Trees

This chapter is, once again, all about delegates. If you think you don't remember much about them, it might be a good idea to review the previous posts on delegates first. Lambda expressions are here to make creating delegates easier and more straightforward than ever.

Lambda Expressions

Lambda expressions are so called because lambda calculus, in computer science and mathematics, deals with the definition and manipulation of functions. Other than the name, I haven't seen lambda calculus used anywhere else, so don't get intimidated by it.
Since lambda expressions can be considered a special case of anonymous methods, and we have already covered those, I'm going to jump right into the syntax. There are many allowed forms of syntax for lambda expressions, because they are used across so many different scenarios. As we get closer to LINQ, you'll see that they are used all over the place. Of course, as I said, wherever a delegate is needed they can always be replaced by the more general anonymous methods. Here's the general form of a lambda expression:

        (explicit or implicit list of input arguments) => {code block}

As you can see, the most general form is quite similar to an anonymous method, with the small difference of the new => symbol and no mention of the delegate keyword. You can explicitly type the input arguments or have the compiler infer them. Another shortcut in the syntax, and the more important one, is that you can put a single expression in place of the {code block}. Just like with anonymous methods, a block has to return a value whenever the delegate type has a return type. When there is only a single expression, the value of that expression is the return value. For example, in the lambda expression below, the return value is the result of the comparison p.Name == "Joe", which is a bool:
    
    // Person is assumed to be a class with a string Name property.
    List<Person> persons = new List<Person>();
    List<Person> joes = persons.FindAll(p => p.Name == "Joe");

In the above example, the lambda expression has one input parameter (p), which is implicitly of type Person. The "=>" operator should be read as "goes to". So in this example, p goes to p.Name == "Joe": given a person, the expression tells us whether that person's name is Joe. What the statement accomplishes, then, is searching all the elements of the list and returning a new list with every element that matches the criterion. As we know, FindAll expects the generic delegate Predicate<T>. The lambda expression is implicitly converted to a delegate instance, which FindAll calls for each element. It is important to know that lambda expressions by themselves don't have a type. This is because they can be converted both to delegate instances and to expression trees, which we'll cover in the next section.
You can imagine all the other forms a lambda expression can take by starting from an anonymous method and removing or adding types, parentheses, and so on. But the most popular form, and the one you will be using most, is the one we just covered.
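As a rough sketch of those transformations, the block below writes the same predicate first as an anonymous method and then as progressively shorter lambdas. The Person class and the LambdaForms wrapper are hypothetical, made up only so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type, only here so the sketch compiles.
public class Person
{
    public string Name { get; set; }
}

public static class LambdaForms
{
    // Anonymous method (C# 2.0 style).
    public static readonly Predicate<Person> AnonymousMethod =
        delegate(Person p) { return p.Name == "Joe"; };

    // Lambda with an explicitly typed parameter and a code block.
    public static readonly Predicate<Person> ExplicitBlock =
        (Person p) => { return p.Name == "Joe"; };

    // Parameter type inferred by the compiler, still a code block.
    public static readonly Predicate<Person> ImplicitBlock =
        (p) => { return p.Name == "Joe"; };

    // Single expression as the body, parentheses dropped: the usual form.
    public static readonly Predicate<Person> Shortest =
        p => p.Name == "Joe";

    public static void Main()
    {
        var persons = new List<Person>
        {
            new Person { Name = "Joe" },
            new Person { Name = "Ann" },
        };

        // All four delegates select the same elements.
        Console.WriteLine(persons.FindAll(AnonymousMethod).Count); // 1
        Console.WriteLine(persons.FindAll(Shortest).Count);        // 1
    }
}
```

Each step only removes something the compiler can work out by itself, so the behavior stays the same throughout.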
Captured variables are handled the same way as in anonymous methods. You have to be careful about closures here, just like we discussed in the post about anonymous methods.
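As a quick reminder of what "being careful" means, here is a minimal sketch of the classic pitfall with a for loop. The class and method names are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureDemo
{
    // Every lambda captures the one and only loop variable i.
    public static List<Func<int>> CaptureLoopVariable()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            funcs.Add(() => i);
        }
        return funcs;
    }

    // Each iteration declares a fresh variable, so each lambda captures its own.
    public static List<Func<int>> CaptureCopy()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            funcs.Add(() => copy);
        }
        return funcs;
    }

    public static void Main()
    {
        foreach (var f in CaptureLoopVariable()) Console.WriteLine(f()); // 3, 3, 3
        foreach (var f in CaptureCopy()) Console.WriteLine(f());         // 0, 1, 2
    }
}
```

The lambdas capture the variable itself, not its value at the time the lambda was created, which is why the first version sees the final value of i three times.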

Expression Trees

As their name suggests, expression trees are trees that have expressions as their nodes. Each node is an expression whose result feeds into its parent, so evaluating the children produces the operands of the expression above them. Expression trees and lambda expressions are at the heart of LINQ. It is important to know why we need expression trees in the first place.
LINQ is here to streamline the process of querying objects, databases, XML documents, etc. It provides one language and one set of operators that can be applied uniformly across all these inherently different data sources. This is the same role that SQL plays across databases and that virtual machines play in cross-platform development. Without it, each kind of data source needs its own custom queries. For example, the syntax used to query a database (SQL) is different from the query language used for XML (XQuery). To use the same syntax everywhere, one can define a common set of operations that is possible on every platform and then translate those operations for each data source. To do so, the program written with the common operations has to be analyzed and translated. This means that the code itself is the data input to the translator.
The concept of using code as data is key to creating such a mechanism, and expression trees do just that. They enable us to represent a piece of code as a tree data structure which can later be parsed and analyzed. In .NET 3.5, the classes in the System.Linq.Expressions namespace, rooted at the Expression class, provide this functionality. They allow C# code to be represented in a data structure which can then be analyzed in the same program or handed to another one (in-process vs. out-of-process). Expression trees in .NET 3.5 can only represent a limited set of operations. To gain complete control over dynamically generated code, one still has to use the CodeDOM (a library for creating language-independent, dynamically generated code in .NET).
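To make the "code as data" idea a bit more concrete, here is a minimal sketch of such a translator. It is a toy and not how any real LINQ provider works: the Employee class, the TinyTranslator name and the SQL-like output format are all assumptions of mine, and only a handful of node kinds are handled. Assigning the lambda to an Expression<...> variable is something the compiler does for us, as explained further down.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity used only for this sketch.
public class Employee
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class TinyTranslator
{
    // Turns a small subset of expression trees into a SQL-like string.
    public static string Translate(Expression e)
    {
        switch (e.NodeType)
        {
            case ExpressionType.Lambda:
                return Translate(((LambdaExpression)e).Body);
            case ExpressionType.AndAlso:
                return Binary((BinaryExpression)e, "AND");
            case ExpressionType.Equal:
                return Binary((BinaryExpression)e, "=");
            case ExpressionType.GreaterThan:
                return Binary((BinaryExpression)e, ">");
            case ExpressionType.MemberAccess:
                return ((MemberExpression)e).Member.Name;
            case ExpressionType.Constant:
                object value = ((ConstantExpression)e).Value;
                // No escaping of quotes here; a real translator would need it.
                return (value is string) ? "'" + value + "'" : Convert.ToString(value);
            default:
                throw new NotSupportedException(e.NodeType.ToString());
        }
    }

    private static string Binary(BinaryExpression b, string op)
    {
        return "(" + Translate(b.Left) + " " + op + " " + Translate(b.Right) + ")";
    }

    public static void Main()
    {
        Expression<Func<Employee, bool>> filter = p => p.Age > 18 && p.Name == "Joe";
        Console.WriteLine(Translate(filter)); // ((Age > 18) AND (Name = 'Joe'))
    }
}
```

The filter is never executed here. The translator only reads the structure of the tree, which is exactly what a query provider needs in order to hand the work over to a database.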

In the System.Linq.Expressions namespace, each class extending the abstract Expression base class has two main properties associated with it:
  • Type: This property is the actual .NET type of the expression, kind of like the return type of a function.
  • NodeType: The node type is selected from the ExpressionType enumeration defined in the same namespace. As we said before, all the different kinds of expressions derive from the abstract Expression base class. This is all fine and jolly, but what should we do with all the different kinds of expressions that share a common structure? We definitely don't want to end up with a huge inheritance chain. Here the design decision is to extend the hierarchy not in depth, but in breadth. Expressions of the same general shape, such as the binary ones, are grouped under one class (BinaryExpression), and the individual operations are told apart by their node type. All the node types supported by the BinaryExpression class are members of the ExpressionType enumeration.
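The two properties can be inspected on a small hand-built tree. This is just a sketch; the NodeInfoDemo class is made up for the illustration.

```csharp
using System;
using System.Linq.Expressions;

public static class NodeInfoDemo
{
    // Builds the tree for (1 + 2) * 3 with the factory methods.
    public static BinaryExpression Build()
    {
        BinaryExpression sum = Expression.Add(Expression.Constant(1), Expression.Constant(2));
        return Expression.Multiply(sum, Expression.Constant(3));
    }

    public static void Main()
    {
        BinaryExpression product = Build();

        // Both operations are BinaryExpression instances; NodeType tells them apart.
        Console.WriteLine(product.NodeType);       // Multiply
        Console.WriteLine(product.Left.NodeType);  // Add
        Console.WriteLine(product.Right.NodeType); // Constant

        // Type is the .NET type the expression evaluates to.
        Console.WriteLine(product.Type);           // System.Int32
    }
}
```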
There is really no easy way to explain how expression trees work other than with an example. So here is an expression tree that writes a line to the console.

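    using System;
    using System.Linq.Expressions;
    using System.Reflection;

    // Look up Console.WriteLine(string) by name. Type.GetType finds System.Console
    // on the .NET Framework; on runtimes where Console lives in a separate assembly
    // it may return null, in which case typeof(Console) can be used instead.
    Type console = Type.GetType("System.Console");
    MethodInfo method = console.GetMethod("WriteLine", new[] { typeof(String) });

    // The single parameter of the lambda we are building.
    ParameterExpression lambdaParam = Expression.Parameter(typeof(String));
    // The target is null since WriteLine is a static method.
    Expression methodCall = Expression.Call(null, method, new[] { lambdaParam });

    // Wrap the call in a typed lambda, compile it to a delegate and invoke it.
    var rootExpression = Expression.Lambda<Action<string>>(methodCall, new[] { lambdaParam });
    var compiledMethod = rootExpression.Compile();
    compiledMethod("This was generated using an expression tree");

    Console.ReadKey(true);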

I realize that the above code can be hard to grasp at first, but bear with me a little longer and soon you'll be able to come back to it and understand it completely. In the above example, you can see how the Expression class provides factory methods for creating expressions. Here we needed to create a single statement that outputs a string to standard output. In .NET terms, this means calling the "WriteLine" method of the Console type.
In order to translate this into the expression tree world, you first have to dissect the statement, see what the elements of the code are, and find out how they are represented in the System.Linq.Expressions namespace. Chances are that they may not be represented at all! After all, the namespace is not here to replace the CodeDOM, at least not yet. Anyway, you need to specify a method call, the method used in that call, the target of the call (the instance object) and the parameters.
How do we provide information about a type or method? Through the System.Reflection namespace, of course. Discussing what reflection is and how it is used is outside the scope of this article. In short, we can specify the fully qualified name of a type, or the name of a method, as a string and have the .NET Framework search for it at run-time. If you've never heard of reflection before, this may feel very messy and counter-intuitive from an object-oriented point of view, but it adds a lot of flexibility and it's useful.
Getting back to the issue at hand, we use reflection to obtain information on the type and its method, both of which we identify with a string. What's left is creating an expression for each part of the statement, and then the library has all there is to know about it! In the end we wrap everything in a lambda expression node of type Expression<TDelegate>, which can then be compiled.
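For contrast with the static WriteLine call, here is a sketch of the same steps for an instance method, where the target is an actual expression instead of null. The InstanceCallDemo class and the choice of String.ToUpper are mine, picked only for illustration.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

public static class InstanceCallDemo
{
    // Builds the equivalent of: s => s.ToUpper()
    public static Func<string, string> BuildToUpper()
    {
        // Look the method up by name; Type.EmptyTypes selects the parameterless overload.
        MethodInfo toUpper = typeof(string).GetMethod("ToUpper", Type.EmptyTypes);

        ParameterExpression s = Expression.Parameter(typeof(string), "s");

        // ToUpper is an instance method, so the target is the parameter rather than null.
        MethodCallExpression call = Expression.Call(s, toUpper);

        return Expression.Lambda<Func<string, string>>(call, s).Compile();
    }

    public static void Main()
    {
        Func<string, string> upper = BuildToUpper();
        Console.WriteLine(upper("hello")); // HELLO
    }
}
```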
Now things get a little confusing because of the naming used in the inheritance chain. So far we know that the different kinds of expressions inherit from a base class called Expression: classes like BinaryExpression, ConditionalExpression, LambdaExpression and so forth. There is also a special generic class that extends LambdaExpression, called Expression<TDelegate>. This class represents a strongly typed lambda expression and has a Compile() method which creates an actual delegate whose type is given by the TDelegate type parameter.
In the example, the type of the delegate is Action<String>. The .NET library provides some ready-made delegate types. They are quite handy: because they are generic, they have all but abolished the need to declare delegate types of your own. Here we needed a delegate type that accepts a string as a parameter and doesn't return anything. Action and its generic brothers (Action<T>, Action<T1, T2>, Action<T1, T2, T3>, etc.) provide the types for void methods that receive 0 to 16 parameters (in .NET 4.0). What if the method has a return type? There are 17 generic delegates declared for that too (Func<TResult> and its variants taking up to 16 parameters). Anyway, we use Action<String> as the type argument of Expression.Lambda and call the Compile method on the result. This method returns an actual delegate that can now be called like any other! We have just dynamically created a delegate at run-time!
That looked like a lot of work for a rather simple task, and indeed it is! The good news is that the C# compiler can automatically convert lambda expressions, in their simplest form, to expression trees. You just have to assign a lambda expression to a variable of type Expression<TDelegate> and the compiler does the rest. This can be seen in the example below:
    
    Expression<Action<String>> expression = (p) => Console.WriteLine(p);
    expression.Compile()("Hello");
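The same pattern works for delegates with a return value, using Func instead of Action. The sketch below builds the same addition twice, once by hand and once through the compiler conversion, so the two routes can be compared. The FuncDemo class and its method names are hypothetical.

```csharp
using System;
using System.Linq.Expressions;

public static class FuncDemo
{
    // Built by hand with the factory methods: (a, b) => a + b
    public static Expression<Func<int, int, int>> BuildManually()
    {
        ParameterExpression a = Expression.Parameter(typeof(int), "a");
        ParameterExpression b = Expression.Parameter(typeof(int), "b");
        return Expression.Lambda<Func<int, int, int>>(Expression.Add(a, b), a, b);
    }

    // Built by the compiler from an ordinary lambda expression.
    public static Expression<Func<int, int, int>> BuildFromLambda()
    {
        return (a, b) => a + b;
    }

    public static void Main()
    {
        Console.WriteLine(BuildManually().Compile()(2, 3));   // 5
        Console.WriteLine(BuildFromLambda().Compile()(2, 3)); // 5

        // The compiler-generated tree has the same shape as the manual one.
        Console.WriteLine(BuildFromLambda().Body.NodeType);   // Add
    }
}
```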

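More complex lambda expressions, those with a statement body containing loops, condition blocks or even a single return statement, cannot be converted to expression trees this way. In .NET 3.5 the expression tree classes could not represent such code at all. .NET 4.0 added node types for blocks, loops, assignments and so on, since they were needed by the DLR, but the C# compiler still refuses to convert statement lambdas, so trees like that have to be built by hand with the factory methods.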
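Here is a sketch of what such a tree looks like with the .NET 4.0 factory methods (Block, Loop, Assign and friends): a factorial computed with a loop. It closely follows the usual documentation-style sample, and the LoopDemo wrapper is my own naming.

```csharp
using System;
using System.Linq.Expressions;

public static class LoopDemo
{
    // Builds a delegate roughly equivalent to:
    //   int result = 1;
    //   while (value > 1) { result *= value--; }
    //   return result;
    public static Func<int, int> BuildFactorial()
    {
        ParameterExpression value = Expression.Parameter(typeof(int), "value");
        ParameterExpression result = Expression.Variable(typeof(int), "result");

        // Label that the loop jumps to when it is done; it carries the int result.
        LabelTarget breakLabel = Expression.Label(typeof(int));

        BlockExpression body = Expression.Block(
            new[] { result },                                  // local variables
            Expression.Assign(result, Expression.Constant(1)),
            Expression.Loop(
                Expression.IfThenElse(
                    Expression.GreaterThan(value, Expression.Constant(1)),
                    Expression.MultiplyAssign(result, Expression.PostDecrementAssign(value)),
                    Expression.Break(breakLabel, result)),
                breakLabel));

        return Expression.Lambda<Func<int, int>>(body, value).Compile();
    }

    public static void Main()
    {
        Console.WriteLine(BuildFactorial()(5)); // 120
    }
}
```

Writing the same thing as a statement lambda and assigning it to an Expression<Func<int, int>> variable is rejected by the compiler, which is why the tree is assembled manually here.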
The rest of the chapter in the book deals with the new rules for overload resolution and type inference. In C# 2.0, each argument was considered independently and not much inference was going on. In C# 3.0, with the introduction of implicitly typed variables, lambda expressions and LINQ, type inference is going on basically everywhere. The rules are complicated, but the gist is that in C# 3.0 type inference and overload resolution are a collaborative effort among the different arguments, meaning that each argument can now add some information to the inference process. This is needed to resolve some of the method group conversions and lambda expression types. I won't delve into the details. If you are interested, you can check out the language specification, which contains the rules, or read the chapter in the book, which goes into more detail.
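A small sketch of that collaborative inference is below. The ConvertValue method is a made-up helper, not something from the framework.

```csharp
using System;

public static class InferenceDemo
{
    // TInput can be fixed from the first argument, but TOutput is only
    // known once the lambda's return type has been worked out.
    public static TOutput ConvertValue<TInput, TOutput>(TInput input, Func<TInput, TOutput> converter)
    {
        return converter(input);
    }

    public static void Main()
    {
        // No type arguments are given. TInput = string comes from "hello",
        // which lets the compiler type x, and then TOutput = int follows from x.Length.
        int length = ConvertValue("hello", x => x.Length);
        Console.WriteLine(length); // 5
    }
}
```

Neither argument is enough by itself here: the lambda cannot be typed without the first argument, and the first argument says nothing about TOutput.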
Next up are extension methods and then we have basically covered the language side of LINQ.
