Tuesday, 30 October 2012

Generics (The Basics)

I talked about generics in this blog's first post and how I think they are the biggest contribution of C# 2. In this post I'll try to summarize what I learned from Chapter 3. The chapter is quite long (50 pages) and it gets kind of complicated at some points. Even so, it provides the reader with a good explanation of how certain mechanisms of generics work in the framework.
We saw how C# 1's collection namespace lacked the strong typing needed for better compile-time type checking and, in a lot of scenarios, created performance issues through the excessive use of explicit and implicit (foreach) casts.
We are going to continue on the same front. Generics in C# can be thought of as coming in two forms:

  • Generic Types: These are basically wrapper classes around a group of elements (classes, delegates, interfaces and structures). Depending on the generic wrapper used, these elements may be held in an actual array (List<T>), a doubly linked list (LinkedList<T>), a hashtable (Dictionary<TKey, TValue>) or a red-black tree (SortedDictionary<TKey, TValue>).
  • Generic Methods: These are methods that introduce type parameters of their own. These are more complicated to understand; we'll get back to them after we cover generic types.
As you saw above, generics are usually written as a name followed by angle brackets ("<>") with some comma-delimited words in between. The words within the angle brackets are called type parameters, which are placeholders for any type that the generic can accept. When we work with a certain generic type we have to declare the real types that the generic type will be working with. These real types are called type arguments. This is the same idea as parameters/arguments in methods. As an example, if we want an actual dictionary of strings we have to declare our Dictionary<TKey, TValue> type with type arguments of string. The type parameters are then replaced by the type arguments in the declaration:

    Dictionary<string, string> dic;

A generic type whose type parameters are still unfilled is called an unbound generic type, since its type arguments are not yet known (Dictionary<TKey, TValue>). When a type's type arguments are specified it becomes a constructed type. Although the terminology is kind of confusing at first, we'll see why it is needed to understand some concepts as we move towards advanced generics. The dic variable in the above snippet, for example, is an instance of the constructed type Dictionary<string, string>, which is a dictionary of strings to strings.
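A quick way to see the unbound/constructed distinction in action is through reflection (this snippet is mine, not from the book):

```csharp
using System;
using System.Collections.Generic;

class TypeDemo
{
    static void Main()
    {
        // The unbound generic type: type parameters not supplied.
        Type unbound = typeof(Dictionary<,>);
        // A constructed type: type arguments supplied.
        Type constructed = typeof(Dictionary<string, string>);

        Console.WriteLine(unbound.IsGenericTypeDefinition);     // True
        Console.WriteLine(constructed.IsGenericTypeDefinition); // False

        // A constructed type can even be built at runtime from the unbound one:
        Console.WriteLine(unbound.MakeGenericType(typeof(string), typeof(string)) == constructed); // True
    }
}
```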
Generic methods are methods that introduce new type parameters. Notice that methods that merely accept or return a constructed generic type are not generic methods; for example, the method below is not generic:

public List<string> GetStrings(string[] stringArray);

The best way to understand generic methods is to see examples of them. The ConvertAll method, found both on the Array class (as a static method) and on List<T>, is a good example. The List<T> version has the following signature:

public List<TOutput> ConvertAll<TOutput> (Converter<T, TOutput> converter);

This declaration states that the return value of ConvertAll is a List of a certain type. Notice that the type is unspecified, since TOutput is a type parameter, not a real type or a type argument. The method also takes a generic delegate, called converter, as input. This delegate receives an element of the list's own element type T and returns a TOutput. The TOutput returned by the delegate is the same type parameter declared on the method itself.
Whenever the declaration of a generic type or a generic method gets too complicated to understand abstractly, it is easier to replace the type parameters with some concrete type arguments. Try this technique with the above generic method and things will start to make sense.
To see this method in action, the snippet below uses it to convert the elements of a list from strings to integers:
 
// Sums the character codes of s - a crude string-to-int conversion.
public static int convert(string s)
{
    int num = 0;
    for (int i = 0; i < s.Length; i++)
        num += s[i];
    return num;
}
...
List<string> strings = new List<string>();
strings.Add("Hello");
strings.Add("This is a test");
strings.Add("Hey hey hey");
List<int> integers = strings.ConvertAll<int>(convert);
for (int i = 0; i < integers.Count; i++)
    Console.WriteLine(integers[i]);
Console.ReadKey();
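A small aside (my own example, not from the book): the compiler can usually infer the type argument of a generic method from its arguments, so from C# 3.0 onwards the explicit <int> can be dropped:

```csharp
using System;
using System.Collections.Generic;

class InferenceDemo
{
    static void Main()
    {
        List<string> strings = new List<string> { "1", "2", "3" };

        // Explicit type argument:
        List<int> a = strings.ConvertAll<int>(int.Parse);
        // From C# 3.0, TOutput is inferred from the method group:
        List<int> b = strings.ConvertAll(int.Parse);

        Console.WriteLine(a[2] + b[2]); // 6
    }
}
```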
 
Unless you have prior experience with generic methods, this may still seem confusing. But don't worry, it is just going to get even more confusing! I'm joking of course...or am I? Anyway, the best thing to do now is to look at some of the generic types available throughout the library. I will continue this series of posts on generics with type constraints, which let you restrict the types that a generic can wrap; then we'll move to creating our own generic types, then to advanced generics, and finally we'll end with a comparison between generics in .Net and their counterparts in C++ and Java.

Until then, happy coding.

Wednesday, 24 October 2012

Struct vs Class

Struct vs Class(improving the definition in the previous post)

There is a very important and commonly unknown difference between these two in .Net. The confusion sometimes stems from another source: not knowing what value types and reference types actually are. Given that we have already gone through that, I will continue to shed more light on this matter.
The most correct way of differentiating between a struct and a class is that structs are value types and classes are reference types. Another very well known answer to this question is that structs are allocated on the stack whereas reference types are allocated on the heap. This is inaccurate, as we showed in the previous section. Value types may at times be allocated on the heap; even local variables end up on the heap when they are captured by an anonymous method. As Eric Lippert puts it in his article on this, placing value types on the stack is just an implementation detail. We should not be thinking about what the CLR decides to do with the value type or the reference type; that shouldn't guide our decision. The real decision is in the semantics: is the type a value or a reference?
An article on MSDN elucidates this even further. There are some properties that a type should have in order to be a struct; if it doesn't have these properties, just define a class. The properties are as follows:
  • The type represents a single logical value like other primitive values.
  • It is immutable. 
  • It needs less than 16 bytes of memory.
  • It is not boxed frequently.
Now let's see why each of these conditions has to apply. Firstly, the type should represent a single logical value, like the primitive types do. This is what we mean by it being semantically a value type. If the type logically represents a single primitive value, it should be considered for being a value type.
Also, the type should be immutable. There is a very well known class that is immutable and that we work with extensively: yes, string. Ever wondered why strings are immutable? After all, they're a reference type, not a value type. The answer involves the same reasons we want our value types to be immutable. So why are strings classes at all, given that, like a struct, they represent a single logical value? There is another side to this.
Strings can easily get over 16 bytes. They are passed around quite a lot, they are compared very often (they are often keys in dictionaries), they are usually repeated, and operations like copy, substring, etc. are used often. All of these properties bring lots of headaches if strings are mutable. Being passed around creates all sorts of problems for threading, since mutable objects have to be locked to avoid race conditions. Being compared often means we would have to walk through both strings to find out if they are the same, whereas with immutability we can use interning to make all strings like "hello" point to one memory location and then just compare the references. This also helps with repeated strings, as all of them become references to the same immutable location. Copying also becomes trivial (just return "this"), and a substring operation can perhaps be represented as "this" plus a start and end position. Put simply, what I want to impress upon you is the amount of optimization that is made possible when strings are immutable. This is why a lot of languages like C#, Java and Python have opted for this design.
Okay, so after this rather lengthy digression, let's get back to the task at hand. We were analyzing the reasons a type may be defined as a struct, and we have come all the way to them being less than 16 bytes. I would say that the reason for this is simply that stacks are not that big! On Windows, for example, each thread is given a 1 MB region as its stack, and this space is shared with lots of other elements. So it makes sense to use it sparingly.
Last but not least is the boxing issue. This is the most obvious property, as the boxing/unboxing operation is expensive and has to be avoided if it would happen in abundance.
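To tie the four properties together, here is a sketch (mine, not from the MSDN article) of a type that satisfies all of them: a single logical value, immutable, only 4 bytes, and rarely boxed:

```csharp
// An amount of money in cents: one logical value, immutable, small.
public struct Money
{
    private readonly int cents;

    public Money(int cents) { this.cents = cents; }

    public int Cents { get { return cents; } }

    // "Mutation" returns a new value instead of changing this one.
    public Money Add(Money other) { return new Money(cents + other.cents); }
}
```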

Value Types vs Reference Types and Parameter Passing


Value types/Reference types and Parameter Passing

To put it simply, here is what's happening in C#: unless you declare otherwise explicitly, everything is passed by value. Firstly, it is important to differentiate between value types/reference types on the one hand and variables being passed as out/ref on the other. When talking about value types, we mean that a variable of such a type holds the actual value that the type represents. For example, the statement "int a = 2;" creates a variable of type int, which is a value type, and then assigns the value 2 to it. Reference types, on the other hand, are types like arrays, interfaces, classes and delegates. Variables of these types do not actually contain the object but hold a reference to it. As an example, the statement "StringBuilder sb = new StringBuilder();" creates a StringBuilder object and then assigns its reference to the variable sb. In other words, sb does not hold a StringBuilder but a reference to where the object actually lives.
Another misconception about value types and reference types is the saying that "reference types live on the heap while value types live on the stack". This statement is inaccurate. A variable's position in managed memory depends on the context in which it is declared. All local variables (variables declared inside methods) are stored on the stack; even if the variable is of a reference type, the variable itself (the reference) is stored on the stack. Instance variables are located on the heap, where the containing object is. Value types like structs live wherever the variable used to declare them resides: if the value type is a local variable or a parameter it lives on the stack, and if it is an instance member it lives on the heap. Also, static variables always live on the heap.
Now that we know what reference types and value types are, we can continue with the different kinds of parameter passing. By default, all parameters in C# are passed by value. What does this mean for value types? It means that a new variable is created in the callee and the value is copied into it. Reference types are handled the same way: a new variable is created in the callee and the reference is copied into it. Although both variables now provide access to the same object, notice that they are independent from one another. For example, in the code below, what is the value of obj1 after the final line is executed?
   
public void Foo(StringBuilder obj)
{
    obj = new StringBuilder("Good Bye");
}
public void Bar()
{
    StringBuilder obj1 = new StringBuilder("Hello");
    Foo(obj1);
    Console.WriteLine(obj1);
}  

If you didn't get the answer "Hello", try again and see why the answer is not "Good Bye". The reason is that, as I said before, the value being passed to the method Foo is a copy of the reference to a StringBuilder object, and since the variable obj in Foo and the variable obj1 are two independent variables, the reassignment of obj is not noticed through obj1. If Foo had made changes to the actual object, however, those changes would have been seen in Bar, as shown below:
   
public void Foo(StringBuilder obj)
{
    obj.Append(" There");
}
public void Bar()
{
    StringBuilder obj1 = new StringBuilder("Hello");
    Foo(obj1);
    Console.WriteLine(obj1);
}  

The answer is "Hello There". If we had used a struct instead of the StringBuilder class, however, Bar would not have seen even these changes, since a copy of the struct's value (not a reference) would have been passed to Foo.
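To make that last point concrete, here is a small demo (my own) of a struct being passed by value:

```csharp
using System;

struct Point
{
    public int X;
}

class CopyDemo
{
    static void Move(Point p)
    {
        p.X = 42; // modifies the callee's copy, not the caller's struct
    }

    static void Main()
    {
        Point point = new Point();
        point.X = 1;
        Move(point);
        Console.WriteLine(point.X); // 1 - the caller never sees the change
    }
}
```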
Now, since we can already use the above mechanism to pass references by value and then manipulate the object directly, why would we want to use the ref modifier?
In order to use the ref modifier, one has to put it before the parameter both where the method is called and in the method definition. With this mechanism, instead of a new variable being allocated for the sake of the callee, the same variable (storage location) is used for both the variable being passed and the parameter. This means that essentially the same code as above now really does result in "Good Bye":
   
public void Foo(ref StringBuilder obj)
{
    obj = new StringBuilder("Good Bye");
}
public void Bar()
{
    StringBuilder obj1 = new StringBuilder("Hello");
    Foo(ref obj1);
    Console.WriteLine(obj1);
}  

The out modifier is the same as ref, with the difference that the variable being passed does not have to be initialized first, since it is assumed that it is going to be assigned in the callee. The callee is in turn required to assign it, and until it does so the variable is considered uninitialized in the callee's context.
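The classic example of out in the framework is int.TryParse:

```csharp
using System;

class OutDemo
{
    static void Main()
    {
        int result; // no need to initialize: out guarantees assignment

        if (int.TryParse("123", out result))
            Console.WriteLine(result);       // 123

        if (!int.TryParse("abc", out result))
            Console.WriteLine("not a number");
    }
}
```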

Type System in C#


Type system in C#

In order to understand what the type system in C# actually is, we first need to define what we mean by static/dynamic typing, implicit/explicit typing and safe/unsafe typing.
C# 1.0 is a statically typed language, meaning that the type of every variable is known at compile time. There is no implicit typing either, meaning the compiler doesn't have to derive the type of an expression from the code at compile time, since the type is given by the programmer. Actually, C# is completely statically typed up to C# 4.0, where some dynamic typing was added; this will be discussed in later posts. Starting from C# 3.0, some implicit typing was added to the language as well, due to LINQ (with the keyword var). In these scenarios, although the type is not explicitly written by the programmer, the compiler still infers it at compile time and we still have compile-time type checking.
Type safety is a very interesting issue. In C++, for example, we have all sorts of freedom when dealing with types. We can cast any type to another and the C++ compiler will not complain. For example, we can cast a char* to an int* and then, with a dereference operator, get a number out of the first four bytes of the string! (with 32-bit integers and a little-endian architecture). This gives us all sorts of flexibility in our code, but it also adds quite a lot of rope to hang ourselves with. In C#, this is no longer the case, since C# is type safe: although you can convert between types using casts, only compatible conversions are allowed.
If we define strongly typed languages as those that don't allow any type conversions, then C# is definitely not strongly typed. Allowing implicit type conversions creates certain complex scenarios that require runtime type checking. A very interesting example of this is the result of array covariance. Arrays are reference types, and an implicit conversion from one array type to another is allowed as long as the element types are convertible. This does not mean, however, that the language doesn't check for proper covariance:

   
    string[] strArray = new string[10];
    object[] objArray = strArray;
    objArray[0] = new Button();

In the above example we have defined an array of type string and then assigned it by reference to an object array. This requires an implicit conversion and a type check at compile time: the compiler checks whether the element type of strArray is compatible with the element type of objArray. If that is the case the conversion is allowed (covariance); otherwise a compile error is raised. Next, we try to assign a Button to an element of the object array. This is perfectly legal as far as the compiler is concerned, since an object array can hold elements of type object; the compiler cannot know what array objArray actually references until runtime. At runtime, however, this line fails. The reason is that both objArray and strArray reference the same array of strings. Although objArray is allowed to reference that array, it cannot change the type of the array, which is fixed as string. Consequently the runtime, knowing the real type of the array, disallows storing anything other than strings in it.
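The runtime error in question is an ArrayTypeMismatchException, which we can observe directly (using a plain object instead of a Button so the snippet needs no UI reference):

```csharp
using System;

class CovarianceDemo
{
    static void Main()
    {
        string[] strArray = new string[10];
        object[] objArray = strArray;   // legal: array covariance

        try
        {
            objArray[0] = new object(); // compiles, but the runtime objects
        }
        catch (ArrayTypeMismatchException)
        {
            Console.WriteLine("runtime type check failed");
        }
    }
}
```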

Delegate Type vs Delegate Instance

In the previous post I talked about the evolution of C# as a language. In this post I'm going to continue the same trend; the focus, however, is on the elements of C# that are usually misunderstood. Jon has had a lot of experience on Stack Overflow and, due to this, has seen much confusion in the community regarding the concepts covered here. I will summarize the key concepts and put each section into a new post; I've noticed that the posts are getting too long for any interested reader!

Delegates

Delegates can be defined as entities encapsulating a behavior with certain parameters and a return type. The forms a delegate's definition can take have already been covered in my previous post, but there seems to be confusion around the word delegate, since it is used both for the "delegate type" and the "delegate instance". The code below shows the difference between the two:

   
    delegate int DelegateType(int a, int b);

    public int Add(int a, int b)
    {
        return a + b;
    }
...
    DelegateType delegateInstance = Add;
    Console.WriteLine(delegateInstance(3, 4));

As seen in the above code snippet, in order to create a delegate, a delegate type first has to be declared. With the first line of code we are actually creating a new reference type called DelegateType. This type can then be instantiated, passed to functions and basically used wherever a reference type can be used. Also remember that delegates, like strings, are immutable and thread safe. In the later lines we create an instance of the delegate type and then invoke it. We could have used the Invoke() method of the delegate instance, but a C# shortcut is to just call it like a normal method. Each delegate has an invocation list; when the delegate is invoked, all the methods in the invocation list are called. The return values of all of them except the last are simply thrown away, as the return value of the delegate is the return value of the last element in its invocation list.
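The invocation list is easiest to see with a multicast delegate (my own snippet):

```csharp
using System;

class MulticastDemo
{
    delegate int Calc(int x);

    static int Double(int x) { Console.WriteLine("Double"); return x * 2; }
    static int Square(int x) { Console.WriteLine("Square"); return x * x; }

    static void Main()
    {
        Calc calc = Double;
        calc += Square;            // invocation list is now { Double, Square }
        int result = calc(3);      // both run; Double's return value is discarded
        Console.WriteLine(result); // 9 - the last method's return value wins
    }
}
```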
In C# 1.0 a delegate instance accepts only methods with exactly the signature defined by the delegate type. Say we have a parent class Human and a sub-class Man, and a delegate type with the signature "void Run(Man)"; one would expect to be able to add a method with the signature void Run(Human) to the invocation list. This, however, cannot be done in C# 1.0. Accepting a broader parameter type like this is known as parameter contravariance. Another interesting scenario is being able to add a method whose return type derives from the return type in the delegate's signature, also known as return type covariance. These two scenarios are made possible for delegates in C# 2.0. The same concept is still not available when implementing methods of an interface, even in C# 4.0.
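Here is a sketch of both scenarios as they work from C# 2.0 onwards (the class and delegate names are mine):

```csharp
using System;

class Human { }
class Man : Human { }

class VarianceDemo
{
    delegate void RunHandler(Man m); // declares a Man parameter
    delegate Human Factory();        // declares a Human return type

    // Broader parameter type than the delegate declares: contravariance.
    static void Run(Human h) { Console.WriteLine("running"); }

    // Narrower return type than the delegate declares: covariance.
    static Man MakeMan() { return new Man(); }

    static void Main()
    {
        RunHandler handler = Run;   // legal since C# 2.0
        Factory factory = MakeMan;  // legal since C# 2.0
        handler(new Man());
        Console.WriteLine(factory() is Man); // True
    }
}
```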
Another misconception is thinking of "events" as delegates. Although events in the .Net framework rely heavily on delegates to function, they are not delegates themselves; the way the framework is implemented, however, makes them look like delegates. Just like properties, which look like fields to the world outside the scope of the object but actually have getters and setters on the inside, events are backed by a field of the delegate type inside their class. To the outside they look like fields too, but they are actually add/remove method pairs that add or remove methods from the invocation list of the backing delegate field.

Friday, 19 October 2012

Chapter 1 - C#'s Evolution(1.0 to 4.0)

Starting Simple

I really like the way Jon has started the book. This book is mostly for those who already have some experience with C# and want to know how things work under the hood. He starts with the kind of simple example program often used in teaching, perhaps for e-commerce site design; it contains a Product class. I really don't know if I would have permission to post code from the book here, so I guess it's even better practice if I come up with my own code. I will also add material I know here and there, and I have tried to point out some design patterns used in the .Net framework as we get to each subject.
What Jon has tried to convey in the first chapter is how C# has evolved as a language from its initial humble beginning (version 1.0) up to version 4.0. This chapter's aim is merely to impress the reader rather than educate: to show how the language's evolution has benefited the programmer.
Okay so let's get right to it. Let's say we have a class "Letter" as follows:

public class Letter
{
    public string letterNo;
    public DateTime letterDate;

    public DateTime LetterDate
    {
        get { return letterDate; }
    }

    public string LetterNo
    {
        get { return letterNo; }
    }

    public Letter(string ltrNo, DateTime ltrDate)
    {
        this.letterNo = ltrNo;
        this.letterDate = ltrDate;
    }

    public static ArrayList GetSampleLetters()
    {
        ArrayList list = new ArrayList();
        list.Add(new Letter("1", DateTime.MinValue));
        list.Add(new Letter("2", DateTime.MaxValue));
        list.Add(new Letter("3", DateTime.Now));
        return list;
    }

    public override string ToString()
    {
        return string.Format("Letter {0}({1})", letterNo, letterDate);
    }
}
This class encapsulates the functionality of a letter. There are some limitations here though:
  • If we want to have a matching setter for our properties, it has to be public.
  • The ArrayList class has no compile-time information about the object it contains.
  • We have gone through a lot of code just to encapsulate two members, namely: letterNo and letterDate.
In C# 2.0 we can improve this a little. For example with C# 2.0 the concept of private setters is introduced and also one of the biggest improvements to C# in my opinion: Generics. So we can now change the program like so:
public class Letter
{
    public string letterNo;
    public DateTime letterDate;

    public DateTime LetterDate
    {
        get { return letterDate; }
        private set { letterDate = value; }
    }

    public string LetterNo
    {
        get { return letterNo; }
        private set { letterNo = value; }
    }

    public Letter(string ltrNo, DateTime ltrDate)
    {
        LetterNo = ltrNo;
        LetterDate = ltrDate;
    }

    public static List<Letter> GetSampleLetters()
    {
        List<Letter> list = new List<Letter>();
        list.Add(new Letter("1", DateTime.MinValue));
        list.Add(new Letter("2", DateTime.MaxValue));
        list.Add(new Letter("3", DateTime.Now));
        return list;
    }

    public override string ToString()
    {
        return string.Format("Letter {0}({1})", letterNo, letterDate);
    }
}
As seen above, now we can use generics and private setters in our code. Generics help greatly here, since we now get compile errors if an object of the wrong type is added to the list. Also, after fetching an element from the list we don't have to cast it anymore, since the elements are statically typed. We still have a problem we have not addressed: the abundance of code needed to encapsulate the class members. C# 3.0 comes to our help with automatic properties.
public class Letter
{
    public DateTime LetterDate {get; private set;}

    public string LetterNo { get; private set; }
   
    public Letter() {}

    public static List<Letter> GetSampleLetters()
    {
        return new List<Letter>() 
        {
            new Letter {LetterNo = "1", LetterDate = DateTime.MinValue},
            new Letter {LetterNo = "2", LetterDate = DateTime.MaxValue},
            new Letter {LetterNo = "3", LetterDate = DateTime.Now}
        };
    }

    public override string ToString()
    {
        return string.Format("Letter {0}({1})", LetterNo, LetterDate);
    }
}
Now this is more like it: much less code with more functionality. Notice that we don't have the backing fields anymore, which forces us to use the properties everywhere, adding to consistency. The list initialization is also different here. There is another subtle problem, though: although we have given a property a private setter and effectively made it read-only for code outside the class, we have not really made it read-only on the inside. Maybe it would be better to make it explicitly read-only with the "readonly" keyword. Note that readonly applies to fields, not to automatically implemented properties (truly read-only automatic properties still don't exist as of C# 4.0), so we go back to explicit backing fields and construct the letters instead of using object initializers:
public class Letter
{
    private readonly string letterNo;
    private readonly DateTime letterDate;

    public DateTime LetterDate { get { return letterDate; } }

    public string LetterNo { get { return letterNo; } }

    public Letter(string ltrNo, DateTime ltrDate)
    {
        letterNo = ltrNo;
        letterDate = ltrDate;
    }

    public static List<Letter> GetSampleLetters()
    {
        return new List<Letter>() 
        {
            new Letter("1", DateTime.MinValue),
            new Letter("2", DateTime.MaxValue),
            new Letter("3", DateTime.Now)
        };
    }

    public override string ToString()
    {
        return string.Format("Letter {0}({1})", LetterNo, LetterDate);
    }
}

Sorting and Filtering

In the previous section we looked at the evolution of C# from an encapsulation point of view, but let's try another angle. If we want to sort our letters by the "LetterDate" property in C# 1.0, we have to add another type that implements the interface "IComparer" and then implement its Compare(object, object) method. Does this remind us of a known design pattern, BTW? Yes, this is the "Strategy Pattern" (for more design patterns in .Net you can take a look at this article on MSDN). With this pattern we can support more than one strategy for sorting the ArrayList. Say we sometimes want to sort by the letter date and other times by the letter number: the solution is to add two types which implement the IComparer interface and implement the Compare method differently, as we see below:


public class LetterComparer : IComparer
{
    public int Compare(object x, object y)
    {
        Letter first = (Letter)x;
        Letter second = (Letter)y;
        return first.LetterDate.CompareTo(second.LetterDate);
    }
}
...
    ArrayList letters = Letter.GetSampleLetters();
    letters.Sort(new LetterComparer());
    foreach(Letter letter in letters)
        Console.WriteLine(letter.ToString());

There are, however, a couple of things we don't like about this setup. Firstly, we have to add an extra type for each new strategy that we have. Secondly, we see a lot of casts, some explicit as in the Compare method and some implicit as in the foreach for the printout. Not only is that a performance issue, it also presumes that we are only ever passing Letters to the method. You may say that we can check each call to the method, but isn't that another overhead? Fortunately, C# 2.0 comes to our rescue:

public class LetterComparer : IComparer<Letter>
{

    public int Compare(Letter x, Letter y)
    {
        return x.LetterDate.CompareTo(y.LetterDate);
    }
}
...
    List<Letter> letters = Letter.GetSampleLetters();
    letters.Sort(new LetterComparer());
    foreach(Letter letter in letters)
        Console.WriteLine(letter.ToString());

So we have fixed the ArrayList problem, but we are still creating a new type for the simple task of comparing a member. C# 2.0 solves this problem with "anonymous methods":

    List<Letter> letters = Letter.GetSampleLetters();
    letters.Sort(delegate(Letter x, Letter y) { return x.LetterDate.CompareTo(y.LetterDate); });
    foreach(Letter letter in letters)
        Console.WriteLine(letter.ToString());

Wow, now that's a lot of change. Notice, though, that this is not always necessarily the right thing to do: if the class requires a more complex approach to object comparison, we would definitely stick with the Strategy Pattern.
Here we have made the code quite compact. But can C# 3.0 do better? Of course it can... with the use of lambda expressions:

    List<Letter> letters = Letter.GetSampleLetters();
    letters.Sort((x,y)=> x.LetterDate.CompareTo(y.LetterDate));
    foreach(Letter letter in letters)
        Console.WriteLine(letter.ToString());


The "(x,y) => x.LetterDate.CompareTo(y.LetterDate)" part is a "lambda expression". This still creates a delegate, but this time we haven't even specified the types of the delegate's parameters. As it turns out, C# 3.0 can make this even easier by using "extension methods":

    List<Letter> letters = Letter.GetSampleLetters();
    foreach(Letter letter in letters.OrderBy(p => p.LetterDate))
        Console.WriteLine(letter.ToString());

In the above snippet we are calling a method on the letters object which does not actually exist among the List<T> members. We'll talk about extension methods later. Note that this sorting operation is not even in-place: we are sorting the elements and then printing them out, while the List itself remains unsorted.

Querying Collections

If we want to query a collection and find all letters with the letterNo property higher than a certain value, this is how we would have done it in C# 1.0 (note that letterNo is a string, so it has to be parsed first):


    ArrayList letters = Letter.GetSampleLetters();
    foreach(Letter letter in letters)
        if(int.Parse(letter.LetterNo) > 20)
            Console.WriteLine(letter.ToString());

Here is how we can do the same task with C# 2.0:
    
    List<Letter> letters = Letter.GetSampleLetters();
    Predicate<Letter> test = delegate(Letter l) { return int.Parse(l.LetterNo) > 20; };
    List<Letter> matches = letters.FindAll(test);
    Action<Letter> print = Console.WriteLine;
    matches.ForEach(print);

This is by no means less complicated than what we did in C# 1.0; actually, the C# 1.0 code is much more readable, even though what we have written in C# 2.0 is almost English! What we have to appreciate here is not the number of lines of code but the power we have over the operations: now we are able to hold the action and the predicate in variables. This immediately triggers a possible use of the "Template Method Pattern" in my head.
This pattern is really close to the Strategy Pattern, with the difference that in the Strategy Pattern the algorithms are usually radically different from one another, whereas in the Template Method Pattern the sub-classes all share the structure of the parent algorithm, with some steps overridden in each child. The Template Method Pattern is also known as the "fill in the blanks" pattern. In this case, for example, sub-classes could each override methods like "Predicate<Letter> GetPredicate()" and "Action<Letter> GetAction()" and effectively change the predicate and the action applied to the collection.
The above code can of course be written in a single line, thanks to the fluent, chainable style of these methods:

    
    Letter.GetSampleLetters().FindAll(delegate(Letter l){ return l.LetterNo > 20; }).ForEach(Console.WriteLine);
There is still some, as Jon puts it, "fluff" around the definition of the delegate. We can make the code shorter still by using a lambda expression:
    
    foreach(Letter letter in Letter.GetSampleLetters().Where(l => l.LetterNo > 20))
        Console.WriteLine(letter);

To recap, what we have done in C# 2.0 improves separation of concerns by decoupling the predicate from the action. With the help of C# 3.0 we were able to shorten the code further and make it more readable. C# 4.0 does not add anything in this area.

Handling missing data

In C# 1.0 there were three possible options for handling missing data. Let's say some letters in our Letter class are still waiting to be numbered and don't have a letterNo yet. Can we set an int field to null? No: int is a value type, and value types are not nullable. The three workarounds were to use a magic value for the field (such as int.MinValue), to wrap the value in a sentinel object, or to add a boolean field indicating whether the letterNo has been set. None of these is straightforward. C# 2.0 introduces the Nullable<T> structure, which, coupled with the syntactic sugar shown below, gives us the flexibility we need:
    
    private int? number;
    public int? Number
    {
        get { return number; }
        private set { number = value; }
    }
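A quick sketch of what Nullable<int> gives us over the old workarounds: a real "no value" state (HasValue) and, as a bonus, the C# 2.0 null-coalescing operator for fallbacks. This is a generic illustration rather than the book's own example.

```csharp
using System;

class Demo
{
    static void Main()
    {
        int? number = null;                   // shorthand for Nullable<int>

        // No magic value needed: the "missing" state is explicit.
        Console.WriteLine(number.HasValue);   // False

        number = 42;
        if (number.HasValue)
            Console.WriteLine(number.Value);  // 42

        // The null-coalescing operator supplies a fallback in one step.
        int? missing = null;
        int numberOrDefault = missing ?? -1;
        Console.WriteLine(numberOrDefault);   // -1
    }
}
```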

Optional Parameters and Default Values in C# 4.0

So far we have not talked much about C# 4.0. A new feature added in this version, which in my opinion is long overdue, is support for optional parameters with default values:

    
    // Optional parameters must come after all required parameters.
    public Letter(DateTime letterDate, int? letterNo = null)
    {
        this.letterNo = letterNo;
        this.letterDate = letterDate;
    }
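A small usage sketch, assuming the optional parameter is placed last (C# requires optional parameters to follow required ones). The Letter class here is my own minimal stand-in; it also shows C# 4.0 named arguments, which pair naturally with optional parameters:

```csharp
using System;

class Letter
{
    private int? letterNo;
    private DateTime letterDate;

    // Optional parameters must come after required ones.
    public Letter(DateTime letterDate, int? letterNo = null)
    {
        this.letterNo = letterNo;
        this.letterDate = letterDate;
    }

    public int? LetterNo { get { return letterNo; } }
}

class Demo
{
    static void Main()
    {
        // Omitting the optional argument: letterNo defaults to null.
        var unnumbered = new Letter(DateTime.Now);
        Console.WriteLine(unnumbered.LetterNo.HasValue);  // False

        // Named arguments make the call site self-documenting.
        var numbered = new Letter(letterDate: DateTime.Now, letterNo: 7);
        Console.WriteLine(numbered.LetterNo);             // 7
    }
}
```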

Introducing LINQ

LINQ, or Language Integrated Query, is what C# 3.0 is all about. C# 2.0 introduced generics and was largely fixing the shortcomings of its predecessor; C# 3.0 introduced a very powerful way of querying different data sources. The idea is a single, expressive query language that can talk to any kind of data source, from databases and collections of objects to XML documents and COM interoperability.
We have already worked with some aspects of LINQ, but we have not yet shown actual "query expressions". For example, finding letters with LetterNo > 20 using an explicit query expression looks like this:

    List<Letter> letters = Letter.GetSampleLetters();
    var result = from Letter l in letters
                 where l.LetterNo > 20
                 select l;
    foreach (Letter l in result)
        Console.WriteLine(l.ToString());

As you can see in the above code, the expression looks very similar to an SQL statement. In today's data-driven world most developers already know SQL, so the choice of an SQL-like syntax for LINQ is, in my opinion, an apt one. One might ask why we go to such lengths to find a letter with a certain LetterNo when we could simply iterate through the list; another might take issue with the performance of pulling all the data down from the database, wrapping it in objects, and only then querying those objects. The former point is valid: a statement like the one above makes little sense on its own in a real-life scenario. The latter concern, however, is unfounded. In the case of LINQ to SQL, LINQ knows not to do that: the query expression above is translated into a SQL statement and executed directly on the database.
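One detail worth seeing in code: a query expression is purely compile-time sugar. The compiler translates it into method calls (Where, Select, and so on), so the two forms below are equivalent; for LINQ to Objects those calls run in memory, while a provider like LINQ to SQL can instead turn them into SQL. This sketch uses my own minimal Letter class:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Letter class standing in for the book's sample.
class Letter
{
    public int LetterNo { get; set; }
    public override string ToString() { return "Letter " + LetterNo; }
}

class Demo
{
    static void Main()
    {
        List<Letter> letters = new List<Letter>
        {
            new Letter { LetterNo = 10 },
            new Letter { LetterNo = 25 }
        };

        // The query expression...
        var result1 = from l in letters
                      where l.LetterNo > 20
                      select l;

        // ...is compiled into exactly this method-call form.
        var result2 = letters.Where(l => l.LetterNo > 20);

        Console.WriteLine(result1.Count());  // 1
        Console.WriteLine(result2.Count());  // 1
    }
}
```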
So far we have talked about querying collections of objects and querying databases. We can also use LINQ to query XML documents: if our data is stored as XML, we can use an expression very similar to the one above to search it. We can even write our own LINQ providers. These and the other features that make LINQ so flexible will be discussed later on.
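To make that concrete, here is a hedged sketch of the same query against a hypothetical XML form of the letter data, using the LINQ to XML types in System.Xml.Linq (the element and attribute names are my own invention):

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class Demo
{
    static void Main()
    {
        // A hypothetical XML representation of the same letter data.
        XDocument doc = XDocument.Parse(
            @"<letters>
                <letter no='10' />
                <letter no='25' />
              </letters>");

        // LINQ to XML lets us reuse the familiar query shape.
        var result = from l in doc.Descendants("letter")
                     where (int)l.Attribute("no") > 20
                     select l;

        foreach (XElement e in result)
            Console.WriteLine((int)e.Attribute("no"));  // 25
    }
}
```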

Thursday, 18 October 2012

Why am I doing this?

Okay, so as it happens... I have realized that after graduating from my MSc I really need to hone my programming skills if I want to work as a software developer. Although I have programmed professionally in C# before, I feel the need to level up my understanding of the tools I use. To this end, and also to improve my writing skills and keep a journal (or a notebook, if you will), I have decided to blog my way through everything I learn from now on.
The book I have chosen, without much research, is "C# in Depth" by Jon Skeet. The book was engaging from the moment I started reading, and I guess that comes from Jon's great passion for programming and for C#. I think I will have a lot of fun going through the chapters, and I hope they will be of interest to others.