Wednesday 19 November 2008

COIL and COILettes

Douglas Crockford started a minor war when he observed that JavaScript's object initialisation syntax provides a neat way to persist hierarchical data, maybe more suitable to some situations than XML.

Since C# 3.0 we've had a similar facility in C#. This means an opportunity for another war.

I've started to see little object initialisation trees appearing in my code, and they start to look like a declarative internal DSL. I should post a real example some time. Anyway, this can only mean one thing: we need a snappy name for these things.
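Until a real example turns up, here is a minimal sketch of the kind of object initialisation tree described above. The `MenuItem` type and its contents are invented purely for illustration; the only language features used are C# 3.0 object and collection initialisers.

```csharp
using System.Collections.Generic;

// Hypothetical type, made up for this sketch.
public class MenuItem
{
    public MenuItem()
    {
        Children = new List<MenuItem>();
    }

    public string Text { get; set; }
    public string Shortcut { get; set; }

    // No public setter: the nested initialisers below add to this list.
    public List<MenuItem> Children { get; private set; }
}

public static class CoilDemo
{
    public static MenuItem BuildMenu()
    {
        // The whole tree is one expression: declarative data inside C#.
        return new MenuItem
        {
            Text = "File",
            Children =
            {
                new MenuItem { Text = "Open", Shortcut = "Ctrl+O" },
                new MenuItem
                {
                    Text = "Recent",
                    Children =
                    {
                        new MenuItem { Text = "notes.txt" },
                        new MenuItem { Text = "todo.txt" }
                    }
                },
                new MenuItem { Text = "Exit", Shortcut = "Alt+F4" }
            }
        };
    }
}
```

The nested `Children = { ... }` form calls `Add` on the existing list rather than assigning a new one, which is what lets the tree nest to any depth.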

JSON stands for JavaScript Object Notation, and it sounds sort of like the name Jason, which makes it catchy. So we need a four-letter acronym that starts with CS (for C-Sharp) where you pronounce the first letter as a syllable and it all sounds like a person's name. This immediately led me to:

CSIL (pronounced "Cecil" in the American way):

C-Sharp Initialisation Language.

Sadly somebody already used that for something else.

Ah well, it was just too perfect.

But not to be discouraged, I relaxed the rules so it just has to be a four letter acronym that sounds like a word, and went for:

COIL (pronounced "coil", amazingly):

C# Object Initialisation Language.

The beauty of this becomes apparent when you consider that we also need a way to describe those little islands of COIL that appear embedded in our C# programs.

They are called COILettes.

Thursday 13 November 2008

Chain – a LINQ operator for dealing with Linked Lists

I don’t think there’s anything in LINQ that will do this, though I expect I’m wrong - I have a tendency to write my own extension method to do something and then discover that LINQ already provides a general version of it. But in the meantime, here’s my Chain method:

public static IEnumerable<T> Chain<T>(this T first, Func<T, T> follow) where T : class
{
    for (T item = first; item != null; item = follow(item))
        yield return item;
}

It’s designed to be used in any situation where a traditional linked list exists. I used it in my AutoDisposal library to get a list of all the types inherited by a given type. It chains a bunch of objects together into an enumerable list by following links between them.
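As a quick illustration of "any situation where a traditional linked list exists", here is a sketch that walks the InnerException links of an exception. The `ChainDemo` class and the messages are made up for the example; `Chain` is repeated from above so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChainExtensions
{
    // Same method as above, repeated so this sketch is self-contained.
    public static IEnumerable<T> Chain<T>(this T first, Func<T, T> follow) where T : class
    {
        for (T item = first; item != null; item = follow(item))
            yield return item;
    }
}

public static class ChainDemo
{
    public static List<string> Messages()
    {
        // Three exceptions linked through InnerException.
        Exception outer = new InvalidOperationException("outer",
            new ArgumentException("middle",
                new FormatException("inner")));

        // InnerException plays the role of the "next" pointer.
        return outer.Chain(e => e.InnerException)
                    .Select(e => e.Message)
                    .ToList();
    }
}
```

Calling `ChainDemo.Messages()` yields the three messages in order from the outermost exception to the innermost one.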

The irritation that provoked this minor burst of creativity is that if you use the GetFields method of Type, there doesn’t seem to be a simple way to get all the instance fields in that type, including inherited ones. By default, GetFields does include inherited fields, but only public ones. To get all public and non-public instance fields, including all inherited ones, you have to walk the chain of base types and get the fields declared in each type. So you’d better use BindingFlags.DeclaredOnly, or you’ll end up with duplicates of the public fields.

I wanted to write a LINQ expression that would get me a single flat list of all the fields in a type, so this is how I used Chain as part of it:

instance.GetType()
    .Chain(type => type.BaseType)
    .SelectMany(type => type.GetFields(BindingFlags.Instance | 
                                       BindingFlags.Public | 
                                       BindingFlags.NonPublic | 
                                       BindingFlags.DeclaredOnly))
    

In this case, the BaseType property of Type is the “next item” pointer in the linked list I’m interested in, but to use LINQ effectively we need it in the form of an IEnumerable<Type>, and so Chain provides a nifty way to do that.
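To make the GetFields behaviour concrete, here is a small sketch that compares a single GetFields call with the walk along BaseType, with and without DeclaredOnly. The `BaseThing`, `DerivedThing` and `FieldWalk` names are invented for the example, and `Chain` is repeated so the snippet stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

#pragma warning disable 0169, 0414 // the fields exist only to be reflected over
public class BaseThing
{
    private int _hidden = 1;
    public int Visible = 2;
}

public class DerivedThing : BaseThing
{
    private int _own = 3;
}
#pragma warning restore

public static class FieldWalk
{
    private const BindingFlags AllInstance =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

    public static IEnumerable<T> Chain<T>(this T first, Func<T, T> follow) where T : class
    {
        for (T item = first; item != null; item = follow(item))
            yield return item;
    }

    // One call on the most derived type: misses private fields of base types.
    public static List<string> SingleCall(Type type)
    {
        return type.GetFields(AllInstance).Select(f => f.Name).ToList();
    }

    // Walk the inheritance chain, optionally restricting each step to declared fields.
    public static List<string> WalkChain(Type type, bool declaredOnly)
    {
        BindingFlags flags = declaredOnly
            ? AllInstance | BindingFlags.DeclaredOnly
            : AllInstance;

        return type.Chain(t => t.BaseType)
                   .SelectMany(t => t.GetFields(flags))
                   .Select(f => f.Name)
                   .ToList();
    }
}
```

For `DerivedThing`, the single call finds `_own` and `Visible` but not `_hidden`. The walk with DeclaredOnly finds all three fields once each, and the walk without it reports `Visible` twice.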

Shouldn’t something like Chain be in the BCL (if it isn’t already)?

Wednesday 12 November 2008

AutoDisposal using PostSharp

I complain to anyone who will listen about the poor language support in C# for the IDisposable interface. Yeah, we’ve got the using statement, and that’s fine as far as it goes.

But compare that to what C++/CLI has: the most complete support of any .NET language. Not just the equivalent of the using statement, but also automatic implementation of IDisposable that takes care of disposing of nested owned objects, including inherited ones, like magic (relatively speaking, unless you’re a C++ programmer in which case you’ve taken it for granted for a decade or two).

The relationship between deterministic clean-up (i.e. destructors) and garbage collection (i.e. finalizers) was quite vaguely understood until Herb Sutter clarified it as part of his work on C++/CLI. But now it’s all very clear how it should work – the only problem is, there doesn’t seem to be any movement towards fixing it in any future version of C#.

Anyway, by now you’re probably bored of the ranting and wondering where the code snippets are. So here’s one:

class FunkyResource : IDisposable
{
    public string Label { get; set; }
 
    public void Dispose()
    {
        Console.WriteLine("Disposing " + Label);
    }
}

Not particularly impressive, I grant you, but it serves as a dummy example of a class that represents some resource that needs to be cleaned up. Here we just log the fact that a given instance is being cleaned up, identifying it with a label string.

Here’s where it gets interesting:

[AutoDisposable]
class FunkyOwner
{
    [AutoDispose] 
    private FunkyResource _resource1 = new FunkyResource { Label = "resource1" };
 
    [AutoDispose] 
    private FunkyResource _resource2 = new FunkyResource { Label = "resource2" };
}

This class owns a couple of FunkyResource instances. By “own” I simply mean that when an instance of this class is disposed of, those two resource instances must also be disposed of.

But wait a second – how does FunkyOwner implement IDisposable? The answer is that it automatically does so, because of that [AutoDisposable] attribute.

This is all thanks to the wonderfully powerful PostSharp, which effectively allows you to extend any CLR-based language through custom attributes. And because PostSharp builds on the CLR in a language-independent way, this means that the extensions you create should work in any CLR-based language that supports attributes.

Very briefly, PostSharp registers an extra step in the compilation process of Visual Studio: after your output assembly is built, a program called PostSharp.exe takes a look at it, and performs additional processing on the raw IL in the assembly. And hey presto, extra features make their way into your assembly.

Before we see how this example works, what about inheritance?

class DerivedOwner : FunkyOwner
{
    [AutoDispose] 
    private FunkyResource _resource3 = new FunkyResource { Label = "resource3" };
}

Note that because we’re deriving from a class that has the [AutoDisposable] attribute, we don’t need to specify it again (it will be ignored if we do). If we call Dispose on an instance of DerivedOwner, we get this output:

Disposing resource3
Disposing resource1
Disposing resource2

That is, the resources owned by DerivedOwner are disposed of first, then the base class’s resources are also disposed of. So inheritance works fine.

And what about a mixed scenario, where I want automatic clean-up of owned resources but I also want to run some custom code of my own during disposal?

class MixedOwner : IDisposable
{
    [AutoDispose]
    private DerivedOwner _owner2 = new DerivedOwner();
 
    public void Dispose()
    {
        Console.WriteLine("Disposing MixedOwner");
 
        this.TryDisposeFields();
    }
}

Note the call to TryDisposeFields(), an extension method I cooked up (see below). The reason for the prefix ‘Try’ is that you can call it on anything and it will look for fields marked with the [AutoDispose] attribute. If it finds any, it will dispose of them. If it doesn’t find any, that’s okay – nothing happens.

When manually implementing IDisposable like this, consider making it virtual, so that derived classes can override it, although they must of course call the base class’s version after performing their custom cleanup. It might be good practice in some circumstances to follow the pattern where you have a separate virtual method Dispose(bool) instead. However, that is really intended to allow you to implement a finalizer and a Dispose method together, and these days there are almost no circumstances in which it is recommended that you write a finalizer (unfortunately a lot of old books give out-of-date advice here).
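A minimal sketch of the virtual Dispose arrangement just described, with no finalizer involved. The class names and the static log are hypothetical and exist only so the order of the calls can be observed.

```csharp
using System;
using System.Collections.Generic;

public class LoggingBase : IDisposable
{
    // Shared log so the order of clean-up steps is visible.
    public static readonly List<string> Log = new List<string>();

    public virtual void Dispose()
    {
        Log.Add("base cleanup");
    }
}

public class LoggingDerived : LoggingBase
{
    public override void Dispose()
    {
        // Custom clean-up first...
        Log.Add("derived cleanup");

        // ...then hand over to the base class.
        base.Dispose();
    }
}
```

Disposing a `LoggingDerived` records "derived cleanup" followed by "base cleanup", which mirrors the derived-then-base order that the AutoDisposal examples print.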

Also note that we don’t need the [AutoDisposable] attribute on the class, as we already implement IDisposable (again, it would have been harmless to add it).

Unsurprisingly, the output of disposing an instance of MixedOwner is:

Disposing MixedOwner
Disposing resource3
Disposing resource1
Disposing resource2

All very nice. And extremely easy to implement, using the “easy” mode of PostSharp which is known as Laos. No need to directly manipulate IL, just write classes to represent your attributes, deriving from base classes that take care of the messy details.

My [AutoDispose] attribute is actually completely trivial because it’s just a simple marker on fields that I look for using reflection. So this isn’t inherently anything to do with PostSharp:

[AttributeUsage(AttributeTargets.Field, AllowMultiple = false)]
public sealed class AutoDisposeAttribute : Attribute { }

Where PostSharp makes an appearance is in the [AutoDisposable] attribute:

[Serializable]
[AttributeUsage(AttributeTargets.Class, Inherited = true, AllowMultiple = false)]
public sealed class AutoDisposableAttribute : CompositionAspect
{
    public override object CreateImplementationObject(InstanceBoundLaosEventArgs eventArgs)
    {
        return new AutoDisposableImpl(eventArgs.Instance);
    }
 
    public override Type GetPublicInterface(Type containerType)
    {
        return typeof(IDisposable);
    }
 
    public override CompositionAspectOptions GetOptions()
    {
        return CompositionAspectOptions.IgnoreIfAlreadyImplemented;
    }
}

By inheriting the attribute from PostSharp.Laos.CompositionAspect, I’m saying that I want to add an extra interface to any class marked with my attribute. The implementation is created and returned from the CreateImplementationObject method. The other overrides are pretty self-explanatory.

Here’s what my AutoDisposableImpl class looks like:

public sealed class AutoDisposableImpl : IDisposable
{
    private readonly object _instance;
 
    public AutoDisposableImpl(object instance)
    {
        _instance = instance;
    }
 
    public void Dispose()
    {
        _instance.TryDisposeFields();
    }
}

Pretty simple, eh? It just stores a “back pointer” to the object we are extending, so it can call the TryDisposeFields extension method on whatever that object happens to be. In a way, it’s just like MixedOwner except it doesn’t do anything extra.

Actually most of the complicated mess is hidden in that extension method. So here’s the gory detail:

public static void TryDisposeFields(this object instance)
{
    instance.GetType()
        .Chain(type => type.BaseType)
        .SelectMany(type => type.GetFields(BindingFlags.Instance | 
                                           BindingFlags.Public | 
                                           BindingFlags.NonPublic | 
                                           BindingFlags.DeclaredOnly))
        .Where(fieldInfo => fieldInfo.GetCustomAttributes(typeof(AutoDisposeAttribute), false)
                                     .OfType<AutoDisposeAttribute>().Any())
        .Select(fieldInfo => fieldInfo.GetValue(instance))
        .OfType<IDisposable>()
        .ForEach(field => field.Dispose());
}

As you can see, I enjoy chaining a lot of LINQ operators together. The first one, Chain, is a little custom gadget of mine that turns any linked list of T into an IEnumerable<T>. I’ll describe it in a separate post.

After that, I get all the fields of all the types in the inheritance chain into a flat list, using the marvelous SelectMany method. Then I filter them according to whether they have the [AutoDispose] attribute, and select the values of the remaining fields, then filter again based on whether they support IDisposable, and finally I dispose of each one. (That last ForEach method is another one of mine, and I’ve seen a lot of people suggesting the same thing).
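The ForEach method isn't shown in this post, so here is a guess at its usual shape. This is a sketch of the commonly suggested extension, not necessarily the exact code in the library, and the class name is invented.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Runs an action on every item; the counterpart of a foreach loop
    // that can sit at the end of a chain of LINQ operators.
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (T item in source)
            action(item);
    }
}
```

Unlike the real LINQ operators it returns nothing and runs immediately, so it only makes sense as the last link in the chain.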

So what are the drawbacks of this marvelous scheme? The one major issue with the PostSharp approach is that the Visual Studio IDE does its own compilation of your source to provide auto-completion and other kinds of “intellisense”. It doesn’t look at the assembly on disk, because there might not be one (e.g. if the code doesn’t completely compile without errors yet, as is often the case when you are adding new code, which is precisely the time when you require intellisense features).

This means that the IDE doesn’t know that [AutoDisposable] classes support IDisposable. Nor does the real compilation stage that happens during the build. PostSharp doesn’t kick in until after the compilation has completed. The upshot is that we cannot do this:

using (new FunkyOwner())
{
    Console.WriteLine("Using FunkyOwner...");
}

The using statement needs to statically verify that the object supports IDisposable. It won’t try to resolve this at runtime.

This would seem to be not so much a drawback, more a total friggin’ disaster. However, there is a solution, in the form of another generally applicable extension method:

public static void TryUsing<T>(this T instance, Action<T> action)
{
    try
    {
        action(instance);
    }
    finally
    {
        IDisposable disp = instance as IDisposable;
        if (disp != null)
            disp.Dispose();
    }
}

Now we can write:

new FunkyOwner().TryUsing(funkyOwner =>
{
    Console.WriteLine("Using FunkyOwner...");
});

I arranged the interface of the TryUsing method carefully in order to mimic the characteristics of the using statement. In particular, the object to be disposed of is only given a name by the lambda parameter. This means that it cannot be accidentally used outside of the block where it is “in scope” (i.e. not yet disposed), unless the programmer makes a special effort to circumvent this protection by storing a reference to the object in a variable declared outside the lambda.
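TryUsing doesn't depend on PostSharp at all, so it can be tried out in isolation. The sketch below repeats the method and adds a hypothetical `Tracked` class whose only job is to record whether it has been disposed.

```csharp
using System;

public static class UsingExtensions
{
    // Same method as above, repeated so this sketch is self-contained.
    public static void TryUsing<T>(this T instance, Action<T> action)
    {
        try
        {
            action(instance);
        }
        finally
        {
            IDisposable disp = instance as IDisposable;
            if (disp != null)
                disp.Dispose();
        }
    }
}

// Hypothetical resource that remembers whether Dispose was called.
public sealed class Tracked : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        Disposed = true;
    }
}

public static class TryUsingDemo
{
    public static bool DisposedThroughObjectReference()
    {
        // The static type is object, so the compiler cannot see IDisposable here,
        // which is the same position we are in with an [AutoDisposable] class.
        object hidden = new Tracked();
        hidden.TryUsing(o => Console.WriteLine("Using " + o.GetType().Name));
        return ((Tracked)hidden).Disposed;
    }
}
```

Because the check happens at runtime, the same call is harmless on objects that aren't disposable, and the finally block means disposal still happens if the lambda throws.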

And with that, I have now discussed every single part of the AutoDisposal library. You can download the complete source along with a demonstration program here:

http://www.earwicker.com/AutoDisposalSource.zip

You will of course also need the PostSharp system, which you can get here:

http://www.postsharp.org/download/

Friday 7 November 2008

Fun with Internal DSLs in C++

About three years ago I experimented with internal DSLs in C++. I didn't know to call it that at the time (I'm not sure when the term was coined). It really means twisting the features of an existing language to make what feels like a new language.

The purpose of my DSL was to allow C++ programmers to naturally express database queries that would be executed against an RDBMS as standard SQL queries. In other words, it had exactly the same aim as LINQ, although again I wasn't to know that at the time.

The starting point was a couple of template classes called column and table, which serve the purpose of making the names of tables and columns visible within the C++ type system.

These were wrapped in some convenient macros, so you could declare the structure of your database tables like this:

SQL_BEGIN_NAMED_TABLE(users, "USERS")
    SQL_DECLARE_NAMED_COLUMN(id, "USERID", int)
    SQL_DECLARE_NAMED_COLUMN(username, "USERNAME", std::wstring)
    SQL_DECLARE_NAMED_COLUMN(password, "PASSWORD", std::wstring)
    SQL_DECLARE_NAMED_COLUMN(accesslevel, "ACCESSLEVEL", int)
    SQL_DECLARE_NAMED_COLUMN(usertype, "WINUSERFLAG", int)
    SQL_DECLARE_NAMED_COLUMN(longname, "LONGNAME", std::wstring)
    SQL_DECLARE_NAMED_COLUMN(email, "EMAIL", std::wstring)
    SQL_DECLARE_NAMED_COLUMN(dynamic, "DYNAMIC", int)
    SQL_DECLARE_NAMED_COLUMN(userflags, "USERFLAGS", int)
    SQL_DECLARE_NAMED_COLUMN(pwdexpirytime, "PWDEXPIRYTIME", sql::datetime_type)
SQL_END_TABLE(users)

So there we have a table called USERS with a bunch of columns. The names and data types of the columns are part of the information captured in the resulting type structure.

The macros actually declare a type and also an instance of that type. The above example declares a type called users_t to represent the table, and also a nested type called users_t::password_t to represent that column. Along with these, it declares instances of those types called users and users.password. And the same for the other columns.

We can then write things like this:

record_set<users_::username_, users_::email_> admins;

db.select(
    into = admins,
    from = users,
    where = users.accesslevel == 2
);

The Boost Parameter library provides the named parameter syntax (I wrote my own equivalent first before realising that Boost Parameter existed, and then retro-fitted it).

The predicate expression, as seen in the where clause, can get quite complicated. It can compare columns with values, or with each other, and it can use the standard && and || operators, amongst others. This all gets captured and turned into SQL, just like in Linq, but it's done with operator overloading. This trick is called expression templates in the C++ world.

The problem with this kind of thing is that although it results in a very neat and simple-to-use programming interface, there aren't many people who feel up to the job of maintaining such a library. If you don't like templates, you wouldn't like looking at this code. I'd guess that 30% of the characters in the source are angle brackets (only partly a joke).

The record_set type is a std::vector of another type called record, which is a little like a custom struct that is declared on-the-fly at the point of use. It's a whole little world of pain all by itself!

Just to give a flavour of the excitement involved in this kind of work, here's some of the record source:

struct none
{
    struct value_type {};
};

// Forward declaration of record
template <
        class T0 = none, class T1 = none,
        class T2 = none, class T3 = none,
        class T4 = none, class T5 = none,
        class T6 = none, class T7 = none,
        class T8 = none, class T9 = none,
        class TA = none, class TB = none,
        class TC = none, class TD = none,
        class TE = none, class TF = none
        >
struct record;

// Specialization for all fields none
template <>
struct record<
            none, none, none, none,
            none, none, none, none,
            none, none, none, none,
            none, none, none, none
            >
{
    // continues...
};

template <
        class T0, class T1, class T2, class T3,
        class T4, class T5, class T6, class T7,
        class T8, class T9, class TA, class TB,
        class TC, class TD, class TE, class TF
        >
struct record : 
    public record<T1, T2, T3, T4, T5, T6, T7, T8, 
                T9, TA, TB, TC, TD, TE, TF, none>
{
    // continues...
};

Yes folks, it's a template that derives from a specialisation of itself.

Thursday 6 November 2008

The Weirdness of Linq Query Expressions

What does this bit of code do?

var result = from n in 5 where (n == 5) select (n + 1);

It looks completely hopeless. How can you loop through the contents of the number 5? The C# language reference says of the from clause:

The data source referenced in the from clause must have a type of IEnumerable, IEnumerable<T>, or a derived type such as IQueryable<T>.

But that is a lie!

The above code is translated into this:

var result = 5.Where(n => n == 5).Select(n => n + 1);

So in fact it is not necessary for the data source (the thing after the in keyword) to be enumerable at all. The only thing it must have is methods (or extension methods) called Where and Select. So let’s define them for all types:

public static class Crazy
{
    public static T Where<T>(this T source, Func<T, bool> pred)
    {
        if (pred(source))
            return source;

        return default(T);
    }

    public static T Select<T>(this T source, Func<T, T> select)
    {   
        return select(source);
    }   
}

So Where is a function that either returns the value passed into it, or the default value for the type, depending on the result of the predicate. This is sort-of analogous to what the Linq-to-objects version of Where does. And Select is completely trivial.

With these definitions, the original line of code produces an integer with the value 6.
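For anyone who wants to try it, here is the whole thing as one compilable sketch. The `Crazy` class is repeated from above; `CrazyDemo` and its method names are made up, and the second query is added to show what happens when the predicate fails.

```csharp
using System;

public static class Crazy
{
    public static T Where<T>(this T source, Func<T, bool> pred)
    {
        if (pred(source))
            return source;

        return default(T);
    }

    public static T Select<T>(this T source, Func<T, T> select)
    {
        return select(source);
    }
}

public static class CrazyDemo
{
    // Predicate holds: Where passes 5 through and Select adds one.
    public static int Matching()
    {
        return from n in 5 where (n == 5) select (n + 1);
    }

    // Predicate fails: Where returns default(int), and Select still runs on it.
    public static int NotMatching()
    {
        return from n in 4 where (n == 5) select (n + 1);
    }
}
```

`Matching()` gives 6 as described. `NotMatching()` gives 1 rather than "nothing", because the zero that Where returns is still fed through Select, which is where the analogy with filtering a list breaks down.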

Could this kind of abuse be turned into a valid use? Obviously it depends on whether you can think of useful definitions of Where and Select for types other than IEnumerable<T>. Bear in mind that all query expressions begin with from ... in, so it will look mighty confusing unless the data source is list-like.

Tuesday 4 November 2008

Dynamic typing in C# 4 and Duck Generics

When the dynamic typing proposal for C# came out, my feedback was basically that you can already do it in the language today:

DynamicObject Wrapper

The justification for adding dynamic is that it gets rid of a lot of ugly reflection. But delegates and indexers mean that the language already has almost enough expressiveness to hide those details entirely.

Now that the proposal is getting a lot firmer and there's a CTP available, more people are voicing their concerns. I remain pretty unconvinced of the value of it. One good point made by several people now is that it ought to be possible to use an interface to describe the set of operations that an object must support. The object wouldn't have to genuinely implement the interface - the IDE would just use the interface's members to offer up auto-completion in the usual way. The compiler would then simply generate dynamic calls, so the interface would be irrelevant at runtime.

I go further: I'd prefer it if this was mandatory. I like it when an assumption or fact about the program is defined in one place - a single definitive statement of that fact. This is much better than having multiple implicit suggestions spread all around the program.

That is unlikely to happen, because it would cause some unrealistically trivial examples to grow a few lines longer - shock horror.

This all reminds me of something I've wanted in C# ever since generics were added, which is what I'm going to call "duck generics" - essentially, generics that work more like C++ templates. One saving grace of dynamic calls in C# 4 is that they suggest a new way to solve this problem.

Here's a typical generic class, C. Any type can be the type parameter T as long as it implements ISocket:

public interface ISocket
{
    byte[] Read(int maxBytes);
}

public class C<T> where T : ISocket
{
   ...

The problem is, what if I have a few 3rd party classes that each have exactly the right kind of Read method but don't implement the ISocket interface? Today, I would have to hand-write a forwarding class for each of them.
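For the record, this is roughly what such a hand-written forwarding class looks like. `VendorStream` stands in for a third-party class, and `LengthReader<T>` is an invented generic class with the same constraint as C; neither comes from a real library.

```csharp
using System;

public interface ISocket
{
    byte[] Read(int maxBytes);
}

// Hypothetical third-party class: the right Read method, but no ISocket.
public sealed class VendorStream
{
    public byte[] Read(int maxBytes)
    {
        return new byte[Math.Min(maxBytes, 4)];
    }
}

// The tedious part: one of these per third-party class.
public sealed class VendorStreamSocket : ISocket
{
    private readonly VendorStream _inner;

    public VendorStreamSocket(VendorStream inner)
    {
        _inner = inner;
    }

    public byte[] Read(int maxBytes)
    {
        return _inner.Read(maxBytes);
    }
}

// Invented generic class with the same constraint as C<T>.
public class LengthReader<T> where T : ISocket
{
    private readonly T _socket;

    public LengthReader(T socket)
    {
        _socket = socket;
    }

    public int ReadLength(int maxBytes)
    {
        return _socket.Read(maxBytes).Length;
    }
}
```

`new LengthReader<VendorStreamSocket>(new VendorStreamSocket(new VendorStream()))` compiles, whereas `LengthReader<VendorStream>` is rejected by the constraint even though the Read method matches exactly.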

Previously I've wondered if it would be practical to make the compiler do exactly that for me. Whenever a concrete instantiation of C is created, the compiler would automatically generate a class that implemented ISocket and forwarded the Read call on to the "real" object.

I think it's a good idea. It doesn't require changes to how generics are declared, or how they are defined, except that typeof(T) on a type parameter would have to tunnel inside the forwarder to get the real type. It would be another example of pure 'syntactic sugar', automating away the need for tedious hand-written code, just like 'yield return' and the 'using' statement, and so on.

Anyway, now there's dynamic calling, another possibility exists, which is to add an alternative syntax:

public class C<T> where T dynamic ISocket

It tells the compiler to allow me to treat T as if it implemented ISocket, and to generate dynamic calls for anything I do to an instance of T. When I instantiate C<S> the compiler checks that S has all the operations needed to satisfy any dynamic call to a member of ISocket. So we still have perfect static type safety; the dynamic calls in my generic class are guaranteed to succeed.
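The `where T dynamic ISocket` syntax is only a suggestion, so it can't be compiled, but the calls it would generate can be approximated with the C# 4 dynamic keyword. In this sketch `DuckReader<T>` and `VendorStream` are invented names, and the missing piece is exactly the compile-time check described above.

```csharp
using System;

// Hypothetical third-party class with a Read method but no ISocket interface.
public sealed class VendorStream
{
    public byte[] Read(int maxBytes)
    {
        return new byte[Math.Min(maxBytes, 4)];
    }
}

// No constraint on T: the call to Read is resolved at runtime.
public class DuckReader<T>
{
    private readonly dynamic _socket;

    public DuckReader(T socket)
    {
        _socket = socket;
    }

    public int ReadLength(int maxBytes)
    {
        // Bound dynamically; fails at runtime if T has no suitable Read method.
        byte[] data = _socket.Read(maxBytes);
        return data.Length;
    }
}
```

With plain dynamic, `new DuckReader<string>("oops")` compiles happily and only fails with a RuntimeBinderException when ReadLength is called. The proposed constraint would move that failure to the point where `DuckReader<string>` is instantiated.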

There would be no need to generate forwarder classes, or to make instances of those forwarders at runtime. But I still prefer the original idea. Firstly, dynamic calls are likely to be a little slower. But secondly, and more importantly, I would have to change my generic class declaration when I want to enable this capability, which seems wrong.

So I still don't have a good reason for built-in dynamic calling in C#.