Thursday, 22 May 2008

Virtual Properties in C#

It is an important and popular fact that properties in C# can be defined in interfaces. For example:

public interface IThing
{
    IThing Parent { get; }
}

We can then implement that interface on a concrete class:

public class Thing : IThing
{
    public IThing Parent 
    { 
        get { /* return a parent from somewhere */ }
    }
}

Like so. We can also use the nifty new syntax to implement the property to be just like a simple field:

public IThing Parent { get; set; }

Notice (this is the crucial point) that this defines a setter as well as a getter for the property. It doesn't make any difference to clients of IThing, because they only need a getter, but it might be useful for clients of Thing. Very nice.
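To make that concrete, here is a small self-contained sketch of the interface version. The Program class and the variable names are only there for illustration; the point is that the setter is visible through Thing but not through IThing.

```csharp
using System;

public interface IThing
{
    IThing Parent { get; }
}

public class Thing : IThing
{
    // Auto-property: satisfies the interface's getter and adds a public setter
    public IThing Parent { get; set; }
}

static class Program
{
    static void Main()
    {
        Thing child = new Thing();
        child.Parent = new Thing();   // fine: Thing exposes the setter

        IThing view = child;
        // view.Parent = new Thing(); // would not compile: IThing only has a getter

        Console.WriteLine(view.Parent != null); // prints True
    }
}
```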

Now I know what you're thinking. Nobody British writes a blog post like this about a language feature unless some aspect of it is unsatisfactory. So here it comes.

Suppose I want to follow the same pattern but using an abstract class instead of an interface. It's basically the same idea:

public abstract class IThing
{
    public abstract IThing Parent { get; }
}

The only difference in the concrete derived class is the need to use the modifier override:

public class Thing : IThing
{
    public override IThing Parent
    {
        get { /* blah */ }
    }
}

But there's another difference. Can you guess it?

public class Thing : IThing
{
    public override IThing Parent { get; set; }
}

The above produces an error: cannot override because 'IThing.Parent' does not have an overridable set accessor.

It's the same if you try to write the getter and setter in long hand. You are banned from providing a setter, even though it makes no difference to the correctness of your implementation of IThing.

I can just hear the language designers saying wisely to themselves, "Why the heck would anyone ever want to override the getter but not the setter, or vice versa for that matter?" Well, now you know, you crazy old language designers.

If the base class has an abstract getter but no setter (or vice versa) it ought to be a perfectly valid thing to override that getter (or setter) while also providing a non-virtual matching setter (or getter).

To fix this, we need the ability to apply the override modifier on the getter and setter individually.
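In the meantime, one possible workaround (a sketch, not something the language gives you directly) is to make the base property non-virtual and have it forward to a protected abstract method. The derived class can then hide the property with its own get/set version. The method name GetParent is my own invention for this example, and the abstract class is still called IThing to match the code above.

```csharp
using System;

public abstract class IThing
{
    // Non-virtual getter that forwards to a protected abstract method
    // (GetParent is a hypothetical name chosen for this sketch)
    public IThing Parent { get { return GetParent(); } }

    protected abstract IThing GetParent();
}

public class Thing : IThing
{
    // 'new' hides the base property, so this one is free to declare a setter
    public new IThing Parent { get; set; }

    // Inside Thing, 'Parent' refers to the hiding property declared above
    protected override IThing GetParent() { return Parent; }
}

static class Program
{
    static void Main()
    {
        Thing child = new Thing();
        child.Parent = new Thing();

        IThing viaBase = child;

        // Both views of the object report the same parent
        Console.WriteLine(object.ReferenceEquals(viaBase.Parent, child.Parent)); // prints True
    }
}
```

It works, but it is more ceremony than simply being allowed to add a setter.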

Sunday, 11 May 2008

Delegates are Immutable

Suppose you see this in some code:

button.Click += new EventHandler(button_Click);

Naturally you would conclude that Click is an event and button_Click is a method name (formally described as a method group, because the name by itself identifies one or more methods with different parameters). After the above line of code has been executed, the method button_Click will run each time the Click event fires (presumably when the button is clicked).

Also these days we don't need to be so verbose and can just write:

button.Click += button_Click;

Perfectly straightforward. But full of assumptions that may not be true. I deliberately chose the name button_Click to make you think it identified a method, because that's the kind of name the Visual Studio IDE gives to automatically generated event handler methods. What if only the identifier was different?

button.Click += ButtonClicked; // Exhibit A

Maybe ButtonClicked is in fact not a method but another event. In which case, what does Exhibit A actually do?

Another way of phrasing all this: how does it differ from this next line of code?

button.Click += (s, e) => ButtonClicked(s, e); // Exhibit B

If ButtonClicked is a method name, then there is no effective difference between Exhibit A and Exhibit B. Exhibit B wastes a little bit of garbage collected memory, but otherwise it sets up something with identical behaviour. Exhibit A creates a delegate that calls the method ButtonClicked directly, whereas Exhibit B builds an anonymous method that calls ButtonClicked and then wraps it in a delegate. The result is an unnecessary layer of wrapping. So if someone were maintaining the code and they came across Exhibit B, they could safely change it to Exhibit A.

But what if, as seems more likely, ButtonClicked is an event? It all depends on what state ButtonClicked is in at the time. Suppose that before Exhibit A executes, there are currently no handlers attached to either event. The debugger would tell us that both button.Click and ButtonClicked have the special value null.

Immediately after Exhibit A executes, the debugger would continue to tell us exactly the same thing: Exhibit A would have no effect on anything.

But what if ButtonClicked had previously been set up as follows?

ButtonClicked += (s, e) => Console.WriteLine(e.ToString());

Then Exhibit A would have an effect. It would mean that the anonymous function attached to ButtonClicked would also execute if the button is clicked.

And what if a second handler were attached to ButtonClicked after Exhibit A was run?

ButtonClicked += (s, e) => Console.WriteLine("And again");

This change would have no effect at all on the behaviour when the button is clicked.

The reason for this is that delegates are immutable - they cannot be modified after they are constructed - and so should be treated like value types. Exhibit A could be interpreted as an instruction to make button.Click point to the exact same delegate object as ButtonClicked. And in fact, C# and the runtime are probably at liberty to do it that way. But as there is no way to modify an existing delegate, it makes no difference whether the two events point to one delegate or to two separate copies.

When we attach our second handler to ButtonClicked, this results in the creation of a completely new delegate object that "multi-casts" invocations on to both handlers. However, the previous delegate object is still attached to button.Click.
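Here is a minimal console sketch of that behaviour. The Button and Window classes are stand-ins I made up for the demonstration (they are not the WinForms types), and the Exhibit B lambda includes a null check so that it is safe even when nothing is attached.

```csharp
using System;

// Stand-in for a UI button; not the real WinForms class
class Button
{
    public event EventHandler Click;

    public void SimulateClick()
    {
        EventHandler handler = Click;
        if (handler != null)
            handler(this, EventArgs.Empty);
    }
}

class Window
{
    public event EventHandler ButtonClicked;

    public readonly Button SnapshotButton = new Button();
    public readonly Button ForwardingButton = new Button();

    public Window()
    {
        ButtonClicked += (s, e) => Console.WriteLine("first");

        // Exhibit A: attaches the delegate ButtonClicked holds right now
        SnapshotButton.Click += ButtonClicked;

        // Exhibit B: looks up ButtonClicked afresh on every click
        ForwardingButton.Click += (s, e) =>
        {
            EventHandler handler = ButtonClicked;
            if (handler != null)
                handler(s, e);
        };

        // Creates a new multicast delegate; the one given to SnapshotButton is unchanged
        ButtonClicked += (s, e) => Console.WriteLine("second");
    }
}

static class Program
{
    static void Main()
    {
        Window w = new Window();

        w.SnapshotButton.SimulateClick();   // prints "first"
        w.ForwardingButton.SimulateClick(); // prints "first" then "second"
    }
}
```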

Why is this important? Because the sort of place where you might want to set up this "chaining" of events is in the constructor of a window that owns a button. The desired behaviour is that when the button is clicked, the window's ButtonClicked event should fire. But if you try to achieve this with Exhibit A, you will find that it does not work, because in the window's constructor, nothing has been attached to ButtonClicked yet. What you actually need is Exhibit B (with a null check inside the lambda if the button might be clicked while nothing is attached to ButtonClicked, because invoking a null delegate throws a NullReferenceException).

And yet if ButtonClicked was a method, there would be no noticeable difference between the two.

So the short answer to the question, "What's the difference between Exhibit A and Exhibit B?" is...

It depends.

Tuesday, 6 May 2008

Pointers to Value-Types in C#

C# has an "unsafe" mode in which real, nasty pointers are allowed, but this mode is rarely used except in some messy situations involving talking to old code. The C# language proper does not have pointers.

In C and C++, you can get the address of anything, including a local variable on the stack:

int x = 0;

int *p = &x; /* get address of x */

*p = 5; /* change value of x */

Of course, when a function exits, all its local variables cease to exist. Any remaining pointers to them are now "dangling" and must not be used. If they ever are... well, who knows what could happen? Make no mistake, it's an exciting world of opportunities and I've already had plenty of it, thanks.

Naturally we're all very happy not to have such nonsense in C#. But in fact, thanks to anonymous delegates, we can sort of almost do the same thing anyway. But without the undefined behaviour.

Here's the Ptr<T> generic class, which is a pointer - or at least can be a pointer if you construct it the right way:

public class Ptr<T>
{
    Func<T> getter;
    Action<T> setter;

    public Ptr(Func<T> g, Action<T> s)
    {
        getter = g;
        setter = s;
    }

    public T Deref
    {
        get { return getter(); }
        set { setter(value); }
    }
}

And here's how to use it:

int x = 0;

Ptr<int> p = new Ptr<int>(() => x, v => x = v); // &x

p.Deref = 5; // *p = 5

Debug.Assert(p.Deref == 5); // *p == 5

In practice it's a good deal more flexible than a C/C++ pointer. When we construct it, we get to specify exactly how to get or set the value, so there are an unlimited number of possible behaviours. But if we want pointer-like behaviour, we just need to replicate the pattern shown above: a getter that maps no arguments onto the variable's value, and a setter that maps one argument onto the assignment operation.

Using the pointer to modify or obtain the pointed-to value is much simpler. Instead of putting an asterisk in front, we put .Deref on the end.
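As a sketch of both claims (no dangling pointers, and more flexibility than a raw address), the following program hands a Ptr<int> to a local variable out of the method that declared it, and also points one at an array element. MakeCounter and the variable names are mine, purely for illustration. The local survives because the lambdas capture the variable itself, not a copy of its value.

```csharp
using System;

public class Ptr<T>
{
    Func<T> getter;
    Action<T> setter;

    public Ptr(Func<T> g, Action<T> s)
    {
        getter = g;
        setter = s;
    }

    public T Deref
    {
        get { return getter(); }
        set { setter(value); }
    }
}

static class Program
{
    // Returns a "pointer" to a local; the closures keep the variable alive
    static Ptr<int> MakeCounter()
    {
        int local = 10;
        return new Ptr<int>(() => local, v => local = v);
    }

    static void Main()
    {
        Ptr<int> p = MakeCounter();
        p.Deref = p.Deref + 1;
        Console.WriteLine(p.Deref); // prints 11

        // The same class can point at an array element instead of a local
        int[] data = { 1, 2, 3 };
        Ptr<int> q = new Ptr<int>(() => data[1], v => data[1] = v);
        q.Deref = 42;
        Console.WriteLine(data[1]); // prints 42
    }
}
```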

It's tempting to think we could add another constructor like this:

public Ptr(ref T x)
{
    getter = () => x;
    setter = v => x = v;
}

... which would allow this:

Ptr<int> p = new Ptr<int>(ref x);

But lambdas are not allowed to refer to ref parameters.

I find it interesting that, thanks to anonymous delegates, C# already contains the plumbing necessary to implement the & (address-of) operator. It could in theory be done as mere syntactic sugar:

Ptr<int> p = &x;

... which would expand to this:

Ptr<int> p = new Ptr<int>(() => x, v => x = v);

Nothing else is really needed.

(Although to be honest, in the year or so I've been using C# in earnest, I haven't ever needed to take the address of a local value-type.)

DynamicObject Wrapper

Charlie Calvert started a discussion way back in January about a proposal to add dynamic lookup support to C# - a convenient way to write code that makes calls into objects of a type not known at compile time.

A number of people have suggested that for the dynamic case, the member access syntax should be made deliberately different to the usual dot syntax. There is a lot to be said for this. The big danger of this whole idea is that a radically less type-safe way of working will start to creep into programs that used to be type-safe, thus undoing some of the good work done by adding generics. But as long as the syntax makes this clear, it's better to have convenient support than nothing at all, or else people will invent their own approach and it will probably be horrible.

I was thinking that maybe to underline the fact that the method name is really no more solid than a string, you could have the syntax mimic accessing items in a dictionary.

So instead of:

d.LetThereBeLight(); 

It would be:

d["LetThereBeLight"](); 

It's not a lot more typing, and it properly highlights the dynamic nature of what the code is doing. Any C# programmer would instantly get what it meant.

It also has an advantage that other syntaxes don't - it allows me to pass a string variable if I want to.

But then I realised we can already do that today. Here's a simplistic wrapper around any CLR object that provides that exact dictionary-like calling interface:

using System;
using System.Reflection;

class DynamicObject
{
    private object m_target;

    public delegate object Member(params object[] args);

    public DynamicObject(object target)
    {
        m_target = target;
    }

    public Member this[object member]
    {
        get { return args =>
        {
            string name = member as string;
            if (name != null)
            {
                // Try to find a method (TODO: overloading)
                MethodInfo m = m_target.GetType().GetMethod(name);

                if (m != null)
                    return m.Invoke(m_target, args);

                // Then try for an ordinary property
                PropertyInfo p = 
                    m_target.GetType().GetProperty(name);

                if (p != null)
                {
                    // No args implies get
                    if (args.Length == 0)
                        return p.GetValue(m_target, null);

                    // Otherwise, set - first arg is new value
                    p.SetValue(m_target, args[0], null);
                    return null;
                }
            }

            // Otherwise try indexer
            PropertyInfo i = m_target.GetType().GetProperty("Item");
            if (i != null)
            {
                object[] memberArg = new object[] { member };

                if (args.Length == 0)
                    return i.GetValue(m_target, memberArg);

                i.SetValue(m_target, args[0], memberArg);
                return null;
            }

            // Give up
            throw new InvalidOperationException(
                "No such member: " + member);
        }; }
    }
}

The indexer simply returns a delegate that takes a variable number of arguments. It handles public instance methods and properties, and gives them uniform syntax (overloaded methods are not handled yet, as the TODO admits). If there is no method or property with the specified name, or the key is not a string at all, it forwards the request on to the target object's default indexer.
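As for the overloading TODO, one possible approach (a sketch only, not part of the wrapper above) is to choose the overload from the runtime types of the arguments, using the Type.GetMethod overload that takes a Type array. FindMethod and Greeter are hypothetical names for this example, and the helper assumes that none of the arguments is null.

```csharp
using System;
using System.Reflection;

class Greeter
{
    public string Greet(string who) { return "Hello, " + who; }
    public string Greet(int times) { return "Hello x" + times; }
}

static class Program
{
    // Hypothetical helper: select an overload by the arguments' runtime types
    static MethodInfo FindMethod(object target, string name, object[] args)
    {
        Type[] argTypes = new Type[args.Length];
        for (int n = 0; n < args.Length; n++)
            argTypes[n] = args[n].GetType(); // assumes no null arguments

        // Returns null if no public method matches the name and parameter types
        return target.GetType().GetMethod(name, argTypes);
    }

    static void Main()
    {
        object g = new Greeter();
        object[] args = new object[] { 3 };

        MethodInfo m = FindMethod(g, "Greet", args);
        Console.WriteLine(m.Invoke(g, args)); // prints "Hello x3"
    }
}
```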

So suppose our mystery dynamic object has this implementation:

public class Universe
{
    public void LetThereBeLight()
    {
        Console.WriteLine("And there was light!");
    }

    public void CreateSexes(int n)
    {
        Console.WriteLine("And there were " + n 
            + " of every creature, which was plenty");
    }

    public int SpatialDimensions { get; set; }

    public string this[int day]
    {
        get
        {
            switch (day)
            {
                case 1: return "Heaven and Earth";
                case 2: return "Mammals";
                case 3: return "Fish";
                case 4: return "TV";
                case 5: return "Hi Def";
                case 6: return "Humans";
            }

            return "Rested";
        }
    }
}

So if I know the type, I would write code like this in the usual way:

Universe u = new Universe();

u.LetThereBeLight();
u.CreateSexes(2);

u.SpatialDimensions = 3;

Debug.Assert(u.SpatialDimensions == 3);

Debug.Assert(u[3] == "Fish");

But we can wrap one in a DynamicObject and still use it fairly conveniently:

DynamicObject d = new DynamicObject(new Universe());

d["LetThereBeLight"]();
d["CreateSexes"](2);

d["SpatialDimensions"](3);

Debug.Assert((int)d["SpatialDimensions"]() == 3);
Debug.Assert((string)d[3]() == "Fish");

So do we really need a new language feature?