I have moved!

I've moved my blog

Tuesday 9 June 2009

Generic Value Object With Yield Return

Inspired by this and this, here's my version.

Jan Van Ryswyck is right to use static reflection, and to do so he "registers" the properties in a list. On the other hand, if you create a lot of these objects, every extra allocation adds to the garbage collector's workload, so it may be better to avoid storing a list of the properties in every single instance.

It's not really necessary to have the full information about the properties via reflection; we only need the values. Also it's a pain to have to distinguish between fields and properties. Finally, we are really doing operations on a sequence of values, which can be expressed very neatly with Linq operators, and specified in the derived class with an iterator method.

So here's what I came up with:

using System;
using System.Collections.Generic;
using System.Linq;

public abstract class ValueObject<T> : IEquatable<T>
   where T : ValueObject<T>
{
   protected abstract IEnumerable<object> Reflect();

   public override bool Equals(Object obj)
   {
       if (ReferenceEquals(null, obj)) return false;
       if (obj.GetType() != GetType()) return false;
       return Equals(obj as T);
   }

   public bool Equals(T other)
   {
       if (ReferenceEquals(null, other)) return false;
       if (ReferenceEquals(this, other)) return true;
       return Reflect().SequenceEqual(other.Reflect());
   }

   public override int GetHashCode()
   {
       return Reflect().Aggregate(36, (hashCode, value) => value == null ?
                               hashCode : hashCode ^ value.GetHashCode());
   }

   public override string ToString()
   {
       // No (string) cast: a lone non-string value, such as a boxed int, would make it throw
       return "{ " + Reflect().Aggregate((l, r) => l + ", " + r) + " }";
   }
}

First off, it's a lot shorter! Aside from the standard ReferenceEquals checks, every method is a one-liner, a return of a single expression. Check out how SequenceEqual does so much work for us. And Aggregate is designed for turning a sequence into one value, which is exactly what GetHashCode and ToString are all about.

This is all possible because we're treating the properties or fields as just a sequence of values, obtained from the Reflect method, which the derived class has to supply.

(You could easily add operator== and operator!= too, of course. Also, in case you're wondering why ToString doesn't check for null: it doesn't need to, because string concatenation in .NET treats a null operand as an empty string.)
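Here's a sketch of how those operators might look. The operator bodies are my own illustration, not code from the post; the rest of ValueObject<T> and Person are repeated (without ToString) so the block compiles on its own. Note the ReferenceEquals null checks: writing `left == null` inside operator== would call the operator recursively.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public abstract class ValueObject<T> : IEquatable<T>
    where T : ValueObject<T>
{
    protected abstract IEnumerable<object> Reflect();

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (obj.GetType() != GetType()) return false;
        return Equals(obj as T);
    }

    public bool Equals(T other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return Reflect().SequenceEqual(other.Reflect());
    }

    public override int GetHashCode()
    {
        return Reflect().Aggregate(36, (hashCode, value) => value == null ?
                                   hashCode : hashCode ^ value.GetHashCode());
    }

    // Illustrative operator: two nulls are equal, one null is not,
    // otherwise defer to the virtual Equals(object), which checks the type.
    public static bool operator ==(ValueObject<T> left, ValueObject<T> right)
    {
        if (ReferenceEquals(left, right)) return true;
        if (ReferenceEquals(left, null)) return false;
        return left.Equals((object)right);
    }

    public static bool operator !=(ValueObject<T> left, ValueObject<T> right)
    {
        return !(left == right);
    }
}

public class Person : ValueObject<Person>
{
    public int Age { get; set; }
    public string Name { get; set; }

    protected override IEnumerable<object> Reflect()
    {
        yield return Age;
        yield return Name;
    }
}
```

With these in place, `a == b` is true for two distinct Person instances holding the same Age and Name, where without the operators it would compare references.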

Secondly, the way you use it is pretty neat as well:

public class Person : ValueObject<Person>
{
   public int Age { get; set; }
   public string Name { get; set; }

   protected override IEnumerable<object> Reflect()
   {
       yield return Age;
       yield return Name;
   }
}

It wouldn't matter if I yielded the values of fields or properties, and there's no need for expression-lambda trickery to get a PropertyInfo.

If you're unfamiliar with how iterator methods work, you may be wondering: if I have twenty fields and the first fields of two instances differ, isn't this going to waste time comparing the other nineteen? No, because SequenceEqual stops iterating as soon as it finds an unequal pair of elements, and iterator methods are interruptible: the code after each yield return only runs when the next value is requested.
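To check the short-circuiting claim, here's a small experiment. The `Probe` class and its counter are hypothetical, purely for illustration: `TwentyFields` stands in for a Reflect method on an object with twenty fields, and counts how many values it has actually produced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Probe
{
    // How many values the TwentyFields iterators have produced so far.
    public static int ValuesProduced;

    // Stands in for Reflect() on an object with twenty fields:
    // the first value comes from the caller, the other nineteen are fixed.
    public static IEnumerable<object> TwentyFields(int first)
    {
        ValuesProduced++;
        yield return first;

        for (int i = 1; i < 20; i++)
        {
            ValuesProduced++;   // only runs if the consumer asks for another value
            yield return i;
        }
    }
}
```

Resetting `Probe.ValuesProduced` to zero and evaluating `Probe.TwentyFields(1).SequenceEqual(Probe.TwentyFields(2))` gives false and leaves the counter at 2, one value from each sequence. Comparing two equal sequences the same way produces all 40.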

(Note that if you need this to work on .NET 2.0, you can grab Jared Parsons' BCL Extras library to get the necessary sequence operations. If you're using the C# 2.0 compiler, you'll also need to call them as ordinary static methods rather than as extension methods, write the lambdas as anonymous methods, and give Person explicit backing fields instead of automatic properties. Iterator methods were already available in C# 2.0, so the technique itself doesn't depend on 3.0.)

Thursday 4 June 2009

Lazy Sequence Generation without using Yield Return

C# has the yield return feature, which makes it easy to write state machines as if they were plain old methods:

class Program
{
    static IEnumerable<int> SomeNumbers()
    {
        Console.WriteLine("Started");

        yield return 1;
        
        Console.WriteLine("Yielded 1");

        yield return 2;
            
        Console.WriteLine("Yielded 2");

        yield return 3;

        Console.WriteLine("Finished");
    }

    static void Main(string[] args)
    {
        foreach (int n in SomeNumbers())
        {
            Console.WriteLine(n);
        }
    }
}

The output shows that the sequence of numbers is generated "lazily": each chunk of code in the method runs only on demand, as the foreach pulls values from the sequence.
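The interleaving is easy to check without reading console output. This variant is a sketch in which `Interleaving` and its `Log` list are hypothetical stand-ins for the console: it records each step in a list. Merely calling SomeNumbers() records nothing, and running the loop records Started, 1, Yielded 1, 2, Yielded 2, 3, Finished, in that order.

```csharp
using System;
using System.Collections.Generic;

public static class Interleaving
{
    // Hypothetical shared log that replaces Console.WriteLine.
    public static readonly List<string> Log = new List<string>();

    // Same shape as SomeNumbers above, but logging instead of printing.
    public static IEnumerable<int> SomeNumbers()
    {
        Log.Add("Started");
        yield return 1;
        Log.Add("Yielded 1");
        yield return 2;
        Log.Add("Yielded 2");
        yield return 3;
        Log.Add("Finished");
    }

    // Mirrors Main: the loop body logs each value as it arrives.
    public static void Run()
    {
        foreach (int n in SomeNumbers())
        {
            Log.Add(n.ToString());
        }
    }
}
```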

How close can we get to this using only lambdas? Pretty close:

class Program
{
    static Lazy.Yielding<int> SomeNumbers()
    {
        Console.WriteLine("Started");

        return Lazy.Yield(1, () => {

        Console.WriteLine("Yielded 1");

        return Lazy.Yield(2, () => {
        
        Console.WriteLine("Yielded 2");

        return Lazy.Yield(3, () => {

        Console.WriteLine("Finished");

        return null; }); }); });
    }

    static void Main(string[] args)
    {
        foreach (int n in Lazy.Enumerate<int>(SomeNumbers))
        {
            Console.WriteLine(n);
        }
    }
}

By messing with the indentation of my curly braces, I've made it look like the original, but really it's made of three nested lambdas. So this version of SomeNumbers is, deep breath... a function that returns a function that returns a function that returns a function.

Each returned function supplies the code to execute for the next step.

The main remaining ingredient is a helper function Lazy.Enumerate that turns our strange contraption into a plain IEnumerable, so we can loop through it conveniently.

using System;
using System.Collections.Generic;

public static class Lazy
{
    public class Yielding<T>
    {
        public readonly T Result;
        public readonly Func<Yielding<T>> Next;

        public Yielding(T result)
        {
            Result = result;
            Next = null;
        }

        public Yielding(T result, Func<Yielding<T>> next)
        {
            Result = result;
            Next = next;
        }
    }

    public static Yielding<T> Yield<T>(T value, Func<Yielding<T>> next)
    {
        return new Yielding<T>(value, next);
    }

    public static Yielding<T> Yield<T>(T value)
    {
        return new Yielding<T>(value);
    }

    public class Seq<T> : IEnumerable<T>
    {
        private readonly Func<Yielding<T>> _generator;

        public Seq(Func<Yielding<T>> generator)
        {
            _generator = generator;
        }

        private class Iter : IEnumerator<T>
        {
            private Func<Yielding<T>> _generator;

            public Iter(Func<Yielding<T>> generator)
            {
                _generator = generator;
            }

            public T Current { get; private set; }

            public void Dispose() { }

            object System.Collections.IEnumerator.Current
            {
                get { return Current; }
            }

            public bool MoveNext()
            {
                if (_generator == null)
                    return false;

                Yielding<T> yielding = _generator();
                if (yielding == null)
                {
                    _generator = null;
                    return false;
                }

                Current = yielding.Result;
                _generator = yielding.Next;
                return true;
            }

            public void Reset()
            {
                throw new NotSupportedException();
            }
        }

        public IEnumerator<T> GetEnumerator()
        {
            return new Iter(_generator);
        }

        System.Collections.IEnumerator 
            System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }

    public static IEnumerable<T> Enumerate<T>(
                      Func<Yielding<T>> generator)
    {
        return new Seq<T>(generator);
    }
}

Most of the code is "boilerplate" implementation of IEnumerable/IEnumerator. The important bit is the MoveNext method, which calls the generator method to get the next value and the next generator method.
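Stripped of the IEnumerator plumbing, what MoveNext does amounts to a simple loop. Here's a sketch of that loop as a standalone method; `Driver.Drain` is a hypothetical name, and `Yielding<T>` is redeclared with the same shape as above so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Stand-in with the same shape as Lazy.Yielding<T>.
public class Yielding<T>
{
    public readonly T Result;
    public readonly Func<Yielding<T>> Next;

    public Yielding(T result, Func<Yielding<T>> next)
    {
        Result = result;
        Next = next;
    }
}

public static class Driver
{
    // The MoveNext logic as one loop: call the current generator, keep its value,
    // then carry on with the generator it hands back.
    // A null Yielding, or a null Next, ends the sequence.
    public static List<T> Drain<T>(Func<Yielding<T>> generator)
    {
        var results = new List<T>();
        while (generator != null)
        {
            Yielding<T> step = generator();
            if (step == null)
                break;
            results.Add(step.Result);
            generator = step.Next;
        }
        return results;
    }
}
```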

So what's missing? The major thing (aside from the misery of getting the syntax right, closing all the right brackets, etc.) is the lack of try/finally support, which turns out to be extremely useful. We could add support for it, though. First, we'd add this member to Yielding:

public readonly Action Finally;

The generator code would supply that action, to represent the finally block, before returning the Yielding instance (through an extra constructor parameter, since the field is readonly). Lazy.Seq.Iter would then store it and execute it before retrieving the next value; it would also execute it from the Dispose method, so that the Finally action would still run if the loop was abandoned early.
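Here's a rough sketch of that design. Everything in it is hypothetical (the `LazyFinally` name, the three-argument `Yield` overload, the `RunFinally` helper), and it only models one finally action per step, not a try block spanning several yields.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical variant of Lazy where each step can carry a Finally action.
public static class LazyFinally
{
    public class Yielding<T>
    {
        public readonly T Result;
        public readonly Func<Yielding<T>> Next;
        public readonly Action Finally;   // may be null

        public Yielding(T result, Func<Yielding<T>> next, Action @finally)
        {
            Result = result;
            Next = next;
            Finally = @finally;
        }
    }

    public static Yielding<T> Yield<T>(T value, Func<Yielding<T>> next, Action @finally)
    {
        return new Yielding<T>(value, next, @finally);
    }

    private class Iter<T> : IEnumerator<T>
    {
        private Func<Yielding<T>> _generator;
        private Action _pendingFinally;

        public Iter(Func<Yielding<T>> generator) { _generator = generator; }

        public T Current { get; private set; }
        object IEnumerator.Current { get { return Current; } }

        // Runs the stored Finally action at most once, then forgets it.
        private void RunFinally()
        {
            Action f = _pendingFinally;
            _pendingFinally = null;
            if (f != null) f();
        }

        public bool MoveNext()
        {
            RunFinally();   // the previous step's finally block runs before the next step
            if (_generator == null) return false;

            Yielding<T> yielding = _generator();
            if (yielding == null)
            {
                _generator = null;
                return false;
            }

            Current = yielding.Result;
            _generator = yielding.Next;
            _pendingFinally = yielding.Finally;
            return true;
        }

        // Called by foreach even when the loop is abandoned early.
        public void Dispose()
        {
            _generator = null;
            RunFinally();
        }

        public void Reset() { throw new NotSupportedException(); }
    }

    private class Seq<T> : IEnumerable<T>
    {
        private readonly Func<Yielding<T>> _generator;
        public Seq(Func<Yielding<T>> generator) { _generator = generator; }
        public IEnumerator<T> GetEnumerator() { return new Iter<T>(_generator); }
        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
    }

    public static IEnumerable<T> Enumerate<T>(Func<Yielding<T>> generator)
    {
        return new Seq<T>(generator);
    }
}
```

Breaking out of a foreach over `LazyFinally.Enumerate(...)` makes the compiler call Dispose, which runs the pending Finally action; letting the loop finish runs each step's action on the following MoveNext instead.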