This is an EX blog...

Tuesday, 9 June 2009

Generic Value Object With Yield Return

Inspired by this and this, here's my version.

Jan Van Ryswyck is right to use static reflection, and to do that he "registers" the properties in a list. On the other hand, assuming you'd have a lot of these objects building up in the GC, it might prove important to reduce the number of ancillary allocations going on, so it may be better to avoid creating a list of the properties in every single instance.

It's not really necessary to have the full information about the properties via reflection; we only need the values. Also it's a pain to have to distinguish between fields and properties. Finally, we are really doing operations on a sequence of values, which can be expressed very neatly with Linq operators, and specified in the derived class with an iterator method.

So here's what I came up with:

public abstract class ValueObject<T> : IEquatable<T>
   where T : ValueObject<T>
{
   protected abstract IEnumerable<object> Reflect();

   public override bool Equals(Object obj)
   {
       if (ReferenceEquals(null, obj)) return false;
       if (obj.GetType() != GetType()) return false;
       return Equals(obj as T);
   }

   public bool Equals(T other)
   {
       if (ReferenceEquals(null, other)) return false;
       if (ReferenceEquals(this, other)) return true;
       return Reflect().SequenceEqual(other.Reflect());
   }

   public override int GetHashCode()
   {
       return Reflect().Aggregate(36, (hashCode, value) => value == null ?
                               hashCode : hashCode ^ value.GetHashCode());
   }

   public override string ToString()
   {
       return "{ " + (string) Reflect().Aggregate((l, r) => l + ", " + r) + " }";
   }
}

First off, it's a lot shorter! Aside from the standard ReferenceEquals checks, every method is a one-liner, a return of a single expression. Check out how SequenceEqual does so much work for us. And Aggregate is designed for turning a sequence into one value, which is exactly what GetHashCode and ToString are all about.

This is all possible because we're treating the properties or fields as just a sequence of values, obtained from the Reflect method, which the derived class has to supply.

(You could easily add operator== and operator!= of course. Also in case you're wondering about the way ToString appears not to check for null, actually it does, because one odd thing about .NET is that string concatenation can cope with null strings.)

Secondly, the way you use it is pretty neat as well:

public class Person : ValueObject<Person>
{
   public int Age { get; set; }
   public string Name { get; set; }

   protected override IEnumerable<object> Reflect()
   {
       yield return Age;
       yield return Name;
   }
}

It wouldn't matter if I yielded the values of fields or properties, and there's no need for expression-lambda trickery to get a PropertyInfo.

If you're unfamiliar with how iterator methods work, you may be wondering, what if I have twenty fields and the first fields of two instances are unequal, isn't this going to waste time comparing the other nineteen fields? No, because SequenceEqual will stop iterating as soon as it finds an unequal element pair, and iterator methods are interruptible.

(Note that if you need this to work on .NET 2.0, you can grab Jared Parsons' BCL Extras library to get the necessary sequence operations. If you're using the C# 2.0 compiler you just need to rejig the method call syntax to avoid using them as extension methods. Iterator methods were already available in 2.0, so nothing here is dependent on 3.0.)

Thursday, 4 June 2009

Lazy Sequence Generation without using Yield Return

C# has the yield return feature, which makes it easy to write state machines as if they were plain old methods:

class Program
{
    static IEnumerable<int> SomeNumbers()
    {
        Console.WriteLine("Started");

        yield return 1;
        
        Console.WriteLine("Yielded 1");

        yield return 2;
            
        Console.WriteLine("Yielded 2");

        yield return 3;

        Console.WriteLine("Finished");
    }

    static void Main(string[] args)
    {
        foreach (int n in SomeNumbers())
        {
            Console.WriteLine(n);
        }
    }
}

The output shows that the sequence of numbers is generated "lazily", with each chunk of code in the method being executed only on demand as the foreach pulls values from sequence.

How close can we get to this using only lamdbas? Pretty close:

class Program
{
    static Lazy.Yielding<int> SomeNumbers()
    {
        Console.WriteLine("Started");

        return Lazy.Yield(1, () => {

        Console.WriteLine("Yielded 1");

        return Lazy.Yield(2, () => {
        
        Console.WriteLine("Yielded 2");

        return Lazy.Yield(3, () => {

        Console.WriteLine("Finished");

        return null; }); }); });
    }

    static void Main(string[] args)
    {
        foreach (int n in Lazy.Enumerate<int>(SomeNumbers))
        {
            Console.WriteLine(n);
        }
    }
}

By messing with the nesting of my curly braces, I've made it look like the original, but really it's made of three nested lambdas. So this version of SomeNumbers is, deep breath... a function that returns a function that returns a function that returns a function.

Each returned function supplies the code to execute for the next step.

The main remaining ingredient is a helper function Lazy.Enumerate that turns our strange contraption into a plain IEnumerable, so we can loop through it conveniently.

public static class Lazy
{
    public class Yielding<T>
    {
        public readonly T Result;
        public readonly Func<Yielding<T>> Next;

        public Yielding(T result)
        {
            Result = result;
            Next = null;
        }

        public Yielding(T result, Func<Yielding<T>> next)
        {
            Result = result;
            Next = next;
        }
    }

    public static Yielding<T> Yield<T>(T value, Func<Yielding<T>> next)
    {
        return new Yielding<T>(value, next);
    }

    public static Yielding<T> Yield<T>(T value)
    {
        return new Yielding<T>(value);
    }

    public class Seq<T> : IEnumerable<T>
    {
        private readonly Func<Yielding<T>> _generator;

        public Seq(Func<Yielding<T>> generator)
        {
            _generator = generator;
        }

        private class Iter : IEnumerator<T>
        {
            private Func<Yielding<T>> _generator;

            public Iter(Func<Yielding<T>> generator)
            {
                _generator = generator;
            }

            public T Current { get; set; }

            public void Dispose() { }

            object System.Collections.IEnumerator.Current
            {
                get { return Current; }
            }

            public bool MoveNext()
            {
                if (_generator == null)
                    return false;

                Yielding<T> yielding = _generator();
                if (yielding == null)
                {
                    _generator = null;
                    return false;
                }

                Current = yielding.Result;
                _generator = yielding.Next;
                return true;
            }

            public void Reset()
            {
                throw new NotImplementedException();
            }
        }

        public IEnumerator<T> GetEnumerator()
        {
            return new Iter(_generator);
        }

        System.Collections.IEnumerator 
            System.Collections.IEnumerable.GetEnumerator()
        {
            return GetEnumerator();
        }
    }

    public static IEnumerable<T> Enumerate<T>(
                      Func<Yielding<T>> generator)
    {
        return new Seq<T>(generator);
    }
}

Most of the code is "boilerplate" implementation of IEnumerable/IEnumerator. The important bit is the MoveNext method, which calls the generator method to get the next value and the next generator method.

So what's missing? The major thing (aside from the misery of getting the syntax right, closing all the right brackets, etc.) is the lack of try/finally support, which turns out to be extremely useful. We could add support for that, however. Firstly, we'd add this member to Yielding.

public readonly Action Finally;

The generator code would initialize that field to whatever action it liked, to represent the finally block, before returning the Yielding instance. And Lazy.Seq.Iter would store it so it could execute it before retrieving the next value, and it would also execute it from the Dispose method, so that the Finally action would run even if the loop was abandoned.

Friday, 22 May 2009

VC++16 has decltype in Beta 1 (so Linq to C++0x looks a little nicer)

This post has moved to my new(er) blog and can be found here: http://smellegantcode.wordpress.com/2009/05/22/vc16-has-decltype-in-beta-1-so-linq-to-c0x-looks-a-little-nicer/

A while ago I wrote up a little demo of using lambdas to implement convenient lazy list comprehension in C++0x, but at the time the VC++ compiler I was using lacked decltype, and I found a need for it.

I just tried out the Beta 1 that was released this week, and it has decltype, so here's the updated sample:

#include <iostream>
#include <vector>
#include <sstream>

// Some helpers for working with decltype 
// (maybe these are in std:: somewhere?)
template <class T>
T *make_ptr() { return (T *)0; }

template <class T>
const T &make_ref() { return *(make_ptr<T>()); }

// so you don't need to obtain boost::lexical_cast!
template <class T>
std::string to_string(const T &v)
{
    std::ostringstream s;
    s << v;
    return s.str();
}

// Unary function that always returns true
template <class T>
struct always_true
{
    bool operator() (const T &) const { return true; }
};

// Unary function that returns its argument
template <class T>
struct pass_thru
{
    T operator() (const T &e) const { return e; }
};

template <class TElemFrom, 
          class TElemTo, 
          class TIterFrom, 
          class TSelector,
          class TPredicate>
class filter
{
    TIterFrom from_;
    TIterFrom end_;
    const TSelector &selector_;
    const TPredicate &predicate_;

public:
    filter(TIterFrom from, TIterFrom end, 
             const TSelector &selector, 
             const TPredicate &predicate)
        : from_(from), end_(end), 
          selector_(selector), 
          predicate_(predicate) {}

    class const_iterator
    {
        TIterFrom from_;
        TIterFrom end_;
        const TSelector &selector_;
        const TPredicate &predicate_;

        void locate()
        {
            while (!done())
            {
                if (predicate_(*from_))
                    return;

                ++from_;
            }
        }

        bool done() const { return (from_ == end_); }

    public:
        const_iterator(TIterFrom from, TIterFrom end, 
                       const TSelector &selector,
                       const TPredicate &predicate)
            : from_(from), end_(end), 
              selector_(selector),
              predicate_(predicate) { locate(); }

        TElemTo operator*() const { return selector_(*from_); }

        const_iterator operator++()
        {
            ++from_;
            locate();
            return *this;
        }

        bool operator==(const const_iterator &other) const
            { return done() && other.done(); }

        bool operator!=(const const_iterator &other) const
            { return !done() || !other.done(); }
    };

    typedef TElemFrom value_type;

    const_iterator begin() const 
        { return const_iterator(from_, end_, selector_, predicate_); }

    const_iterator end() const 
        { return const_iterator(end_, end_, selector_, predicate_); }

    template <class TSelector>
    filter<TElemTo, 
             decltype(make_ref<TSelector>()(make_ref<TElemTo>())),
             const_iterator, 
             TSelector,
             always_true<TElemTo> >
        select(const TSelector &selector)
    {
        return filter<TElemTo, 
            decltype(make_ref<TSelector>()(make_ref<TElemTo>())), 
                      const_iterator, 
                      TSelector, 
                      always_true<TElemTo> >
                    (begin(), 
                     end(), 
                     selector, 
                     always_true<TElemTo>());
    }

    template <class TPredicate>
    filter<TElemTo, 
              TElemTo,
             const_iterator, 
             pass_thru<TElemTo>, 
             TPredicate> 
        where(const TPredicate &predicate)
    {
        return filter<TElemTo, 
                        TElemTo,
                        const_iterator, 
                        pass_thru<TElemTo>, 
                        TPredicate>
                    (begin(),
                     end(),
                     pass_thru<TElemTo>(), 
                     predicate);
    }
};

template <class TCollFrom>
filter<typename TCollFrom::value_type, 
         typename TCollFrom::value_type,
         typename TCollFrom::const_iterator, 
         pass_thru<typename TCollFrom::value_type>,
         always_true<typename TCollFrom::value_type> >
    from(const TCollFrom &from)
{
    return filter<typename TCollFrom::value_type, 
                    typename TCollFrom::value_type, 
                    typename TCollFrom::const_iterator, 
                    pass_thru<typename TCollFrom::value_type>, 
                    always_true<typename TCollFrom::value_type> >
                (from.begin(), 
                 from.end(), 
                 pass_thru<typename TCollFrom::value_type>(), 
                 always_true<typename TCollFrom::value_type>());
}

int main(int argc, char* argv[])
{
    std::vector<int> vecInts;
    vecInts.push_back(5);
    vecInts.push_back(2);
    vecInts.push_back(9);

    auto filtered = from(vecInts)
        .where([] (int n) { return n > 3; })
        .select([] (int n) { return "\"" + to_string(n) + "\""; });

    for (auto i = filtered.begin(); i != filtered.end(); ++i)
        std::cout << *i << std::endl;

    return 0;
}

The main function is the pay-off, of course. I have a vector of ints, and I wrap it something that ends up stored in a variable called filtered, which I can then iterate through. The filtering (discarding anything not greater than 3) and transforming (int to quoted string) of the items happens on-the-fly during that iteration.

(And to clarify, this is different from the last time I posted it because select no longer requires any type parameters to be explicitly given - thanks to decltype).

Thursday, 14 May 2009

Recovery

If you want your application to be able to recover from a crash, it would NOT be ideal for it to restore the exact state it was in immediately before the crash, because then it would just crash again.

(Same goes for economies, presumably).

Monday, 11 May 2009

When to use Stored Procedures?

No coding blog would be complete without an overly-simplistic salvo fired into this bloody online battlefield, so here's mine.

What if you're writing an application that will make heavy use of an RDBMS?

If you're writing an packaged app for sale to customers who want to run it on the RDBMS of their choice, don't use stored procedures.

But what if you're writing something bespoke?

Maybe you've never in practice had to change RDBMS in mid-project. But maybe that's because you automatically ruled it out as an impossibility - how do you know what advantages you might have had, what client license bundle offers you could have benefitted from, if you maintained the ability to switch RDBMS vendors on a whim? Competition pressure is what keeps vendors competitive, and competition pressure disappears if customers lose the ability to switch. The vendors know this, which is why they want to lock you in. So if you want to be able to get the best deal out of RDBMS vendors, don't use stored procedures.

But what if you want to have a layered architecture?

Adding more languages doesn't enable more layers. It just adds more development cost. You can write a data access abstraction layer in any language - and in Java and .NET, open source toolkits like [n]Hibernate already make this child's play, largely eliminating the need to generate your own SQL strings or DDL.

If you want to reduce development costs by keeping the number of language skills required to maintain an app lower, by writing all business logic and data layer logic in one language, don't use stored procedures.

But what if you're prepared to take on extra cost because you want to optimise by moving execution into the RDBMS?

If you haven't yet done any comparative profiling to establish where such optimisation is really needed or makes any measurable difference, don't use stored procedures.

... otherwise, knock yourself out!

Sunday, 29 March 2009

General form of C# Object Initializers

C# has a language feature that allows several properties of an object to be assigned to as a suffix to a new expression, namely the object initializer:

var myObj = new MyClass
            {
                SomeProperty = 5,
                Another = true,
                Complain = str => MessageBox.Show(str),
            };

As properties can have hand-coded setters, this is an opportunity to call several methods on the newly constructed object, without having to make each method return the same object.

The limitations on property setters are:

They can only accept one argument
They cannot be generic

I would like it if we could call methods and enlist in events, as well as assign to properties, inside an object initializer block.

var myObj = new MyClass
            {
                SomeProperty = 5,
                Another = true,
                Complain = str => MessageBox.Show(str),
                DoSomething(),
                Click += (se, ev) => MessageBox.Show("Clicked!"),
            };

And why should such a block of modifications only be applicable immediately after construction? We could have:

myObj with
{
    SomeProperty = 5,
    Another = true,
    Complain = str => MessageBox.Show(str),
    DoSomething(),
    Click += (se, ev) => MessageBox.Show("Clicked!"),
}

The with would be a new keyword that operates on an object of some type and produces the same object and type - note that this would be an expression, not a statement.

So you could use initializer-style syntax regardless of whether you'd got the object from a new expression or from an IOC or factory method, etc.

In fact you could use with after a complete new and it would be equivalent to the current style of object initializer:

var myObj = new MyClass() with
            {
                SomeProperty = 5,
                Another = true,
                Complain = str => MessageBox.Show(str),
                DoSomething(),
                Click += (se, ev) => MessageBox.Show("Clicked!")
            };

I mused about this in a Stack Overflow answer, and Charlie Flowers pointed out something I should have realised immediately - we can implement with as an extension method.

public static T With(this T with, Action<T> action)
{
    if (with != null)
        action(with);
    return with;
}

Equivalent of normal object initializer, but with event enlisting:

var myObj = new MyClass().With(w =>
            {
                w.SomeProperty = 5;
                w.Another = true;
                w.Click += (se, ev) => MessageBox.Show("Clicked!");
            };

And on a factory method instead of a new:

var myObj = Factory.Alloc().With(w =>
            {
                w.SomeProperty = 5;
                w.Another = true;
                w.Click += (se, ev) => MessageBox.Show("Clicked!");
            };

I couldn't resist giving it the "maybe monad"-style check for null as well, so if you have something that might return null, you can still apply With to it and then check it for null-ness.

Friday, 6 March 2009

Readonly Fields and Thread Safety

In C# you can mark the fields of a type as readonly (indeed you generally should if it's a struct). By doing so, you make it illegal to change the fields except in a constructor of the type.

The advantage of this is that readonly data can be shared freely between threads without any thread causing updates that may be seen "in progress" by other threads, simply because there never will be any updates.

But is this strictly speaking true? Nope. It would be true if it were impossible for another thread to gain access to a partially constructed object. This seems true at first because the constructor of an object effectively "returns" a reference to itself via the new operator, which then becomes available to the caller of new, at which point the object is available to be passed around and is also fully constructed.

But a constructor can call methods, and it can pass this to methods. Those methods could pass that information to other threads. So you have to be careful what you do in a constructor. Maybe the compiler will help to check this somehow in a future version of the language.

This ties in nicely with some general advice about constructors that Brad Abrams gives in the Annotated C# Programming Language: do as little as possible in the constructor, and do everything else "lazily" - that is, the first time you need it. If you find yourself requiring non-readonly fields to achieve that, then you potentially have a problem with your design, because your class as a whole is not going to be readonly and so will not be trivially shareable in the way you originally assumed.

This is an EX blog...

I have moved!