I have moved!

I've moved my blog
CLICK HERE

Thursday 29 January 2009

Using Collection Initializers with Immutable Lists

The Incomparable Mr Skeet mentions how irksome it is that collection initializers assume that the collection is mutable.

Suppose we defined an "immutable-yet-initializable collection" as any type that supports this interface:

public interface IImmutableCollection<TCollection, TItem> : IEnumerable<TItem>
       where TCollection : IImmutableCollection<TCollection, TItem>, new()
{
    TCollection Add(TItem item);
}

In other words, it can be enumerated, it can be default-constructed (to get an empty list) and it has an Add method that returns a new list with the item added to it.

And suppose our simple demo implementation of that is as follows:

public class SimpleImmutableList<TItem> : 
             IImmutableCollection<SimpleImmutableList<TItem>, TItem>
{
    private SimpleImmutableList<TItem> _next;

    private TItem _head;

    public SimpleImmutableList<TItem> Add(TItem item)
    {
        return new SimpleImmutableList<TItem>
                   {
                       _next = this,
                       _head = item
                   };
    }

    public IEnumerator<TItem> GetEnumerator()
    {
        for (SimpleImmutableList<TItem> i = this; 
             i._next != null; i = i._next)
            yield return i._head;
    }

    System.Collections.IEnumerator 
           System.Collections.IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

Not very sophisticated, doesn't attempt to solve the well-known "OMG! my list is backward" problem. But anyway. Our problem is that we want this delightful experience:

List<string> mutableList = 
        new List<string>
                   {
                       "First",
                       "Second",
                       "Third"
                   };
Debug.Assert(mutableList.Count() == 3);
Debug.Assert(mutableList.ElementAt(2) == "Third");

Except that we want to make a SimpleImmutableList<string> instead of a List<string>. Well, we can get pretty close:

SimpleImmutableList<string> immutableList = 
        new Initializer<SimpleImmutableList<string>, string>
                    {
                        "First",
                        "Second",
                        "Third"
                    }.Result;
Debug.Assert(immutableList.Count() == 3);
Debug.Assert(immutableList.ElementAt(2) == "First"); // OMG! etc

I've picked out in red ink the extra noise we have to add, but it's not that bad. The enabling thing is that Initializer class:

public class Initializer<TCollection, TItem>  : IEnumerable<TItem>
       where TCollection : IImmutableCollection<TCollection, TItem>, 
                           new()
{
    public Initializer()
    {
        Result = new TCollection();
    }

    public void Add(TItem item)
    {
        Result = Result.Add(item);
    }

    public TCollection Result { get; private set; }

    public IEnumerator<TItem> GetEnumerator()
    {
        return Result.GetEnumerator();
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return Result.GetEnumerator();
    }
}

So if anyone tells you that it is impossible to initialize an immutable collection with a collection initializer, tell them about the time you met a crazy old man with some wild ideas about an Initializer class, and see what they say1.

1. "Get out of my house".

Monday 26 January 2009

Linq to C++0x

Updated 2009-01-28 - completely different approach to get a fluent interface.

Updated 2009-05-22 - See new post with decltype goodness!

C++ has long had some functional-styled things in its standard library, in particular a header file called <functional>. But actually using it in a functional way has been painful, largely due to the lack of lambdas.

But C++0x (very good Wikipedia page, by the way) solves this, amongst many other long-standing problems.

With lambdas in place, the next thing I naturally wonder is: what about a complete system of lazy-evaluated list processing, which is to say: what about Linq to Objects in C++?

The C++ equivalent of IEnumerable<T> is a collection of T. Another good thing in C++0x is concepts, which allows us to state for the first time precisely what we mean by a collection in C++. I don't yet have a compiler that supports concepts, so I'll just say that for the sake of argument, a collection is anything that has a nested type called const_iterator and two const methods called begin and end that return instances of that const_iterator type. (I'm using const iterators because I want to support functional style programming, and support for immutability is one area where C++ still leaves C# in the dust).

The iterator is the equivalent of IEnumerator<T>, and must have an * operator so we can retrieve the current item, and a ++ operator so we can navigate to the next item. We discover if we're at the end by comparing the iterator with the instance returned by end, so iterators must also have equality comparison.

Just as a minimal demonstration, I want to recreate the Select and Where extension methods. They don't actually use any new C++0x features, but as we'll see, they become much more pleasant to use as a result of lambdas.

In order to share as much code as I can, I'll create a single new container-like class template called filter, which simulates a container by borrowing the contents of another container. It will have built-in features to support skipping over some elements, and to transform elements to new values (not necessarily of the same type).

Because of this approach, I'll need a way to switch off these features when I don't need them. When deciding whether to skip over an element, we will use a predicate, which is a function that takes some value and returns a bool. To switch off that feature, we'd simply pass a predicate that always returns true:

template <class T>
struct always_true
{
    bool operator() (const T &) const { return true; }
};

Similarly to switch off element transformation we need a function that just passes through whatever it is given:

template <class T>
struct pass_thru
{
    T operator() (const T &e) const { return e; }
};

I could of course re-write these simple functions as lambdas wherever I need them, but as I need them in a couple of places it makes sense to name them in the traditional way.

And now here's the filter class template.

template <class TElemFrom, 
          class TElemTo, 
          class TIterFrom, 
          class TSelector,
          class TPredicate>
class filter
{
    TIterFrom from_;
    TIterFrom end_;
    const TSelector &selector_;
    const TPredicate &predicate_;

public:
    filter(TIterFrom from, TIterFrom end, 
             const TSelector &selector, 
             const TPredicate &predicate)
        : from_(from), end_(end), 
          selector_(selector), 
          predicate_(predicate) {}

    class const_iterator
    {
        TIterFrom from_;
        TIterFrom end_;
        const TSelector &selector_;
        const TPredicate &predicate_;

        void locate()
        {
            while (!done())
            {
                if (predicate_(*from_))
                    return;

                ++from_;
            }
        }

        bool done() const { return (from_ == end_); }

    public:
        const_iterator(TIterFrom from, TIterFrom end, 
                       const TSelector &selector,
                       const TPredicate &predicate)
            : from_(from), end_(end), 
              selector_(selector),
              predicate_(predicate) { locate(); }

        TElemTo operator*() const { return selector_(*from_); }

        const_iterator operator++()
        {
            ++from_;
            locate();
            return *this;
        }

        bool operator==(const const_iterator &other) const
            { return done() && other.done(); }

        bool operator!=(const const_iterator &other) const
            { return !done() || !other.done(); }
    };

    typedef TElemFrom value_type;

    const_iterator begin() const 
        { return const_iterator(from_, end_, selector_, predicate_); }

    const_iterator end() const 
        { return const_iterator(end_, end_, selector_, predicate_); }

    template <class TElemTo2, class TSelector>
    filter<TElemTo, 
             TElemTo2,
             const_iterator, 
             TSelector,
             always_true<TElemTo> >
        select(const TSelector &selector)
    {
        return filter<TElemTo, 
                        TElemTo2, 
                        const_iterator, 
                        TSelector, 
                        always_true<TElemTo> >
                    (begin(), 
                     end(), 
                     selector, 
                     always_true<TElemTo>());
    }

    template <class TPredicate>
    filter<TElemTo, 
              TElemTo,
             const_iterator, 
             pass_thru<TElemTo>, 
             TPredicate> 
        where(const TPredicate &predicate)
    {
        return filter<TElemTo, 
                        TElemTo,
                        const_iterator, 
                        pass_thru<TElemTo>, 
                        TPredicate>
                    (begin(),
                     end(),
                     pass_thru<TElemTo>(), 
                     predicate);
    }
};

All the real work is done by the nested const_iterator class, which performs skipping using the predicate in the ++ operator and performs transformation using the 'selector' in the * operator.

It's quite an unwieldy class to construct, what with all those type parameters and actual parameters, but we can make it much easier by providing a free function called from:

template <class TCollFrom>
filter<typename TCollFrom::value_type, 
         typename TCollFrom::value_type,
         typename TCollFrom::const_iterator, 
         pass_thru<typename TCollFrom::value_type>,
         always_true<typename TCollFrom::value_type> >
    from(const TCollFrom &from)
{
    return filter<typename TCollFrom::value_type, 
                    typename TCollFrom::value_type, 
                    typename TCollFrom::const_iterator, 
                    pass_thru<typename TCollFrom::value_type>, 
                    always_true<typename TCollFrom::value_type> >
                (from.begin(), 
                 from.end(), 
                 pass_thru<typename TCollFrom::value_type>(), 
                 always_true<typename TCollFrom::value_type>());
}

This does nothing much apart from wrapping a real container inside an instance of filter, and switching off all filtering or transformation, so we can then call the where or select members. As those members also return instances of filter, this gives us a convenient "fluent interface". Those concerned about performance should be reassured that the compiler ought to be able to inline away all of this syntactical plumbing when it sees the trivial definitions of pass_thru and always_true.

For example, filter a list of integers and turn them into quoted strings:

std::vector<int> vecInts;
vecInts.push_back(5);
vecInts.push_back(2);
vecInts.push_back(9);

auto filtered = from(vecInts)
    .where([] (int n) { return n > 3; })
    .select<std::string>([] (int n)
        { return "\"" + 
                 boost::lexical_cast<std::string>(n) + 
                 "\""; });

for (auto i = filtered.begin(); i != filtered.end(); ++i)
    std::cout << *i << std::endl;

And that prints:

"5"
"9"

How is this select function different to std::transform? Because it is evaluated lazily. None of the work is done until we are in the final for loop, and I could break out of that early if I wanted to.

The one irritation is the fact that I have to tell select what the output value type will be, even though the compiler can lambda deduce it from the lambda in this case. Why can't the compiler join up the dots for us? In fact, in a full implementation of C++0x, it could, but I don't yet have access to such a compiler. Here's how I believe it would work.

First, we need the decltype keyword, which yields the compile-time type of any expression. Suppose we have something function-like of type TSelector, and we want to know what it would return when called. We also happen to know that it takes one parameter of type TElemTo (because we're taking the output of the previous filter).

So we need an instance of TSelector:

TSelector()

And then on that, we need to call it, passing it an instance of TElemTo:

TSelector()(TElemTo())

And then we want the type of that expression:

decltype(TSelector()(TElemTo()))

Hey presto, we might think: this can replace the TElemTo2 type parameter, so callers would no longer need to write select<std::string>.

But I suspect this won't work, because it's very likely that TSelector will not be default-constructible (if it is a lambda that captures by reference, for example). The trick in such situations is a make_ptr function:

template <class T>
T *make_ptr() { return (T *)0; }

It actually returns a NULL pointer of the specified type. We can then define a make_ref that turns that into a reference, and our expression becomes:

decltype(make_ref<TSelector>()(make_ref<TElemTo>()))

Of course, if that particular operation was ever executed, it would result in major crash-ola. But it never will be executed; decltype doesn't execute the expression, it just figures out the resulting type.

Then we can get rid of TElemTo2, and our example call looks like this:

auto filtered = from(vecInts)
    .where([] (int n) { return n > 3; })
    .select([] (int n) { return "\"" + 
                         boost::lexical_cast<std::string>(n) + 
                         "\""; });

I hope decltype makes it into MSVC++ version 16.