I have moved!

I've moved my blog
CLICK HERE

Wednesday 9 April 2008

Coroutines with IEnumerable and yield return

There was a brief flurry of excitement about the fancy yield return syntax when the C# 2.0 specification was first published some years ago, and the idea of it being used for creating coroutines.

This quickly died down, as far as I can see anyway. Either people forgot the idea, or it became such boringly standard practice that no one ever thinks to discuss it anymore. I don't know which, but I think it deserves more attention either way.

There were a few people at the time saying that it was a shame that C# appeared to only support coroutines in a special case where the function returns some weird IEnumerable<T> thing, and wouldn't it have been better if it was more general?

But that's a misunderstanding. The caller that drives the execution of the coroutine needs some uniform way of doing so, and also of getting results as they become available. So an enumeration is in fact an ideal and general interface to a coroutine. Nor is it necessary to use the enumerated results as the "actual" results of the coroutine, as I will soon demonstrate.

Aside: A word about threads. Obviously using a background thread is the most general solution, but it can be difficult to get right (or difficult to know when you've got it wrong, which amounts to the same thing). It also requires the results to be marshalled to the foreground thread because most GUI frameworks can't cope with direct calls from multiple threads.

The single-threaded approach requires you to break your long-running algorithm into little slices that can be performed one at a time. This may be impossible, if the long waits are caused by going out to some 3rd party library function that you don't have the ability to break up into multiple short-lived calls.

Supposing you own the code for a lengthy algorithm, and you want to break it up into steps that can be executed one at a time, allowing you to keep interacting with the user. How should you go about doing it? It's not fun to turn a function inside out and convert it into a state machine. The end result is often harder to understand and maintain. But this is where yield return comes to the rescue.

An example: suppose my very complicated algorithm is going to fill in a list with the numbers from 1 to a million. Here's the simple version:

void Fill(ListView listView)
{
    for (int n = 1; n <= Million; n++)
    {
        listView.Items.Add(n.ToString());
    }
}
 
And I might call it like this:
 
Fill(myListView);
 
Which is fine except that the user interface locks up for about ten seconds until it's done. So here's a different way of writing it:
 
IEnumerable<double> Fill(ListView listView)
{
    for (int n = 1; n <= Million; n++)
    {
        listView.Items.Add(n.ToString());
        yield return (double)n / (double)Million;
    }
}
 

It returns an IEnumerable and it spits out a double each time it goes round the loop. The doubles actually just indicate how far through the operation we are, ranging from 0 to 1. This is what I mean by not using the enumerated values as the "actual" results of the coroutine. The actual results are whatever we do with the parameters to the coroutine - in this case, we put the results in a ListView.

By emitting doubles between 0 and 1 that indicate the progress, we have a completely general interface to a coroutine. Of course, you could just yield zero or a random number each time - it just means you lose out on a nice feature as we'll see in a moment.

Now we need some way of starting it running in parallel, a little like launching a thread. This is what my CoroutineOnEnumerable class is for:

// Using a coroutine
var listFiller = new CoroutineOnEnumerable();

listFiller.Start(Fill(myListView));

We can also insert an extra step in that snippet to tie the listFiller object to a progress bar to show the user how far we've progressed:

// Using a coroutine
var listFiller = new CoroutineOnEnumerable();

// Attach to progress bar
listFiller.Progressed += (s, e) =>
        myProgressBar.Value = listFiller.Progress;

listFiller.Start(Fill(myListView));

Each time the routine progresses, we set the Value of myProgressBar, which of course has a range of 0 to 1. This is the only reason why you need to care about what values you yield during your coroutine's execution. They supply the value of the Progress property. If you aren't going to use that to indicate progress to the user, you can just say yield return 0. That's what I do when the coroutine has no easy way to figure out how far it has gone (for example, recursively scanning a tree structure such as the file system).

You can also attach a handler for when it finishes, as it has a Completed event (this fires when the coroutine actually finishes, not when it yields a 1).

If the caller wants to abandon the operation (for example, the user got tired of waiting and pressed a Cancel button), you should call the Stop method on your CoroutineOnEnumerable instance (or alternatively, call the standard Dispose method - both have the same effect). If you don't do this then any finally blocks within the coroutine will never run.

It also has an adjustable MinimumTimeSlice, which is the minimum number of milliseconds that it will spend executing the routine before actually yielding control to the environment. This is important - if we actually yielded control back to the environment every time yield return in the coroutine, it could run very slowly. Note that this is only a minimum limit, not a maximum - this is cooperative multithreading, and is dependent on the coroutine calling yield return at regular intervals.

Here's CoroutineOnEnumerable.cs:
using System;
using System.Windows.Threading;
using System.Collections.Generic;

namespace WpfApplication1
{
    public class CoroutineOnEnumerable : IDisposable
    {
        private DispatcherTimer m_timer;
        private IEnumerator<double> m_routine;
        private double m_progress;

        public event EventHandler Progressed, Completed;
        public double MinimumTimeSlice { get; set; }

        public CoroutineOnEnumerable()
        {
            MinimumTimeSlice = 100;
        }

        public bool Busy
        {
            get { return m_timer != null; }
        }

        public double Progress
        {
            get { return m_progress; }

            set
            {
                if (m_progress != value)
                {
                    m_progress = value;

                    if (Progressed != null)
                        Progressed(this, EventArgs.Empty);
                }
            }
        }

        public void Start(IEnumerable<double> routine)
        {
            Stop();

            m_routine = routine.GetEnumerator();

            m_timer = new DispatcherTimer(DispatcherPriority.Normal,
                                          Dispatcher.CurrentDispatcher);
            m_timer.Interval = TimeSpan.FromMilliseconds(1);
            m_timer.Tick += OnTick;
            m_timer.Start();
        }

        public void Stop()
        {
            if (m_timer != null)
            {
                m_timer.Stop();
                m_timer = null;
            }

            if (m_routine != null)
            {
                IDisposable disp = m_routine as IDisposable;
                if (disp != null)
                    disp.Dispose();
            }
        }

        void OnTick(object sender, EventArgs e)
        {
            if (m_routine != null)
            {
                System.Diagnostics.Stopwatch stopWatch = new System.Diagnostics.Stopwatch();
                stopWatch.Start();

                while (stopWatch.ElapsedMilliseconds < MinimumTimeSlice)
                {
                    if (!m_routine.MoveNext())
                    {
                        Stop();
                        Progress = 1;
                        if (Completed != null)
                            Completed(this, EventArgs.Empty);

                        return;
                    }
                }

                Progress = m_routine.Current;
            }
        }

        public void Dispose()
        {
            Stop();
        }
    }
}

And here's the example app code that you can paste into a Wizard generated WPF application. Firstly the XAML in Window1.xaml:

<Window x:Class="WpfApplication1.Window1"
 xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
 xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
 Title="Window1" Height="300" Width="300">
    <Grid>
        <Grid.RowDefinitions>
            <RowDefinition Height="20"/>
            <RowDefinition/>
        </Grid.RowDefinitions>
        <ProgressBar Grid.Row="0" Name="myProgressBar" Minimum="0" Maximum="1"/>
        <ListView Grid.Row="1" Name="myListView"/>
    </Grid>
</Window>
And the code-behind in Window1.xaml.cs:
using System.Collections.Generic;
using System.Windows;
using System.Windows.Controls;

namespace WpfApplication1
{
    /// <summary>
    /// Interaction logic for Window1.xaml
    /// </summary>
    public partial class Window1 : Window
    {
        const int Million = 1000000;

        public Window1()
        {
            InitializeComponent();

            // Fill(myListView);   -- Traditional subroutine

            // Using a coroutine
            var listFiller = new CoroutineOnEnumerable();

            // Attach to progress bar
            listFiller.Progressed += (s, e) =>
                    myProgressBar.Value = listFiller.Progress;

            listFiller.Start(Fill(myListView));
        }

        /* The traditional subroutine definition

        void Fill(ListView listView)
        {
            for (int n = 1; n <= Million; n++)
            {
                listView.Items.Add(n.ToString());
            }
        }
        */
        
        // The coroutine definition
        IEnumerable<double> Fill(ListView listView)
        {
            for (int n = 1; n <= Million; n++)
            {
                listView.Items.Add(n.ToString());
                yield return (double)n / (double)Million;
            }
        }
    }
}

Speaking as a long-time suffering of thread anxiety, this built-in support for coroutines is reason enough for me to prefer C# over any other mainstream language.

2 comments:

blessYAHU said...

Nice article and possibly an answer to a question I had about Coroutines. Other languages (like smalltalk) praise Coroutines for the ability to pause one process/method and run another midstream. How would this be done with what you have explained here? Make the method call on the yield statement?

Daniel said...

The scheduler class here takes advantage of WPF's message pump (via the timer) to tell it when to do some work. So you can start several of them running simultaneously. Also you could add a Pause property, and it would stop doing work while Pause was true.

Alternatively you could have a more sophisticated scheduler designed to manage the interaction of several coroutines. For simplicity you could make it a singleton so any coroutine can access it. Then you could implement a class that would be analgous to a mutex, which would allow any coroutine to pause any other coroutine based on attempt to lock the same mutex. In other words, you could reinvent the whole set of threading primitives if you wanted to.