I have moved!

I've moved my blog
CLICK HERE

Thursday 26 February 2009

Fatal Exceptions, and Why VB.NET Has a Purpose!

There’s a feature in the CLR that is exposed in VB.NET and not in C#. Apparently this is a deliberate decision made by the C# team, which so far they’ve stuck to. Unfortunately, it’s a pretty important feature.

I was prompted by Andrew Pardoe of the CLR team to look more closely at this when considering try/finally and the problem of “fatal” exceptions. By fatal, I mean an exception that you have no idea how to handle. A good example is NullReferenceException, which almost certainly indicates a bug. If there’s a bug, your program is in an unknown state, and you need to capture that state exactly as it is. Ideally the program would terminate at that point. In the interests of flexibility, the CLR does allow you to catch such exceptions. (Then the CLR team tells you that you shouldn’t – it’s a “fruit of the tree of knowledge” type of situation.)

Very often a method will throw a variety of exceptions to indicate that it was unable to finish what it was supposed to do, but it ain’t the end of the world. These are recoverable exceptions. A simple example is FileInfo.Delete:

string errorMessage = null;

try
{
    new FileInfo(@"c:\myfile.txt").Delete();
}
catch (SecurityException x)
{
    errorMessage = x.Message;
}
catch (IOException x)
{
    errorMessage = x.Message;
}
catch (UnauthorizedAccessException x)
{
    errorMessage = x.Message;
}

if (errorMessage != null)
    MessageBox.Show("You idiot - " + errorMessage);

Now, there’s a slight problem, which we’ll gloss over – how do you know what exceptions a method will throw? Ultimately, you need accurate, reliable, up-to-date documentation for every method. But instead of waiting for hell to freeze over, let’s move on.

When you have an operation that performs many such calls to various APIs, 3rd party libraries and so on, the ideal place to put the handler is at the outermost layer where you can recover. For example, the user presses the button and you run a lot of code to do what the user has asked. The best place to put the try/catch is around that whole sequence of operations, so that the catch can display the error to the user, and so they can figure out why they’re such an idiot. Inside that operation, you need to use try/finally (and related constructs) to ensure that any half-finished stuff is always properly undone. Nice and simple.

Except for one slight irritation: now the set of exceptions you need to catch is the union of all the recoverable exceptions that might be thrown by all the operations performed within that combined operation. This means that your try/catch block potentially has a ridiculously long list of types. It’s also very easy to miss an exception type here or there, which would mean that your program would crash when it didn’t need to. Whenever someone performs maintenance on the big operation (or any of the APIs or 3rd party libraries it calls on to), the huge list of catch blocks needs to be updated to ensure nothing is missed.

Also this all needs to be done separately for every try/catch statement. So you might think of reducing this hideous boilerplate repetition by writing it once:

public static bool Recover(Action action, Action<Exception> recover)
{
    try
    {
        action();
        return true;
    }
    catch (SecurityException x)
    {
        recover(x);
    }
    catch (IOException x)
    {
        recover(x);
    }
    catch (UnauthorizedAccessException x)
    {
        recover(x);
    }
    //others...

    return false;
}

You can now write nice neat lambdas to provide the code for the action to try and the single recovery handler. But you still have the same major problem: this long list of all the recoverable exceptions has to include every single one that might be thrown anywhere in any operation.

What if your product is extensible, such that 3rd parties can write plug-ins for it? How do they add their own recoverable exceptions to the Recover function? They simply can’t. And even if they could, what if they don’t remember to? How does your application protect itself?

So you have explosive complexity and apparently no way to control it.

At this point, the following quite reasonable thought occurs to you: surely the number of fatal exceptions is smaller than the number of non-fatal ones. Indeed, the number of non-fatal ones is constantly growing as we create new exception classes of our own. Also, if I make any mistakes in constructing my software, I’d prefer to not crash rather than crash.

The natural conclusion, which therefore naturally occurs to everyone eventually, is to do this:

try
{
    new FileInfo(@"c:\myfile.txt").Delete();
}
catch (Exception x)
{
    MessageBox.Show("Couldn't delete file: " + x.Message);
}

But then you remember that there’s such a thing as a fatal exception. Although you’d generally prefer your software not to crash, in the event of a definite bug, it is better for it to crash – that way, you get an accurate snapshot of the program’s state. So you need to examine the exception object before you try to handle it:

try
{
    new FileInfo(@"c:\myfile.txt").Delete();
}
catch (Exception x)
{
    if (!ExceptionPolicy.IsExceptionRecoverable(x))
        throw;

    MessageBox.Show("Couldn't delete file: " + x.Message);
}
    

Of course, that IsExceptionRecoverable method has to be written by you, but only once, and it probably has a relatively short list of exception types to look out for, and that list doesn’t grow so fast as the list of recoverable exceptions.

The above solution represents the state of the art according to the Microsoft Enterprise Library. Unfortunately, it’s still not right. The problem is to do with the thing I mentioned above about how to correctly write the code within each long and complex operation:

Inside that operation, you need to use try/finally (and related constructs) to ensure that any half-finished stuff is always properly undone.

The problem is, you only want the finally blocks to run for recoverable exceptions, not fatal ones. To repeat what I said right at the start:

If there’s a bug, your program is in an unknown state, and you need to capture that state exactly as it is.

So you definitely don’t want to run any finally blocks before that state is captured. They might modify it (in fact, that’s the whole point of them). They might make it worse, cause further corruption, or even throw another exception and so hide the original bug.

My first thought when this occurred to me was that try/finally presents a serious problem. How can we make it execute the finally block for a recoverable exception but not for a fatal exception?

Well, we can, sort of. If you don’t ever catch an exception, the finally blocks never run. When an exception is thrown, the CLR does not immediately run the finally blocks. First, it checks to see if there is a suitable catch in the stack for the current exception. If there is, and only if there is, it runs the enclosed finally blocks, and then runs the catch block.

This is why the correct advice is to never catch fatal exceptions, because merely catching them will trigger the silent execution of finally blocks when the program is already in an unrecoverable situation.

But this is sadly no good to us, because we’ve already established that explicitly catching only the recoverable exception types will lead us to a situation of hideously and unmanageable complexity, and programs that crash when they don’t need to. It’s just not practical to work that way.

The problem all stems from the fact that catch blocks can only decide to run based on the type of the exception, and this has to be an “opt in”. It can’t be coded the other way round, as an “opt out”. Yes, exceptions can be arranged in hierarchies and so that one catch block can catch all exception types derived from some base class, but this turns out to be useless because there’s no base class that all recoverable exceptions have in common, except for the class which is also the base class of all fatal exceptions as well.

So we cry: if only the CLR allowed us to run some code of our own to decide whether to catch an exception, before the finally blocks run.

The surprising answer (to a C# programmer) is that the CLR does allow us to do that. It just isn’t made available in C#. Only in VB.NET. Devastating, huh?

So to do exception handling in C#, in a way that is actually manageable, but also doesn’t execute code after a fatal exception, you first have to crank up VB.NET and create a class library containing this class:

Imports System

Public Class Exceptions
    Public Shared Sub TryFilterCatch(ByVal tryAction As Action, _
            ByVal filter As Func(Of Exception, Boolean), _
            ByVal handler As Action(Of Exception))
        Try
            tryAction()
        Catch ex As Exception When filter(ex)
            handler(ex)
        End Try
    End Sub
End Class

This effectively turns the magic feature of VB.NET – the When clause that appears next to the Catch – into something we can reuse in any CLR-based language. Having built that, you can now go back to C#… phew!

Now, supposing you still have that IsExceptionRecoverable method handy, you can do this:

Exceptions.TryFilterCatch(() =>
{
    new FileInfo(@"c:\myfile.txt").Delete();
},
ExceptionPolicy.IsExceptionRecoverable, x =>
{
    MessageBox.Show("Couldn't delete file: " + x.Message);
});

Given that we are effectively forced to this conclusion by the above reasoning process, it’s very strange to look at where we’ve ended up. Essentially, if you want a manageable, fault-tolerant, practical approach to exception handling, the above example is what you need to use, instead of using C# try/catch statements.

How does this situation persist? Why isn’t general exception filtering exposed in C#, as it is in VB.NET? These are very good questions. It seems that the C# team doesn’t accept the line of reasoning I gave above. According to them, we should all write long and unmanageable lists of catch handlers, explicitly listing the recoverable exceptions, based on the apparently trustworthy documentation of every software component we ever call into.

The CLR team, on the other hand, is well aware of this problem, and is aware that people are forced to try an alternative, and so inevitably make their way down the chain of reasoning I’ve described here, and the Microsoft Enterprise Library is probably representative of where most people stop in that chain.

The earliest reference I’ve found for all this stuff is here. It covers all the technical details, and mentions the idea of exposing VB.NET’s filtering capability.

This is the blog post that Andrew Pardoe pointed me to. The more I think about it, the more I think it’s a message to the C# team, rather than to users. The chances of many users hearing the message is low, and the chances of them understanding the implications is probably a little lower still. They must be hoping that the C# team will hear it and understand it. I guess C# is a customer of CLR, so the C# team is “always right” in that relationship. Consequently the message will need to come from customers of the C# team!

In the meantime, the CLR team is doing something about this problem:

In version 4 of the CLR, the product team is making exceptions that indicate a corrupted process state distinct from all other exceptions.

This is only part of the solution, of course, because in some situations our own exception types can indicate that the program is in an invalid state. The “corruption” is not of the low-level, random byte trashing kind, but it is corruption by a bug, nonetheless. So we need the recoverability criterion to be specified in a customizable way for each try/catch situation.

This means that, even after version 4 of the CLR, we will still need that snippet of VB code. Until the C# team catch-es up with the VB.NET team.

No comments: