Saturday, November 14, 2015

A data flow helper class

One problem I encounter when processing lists is exception handling. I prefer to write code that "chains" calls transforming the data:

  var results = list
    .Select(DoThing1)
    .Select(DoThing2)
    // ...
    .Select(DoThingN)
    .ToList();

The problem with something like this is that, if any of the calls throws an exception, processing stops for the whole list. Handling that requires that I move the "chain" to a new method and handle exceptions there:

  var results = list.Select(InnerMethod).ToList();

  // ...
  private ResultN InnerMethod(Input input)
  {
    try
    {
      var r1 = DoThing1(input);
      var r2 = DoThing2(r1);
      // ...
      var rn = DoThingN(rn_1);
      
      return rn;
    }
    catch(Exception ex)
    {
      // do something with ex, like logging
      return ?? // can't throw, I want to continue processing the rest of the list
    }
  }

Now I have two problems :) One is that the code just looks uglier, though maybe most people can ignore that. (I have OCD with regard to this - code that "looks bad" drives me nuts.) The more important issue is that I need to decide on an "empty" ResultN value to return from the inner method, and then I need to be able to filter those out of the overall results. That can get ugly really quickly.
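To make the "empty value" problem concrete, here is a minimal sketch of the inner-method approach. The names SentinelDemo, InnerMethod and Process are hypothetical, and the three division steps are stand-ins for DoThing1..DoThingN. It only works at all because int? happens to have a spare value (null) to use as the sentinel.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SentinelDemo
{
  // Hypothetical inner method: the whole chain wrapped in one try/catch.
  // null plays the role of the "empty" result.
  static int? InnerMethod(int input)
  {
    try
    {
      var r1 = input / 2;
      var r2 = 10 / r1;   // throws DivideByZeroException when r1 is 0
      return 100 / r2;    // throws DivideByZeroException when r2 is 0
    }
    catch (Exception ex)
    {
      // The exception is logged here and then lost to the caller.
      Console.Error.WriteLine("Failed for " + input + ": " + ex.Message);
      return null;
    }
  }

  public static List<int> Process(IEnumerable<int> inputs)
  {
    return inputs
      .Select(InnerMethod)
      .Where(r => r.HasValue)   // filter the sentinels back out
      .Select(r => r.Value)
      .ToList();
  }

  static void Main()
  {
    var results = Process(new[] { 10, 20, 0, 30, 40 });
    Console.WriteLine(string.Join(", ", results));   // prints "50, 100"
  }
}
```

The caller gets the good values, but has no way to tell which inputs failed or why. For a result type with no spare value to act as the sentinel, even this much is not available.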

By analogy with what I read about other languages (I think Go uses a similar approach - I've never studied the language but I believe I first encountered the idea in some articles about it), I decided to write an "either a good value or an exception" helper class. On further reflection I changed that to a struct because I don't want to have to check that the result itself is not null. Once I thought of that I also decided that null is not a "good value", so any attempt to pass it as one ends up as an ArgumentNullException in the "or an exception" part instead. I hope things will become clearer from the code:

  public struct Result<T>
  {
    public bool HasValue { get; }

    public T Value
    {
      get
      {
        if (!HasValue)
          throw Exception;

        return value;
      }
    }

    public Exception Exception => HasValue ? null : exception ?? NULL_EXCEPTION;

    public Result(T value)
    {
      // do not accept null
      if (value == null)
      {
        HasValue = false;
        this.value = default(T);
        exception = new ArgumentNullException();
      }
      else
      {
        HasValue = true;
        this.value = value;
        exception = null;
      }
    }

    public Result(Exception ex)
    {
      HasValue = false;
      value = default(T);
      exception = ex;
    }

    //

    // ReSharper disable once StaticMemberInGenericType
    private static readonly Exception NULL_EXCEPTION = new ArgumentNullException();

    private readonly T value;
    private readonly Exception exception;
  }

I also added an Apply extension method. It's an extension method instead of a method in the original struct because of the additional generic type TR; it just looked wrong there. (I did mention my OCD, right?)

  public static class ResultExtensions
  {
    public static Result<TR> Apply<T, TR>(this Result<T> it, Func<T, TR> selector)
    {
      return it.HasValue
        ? Try(() => selector(it.Value))
        : new Result<TR>(it.Exception);
    }

    public static IEnumerable<Result<TR>> Select<T, TR>(this IEnumerable<Result<T>> list, Func<T, TR> selector)
    {
      return list.Select(it => it.Apply(selector));
    }

    //

    private static Result<TR> Try<TR>(Func<TR> func)
    {
      try
      {
        return new Result<TR>(func());
      }
      catch (Exception ex)
      {
        return new Result<TR>(ex);
      }
    }
  }
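One consequence of Apply that is easy to miss: once a step has failed, the remaining selectors are never invoked, and the first exception is carried through to the end. The sketch below counts selector invocations to show this. ShortCircuitDemo and CountSelectorCalls are hypothetical names, and the Result<T> and Apply definitions are condensed copies of the ones above, included only so the sketch compiles on its own.

```csharp
using System;

// Condensed copy of the struct above.
public struct Result<T>
{
  private readonly T value;
  private readonly Exception exception;

  public bool HasValue { get; }
  public T Value { get { if (!HasValue) throw Exception; return value; } }
  public Exception Exception => HasValue ? null : exception ?? new ArgumentNullException();

  public Result(T value)
  {
    var ok = value != null;
    HasValue = ok;
    this.value = value;
    exception = ok ? null : new ArgumentNullException();
  }

  public Result(Exception ex)
  {
    HasValue = false;
    value = default(T);
    exception = ex;
  }
}

// Condensed copy of Apply, with the Try helper inlined.
public static class ResultExtensions
{
  public static Result<TR> Apply<T, TR>(this Result<T> it, Func<T, TR> selector)
  {
    if (!it.HasValue) return new Result<TR>(it.Exception);
    try { return new Result<TR>(selector(it.Value)); }
    catch (Exception ex) { return new Result<TR>(ex); }
  }
}

static class ShortCircuitDemo
{
  // Counts how many of the three selectors actually start running.
  public static int CountSelectorCalls(int start)
  {
    var calls = 0;
    new Result<int>(start)
      .Apply(x => { calls++; return 100 / x; })
      .Apply(x => { calls++; return 200 / x; })
      .Apply(x => { calls++; return 10 / x; });
    return calls;
  }

  static void Main()
  {
    Console.WriteLine(CountSelectorCalls(5));    // all three selectors run
    Console.WriteLine(CountSelectorCalls(200));  // second one throws, third is skipped
  }
}
```

With a starting value of 200 the first selector yields 0, the second one divides by it and throws, and the third is skipped, so the exception in the final result is the original DivideByZeroException.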

While I didn't write this in a TDD fashion, I have added some asserts to a console application to make sure I got back the expected results:

  static class Program
  {
    static void Main()
    {
      var r1 = Divide(5, 2);
      Debug.Assert(Print(r1) == "HasValue = True Value = 2 Exception = ");
      var r2 = Divide(5, 0);
      Debug.Assert(Print(r2) == "HasValue = False Value = (invalid) Exception = Attempted to divide by zero.");
      var r3 = new Result<int?>(3);
      Debug.Assert(Print(r3) == "HasValue = True Value = 3 Exception = ");
      var r4 = new Result<int?>((int?) null);
      Debug.Assert(Print(r4) == "HasValue = False Value = (invalid) Exception = Value cannot be null.");
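      // a bare null binds to the Result(Exception) overload; the Exception property then falls back to NULL_EXCEPTION
      var r5 = new Result<object>(null);
      Debug.Assert(Print(r5) == "HasValue = False Value = (invalid) Exception = Value cannot be null.");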

      // using the default constructor
      var r6 = new Result<int>();
      Debug.Assert(Print(r6) == "HasValue = False Value = (invalid) Exception = Value cannot be null.");

      // trying to access the Value property without checking first will result in an exception
      try
      {
        Console.WriteLine(r5.Value);
      }
      catch
      {
        Console.WriteLine("Oops.");
      }

      // we can now chain selectors

      // case 1: all good
      var i1 = new Result<int?>(5);
      var ri1 = i1
        .Apply(it => 100 / it)
        .Apply(it => 200 / it)
        .Apply(it => 10 / it);
      Debug.Assert(Print(ri1) == "HasValue = True Value = 1 Exception = ");

      // case 2: something bad happens
      var i2 = new Result<int>(200);
      var ri2 = i2
        .Apply(it => 100 / it)
        .Apply(it => 200 / it)
        .Apply(it => 10 / it);
      Debug.Assert(Print(ri2) == "HasValue = False Value = (invalid) Exception = Attempted to divide by zero.");

      // finally, the target use case: processing a list without aborting due to exceptions
      var list1 = new List<int> { 10, 20, 0, 30, 40 };
      var list2 = list1
        .Select(it => new Result<int>(it))
        .Select(it => it / 2)
        .Select(it => 10 / it)
        .Select(it => 100 / it)
        .ToList();
      var good = list2.Where(it => it.HasValue).ToList();
      var bad = list2.Where(it => !it.HasValue).ToList();

      Debug.Assert(good.Count == 2);
      Debug.Assert(bad.Count == 3);
    }

    private static Result<int> Divide(int a, int b)
    {
      try
      {
        return new Result<int>(a / b);
      }
      catch (Exception ex)
      {
        return new Result<int>(ex);
      }
    }

    private static string Print<T>(Result<T> r)
    {
      return $"HasValue = {r.HasValue} Value = {(r.HasValue ? r.Value + "" : "(invalid)")} Exception = {r.Exception?.Message}";
    }
  }

Note that, thanks to the addition of the Select extension method, I didn't have to write the last example as

      var list2 = list1
        .Select(it => new Result<int>(it))
        .Select(it => it.Apply(x => x / 2))
        .Select(it => it.Apply(x => 10 / x))
        .Select(it => it.Apply(x => 100 / x))
        .ToList();

Avoiding boilerplate code is good; so is the fact that the inner lambda doesn't have to know anything about the Result<T> type, and yet an exception thrown inside it doesn't abort processing of the entire list.