Thursday, June 13, 2013

Fail-over algorithm

oDesk and similar sites are an interesting source of programming ideas - there are a couple of job descriptions I've seen that I have found interesting. One of them went something like this (paraphrasing):

I need to continuously get data from a site and write it to a database. They have a main API that I coded for, but it fails sometimes. There is a secondary API and I coded for that too, but the first one fails unexpectedly and I haven't yet figured out all cases. Plus, the secondary one can fail too, I want to be informed if that happens.

Expressing this requirement in a different way, and assuming that the "want to be informed" part can also fail, the algorithm would look something like this:

  • try the first API
  • if that fails, log and try the second API
  • if that fails, log and inform the client
  • if that fails, log and write to an alert file
  • if that fails… do nothing, it's more important for the program to keep running

"Challenge accepted," as they say on the internet, even though the above way of putting the problem makes the solution embarrassingly easy.

First acceptance test

As usual, I start by writing an acceptance test. It should show a minimally functioning system — in my case, I want the first call to fail and the second one to succeed. This will allow me to verify both the fail-over mechanism and the logging. Here is the AcceptanceTests class:

  [TestClass]
  public class AcceptanceTests
  {
    [TestMethod]
    public void TestFailoverMechanism()
    {
      var stream = new MemoryStream();
      var logger = new StreamLogger(stream);
      logger.SystemClock = () => new DateTime(2000, 1, 1, 14, 15, 16);
      var guard = new FailGuard(logger);

      var success = false;
      guard.Attempt(
        () => { throw new Exception("Call failed."); },
        () => { success = true; });

      Assert.IsTrue(success);
      var lines = GetLines(stream);
      Assert.AreEqual("[2000.01.01 14:15:16 E] Call failed.", lines[0]);
      Assert.AreEqual("[2000.01.01 14:15:16 D] Success.", lines[1]);
    }

    //

    private static string[] GetLines(Stream stream)
    {
      stream.Seek(0, SeekOrigin.Begin);
      var reader = new StreamReader(stream);

      return reader
        .ReadToEnd()
        .Split(new[] { Environment.NewLine }, StringSplitOptions.None);
    }
  }

To explain the test: I want to test the system, not just the fail-over mechanism, which means I need to test the logger too. I want to verify that the logger writes the current date/time correctly, which means I need to force it to read a specific date/time. I could have just used DateTime.Now and hoped that the whole thing would take less than a second – which would be a reasonable assumption – but I think that's a bad habit to get into. The logger component has a dependency on the current date/time, that dependency should be exposed.

I will start by making the whole thing compile. Add the following to the library project, each interface / class in its own file of course. (I see that I haven't mentioned this — I split the solution into a library project called Failover.Library and a tests project called Failover.Tests; I have also added the Moq package, my favorite mocking library, to the tests project.)

  public interface Logger
  {
  }
  public class StreamLogger : Logger
  {
    public Func<DateTime> SystemClock = () => DateTime.Now;

    public StreamLogger(Stream stream)
    {
      //
    }
  }
  public class FailGuard
  {
    public FailGuard(Logger logger)
    {
      //
    }

    public void Attempt(params Action[] actions)
    {
      //
    }
  }

The project compiles and the test fails. All is as it should be.

First unit test

Not that I think the FailGuard class will be complicated, but the StreamLogger class should be really easy so that's what I'll start with. The acceptance test indicates that the StreamLogger class can log both regular messages and exceptions - they will have a "D" and an "E", respectively, in the line prefix part. I will start with the "D":

  [TestClass]
  public class StreamLoggerTests
  {
    [TestClass]
    public class Debug : StreamLoggerTests
    {
      [TestMethod]
      public void WritesTheGivenMessage()
      {
        const string MESSAGE = "abc123";

        var stream = new MemoryStream();
        var sut = new StreamLogger(stream);
        sut.SystemClock = () => new DateTime(2000, 1, 2, 3, 4, 5);

        sut.Debug(MESSAGE);

        var lines = GetLines(stream);
        Assert.IsTrue(lines[0].EndsWith(MESSAGE));
      }
    }

    //

    private static string[] GetLines(Stream stream)
    {
      stream.Seek(0, SeekOrigin.Begin);
      var reader = new StreamReader(stream);

      return reader
        .ReadToEnd()
        .Split(new[] { Environment.NewLine }, StringSplitOptions.None);
    }
  }

As you can see, I follow a number of conventions:

  • a test class for each class being tested, containing
  • a test class (inheriting from the one surrounding it) per method
  • the instance being tested is called sut, from "system under test"

To make this compile, the StreamLogger class needs a Debug method:

    public void Debug(string message)
    {
      //
    }

Everything compiles and the test fails. Good! Let's make it pass:

  public class StreamLogger : Logger
  {
    public Func<DateTime> SystemClock = () => DateTime.Now;

    public StreamLogger(Stream stream)
    {
      writer = new StreamWriter(stream);
    }

    public void Debug(string message)
    {
      writer.WriteLine(message);
    }

    //

    private readonly StreamWriter writer;
  }

The test… fails? (This stumped me for a few seconds.) Ah… I forgot; the streams need to be flushed in order for the write to actually happen. I could add a call to the Flush method after each write but that's error-prone. Instead, change the constructor of the StreamLogger class to:

    public StreamLogger(Stream stream)
    {
      writer = new StreamWriter(stream) { AutoFlush = true };
    }

The unit test passes. Great!

Another convention I'm using is that I put all the private members after the public ones, separated by an empty comment line. I just don't like the common convention that the first thing you see when looking at a class are the parts that are supposed to be hidden.

Duplication is bad

Ok… before rushing to do anything else, there's ugly duplication in the tests: the GetLines method. It needs to be moved to a helper class (add this to the test project, not the library):

  public static class Helper
  {
    public static string[] GetLines(this Stream stream)
    {
      stream.Seek(0, SeekOrigin.Begin);
      var reader = new StreamReader(stream);

      return reader
        .ReadToEnd()
        .Split(new[] { Environment.NewLine }, StringSplitOptions.None);
    }
  }

As you can see, the GetLines is an extension method, which means the tests change to:

  [TestClass]
  public class StreamLoggerTests
  {
    [TestClass]
    public class Debug : StreamLoggerTests
    {
      [TestMethod]
      public void WritesTheGivenMessage()
      {
        const string MESSAGE = "abc123";

        var stream = new MemoryStream();
        var sut = new StreamLogger(stream);
        sut.SystemClock = () => new DateTime(2000, 1, 2, 3, 4, 5);

        sut.Debug(MESSAGE);

        var lines = stream.GetLines();
        Assert.IsTrue(lines[0].EndsWith(MESSAGE));
      }
    }
  }

and

  [TestClass]
  public class AcceptanceTests
  {
    [TestMethod]
    public void TestFailoverMechanism()
    {
      var stream = new MemoryStream();
      var logger = new StreamLogger(stream);
      logger.SystemClock = () => new DateTime(2000, 1, 1, 14, 15, 16);
      var guard = new FailGuard(logger);

      var success = false;
      guard.Attempt(
        () => { throw new Exception("Call failed."); },
        () => { success = true; });

      Assert.IsTrue(success);
      var lines = stream.GetLines();
      Assert.AreEqual("[2000.01.01 14:15:16 E] Call failed.", lines[0]);
      Assert.AreEqual("[2000.01.01 14:15:16 D] Success.", lines[1]);
    }
  }

The unit test is still passing, the acceptance test still failing - all is good.

Second unit test

The Debug method needs to also write the current date/time; add a new test to the StreamLoggerTests.Debug class:


      [TestMethod]
      public void WritesTheCurrentDateTime()
      {
        var stream = new MemoryStream();
        var sut = new StreamLogger(stream);
        sut.SystemClock = () => new DateTime(2000, 1, 2, 3, 4, 5);

        sut.Debug("");

        var lines = stream.GetLines();
        Assert.IsTrue(lines[0].StartsWith("[2000.01.02 03:04:05"));
      }

(Since I don't care about the message in this test, I just sent an empty string.)

The test fails (as it should). To make it pass, change the StreamLogger.Debug method to:

    public void Debug(string message)
    {
      writer.WriteLine("[{0:yyyy.MM.dd hh:mm:ss}] {1}", SystemClock.Invoke(), message);
    }

The test passes. Almost there!

Duplication is really bad

There is some duplication in the StreamLoggerTests class. Let's get rid of it:

  [TestClass]
  public class StreamLoggerTests
  {
    [TestInitialize]
    public void SetUp()
    {
      stream = new MemoryStream();
      sut = new StreamLogger(stream) { SystemClock = () => new DateTime(2000, 1, 2, 3, 4, 5) };
    }

    [TestClass]
    public class Debug : StreamLoggerTests
    {
      [TestMethod]
      public void WritesTheGivenMessage()
      {
        const string MESSAGE = "abc123";

        sut.Debug(MESSAGE);

        var lines = stream.GetLines();
        Assert.IsTrue(lines[0].EndsWith(MESSAGE));
      }

      [TestMethod]
      public void WritesTheCurrentDateTime()
      {
        sut.Debug("");

        var lines = stream.GetLines();
        Assert.IsTrue(lines[0].StartsWith("[2000.01.02 03:04:05"));
      }
    }

    //

    private Stream stream;
    private StreamLogger sut;
  }

The tests are still passing. (Always check this - refactoring can be tricky and can create bugs.)

Final test for the Debug method

The final test for the Debug method is to verify that the "D" gets written:

      [TestMethod]
      public void Writes_D_AtTheEndOfThePrefix()
      {
        sut.Debug("");

        var lines = stream.GetLines();
        Assert.AreEqual("D]", lines[0].Substring(21, 2));
      }

The test fails. Making it pass is trivial:

    public void Debug(string message)
    {
      writer.WriteLine("[{0:yyyy.MM.dd hh:mm:ss} D] {1}", SystemClock.Invoke(), message);
    }

Nothing to refactor. The code is extremely simple and it works correctly.

Logging errors

Add the following inside the StreamLoggerTests class:

    [TestClass]
    public class Error: StreamLoggerTests
    {
      [TestMethod]
      public void WritesTheGivenMessage()
      {
        const string MESSAGE = "abc123";

        sut.Error(new Exception(MESSAGE));

        var lines = stream.GetLines();
        Assert.IsTrue(lines[0].EndsWith(MESSAGE));
      }
    }

This needs a new method in the StreamLogger class:

    public void Error(Exception exception)
    {
      //
    }

The test fails; to make it pass, change this method to:

    public void Error(Exception exception)
    {
      writer.WriteLine(exception.Message);
    }

(Note that I do not rush to implement the entire functionality, even though I know what I'm supposed to end with. It is better to proceed in small steps and make sure everything works before starting on the next one.)

The second test is almost identical to the one for the Debug method:

      [TestMethod]
      public void WritesTheCurrentDateTime()
      {
        sut.Error(new Exception(""));

        var lines = stream.GetLines();
        Assert.IsTrue(lines[0].StartsWith("[2000.01.02 03:04:05"));
      }

It fails, so I change the method to:

    public void Error(Exception exception)
    {
      writer.WriteLine("[{0:yyyy.MM.dd hh:mm:ss}] {1}", SystemClock.Invoke(), exception.Message);
    }

Finally, the third test:

      [TestMethod]
      public void Writes_E_AtTheEndOfThePrefix()
      {
        sut.Error(new Exception(""));

        var lines = stream.GetLines();
        Assert.AreEqual("E]", lines[0].Substring(21, 2));
      }

and the matching code change (but only after verifying that the test fails):

    public void Error(Exception exception)
    {
      writer.WriteLine("[{0:yyyy.MM.dd hh:mm:ss} E] {1}", SystemClock.Invoke(), exception.Message);
    }

Duplication, duplication, duplication

The StreamLogger class is begging for a refactoring:

  public class StreamLogger : Logger
  {
    public Func<DateTime> SystemClock = () => DateTime.Now;

    public StreamLogger(Stream stream)
    {
      writer = new StreamWriter(stream) { AutoFlush = true };
    }

    public void Debug(string message)
    {
      WriteMessage("D", message);
    }

    public void Error(Exception exception)
    {
      WriteMessage("E", exception.Message);
    }

    //

    private readonly StreamWriter writer;

    private void WriteMessage(string messageType, string message)
    {
      writer.WriteLine("[{0:yyyy.MM.dd hh:mm:ss} {1}] {2}", SystemClock.Invoke(), messageType, message);
    }
  }

The tests pass; the code is crystal clear; all is good. Whew!

The FailGuard class

We're now getting to the main reason for this article: the FailGuard class. Let's write a test to verify that the Attempt method runs a single action:

  [TestClass]
  public class FailGuardTests
  {
    [TestClass]
    public class Attempt : FailGuardTests
    {
      [TestMethod]
      public void RunsSingleAction()
      {
        var logger = new Mock<Logger>();
        var sut = new FailGuard(logger.Object);
        var success = false;

        sut.Attempt(() => { success = true; });

        Assert.IsTrue(success);
      }
    }
  }

As before, the encompassing FailGuardTests class contains an Attempt class, which in turn contains the first test — namely the RunsSingleAction method. The test fails, so to make it pass change the FailGuard.Attempt method to:

    public void Attempt(params Action[] actions)
    {
      actions[0].Invoke();
    }

This immediately suggests another test, which I will write after verifying that the first one passes — what happens if there are no actions?

      [TestMethod]
      public void DoesNothingIfThereAreNoActions()
      {
        var logger = new Mock<Logger>();
        var sut = new FailGuard(logger.Object);

        sut.Attempt();

        // success - did not crash
      }

The test crashes, as expected, so I'll change the production code to:

    public void Attempt(params Action[] actions)
    {
      if (!actions.Any())
        return;

      actions[0].Invoke();
    }

Both unit tests pass; time to refactor:

  [TestClass]
  public class FailGuardTests
  {
    [TestInitialize]
    public void SetUp()
    {
      logger = new Mock<Logger>();
      sut = new FailGuard(logger.Object);
    }

    [TestClass]
    public class Attempt : FailGuardTests
    {
      [TestMethod]
      public void RunsSingleAction()
      {
        var success = false;

        sut.Attempt(() => { success = true; });

        Assert.IsTrue(success);
      }

      [TestMethod]
      public void DoesNothingIfThereAreNoActions()
      {
        sut.Attempt();

        // success - did not crash
      }
    }

    //

    private Mock<Logger> logger;
    private FailGuard sut;
  }

All good.

Fail-over mechanism

We've finally reached the intended functionality: the fail-over mechanism. Let's write a unit test for that:

      [TestMethod]
      public void ContinuesToSecondActionIfTheFirstOneFails()
      {
        var success = false;

        sut.Attempt(
          () => { throw new Exception(); },
          () => { success = true; });

        Assert.IsTrue(success);
      }

This fails - the Attempt method does not guard against the exception. Let's fix that:

    public void Attempt(params Action[] actions)
    {
      if (!actions.Any())
        return;

      try
      {
        actions[0].Invoke();
      }
      catch (Exception)
      {
        actions[1].Invoke();
      }
    }

Note that I'm always doing "the simplest thing that could possibly work". Yes, I could go ahead and write the whole logic but then I wouldn't have any tests to guard against bugs — and if I tried to add them after the code they would tend to use the knowledge I have about the implementation. It can work, but it's dangerous. Let's stick to the TDD method.

Second note: the acceptance test is now failing in a different place. Progress!

Of course, now that I've made the test pass I can guard against a problem: what happens if the (only) action fails? I'll make a note to come back to that later, but I've neglected logging and I want to take care of it first.

Logging

Start with the logging for the "single action working" case:

      [TestMethod]
      public void LogsSuccessfulSingleAction()
      {
        sut.Attempt(() => { });

        logger.Verify(it => it.Debug("Success."));
      }

This… does not compile, because I forgot to add the Debug and Error methods to the Logger interface. Oops:

  public interface Logger
  {
    void Debug(string message);
    void Error(Exception exception);
  }

Better. The test fails, so I'm changing the FailGuard class to:

  public class FailGuard
  {
    public FailGuard(Logger logger)
    {
      this.logger = logger;
    }

    public void Attempt(params Action[] actions)
    {
      if (!actions.Any())
        return;

      try
      {
        actions[0].Invoke();
        logger.Debug("Success.");
      }
      catch (Exception)
      {
        actions[1].Invoke();
      }
    }

    //

    private readonly Logger logger;
  }

Yay!

Of course, the logger needs to be informed of errors too:

      [TestMethod]
      public void LogsError()
      {
        const string MESSAGE = "abc";

        sut.Attempt(
          () => { throw new Exception(MESSAGE); },
          () => { });

        logger.Verify(it => it.Error(It.Is<Exception>(e => e.Message == MESSAGE)));
      }

To make it pass, I change the production code to:

    public void Attempt(params Action[] actions)
    {
      if (!actions.Any())
        return;

      try
      {
        actions[0].Invoke();
        logger.Debug("Success.");
      }
      catch (Exception ex)
      {
        logger.Error(ex);
        actions[1].Invoke();
      }
    }

That passes. Unfortunately, the acceptance test now breaks in an interesting way:

Assert.AreEqual failed. Expected:<14:15:16 E Call failed.>. Actual:<02:15:16 E Call failed.>.

Oops again? I've been bitten by the time formatting string specifier. Easy to fix by changing the StreamLogger.WriteMessage method to:

    private void WriteMessage(string messageType, string message)
    {
      writer.WriteLine("[{0:yyyy.MM.dd HH:mm:ss} {1}] {2}", SystemClock.Invoke(), messageType, message);
    }

Good.

Fixing the algorithm

You will note that my last test (the FailGuardTests.LogsError method) has an additional empty action. That's because without it the Attempt method would fail - it tries to access an inexistent second action.

I'll modify the LogsError test to show the problem:

      [TestMethod]
      public void LogsError()
      {
        const string MESSAGE = "abc";

        sut.Attempt(() => { throw new Exception(MESSAGE); });

        logger.Verify(it => it.Error(It.Is<Exception>(e => e.Message == MESSAGE)));
      }

Fixing it is again easy:

    public void Attempt(params Action[] actions)
    {
      if (!actions.Any())
        return;

      try
      {
        actions[0].Invoke();
        logger.Debug("Success.");
      }
      catch (Exception ex)
      {
        logger.Error(ex);

        if (actions.Length > 1)
          actions[1].Invoke();
      }
    }

Of course, this is starting to become ugly. I need a test with several actions. In fact, I am going to write two tests - one where the last action succeeds and one where it fails; both of them will check the logs. I will start with the first one:

      [TestMethod]
      public void LogsErrorsAndThenSuccess()
      {
        var sequence = new MockSequence();
        logger.InSequence(sequence).Setup(it => it.Error((It.Is<Exception>(e => e.Message == "1")))).Verifiable();
        logger.InSequence(sequence).Setup(it => it.Error((It.Is<Exception>(e => e.Message == "2")))).Verifiable();
        logger.InSequence(sequence).Setup(it => it.Error((It.Is<Exception>(e => e.Message == "3")))).Verifiable();
        logger.InSequence(sequence).Setup(it => it.Debug("Success")).Verifiable();
        
        sut.Attempt(
          () => { throw new Exception("1"); },
          () => { throw new Exception("2"); },
          () => { throw new Exception("3"); },
          () => { });

        logger.Verify();
      }

This fails, which forces me to write the proper code in the Attempt method. (Note that "the simplest thing that could possibly work" now is not to keep adding ifs and special cases for the first 3 or 4 actions.)

    public void Attempt(params Action[] actions)
    {
      foreach (var action in actions)
      {
        try
        {
          action.Invoke();
          logger.Debug("Success.");
          break;
        }
        catch (Exception ex)
        {
          logger.Error(ex);
        }
      }
    }

(I have removed the initial Any() call because it was no longer needed - the foreach will just not run if there are no actions.)

Not only is this making the unit test pass, but the acceptance test too! We're basically done, except I want to make sure and I write the final unit test, where all the actions fail:

      [TestMethod]
      public void LogsOnlyErrors()
      {
        var sequence = new MockSequence();
        logger.InSequence(sequence).Setup(it => it.Error((It.Is<Exception>(e => e.Message == "1")))).Verifiable();
        logger.InSequence(sequence).Setup(it => it.Error((It.Is<Exception>(e => e.Message == "2")))).Verifiable();
        logger.InSequence(sequence).Setup(it => it.Error((It.Is<Exception>(e => e.Message == "3")))).Verifiable();

        sut.Attempt(
          () => { throw new Exception("1"); },
          () => { throw new Exception("2"); },
          () => { throw new Exception("3"); });

        logger.Verify();
      }

To paraphrase a common proverb, this too has passed. We're done!

Conclusion

The production classes (and interface) look like this:

  public class FailGuard
  {
    public FailGuard(Logger logger)
    {
      this.logger = logger;
    }

    public void Attempt(params Action[] actions)
    {
      foreach (var action in actions)
      {
        try
        {
          action.Invoke();
          logger.Debug("Success.");
          break;
        }
        catch (Exception ex)
        {
          logger.Error(ex);
        }
      }
    }

    //

    private readonly Logger logger;
  }
  public interface Logger
  {
    void Debug(string message);
    void Error(Exception exception);
  }
  public class StreamLogger : Logger
  {
    public Func<DateTime> SystemClock = () => DateTime.Now;

    public StreamLogger(Stream stream)
    {
      writer = new StreamWriter(stream) { AutoFlush = true };
    }

    public void Debug(string message)
    {
      WriteMessage("D", message);
    }

    public void Error(Exception exception)
    {
      WriteMessage("E", exception.Message);
    }

    //

    private readonly StreamWriter writer;

    private void WriteMessage(string messageType, string message)
    {
      writer.WriteLine("[{0:yyyy.MM.dd HH:mm:ss} {1}] {2}", SystemClock.Invoke(), messageType, message);
    }
  }

You might say that they are too simple. Agreed… except, I have seen production code with simpler requirements that was five times their size and an incredibly convoluted mess. These classes are fairly reusable and easy to understand and maintain — and we have the tests to prove that they work as expected. Not bad for a couple of hours of coding.

Edit: The source code is available on GitHub.

No comments: