Friday, May 13, 2016

C# math is fast

I was reading an article on neural networks that mentioned the usual sigmoid activation function, used when the inputs are real numbers in the [0, 1) interval:

1 / (1 + e^(-x))

The article mentions that this is probably where the program would spend at least half of its time, so I thought, "why not pre-compute a bunch of values and trade memory and precision for time?" It turns out that C# math is quite fast and the gain might not be worth it. (I haven't tested it yet with a NN.)

This is the code I wrote to benchmark the two options, using LinqPad 5:

void Main()
{
    const int STEPS = 1 * 1000 * 1000;
    
    Func<double, double> activation = x => 1.0 / (1.0 + Math.Exp(-x));
    var cache = Precompute(0.0, 1.0, STEPS, activation);
    
    Benchmark("Using the cache", x => cache[(int) Math.Truncate(x * STEPS)]);
    Benchmark("Calling the function each time", activation);
}

double[] Precompute(double lower, double upper, int steps, Func<double, double> func)
{
    var result = new List<double>();
    
    For(lower, upper, (upper - lower) / steps, value => result.Add(func(value)));
    
    return result.ToArray();
}

void Benchmark(string message, Func<double, double> func)
{
    const int COUNT = 100 * 1000 * 1000;
    
    var dummy = 0.0;
    var ts = ComputeBenchmark(() => For(0.0, 1.0, 1.0 / COUNT, value => dummy += func(value)));
    
    Console.WriteLine(message + ": " + ts + " - sum = " + dummy);
}

void For(double lower, double upper, double increment, Action<double> action)
{
    for (var value = lower; value <= upper; value += increment)
        action(value);
}

TimeSpan ComputeBenchmark(Action action)
{
    var sw = new Stopwatch();
    sw.Start();
    
    action();
    
    sw.Stop();
    return sw.Elapsed;
}

The results are below: using the cache takes about 70% of the time required when calling the function every time, at the expense of about 8 MB of memory (one million doubles at 8 bytes each) and some loss of precision. In a large NN, where billions of evaluations are expected during training, the trade-off might be worth it. I don't yet have enough experience to decide.

Using the cache: 00:00:01.9850792 - sum = 62011439.025702
Calling the function each time: 00:00:02.7755272 - sum = 62011450.5899971

As a side note, I computed and printed the sum to prevent the compiler from optimizing away the function calls; I don't know if LinqPad would do that but I wanted to avoid it just in case.
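
To put a rough number on the precision loss, here is a minimal sketch. It assumes the table is built with an integer loop, so it has exactly steps + 1 entries covering both endpoints, rather than with the accumulating For helper. SigmoidTable, Build, Lookup and MaxError are names made up for this sketch; they are not part of the benchmark. Because the slope of the sigmoid never exceeds 0.25, a truncating lookup should be off by at most roughly 0.25 / steps, which is about 2.5e-7 for one million steps.

```csharp
using System;

public static class SigmoidTable
{
    public static double Sigmoid(double x)
    {
        return 1.0 / (1.0 + Math.Exp(-x));
    }

    // Builds steps + 1 samples of func over [lower, upper], so both endpoints
    // get an entry. Each sample point is derived from the integer index, which
    // avoids accumulating floating-point error across iterations.
    public static double[] Build(double lower, double upper, int steps, Func<double, double> func)
    {
        var table = new double[steps + 1];
        for (var i = 0; i <= steps; i++)
            table[i] = func(lower + (upper - lower) * i / steps);
        return table;
    }

    // Truncating lookup for x in [0, 1], in the same spirit as the benchmark.
    // Assumes the table was built over [0, 1].
    public static double Lookup(double[] table, double x)
    {
        return table[(int)(x * (table.Length - 1))];
    }

    // Largest absolute difference between the lookup and a direct evaluation,
    // measured at probes + 1 evenly spaced points in [0, 1].
    public static double MaxError(double[] table, int probes)
    {
        var max = 0.0;
        for (var i = 0; i <= probes; i++)
        {
            var x = (double)i / probes;
            max = Math.Max(max, Math.Abs(Lookup(table, x) - Sigmoid(x)));
        }
        return max;
    }
}
```

Calling SigmoidTable.MaxError(SigmoidTable.Build(0.0, 1.0, 1000000, SigmoidTable.Sigmoid), 12345) gives the worst case over the sampled points only, so treat it as an estimate rather than a guarantee.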

Tuesday, April 19, 2016

A simple rules engine

I'm extracting data from some OCR'd letters and, in order to determine which type of letter I'm parsing, I'm using a method similar to this:

        public Letter Parse(string text)
        {
            Letter letter;
            if (text.IndexOf("...", StringComparison.OrdinalIgnoreCase) >= 0)
                letter = new LetterA();
            else
                letter = new LetterB();

            //... additional processing
            return letter;
        }

If the letter contains a specific text, I know it's of one type; otherwise I'll default to the other type. Unfortunately that's going to get really complicated, really fast once I start adding new letter types. I read somewhere that "you should move logic out of the code and into the data when possible"; it made sense and I never had a reason to regret it. So, let me try to do that here.

First I'll add a "rules list" class that will allow me to store the various criteria:

    public class RulesList<T, TResult>
    {
        public RulesList()
        {
            rules = new List<Tuple<Predicate<T>, Func<TResult>>>();
        }

        public void Add(Predicate<T> condition, Func<TResult> constructor)
        {
            rules.Add(Tuple.Create(condition, constructor));
        }

        public TResult Get(T criteria)
        {
            return rules
                .Where(rule => rule.Item1(criteria))
                .Select(rule => rule.Item2())
                .FirstOrDefault();
        }

        //

        private readonly List<Tuple<Predicate<T>, Func<TResult>>> rules;
    }

I've made this class more generic by replacing the string type with T; honestly, I don't think there will ever be a need for anything else in this project but… it wasn't a big "expense".

Using this is quite simple at the moment:

    public class LetterSelector
    {
        public LetterSelector()
        {
            rules = new RulesList<string, Letter>();

            rules.Add(s => s.IndexOf("...", StringComparison.OrdinalIgnoreCase) >= 0, () => new LetterA());
            rules.Add(_ => true, () => new LetterB());
        }

        public Letter Parse(string text)
        {
            var letter = rules.Get(text);
            //... additional processing

            return letter;
        }

        //

        private readonly RulesList<string, Letter> rules;
    }

Is this a big gain? Right now it doesn't look like I gained anything; however, bitter experience taught me that methods with many conditionals quickly become an unmaintainable mess (you haven't lived until you've had to fix a method with 700+ lines and a cyclomatic complexity over 200). This will allow me to separate those conditionals into their own lambdas or small private methods in the LetterSelector class.
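
As a rough sketch of where this could go, the version below moves each condition into a small named method. It repeats a compact copy of RulesList so that it compiles on its own. The letter types (ReminderLetter, InvoiceLetter, UnknownLetter), the LetterSelectorSketch class and the marker phrases are all hypothetical, invented for illustration; the real ones are not shown in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compact copy of the RulesList class from this post.
public class RulesList<T, TResult>
{
    private readonly List<Tuple<Predicate<T>, Func<TResult>>> rules =
        new List<Tuple<Predicate<T>, Func<TResult>>>();

    public void Add(Predicate<T> condition, Func<TResult> constructor)
    {
        rules.Add(Tuple.Create(condition, constructor));
    }

    public TResult Get(T criteria)
    {
        return rules
            .Where(rule => rule.Item1(criteria))
            .Select(rule => rule.Item2())
            .FirstOrDefault();
    }
}

// Hypothetical letter types, for illustration only.
public abstract class Letter { }
public class ReminderLetter : Letter { }
public class InvoiceLetter : Letter { }
public class UnknownLetter : Letter { }

public class LetterSelectorSketch
{
    private readonly RulesList<string, Letter> rules = new RulesList<string, Letter>();

    public LetterSelectorSketch()
    {
        // Rules are tried in the order they were added; the first match wins.
        rules.Add(IsReminder, () => new ReminderLetter());
        rules.Add(IsInvoice, () => new InvoiceLetter());
        // The catch-all has to come last, otherwise it would shadow the others.
        rules.Add(_ => true, () => new UnknownLetter());
    }

    public Letter Select(string text)
    {
        return rules.Get(text);
    }

    // The marker phrases below are made up for this sketch.
    private static bool IsReminder(string text)
    {
        return Contains(text, "payment reminder");
    }

    private static bool IsInvoice(string text)
    {
        return Contains(text, "invoice number");
    }

    private static bool Contains(string text, string marker)
    {
        return text.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Two things to keep in mind with this design: the order of the Add calls matters, and without a catch-all rule Get returns default(TResult), which is null for a reference type such as Letter.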

Tuesday, April 12, 2016

Crystal Reports woes

This took me an hour to figure out so I thought I'd write it down in case it helps anyone else.

If you have a form that's going to display a Crystal Report and you want to zoom it by default, the "normal" way would be to do this in form_Shown:

    private void ReportViewer_Shown(object sender, EventArgs e)
    {
        viewer.Zoom(2); // 1 = page width, 2 = whole page, 25..400 = zoom factor
    }

(Where viewer is the CrystalReportViewer component.)

Unfortunately, it takes CR a while to compute and display the actual report; by the time that happens, the .Zoom() call has already been executed (and ignored).

I tried a number of workarounds before finding one that works. (One of them launched a thread, waited two seconds and then called the Zoom method; it worked, but it was a horrible hack.) It turns out that CR has a "hidden" PageChanged event (it is marked with a [Browsable(false)] attribute). Use that event by assigning a handler in the constructor:

    viewer.PageChanged += Viewer_PageChanged;

and then add the Viewer_PageChanged method:

    private void Viewer_PageChanged(object sender, EventArgs e)
    {
        viewer.Zoom(2);

        viewer.PageChanged -= Viewer_PageChanged;
    }

(Note that, in order to avoid leaking references, I have removed the PageChanged handler immediately after calling the Zoom method. This also keeps the zoom from being reapplied every time the page changes.)
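
The subscribe, act once, unsubscribe pattern can be tried out without Crystal Reports installed. The sketch below uses FakeViewer, a hypothetical stand-in written for this example; it is not the Crystal Reports API and only mimics the two members used above. OneShotZoom and ZoomCalls are made-up names as well. Unlike the handler above, this one unsubscribes before calling Zoom, which is a small variation and not a requirement.

```csharp
using System;

// Hypothetical stand-in for a viewer control; NOT the Crystal Reports API.
public class FakeViewer
{
    public event EventHandler PageChanged;

    // Counts how many times Zoom was called, so the effect can be observed.
    public int ZoomCalls { get; private set; }

    public void Zoom(int factor)
    {
        ZoomCalls++;
    }

    // Simulates the control finishing a page render.
    public void RaisePageChanged()
    {
        var handler = PageChanged;
        if (handler != null)
            handler(this, EventArgs.Empty);
    }
}

public class OneShotZoom
{
    private readonly FakeViewer viewer;

    public OneShotZoom(FakeViewer viewer)
    {
        this.viewer = viewer;
        viewer.PageChanged += Viewer_PageChanged;
    }

    private void Viewer_PageChanged(object sender, EventArgs e)
    {
        // Unsubscribe first so the handler cannot run a second time.
        viewer.PageChanged -= Viewer_PageChanged;
        viewer.Zoom(2);
    }
}
```

Raising the event twice on a FakeViewer wrapped in OneShotZoom should leave ZoomCalls at 1, since the handler detaches itself on the first call.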