Friday, August 30, 2013

Obfuscation for its own sake

Wow... I just discovered Twitter Widgets (I needed them for a project I'm working on). When you create them, you get a code fragment to paste into your site to include the widget. Here's mine:

<a class="twitter-timeline" href="https://twitter.com/mdpopescu" data-widget-id="...">Tweets by @mdpopescu</a>
<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+"://platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs");</script>

Look at that piece of Javascript. Marvel at its obscurity. Find out it can be replaced by:

<script id="twitter-wjs" src="//platform.twitter.com/widgets.js"></script>

I mean, seriously?

Minor nitpick: yes, that javascript code will not re-create the script tag if the id has already been defined. However, I think the proper place for that is inside the widgets.js file... which they already do. Yay.

Thursday, August 29, 2013

Duck typing with EF entities

Ok, this might be old news for you, but for me it's a neat trick that I just figured out.

Any time I work on a personal project involving a database, I prefer to use Entity Framework Code First (EF from now). It just lets me do what I want (write code) instead of stuff I hate (dragging things around in a badly implemented "graphical editor"). One of the many things I absolutely love about it is that I can use POCOs - Plain Old C# Objects - as database entities. I don't have to inherit specific classes, not even specific interfaces.

I do have a few rules when writing these classes, though:

  • All entities have a primary key called Id.
  • The Id is normally an int (it will be automatically converted to an auto-incrementing field); in some special cases (e.g., for security purposes) the Id might be a Guid instead.
  • Any foreign key gets written as a <name>Id field for the foreign key itself and, most of the time, a virtual <name>(s) property for the actual record(s) - depending on whether the relationship is 1:1 or 1:many.

Here's an example from my Inventory project up on GitHub:

  public class SaleItem
  {
    public int Id { get; set; }
    public int SaleId { get; set; }
    public int ProductId { get; set; }
    public decimal Quantity { get; set; }
    public decimal Price { get; set; }

    public virtual Sale Sale { get; set; }
    public virtual Product Product { get; set; }
  }

You can see here the primary key Id and the two foreign keys SaleId and ProductId, with the matching Sale and Product properties for the lazy loading of the actual records. (EF wants them virtual so that it can intercept the accesses and make lazy loading possible.)

So far so good. So what is the trick? Well, there's a problem with not inheriting from a common root (as you can see from the SaleItem class above): it makes it difficult to write generic code. For example, I needed a method that could go through a list of either sale or acquisition items (two distinct entities and therefore distinct classes) and select the stocks with the same products as the list items. Something like this:

    private static IEnumerable<Stock> GetStocks<T>(IEnumerable<Stock> stocks, IEnumerable<T> items)
    {
      var productIds = items
        .Select(it => it.ProductId)
        .ToList();

      return stocks
        .Where(it => productIds.Contains(it.ProductId))
        .ToList();
    }

The problem is that this code won't compile; C# doesn't like the it.ProductId expression.

The way I solved this initially was to create a common Item interface with a ProductId property and change the SaleItem and AcquisitionItem classes to implement that interface. And sure, if you can do that then that works (and is probably faster than my trick). However, that's not always possible. So what I came up with instead (and I am absolutely not claiming this is my invention - it is more than likely that I saw this somewhere and it just took a while for it to bubble up from the deep dark hurricane that is my mind) was this:

    private static IEnumerable<Stock> GetStocks<T>(IEnumerable<Stock> stocks, IEnumerable<T> items)
    {
      var productIds = items
        .Select(it => ((dynamic) it).ProductId)
        .ToList();

      return stocks
        .Where(it => productIds.Contains(it.ProductId))
        .ToList();
    }

The money shot is in this expression:

((dynamic) it).ProductId

What it does is, basically, telling the compiler "I know that the object has a property called ProductId, go ahead and retrieve that for me". Since the objects do have that property, nothing blows up at run-time. That means I can keep my entities as unrelated as I want, with absolutely no need to inherit from a common class or even an interface. Of course, in this case both entities need to have a ProductId property, but that's a convention I'm already following (and one that, worst come to worst, I can work around by using the ?: operator to decide which property to access based on the actual type of the it entity).

Conclusion: use POCO entities with EF Code First and use the dynamic keyword instead of forcing the entity classes to inherit from a common ancestor.

Wednesday, August 28, 2013

The visitors

The world's observatories spotted them on August 27, 2014. They came from somewhere in the Orion constellation and took their sweet time doing it: it was more than three months before they became visible enough that the amateur astronomers were beginning to spot them.

The White House made an official announcement on December 1st. It didn't say that much, but the news killed all other stories. The Pentagon saw the biggest increase in military spending since 9/11, with a tiny part of that fortune going to NASA and the Pasadena and Houston observatories.

The New Year came to a rather anxious world; most of the panic had subsided by then, with barely a couple of street preachers warning everyone of the impending apocalypse. Gun sales were at an all-time high, not so much for protection against the aliens (nobody expected a few shotguns to help there) as against possible riots.

On March 1st, two hundred and thirty nine ships stopped at a distance of 2.5 million miles - about ten times the distance to the Moon. One ship continued on, settling in a low Earth orbit three days later.

Monday, August 26, 2013

A generic implementation of a finite state machine

A finite state machine (FSM for short) can be informally described as a network of states, like "the elevator door is open" and the connections between them. There is one state that is the start state and there might be one or more stop states (or none). A FSM has a current state, which can change as a reaction to an event.

Let me clarify things with an example: a very simple interface for a music player. This music player has two states: it can play music, or it can be silent. The starting state is the idle one:

  +------+     +---------+
  | idle |     | playing |
  +------+     +---------+

Put more formally, the network of states is a directed graph of nodes (aka states). The connections between the nodes are called edges, and they have a preferred direction - there will be FSMs where you can get from state A to state B, but not back to state A. (If you want to be able to get back to state A, simply add another directed edge from B to A.)

How does the current state change? To continue with the previous example, the state of the music player changes when you press a button. Here is an example:

  +------+ --- press Play --> +---------+
  | idle |                    | playing |
  +------+ <-- press Stop --- +---------+

The above schema is a compact representation of the following rules:

  • if the current state is idle and the Play button is pressed, change the state to playing
  • if the current state is playing and the Stop button is pressed, change the state to idle

Of course, just changing the state of the player doesn't help that much; we need to actually start (or stop) playing the music. A FSM helps by having actions associated with each state change:

             [start playing]
  +------+ --- press Play --> +---------+
  | idle |                    | playing |
  +------+ <-- press Stop --- +---------+
             [stop playing]

This would be the equivalent of

  • if the current state is idle and the Play button is pressed, start playing; change the state to playing
  • if the current state is playing and the Stop button is pressed, stop playing; change the state to idle

Each event can have zero or more actions associated with it; very importantly, an event doesn't have to change the current state. For example, what should happen when the player is idle and you press stop? Nothing; the current state remains idle.

                         [start playing]
              +------+ --- Play --> +---------+ --- Play --+
 +----------> | idle |              | playing | <----------+
 +-- Stop --- +------+ <-- Stop --- +---------+
                         [stop playing]

Note that, in the previous version of the FSM, pressing Play while playing (or Stop while not playing) would have been an error; the implementation I'm going to write would have thrown an exception in that case. (I will make that a boolean property, but I believe I would keep that active, as the alternative is to simply "eat" unexpected events without a trace. I think that would lead to nasty, hard-to-trace bugs; if you truly don't care about unknown events for a given state, just add a default handler that does nothing and leaves the current state unchanged.)

That reminds me: in addition to the outgoing edges associated with an event, a state can also have a default event handler, that is activated if an event occurs but it doesn't match any of the others. (I will implement the event handlers as a list of predicates Predicate<T> associated with each state, each predicate determining whether a given event has occurred. In the case of the music player above, the T will be the button that was pressed, and the predicates will determine whether it was the Play or the Stop button.)

To give an example of a FSM with a genuine need for a default, do-nothing handler, consider a FSM that needs to count the number of times each letter occurs in a given string. That FSM only cares about three cases: the next character is a letter, the string has been completely parsed, and "anything else" (in which case, do nothing). Note also that a default case does not necessarily mean "do nothing": I could change the previous FSM to individually count each letter BUT also count how many non-letter characters occur overall.

Anyway. The above description should be enough to write the code for representing an edge in this graph:

  public class Edge<T>
  {
    public Predicate<T> Predicate { get; private set; }
    public Action<T> Action { get; private set; }
    public Node<T> Node { get; private set; }

    public Edge(Predicate<T> predicate, Action<T> action, Node<T> node)
    {
      Predicate = predicate;
      Action = action;
      Node = node;
    }
  }

(If you have seen my previous article you might have spotted the if possible, move logic out of code and into data mantra. An Edge object is the equivalent of a case in a switch statement; it's another way of moving logic from code to data.)

The Node class is also quite simple:

  public class Node<T>
  {
    public string State { get; private set; }

    public Node(string state)
    {
      State = state;
      edges = new List<Edge<T>>();
    }

    public void Add(Edge<T> edge)
    {
      edges.Add(edge);
    }

    public Node<T> Handle(T input)
    {
      var edge = edges.FirstOrDefault(e => e.Predicate != null && e.Predicate(input)) ??
                 edges.FirstOrDefault(e => e.Predicate == null);
      if (edge == null)
        return null;

      if (edge.Action != null)
        edge.Action(input);

      return edge.Node;
    }

    //

    private readonly List<Edge<T>> edges;
  }

Finally, here's the FSM implementation:

  public class FSM<T>
  {
    public bool IgnoreErrors { get; set; }

    public FSM(Node<T> start)
    {
      current = start;
    }

    public void Handle(T input)
    {
      var next = current.Handle(input);
      if (next != null)
        current = next;
      else if (!IgnoreErrors)
        throw new Exception(string.Format("Unknown input [{0}]", input));
    }

    //

    private Node<T> current;
  }

Here's an example of using this FSM implementation to parse a number:

    private static void Main(string[] args)
    {
      var sign = 1;
      var number = 0.0;
      var divisor = 0.0;

      var start = new Node<char>("start");
      var wholePart = new Node<char>("int part");
      var fractPart = new Node<char>("decimal part");

      start.Add(new Edge<char>(c => c == '+', null, wholePart));
      start.Add(new Edge<char>(c => c == '-', c => sign = -1, wholePart));
      start.Add(new Edge<char>(char.IsDigit, c => number = number * 10 + (c - '0'), wholePart));
      start.Add(new Edge<char>(c => c == '.', c => divisor = 0.1, fractPart));

      wholePart.Add(new Edge<char>(char.IsDigit, c => number = number * 10 + (c - '0'), wholePart));
      wholePart.Add(new Edge<char>(c => c == '.', c => divisor = 0.1, fractPart));

      fractPart.Add(new Edge<char>(char.IsDigit, c =>
      {
        number += divisor * (c - '0');
        divisor /= 10;
      }, fractPart));

      do
      {
        sign = 1;
        number = 0.0;
        divisor = 0.0;

        Console.Write("Enter a number: ");
        var s = Console.ReadLine();
        if (string.IsNullOrEmpty(s))
          break;

        var fsm = new FSM<char>(start);
        foreach (var c in s)
          fsm.Handle(c);

        Console.WriteLine("The number is " + sign * number);
      } while (true);
    }

The result of a run would look like this:

Enter a number: -.1
The number is -0.1
Enter a number: .1
The number is 0.1
Enter a number: 1.23
The number is 1.23
Enter a number: -12.34
The number is -12.34
Enter a number: +56789.0
The number is 56789
Enter a number: 1.2.3

Unhandled Exception: System.Exception: Unknown input [.]
Press any key to continue . . .

Of course, now that I am using the code, I find that it would help to simplify the syntax. One place would be in the Node class; I don't need to explicitly create the Edge instance (the Node.Add method can do that) and I can also simplify the case where the predicate simply checks for equality:

    public void Add(Predicate<T> predicate, Action<T> action, Node<T> node = null)
    {
      edges.Add(new Edge<T>(predicate, action, node ?? this));
    }

    public void Add(T value, Action<T> action, Node<T> node = null)
    {
      Add(it => it.Equals(value), action, node);
    }

This allows me to simplify the creation of the nodes in the Main method to:

      start.Add('+', null, wholePart);
      start.Add('-', c => sign = -1, wholePart);
      start.Add(char.IsDigit, c => number = number * 10 + (c - '0'), wholePart);
      start.Add('.', c => divisor = 0.1, fractPart);

      wholePart.Add(char.IsDigit, c => number = number * 10 + (c - '0'));
      wholePart.Add('.', c => divisor = 0.1, fractPart);

      fractPart.Add(char.IsDigit, c =>
      {
        number += divisor * (c - '0');
        divisor /= 10;
      });

I am thinking of making a couple more changes to this code, like a Reset method in the FSM and possibly a couple of helper methods to avoid creating explicit Node objects, something like

  fsm.Add("start", '+', null, "whole");
  fsm.Add("start", char.IsDigit, c => number * 10 + (c - '0'), "whole");
  fsm.Add("whole", '.', c => divisor = 0.1, "fract");

and so on, but all in all I am pretty happy with how the code looks right now.

Saturday, August 24, 2013

On the impossibility of time travel

I've long believed that time travel is impossible. One of the many reasons I have for it is: it violates the conservation of mass/energy (something disappears from the universe at moment A and something appears in the universe at moment B).

Recently, I've had to reconsider this problem.

I started with an observation: the conservation of mass/energy is not violated when something moves. Formalizing this statement, an object changing its X, Y and/or Z coordinates does not violate the conservation of mass/energy. Now, if I consider that the universe is actually a 4-coordinate system, it can be inferred by analogy that an object changing its X, Y, Z and/or T coordinates is not violating conservation either. (The fact that we don't currently know any such objects is irrelevant; there was a time when we didn't have airplanes or cars.)

Of course, this isn't proof, but I wasn't proving a theorem in the first case; all I had was a thought experiment, an apparent contradiction with the known laws of the universe. That contradiction is now gone.

I guess I'll have to think more about the other 100 reasons for my disbelief in time travel...

Two algorithms

Two small code fragments for problems I found interesting.

First, I had a list of criteria that were being used to decide which files in a file list to mark as selected. Some of the criteria were simple, like "none" or "all", but some of them were more complex, like "modified in the last 15 minutes". Since I strongly disliked the idea of a series of ifs (or the equivalent switch statement), I came up with the following:

    var filters = new Dictionary<string, Predicate<FileObject>>
    {
        {"None", file => false},
        {"All (*.*)", file => true},
        {"Invert Selection", file => !file.Selected},
        {"Modified Last 15min", file => file.LastWriteTime.AddMinutes(15) >= DateTime.Now},
        {"Modified Today", file => file.LastWriteTime.Date == DateTime.Now.Date},
        {"*.dll (not Telerik)", file => !file.Name.StartsWith("Telerik") && file.Name.EndsWith(".dll")},
    };

    foreach (var file in fileList)
        file.Selected = filter(file);

Of course, this is rather specific to the exact problem I had, but the idea can be used in the more general case: create one of

    Dictionary<string, Predicate<T>>
    Dictionary<string, Func<T, TResult>>
    Dictionary<string, Action<T>>

and then iterate through the list and get the result / apply the action for each item. This is much easier to understand and modify than a complicated series of ifs, if only because the tendency when modifying the ifs is to add ad-hoc code to solve the particular problem you have right now instead of trying to preserve the code structure.

The second problem I had had to do with formatting a file size to the closest approximation in bytes, KB, MB or GB. For example, a 345 byte file would return "345 bytes", while a 345,000 byte file would say "336.91 KB". (1 KB = 1024 bytes.)

This is again a problem that would normally be solved with a succession of ifs, so once again I looked for something else. (One of the things I've read and that stuck with me was: if possible, move logic out of code and into data.) Here's what I came up with:

    public static string Prettify(this long size)
    {
      var formats = new List<string>
      {
        "#,##0 bytes",
        "#,##0.## KB",
        "#,##0.## MB",
        "#,##0.## GB",
      };

      var limit = 1024L;
      var index = 0;
      while (size >= limit)
      {
        limit *= 1024L;
        index++;
      }

      var format = formats[index];
      return ((double) size / limit * 1024.0).ToString(format);
    }

The reason for the multiplication with 1024 at the end is due to the fact that the limit has already been multiplied once; what I'm doing is basically dividing it back down: size / (limit / 1024), I've just opened the parenthesis.

So, here they are: alternative algorithms for problems normally solved with multiple ifs.