Saturday, August 20, 2011

Random N out of M

Someone asked about randomly picking N elements out of an array of M, and I remembered reading about one way of doing that, so here it is:

    private static IEnumerable<int> Generate(Random rnd, int n, int m)
    {
      var i = 0;
      while (n > 0)
      {
        var r = rnd.NextDouble();
        if (r < (double) n / m)
        {
          yield return i;
          n--;
        }

        m--;
        i++;
      }
    }


The above C# method assumes 0 < N <= M and will return an IEnumerable<int> containing exactly N numbers in the 0 .. M - 1 interval. This method can even be used to pick items out of a stream without waiting for the whole stream to be produced first.

A simpler method would be to just do this:

    for (var i = 0; i < n; i++)
    {
      do {
        var next = rnd.Next(m);
      while (!alreadyGenerated(next));
      yield return next;
    }
but this one requires 1) saving all numbers (to check if one was already generated) and 2) having the actual data in an array instead of a stream (since the indexes are not generated in ascending order).

Edit: Per the observation from the first comment, the first fragment will never loop more than M times - an O(M) algorithm. The second fragment risks an O(M^2) if N is close to M.

2 comments:

beroal said...

Maybe you should say that its time complexity is O(M).

Marcel said...

Good point. The first code fragment will take no more than O(M) loops. In fact it has the useful property that the closer N is to M, the closer the number of loops approaches N. The second fragment can get to O(M^2) if N is almost equal to M.