Saturday, August 20, 2011

Random N out of M

Someone asked about randomly picking N elements out of an array of M, and I remembered reading about one way of doing that, so here it is:

    private static IEnumerable<int> Generate(Random rnd, int n, int m)
    {
      var i = 0;
      while (n > 0)
      {
        var r = rnd.NextDouble();
        if (r < (double) n / m)
        {
          yield return i;
          n--;
        }

        m--;
        i++;
      }
    }


The above C# method assumes 0 < N <= M and will return an IEnumerable<int> containing exactly N numbers in the 0 .. M - 1 interval. This method can even be used to pick items out of a stream without waiting for the whole stream to be produced first.

A simpler method would be to just do this:

    for (var i = 0; i < n; i++)
    {
      do {
        var next = rnd.Next(m);
      while (!alreadyGenerated(next));
      yield return next;
    }
but this one requires 1) saving all numbers (to check if one was already generated) and 2) having the actual data in an array instead of a stream (since the indexes are not generated in ascending order).

Edit: Per the observation from the first comment, the first fragment will never loop more than M times - an O(M) algorithm. The second fragment risks an O(M^2) if N is close to M.