Oct 26, 2013

Closures in C# - Answer


In the previous post I asked what the output of a simple C# application would be:

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
   private static void Main(string[] args)
   {
      foreach (var func in GetFuncs1())
      {
         Console.WriteLine("[first] {0}", func());
      }

      foreach (var func in GetFuncs2())
      {
         Console.WriteLine("[second] {0}", func());
      }
   }

   private static List<Func<int>> GetFuncs1()
   {
      var fs = new List<Func<int>>();

      for (int i = 0; i < 3; i++)
      {
         fs.Add(() => i);
      }
      return fs;
   }

   private static List<Func<int>> GetFuncs2()
   {
      var fs = new List<Func<int>>();

      foreach (var i in Enumerable.Range(0, 3))
      {
         fs.Add(() => i);
      }
      return fs;
   }
}
The correct answer is: it depends!

In order to understand the issue you need to understand closures and captured variables in C# (for a deeper understanding of these subjects I recommend reading here and here).

Basically, in both methods (GetFuncs1 and GetFuncs2) we are creating lambda expressions that reference a local variable (i). Since a local variable normally stops existing when the method that declares it returns, i.e., long before the lambda expression is invoked, the C# compiler "captures" the variable (if you are curious about how it does that, open the assembly in ILDASM or ILSpy and inspect the IL code; I do recommend it), and here our issues start.
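To make "capturing the variable" concrete, here is a minimal sketch (the names CaptureDemo, ReadAfterUpdate, counter and read are made up for this illustration and are not part of the original program). The lambda is created while the variable holds 1, but it is invoked only after the variable has been changed:

```csharp
using System;

class CaptureDemo
{
    // Creates a lambda over a local variable, changes the variable
    // afterwards, and only then invokes the lambda.
    public static int ReadAfterUpdate()
    {
        int counter = 1;
        Func<int> read = () => counter; // captures the variable "counter", not the value 1
        counter = 42;                   // updates the very same variable the lambda refers to
        return read();                  // the lambda reads the variable now
    }

    static void Main()
    {
        Console.WriteLine(ReadAfterUpdate()); // prints 42
    }
}
```

If the lambda had captured the value instead of the variable, this would print 1.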

As I said, the compiler captures the variable, not its value; so calling GetFuncs1() produces code that is equivalent to something like the following (note the call to fs.Add(ret_func), the ret_func method and the static field ret; the actual generated code is quite different, but for our purposes it can be seen as equivalent):

using System;
using System.Collections.Generic;

class Program
{
 private static void Main(string[] args)
 {
  foreach (var func in GetFuncs1())
  {
   Console.WriteLine("[first] {0}", func());
  }
 }

 private static List<Func<int>> GetFuncs1()
 {
  var fs = new List<Func<int>>();

  for (ret = 0; ret < 3; ret++)
  {
   fs.Add(ret_func);
  }
  return fs;
 }

 private static int ret_func()
 {
  return ret;
 }

 private static int ret;
}
That means that we are always calling ret_func() and this function will return the value of ret which is updated in the for loop!

Ok, so now you understand why the output was 3,3,3 but two questions come up (at least):

  1. Is there a way to avoid this behavior so the output would be 0, 1 and 2?
  2. Why did I say that the output of the program "depends"? And more importantly, depends on what?
The answer to the first question is yes: simply declare a new variable inside the for loop, assign "i" to it (var capture = i;), and return the value of this new variable from within the lambda expression (fs.Add(() => capture);):
using System;
using System.Collections.Generic;

class Program
{
 private static void Main(string[] args)
 {
  foreach (var func in GetFuncs1())
  {
   Console.WriteLine("[first] {0}", func());
  }
 }

 private static List<Func<int>> GetFuncs1()
 {
  var fs = new List<Func<int>>();

  for (int i = 0; i < 3; i++)
  {
   var capture = i;
   fs.Add(() => capture);
  }
  return fs;
 }
}
Running the application now produces the expected result. What changed is that now, since we are creating a new variable in every iteration, the compiler will emit code equivalent to:
using System;
using System.Collections.Generic;

class Program
{
 private static void Main(string[] args)
 {
  foreach (var func in GetFuncs1())
  {
   Console.WriteLine("[first] {0}", func());
  }
 }

 private static List<Func<int>> GetFuncs1()
 {
  var fs = new List<Func<int>>();

  for (int i = 0; i < 3; i++)
  {
   var h = new Holder {value = i};
   fs.Add(h.GetIt);
  }
  return fs;
 }

 class Holder
 {
  public int value;

  public int GetIt()
  {
   return value;
  }
 }
}
Again, it is now pretty easy to understand why we get the expected output; in each iteration the compiler is simply instantiating a new object to hold the value of "i" at that point in time.

Ok, so from now on, every time you create lambda expressions / anonymous functions that capture local variables, I am sure you'll pay attention and make sure you get the behaviour you want.

Regarding the second point, i.e., the "depends" part: if this blog happens to have more than a few readers, I am sure that, when running the program, some of you got "unexpected" outputs from both loops (i.e., 3,3,3 and 2,2,2) while others got 3,3,3 and 0,1,2!

The difference? The version of the C# compiler used!

See the picture below:

It happens that in C# 5 the behavior for captured variables of a foreach loop changed! In this case the compiler emits code as if a new local variable were allocated in each iteration (the same trick I showed you before); if you followed me, you now understand why the code produced the expected result when compiled with VS 2012.
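The difference can be reproduced by hand with any compiler version. The following is only a sketch of the idea, not the code the compiler actually emits: it unrolls the foreach into an explicit enumerator loop and places the loop variable either outside the loop (roughly the pre-C# 5 behavior) or inside it (roughly the C# 5 behavior). The names ForeachExpansionDemo, SharedVariable and FreshVariable are made up for this illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ForeachExpansionDemo
{
    // Roughly the old behavior: one loop variable, declared outside
    // the loop body and shared by every lambda.
    public static List<Func<int>> SharedVariable()
    {
        var fs = new List<Func<int>>();
        using (IEnumerator<int> e = Enumerable.Range(0, 3).GetEnumerator())
        {
            int i; // a single variable for the whole loop
            while (e.MoveNext())
            {
                i = e.Current;
                fs.Add(() => i);
            }
        }
        return fs;
    }

    // Roughly the C# 5 behavior: a fresh variable in every iteration,
    // so each lambda captures its own copy.
    public static List<Func<int>> FreshVariable()
    {
        var fs = new List<Func<int>>();
        using (IEnumerator<int> e = Enumerable.Range(0, 3).GetEnumerator())
        {
            while (e.MoveNext())
            {
                int i = e.Current; // a new variable per iteration
                fs.Add(() => i);
            }
        }
        return fs;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", SharedVariable().Select(f => f()))); // prints 2,2,2
        Console.WriteLine(string.Join(",", FreshVariable().Select(f => f())));  // prints 0,1,2
    }
}
```

The only thing that differs between the two methods is where "i" is declared, which is exactly what decides whether the lambdas share one variable or get one each.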

Now the million-dollar question: why on earth has the version that uses a simple for loop not been updated with the same behavior? Well, I don't have (and I don't pretend to have) the definitive answer. What I can say is that IMHO it is at least "confusing". New developers (who are most likely unaware of this behavior) will, invariably, be caught by this at least once (but it can be twice)!

If these new developers happen to first use a for loop (always with lambdas / anonymous methods capturing local variables), they may find themselves wondering why their lambdas / anonymous methods are observing the updated value of a captured variable; then, when they finally realize what's going on, they will tend to use the "declare a local variable in each iteration" trick everywhere (including in foreach loops). On the other hand, if these developers happen to first use a foreach loop, they will most likely find themselves wondering why they observe a different behavior once they use lambdas / anonymous methods in a for loop!

Even though I understand the reasoning in this discussion, I don't fully agree - part of my brain still thinks this simply introduces inconsistencies - but this is just my opinion.

If you are interested in reading about this topic in the C# language specification, it can be found in section 8.8.4.

What do you think?
