Programing Fun: Type safe / refactor friendly configuration for Db4o

Hi. In the first installment of this series I presented the idea of a type-safe way to configure db4o. In the second one, I explored how we could use expression trees to figure out the name of the property being accessed. In this post I'll address how to dive into the property's IL (intermediate language) to come up with the name of the field we want to configure (if you are not familiar with IL, please read some of the links here).

The best tools I'm aware of for reading (or even changing!) a method's IL are the Mono Cecil library and .NET Reflector; the latter is invaluable when you just want to look at a method's IL without writing a single line of code. Needless to say, I used these tools a lot :)

To come up with the name of the field to be indexed, the main approach is to find all return paths of the property getter in question and check which field is returned. Simple, right? Unfortunately, no. What if the property getter returns only constants? What if more than one field is involved? Before we dig into the details of IL handling, let's document the expected behavior for some possible use cases (the less obvious ones) regarding property getter return paths:

Only constants: throw an exception.
More than one field involved: throw an exception.
Chained method call: return the field accessed by the called method / property.
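To make the rules concrete, here is a hypothetical Person class whose getters each fall into one of the cases above. The IL in the comments is what an optimized (release) C# build typically emits; debug builds usually store the value in a local and reload it (stloc.0 ... ldloc.0) before ret, which is why the analysis also has to track local variable aliases. The exact IL varies by compiler and build settings, so check it in Reflector.

```csharp
// Hypothetical class: each getter falls into one of the cases above.
// IL in the comments is what an optimized (release) C# build typically emits.
public class Person
{
    private string name = "Alice";
    private string firstName = "Ada";
    private string lastName = "Lovelace";

    // Simple case -> configure field "name".
    // IL: ldarg.0; ldfld string Person::name; ret
    public string Name { get { return name; } }

    // Only constants -> throw.
    // IL: ldc.i4.s 42; ret
    public int Answer { get { return 42; } }

    // More than one field -> throw.
    // IL: two ldfld instructions feed a call to String.Concat before ret
    public string FullName { get { return firstName + " " + lastName; } }

    // Chained call -> the field returned by get_Name, i.e. "name".
    // IL: ldarg.0; call instance string Person::get_Name(); ret
    public string DisplayName { get { return Name; } }
}
```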

(We could come up with different requirements; for instance, every field referenced in a property's return path could be indexed. To keep things simple, though, let's stick to the rules presented above.) From my previous posts, we already have a way to figure out which property getter is being accessed; the next step is to analyze the method body. I used Cecil.FlowAnalysis to accomplish this task (the FlowGraphFactory.CreateControlFlowGraph() call) in the following function:

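public FieldReference Resolve(MethodDefinition method)
{
    // Guard against getters that (directly or indirectly) call themselves.
    if (CheckRecursion(FullName(method))) return null;
    callStack.Push(FullName(method));

    // Build the method's control flow graph with Cecil.FlowAnalysis.
    ControlFlowGraph graph = FlowGraphFactory.CreateControlFlowGraph(method);

    graph.Dump(Console.Out); // debugging output

    // Walk the graph from the entry block; the lambda maps an
    // instruction to the block that starts with it.
    Resolve(
        graph.Blocks[0],
        firstInstruction => Array.Find(
            graph.Blocks,
            candidate => candidate.FirstInstruction == firstInstruction));

    callStack.Pop();

    // Apply the rules above (only constants / more than one field -> exception).
    FieldReference reference = CheckReferencedFields(
        referencedFields, method.Name);
    referencedFields.Clear();

    return reference;
}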
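CheckReferencedFields() is where the rules listed earlier get enforced. Its body is not shown here, so the following is only a minimal sketch of what it could look like: FieldRules is a hypothetical name, and field references are modeled as plain strings so the example runs without Cecil.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for CheckReferencedFields(); fields are plain strings
// here instead of Cecil FieldReference objects.
public static class FieldRules
{
    public static string CheckReferencedFields(IList<string> referencedFields, string methodName)
    {
        // The same field may be returned on several paths (e.g. both
        // branches of an if), so only distinct fields count.
        List<string> distinct = referencedFields.Distinct().ToList();

        // Rule: only constants -> throw.
        if (distinct.Count == 0)
            throw new InvalidOperationException(
                $"{methodName} returns only constants; there is no field to configure.");

        // Rule: more than one field involved -> throw.
        if (distinct.Count > 1)
            throw new InvalidOperationException(
                $"{methodName} involves more than one field: {string.Join(", ", distinct)}.");

        return distinct[0];
    }
}
```

Comparing distinct names is a design choice of this sketch; with Cecil, comparing FieldReference objects by their full name would play the same role.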

Next, for each block in the control flow graph we inspect its IL, as shown in the following code:

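void Resolve(
    InstructionBlock block,
    Func<Instruction, InstructionBlock> blockForStartingInstruction)
{
    // Array.Find() returns null when no block starts at the given instruction.
    if (block == null) return;

    // AnalyzeInstructions() returns true when the block ends with a leave;
    // the block starting right after it is then handled next.
    if (AnalyzeInstructions(block.FirstInstruction, block.LastInstruction))
    {
        Resolve(
            blockForStartingInstruction(block.LastInstruction.Next),
            blockForStartingInstruction);
    }

    // Process the successor blocks recursively.
    for (int i = 0; i < block.Successors.Length; i++)
    {
        Resolve(block.Successors[i], blockForStartingInstruction);
    }
}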
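The block traversal can be sketched in isolation with a hypothetical Block type standing in for Cecil.FlowAnalysis' InstructionBlock. One difference is deliberate and is not in the code above: a visited set, so a getter containing a loop (a block whose successors include itself or an earlier block) cannot make the recursion run forever. Following the block after a leave is omitted to keep the sketch short.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified control flow graph node.
public class Block
{
    public string Name;
    public List<Block> Successors = new List<Block>();
    public Block(string name) { Name = name; }
}

public static class GraphWalker
{
    // Depth-first walk over every block reachable from 'block'.
    // 'visited' (an addition to the original approach) stops the walk
    // from re-entering a block it has already processed.
    public static void Walk(Block block, HashSet<Block> visited, List<string> order)
    {
        if (block == null || !visited.Add(block)) return;

        // This is where the IL of the block would be analyzed.
        order.Add(block.Name);

        foreach (Block successor in block.Successors)
            Walk(successor, visited, order);
    }
}
```

With an entry block, a self-looping block and an exit block, the walk still terminates and records "entry,loop,exit", each block once.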

Resolve() first calls AnalyzeInstructions(), which does the dirty work of analyzing the block's IL instructions. It returns true if the block that starts at the instruction following the current block's last instruction should be handled as the "next block" (this is done to support finally blocks). The for loop then processes the successor blocks by calling Resolve() recursively. The real fun is in the AnalyzeInstructions() function, which does the following:

For Ldfld / Ret instruction sequences, add the ldfld target to the returned-field list.
For each Ldfld / Stloc sequence, add an alias mapping the local variable to the field.
For Ldloc / Ret sequences, search the aliases to see whether ldloc's operand is an alias for a field, and add any alias found to the returned-field list.
For Ldloc / Stloc sequences, make the target local an alias for the same field.
If a method call is found, analyze the target method's body (this handles chained getters).

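bool AnalyzeInstructions(
    Instruction current,
    Instruction last)
{
    while (current != last.Next)
    {
        if (IsLoadField(current))
        {
            if (IsReturn(current.Next))
            {
                // ldfld; ret -> the field is returned directly.
                referencedFields.Add(
                    (FieldReference)current.Operand);
            }
            else
            {
                // ldfld followed by something else (e.g. stloc):
                // record the field as an alias candidate.
                EnsureLocalVariableAliasFor(current, () => (FieldReference)current.Operand);
            }
        }
        else if (IsLdloc(current.OpCode))
        {
            if (IsReturn(InstructionOrBranchTarget(current.Next)))
            {
                // ldloc; ret -> report the field(s) this local aliases.
                FlushAliasesFor(current);
            }
            else if (IsStore(InstructionOrBranchTarget(current.Next)))
            {
                // ldloc; stloc -> the stored local aliases the same field.
                EnsureLocalVariableAliasFor(
                    current,
                    () => aliases.Find(
                        candidate => candidate.OpCode == current.OpCode).Field);

                current = current.Next; // skip the store just handled
            }
        }
        else if (IsMethodCall(current))
        {
            // Chained call: resolve the field accessed by the called method.
            EnsureLocalVariableAliasFor(
                current,
                () => Resolve(
                    ResolveMethod((MethodReference)current.Operand)));
        }

        current = current.Next;
    }

    // Blocks ending in a leave (e.g. try blocks) continue at the next block.
    return IsLeave(last);
}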
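The core pattern matching above can be illustrated without Cecil using a hypothetical Instr record (an opcode name plus an operand). This is a simplified sketch under those assumptions: it handles only straight-line code, recognizing ldfld/ret, ldfld/stloc aliases and ldloc/ret, and it simply skips branches instead of following their targets as InstructionOrBranchTarget() does.

```csharp
using System.Collections.Generic;

// Hypothetical instruction model: the operand is a field name for "ldfld"
// and a local slot number for "stloc"/"ldloc".
public record Instr(string OpCode, object Operand = null);

public static class ReturnFieldFinder
{
    // Collects the fields that reach a "ret", either directly (ldfld; ret)
    // or through a local (ldfld; stloc N ... ldloc N; ret), the shape
    // debug builds usually emit.
    public static List<string> Find(IList<Instr> code)
    {
        var returned = new List<string>();
        var aliases = new Dictionary<object, string>(); // local slot -> field

        for (int i = 0; i < code.Count; i++)
        {
            Instr current = code[i];
            Instr next = i + 1 < code.Count ? code[i + 1] : null;

            if (current.OpCode == "ldfld" && next?.OpCode == "ret")
            {
                returned.Add((string)current.Operand);
            }
            else if (current.OpCode == "ldfld" && next?.OpCode == "stloc")
            {
                aliases[next.Operand] = (string)current.Operand; // the local now aliases the field
                i++; // the stloc has been consumed
            }
            else if (current.OpCode == "ldloc" && next?.OpCode == "ret"
                     && aliases.TryGetValue(current.Operand, out string field))
            {
                returned.Add(field);
            }
            // Anything else (including branches) is ignored in this sketch.
        }
        return returned;
    }
}
```

For the debug-style sequence ldarg.0; ldfld name; stloc 0; br.s; ldloc 0; ret this returns a single entry, "name", and for a constant-only getter it returns an empty list, which the rules turn into an exception.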

That's it! You can download the full source code here. [Updated on 7/23/2008: Fixed the source code download link.] If you want to play with the code, one point that can (and needs to) be improved is the handling of exception blocks. Thoughts?

Adriano


Jun 25, 2008

Type safe / refactor friendly configuration for Db4o - Part III
