Mar 14, 2020

Mind the Gap: dealing with offset issues in Mono.Cecil



If there's one thing I've learned in my IT career, it is that sooner or later (more often than not the former) you will need a deeper knowledge of the technologies abstracted by a library/framework in order to use it efficiently and/or solve issues (in the context of this post, Mono.Cecil abstracts various aspects of CIL).

Last week, while I was working on a Cecilifier feature, I started getting invalid assemblies out of the blue, without having changed any code in Cecilifier itself. After some head scratching I figured it out: changes to the test classes had pushed some branch instructions more than 127 bytes away from their targets.

IL (intermediate language) has two families of branch instructions: short and long forms. In short (pun intended), instructions in the short form use one byte for the offset to the target of the branch (ranging from -128 to 127), whereas those in the long form take 4 bytes (a much wider range). Cecilifier was emitting the former type of branch irrespective of the offset between the branch instruction and its target.
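To see why a single byte is a problem, here is a minimal sketch of the arithmetic in plain C# (no Mono.Cecil involved). The BranchEncoding class and its method names are made up for illustration; the only facts it relies on are that a short form operand is a signed byte and that the offset is measured from the start of the instruction following the branch.

```csharp
using System;

static class BranchEncoding
{
    // Short form branches store their offset in one signed byte (-128 to 127).
    public static bool FitsShortForm(int offset) =>
        offset >= sbyte.MinValue && offset <= sbyte.MaxValue;

    // The offset is relative to the start of the instruction that follows the branch.
    public static int RelativeOffset(int branchOffset, int branchSize, int targetOffset) =>
        targetOffset - (branchOffset + branchSize);

    static void Main()
    {
        // A br.s at IL offset 0 takes 2 bytes: 1-byte opcode + 1-byte operand.
        int near = RelativeOffset(0, 2, 129); // 127 bytes ahead
        int far = RelativeOffset(0, 2, 130);  // 128 bytes ahead, one byte too far

        Console.WriteLine($"{near}: {FitsShortForm(near)}"); // prints "127: True"
        Console.WriteLine($"{far}: {FitsShortForm(far)}");   // prints "128: False"

        // Forcing the larger value into a signed byte silently wraps around.
        Console.WriteLine(unchecked((sbyte)far));            // prints "-128"
    }
}
```

The last line is the interesting one: an offset that does not fit does not fail loudly, it just turns into a different (wrong) offset.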

To make it more concrete, the following program simulates this scenario by naively adding a bunch of nop instructions inside the if statement, causing the offset of the branch (to the end of the if) to overflow:
When executed, it saves a modified version of itself to your temp folder (i.e., %temp% on Windows, /tmp on Linux), which throws an exception as soon as the affected method is JITed (in the example, when we try to execute the Foo() method).
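The original listing is not reproduced in this copy of the post, so what follows is only a minimal sketch of such a program, not the original code. It assumes the Mono.Cecil package is referenced and that the program runs on .NET Framework or Mono (where the running assembly is the .exe itself); the Modify method and the exact shape of Foo are illustrative guesses.

```csharp
using System;
using System.IO;
using System.Linq;
using Mono.Cecil;
using Mono.Cecil.Cil;

class Test
{
    static void Main(string[] args)
    {
        if (args.Length == 2 && args[0] == "modify")
            Modify(int.Parse(args[1]));
        else if (args.Length == 1 && args[0] == "run")
            Foo(args.Length);
        else
            Console.WriteLine("Usage: Test modify <number-of-nops> | Test run");
    }

    // The method we tamper with: the compiler emits a short branch over the if body.
    static void Foo(int n)
    {
        if (n > 0)
        {
            Console.WriteLine("Inside the if");
        }
        Console.WriteLine("After the if");
    }

    // Naively inserts nops at the end of the if body without touching the branch opcode.
    static void Modify(int nopCount)
    {
        using (var assembly = AssemblyDefinition.ReadAssembly(typeof(Test).Assembly.Location))
        {
            var foo = assembly.MainModule.Types.Single(t => t.Name == "Test")
                                          .Methods.Single(m => m.Name == "Foo");
            var il = foo.Body.GetILProcessor();

            // Assumption: the first instruction whose operand is another instruction
            // is the branch that skips the if body; its operand is the branch target.
            var branch = foo.Body.Instructions.First(i => i.Operand is Instruction);
            var target = (Instruction) branch.Operand;

            for (var i = 0; i < nopCount; i++)
                il.InsertBefore(target, il.Create(OpCodes.Nop));

            var output = Path.Combine(Path.GetTempPath(), "Output.exe");
            assembly.Write(output);
            Console.WriteLine("Modified assembly saved to " + output);
        }
    }
}
```

Note that nothing here complains about the oversized offset: the branch keeps its short form opcode and the problem only shows up when the modified method is executed.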

You can build the program above and:
  1. Execute it passing `modify 1000` as its command line argument (you can try different values for the number of nops):
    `Test.exe modify 1000`
  2. After it finishes, run the modified version passing `run` as the command line argument (on Windows):
    `%temp%\Output.exe run`

Luckily, Mono.Cecil provides a relatively easy way to ensure that branch instructions compatible with the offsets are used: the SimplifyMacros() extension method (defined in Mono.Cecil.Rocks.MethodBodyRocks). It goes over the instructions of a method's body, replacing the ones encoded using the short form of the opcodes with the respective long form ones; this means that branch instructions such as br.s, beq.s, etc. are replaced with their long form counterparts (br, beq, etc.) in the same way that opcodes like ldarg.x are replaced with ldarg x and so on. Since those instructions use offsets that are 4 bytes long, it is very unlikely (almost impossible) that targets will be outside the valid range.

After finishing the modifications to a method body, you can call OptimizeMacros() to ensure all instructions use the most space-efficient encoding possible; it takes the long form of instructions and replaces them with the short form versions whenever possible.

Armed with these methods, we can change our program to call SimplifyMacros() before touching the method body and OptimizeMacros() once we are done:
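The original listing is not included in this copy of the post, so here is a minimal sketch of the pattern as a small stand-alone patcher. SafePatcher, InsertNops and the command line layout are hypothetical names chosen for illustration; only SimplifyMacros(), OptimizeMacros() and the other Mono.Cecil calls come from the library.

```csharp
using System;
using System.Linq;
using Mono.Cecil;
using Mono.Cecil.Cil;
using Mono.Cecil.Rocks; // brings in SimplifyMacros() / OptimizeMacros()

static class SafePatcher
{
    // Inserts `count` nops just before the target of the first branch in the method.
    // inputPath and outputPath must be different files.
    public static void InsertNops(string inputPath, string outputPath,
                                  string typeName, string methodName, int count)
    {
        using (var assembly = AssemblyDefinition.ReadAssembly(inputPath))
        {
            var body = assembly.MainModule.Types.Single(t => t.FullName == typeName)
                                           .Methods.Single(m => m.Name == methodName)
                                           .Body;

            // 1. Expand every short form instruction (br.s -> br, and so on).
            body.SimplifyMacros();

            // 2. Modify the body freely; 4-byte offsets have plenty of room.
            //    Assumption: the method contains at least one branch instruction.
            var il = body.GetILProcessor();
            var branch = body.Instructions.First(i => i.Operand is Instruction);
            var target = (Instruction) branch.Operand;
            for (var i = 0; i < count; i++)
                il.InsertBefore(target, il.Create(OpCodes.Nop));

            // 3. Shrink instructions back to the short form wherever they still fit.
            body.OptimizeMacros();

            assembly.Write(outputPath);
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 5)
        {
            Console.WriteLine("Usage: SafePatcher <input> <output> <type> <method> <nops>");
            return;
        }
        InsertNops(args[0], args[1], args[2], args[3], int.Parse(args[4]));
    }
}
```

The important part is the ordering: simplify first, modify in the middle, optimize last, and only then write the assembly.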
Now the assembly produced is no longer invalid!




As a last note, keep in mind that, in order to minimize assembly size, compilers generally emit instructions taking the least space possible (which will be the short form for offsets smaller than 128). That is nice and cool, but it means that in the vast majority of cases the short form of the branch instructions will be used, and inserting a single IL instruction in a method's body may cause short forms of branch instructions to fall out of the valid offset range, leading to exceptions at runtime.
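The single instruction case can be sketched without touching any file on disk. The snippet below builds a throwaway in-memory method consisting of a br.s, some nops and a ret, then inserts one extra nop between SimplifyMacros() and OptimizeMacros() and prints the opcode the branch ends up with. The class and method names are invented for this example, and it assumes the Mono.Cecil package is referenced.

```csharp
using System;
using Mono.Cecil;
using Mono.Cecil.Cil;
using Mono.Cecil.Rocks;

static class SingleInstructionDemo
{
    // Builds "br.s END; nop (x nopCount); END: ret" in an in-memory module.
    public static MethodDefinition BuildMethod(int nopCount)
    {
        var module = ModuleDefinition.CreateModule("Demo", ModuleKind.Dll);
        var type = new TypeDefinition("", "DemoType",
            TypeAttributes.Public | TypeAttributes.Class, module.TypeSystem.Object);
        module.Types.Add(type);

        var method = new MethodDefinition("M",
            MethodAttributes.Public | MethodAttributes.Static, module.TypeSystem.Void);
        type.Methods.Add(method);

        var il = method.Body.GetILProcessor();
        var end = il.Create(OpCodes.Ret);
        il.Append(il.Create(OpCodes.Br_S, end));
        for (var i = 0; i < nopCount; i++)
            il.Append(il.Create(OpCodes.Nop));
        il.Append(end);
        return method;
    }

    // Inserts one nop before the final ret and returns the opcode of the branch afterwards.
    public static OpCode InsertNopAndGetBranchOpCode(MethodDefinition method)
    {
        var body = method.Body;
        body.SimplifyMacros();

        var il = body.GetILProcessor();
        var last = body.Instructions[body.Instructions.Count - 1];
        il.InsertBefore(last, il.Create(OpCodes.Nop));

        body.OptimizeMacros();
        return body.Instructions[0].OpCode;
    }

    static void Main()
    {
        // 10 nops + 1: the target is 11 bytes ahead, well within a signed byte.
        Console.WriteLine(InsertNopAndGetBranchOpCode(BuildMethod(10)));

        // 127 nops + 1: the target is now 128 bytes ahead, one more than a signed byte holds.
        Console.WriteLine(InsertNopAndGetBranchOpCode(BuildMethod(127)));
    }
}
```

In the second call the branch starts out exactly at the limit of the short form, so one additional instruction is all it takes to need the long form.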

Have fun.

#monocecil #cecilifier
