Programing Fun: 05/01/2012

May 19, 2012

API usage rant

If you have been following me recently (Facebook, LinkedIn, Twitter, etc.) you are probably aware that I have been busy taking care of the building of my new house, my job and whatnot (not necessarily in this order ;).

Man, building a new place proved to be a huge time sink; now that we've finally finished the main construction phase, I expect to blog more often.

OK, so let's start with a rant (I am really good at ranting :))

Some time ago I was playing with one of my toy projects when I got an exception while adding a new entry to a dictionary. The exception was quite self-explanatory about what had happened (from the point of view of the dictionary class implementer): a duplicate key was added (which, according to the documentation, is explicitly prohibited).

OK, I can live with that; it was my fault after all ;) My real issue is that the message would have been much more helpful had it included the actual key value (of course you override the ToString() method in all of your public classes, don't you?).
Without this information I was forced to debug the code to figure out what the original exception could have told me (of course I could have caught the exception and printed the key, but then I'd have had to change my code just to find out what was happening).

As I understand it, one of the reasons for this is to avoid leaking potentially sensitive information and, even though IMHO this is reasonable, it makes me wonder whether we could do better.

To me it looks like the dictionary class developer (product owner, or whoever) was faced with the usual cost/benefit decision: make the class easier and more convenient to use, or make it suitable for use when sensitive information is involved; he/she opted for the latter.

When writing libraries I usually lean toward making them easier to use, at the cost of a higher chance of developers misusing them (or using them in a way that lets them shoot themselves in the foot). I confess that I had never thought about this particular scenario, but now I think it could be fixed (made more flexible) by allowing the user (in this case the developer) to decide what to expose in exceptions (through an enum, an interface, you name it) while choosing a safe default value.
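To make the idea a bit more concrete, here is a minimal sketch in C (the language used for the other code on this page). Everything in it is hypothetical: `str_map`, `map_add`, `key_exposure` and the error strings are names invented for illustration and have nothing to do with the real dictionary class from the story. C has no exceptions, so the error description is written into a caller-supplied buffer instead.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical policy: how much of the offending key an error may reveal. */
enum key_exposure {
    KEY_HIDDEN = 0,   /* safe default: never echo the key */
    KEY_SHOWN         /* opt-in: include the key text in the message */
};

#define MAP_CAPACITY 16

/* A deliberately tiny, fixed-size string-to-int map (illustration only). */
struct str_map {
    const char *keys[MAP_CAPACITY];
    int values[MAP_CAPACITY];
    size_t count;
    enum key_exposure exposure;
};

/* Zero-initialising the map also selects KEY_HIDDEN, the safe default. */
void map_init(struct str_map *m)
{
    memset(m, 0, sizeof *m);
}

/* Returns 0 on success; on failure returns -1 and describes the problem
   in err, revealing the key only if the map's policy allows it. */
int map_add(struct str_map *m, const char *key, int value,
            char *err, size_t err_size)
{
    for (size_t i = 0; i < m->count; i++) {
        if (strcmp(m->keys[i], key) == 0) {
            if (m->exposure == KEY_SHOWN)
                snprintf(err, err_size, "duplicate key: '%s'", key);
            else
                snprintf(err, err_size, "duplicate key");
            return -1;
        }
    }
    if (m->count == MAP_CAPACITY) {
        snprintf(err, err_size, "map is full");
        return -1;
    }
    m->keys[m->count] = key;
    m->values[m->count] = value;
    m->count++;
    return 0;
}
```

A caller that handles only harmless keys would set `m.exposure = KEY_SHOWN` right after `map_init` and get the key in every duplicate-key message; everyone else keeps the terse message without doing anything.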

What do you think?

Adriano

May 9, 2012

Free (open source) RAM disk for Windows (x86 / x64)

Hi

Have you ever needed to perform some I/O-intensive operation on a relatively small set of files? I have, usually when I need to compile some application.

In order to speed up this operation I've resorted to so-called "RAM disks": applications that take part of your computer's main memory and pretend that it is an actual disk. As you can imagine, read/write/etc. (I/O) operations against this virtual disk are much faster than operations against your actual disks (be it a traditional magnetic HD or a shiny new SSD).

In the past 3 years I've used at least 3 different solutions and was happy with none. My main complaint is that most of them were not capable of creating RAM disks dynamically (or made it really clumsy/hard to accomplish), i.e., one is required to create the disk at OS startup time. Since my needs are dynamic (I may need more disk space at some points during the day) it is really hard to come up with a disk size that works for me: choose too big a disk and I am wasting precious RAM; create a disk that's too small and I may find myself facing "out of disk space" errors (then cleaning up some junk and restarting my build tasks).

Fortunately, some time ago I stumbled upon a RAM disk implementation that allows me to mount and unmount disks dynamically! Since then my workflow has been much smoother ;) The not-so-good side is that it is not very easy to use for the "not computer savvy" user.

Bottom line is: if you need a stable and flexible (but maybe not so easy to start with) RAM disk solution you can't go wrong with this one.

May 8, 2012

What's wrong with this code - fun with assembly : the answer

In the previous post I presented the following program and asked what the problem with it was:

int fact(int n)
{
    if (n == 1)
        return 1;
    else
        n = n * fact(n - 1);
}

int main(int argc, char *argv[])
{
     int n = argc > 1 ? atoi(argv[1]) : 5;
     int i = fact(n);
   
     return printf("fact(%d): %d", n, i);
}

As I said in the previous post, the problem itself is not so hard to find: take a close look at line 6 again! We are not missing any instruction, or are we? Actually we are missing a return, so line 6 should really look like

return n * fact(n - 1);

i.e., the developer forgot to include the return.
Note that the compiler tried to alert me with the following warning but, as you know, I just ignored it :) (as I said in the previous post, you should never ignore compiler warnings)

c:\temp\fact.c(7) : warning C4715: 'fact' : not all control paths return a value
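For reference, here is a sketch of the function with the return restored, so that every control path returns a value. The `n <= 1` test is an assumption added here, not something from the original program (which tests `n == 1` and would therefore recurse without end for 0 or negative arguments).

```c
/* fact with the missing return restored: every control path returns a value. */
int fact(int n)
{
    if (n <= 1)                 /* assumption: widened from the original n == 1 */
        return 1;
    return n * fact(n - 1);     /* the statement that had lost its return */
}
```

Written this way, nothing depends on which register the compiler happens to use for the multiplication.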

So the interesting question is: why does this program work even though it is clearly missing a return? In order to answer this question we are going to dive into the assembly code generated for this program. To get the assembly code just type the following at a command line (assuming you have cl.exe in your path):

cl fact.c /Fafact.asm

Even if you have little knowledge of assembly, please bear with me; I'll try to explain the important parts. Also, I've simplified the assembly code of both functions (fact and main) by removing the bits that are not important to this discussion.

MESSAGE DB 'fact(%d): %d', 00H
fact:
 push ebp
 mov ebp, esp
 
 cmp dword ptr [ebp+8], 1
 jne next_fact
 mov eax, 1
 jmp finish
 
next_fact:
 mov eax, [ebp+8]
 sub eax, 1
 push eax
 call fact
 add esp, 4
 imul eax, [ebp + 8]
 mov [ebp+8], eax
 
finish:
 pop ebp
 ret
 
main:
 call fact
 add esp, 4
 mov _i$[ebp], eax
 mov ecx, _i$[ebp]
 push ecx
 mov edx, _n$[ebp]
 push edx
 push OFFSET MESSAGE
 call _printf
 add esp, 12
 mov esp, ebp
 pop ebp
 ret 0

PS: If you want to generate an assembly source with more (actually lots more) information from the original C program, use the following command line arguments:

cl fact.c /Fafact.asm /FAscu

First, let's establish some simple facts:

The fact function code starts at line 2 and extends through line 22 (the ret instruction).
The argument n is stored at address [ebp + 8].
The main function starts at line 24 and extends through line 37.
main calls the fact function on line 25 (again, please note that for brevity/simplicity I removed parts of the main function to make it easier to understand).

Now that we have a high-level view of the code, let's dig a little deeper into the fact function code (and into the main function when appropriate).

Lines 3 and 4 represent the standard C function prologue; the first interesting instruction is the one at line 6, which compares n (remember, [ebp + 8]) with 1, branching to the label next_fact (line 11) if they are not equal.

Starting at line 12, the code loads n into register eax, subtracts one from it and calls fact recursively. When the recursive call returns (line 16), the code multiplies eax by n, leaving the product in eax (line 17); i.e., the compiler used the eax register to return the calculated factorial from fact. We can confirm this behavior by inspecting line 27 (inside the main function), which assigns eax to local variable i after calling fact.

But just using the eax register to pass the return value from fact is not enough; this program works (accidentally) only because the compiler used the same register, eax, both to perform the calculation and to pass the return value.

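So we can conclude that, in this particular program, omitting the return statement yields an executable equivalent to the one that would have been generated had the return statement been present. As far as the C language is concerned, though, using the value of a function that ended without returning one is undefined behavior, so nothing guarantees this outcome.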

Note that different compilers (or even different versions of the same compiler) may choose other registers to perform the calculations / pass return values from functions; actually I have found at least one version of CL (from Visual Studio 6) that used register ecx to perform the multiplication and eax to pass the return values (producing an incorrect program).

Note also that even the same version of a compiler may generate different versions of the code (one that uses eax for both the calculations and the return values, and another that uses a different set of registers) depending on options such as optimization, debug, etc.
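The register argument can be mimicked in plain C. The sketch below is only a model under stated assumptions: the globals `eax` and `ecx` stand in for the CPU registers, and `fact_eax` / `fact_ecx` are hypothetical names for the two code-generation choices discussed above. It is not the output of any real compiler.

```c
/* Model only: these globals stand in for the x86 registers of the same name. */
int eax, ecx;

/* Mirrors the listing above: the multiplication happens in eax, so the
   product is still sitting there when the function ends without a return. */
void fact_eax(int n)
{
    if (n == 1) { eax = 1; return; }   /* mov eax, 1 */
    eax = n - 1;                       /* mov eax, [ebp+8] / sub eax, 1 */
    fact_eax(eax);                     /* call fact: result comes back in eax */
    eax = eax * n;                     /* imul eax, [ebp+8] */
    /* storing the product back into n does not change eax */
}

/* Hypothetical variant: the compiler picks ecx for the multiplication.
   eax is never updated, so the caller reads a stale value. */
void fact_ecx(int n)
{
    if (n == 1) { eax = 1; return; }
    eax = n - 1;
    fact_ecx(eax);
    ecx = eax;                         /* copy the callee's result */
    ecx = ecx * n;                     /* the product lives in ecx only */
    /* storing the product back into n does not change eax */
}
```

In this model `fact_eax(5)` leaves 120 in `eax`, while `fact_ecx(5)` leaves 1 there (the value set by the innermost call), which is the kind of silent breakage described above.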

Best