May 1, 2008

Type safe / refactor friendly configuration for Db4o

As of today, Db4o uses strings to reference fields in some scenarios, such as:
  • SODA queries
  • Configuration

See the sample below:

IConfiguration cfg = Db4oFactory.NewConfiguration();
cfg.ObjectClass(typeof(Test)).ObjectField("_age").Indexed = true;

IMHO, using strings in these situations is sub-optimal as:
  • Such a solution relies on internal class details (field names).
  • The compiler can't check whether the field actually exists.
  • We don't take advantage of modern IDE refactorings, so chances are that these entities and the references to them will get out of sync, breaking your code. It's even worse because you will only find out at runtime :(, and it may even go unnoticed, as Db4o is completely happy with some invalid field names.

With the introduction of C# 3.0 and its automatic properties we have another issue: we don't even know the name of the field that backs the property! Sure, we can cheat by using a tool like Reflector or ildasm to figure it out, but this would be a fragile solution: the C# compiler may choose whatever name it wants, so it may adopt a different naming scheme in the future, and we don't want our code to break when new C# compilers come out, do we?
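
To make the backing field issue concrete, here is a minimal sketch that uses reflection to list the fields the compiler generates for an automatic property. The Test class and its Age property are assumptions made up for illustration. Microsoft's compilers have so far emitted names of the form <Age>k__BackingField, but that is an implementation detail and exactly the kind of name we should not hard-code in a configuration string.

```csharp
using System;
using System.Reflection;

// Hypothetical entity, used only for illustration.
class Test
{
    // Automatic property: the compiler generates the backing field for us.
    public int Age { get; set; }
}

static class BackingFieldDemo
{
    static void Main()
    {
        // Ask for the private instance fields declared on Test.
        FieldInfo[] fields = typeof(Test).GetFields(
            BindingFlags.Instance | BindingFlags.NonPublic);

        foreach (FieldInfo field in fields)
        {
            // The printed name is chosen by the compiler, not by us,
            // and it is not a valid C# identifier.
            Console.WriteLine(field.Name);
        }
    }
}
```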

I do believe that this approach was chosen due to technical limitations at the time the code was written, but now, with the debut of C# 3.0 and its new features like extension methods, lambda expressions, expression trees, etc., I bet we can do better :).

To be fair, we've already been reducing string usage. For instance, Linq / Native Queries use strong typing instead of strings to reference fields (under the hood they generate SODA queries that still use strings, but the crux here is that these "names" will always be in sync with their entities).

One area that, in my opinion, can be improved is configuration. For instance, in order to set up indexes, deletion behavior (when to cascade delete), etc., we still use strings to identify the field to which we want to apply the configuration. Suppose we have the following code to configure indexes:

IConfiguration cfg = Db4oFactory.NewConfiguration();
cfg.ObjectClass(typeof(Test)).ObjectField("_age").Indexed = true;

We could introduce new methods to the IConfiguration interface so that one could do something like this:

IConfiguration cfg = Db4oFactory.NewConfiguration();
cfg.ObjectField((Test s) => s.Age).Indexed = true;

Wow! Using this syntax we no longer have issues with refactorings or with relying on internal class details, and we get compile-time checking for free :) (for instance, if we mistype "s.Afe" the C# compiler will complain).
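
To give a feeling for why the lambda syntax is enough, here is a minimal sketch of how a member name can be read out of an expression tree. This is not Db4o code and not necessarily how the feature would be implemented: MemberNames.Of is a hypothetical helper, and Test is a made-up entity. Note that it yields the property name ("Age"); mapping that property to the field Db4o actually stores is a separate step that this sketch does not address.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity, used only for illustration.
class Test
{
    public int Age { get; set; }
}

static class MemberNames
{
    // Hypothetical helper (not part of Db4o): returns the name of the
    // property or field accessed in the body of the lambda.
    public static string Of<T, TMember>(Expression<Func<T, TMember>> accessor)
    {
        Expression body = accessor.Body;

        // If the member type differs from TMember (e.g. TMember is object),
        // the compiler wraps the access in a Convert node; unwrap it.
        UnaryExpression unary = body as UnaryExpression;
        if (unary != null && unary.NodeType == ExpressionType.Convert)
        {
            body = unary.Operand;
        }

        MemberExpression member = body as MemberExpression;
        if (member == null)
        {
            throw new ArgumentException(
                "Expected a simple member access such as s => s.Age", "accessor");
        }

        return member.Member.Name;
    }
}

static class Program
{
    static void Main()
    {
        // The lambda is never executed; only its structure is inspected.
        string name = MemberNames.Of((Test s) => s.Age);
        Console.WriteLine(name); // prints "Age"
    }
}
```

Because the lambda is compiled into an expression tree rather than a delegate, renaming Age in the IDE updates this call site too, and a typo such as s.Afe fails at compile time.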

To be fair, we still have some issues with refactorings. If we change the name of a field or class we need to inform Db4o about these changes as explained here (I do have some ideas on how to express these calls as well, but this is a more complex topic that I won't cover here).

What do you think?

In a following post I'll discuss how we could implement this new configuration method. Go to part II of this post.

Adriano


ppetrov said...

This will be a great feature! I thought it was already in the new version (7.2), as LINQ support is. I don't know how this will fit with the dual nature of db4o (Java & .NET), since Java doesn't have lambdas. I wish I could write a query like this:

var persons = db.Query((Person p) => p.Age > 18);

Vagaus said...


Sure, LINQ queries are typesafe. SODA queries use strings and I can't see this changing for now.

What I want to explore is really how we could make configuration more typesafe.

Regarding your query:

var persons = db.Query(
(Person p) => p.Age > 18);

it's already supported (it's a native query :)