Saturday, March 12, 2011

A milestone in the long road

DataClass now supports Java.  It took a lot longer than I expected, but it's done now.

You might be thinking "that's neat but what does it mean that it 'supports Java'?"

Before I answer that, I'll briefly remind you what DataClass does - just in case you stumbled on this entry by accident and don't already know.  DataClass is a compiler that takes documents that describe a class of database and produces binaries that know how to build and interact with instances of that class.  The point is to give you the same ease of development you would have for any other class of objects; for instance, allowing you to quickly spin up test instances with identical behavior to production instances.

The following is an example of a DataClass source document:

database SomeDB {
  types int as integer;

  version 1.0 : initialized {
    design {
      public table T {
        public column X with DataType = type(int);
      }
    }
    construction { 
      step sql { $[T.Declaration] }
    }
  }

  current version 1.1 : 1.0 {
    design {
      public table T : base.T {
        public column Y with DataType = type(int);
      }
    }
    construction { 
      step sql { ALTER TABLE $[T] ADD $[T.Y.Declaration] }
    }
  }
}

That would create a class called SomeDB that knows about two versions of a database design, knows how to build databases at either version or upgrade them from one to the other, and has a body of symbols allowing clients to couple to the public aspects of each of SomeDB's designs (1.0 and 1.1).  If stored procedures were involved, there would be a proxy for each version that allows clients to call those stored procedures as if they were normal methods on a normal object.

Up until build 20.11.1733, DataClass would only produce .NET assemblies that connected to a database through ADO.NET.  This most recent build, however, added the ability to produce database proxies and constructors in a Java JAR file.

This involved a significant amount of refactoring before it could readily be supported.  I had to encapsulate all the variation between Java and C# (not too hard), between the JRE and the CLR (there's a fair amount), and between JDBC and ADO.NET (they work in fundamentally different ways).

The really disappointing thing about this was that I discovered very late in the game how fundamentally different JDBC and ADO.NET are.  In ADO.NET, access to schema information and binding by name are both easy and reliable.  In JDBC it appears to be expected that you will bind by position - something I thought we left behind two decades ago - because binding by name is possible for certain providers but not guaranteed.
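To make the difference concrete, here is a minimal Java sketch of what binding by position looks like from the caller's side, with a thin name-to-index lookup layered on top.  The PositionalBinder class and its methods are hypothetical names made up for illustration; this is not how DataClass does it internally.  The only real JDBC call used is PreparedStatement.setObject(int, Object), whose index is one-based.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: remembers the order of the '?' placeholders in a
// statement so callers can refer to parameters by name.
final class PositionalBinder {
    // Parameter names, in the same order as the '?' placeholders.
    private final List<String> names;

    PositionalBinder(List<String> namesInPlaceholderOrder) {
        this.names = new ArrayList<>(namesInPlaceholderOrder);
    }

    // Returns the one-based JDBC index for a parameter name.
    int indexOf(String name) {
        int i = names.indexOf(name);
        if (i < 0) {
            throw new IllegalArgumentException("unknown parameter: " + name);
        }
        return i + 1; // JDBC parameter indexes start at 1
    }

    // Binds a value by name; underneath, JDBC still binds by position.
    void bind(PreparedStatement stmt, String name, Object value) throws SQLException {
        stmt.setObject(indexOf(name), value);
    }
}
```

With placeholders declared in the order F, G, a call to bind(stmt, "G", 42) ends up as stmt.setObject(2, 42).  The fragile part is that the list of names has to be kept in sync with the SQL text by hand, which is the sort of bookkeeping that is better left to generated code.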

Anyway, that was a problem I could work around pretty easily.  DataClass uses the logical position of a parameter or column to infer its physical position.  There are, I'm sure, times when this will not suffice.  In such cases, developers can use the PhysicalPosition attribute to dictate the binding position of a parameter or a column.  That might look something like the following:

public procedure MyProcedure {
  // Inferred: PhysicalPosition = 1
  public parameter F;
  // Inferred: PhysicalPosition = 2
  public parameter G;
  // Explicitly: PhysicalPosition = 0
  public parameter H with PhysicalPosition = 0;
}

Another "gotcha" for Java programmers is the fact that DataClass expects physical positions to be zero-based and adjusts accordingly for the platform.  So, even though the first parameter is parameter 1 when binding via JDBC, you must specify it as parameter 0.  I guess the point is this: don't take JDBC's implementation details into account, because DataClass hides them.
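As a rough illustration of that adjustment, the sketch below converts zero-based physical positions into the one-based indexes JDBC expects.  The PhysicalPositions class and its methods are hypothetical names of my own choosing, not part of what DataClass generates; the positions in the usage note are the ones from the MyProcedure example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps zero-based physical positions to JDBC indexes.
final class PhysicalPositions {
    private PhysicalPositions() {}

    // Converts one zero-based physical position to a one-based JDBC index.
    static int toJdbcIndex(int physicalPosition) {
        if (physicalPosition < 0) {
            throw new IllegalArgumentException(
                "physical position must be >= 0: " + physicalPosition);
        }
        return physicalPosition + 1;
    }

    // Converts a whole name -> physical position map, preserving entry order.
    static Map<String, Integer> toJdbcIndexes(Map<String, Integer> physicalPositions) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : physicalPositions.entrySet()) {
            result.put(entry.getKey(), toJdbcIndex(entry.getValue()));
        }
        return result;
    }
}
```

Feeding in the positions from MyProcedure (F = 1, G = 2, H = 0) yields JDBC indexes F = 2, G = 3, H = 1.  The shift happens in one place, so the source document never has to mention a one-based number.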