Monday, December 16, 2013

Tracking Versions in your Database

As you may or may not know, I preach creating classes of databases over manipulating the designs of individual databases.  There are many different implementation decisions one could make while elevating design from instance to class.

In a previous post, I discussed the shape of the class itself, arguing that it should be organized around actual released versions plus the thing you intend to deploy next.  This is not revolutionary.  It's not ubiquitous but I'm certainly not the only one to suggest that this is how databases should be maintained.

There are several ways to manage how versions are tracked when an instance of a database has been deployed.  One way that I've seen done is to keep a text file with a list of the scripts that have been executed.  Another way is to fill a directory with scripts that have been executed, with each script living in its own file.

The way that I think works best and that fits most naturally within the concept of a class of databases is to store the state of a database instance within the instance itself.  The alignment is so strong, in fact, that it makes the other methods seem bizarre and beyond the realm of consideration.

Dependency

From a practical perspective, storing a database's version information inside the database only makes sense.  You don't want to have something outside a database that is required to run or maintain it.  What if that thing gets lost?

the puzzle stops working when
it has a missing piece
Sure, one could make the same argument about the class of databases itself but the odds of a document or set of documents that serve as the source of record for something's design getting lost are extremely low - we have revision control and backup policies to ensure that.

In most environments, the rigorous database maintenance procedures ensure that losing data stored inside a database is many, many, many times less likely than losing a text file or a folder full of scripts.  The main way that you might lose said data is if you somehow lost the database itself, along with all its backups, in which case you would probably not care if you also lost the version data for that database.

Management

Another matter of pragmatism is managing databases is that of managing the dependency between them and their version-tracking data.  If those data are external to a database instance, then you have to know where they are stored in addition to how to connect to a database.

The most dreadful consequence of this management hassle is that you might accidentally upgrade one database based on the state of another database.  Depending on what precautions you take, that could have consequences ranging from a few minutes of aggravation to loss of valuable data. 
do you want to feel like this guy?

Moreover, there is a persistent and non-trivial management cost.  Everything you do with a database - every single instance you create - also has to have this manifest of its upgrades stored somewhere else and the association between the two must be tracked for as long as the database lives.

What about all those little databases that only last a few seconds or a few minutes?  As your test suite is executing?  What about all the development databases?  It's not like tracking each of these relationships is or expensive but all those little tasks accumulate to produce an awkward development environment fraught with menial tasks that draw your attention from the things that matter.

Principle

There are several different supporting arguments and each, to me, is sufficient but all are derivatives of one overriding factor: objects are only objects if they are whole.  In a normal object-oriented environment, access to one object is achieved with a single reference.  One pointer to a spot in memory.  One connection string to a database instance.  Those are a good ways to access an object.

this is not a house
not yet
Imagine if, instead, you needed two things to make an object work.  What if a string object stored all its data intrinsically except for its length and, in order to properly use it, you had to track its length in another variable?  Could you do it?  Of course.  Would it be insanely complex?  Definitely.

The wholeness of an object is what grants it its identity; what elevates it from being a mere blob of data that can be accessed by a program into an object.  Tracking the upgrade state of a database in said database is part of keeping it whole.