Thursday, December 12, 2013

Why the Linear Chain of Deltas Is the Best Way to Define a Database

There are several ways to organize the revisions in a class of databases.  In fact, there are at least two distinct ways that one's strategy can vary: how one organizes the outcomes of applying a class of databases and how one organizes the implementation of a class of databases.  It is my stance that, in almost every case, the most effective solution is to have a class of databases codify a series of released versions and to use a sequence of delta scripts to get there.

What

The first order of business is to decide what a class of databases describes.  You could have it describe the components of a database.  You could have it describe the current state of a database only.  You could have it describe each released version of a database plus the one you are working on now.

As stated above, I think the latter is best.  Rather than disprove its competitors and every other possible competitor, I will demonstrate its superiority.

There is at least one database of consequence for most release products.  Usually, that is a production database acting as the source of record for one or more software applications.  Usually, if you were to take a time-lapsed video of that database's design diagram over, say, a decade condensed down to a few minutes, the image in the video would remain almost completely static for many seconds, then it change almost instantaneously to a new design before going back to being stable again.  This process would probably repeat for the course of the video.

The real database - the most important one in your product, organization, or design - is almost always expressed as a series of discrete versions.  Its content grows gradually over time but it's design alternates between long periods of stasis and short periods of violent change.

Databases Naturally Transition from Version to Version

These long, stable versions of your production database are the natural targets around which to organize your class of databases.  Why build anything that won't produce one of those versions?  Of course, the version you are working on now is a bit of a moving target but, when it gets released, it ceases to be the version you are working on now and becomes another in the series of versions in a deployed database.

How

So I've shown that the allowable targets of a class of databases should be past and future versions of a released database only because the reality of the most important databases is that they transition from one version to another.  Once you are doing that there are a few options for how you do it.

Define Every Possible Transition
One option is to manage every possible transition in design.  That is, if you've developed three versions of your database design already and you want to add a fourth, you would produce and test a script for how to get from each of those previous versions to the latest.

Aside from the fact that this strategy cannot be depicted without violating the cardinal rule of diagramming design (no crossing lines), this is a lot of work.  You've got to test four paths for your fourth version, five paths for your fifth version, six paths for your sixth... you get the idea.

There's another option: only define the transition from the most recently-released version of your database to the version you intend to release next.  That way, when you want to add a ninety-fifth version, you only need to add to your class and test one set of transition scripts, not ninety-five.

Only Define the Next Transition
How do you get from an older version to the latest?  By applying each of the intervening transitions in the correct order.  Of course, to ensure this is done consistently, you have to build a little bit of infrastructure but that infrastructure is a one time investment and costs almost nothing to write in the first place.  It usually pays for itself in months, not years.

On top of all that, I could make the same argument for how a class of database grows as I did for what its outputs can be.

The most important, valuable, and rigid database in your life probably transitions from to design to design by a sequence of transformations applied to it.  The linear nature of its development is nothing more than a reflection of the linear nature of time (as we are able to interact with it).  You made a series of changes in a particular order so there is a series of changes to applied in a particular order.

Do It

Unless you are in the two percent of people for whom this way of doing things actually doesn't make sense, you should start managing your database designs this way: a linear chain of deltas allowing you to move between discrete versions of design and managed by a lightweight infrastructure.