Wednesday, December 04, 2013

My Journey to a Class of Databases

If you know anything about me or you read my article on InformIT, "Ten Tips for Constructing an Agile Database Development Environment that Works," you know that I think the foundation of database TDD lies in creating a class of databases.  That means shifting focus from the design of individual databases to something that makes databases.  You want the ability to create or upgrade as many instances as you like and know that they all have exactly the same design.

Exactly the same design.

There are many possible ways to implement a concept like that, though.  When I began developing these concepts nearly a decade ago, before I even knew that I wanted a class of databases, I was just focused on controlling database creation.  I experimented with a lot of different kinds of infrastructure and a bunch of different patterns of database growth.

Since I became a professional software developer, I've tried a bunch of different ways to control how databases grow and transform.  I think a lot of people have had similar thoughts at various times in their lives.  These are the ways and times I thought about these problems.

The Fool's Errand (pre-2005)

The most naive solution is the idea that you can specify what you want the design to be right now and have some tool that will update an existing database to have the new design.  Sometimes, it is a tool that compares two databases.  Sometimes, it is a diagramming tool that will inspect a database and figure out how to make it comply with a drawing.

[Image: "magic will transmit design changes!"]
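The problem is that this doesn't work.  It doesn't work for the same reason that you can't unscramble an egg: the intent behind a change isn't recorded in the result.  For example, renaming a column and dropping one column while adding another can produce exactly the same new design, yet only the rename should preserve the data.  There is no way for a software system to look at the current design, look at a new design, and reliably figure out how to get from point A to point B.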

At least, it isn't possible to do that every time with current technology.  Maybe one day, when we have computer systems that can infer intent, it will be possible.  Right now, however, that's too complex a task for a computer.
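A minimal Python sketch shows why comparing two designs can't recover intent.  The `naive_diff` function below is hypothetical, a deliberately simple stand-in for a compare-two-databases tool, and not any real product.  It only sees the column set of a table before and after a change, so "rename `surname` to `last_name`" and "drop `surname`, add an unrelated `last_name`" look identical to it.

```python
# Hypothetical, deliberately naive schema "diff": it sees only the before and
# after column sets for one table, never what the developer intended.

def naive_diff(old_columns: set[str], new_columns: set[str]) -> list[str]:
    """Return the steps a compare-two-designs tool might infer for one table."""
    steps = []
    for col in sorted(old_columns - new_columns):
        steps.append(f"DROP COLUMN {col}")  # any data in this column is lost
    for col in sorted(new_columns - old_columns):
        steps.append(f"ADD COLUMN {col}")   # the new column starts out empty
    return steps


# Intent A: rename "surname" to "last_name" (the data should survive).
# Intent B: drop "surname" and add an unrelated, empty "last_name".
# Both intents lead to exactly the same before and after designs.
before = {"id", "first_name", "surname"}
after = {"id", "first_name", "last_name"}

print(naive_diff(before, after))  # ['DROP COLUMN surname', 'ADD COLUMN last_name']
```

A smarter tool could guess "rename" instead, but then it would be wrong whenever the developer really meant intent B.  Either way it is guessing, because the two designs simply don't contain the information it needs.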

The Installer Fallacy (2005-2006)

Another way of thinking about the problem is to model it on how we imagine installers work.  Databases have components.  Components have dependencies.  You ask the installer to make sure the features you want are there and it ensures the dependencies are satisfied.

The problem is that there is always a meltdown.  In this case, I'm using that term a little less figuratively than usual.  In a healthy database design, things are changing.  Tables are splitting and recombining into newer, better shapes all the time.

The features all melt together more quickly than you could imagine.  Pretty soon, it's difficult to tell why you divided the database into separate features and components in the first place.

Rise of the Versions (2006-2008)

After about my third database "feature" that depended on exactly one other feature, which in turn depended on only one feature, I started to get the message.  I realized that the forces in the database world are telling us to organize around time rather than around features.

It turns out that there is usually at least one database instance that has an extremely linear path of transformation, and it happens to be the absolute most important kind of database there is: a source-of-record database in production.

Production databases tend to metamorphose over time in a series of discrete transitions from one design to another.  At the same time, production databases are the most indispensable and long-lived databases of all.

Everything else (e.g., test or development databases) tends to have a little more flexibility.  At the very least, nothing else has less flexibility.  So why shouldn't the most important and least flexible kind of database define how all databases of a particular kind are built?
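Here is a minimal Python sketch of what organizing around time might look like, using only the standard-library `sqlite3` module.  The transition list, the table names, and the use of SQLite's `user_version` pragma as a version marker are all illustrative assumptions, not a description of any particular tool.  Every instance is built by replaying the same ordered transitions, so a "production" instance upgraded one release at a time ends up with the same design as a "test" instance built from scratch.

```python
import sqlite3

# Hypothetical transitions for one class of database, in the order production
# experienced them. TRANSITIONS[i] moves a database from version i to i + 1.
TRANSITIONS = [
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
    "ALTER TABLE customer ADD COLUMN email TEXT",
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)",
]


def current_version(conn: sqlite3.Connection) -> int:
    # Assumption: each instance records its design version in SQLite's
    # built-in user_version pragma, which is 0 for a brand-new database.
    return conn.execute("PRAGMA user_version").fetchone()[0]


def upgrade(conn: sqlite3.Connection, target: int = len(TRANSITIONS)) -> None:
    """Apply, in order, every transition this instance has not seen yet."""
    for version in range(current_version(conn), target):
        conn.execute(TRANSITIONS[version])
        # PRAGMA statements can't take bound parameters, so format the int.
        conn.execute(f"PRAGMA user_version = {version + 1}")
    conn.commit()


def design(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """The schema exactly as SQLite stores it, for comparing instances."""
    return conn.execute("SELECT name, sql FROM sqlite_master ORDER BY name").fetchall()


# A long-lived "production" instance, upgraded one release at a time...
production = sqlite3.connect(":memory:")
upgrade(production, target=1)
upgrade(production, target=2)
upgrade(production)

# ...and a fresh "test" instance, built from nothing along the same path.
test = sqlite3.connect(":memory:")
upgrade(test)

print(design(production) == design(test))  # True
```

The design choice worth noticing is that there is no other way to build an instance: a test database can't take a shortcut that production never took.  Comparing `sqlite_master` is just one cheap way to check that claim.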

Have a Little Class (2008-present)

When I started shaping these thoughts into something that I could evangelize, I realized there was more to this than just regulating the flow of design changes from a development environment out into a production environment.  That's an important feature, but it's just an implementation detail of a much more critical shift in mindset.

[Image: "a path of confidence"]
What really matters is having uniformity of design between all the different database instances filling the same role.  If you have that, tests executed against one instance allow you to make predictions about how another instance will behave.

That mechanism - that way of thinking - serves as a critical underpinning for test-driven development in the database world.