Tuesday, January 07, 2014

Why Revision Control is Not Enough

Several people have suggested that revision control might be an alternative to creating a class of databases.  I really have no way to counter such an argument because I cannot fathom what they were thinking in the first place.

The two concepts target completely and utterly different things.  I guess that all I can really do is talk about how different they are and hope some people change their stance.

Revision control (e.g.: CVS, Subversion, or Perforce) allows you to track and manage the changes you are making to source code.  The point is to (a) provide a place of safekeeping for your documents of design and (b) to allow you to access previous versions of a design.  When you update the design of a class, you create a new version of it in source control and do not generally see older versions unless you go looking for them.

The point is to allow you to work on the most recent design in a particular branch while keeping a history of what was done and not betting on a single computer to keep your source code safe.  In short: Source control is about managing the relationship between developers and designs.

A class of databases has a totally different functions.  Like any other class, it's most interesting relationship is between it and its objects.  Yes, a database class must store the steps required to build older versions in it but for a totally different reason that why a version-control system stores older versions of source code.

Whereas a revision control system tracks versions for the benefit of the development process, a class of databases must track revisions for the benefit of the database-construction process.

Storing the incremental deltas required to get from each version to the next and forcing all databases in a class to follow the exact same path of growth enables test-driven, agile database development in two ways.  For one thing, it drastically simplifies the database deployment model; every single database of a given type grows exactly the same way.  For another, it allows you to properly test the transitions from one version to the next.

The versions of a source file are created because a developer has learned something, changed his mind, or otherwise revised design and they are created to ensure that developers can revisit previous designs.  The versions of a database class are created because a revision to design was actually released to production and they are created so that every database in a class is built in the exact same way as every other database in that class.

Monday, January 06, 2014

Encapsulation of Database Design

It seems to me like every software developer goes through a phase where they think stored procedures are bad and a phase where they think they are good.  Like me, some alternate through these positions several times before settling on a position that works.

In the end, I discovered that there was never a dichotomy in the first place.  The real question was not whether to use stored procedures but how to use them.  The struggle over whether or not to use stored procedures was never really about stored procedures.  It was always about properly separating concerns and encapsulating designs.

In the Beginning...

I started off loving the idea of stored procedures.  I loved them based on a single argument that, at the time, was possibly relevant: performance.  That pernicious specter that looms over our industry, the myth of proactive optimization, cast its spell over me and I was totally sold.
In the nineties, it might even have been true that there was a compelling performance-based reason to write a stored procedure.  In general, for modern database systems, that is no longer the case.

Flip

As with many people in the pro-stored-procedure camp, I ran into one of the many potential failings of heavy stored-procedure use.  Putting behavior in a stored procedure because it is "faster" or "more appropriate" often drives bad design decisions.

The more logic I added to a database, the harder it was to test.  This was before I stumbled into the ideas in my book but it would still be true, at least to some extent, today.  Also, the fatter I made my database layer, the thinner I made the other layers.  The coup de grace was the fact that, on a regular basis, large stored procedures were actually slower than performing dumb selects and processing the data into meaningful results in a middle-tier service.

Essentially, every time I transferred business logic from somewhere more appropriate to a database, I made my system harder to test, less flexible, and slower.  So I did what humans often do and switch from one extreme to its opposite: no stored procedures of any kind.

Flop

The problem with toggling between extremes is that it reduces the world to binary decisions.  Of course it is true that, in any decision, one is considering a finite set of options which can be reduced to a series of binary decisions.  There's nothing wrong with choosing between two things but there is something wrong with limiting your options early in the decision-making process.

The "no stored procedures" stance didn't work out very well either, although it does work better than "everything in the database" way of working.  Instead of a tightly-wound, change resistant machine that served as the center of an application's universe and ground change to a halt, I was building delicate arrangements of glass work that shattered at the first hint of change which served as the center of an application's universe and ground change to a halt.

The problem with not having any stored procedures is that something has to have an intimate relationship with the structure of a database.  When the stored procedures go away, that tends to become whatever is connected to the database (usually a middle-tier service).

That kind of intricate understanding of implementation details between one module and another is inherently more difficult to enforce that such an understanding residing exclusively within a module.  It drives the cost of change through the roof.

The Other Way

As is often the case, the artificial binary decision was hiding the true nature of the decision in the first place.  The question was never "should stored procedures be used or not?"  The right question to ask is "for what should we use stored procedures?"  Note the "yes" and "no" answers to the former question map to answers in the latter but there are many other potential answers as well.

I don't want to iterate through all the possible ways stored procedures could be used in a database design; mostly because that list is probably infinite in length.  So I'll just cut straight to the one that I think is right: encapsulation.

A while back, someone I was working with, Christopher Taylor, suggested to me that stored procedures were the equivalent of a class's interface for a database.  They are a contract between a database and its clients.  So long as the contract is fulfilled, you are free to revise the implementation details to your heart's desire.  He didn't use these words and I don't recall the exact words he used but it the point is that it was he who turned me on to this idea.

Database Designs Require Encapsulation

Probably the most sapient portion of the observation had little to do with stored procedures, specifically.  Stored procedures are merely a tool we use in recognizing that databases need encapsulation of their designs just the same way that any other object would.

Depending on the features available in your chosen database platform, you could conceivably create encapsulation other ways.  For instance, some platforms allow updateable views.  So long as you could keep satisfying the contract specified by an earlier set of exposed tables, you can change the design of your database behind that wall of encapsulation.

As Scott Bain says, encapsulation is the first principle of design.  It should come as no surprise that this is true for database designs.