Monday, December 23, 2013

Knowledge, Behavior, and Information

I mention this in my book but I thought I might elaborate.  I think a useful way to conceptually divide the parts of a database is into three groups of design elements: information, knowledge, and behavior.

The Three Concepts

Information and knowledge are often confused in everyday language.  So, first, I'll disambiguate those two words.  Not all data are information or knowledge and rarely are the two interchangeable.

Information is a special subclass of data.  I'm sure someone who is an expert in communication theory or some other kind of academic would be glad to correct me and I won't fight them on what the technical definition of the word is.  I'm only interested in what the useful definition for my own purposes is, not the officially right definition.  For the purposes of this blog entry, and of everything I say and write, information is the part of a signal that the recipient did not know in advance.  Simply put, information is data which informs its recipient.

If information could be thought of as facts in transit, then knowledge would be facts at rest.  In essence, knowledge is potential information but it is also a potential driver for action.  That is, the two uses of any given object's knowledge are to inform other objects, thereby adding to their knowledge, and to inform decisions, thereby improving the value of an action taken.

That latter purpose is the perfect opening to briefly introduce the third player in the database design world: behavior.  If knowledge and information are facts at rest and in motion, behavior can be thought of as how something responds to knowledge or information.  For instance, you drive on the correct side of the road because you know you will slam into something if you don't.  Likewise, you yank your hand away from a too-recently poured cup of coffee, having become informed that the cup is too hot to touch without damaging tissue.

Information as Pertains to Databases

In the database world, information is the set of signals sent or received by a database.  A query and its parameters, the invocation of a stored procedure, the result set, and any errors that occurred are all examples of information as a database sees it.
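To make those signals concrete, here is a minimal sketch using Python's built-in sqlite3 module.  The customers table and its contents are made up for illustration; the point is only to label which values are signals going in and which are signals coming out.

```python
import sqlite3

# Hypothetical database with one made-up table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")

# Signal sent to the database: a query and its parameters.
query = "SELECT name FROM customers WHERE id = ?"
params = (1,)

# Signal received from the database: the result set.
result_set = conn.execute(query, params).fetchall()

# Signal received from the database: an error.
try:
    conn.execute("SELECT name FROM no_such_table")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
```

Everything the client touches here is a signal.  The table itself never leaves the database.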

In essence, information is the "surface" of a database's design.  It is impossible for external parties to access the value of a database except by sending and receiving signals.

Moreover, information is the means by which value is conveyed between a database and its clients.  It is pointless to update a database with information it already knows.  It is useless to query a database for what you already know.  Value is created only by actions that result in one of the entities involved "learning" something.
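One rough way to see this in code is to write an update that only counts when it tells the database something new.  This is a sketch under my own assumptions: the customers table and the rename_customer function are hypothetical, and it uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")


def rename_customer(conn, customer_id, new_name):
    """Return True only if the database learned something from this signal."""
    cursor = conn.execute(
        # The extra "name <> ?" condition skips rows that already hold this fact.
        "UPDATE customers SET name = ? WHERE id = ? AND name <> ?",
        (new_name, customer_id, new_name),
    )
    return cursor.rowcount > 0


# Telling the database what it already knows changes nothing.
told_something_known = rename_customer(conn, 1, "Ada")

# Telling it something new changes its knowledge.
told_something_new = rename_customer(conn, 1, "Grace")
```

The first call carries data but no information, so no row changes.  The second call informs the database, and its knowledge is different afterward.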

Knowledge as Pertains to Databases

Knowledge is the reason why databases cannot be maintained using the simple "blast and rebuild" upgrade path we apply to most software deployment problems.  All the facts stored in a database are knowledge; not all the data, because you can introduce noise into a database's design, but all the facts.
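Here is a small sketch of the difference, again with Python's sqlite3 module and a made-up customers table.  Both functions take a version 1 database to a hypothetical version 2 that adds an email column; only one of them keeps the facts.

```python
import sqlite3


def make_v1():
    """Build a hypothetical version 1 database holding one fact."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
    return conn


def blast_and_rebuild(conn):
    """Deploy version 2 the way we deploy most software: throw away and recreate."""
    conn.execute("DROP TABLE customers")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)"
    )


def upgrade_in_place(conn):
    """Deploy version 2 by changing the existing structure around its facts."""
    conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")


def count_customers(conn):
    return conn.execute("SELECT COUNT(*) FROM customers").fetchall()[0][0]


rebuilt = make_v1()
blast_and_rebuild(rebuilt)

upgraded = make_v1()
upgrade_in_place(upgraded)

facts_after_rebuild = count_customers(rebuilt)   # knowledge destroyed
facts_after_upgrade = count_customers(upgraded)  # knowledge preserved
```

Both databases end up with the same structure.  Only the upgraded one still knows who its customers are.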

Knowledge is the purpose of a database.  Most software products and components exist to convey facts between parties or to process data and discover new facts.  Some software exists to entertain.  Databases exist to preserve knowledge.  Each production database is a modern day Library of Alexandria, complete with the ability for some asshole to burn it down and, in so doing, to cause irreversible damage.

We have known this for a long time - as close to "forever" as matters.  Databases have always been designed around the knowledge they capture and preserve.  Those design decisions stand as a reflection of our implicit understanding that databases aren't merely data bases, but knowledge bases.

Behavior as Pertains to Databases

So what is the role of behavior in database design?  It's another one of those things that can be put simply or drawn crisply, but can take a lot of work to implement correctly.  The role of behavior in a database is to mediate between knowledge and information.

All the information that a database receives needs to be translated into knowledge and stored for safekeeping.  Why?  So that, later, that knowledge can be translated back into information to help other actors make decisions or discoveries.

You can codify the behavior of a database in many different ways.  At the time of this entry, the most common way is to couple the behavior offered by a database directly to the kinds of knowledge that database can store.  This is accomplished by creating, publicly exposing, and coupling clients to table structures and relationships.
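As a sketch of what that coupling looks like, consider a client whose query names a table and a column directly.  The table, the client function, and the rename are all hypothetical, and the example uses Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")

# The client is coupled to the table structure: it names the table and column.
CLIENT_QUERY = "SELECT name FROM customers WHERE id = ?"


def client_lookup(conn, customer_id):
    rows = conn.execute(CLIENT_QUERY, (customer_id,)).fetchall()
    return rows[0][0]


before = client_lookup(conn, 1)

# Change how the knowledge is stored.  The fact itself is untouched.
conn.execute("ALTER TABLE customers RENAME TO people")

# The client breaks anyway, because the structure was its interface.
try:
    client_lookup(conn, 1)
    client_broke = False
except sqlite3.OperationalError:
    client_broke = True

fact_survived = conn.execute("SELECT name FROM people WHERE id = 1").fetchall()
```

When the tables are the public interface, any change to how knowledge is stored is also a change to the behavior clients depend on.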

The Relationship Between the Three

I find it useful to divide database design into three parts.  The information layer of design is where the interactions between databases and other objects are defined.  The knowledge layer of design is where the facts you want to store in a database are housed.  The behavior layer is where one codifies the manner in which facts are absorbed or emitted.

Behavior translates between knowledge and information.
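The three layers can be sketched in a few lines.  This is only an illustration under my own assumptions: SQLite has no stored procedures, so a Python class stands in for the database's public surface, and the class, method, and table names are all made up.

```python
import sqlite3


class CustomerDatabase:
    """Hypothetical stand-in for a database and its three design layers."""

    def __init__(self):
        # Knowledge layer: where the facts are housed.
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    # Information layer: the method names, parameters, and return values
    # are the only signals a client sends or receives.

    def register_customer(self, name):
        # Behavior layer: absorb information into knowledge.
        cursor = self._conn.execute(
            "INSERT INTO customers (name) VALUES (?)", (name,)
        )
        return cursor.lastrowid

    def customer_name(self, customer_id):
        # Behavior layer: emit knowledge as information.
        rows = self._conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchall()
        return rows[0][0] if rows else None


db = CustomerDatabase()
new_id = db.register_customer("Ada")
name = db.customer_name(new_id)
```

A client of this class never sees the customers table.  It sees signals, and the method bodies decide how those signals become stored facts and back again.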

I'll post more on each of the specific layers of design with some implementation recommendations later.