Max Guernsey, III: May 2011

There is a commonly held belief that the presence of good tests obviates the need for a compiler. It is a myth. It can easily be shown to be false and I will do so in a few paragraphs.

Conceptually
Before demonstrating in concrete terms why a compiler is still necessary, I want to make the case conceptually. Now, perhaps it is true that the single task of adding a class to a system of classes might not greatly benefit from the presence of a compiler. It also might not be true. Either way, I'm not going to address it because it's not the hard part of what we do.

Where a compiler comes in handy in a way that the simple presence of tests cannot help us is in the process of changing design, not creating it. TDD does not replace design, it augments it by attaching executable specifications to elements thereof.

Compilers tell you when elements of your design are inconsistent, in ways that tests cannot. At least, in ways that tests cannot without replicating the purpose and behavior of compilers. They do this by allowing you to create positive coupling that can be validated at compile time.

Duplication
To really understand the value of a compiler, you have to understand the nature of duplication. Duplication exists whenever you have to make a single change in multiple places.

Unfortunately, duplication is not actually something we can eliminate altogether. All the techniques we have been taught for eliminating duplication do not, in fact, eliminate it; they replace uncheckable duplication with coupling - a form of duplication that can be mathematically checked for consistency.

For instance, the signature of a method and the calls to that method are duplication. There are cases when changing the signature also requires the calls to be changed or requires some other change like creating an overload. However, with a strong-typing system such as you often find with a compiler, you have a mechanism to ensure that all calls are at least basically in sync with the methods they are invoking.
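As a small illustration of that duplication, here is a sketch in Python, a language in which nothing checks calls against signatures before the code runs. The names `area`, `up_to_date_caller`, and `stale_caller` are invented for this example. Each call site repeats the shape of the signature, and nothing links the copies together until a call actually executes.

```python
# Hypothetical example: one signature, two call sites that duplicate it.
def area(width, height):
    return width * height


def up_to_date_caller():
    return area(3, 4)   # agrees with the current signature


def stale_caller():
    return area(3)      # still written against an older, one-argument signature


# Defining stale_caller raises no complaint. The mismatch only surfaces
# when the stale call is actually executed.
result = up_to_date_caller()
try:
    stale_caller()
    stale_error = None
except TypeError as error:
    stale_error = error
```

With a type checker in the loop, the stale call would be flagged before anything ran.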

Duplication doesn't stop you and rarely even slows you down when you are creating something new. It's when you are changing things that duplication becomes a killer. The reason it is a killer is that people seldom find all instances of duplication on their own. I refer you to Shalloway's Law.

Change
Healthy products change a great deal. The more useful and important your software is, the more you are going to need to change its design. There's a small irony there, isn't there? The more your software is used, the more it will change. The more it changes, the more the design will change. The more the design changes, the more you will want to keep instances of necessary duplication in sync with one another.

The key to surviving change in any system of a reasonable size is not to eliminate duplication; that is impossible to do one-hundred percent. Instead, it is to eliminate unnecessary duplication and to comply with Shalloway's Principle (same link as before) in the case of necessary duplication.

Dependencies and Strong Typing
Strong typing, which is typically found in compiled languages, is the only tool we have invented at the time of this writing that combats necessary duplication by linking together all instances of a particular duplication. It's really the strong typing, not the compilation, but at this time the correlation is so strong that you can practically treat them as the same thing, setting aside bastardizations like VB.Net.

Strongly typed languages tend to do this by allowing programmers to create dependencies that are implied by one entity in a piece of software using another. For instance, all calls to a method depend on the definition of that method. You cannot compile things that call a method without basically correct parameters. Likewise, all references to an instance of a type grant the dependent access to its public methods and allow it to pass the referred-to object around as a parameter to suitable methods.

These dependencies can then be checked for an entire compilation unit in a single pass, guaranteeing a basic level of potential correctness before even bothering to produce a program that can be tested. Note that, if there were a strongly typed, statically checked, non-compiled language, then the programmer would get the same benefits except that it would happen when the script is loaded rather than when the program is compiled. The difference between those two things would be imperceptible to most, if not all, programmers.

Strong, Weak, Explicit, Implicit, Static, and Dynamic Typing
It is also important to note that I am not arguing for or against most of the traits listed in the title of this subsection. I am arguing in favor of strong typing and against weak typing. I don't really care about implicit typing or explicit typing, nor do I care about static or dynamic typing, except to the extent that they may influence a language's ability to be strongly or weakly typed. Let's go over what these various things mean, in case it is not obvious.

A strongly typed system is one in which types and method calls can be validated prior to the execution of any part of the type system being evaluated. Dependencies are defined in some way (probably by use) that allows the basic compatibility of design elements to be automatically verified with no work required from a programmer beyond defining said design.

A weakly typed system is one that is not strongly typed. That is, one in which a developer has to do extra work to validate the consistency of relationships between entities in a system or one in which it is not automatically done before a design element can be used.

An explicitly typed system would be one in which all types, including abstractions, must be explicitly defined. Java and C# are examples of such a system. If you want to create a polymorphic relationship, you have to create some kind of interface that is implemented by variants of the abstraction.

An implicitly typed system is one in which types, including abstractions, can be inferred by some part of the programming language or environment. A lot of scripting languages do this: you define the interface for an abstraction by how the caller uses implementations. What you may or may not have considered is that C++ also had elements of implicit typing: templates created a de facto interface between the template-ized thing and its dependencies. Note that the dependencies of a C++ template are still checked at compile-time so implicit typing is, by no means, inextricably linked to strong, weak, static, or dynamic typing.
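A sketch of implicit typing in a scripting language, here Python, may help. The names `describe`, `Circle`, and `Square` are made up for illustration. The caller defines the interface purely by use: anything with an `area()` method qualifies, and no implementation declares that it conforms.

```python
import math


# Hypothetical caller: its use of shape.area() is the entire "interface".
def describe(shape):
    return "area = %.2f" % shape.area()


# Neither class declares that it implements anything.
class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


descriptions = [describe(Circle(1)), describe(Square(2))]
```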

A statically typed system is one in which types cannot be redefined at runtime. C#, C++, and Java are all examples of such languages. You define a type and that's that.

Finally, there are dynamically typed languages. JavaScript would be an example of such a language, where a type could be changed after it has been loaded and even after it has produced an object to be used.
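The same kind of dynamic typing can be sketched in Python (used here instead of JavaScript only for illustration; `Greeter` and `greet` are invented names). The type is changed after it has already produced an object, and the existing object sees the change.

```python
# Hypothetical type, defined without any greet method.
class Greeter:
    pass


greeter = Greeter()  # the type has already produced an object
had_greet_before = hasattr(greeter, "greet")


# Redefine the type at runtime by attaching a method to the class.
def greet(self, name):
    return "Hello, " + name


Greeter.greet = greet

# The object created earlier picks up the new method.
greeting = greeter.greet("world")
```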

I don't care whether a language has implicit or explicit types. Nor do I care whether it has static or dynamic typing. I may have a slight bias toward the former in each category but I don't consider it a big problem if a language or platform decides to go the other way. It is specifically the strength or weakness of a language's typing system that matters in the context of this blog entry.

Strongly typed systems are preferable to weakly typed ones because they aid in refactoring and extension in ways that cannot easily be done using other tools available to date. They do this by forcing us to reconcile instances of necessary duplication before we even bother testing a code change.

Practically
Hopefully, you are at least intrigued by the theoretical information provided above. However, I recognize that abstract arguments aren't always enough to make one's case so here's a real world example. This is a simple case and, the more complex the case, the more dangerous weakly-typed systems are.

In this example, we have a template method pattern. The pattern is implemented the classical way (through inheritance) even though I seldom do that in real life. No point getting embroiled in another debate before this one is settled. :)

The Setup
In the weakly-typed language we are using (which I'm making up on the spot to avoid getting into a religious war), we have no way to define an abstract method that ensures all inheritors of our base class implement it.

Let's look at the base class now:

class BaseClass
  method DoesSomething(x, y)
    factor = 1
    factor *= ComputeFactorForX(x)
    factor *= ComputeFactorForY(y)

    return factor
  end method
end class

Now, let's consider one inheritor of this class:

class RarelyUsed extends BaseClass
  method ComputeFactorForX(x)
    return 1 + x * x
  end method

  method ComputeFactorForY(y)
    return -1 - (y * y)
  end method
end class

In addition, there are twelve other extensions of BaseClass which are used very frequently. Let's even go as far as to say that we have a strong suite of acceptance tests that validate the value delivered by the software being developed. Even with such a thing, however, it is impossible to test every combination of every object.

Now, as I've stated before, TDD is handy for helping us define the behavior of each of these classes. It may even allow us to prescribe the relationship between them. What it doesn't do automatically is enforce the relationship.

The Twist
Let's imagine that we want to test-drive an extension to the behavior of the base class. That is, we want to change the relationship between BaseClass and its inheritors. TDD allows us to define the behavior as we want and ensure it is done right:

class BaseClass
  method DoesSomething(x, y, z)
    factor = 1
    factor *= ComputeFactorForX(x)
    factor *= ComputeFactorForY(y)
    factor *= ComputeFactorForZ(z)
    return factor
  end method
end class

What it doesn't do is make sure that all inheritors of BaseClass conform to the contract it demands. We have to wait until the real application is put together in order to see if we did everything we need to do. Let's say we just plain forgot the type RarelyUsed. Why wouldn't we? It is, after all, rarely used and a member of a family of types that is thirteen-large.
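To make the failure mode concrete, here is roughly how the same situation plays out in one real language that does no such checking up front, Python. This is only a sketch: the article's language is invented, `FrequentlyUsed` is a hypothetical stand-in for the twelve inheritors that were updated, and the method names are lower-cased to follow Python convention.

```python
class BaseClass:
    def does_something(self, x, y, z):
        factor = 1
        factor *= self.compute_factor_for_x(x)
        factor *= self.compute_factor_for_y(y)
        factor *= self.compute_factor_for_z(z)  # the new requirement
        return factor


class FrequentlyUsed(BaseClass):  # stand-in for an inheritor we remembered
    def compute_factor_for_x(self, x):
        return x

    def compute_factor_for_y(self, y):
        return y

    def compute_factor_for_z(self, z):
        return z


class RarelyUsed(BaseClass):  # the inheritor we forgot to update
    def compute_factor_for_x(self, x):
        return 1 + x * x

    def compute_factor_for_y(self, y):
        return -1 - (y * y)


# Everything above loads without complaint.
ok = FrequentlyUsed().does_something(2, 3, 4)

# The gap shows up only when RarelyUsed is actually exercised.
try:
    RarelyUsed().does_something(2, 3, 4)
    failure = None
except AttributeError as error:
    failure = error
```

Nothing in the class definitions themselves signals the problem; it takes a call through the forgotten inheritor to expose it.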

The Punchline
How long do we have to wait for that feedback? Ironically, the best-case scenario is: "Forever." In that scenario, the bogus class is never used and will be deleted the next time someone looks at it. The next-best case scenario would be when your automated acceptance tests run. Even that creates an irritatingly-long delay.

What if it happens during manual testing? The cost of finding and fixing the problem is going to be dozens of times what it would have been if the problem were apparent when making the change originally.

What if it happens in production? The loss of goodwill is possibly irreparable but probably could be smoothed over. However, the cost of finding and fixing a problem with that kind of delay between introduction and discovery would be astronomical, when compared to the cost of finding and fixing it immediately. We're talking hours or days when compared to seconds or minutes.

The Rebuttal
One might argue that one's tests are deficient. For instance, one could have a method that checks for the existence of the required methods on an inheritor of BaseClass. That method could then be called from a test in the test suite for each inheritor.

I'll not try and claim that isn't true. It obviously is. My counter-rebuttal is this:

Yes, you can do that. You can also write your code in assembly and write tests for each method which ensure it is properly formed - that all code paths lead to an exit or to an infinite loop, that it properly puts things in the right registers, etc.

In fact, you can re-write as many parts of a strong-type-system-checker or a compiler as you like. However, doing so requires you either to write a compiler and then use it from a whole bunch of tests or to duplicate your design in the form of tests.

Conclusion
I think I've made some strong arguments in favor of using a strongly-typed programming language over a weak one. You cannot avoid creating duplication so, in addition to keeping it to the minimum amount necessary, you should use automation to enforce as much consistency between duplicates as possible.

I've failed, however, to mention the biggest and simplest argument. I've done this partially because I'm saving it for last. Here it is: there's no cost. It's one-hundred percent free to use a strongly typed language instead of a weakly typed one.

At least, that's how all the proponents of weakly-typed languages seem to make things look. You see, I've spent days trying to get people who don't like type-safety to explain one advantage of not having it. Nobody - not one person I've talked to - has been able to identify a single reason why it even might be better to have a language be weakly typed than it is to have it be strongly-typed.

The closest anyone has come has been something along the lines of "it's more challenging, so I work harder and with more focus." Languages should not provide the challenge. There are plenty of real challenges: getting your tests right, creating a good design, and aligning implementation with business value, to name a few. We don't need made-up challenges coming from languages that represent a big step backward.

So here is my final argument: I've shown that there is an advantage to having a strongly typed system. I submit that until a single, real advantage of weak typing can be identified, that tips the scales in favor of strong typing. If someone can come up with a strongly typed language that is also implicitly typed and/or dynamically typed, then that should be considered to be a competitor to real languages like Java, C#, and Go.

TDD allows you to use a language more effectively than you could without it but cannot make up for fundamental deficiencies in that language. One such deficiency that is presently en vogue is the absence of a strong typing system. You can mitigate that deficiency by writing numerous additional tests but at a number of great costs.

Just because you write good tests doesn't mean you cannot benefit from a compiler or something that fills a similar role, and nobody has been able to show me how you could possibly benefit from not having one.

Max Guernsey, III

Saturday, May 21, 2011

Why TDD Does Not Replace Compilers