The Cranky Sysadmin A world of technology, fun, and ignorant rants.

January 5, 2009

The Cost of Checks

Filed under: Programming,System Administration — Cranky Sysadmin @ 10:14 am

I read an interesting article by Paul Graham about the cost of checks in an organization. Checks in this case refer to things like comprehensive QA of a product or feature before release. Paul’s premise is that all checks have a cost and some of the costs are surprisingly high. I tend to agree, but I also see things through the lens of an operations guy. When I release a product that has gone through no checks (I have been told to do this), there is a high probability that something will break. In many cases, the breakage can be crippling. I think there is a good chance that some checks will actually make a company more nimble, especially as a code base grows in complexity and more people depend on your product.

If a company spends as much or more time fixing the problems caused by a release as it spent on the release itself, maybe some checks need to be put in place to cut down that wasted effort. The problem is finding a balance where the check costs less than the extra effort and pain it prevents.
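Here is a rough sketch of that balance as plain arithmetic. The function name and all of the hour figures are made up for illustration; real estimates would have to come from your own release history.

```python
def check_pays_off(check_hours, cleanup_hours_without, cleanup_hours_with):
    """Return True when a check costs less than the cleanup effort it saves.

    All three arguments are hypothetical per-release estimates in hours.
    """
    hours_saved = cleanup_hours_without - cleanup_hours_with
    return check_hours < hours_saved


# Made-up example: 8 hours of QA cuts post-release cleanup from 40 hours to 10.
print(check_pays_off(check_hours=8, cleanup_hours_without=40, cleanup_hours_with=10))

# Made-up counterexample: 35 hours of QA for the same 30 hours saved.
print(check_pays_off(check_hours=35, cleanup_hours_without=40, cleanup_hours_with=10))
```

The first call prints True and the second prints False. This only counts hours; it ignores harder-to-measure costs like lost customer trust or slower releases, so treat it as a starting point rather than a decision rule.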

Maybe unit tests help. People who write unit tests seem to have different opinions about how effective they are. Some say that writing the tests makes the code take twice as long to write. Others say the tests help them write the code faster, since the tests give them a codified requirement.
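To make "codified requirement" concrete, here is a minimal sketch using Python's standard unittest module. The parse_port helper is hypothetical, invented only for this example; the point is that the tests spell out what the code must accept and reject.

```python
import unittest


def parse_port(value):
    """Hypothetical helper: turn a config string into a valid TCP port."""
    port = int(value)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class ParsePortRequirement(unittest.TestCase):
    """Each test states one requirement for parse_port."""

    def test_accepts_valid_port(self):
        self.assertEqual(parse_port("8080"), 8080)

    def test_rejects_out_of_range_port(self):
        with self.assertRaises(ValueError):
            parse_port("70000")

    def test_rejects_non_numeric_input(self):
        with self.assertRaises(ValueError):
            parse_port("http")
```

Saved to a file, this can be run with `python -m unittest` followed by the file name. Whether writing the three tests first speeds you up or slows you down is exactly the disagreement described above.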

Automated testing is another thing that can make the release process better. There is a large cost associated with this in many cases though. You need a highly qualified QA guy who knows how to program and knows the product to write the tests. If he’s that good, maybe his time is better spent helping build the product (if this is a small organization). Could the automated tests be written by the group who develops the software? Sure, but it will slow down “real” development work.

I think the real problem is that as complexity increases, the need for checks increases. Eventually, one gets to a point where progress is ponderously slow. What do you do about this? Well, all of the solutions that I know of are bad in some way. One has to make a choice between pain and searing pain.

  • Keep it simple. Simple systems are easier to check and there is less risk in changing them since they’re easily understood.
  • Keep it small. This is really a subset of simple. If you have 5 simple systems (like web servers), you’ll have an easier time managing them than if you have 100 simple systems.
  • If you can’t keep it simple, only make it as complex as it has to be. This means managing customer expectations (which no one wants to do in my experience). Make sure the system isn’t more complex than your developers and operations staff can manage.
  • Don’t let complexity creep up on you. Know that it’s coming and plan for it. Know that your costs will rise as the complexity rises. Find ways to make the cost rise as slowly as possible.
  • If you have to have complexity, automate the heck out of everything you can find. This is easy to say, but if the system has grown so complex that all you do is fight fires, then you won’t have time to automate unless you are willing to accept a lower level of service for the time it takes to automate.
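As one small illustration of the automation point in the list above, here is a sketch of a runner that executes a set of named health checks and reports the failures. Everything in it is an assumption for illustration: the run_checks function and the two stand-in checks are hypothetical, and real checks would probe actual disks, services, and so on.

```python
def run_checks(checks):
    """Run every check and return the names of the ones that failed.

    `checks` maps a name to a zero-argument callable that returns True
    when things look healthy. A check that raises counts as a failure.
    """
    failed = []
    for name, check in checks.items():
        try:
            healthy = check()
        except Exception:
            healthy = False
        if not healthy:
            failed.append(name)
    return failed


# Hypothetical stand-ins for real probes.
def disk_has_space():
    return True


def web_server_responds():
    raise ConnectionError("connection refused")


failed = run_checks({"disk": disk_has_space, "web": web_server_responds})
print(failed)
```

This prints ['web']. Treating an exception as a failure keeps one broken check from hiding the results of the others, which matters when the script runs unattended before or after a release.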

I am learning these things the hard way, so I don’t know of any elegant solutions to the problem of complexity which go beyond what I’ve already mentioned. Maybe after a few more startups I’ll have more useful advice than, “Watch out! You’re headed for a big bucket o’ misery!”
