On software development, or how to be less of an idiot while doing it, Part 2. - Alex Belits
I finally got time to continue the list of my complaints about the current sorry state of software development. Since no one responded to the previous entry, I assume that either what I have expressed is surprisingly widely understood, or even considered trivial, among the people who happened to read it, or they are in such fundamental opposition to everything I said that they ignore it completely. There is, of course, a possibility that no one reads this in the first place.

No matter what it is, I will add another entry:

4. Software development is an iterative process -- both in the long and in the short term. In the long term, software usually goes through multiple versions, releases and fixes produced by developers. In the short term, a developer writes individual functions and components and performs various modifications, then the code runs (in the intended environment, with debuggers or instrumented environments, or through standalone unit tests), and the developer uses the results for further changes until the software behaves as intended.

Releases and tests

Usually there is at least a basic understanding of this. The problem is how the iterative process is handled. There is, of course, the dreaded "it compiles, ship it!" mentality that simply bypasses the whole process, but it is becoming less common, and everyone seems to be implementing some kind of software testing and release procedures. However, the result is often worse. I am absolutely serious: bad testing is often worse than no testing. Instead of trying to develop software, people end up writing tests. Too bad, there is absolutely no reason why a person would write any fewer bugs if he writes more code -- in reality more code inevitably contains more bugs, and tests are not an exception. Worse yet, it requires a very clear understanding of all possible corner cases to create a test that verifies that the code indeed acts as it should. And a person who can write those tests is likely to spend less time just making sure, in the code itself, that all those corner cases are covered -- including ones that are extremely difficult to test but relatively easy to prove.

Just as, according to Brian Kernighan, debugging is twice as hard as writing code, writing tests is at least as hard as such debugging. A programmer who pairs a program with a test that confirms the most obvious example of a scenario really just states a tautology, in the one form that he himself recognizes as a tautology. If the program fails that, something is terribly wrong (with the program or with the tests), but if it succeeds it means almost nothing. If the program performs any I/O or deals with stored data, there are 2^(8*n) possible combinations of input, with n being the total length of all data involved in bytes. That includes less than obvious kinds of input, such as timers, if they are in any way involved. If the program has any concurrency (processes, threads, interrupt handlers), there are also all possible interleavings of all threads or processes with all combinations of input, shifted by any number of instructions. On all supported architectures. Even in the best designed tests, the difference between true test coverage and zero is negligible. For all practical purposes, most tests verify nothing. Then why do they exist?
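
To put a rough number on the input-space claim above, here is a back-of-the-envelope sketch. The function names and the test rate of one billion cases per second are assumptions made up for illustration; only the 2^(8*n) count comes from the text.

```python
# Back-of-the-envelope sketch: size of the input space for n bytes of data.
# The test rate used below is an arbitrary assumption, not a measurement.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def input_space_size(n_bytes: int) -> int:
    """Number of distinct values that n_bytes of input can take: 2^(8*n)."""
    return 2 ** (8 * n_bytes)


def years_to_enumerate(n_bytes: int, tests_per_second: int) -> float:
    """Years needed to try every input once at the given (assumed) rate."""
    return input_space_size(n_bytes) / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 10**9  # hypothetical: one billion test cases per second
    for n in (1, 4, 8, 16):
        print(f"{n:2d} bytes: {input_space_size(n):.3e} inputs, "
              f"{years_to_enumerate(n, rate):.3e} years at {rate:.0e} tests/s")
```

Under that assumed rate, a mere 8 bytes of input already take roughly 585 years to enumerate, and that is before timers, concurrency or multiple architectures enter the picture.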

Tests serve a much simpler purpose. They allow a developer to find out if at some point he made a terrible mistake, and they provide a quick way to verify whether some change broke a different piece of functionality in the most noticeable way. They do not assure anything to any degree; they catch the most blatant mistakes. A simple functionality test implemented in software as a unit test is not likely to give more information than just running the software in a straightforward manner and checking that it works for all known functionality. A full manual or automated test-by-trying-to-use-things has the added advantage that it will also reveal bugs in documentation, as users/testers will try to do what the documentation says, and will notice that it does not work if it is described wrong.

But where are those oh-so-important tests actually useful? Three things -- error handling, race conditions and regression. As I have described, it's not possible to test for every combination of input or concurrency. However, mistakes are usually such that they manifest themselves over a whole class of conditions. Parsers fail on data of unusual size. Concurrency problems usually reveal themselves when multiple data exchanges are triggered by random, independent requests handled by a program. A mistake that was made once, noticed by developers and fixed by some specific change is more likely to reappear later than a completely new problem is to show up, because the change could have been made with some assumptions that will become invalid as a result of another change. The key is usually not the bug but the way it was fixed.
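
As a minimal sketch of what checks aimed at a whole class of conditions might look like, consider the following. The record format (one length byte followed by the payload), the 64-byte limit, the function names and the "past bug" are all invented for illustration; none of it comes from a real project.

```python
# Sketch of "class of conditions" checks for a hypothetical record format:
# one length byte followed by that many payload bytes. The format, the
# function names and the size limit are all made up for this example.

MAX_PAYLOAD = 64  # assumed limit of this made-up format


def parse_record(data: bytes) -> bytes:
    """Return the payload, or raise ValueError on malformed input."""
    if len(data) < 1:
        raise ValueError("missing length byte")
    length = data[0]
    if length > MAX_PAYLOAD:
        raise ValueError("declared length exceeds limit")
    if len(data) - 1 != length:
        raise ValueError("declared length does not match payload size")
    return data[1:]


def rejects(data: bytes) -> bool:
    """True if parse_record reports an error for this input."""
    try:
        parse_record(data)
    except ValueError:
        return True
    return False


def check_sizes() -> None:
    # Unusual sizes: empty payload, one byte, and both sides of the limit.
    for size in (0, 1, MAX_PAYLOAD - 1, MAX_PAYLOAD):
        payload = bytes(size)
        assert parse_record(bytes([size]) + payload) == payload
    assert rejects(bytes([MAX_PAYLOAD + 1]) + bytes(MAX_PAYLOAD + 1))


def check_error_handling() -> None:
    # Error paths that a "most obvious scenario" test never reaches.
    assert rejects(b"")                 # nothing at all
    assert rejects(bytes([3]) + b"ab")  # truncated payload
    assert rejects(bytes([1]) + b"ab")  # trailing garbage


def check_regression() -> None:
    # Hypothetical past bug: a zero-length record was once treated as an
    # error. Pin the fix so a later change cannot silently undo it.
    assert parse_record(b"\x00") == b""


if __name__ == "__main__":
    check_sizes()
    check_error_handling()
    check_regression()
    print("all checks passed")
```

Even here the checks only probe the edges of a few classes of input. They say nothing about the rest of the input space, which is exactly the limitation described above.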

Those things are nothing but heuristics. They guarantee nothing. They are not a replacement for an effort to avoid writing bugs in the first place. They catch an occasional blue whale of a bug while being completely oblivious to sharks. Throwing effort toward improving them will lead to a vast increase in the amount of code, and considering that bug density is usually constant, it is more likely to make the code buggier -- just some of the bugs will be in tests.

So what is the solution? There really isn't one -- if there were, we would see much less buggy software around. But what helps is a responsible approach to software development. When writing software, assume that it will never be tested. That there is no way to test it in an environment even remotely close to the one where it will be used. That it's going to be uploaded to a space probe on the way out of the Solar System. That if it's buggy, it will disable the probe, and there will never be a chance to fix it again. Yes, I can hear the whining that it's unrealistic, takes too much time, is expensive, or that most developers can't possibly guess how their algorithms actually work, or properly count the size of all data involved, or understand all state transitions in their state machines. But the choice is clear -- either the effort is directed toward writing less buggy software, or software will be buggy. You can test it until your hard drives' bearings wear out and thermal paste evaporates from under your heatsinks -- once in the field, it will inevitably encounter something that it was not tested against. Modern software is too complex, modularity is too poor, interfaces are too convoluted, so reliability requirements formerly applicable to only a few categories of relatively simple software are now applicable to more, and more complex, software. C'est la vie.

With this basic assumption accepted, one can think of ways to make it easier. Simplifying interfaces, using clear, easy to understand and modify processing models, establishing useful, robust infrastructure, and reusing code with the most general functionality and interfaces all go a long way toward making things less buggy. And then if there is a bug, it will be easy to analyze when it finally shows itself -- in testing or in the field. Unless, of course, this software is indeed for a space probe. It is not that hard; most OS kernels are written with this kind of attitude. Look at the size of Linux or any BSD kernel, and ask yourself whether your project is really so overwhelmingly large and complex that developers can't maintain a clear understanding of how things work. Take into account that kernels are notoriously difficult to test, and all testing frameworks ever developed for those purposes have abysmally inadequate coverage, so developers themselves don't rely on them to any meaningful extent. Sure, OS kernels have their share of bugs. But think of it -- considering the pace of development in those projects, and the number of bugs that appear there, are your projects anywhere close to this level of quality? Are you really dealing with an unavoidable amount of complexity far above those giant projects? Is your difficulty of maintaining or replacing legacy code on the same scale as Linux (20 years in development) or BSD (almost 40 years)?

Perma-debugging cycle

There is another related aspect of the iterative nature of development -- how a developer works on his code. Since the moment Borland released the first version of Turbo Pascal, any professional software developer has been one or two keystrokes away from running the code that he just wrote. This has a perfectly legitimate use -- write a component, create an environment where it can be used, compile and run it, so if you did something seriously wrong, you will notice it sooner. Typos, accidentally omitted pieces of an algorithm, brain farts of various other kinds will be caught in the most straightforward way possible. More importantly, if a developer notices that he is making really stupid mistakes, he will know that he had better get some rest before he does something subtle yet truly disastrous.

But this is not all that people use this feature for. There is always a temptation to write something that "feels right", that maybe looks like an elegant solution, but where the developer can't clearly see whether it is actually right. Or, worse, the developer just copy-pastes an example, expecting to tweak it until it does what he expects. The developer compiles the code, sees how it fails, makes some change that his intuition suggests, and compiles it again... Eventually things start to resemble a working program, so he enables the debugger and single-steps through the code he wrote in such a manner, looking for the point where things go "wrong"...

Once on that path, the developer is out of the realm of engineering. Out of anything even remotely related to any valid development practice. He is in a cycle of what I call "perma-debugging". The clearest sign of perma-debugging is that the developer has only the foggiest idea of how his code actually works. Unfortunately this is also the thing that the developer is least likely to admit at that point -- even to himself. Things seem to be right. A glance at the code does not reveal anything egregiously wrong. The code looks exactly like code that works. If something is wrong, it must be some minor problem, something that can be fixed with the next tweak... Eventually the sequence of tweaks works, and the code passes the tests -- at least the ones that the developer himself thought of.

Without fail, the code at this point is still absolutely wrong, and its misbehavior will produce weird, hard to detect problems in the future. Faced with problems that seemingly defy analysis, the developer will tweak things again -- this time probably in another place. Eventually the tweaks work, only to fail again later... The problem is in the nature of this method itself -- it tends to produce mistakes and conceal their symptoms rather than fix them. Initially the code does not even have "mistakes" -- it's plainly, absolutely, definitely wrong. It was not developed as a result of clear understanding, analysis, design or application of anything applicable. It may be imitation (a.k.a. cargo cult programming). It may be guesswork ("this function/data structure/... seems to do that, let's use it..."). It may be making things up. There is no way for it to produce anything close to the intended result, except completely by accident.

However, it is likely to be "close enough". Code written by analogy usually ends up doing something somewhat related to what it looks like. Guesswork usually gets some aspect right. Examples, however poorly applicable, usually contain working code that does something. Wrong types stand out in compiler errors, so they can be easily "fixed". Miscalculated values or wrong sizes may still let the function run. Wrong offsets in a structure that contains C strings will still likely overlap with the strings in question. Non-null-terminated character arrays are very likely to have a null byte somewhere after them, and with very high probability it will be within 2-4 bytes if the next field is a non-negative integer. Thread-unsafe procedures don't break things most of the time when used in multithreaded code. Freed memory still contains its old contents until it happens to be reused by something else. Bad error handling won't be called at all in most imaginable test examples unless they specifically cover errors...
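
The non-null-terminated array case can be simulated without invoking real undefined behavior. The sketch below uses Python's struct module to lay out a hypothetical record (an 8-byte name followed by a little-endian 32-bit counter) and then reads the name the way careless C code would, up to the first zero byte. The layout and the helper names are assumptions chosen for illustration.

```python
import struct

# Hypothetical record layout: an 8-byte name field followed by a
# little-endian 32-bit counter, with no padding. A name that fills the
# whole field has no terminating zero byte, which is the mistake shown.
RECORD = struct.Struct("<8si")


def read_c_string(buf: bytes, offset: int) -> bytes:
    """Emulate C string semantics: read from offset up to the first zero byte.

    bytes.index raises ValueError if there is no zero byte at all; real C
    code would instead keep reading past the end of the buffer.
    """
    end = buf.index(b"\x00", offset)
    return buf[offset:end]


def name_as_c_string(name: bytes, counter: int) -> bytes:
    """Pack a record, then read its name without respecting the field size."""
    return read_c_string(RECORD.pack(name, counter), 0)


if __name__ == "__main__":
    for counter in (0, 7, 300):
        print(counter, name_as_c_string(b"ABCDEFGH", counter))
```

With a counter of 0 the name reads back exactly right, so a test using fresh records passes. With 7 or 300 the "string" picks up one or two junk bytes from the counter, and shorter names never show the problem at all.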

There are millions of ways in which code that seems right will almost work right, and usually there is some tweak that makes it seem to work absolutely right in some particular conditions, too, without making it any less wrong. And a person in a perma-debugging cycle is almost bound to stumble upon such a tweak, no matter how badly he misunderstands what is going on. Once the "problem" is "fixed", the code is poisoned with a heavily camouflaged bug worthy of the Underhanded C Contest. One can only guess when something terribly wrong will happen, but once it does, it is very unlikely that the "seemingly correct" code will stand out enough to warrant any scrutiny. If a code reviewer (if there is a code reviewer at all) is lucky, he will notice suspicious clumsiness in a workaround. The amount of time necessary to find such a problem may be enormous.

Unfortunately, the level of understanding that developers get while dealing with new languages and environments is often not nearly clear enough to avoid this "technique" and still participate in projects. It is perfectly valid for a developer to do some experimenting with an environment to determine how exactly it behaves in some situation that wasn't clearly taught or documented. But those things have no place in a production environment. Anything short of a crystal clear understanding will eventually lead a developer to a dilemma -- stall the development and shift all his efforts toward obtaining the missing knowledge, or enter a perma-debugging cycle. That's assuming that the developer recognizes the problem in the first place -- plenty of developers are accustomed to do-by-example "tutorials", so analogy and guesswork are their primary tools to begin with.

The only way to avoid that problem is to be constantly aware of it and look for its symptoms in your own work. Sometimes all that is necessary is an observation:
"I am tweaking broken code. I cannot reliably predict the effect of a change I am making. The code, or my understanding of it, or most likely both, are wrong. This means I am not really fixing anything. I should stop and re-do it while maintaining full understanding of what it is and does."

...To be continued...
2 comments
From: mackys Date: May 23rd, 2011 03:24 pm (UTC)
> I assume that either what I have expressed is surprisingly widely understood

That's it in a nutshell. Pretty much the entire industry sucks, and anyone with an IQ larger than their shoe size knows it. :P
From: nickhalfasleep Date: May 23rd, 2011 06:52 pm (UTC)
I think you've stated the obvious, but it's only obvious to those who have done any amount of programming and shipping.

For me, and the software I write and ship, important routines have boundary conditions beyond which they fail and report errors. This saves me in conditions I know, but as you say, the unknown conditions are what bites you in the rear.