Saturday, April 28, 2012

Complexity Conservation Law

Translation by Michael Kapelko (kornerr AT GoogleMail)
20120428



Original article by Igor Tkachev in Russian: http://rsdn.ru/article/philosophy/Complexity.xml

Foreword

It is a simple task to make things complex, but a complex task to make them simple.
-- Meyer's Law

There are many practices, concepts, patterns and other scary words we use every day in our professional lives, but often blindly, without even wondering why we do it. Why are those things necessary, are they good or bad, and when are they good or bad? Why do we need all these concepts? The answer is obvious: in the end, it all exists to help us fight software development complexity. Now it's time to ask ourselves: what is complexity, and how can knowing what it is help us better understand and use the concepts that exist to fight it?

It's a bit hard to answer that question. Complexity is a versatile concept, contradictory at times. In short, complexity can be defined as a measure of the effort necessary to solve a problem. When working with code, we usually deal with the following types of complexity:

  • quantitative complexity (many characters);
  • code perception complexity;
  • code alteration (flexibility) complexity;
  • algorithmic or intellectual complexity - roughly speaking, how smart the person solving the problem has to be;
  • structural complexity - application architecture complexity;
  • learning complexity, or entry threshold - the minimal level of skill and knowledge needed to understand a certain solution to a problem;
  • problem understanding complexity.

What makes it even harder is that all these types of complexity are interdependent and have no strict boundaries. Complexity itself can be objective or subjective, the latter existing only to make someone's life harder. Fortunately, we're not interested in subjective complexity; our target is objective complexity. How does one fight it? In my opinion, the main problem is that there's no way to remove objective complexity completely. We can decrease, increase, divide and multiply complexity; we can create it out of nothing, especially to complicate someone's life, but we cannot remove it completely. We can transform it into other kinds, optimize it, rearrange it, and, in the end, get it under control. Virtually all software development concepts strive for that purpose: to transform complexity, to optimize the effort needed to solve a problem. By transforming complexity, we get an opportunity to handle it better and, in consequence, to better control the code we develop. This is, in essence, our main purpose: we must control the code, not let the code control us.

Complexity transformation boils down to one simple thing: when we remove complexity from one place, it is always added somewhere else. If this doesn't seem to happen, most likely we just can't see or understand where it went yet. Complexity never leaves without a trace; it transforms itself into other kinds of complexity. Let's call this the complexity conservation law.

Remark

To prevent misunderstanding, the author wants to calm those who take the title literally. The author acknowledges that he hasn't discovered a new law of nature. Moreover, he understands that, strictly scientifically speaking, no such law exists. The title was chosen simply because of the author's liking for pompous and provocative titles. That's why there's no need to look for formal definitions and proofs in the article. The article's purpose is to classify the types of complexity and to better explain their nature, their unforeseen consequences, and their influence on code development.

The complexity conservation law leads to a simple conclusion: there's no ideal way to fight complexity. It's impossible to apply a concept or pattern and only decrease complexity; we always sacrifice something. With that knowledge, we can fairly accurately weigh what we spend against what we receive and, based on that, decide whether it's reasonable to apply this or that solution in each particular case.

Let's have a look at each type of complexity.

Quantitative complexity

This is the case where a lot of code has to be written to solve a problem. If most of this code is of the same kind, we can fight such complexity with the code reuse concept. Sometimes it helps a lot, sometimes only a tiny bit. For the user of reused code, it decreases quantitative complexity, code perception complexity, code alteration complexity, and algorithmic complexity, but increases learning complexity. For its author, it increases practically all types of complexity.

When is it not helpful? It doesn't help much when we try to encapsulate code for reuse beforehand: it can be reused, but in practice it turns out nobody needs it. We have added complexity to the problem solution but gained nothing. It also doesn't help much when the author doesn't have enough skill and/or brains. It's no secret that developing libraries and reusable components requires a certain skill level. If one does it without thinking, at best such code won't be used by anybody; at worst, it will complicate both perception and flexibility everywhere it's used.

On the other hand, high-quality sets of reusable components (libraries) can significantly decrease complexity. Take, for example, the System.String class. It is easy to use and self-sufficient. All the complexity of string handling has been shifted from our hands into those of Microsoft developers and thousands of servers testing the class 24/7 in all possible versions and configurations. For us it's a complexity decrease, often for free, though not necessarily 100% free.
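
A tiny illustration of that shift, sketched in Java (using Java's String as a rough analogue of System.String; the class and method names here are made up for the example): joining a list of strings through the library keeps the separator and edge-case handling out of our code, while the hand-rolled version keeps that complexity with us.

    import java.util.List;

    public class JoinDemo {
        // With reuse: one library call; the separator and edge cases live inside the library.
        static String joinWithLibrary(List<String> parts) {
            return String.join(", ", parts);
        }

        // Without reuse: the same quantitative and algorithmic complexity stays in our code.
        static String joinByHand(List<String> parts) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < parts.size(); i++) {
                if (i > 0) sb.append(", ");
                sb.append(parts.get(i));
            }
            return sb.toString();
        }

        public static void main(String[] args) {
            List<String> parts = List.of("a", "b", "c");
            System.out.println(joinWithLibrary(parts)); // a, b, c
            System.out.println(joinByHand(parts));      // a, b, c
        }
    }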

This concerns almost all the tools we use to solve problems. Modern operating systems, frameworks, libraries, development environments, programming languages - all of them are developed to shift as much complexity as possible from us into the tools themselves. Sometimes we pay for it with money, and always with increased learning complexity, but the cost is worth the quality of the tools.

Perception complexity

This type of complexity is mostly about code formatting, not its contents. It's pretty easy: open up any chunk of code and note the time you need to understand how it operates. Perception complexity depends on many factors: code style, following naming conventions, algorithm clarity or confusion, the expressiveness of the tools used, the development environment, including code highlighting and navigation, etc. Perception complexity is also directly related to algorithmic complexity and the entry threshold.

To avoid confusing it with other complexity types, imagine that you are left without your favourite Visual Studio, IDEA or any other Eclipse, and have to edit someone else's badly formatted code, inconsistent with any naming convention, in Notepad. Also, with all monospace fonts absent and half of the tabs replaced with spaces. Did you imagine it? It's horrible!

Alteration complexity

Simplicity of code perception does not mean simplicity of code alteration. Of course, simple code is easy to change, but the main property behind code alteration complexity is flexibility. Simple code is not always flexible, and flexible code is not always simple. This, again, shows that complexity cannot be removed, only transformed. Many practices and patterns exist to increase code flexibility; consequently, they usually increase both perception and learning complexity. Take any pattern of this kind, say, IoC or Visitor. Some IoC implementations introduce additional entities into the code, make the algorithm less obvious and increase perception complexity, particularly navigation through the code. Visitor splits a single algorithm for processing a hierarchy into many pieces, which, in turn, significantly increases perception complexity. This always happens: something removed here results in something else added elsewhere. Is it worth striving for total code flexibility, then? It has to be decided case by case. Sometimes we gain significantly and strengthen our control over the code; other times we only add complexity to the code without gaining anything and completely lose control over it. It's important to remember that, according to the law deduced above, any pattern is at the same time an anti-pattern, so you should consider both the positive and negative effects of applying it. In some cases the most beautiful pattern can cause more harm than good.
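
As a rough sketch of this trade-off (in Java, with hypothetical node and visitor names not taken from the article), here is how Visitor spreads one tree-processing algorithm over several small classes: we gain the flexibility to add new operations, and pay for it with perception complexity and navigation.

    // Hypothetical node types and visitor, sketched to show how the pattern
    // splits a single tree-processing algorithm across several small pieces.
    interface Node { <R> R accept(Visitor<R> v); }

    interface Visitor<R> {
        R visitText(TextNode n);
        R visitList(ListNode n);
    }

    final class TextNode implements Node {
        final String text;
        TextNode(String text) { this.text = text; }
        public <R> R accept(Visitor<R> v) { return v.visitText(this); }
    }

    final class ListNode implements Node {
        final java.util.List<Node> children;
        ListNode(java.util.List<Node> children) { this.children = children; }
        public <R> R accept(Visitor<R> v) { return v.visitList(this); }
    }

    // One concrete operation. Each new operation is a new class, and each new
    // node type touches every visitor: flexibility paid for with perception.
    final class RenderVisitor implements Visitor<String> {
        public String visitText(TextNode n) { return n.text; }
        public String visitList(ListNode n) {
            StringBuilder sb = new StringBuilder();
            for (Node child : n.children) sb.append(child.accept(this)).append(' ');
            return sb.toString().trim();
        }
    }

    public class VisitorDemo {
        public static void main(String[] args) {
            Node tree = new ListNode(java.util.List.of(new TextNode("hello"), new TextNode("world")));
            System.out.println(tree.accept(new RenderVisitor())); // hello world
        }
    }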

Algorithmic complexity

In short, algorithmic or intellectual complexity is the minimal level of intellect necessary to solve a problem; in particular, the ability to hold the problem in one's head as a whole. This complexity is frequently confused with learning complexity and knowledge level, but it's different. Programming itself already presumes having a certain amount of brains, but it's no secret that different people need different levels of brain activity to solve the same problem. Yet this isn't the most important thing; in fact, there aren't many algorithms in real life that a mere mortal can't grasp. The most important thing is that algorithmic complexity is easily increased by orders of magnitude through small code manipulations, usually by mixing the implementations of different algorithms in one place. Suppose we have an input stream of data, an output stream of data, and a data processing algorithm. Let's see how much this may cost.

Suppose we parse XML, process it a bit, and produce HTML. The input parsing algorithm is pretty simple and has an intellectual complexity of, say, 3 units, data processing 4 units, HTML output 2 units. Suppose mixing them results in 3 * 4 * 2 = 24 units of complexity, and our intellectual limit equals 100 units. We have easily solved the problem with only a little brain activity. Now suppose we're writing a compiler, and in some place we mix text parsing, executable code generation, and some optimizations. These algorithms are obviously more complex than those of the XML example; let's value each of them at 10 units. This results in 10 * 10 * 10 = 1000 units of complexity, i.e., it exceeds our intellectual limit by a factor of 10.
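
To make the "mixing" concrete, here is a deliberately simplified sketch (in Java, with a hypothetical <item> markup and no real XML library): parsing, processing and output are intertwined in one loop, so the reader has to hold all three algorithms in mind at once.

    public class MixedXmlToHtml {
        // Deliberately simplified: hypothetical <item> markup, no real XML library.
        // Parsing, processing and output live in one loop, so the reader has to
        // keep all three algorithms in mind at once (their complexities multiply).
        static String xmlToHtml(String xml) {
            StringBuilder html = new StringBuilder("<ul>");
            for (String line : xml.split("\n")) {
                String item = line.trim();
                if (!item.startsWith("<item>")) continue;                  // parsing
                String value = item.substring("<item>".length(),
                                              item.indexOf("</item>"));    // parsing
                if (value.isEmpty()) continue;                             // processing
                html.append("<li>")
                    .append(value.toUpperCase())                           // processing
                    .append("</li>");                                      // output
            }
            return html.append("</ul>").toString();
        }

        public static void main(String[] args) {
            System.out.println(xmlToHtml("<item>a</item>\n<item>b</item>"));
            // prints: <ul><li>A</li><li>B</li></ul>
        }
    }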

Obviously, real life works with different formulas and units, but you should get the idea: the more complex the algorithms and the more of them are mixed in one place, the faster the problem's algorithmic complexity grows and the faster we lose control over it.

Let's take a look at the Single Responsibility Principle (SRP).

Remark

I intentionally pick examples where good and correct practices look less than good. This is not a way to diminish them; it's a way to better understand their core by looking closer at their not-so-obvious properties.

And now, SRP. SRP is a principle that allows us to control algorithmic complexity through decomposition. It dictates dividing an algorithm into units, each of which does its own job. Thus the complexity of a problem is no longer a product but a sum, and, most importantly, the algorithmic complexity of each unit stays at its starting level. This way we don't need to handle one algorithm with a complexity of 1000 units, but 3 algorithms with a complexity of 10 units each. What could possibly be bad about this? Let's recall the complexity conservation law. SRP does not only decrease the algorithmic complexity of the code; it also increases perception complexity, introduces quantitative complexity and, in some ways, decreases flexibility. For the compiler example such a trade is well worth it, because the introduced complexity can hardly be seen through a microscope compared to the rest of the compiler code. But what happens if we apply the principle to the XML-to-HTML example? First of all, we would have to introduce an intermediate data structure (like the AST of our compiler) and place the 3 algorithms into different methods or even classes. To combine all this, we would also have to add some sort of coordinating code. Our initial algorithm took about a hundred simple lines; now we may get 300, 400, or even 500 lines spread across different modules. Will the gain be adequate to the introduced complexity? The answer is obvious.
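
For contrast, a sketch of the same hypothetical task decomposed along SRP lines: each method owns one algorithm, a plain list of strings plays the role of the intermediate data structure, and convert() is the coordinating code mentioned above. Even in this toy form, the code grows noticeably compared to the mixed version.

    import java.util.ArrayList;
    import java.util.List;

    public class SplitXmlToHtml {
        static List<String> parse(String xml) {                 // input concern
            List<String> items = new ArrayList<>();
            for (String line : xml.split("\n")) {
                String item = line.trim();
                if (item.startsWith("<item>")) {
                    items.add(item.substring("<item>".length(), item.indexOf("</item>")));
                }
            }
            return items;
        }

        static List<String> process(List<String> items) {       // processing concern
            List<String> result = new ArrayList<>();
            for (String item : items) {
                if (!item.isEmpty()) result.add(item.toUpperCase());
            }
            return result;
        }

        static String render(List<String> items) {              // output concern
            StringBuilder html = new StringBuilder("<ul>");
            for (String item : items) html.append("<li>").append(item).append("</li>");
            return html.append("</ul>").toString();
        }

        static String convert(String xml) {                      // coordinating code
            return render(process(parse(xml)));
        }

        public static void main(String[] args) {
            System.out.println(convert("<item>a</item>\n<item>b</item>"));
            // prints: <ul><li>A</li><li>B</li></ul>
        }
    }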

One may think that such principles should be used only in complex systems, where SRP gives a lot of benefits. But the problem is that SRP dictates dividing a complex thing into several simpler ones, which are not guaranteed to be simple enough and might need further SRP application. When do we stop? We have to decide case by case. If you ask me, I do the following: if I can't see the necessity at the moment, I prefer to postpone applying SRP until I can no longer hold the problem in my head as a whole and start feeling discomfort. If you feel the same, stop and refactor.

Remark

Digression for managers.

When your programmers tell you they need time to refactor the code, it doesn't mean they are stupid, short-sighted or trying to waste time surfing the Internet. If the time has come to refactor the code, it must be refactored. Why not do it right at the beginning? Because we would risk introducing unnecessary complexity without being sure we would gain anything. Also, more complex code takes more time to work with, and nobody guarantees your project would have reached the stage it is at now. The purpose of refactoring is to decrease and transform complexity, to find and remove problematic ideas. The result is better control over the code and its complexity.

Besides decomposition, algorithmic complexity can be controlled with additional abstraction levels. If decomposition is a tool that cuts a problem vertically, by functions, then abstraction arranges the logic into additional layers, hiding the implementation details of the layers below. Thus, by abstracting away details that are immaterial at the new abstraction level, we free our heads for new things. But do not overdo it. As you know, any problem can be solved by introducing an additional abstraction level, except the problem of having too many abstraction levels.

The third and least effective way to fight intellectual complexity is being "in the zone". If you think hard and long enough about a problem, in the end you will build an adequate model of it with all its nuances, which you can then hold as a whole in your grey matter and solve more or less successfully. I'm almost sure any problem can be solved this way. At this point, however, I recall the phrase: "Anybody can become an academician, but while one needs 30 years for that, another needs 300." Besides, this method has another defect: as a rule, after solving a problem, the grey matter tries to get rid of the horror that filled it during solution time, and over time transforms a strict problem model into an area of vague doubts. Approaching the same problem a second time requires almost as much time to get back into the zone, and nobody guarantees the new model will be identical to the previous one.

A note on optimizations. Most optimizations increase quantitative and algorithmic complexity (sometimes by orders of magnitude) and decrease flexibility, converting one simple algorithm into a badly readable mix of two: the initial one and the optimizing one. If the benefits are barely visible, it is better not to optimize, though it's not always obvious. Fortunately, declarative optimization tools are starting to appear which solve the problem while leaving the initial code almost untouched. PLinq with its AsParallel method, or the result-caching attributes for called methods in Nemerle / BLToolkit, can be attributed to such optimizations. I think the number of such tools (and their positive effect) will grow in the future. Nonetheless, it's important to understand that the main problem of optimization is the increase in algorithmic complexity due to mixing different algorithms in one place; our task is to minimize the negative outcomes of such a mix.
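
Since the examples in the text (PLinq's AsParallel, BLToolkit attributes) are .NET-specific, here is a rough Java analogue of the same declarative idea using parallel streams: the algorithm stays almost untouched, and the optimization is expressed by a single word instead of a second algorithm woven into the first.

    import java.util.List;

    public class DeclarativeOptimization {
        // Sequential version: the algorithm reads as a plain pipeline.
        static long countLong(List<String> words) {
            return words.stream()
                        .filter(w -> w.length() > 3)
                        .count();
        }

        // Parallel version: the only change is the word "parallel"; the
        // optimization does not mix a second algorithm into the first one.
        static long countLongInParallel(List<String> words) {
            return words.parallelStream()
                        .filter(w -> w.length() > 3)
                        .count();
        }

        public static void main(String[] args) {
            List<String> words = List.of("law", "complexity", "code", "it");
            System.out.println(countLong(words));            // 2
            System.out.println(countLongInParallel(words));  // 2
        }
    }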

Structural complexity

I think it's pretty logical to divide code into two kinds: the code inside methods and the code outside them. The code inside is algorithms plus the control code. The code outside is the grouping of algorithms into modules, components, classes and other structures. If you have difficulties changing your code, you should understand where the problem is located: inside the methods or outside them. Oddly enough, the problem is usually located in the grouping, and that's where refactoring should start. Algorithms are easy to deal with: they either work as well as expected, not so well, or not at all. With the grouping of algorithms, not everything is so obvious. It's all about strong coupling, weak coupling, and a number of corresponding patterns and practices. Unfortunately, the terms of reference and functional requirements do not define the rules for structuring code (except when structuring is part of the task). As a result, grouping exists to solve the programmer's own problems, i.e., it becomes a code maintenance concern. Code maintenance problems are code flexibility problems, the difficulty of changing the code. We group code into modules and classes to make it easier to manage, modify and extend. In complex systems, competent grouping and organization of code are key to achieving acceptable flexibility. The more complex the system, the more attention should be paid to code grouping and structuring.

Structural complexity is tightly coupled with algorithmic complexity and is often the means both to decrease and to increase it. The SRP example illustrates this clearly. In the examples above it's clear when to apply SRP and when not to, but in reality it's not all that simple. The line between "when to apply" and "when not to" is often so unsteady that there's only room for intuition and preferences. And, as we know, logic ends where preferences begin. Luckily, the rule works the opposite way as well. Sometimes neither logic nor intuition works, and the most acknowledged patterns and practices lead to an increase in code complexity. In programming this is called over-architecture, and it happens nearly as often as a total absence of architecture. Actually, when you're in a difficult situation and don't know what to decide, it's not that hard to evaluate the usefulness of a tool. We already know it brings additional complexity into the code; the only thing left is to ask yourself, or the one who insists on using it, what problems it solves and what we gain. Often that's the only question you need to ask to make the right choice.

Learning complexity

This might be the most non-obvious type of complexity, as it relates to code only indirectly, through the person who develops it. In other words, the same code can be easy for one person and difficult for another. Knowledge of programming paradigms and the ability to use modern tools allow us to decrease all other complexities by raising the entry threshold and moving parts of the complexity into the tools we use. This, in turn, lets us significantly simplify the code and solve more complex problems. It also works the other way around: we can solve a problem with primitive and outdated tools, but the solution is likely to be harder to perceive, take more space, and be less flexible.

But is it all that simple? Does a high level of knowledge always lead to benefits? If knowledge is applied not for its own sake but to solve a problem, the result is, as a rule, beneficial. The only ones who lose are those team members for whom the necessary entry threshold is still out of reach.

Some managers often try to minimize risks by keeping the team's knowledge level quite low: that way an unskilled person is easy to replace with another unskilled person. It's not hard to conclude that, according to the complexity conservation law, all the complexity of applications developed by such teams ends up in their code. As a rule, there is a lot of such code; it is inflexible, hardly readable, and of great algorithmic complexity, frequently far above the level accessible to the team members. The result is also quite predictable: delayed releases, budget overruns, failed projects. How do we fight this? We must train people, transferring complexity from the code into their heads and into the tools they use, balancing managerial risks against complexity control. It is vital to remember that a primitive brain won't solve a complex problem.

Problem understanding complexity

Obviously, before you can solve a problem, you have to understand it. However, while the relationship between understanding a problem and solving it is quite evident, the complexity of understanding and the complexity of solving are not related at all. Often a problem's definition is simple but its solution is complex, or vice versa. Knowledge of the problem domain, previous experience in solving similar problems, and good terms of reference significantly decrease problem understanding complexity. On the other hand, outdated documentation can increase it.

The simplest way to fight understanding complexity in big projects is to have a specially trained person (an architect or analyst) who delves into all the details of the problem and translates them into a language the developers understand. The developers, in turn, can concentrate on technical details without needing to know everything about the system being developed. Such an approach is pretty reasonable, but the business becomes vulnerable when important knowledge is bound to only one or two people.

Another widespread way to fight understanding complexity is to postpone detailed understanding of the problem until better times. That is, we start solving the problem here and now with only a general understanding of it; detailed understanding of what we are actually solving comes later, almost by the time we finish. XP and Agile take this approach: they let the developer control the chaos in the problem statement. This approach is characterized by higher requirements for code flexibility; in fact, flexibility is the fundamental requirement in this case.

Note that this type of complexity is tightly coupled with learning complexity. In the end, probably the only difference between them is that problem understanding complexity relates to the subject area, while learning complexity relates to the tools used.

Instead of a conclusion

It would be a lot easier if we could find a way to measure complexity quantitatively, but, I'm afraid, it's impossible. I intentionally avoided any measurement hints (apart from abstract units) because I think it's impossible to quantitatively compare the perception complexity of awfully formatted code, the algorithmic complexity of code, its size, the learning complexity needed to understand it, and the complexity of understanding the problem. One could probably find a suitable measure for each type of complexity separately, and even express them all as a function of time, which would allow summing the different types. However, each complexity on its own is not of great interest, because it doesn't show the big picture. So far we can only operate in terms of "less, more, a lot, a little", but that is often enough. In the end, our purpose is not to measure units but to keep the code under control. The complexity conservation law helps us understand that any tool may have either a positive or a negative effect. In practice, you should be critical of absolutely any pattern or concept, even of such well-known things as decomposition, grouping or abstraction, because in 100% of cases they add complexity to the code, and you have to sweat a lot to make them decrease it. In fact, you would be better off looking at the tools you use through the prism of complexity, and not the other way around. If you adopt as your rule the idea that complexity control in software development is primary and everything else is secondary, most of the practices we use become much clearer.

To conclude, it's worth mentioning that complexity accumulates. One time we don't format the code here, another time we don't follow the guidelines there, we overuse SRP or other patterns a bit, or underuse them. First the code grows, then it becomes less readable and flexible, then additional algorithmic or structural complexity is introduced without reason, and as a result we start to lose control over the code. If you don't want this to happen, you have to fight excessive complexity always, at every development stage: seek it out, remove it, and strive for the simplest solution. Unfortunately, we can't always do that, for one simple reason. As you all know, the most complex thing in the world is simplicity.

I guess that's it. Comments, additions, refutations, and debunkings are welcome.
