Complexity Conservation Law
Translation by Michael Kapelko (kornerr AT GoogleMail)
20120428
Original article by Igor Tkachev in Russian:
http://rsdn.ru/article/philosophy/Complexity.xml
Foreword
It is a simple task to make things complex, but a complex task to make them simple.
-- Meyer's Law
There are many practices, concepts, patterns and other scary words we use every day
in our professional lives, but often blindly, without even wondering why we do it. Why
are those things necessary, are they good or bad, and when are they good or bad? Why do we need all these
concepts? The answer is obvious. In the end, they all exist to help us fight software
development complexity. Now it's time to ask ourselves: what is complexity, and
how can knowing what it is help us better understand and use the concepts that
exist to fight it?
It's a bit hard to answer that question. Complexity is a versatile concept,
contradictory at times. In short, complexity can be defined as a measure of the effort
necessary to solve a problem. When working with code, we usually deal with the following
types of complexity:
- quantitative complexity (many characters);
- code perception complexity;
- code alteration (flexibility) complexity;
- algorithmic or intellectual complexity - roughly speaking, how smart the person solving the problem has to be;
- structural complexity - the complexity of the application architecture;
- learning complexity, or entry threshold - the minimal level of skill and knowledge needed to understand a particular solution to a problem;
- problem understanding complexity.
What makes it even harder is that all these types of complexity are interdependent
and have no strict boundaries. Complexity itself can be objective, or subjective -
introduced just to make someone's life harder. Fortunately, we're not interested in subjective complexity;
our target is objective complexity. How does one fight objective complexity? In my opinion, the
main problem is that there's no way to completely remove objective complexity. We can
decrease, increase, divide and multiply complexity; we can create it out of nothing,
especially to complicate someone's life, but it cannot be removed completely. We can
transform complexity into other kinds, optimize it, rearrange it, and, in the end, get it
under control. Virtually all software development concepts strive towards that
purpose - to transform complexity, to optimize the effort needed to solve a problem. By transforming
complexity, we get an opportunity to handle it better and, as a consequence, to better control
the code we develop. This is, in essence, our main purpose - we must control the code,
not let the code control us.
Complexity transformation boils down to one simple thing - when we remove complexity from one
place, it is always added somewhere else. If this doesn't seem to happen, most likely we just can't see
or understand it yet. Complexity never disappears without a trace - it transforms itself
into other kinds of complexity. Let's call this the complexity conservation law.
Remark
To prevent misunderstanding, the author would like to reassure those who take the title
literally. The author acknowledges that he has not discovered a new law of nature. Moreover,
he understands that, strictly scientifically speaking, no such law exists. The choice
of title is due to the author's fondness for pompous and
provocative titles. That's why there's no need to look for formal definitions and
proofs in the article. The article's purpose is to
classify the types of complexity and to better explain their nature, their unforeseen consequences,
and their influence on code development.
The complexity conservation law leads to a simple conclusion - there's no ideal way to fight
complexity. It's impossible to apply a concept or pattern and only decrease complexity.
We always sacrifice something. Knowing that, we can weigh fairly accurately
what we spend against what we gain. Based on that, we can decide whether it's reasonable
to apply this or that solution in each particular case.
Let's have a look at each type of complexity.
Quantitative complexity
This is the case where solving a problem requires writing a lot of code.
If most of this code is of the same kind, we can fight such complexity with the
concept of code reuse. Sometimes it helps a lot, sometimes only a tiny bit. For the
user, code reuse decreases quantitative complexity, perception complexity,
alteration complexity and algorithmic complexity, but increases learning
complexity. For the writer, it increases practically all types of complexity.
When is it not helpful? It doesn't help much when we try to make code reusable
in advance: the code can be reused, but in practice nobody ever needs to. We
have added complexity to the problem solution, but gained nothing. It also doesn't
help much when the author lacks the skill and/or the brains. It's no secret
that developing libraries and reusable components requires a certain level of skill.
If one does it without thinking, such code won't be used by anybody - at best.
At worst, it will complicate both perception and flexibility everywhere it's used.
On the other hand, high-quality sets of reusable components (libraries) can significantly
decrease complexity. Take, for example, the System.String class. It is easy to use and
self-sufficient. All the complexities of string handling have been shifted from our hands into
those of Microsoft developers and thousands of servers testing the class 24/7 in all
possible versions and configurations. For us it's a decrease in complexity, often for
free, though not necessarily 100% free.
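To make this concrete, here is a tiny C# sketch (not from the original article; the input string
is made up) of how everyday string handling leans on System.String: splitting, trimming and
joining are done by the framework, and their complexity stays there.

    using System;

    class StringReuseExample
    {
        static void Main()
        {
            // Splitting, trimming and joining are all handled by System.String;
            // the complexity of that work lives inside the framework, not here.
            string input = "  red, green , blue ";
            string[] parts = input.Split(',');
            for (int i = 0; i < parts.Length; i++)
                parts[i] = parts[i].Trim();
            Console.WriteLine(string.Join(" | ", parts)); // prints: red | green | blue
        }
    }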
This applies to almost all the tools we use to solve problems. Modern operating systems,
frameworks, libraries, development environments, programming languages - all of them are
developed to shift as much complexity as possible from us into
the tools themselves. Sometimes we pay for it with money, and we always pay with increased learning
complexity, but the quality of the tools makes the cost worth it.
Perception complexity
This type of complexity is mostly about code formatting, not its contents. It's
pretty easy. Open up any chunk of code and note the time you need to understand
how it operates. Perception complexity depends on many factors: code style,
adherence to naming conventions, clarity (or confusion) of the algorithm, the expressiveness of the
tools used, the development environment, including code highlighting and navigation, and so on. Also,
perception complexity is directly related to algorithmic complexity and
entry threshold.
To avoid confusion with other types of complexity, imagine
that you are left without your favourite Visual Studio, IDEA or any other Eclipse,
and you have to edit, in Notepad, someone else's badly formatted code that follows
no naming convention at all. Also, with no monospace fonts available and half of the tabs
replaced with spaces. Did you imagine it? It's horrible!
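To feel the difference, compare two versions of the same trivial method - a contrived C# sketch
with made-up names, one written carelessly and one formatted and named consistently.

    class PerceptionExample
    {
        // Hard to perceive: no formatting, meaningless names, everything on one line.
        static int f(int[]a){int r=0;for(int i=0;i<a.Length;i++){if(a[i]>0)r+=a[i];}return r;}

        // The same algorithm, easy to perceive: consistent formatting and naming.
        static int SumOfPositives(int[] values)
        {
            int sum = 0;
            foreach (int value in values)
            {
                if (value > 0)
                    sum += value;
            }
            return sum;
        }
    }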
Alteration complexity
Simplicity of code perception does not mean simplicity of code alteration. Of course,
simple code is easy to change, but the key property behind code alteration complexity is
flexibility. Simple code is not always flexible, and flexible code is not always
simple. This, again, shows that complexity cannot be eliminated, only
transformed. Many practices and patterns exist to increase code flexibility;
consequently, they usually increase both perception and learning complexity. Let's
take any pattern of that kind, say, IoC or Visitor. Some IoC implementations introduce
additional entities into the code, make the algorithm less obvious and increase perception
complexity - in particular, the complexity of navigating through it. Visitor splits a single
algorithm for processing a hierarchical structure into many pieces, which, in turn, significantly
increases perception complexity. This always happens. Something removed here means something
else added elsewhere. Is it worth striving for total code flexibility, then?
It has to be decided on a case-by-case basis. Sometimes we gain significantly and strengthen
our control over the code, and other times we only introduce additional complexity without
gaining anything and lose control over the code completely. It's important to remember that,
according to the law deduced above, any pattern is at the same time an anti-pattern, so
you should consider both the positive and the negative effects of applying it. In
some cases the most beautiful pattern can cause more harm than good.
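As an illustration of the trade-off, here is a minimal Visitor sketch in C# (the node types and
the length-counting operation are hypothetical, not taken from the article). New operations can
be added without touching the hierarchy, but the single "walk the tree and count" algorithm is
now scattered across Accept methods and a visitor class - exactly the perception cost described above.

    using System;

    interface IVisitor
    {
        void Visit(TextNode node);
        void Visit(GroupNode node);
    }

    abstract class Node
    {
        public abstract void Accept(IVisitor visitor);
    }

    class TextNode : Node
    {
        public string Text = "";
        public override void Accept(IVisitor visitor) => visitor.Visit(this);
    }

    class GroupNode : Node
    {
        public Node[] Children = Array.Empty<Node>();
        public override void Accept(IVisitor visitor) => visitor.Visit(this);
    }

    // The "count total text length" algorithm, split across the visitor
    // and the Accept methods above.
    class LengthVisitor : IVisitor
    {
        public int Total;
        public void Visit(TextNode node) => Total += node.Text.Length;
        public void Visit(GroupNode node)
        {
            foreach (Node child in node.Children)
                child.Accept(this);
        }
    }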
Algorithmic complexity
In short, algorithmic or intellectual complexity is the minimal level of
intellect needed to solve a problem - in particular, the ability to hold the problem in
one's head as a whole. This complexity is frequently confused with learning complexity or with
knowledge level, but it's different. Programming itself already presumes a certain amount of brains,
but it's no secret that different people use different levels of brain activity to solve the
same problem. That isn't the most important thing, though. In fact, there aren't many
algorithms in life that a mere mortal can't grasp. What matters most is that algorithmic
complexity is easily increased by orders of magnitude through small code manipulations, usually by
mixing the implementations of different algorithms in one place. Suppose we have an input stream
of data, an output stream of data, and a data processing algorithm. Let's see how much this
may cost.
Suppose we parse XML, process it a bit, and produce HTML. The algorithm for parsing the input
stream is pretty simple and has an intellectual complexity of, say, 3 units, the data
processing - 4 units, the HTML output - 2 units. Suppose that mixing them results in 3 * 4 * 2 = 24
units of complexity. Suppose our intellectual limit equals 100 units. Thus, we have
easily solved the problem with only a little brain activity. Now suppose we're
writing a compiler, and in some place we mix text parsing, executable code generation,
and some optimizations. These algorithms are obviously more complex than those of the
XML example. Let's value each of them at 10 units; this results in 10 * 10 * 10 = 1000
units of complexity, i.e., it exceeds our intellectual limit by a factor of 10.
In real life, of course, the formulas and the units are different, but you get
the idea. The more complex the algorithms and the more of them are mixed in one place,
the faster the problem's algorithmic complexity grows and the faster we lose control over it.
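For illustration, here is a contrived C# sketch of the "mixed" style (the line-based input
format and the names are made up, not the article's XML example): parsing, processing and HTML
output are interleaved in one loop, so a reader has to hold all three algorithms in mind at once.

    using System.Text;

    static class MixedPipeline
    {
        public static string ItemsToHtml(string input)
        {
            var html = new StringBuilder("<ul>");
            foreach (string line in input.Split('\n'))                  // parsing...
            {
                string name = line.Trim();                              // ...more parsing
                if (name.Length == 0)
                    continue;
                string processed = name.ToUpperInvariant();             // processing
                html.Append("<li>").Append(processed).Append("</li>");  // HTML output
            }
            return html.Append("</ul>").ToString();
        }
    }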
Let's take a look at the Single Responsibility Principle (SRP).
Remark
I intentionally pick examples where good, correct practices don't look their best. This is not
a way to diminish them; it's a way to understand their essence better by looking closer
at their less obvious properties.
And now, SRP. SRP is a principle that allows us to control algorithmic complexity by means
of decomposition. The principle dictates dividing an algorithm into units, each of which
does its own job. Thus, the complexity of a problem is no longer a product but a sum,
and, most importantly, the algorithmic complexity of each unit stays at its original level.
This way we don't have to handle one algorithm with a complexity of 1000 units, but 3
algorithms with a complexity of 10 units each. So what's bad about this? Let's recall the
complexity conservation law.
SRP not only decreases the algorithmic complexity of the code; it also increases
perception complexity, introduces quantitative complexity and, in a way, decreases
flexibility. For the compiler example, such a trade is well worth it, because the introduced
complexity is hard to see even through a microscope compared to the rest of the compiler code.
But what happens if we apply the principle to the XML-to-HTML example? First of all,
we would have to introduce an intermediate data structure (like the AST for our compiler)
and place the 3 algorithms into different methods or even classes. To combine all this, we
would also have to add some sort of coordinating code. Our initial algorithm took about a hundred
simple lines; now we can end up with 300, 400 or even 500 lines spread over different modules.
Will the gain be adequate to the introduced complexity? The answer is obvious.
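Continuing the contrived sketch from above (same made-up pipeline, now with SRP applied): each
step gets its own method, an intermediate list plays the role of the compiler's AST, and a small
coordinating method glues everything together. Each piece is simpler on its own, but there is
noticeably more code and more structure to navigate.

    using System.Collections.Generic;
    using System.Linq;
    using System.Text;

    static class SeparatedPipeline
    {
        // Parsing only: raw text to a list of item names.
        static List<string> Parse(string input) =>
            input.Split('\n').Select(l => l.Trim()).Where(l => l.Length > 0).ToList();

        // Processing only: transform the intermediate representation.
        static List<string> Process(List<string> items) =>
            items.Select(i => i.ToUpperInvariant()).ToList();

        // Output only: render the intermediate representation as HTML.
        static string Render(List<string> items)
        {
            var html = new StringBuilder("<ul>");
            foreach (string item in items)
                html.Append("<li>").Append(item).Append("</li>");
            return html.Append("</ul>").ToString();
        }

        // Coordinating code that ties the three steps together.
        public static string ItemsToHtml(string input) => Render(Process(Parse(input)));
    }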
One may think that such principles should only be used in complex systems, where SRP brings
a lot of benefit. But the problem is that SRP dictates dividing a complex thing into
several simpler ones, which are not guaranteed to be simple enough and might need further
application of SRP. When do we stop? We have to decide on a case-by-case basis. If you ask me,
I do the following: if I can't see the need at the moment, I prefer to postpone
applying SRP until I can no longer hold the problem in my head as a whole and start to feel
discomfort. If you feel the same, stop and refactor.
Remark
Digression for managers.
When your programmers tell you they need time to refactor the code, it doesn't mean they
are stupid, short-sighted or trying to waste time surfing the Internet. If the time has come to
refactor the code, it must be refactored. Why not do it right from the beginning? Because
we might introduce unnecessary complexity without being sure that we will gain anything. Besides,
more complex code takes more time to work with, and nobody guarantees your project would be at
the stage it is at now. The purpose of refactoring is to decrease and transform complexity,
to find and remove problematic decisions. The result is better control over the code
and its complexity.
Besides decomposition, algorithmic complexity can be controlled with the help of
additional abstraction levels. If decomposition is a tool that lets us
cut a problem vertically, by function, then abstraction slices the logic into additional horizontal
layers, each hiding the implementation details of the layers below. Thus, by
abstracting away the details immaterial at the new abstraction level, we free our heads for
new things. But do not overdo this. As you know, any problem can be solved by introducing an
additional abstraction level - except the problem of having too many abstraction levels.
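A minimal sketch of such a layer in C# (the interface and class names are hypothetical): callers
work with documents only through the interface, so the details of files, databases or network
storage stay below the abstraction level and out of their heads.

    interface IDocumentStore
    {
        void Save(string id, string content);
        string Load(string id);
    }

    // One implementation of the layer below; it could be replaced with a
    // database-backed one without the callers ever noticing.
    class FileDocumentStore : IDocumentStore
    {
        public void Save(string id, string content) =>
            System.IO.File.WriteAllText(id + ".txt", content);

        public string Load(string id) =>
            System.IO.File.ReadAllText(id + ".txt");
    }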
The third and least effective way to fight intellectual complexity is being
"in the zone". If you think long and hard enough about a problem,
you will eventually build an adequate model of it, with all its nuances, that you can hold as a
whole in your grey matter and solve more or less successfully. I'm almost sure any
problem can be solved this way. At this point, however, I recall the phrase:
"Anybody can become an academician, but while one needs 30 years for that, another needs 300".
Besides, this method has another defect: as a rule, after a problem is solved,
the grey matter tries to get rid of everything that filled it during the solution, and
over time turns a strict model of the problem into an area of vague doubts. Approaching the problem
a second time requires almost as much time to get back into the zone, and nobody
guarantees the new model will be identical to the previous one.
A note on optimizations. Most optimizations increase quantitative and algorithmic
complexity (sometimes by orders of magnitude) and decrease flexibility, turning one
simple algorithm into a barely readable mix of two algorithms - the original one and
the optimizing one. If the benefit is barely visible, it is better not to do it, though that's
not always obvious. Fortunately, declarative ways of optimizing are starting to appear
that solve the problem while leaving the original code almost untouched. PLinq with its AsParallel
method, or the result-caching attributes for method calls in Nemerle / BLToolkit,
can be counted among such optimizations. I think the number of such tools (and their positive
effect) will only grow in the future. Nonetheless, it's important to understand right now that the main
problem of optimization is the increase in algorithmic complexity caused by mixing different
algorithms in one place. Our task is to minimize the negative consequences of such a mix.
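As an illustration of such a declarative optimization, a PLinq query stays almost untouched -
the only change to the original code is the AsParallel() call (the summing task below is a
made-up example):

    using System;
    using System.Linq;

    class ParallelExample
    {
        static void Main()
        {
            int[] numbers = Enumerable.Range(1, 1000000).ToArray();

            // The original, sequential query.
            long sequential = numbers.Select(n => (long)n * n).Sum();

            // The declarative optimization: the query itself is unchanged,
            // AsParallel() alone spreads the work across the available cores.
            long parallel = numbers.AsParallel().Select(n => (long)n * n).Sum();

            Console.WriteLine(sequential == parallel); // True
        }
    }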
Structural complexity
I think it's pretty logical to divide code into two kinds: code inside
methods and code outside them. The code inside is algorithms plus control code.
The code outside is the grouping of algorithms into modules, components, classes and other
structures. If you have difficulties changing your code, you should understand
where the problem is located - inside the methods or outside them. Oddly enough,
the problem is usually located in the grouping, and that is where refactoring should be
started. Algorithms are easy to deal with: they either work as well as expected,
not quite as well, or not at all. With the grouping of algorithms, not everything is so obvious.
This is the realm of tight coupling, loose coupling, and a number of corresponding
patterns and practices. Unfortunately, the terms of reference and functional
requirements do not define rules for structuring the code (except when
structuring is part of the task). As a result, grouping becomes a means of solving
the programmer's own problems, i.e., code maintenance problems. Maintenance
problems are flexibility problems - the difficulty of changing the code. We group code
into modules and classes for easier management, modification and extension.
In complex systems, competent grouping and organization of the code are key to achieving
acceptable flexibility. The more complex the system, the more attention should be
paid to grouping and structuring the code.
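A tiny C# sketch of the coupling question (all class names here are made up): the first job is
hard-wired to a concrete mailer, while the second depends only on an interface, which keeps this
part of the structure flexible and easier to modify.

    interface IMailer
    {
        void Send(string to, string body);
    }

    class SmtpMailer : IMailer
    {
        public void Send(string to, string body) { /* talk to an SMTP server */ }
    }

    // Tight coupling: this job can never be separated from SmtpMailer.
    class TightReportJob
    {
        private readonly SmtpMailer mailer = new SmtpMailer();

        public void Run() => mailer.Send("boss@example.com", "report");
    }

    // Loose coupling: any IMailer can be plugged in, including a test double.
    class LooseReportJob
    {
        private readonly IMailer mailer;

        public LooseReportJob(IMailer mailer) { this.mailer = mailer; }

        public void Run() => mailer.Send("boss@example.com", "report");
    }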
Structural complexity is tightly coupled with algorithmic complexity, and is often the
means both to decrease and to increase it. The SRP example illustrates this clearly. In the
examples above, it's clear when to apply SRP and when not to. But in reality it's not all that
simple. The line between "when to apply" and "when not to" is often so
blurry that there's only room for intuition and preferences. But, as we know,
logic ends where preferences begin. Luckily, the rule works the other way around as
well. Sometimes neither logic nor intuition works, and the most acknowledged
patterns and practices lead to an increase in code complexity. In programming this is called
over-architecture, and it happens nearly as often as the total absence
of architecture. Actually, when you're in a difficult situation and don't know
what to decide, it's not that hard to assess a tool's usefulness. We already
know it brings additional complexity into the code. The only thing left is to ask
yourself, or whoever insists on using it, what problems it solves and what
benefits we gain. Often that's the only question you need to ask to make
the right choice.
Learning complexity
This might be the least obvious type of complexity, as it relates to code only
indirectly, through the person who develops it. In other words, the same code can
be easy for one person and difficult for another. Knowledge of programming paradigms
and the ability to use modern tools allow us to decrease all the other complexities by
raising the entry threshold and moving parts of the complexity into the tools used.
This, in turn, allows us to significantly simplify the code and to solve more
complex problems. The reverse is also true: we can solve a problem with primitive
and outdated tools, but the solution may be harder to perceive,
take more space, and be less flexible.
But is it really that simple - does a high level of knowledge always pay off?
If you apply knowledge to solve a problem rather than for its own sake, as a rule,
the result is beneficial. The only ones who lose are those team members for whom the necessary
entry threshold is still out of reach.
Managers often try to minimize risks by keeping the knowledge level
of the team rather low. This makes it easier to replace one unskilled person with another
unskilled person. It's not hard to conclude that, according to the complexity
conservation law, all the complexity of applications developed by such teams
ends up in their code. As a rule, there is a lot of such code; it is inflexible,
barely readable, and of an algorithmic complexity frequently far
above the level accessible to the team members. The result is also quite predictable -
delayed releases, budget overruns, failed projects. How do we fight this? We must
train people, transferring complexity from the code into their heads and into the tools
they use, striking a balance between management risks and complexity control. It is vital
to always remember that a primitive brain won't solve a complex problem.
Problem understanding complexity
Obviously, before you can solve a problem, you have to understand it. Moreover,
while the relationship between understanding a problem and solving it is quite
evident, the complexities of understanding and of solving are not related at all. Often
a problem's definition is simple but its solution is complex, or vice versa.
Knowledge of the problem domain, previous experience in solving
similar problems and good terms of reference significantly decrease problem
understanding complexity. Outdated documentation, on the other hand, can increase it.
The simplest way to fight understanding complexity in big projects is to have a
specially trained person (an architect or an analyst) who delves into all the
details of a problem and translates those details into a language the
developers can understand. The developers, in turn, can concentrate on technical
details without needing to know everything about the system being developed. Such
an approach is quite reasonable, but the business becomes vulnerable if important
knowledge is tied to only one or two people.
Another widespread way to fight understanding complexity is to postpone
understanding the problem until better times. That is, we start to
solve the problem here and now, having only a general understanding of it.
A detailed understanding of what kind of problem we are solving comes later, almost by
the time we finish it. XP and Agile take this approach. They allow the
developer to control the chaos in the problem statement. This approach is characterized
by higher requirements on code flexibility. In fact, flexibility is its fundamental
requirement.
Note that this type of complexity is tightly coupled with learning complexity. In
the end, probably the only difference between them is that problem understanding
complexity relates to the subject area, while learning complexity relates to the
tools used.
Instead of a conclusion
It would be a lot easier if we could find a way to measure complexity
quantitatively, but I'm afraid that's impossible. I intentionally avoided any
hints at measurement (except for abstract units) because I think it's impossible to
quantitatively compare the perception complexity of awfully formatted code,
the algorithmic complexity of code, its size, the learning complexity needed to understand
it, and the problem understanding complexity. One can probably find a suitable measure
for each type of complexity separately, and even express them as a function of time,
which would allow summing the different types of complexity. However, each
of the complexities alone is not of great interest, because it doesn't let you
see the big picture. So far we can only operate in terms of "less, more, a lot, a little",
but that is often enough. In the end, our purpose is not to measure units
but to keep the code under control. The complexity conservation law helps us
understand that any tool may be used with either positive or negative effect. In
practice, you should be critical of absolutely any pattern or concept - even of such
well-known things as decomposition, grouping or abstraction, because in 100% of
cases they increase code complexity first; you have to sweat a lot to decrease
it afterwards. In fact, you would be better off looking at the tools you use through the prism of
complexity, and not the other way around. If you adopt as a rule the idea that
complexity control is primary in software development and everything else is secondary,
most of the practices we use become clearer and more transparent.
To conclude, it is worth mentioning that complexity accumulates. One
time we didn't format the code here, another time we didn't follow the guidelines
there, overused SRP or other patterns a bit, or underused them. First the code grows,
then it becomes less readable and flexible, then additional algorithmic or
structural complexity is introduced without reason, and as a result we start to lose
control over the code. If you don't want this to happen, you have to fight excessive
complexity always, at every development stage; seek it out, remove it, and strive
for the simplest solution. Unfortunately, we can't always do that, for one
simple reason. As you all know, the most complex thing in the world is simplicity.
I guess that's it. Comments, additions, refutations and exposés are welcome.