Entropy for measuring Software Maturity
We’ve just released the first version of our commit entropy analysis tool to measure software maturity. This is the first in a series of blog posts that goes into detail about entropy in software development.
In a software development project, change is one of the few constants. Requirements can change, as can the technical considerations and environmental circumstances. Our job as software project managers and engineers is largely managing this ability to change.
As software projects grow, the ability to change often diminishes. This is in contrast to the rate of change, which generally increases through the first releases until a project enters maintenance mode and, eventually, reaches End-Of-Life. This difference makes software projects unpredictable and has given rise to methodologies like Agile, Scrum and Lean to streamline the rate of change. These methodologies do not, however, help increase the ability of a software project to support this rate of change.
One way to measure a software project’s ability to keep up with the rate of change is to use entropy as a metric.
What is entropy?
Entropy is a term from information theory that is inspired by the concept of entropy in thermodynamics. In thermodynamics, and in general, entropy is a measure of disorder in a system. It’s this disorder that we are also interested in when it comes to software development.
Entropy in the context of information was first defined by Claude Shannon in 1948 in his famous paper: “A Mathematical Theory of Communication”. Shannon defines entropy as the amount of information you need to encode any sort of message. In other words, it measures how much unique information is present in the message. If you have a coin that always turns up ‘heads’, you don’t need anything to record the outcome of a coin toss. With a regular coin, you need one ‘bit’ of information to track whether the coin came up heads or tails. For a six-sided die, you need about 2.6 bits (yes, in entropy, you can have fractions of bits).
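To make those numbers concrete, here is a minimal Python sketch of Shannon’s formula for a discrete distribution. The function name `shannon_entropy` and the variable names are ours, chosen for illustration only.

```python
import math

def shannon_entropy(probabilities):
    """Return the Shannon entropy, in bits, of a discrete distribution."""
    # Outcomes with probability 0 contribute nothing, so they are skipped.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

always_heads = shannon_entropy([1.0])       # a coin that always lands heads
fair_coin = shannon_entropy([0.5, 0.5])     # a regular coin
fair_die = shannon_entropy([1 / 6] * 6)     # a six-sided die

print(always_heads, fair_coin, round(fair_die, 3))
```

The rigged coin comes out at 0 bits, the regular coin at 1 bit, and the die at roughly 2.585 bits, which rounds to the 2.6 mentioned above.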
This concept is often used in cryptography. The ‘entropy’ of a password is how many bits are required to distinguish between all possible combinations of a password. A 4-digit PIN code carries less entropy than 16 alphanumeric characters with special characters mixed in. In cryptography, higher entropy means that a password is harder to brute-force, since there are more possible combinations.
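As a rough illustration of the password comparison, the sketch below assumes every character is picked uniformly and independently. The alphabet size of 94 (printable ASCII without the space) is our assumption for “alphanumeric with special characters”, and `password_entropy_bits` is a hypothetical helper name.

```python
import math

def password_entropy_bits(alphabet_size, length):
    """Bits needed to distinguish all passwords of this length and alphabet.

    Assumes each character is chosen uniformly and independently.
    """
    return length * math.log2(alphabet_size)

pin_bits = password_entropy_bits(10, 4)       # 4-digit PIN code
strong_bits = password_entropy_bits(94, 16)   # 16 chars, assumed 94-symbol alphabet

print(round(pin_bits, 1), round(strong_bits, 1))
```

Under these assumptions the PIN code carries about 13 bits and the 16-character password about 105 bits. Real passwords chosen by people are far less random, so these figures are upper bounds.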
The same concept can be applied to changes made in a software project. If a change only impacts a small part of the system, that change can be recorded with very few bits of information. If changes touch a large part of a system, you need many more bits to encode that change.
Using this logic, we can determine the impact of changes by calculating the entropy that each change carried. In a typical software project, the larger the part of the system you need to modify to implement a feature or change, the harder it is to implement that change. Looking at the entropy of past changes therefore tells us something about our ability to make those changes efficiently.
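One possible way to put a number on this is sketched below. It is an illustration under our own assumptions, not a description of how the tool computes its score: we assume a change is summarised as the number of changed lines per file, and we take the entropy of how those lines are spread across files. The file names and the function name `change_entropy` are hypothetical.

```python
import math

def change_entropy(lines_changed_per_file):
    """Entropy, in bits, of how a change's lines are spread across files.

    Assumption: the input maps a file name to the number of lines changed.
    """
    total = sum(lines_changed_per_file.values())
    if total == 0:
        return 0.0
    return sum(
        -(n / total) * math.log2(n / total)
        for n in lines_changed_per_file.values()
        if n > 0
    )

# A change confined to a single file.
focused = change_entropy({"billing.py": 40})

# The same number of lines, spread evenly over four files.
scattered = change_entropy(
    {"billing.py": 10, "users.py": 10, "api.py": 10, "ui.py": 10}
)

print(focused, scattered)
```

In this sketch the focused change scores 0 bits and the scattered one 2 bits, even though both touch 40 lines, which matches the intuition that a change spread over more of the system carries more entropy.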
Coupling in software
One of the most common goals in software architecture is managing coupling. Coupling is the dependency of one part of the code on another. They are ‘coupled’ together, either explicitly or implicitly.
Explicit coupling happens when one part of the code directly depends on or uses the other. This is unavoidable, but should be carefully managed. A tightly coupled system can become brittle and hard to change. Most design patterns that deal with explicit coupling implement some form or part of the SOLID or DRY principles.
SOLID is an abbreviation of five best practices in object-oriented software development: Single responsibility, Open-closed, Liskov substitution, Interface segregation and Dependency inversion. The impact of these best practices is beyond the scope of this article, but they’re all designed to help create maintainable software architectures.
DRY stands for Don’t Repeat Yourself, and is an often-heard mantra for developers. Not only does repeating yourself create additional work now, it also increases the maintenance burden later on. All repeated sections will probably require the same bug fixes and changes applied to them, if the developer remembers that the duplicate sections exist!
Implicit coupling can occur when there is no direct relationship in the code between two parts, but they are conceptually or otherwise linked together. This is usually harder to detect, since it requires knowledge of how different components interact to see whether changes in one also affect another.
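A small, made-up Python example of implicit coupling: the two functions below never reference each other, yet both rely on the same separator and date format. All names here are hypothetical and only serve the illustration.

```python
from datetime import date, datetime

# Imagine this lives in an exporter module.
def export_record(name, day):
    """Serialise a record as 'name;YYYY-MM-DD'."""
    return f"{name};{day.strftime('%Y-%m-%d')}"

# Imagine this lives in a separate importer module. It does not import
# the exporter, but it silently depends on the same ';' separator and
# the same date format.
def import_record(line):
    """Parse a 'name;YYYY-MM-DD' line back into a (name, date) pair."""
    name, raw_day = line.split(";")
    return name, datetime.strptime(raw_day, "%Y-%m-%d").date()

round_trip = import_record(export_record("invoice-42", date(2024, 1, 31)))
print(round_trip)
```

Changing the date format in `export_record` would break `import_record`, yet nothing in the code ties the two together, so a compiler or a search for references would not reveal the link.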