Software Entropy

Defining Entropy

Entropy is a measure of chaos, or disorder, in a system.

My college physics professor described entropy using two shoe closets.

Imagine a clean shoe closet, where all shoes are paired and sorted by color. The closet’s entropy is the total number of arrangements its shoes can have. A clean closet’s entropy is relatively small. There may be a few pairs of grey or blue shoes that can be switched around – but this doesn’t add much complexity. In a closet with low entropy, it’s easy to add or remove shoes from that closet as needed.

Now imagine a messy shoe closet. None of the shoes are paired, and they’re all tangled in a big pile. How many possible combinations can these shoes be in? You can quickly find out by trying to pull out the pair you want. The messy shoe closet has a much greater entropy than the clean one.

In short, we measure entropy by counting the number of possible states a system can be in. More states mean more entropy.

Entropy in Software

In software, our building blocks are simple enough for us to measure entropy in a crude way. Take this model for example:

Transaction(
  createdAt: String
  buyerId: String,
  sellerId: String
  amount: Int
)

As simple as it seems, this model is like our messy shoe closet. There are many more ways for this model to be wrong than there are for it to be right. We can see that by comparing it to an organized shoe closet:

Transaction(
  createdAt: DateTime,
  buyerId: UserId,
  sellerId: UserId,
  amount: Price
)

When createdAt was an arbitrary string, it could take on invalid values “foo” and “bar” just as easily as a valid value “06-23-2020”. There are many more possible states that the field can be in, and most of them are invalid. This choice of a broad data type allows chaos into our model. This unwanted chaos leads to misunderstandings, bugs, and wasted energy.

When each model is strongly typed to a strict set of values, this chaos is minimized. DateTime, UserId, and Price are typed such that all possible values are valid. Accordingly, these types are more predictable, easier to manipulate, and lead to less surprises in practice.

As in life, entropy is not all bad – some of it is desirable and some of it is not. In software, we need entropy to a certain extent: our code is valuable because it supports a variety of possible dates, users, and prices. But when this chaos grows beyond the value it adds, our software becomes painful to use and painful to maintain.

Modeling Software Entropy

Given our observations, we can describe a simple rule:

complexity = number of total possible states

A construct with only a few possible states is simple. Booleans and enums are much simpler than strings. A system with one moving piece is much simpler than a system with many moving pieces.

Sometimes, our problems are essentially complex. In these cases, our solutions need some essential complexity to match. But when does essential complexity become unnecessary? In these cases, we can use another rule:

cleanliness = number of valid possible states / number of total possible states

If there are thousands of total possible states, but only two of them are valid: it’s a messy solution. A simple example of this is representing a boolean value as a string.

if value == "true": do this
else if value == "false": do that
else: throw error

There are many ways for this code to go wrong; not just in execution but also in interpretation. Keeping our solutions clean improves correctness, readability, and maintainability. It’s one of the primary measures of “quality” in my view.

Minimizing Software Entropy

Given these definitions, we can ask ourselves some questions to guide our software decisions:

How many possible states does this solution have?
How many of those states are invalid?
Is there any way to make the solution simpler, by trimming the number of total possible states?
Is there any way to make the solution cleaner, by trimming the number of invalid possible states?

The power of this concept is that it smoothly scales up and down the ladder of abstraction. It applies to basic data types just as well as it does to solution architecture and product development.

How many moving pieces does our solution need? When an unimaginable requirement flies in and tries to blow our solution to the ground, how many pieces can be left standing? When an unexpected input arrives, do invalid states propagate across the system, or are they contained and eliminated on sight? In short, how clean is our solution?

To make life possible, we utilize chaos by creating complex systems that support a diversity of people and their use cases. To make life predictable, we combat undesirable chaos by keeping those systems as clean and orderly as possible.

In software, we work in a world where chaos is measurable and cleanliness is achievable. We just need the right set of signals and responses to make it happen.