James Peralta

Why You Need to Know System Characteristics

You need to understand system characteristics to make sound design decisions and satisfy non-functional requirements (NFRs). When you say you want a system to be scalable or reliable, you have to know what that actually means; otherwise you can't design for it. This article covers what scalability, reliability, availability, efficiency, and serviceability mean so you can translate NFRs into concrete architectural choices.

Scalability

Scalability is the capability of a system, process, or network to grow and handle increased demand. Any distributed system that can continuously evolve in order to support a growing amount of work is considered scalable.
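
As a rough illustration of "handling increased demand", the sketch below estimates how many identical servers a given load would need. It assumes capacity grows linearly as servers are added, which real systems only approximate; the function name and the request rates are hypothetical.

```python
import math


def instances_needed(demand_rps: float, per_instance_rps: float) -> int:
    """Estimate how many identical instances are needed to serve a load.

    Assumption (not from the article): capacity scales linearly with the
    number of instances, so N instances handle N * per_instance_rps.
    """
    if per_instance_rps <= 0:
        raise ValueError("per_instance_rps must be positive")
    if demand_rps < 0:
        raise ValueError("demand_rps must not be negative")
    # Always keep at least one instance running, even with no demand.
    return max(1, math.ceil(demand_rps / per_instance_rps))


# Demand grows over time; a scalable system can keep adding capacity.
for demand in (1_000, 10_000, 100_000):
    print(demand, "req/s ->", instances_needed(demand, 300), "instances")
```

If adding instances stops increasing capacity (for example, because every request hits one shared database), the linear assumption breaks, and that is the point at which the system stops being scalable.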

Reliability

Reliability refers to the ability of a system to continue operating correctly and effectively in the presence of faults, errors, or failures. In simple terms: a distributed system is considered reliable if it keeps delivering its services even when one or several of its software or hardware components fail.

If you want a system to be reliable, remove single points of failure. Think about how components can restart after a crash, how replicas can take over for a failed component, and how state can be saved to durable storage so it survives a failure.

Example: If a user has added an item to their shopping cart, the system is expected not to lose it.
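
The shopping cart example can be sketched with replicas. This is a minimal in-memory model, not a production design: `Replica` and `ReplicatedCartStore` are hypothetical names, and a failure is simulated by flipping an `up` flag.

```python
class Replica:
    """One copy of the cart data; `up` simulates whether the node is alive."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self.carts: dict[str, list[str]] = {}

    def add_item(self, user: str, item: str) -> None:
        if not self.up:
            raise ConnectionError(f"replica {self.name} is down")
        self.carts.setdefault(user, []).append(item)

    def get_cart(self, user: str) -> list[str]:
        if not self.up:
            raise ConnectionError(f"replica {self.name} is down")
        return list(self.carts.get(user, []))


class ReplicatedCartStore:
    """Writes to every reachable replica and reads from the first one that answers."""

    def __init__(self, replicas: list[Replica]) -> None:
        self.replicas = replicas

    def add_item(self, user: str, item: str) -> None:
        acks = 0
        for replica in self.replicas:
            try:
                replica.add_item(user, item)
                acks += 1
            except ConnectionError:
                continue  # skip failed replicas; the others still hold the data
        if acks == 0:
            raise RuntimeError("no replica accepted the write")

    def get_cart(self, user: str) -> list[str]:
        for replica in self.replicas:
            try:
                return replica.get_cart(user)
            except ConnectionError:
                continue
        raise RuntimeError("no replica available")


a, b = Replica("a"), Replica("b")
store = ReplicatedCartStore([a, b])
store.add_item("alice", "book")
a.up = False                      # one component fails
print(store.get_cart("alice"))    # prints ['book']
```

With a single replica, the same failure would lose access to the cart. Note that this sketch does not resynchronize a replica that missed writes while it was down; real systems need some repair or quorum mechanism for that.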

Availability

Availability is the percentage of time that a system, service, or machine remains operational and able to perform its required function over a specific period, under normal conditions. You'll hear terms like five nines (99.999%), six nines, or seven nines; these are availability targets.
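
To make the "nines" concrete, the sketch below converts an availability percentage into the downtime it allows. The function names are hypothetical, and the year is taken as 365 days for simplicity.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: ignore leap years


def allowed_downtime_seconds(
    availability_percent: float, period_seconds: float = SECONDS_PER_YEAR
) -> float:
    """Return the downtime budget implied by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_seconds * (1 - availability_percent / 100)


def measured_availability_percent(uptime_seconds: float, period_seconds: float) -> float:
    """Return availability as the percentage of the period the system was up."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return 100 * uptime_seconds / period_seconds


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_seconds(target) / 60
    print(f"{target}% -> {minutes:.2f} minutes of downtime per year")
```

Five nines leaves a budget of roughly 5.26 minutes of downtime per year, which is why each additional nine is so much harder to reach.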

Important distinction: if a system is reliable, it tends to be available, but not the other way around. A system can show high availability without being reliable: it might have no redundancy and no fault tolerance, so it still goes down easily, and its uptime looks good only because you are really fast at bringing it back to life.

Efficiency

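Efficiency is usually described by two measures: latency and throughput. Bandwidth is sometimes used loosely in place of throughput, but strictly it is the maximum capacity of a link, while throughput is the rate actually achieved.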

Latency = response time. For a server request: how long does it take to send a request and get back a response?
Throughput = how many requests (or other units of work) can a system process in a single unit of time?

Consider both when you think about efficiency: a system can answer each request quickly yet handle few requests per second, or the reverse.
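
One way to see the difference is to measure both for the same workload. In this sketch, `handle_request` is a hypothetical stand-in for real server work (it just sleeps for about a millisecond), and requests are processed one at a time.

```python
import time


def handle_request(payload: str) -> str:
    """Hypothetical request handler; the sleep stands in for real work."""
    time.sleep(0.001)
    return payload.upper()


def measure(requests: list[str]) -> dict[str, float]:
    """Process requests sequentially and report latency and throughput."""
    if not requests:
        raise ValueError("need at least one request to measure")
    latencies = []
    start = time.perf_counter()
    for payload in requests:
        t0 = time.perf_counter()
        handle_request(payload)
        latencies.append(time.perf_counter() - t0)  # time for this one request
    elapsed = time.perf_counter() - start
    return {
        "avg_latency_s": sum(latencies) / len(latencies),
        "max_latency_s": max(latencies),
        "throughput_rps": len(latencies) / elapsed,  # requests per second overall
    }


stats = measure(["ping"] * 20)
print(stats)
```

Because this loop handles one request at a time, throughput is roughly the inverse of average latency. Handling requests concurrently can raise throughput without lowering the latency of any single request.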

Serviceability and Manageability

Serviceability and manageability also matter when designing a distributed system, because we need to know how easy the system is to operate and maintain.

Serviceability, or manageability, is the simplicity and speed with which a system can be repaired or maintained. If the time to fix a failed system increases, availability decreases, because the system spends more of each period down.
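
The link between repair time and availability can be sketched with a commonly used steady-state approximation, availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair. The formula and the numbers below are illustrative assumptions, not something the article specifies.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Approximate availability as the fraction of time spent working.

    mtbf_hours: mean time between failures (time the system runs before failing)
    mttr_hours: mean time to repair (time it takes to bring it back)
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Same failure rate, slower repairs: availability drops.
for mttr in (1, 10, 100):
    availability = steady_state_availability(999, mttr)
    print(f"MTTR {mttr:>3} h -> {availability * 100:.2f}% available")
```

Holding the failure rate fixed, every extra hour of repair time comes straight out of availability, which is why a system that is easy to diagnose and fix is also easier to keep available.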
