10 Fallacies of distributed computing

Navaneeth Sen
9 min read · Dec 21, 2022

This is something I read on Wikipedia recently and wanted to note down for myself and for anyone else who wants to understand these fallacies and make better decisions in future system designs.

Wikipedia lists 8 fallacies; I have added two more to the list after watching Neal Ford's O'Reilly session on Communication Styles for Distributed Architectures & Microservices, where he pointed them out.

There are several fallacies of distributed computing that can be misleading when designing and implementing distributed systems:

1. The Network is Reliable: This fallacy assumes that the network is always available and that every message is delivered correctly from source to destination. However, networks can experience failures and delays, and messages can be lost or corrupted in transit.

In distributed computing, it is important to design systems that can handle network failures and data loss, rather than assuming the network will always be reliable.

This can be done with redundant systems, retries, and other error-handling mechanisms. Believing this fallacy can lead to problems such as data loss, system downtime, and unreliable performance, which can have serious consequences in systems that rely on the network for critical functions.
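To make this concrete, here is a minimal Python sketch of one such error-handling mechanism: retrying a flaky call with exponential backoff and jitter. The `flaky_call` function simulates a network call; all names here are illustrative, not from any particular library.

```python
import random
import time

def call_with_retries(operation, max_attempts=4, base_delay=0.1):
    """Retry a flaky operation, backing off exponentially between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Exponential backoff with jitter avoids synchronized retry storms.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())

# Simulated flaky network call: fails twice, then succeeds.
calls = {"count": 0}
def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient network failure")
    return "ok"

result = call_with_retries(flaky_call)
print(result, calls["count"])  # "ok" after 3 attempts
```

Note that blind retries are only safe for idempotent operations; a retried payment, for example, needs deduplication on the receiving side.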

2. Latency is Zero: This is a fallacy in distributed computing because it is impossible for there to be zero latency in a distributed system. Latency refers to the time it takes for a message to be transmitted from one point to another.

In a distributed system, messages are often transmitted between nodes that are physically separated, meaning that there is a physical distance between the two points. This physical distance creates a delay in the transmission of the message, which is referred to as latency.

Even if the nodes in a distributed system are connected by high-speed networks, there will still be some amount of latency due to the time it takes for the message to be processed by the receiving node and the time it takes for a response to be transmitted back to the sender.

Furthermore, latency can be affected by other factors such as network congestion, network reliability, and the workload of the nodes. All of these factors contribute to the overall latency in a distributed system, making it impossible for there to be zero latency.
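Even loopback traffic on a single machine has measurable latency. The following self-contained sketch spins up a tiny local echo server and times one round trip; the exact number will vary by machine, but it is never zero.

```python
import socket
import threading
import time

# A tiny local echo server makes the example self-contained.
server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def serve_once():
    conn, _ = server.accept()
    conn.sendall(conn.recv(64))  # echo the request back
    conn.close()

threading.Thread(target=serve_once, daemon=True).start()

# Time one request/response round trip over loopback.
start = time.perf_counter()
with socket.create_connection(("127.0.0.1", port), timeout=2.0) as client:
    client.sendall(b"ping")
    reply = client.recv(64)
rtt = time.perf_counter() - start

print(f"round trip took {rtt * 1e6:.0f} microseconds")
```

Over a real network, add propagation delay, queuing, and processing time on top of this, which is why remote calls should never be budgeted as if they were local function calls.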

3. Bandwidth is Infinite: This is a fallacy because bandwidth is a physical constraint that is limited by the capabilities of the communication channels being used. It is not possible for bandwidth to be truly infinite.

In a distributed system, data must be transmitted between different nodes, such as computers or servers, over a network. The amount of data that can be transmitted over a network in a given period of time is known as the bandwidth of the network. Bandwidth is often expressed in multiples of bits per second (bps) or bytes per second (Bps).

The bandwidth of a network is determined by the capacity of the communication channels being used and the amount of traffic on the network.

For example, a network with a high-capacity communication channel, such as a fiber optic cable, will generally have a higher bandwidth than a network with a lower-capacity channel, such as a copper wire.

In distributed systems, it is important to carefully consider the bandwidth requirements of the system and ensure that the network has sufficient bandwidth to handle its data transfer needs. If it does not, the result can be delays and other performance issues.

Therefore, the idea that “bandwidth is infinite” in distributed computing is a fallacy because bandwidth is a limited resource that must be carefully managed to ensure the performance and efficiency of the system.
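A quick back-of-envelope calculation makes the constraint tangible, and shows one common mitigation: compressing payloads before they hit the wire. This is a rough sketch; the estimate deliberately ignores protocol overhead, congestion, and latency.

```python
import gzip
import json

def transfer_seconds(payload_bytes, bandwidth_bps):
    """Naive estimate: ignores protocol overhead, congestion and latency."""
    return payload_bytes * 8 / bandwidth_bps

# A repetitive JSON payload compresses well, cutting bandwidth usage.
records = [{"id": i, "status": "active", "region": "eu-west-1"} for i in range(1000)]
raw = json.dumps(records).encode()
small = gzip.compress(raw)

print(f"raw: {len(raw)} bytes, compressed: {len(small)} bytes")
print(f"time on a 10 Mbps link: {transfer_seconds(len(raw), 10_000_000):.3f} s")
```

The same arithmetic, applied honestly to your real payload sizes and link speeds, is often enough to reveal whether a design will saturate its network.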

4. The Network is Secure: This is a fallacy of distributed computing because it assumes that the network used by a distributed system is secure and that data transmitted over it cannot be accessed by unauthorised parties.

However, this is not always the case in reality. Networks are insecure for the reasons below, and it's important to take steps to protect against these threats when designing and implementing distributed systems:

  1. Vulnerabilities: There are always potential vulnerabilities in any system, including distributed systems. These vulnerabilities can be exploited by attackers to gain unauthorised access or disrupt the system.
  2. Interdependencies: Distributed systems are made up of multiple interconnected components, each of which may have its own vulnerabilities. If one component is compromised, it can potentially affect the security of the entire system.
  3. Complexity: Distributed systems are complex and often involve multiple layers of security controls. It is difficult to ensure that all of these controls are correctly configured and functioning at all times.
  4. Human error: Humans are fallible and can make mistakes that can compromise the security of a distributed system. For example, they may click on a malicious link, use weak passwords, or fail to properly secure their devices.

There are several ways to improve the security of a distributed system, including:

· Encrypting data: Data transmitted over a network can be vulnerable to interception, so it’s important to use encryption to protect sensitive information.

· Using secure communication protocols: Protocols such as HTTPS, which runs over TLS (the successor to SSL), help to secure communication between nodes in a distributed system.

· Implementing authentication and access control: It’s important to ensure that only authorised users have access to sensitive data and resources within a distributed system.

· Regularly updating software and security protocols: Keeping software and security protocols up to date can help to protect against new security threats as they emerge.

By taking these and other security measures into account, it’s possible to build distributed systems that are secure and resistant to threats such as hacking and data interception. However, it’s important to remember that no system is completely secure, and it’s always a good idea to be prepared for the possibility of security breaches and to have a plan in place for responding to them.
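On the encryption point, Python's standard library illustrates what "secure by default" looks like for TLS clients. `ssl.create_default_context()` turns on certificate verification and hostname checking; the common mistake is disabling them to make a certificate error go away.

```python
import ssl

# The default client context verifies the server's certificate chain
# and checks that the certificate matches the hostname - both on.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True

# The tempting "fix" for certificate errors is to set verify_mode to
# CERT_NONE; that silently re-opens the door to man-in-the-middle attacks.
```

Pass a context like this to your HTTP or socket layer rather than building an unverified one by hand.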

5. Topology Doesn’t Change: This is a fallacy of distributed computing because it assumes that the topology of a distributed system, or the way that nodes are connected, remains constant over time.

A topology in distributed systems refers to the way in which the various components of the system are connected and interact with each other. There are several different types of topologies that can be used in distributed systems, including:

  1. Client-server topology: This type of topology involves a central server that provides resources or services to clients that request them.
  2. Peer-to-peer topology: In this type of topology, all nodes in the system are equal and can communicate directly with each other.
  3. Hybrid topology: This type of topology combines elements of both client-server and peer-to-peer topologies.
  4. Star topology: In this type of topology, all nodes in the system are connected to a central hub or switch.
  5. Ring topology: In this type of topology, nodes in the system are connected in a loop, with each node connected to two other nodes.

The choice of topology for a distributed system depends on the specific requirements and goals of the system.

In reality, whichever of the above topologies a distributed system uses can change due to a variety of factors such as node failures, network infrastructure changes, and updates to the system. Ignoring the fact that the topology can change can lead to problems such as system downtime, data loss, and reduced performance. It's important to design distributed systems with the assumption that the topology may change, and to implement mechanisms to handle these changes in a graceful and efficient manner.

For example, a distributed system may need to implement mechanisms to detect and handle node failures, to reroute data around failed nodes, and to recover from data loss. It may also need to be able to handle changes to the network infrastructure, such as the addition or removal of nodes, and to maintain consistent performance as the topology changes. Designing a distributed system to handle changes to the topology requires careful planning and consideration of the potential impacts of these changes on the system’s performance and reliability.
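One such mechanism for detecting node failures is heartbeat monitoring: a node that has not checked in within a timeout is treated as gone and the topology is updated. Here is a minimal, illustrative sketch (in real systems this is usually handled by membership protocols such as gossip, or by infrastructure like Kubernetes or ZooKeeper).

```python
import time

class HeartbeatMonitor:
    """Consider a node dead if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout=5.0):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, node, now=None):
        # `now` is injectable to make the example deterministic.
        self.last_seen[node] = time.monotonic() if now is None else now

    def alive_nodes(self, now=None):
        now = time.monotonic() if now is None else now
        return {n for n, t in self.last_seen.items() if now - t <= self.timeout}

monitor = HeartbeatMonitor(timeout=5.0)
monitor.heartbeat("node-a", now=100.0)
monitor.heartbeat("node-b", now=103.0)
print(monitor.alive_nodes(now=106.0))  # node-a timed out -> {'node-b'}
```

A routing layer built on top of this would then stop sending traffic to nodes missing from `alive_nodes()` and reroute around them.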

6. There is One Administrator: This is a fallacy of distributed computing because it assumes that there is only one administrator responsible for managing a distributed system. In reality, distributed systems often have multiple administrators, and coordinating their actions can be complex.

Having multiple administrators can be beneficial in a distributed system, as it allows for redundancy and ensures that there is always someone available to manage the system. However, it can also introduce challenges, such as the need for coordination and communication between administrators and the potential for conflicts or inconsistencies in the way that the system is managed. Ignoring the potential for multiple administrators in a distributed system can lead to problems such as a lack of accountability and difficulty in identifying and resolving issues within the system.

It’s important to carefully consider the role of administrators in a distributed system and to design the system in a way that accounts for the potential for multiple administrators.

7. Transport Cost is Zero: This is a fallacy of distributed computing because it assumes that the cost of transmitting data between nodes in a distributed system is zero. However, this is not the case in reality. Transferring data between nodes can consume resources such as bandwidth and storage, and these costs should be taken into account when designing distributed systems.

For example, if a distributed system requires a large amount of data to be transferred between nodes, this can consume a significant amount of bandwidth and may result in additional costs for the system. Similarly, storing data on nodes may require the use of additional storage resources, which can also have costs associated with them. Ignoring the cost of data transfer and storage can lead to inefficient use of resources and can negatively impact the performance and reliability of the system.

It’s important to carefully consider the cost of data transfer and storage when designing distributed systems, and to ensure that the system is designed to use these resources efficiently.
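One common way to use transport resources efficiently is batching: every request carries fixed overhead (headers, handshakes), so many small messages cost far more than the payload alone suggests. The numbers below are made up for illustration; measure your own protocol's overhead.

```python
import math

def bytes_on_wire(messages, batch_size=1, per_request_overhead=500):
    """Estimate total bytes sent: payload plus fixed per-request overhead.

    The 500-byte overhead is an illustrative stand-in for headers,
    TLS records, handshakes, and so on.
    """
    batches = math.ceil(len(messages) / batch_size)
    payload = sum(len(m) for m in messages)
    return payload + batches * per_request_overhead

msgs = [b"x" * 100] * 50  # fifty 100-byte messages

print(bytes_on_wire(msgs, batch_size=1))   # 5000 payload + 50 * 500 = 30000
print(bytes_on_wire(msgs, batch_size=10))  # 5000 payload + 5 * 500  = 7500
```

Here batching ten messages per request cuts the bytes on the wire by three quarters, at the cost of slightly higher per-message latency; that trade-off is exactly the kind of transport cost this fallacy ignores.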

8. Network is Homogeneous: This fallacy assumes that all nodes in a distributed system are the same, with no differences in hardware, software, or network infrastructure. However, in reality, distributed systems often consist of nodes with different capabilities and configurations, and these differences can have a significant impact on the performance and reliability of the system.

It’s important to consider the heterogeneity of a distributed system when designing and implementing it, as it can affect the way that the system functions and how it is able to handle failures or changes in the network.

For example, if a distributed system is built on nodes with different hardware or software configurations, it may be necessary to design the system to handle these differences and to ensure that it can continue to function properly even if some nodes fail or are removed from the network.
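A small, concrete instance of hardware heterogeneity is byte order: different CPUs store multi-byte integers differently, so any binary wire format must pin an explicit order instead of assuming every node matches the sender. Python's `struct` module makes the distinction visible:

```python
import struct

# Hardware differs in native byte order, so wire formats must pin one.
value = 0x12345678

network_bytes = struct.pack("!I", value)  # "!" = network (big-endian) order
native_bytes = struct.pack("=I", value)   # "=" = whatever this host uses

print(network_bytes.hex())  # always "12345678", on every machine
print(native_bytes.hex())   # may be "78563412" on little-endian hosts
```

The same reasoning is why heterogeneous systems favor explicitly specified formats (network byte order, JSON/UTF-8, protobuf) over dumping in-memory representations onto the wire.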

In his session Communication Styles for Distributed Architectures and Microservices, Neal Ford pointed out two more fallacies, which I have added below:

9. Just version it: This is a fallacy because versioning alone is not a comprehensive approach to dealing with problems in distributed systems.

In a distributed system, multiple components or subsystems operate independently and communicate with each other over a network. These systems can be complex and prone to various types of failures or issues. Simply “versioning” the system or its components may not be sufficient to address all of the challenges that can arise in a distributed system.

Effective management of distributed systems requires a robust set of strategies and techniques to ensure that the system is reliable, scalable, and resilient. This can include measures such as monitoring and logging to identify and troubleshoot issues, implementing failover and recovery mechanisms, and using versioning as part of a broader strategy for maintaining and updating the system.

In short, while versioning can be a useful tool for managing distributed systems, it’s not a standalone solution and should be part of a larger approach to maintaining and improving the reliability and performance of the system.
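As a sketch of versioning used properly as one tool among several, here is the "tolerant reader" pattern: a consumer that handles each schema version it knows about explicitly and fails loudly on versions it does not. The field names are hypothetical.

```python
def parse_user(message):
    """Tolerant reader: handle each known schema version explicitly."""
    version = message.get("version", 1)  # messages without a version are v1
    if version == 1:
        # v1 carried a single "name" field.
        first, _, last = message["name"].partition(" ")
        return {"first": first, "last": last}
    if version == 2:
        # v2 split the name into separate fields.
        return {"first": message["first_name"], "last": message["last_name"]}
    raise ValueError(f"unsupported message version: {version}")

print(parse_user({"name": "Ada Lovelace"}))
print(parse_user({"version": 2, "first_name": "Ada", "last_name": "Lovelace"}))
```

Even this still needs the surrounding machinery the paragraph above describes: monitoring to notice when unknown versions start arriving, and a rollout plan for retiring v1 producers.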

10. Compensating updates in transactional sagas always work: This is a fallacy because it is an overly optimistic assumption.

Compensating updates are used in transactional sagas to undo the effects of a previously completed action in the event of a failure or error. However, in a distributed system, there are many potential sources of failure or error that can occur, including network delays, communication failures, and data inconsistencies. These issues can make it difficult for compensating updates to be applied correctly, or for them to be applied at all.

Additionally, transactional sagas can be complex and involve multiple steps and actions, which can make it difficult to determine the correct compensating updates to apply. This can lead to issues such as data loss or corruption, or even more complex errors that may require manual intervention to resolve.

Overall, while compensating updates in transactional sagas can be a useful tool in distributed systems, they are not a foolproof solution and must be used with caution and careful consideration of the potential risks and limitations.
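The following toy saga runner illustrates the point: compensations run in reverse order when a step fails, but the runner must also cope with compensations that themselves fail, returning whatever could not be undone for retry or manual intervention. All names are illustrative.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; on a failure, run the
    compensations for the completed steps in reverse order.

    Compensations can fail too, so the caller gets back whatever
    could not be undone.
    """
    done = []
    for action, compensation in steps:
        try:
            action()
            done.append(compensation)
        except Exception:
            not_undone = []
            for comp in reversed(done):
                try:
                    comp()
                except Exception as exc:
                    not_undone.append(exc)  # needs retry or manual intervention
            return False, not_undone
    return True, []

log = []

def reserve_stock():
    log.append("stock reserved")

def release_stock():
    log.append("stock released")

def charge_card():
    raise RuntimeError("payment service timed out")

ok, not_undone = run_saga([(reserve_stock, release_stock),
                           (charge_card, lambda: None)])
print(ok, log)  # False ['stock reserved', 'stock released']
```

Even this tidy version glosses over the hard cases the paragraphs above describe: a compensation may be impossible (an email cannot be unsent), and other transactions may have already observed the state being rolled back.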
