What is a causal inference?

Overview: Causality is the relationship between a cause and its effect. In distributed systems, ordering events is challenging because clocks on different nodes are not perfectly synchronized, so timestamps taken on different nodes cannot be reliably compared. This is where causality becomes useful. Causality and the happens-before relationship: Let’s define an event as something happening at one node. This something can be either sending or receiving …
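The excerpt stops before it describes a mechanism, so the following is only a minimal sketch of one common way to capture the happens-before relationship: a Lamport logical clock. The `Node` class and its method names are hypothetical helpers introduced for illustration and are not taken from the post.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A node carrying a Lamport logical clock (illustrative, not from the post)."""
    name: str
    clock: int = 0

    def local_event(self) -> int:
        # Any event on this node advances its own counter.
        self.clock += 1
        return self.clock

    def send(self) -> int:
        # Sending is an event; the returned value is attached to the message.
        self.clock += 1
        return self.clock

    def receive(self, msg_timestamp: int) -> int:
        # Jump past both the local counter and the sender's timestamp,
        # so the receive is always ordered after the matching send.
        self.clock = max(self.clock, msg_timestamp) + 1
        return self.clock


a, b = Node("A"), Node("B")
t_send = a.send()            # A's clock becomes 1
b.local_event()              # B's clock becomes 1
b.local_event()              # B's clock becomes 2
t_recv = b.receive(t_send)   # B's clock becomes max(2, 1) + 1 = 3
print(t_send, t_recv)
```

If one event happens before another, its logical timestamp is smaller. The converse does not hold: a smaller timestamp alone does not prove that one event caused or preceded the other.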

Configuring Hadoop on Mac OS X in pseudo-distributed cluster mode.

Hadoop is an open-source Apache project that enables processing of extremely large datasets in a distributed computing environment. It can be run in three different modes: 1. Standalone Mode, 2. Pseudo-Distributed Mode, 3. Fully-Distributed Mode. This post covers setting up Hadoop 2.5.1 in pseudo-distributed mode. A pseudo-distributed mode is one where each …

An Introduction to Zookeeper – Part I of the Zookeeper series

A distributed system consists of multiple computers that communicate through a computer network and interact with each other to achieve a common goal. The major benefits that distributed systems offer over centralized systems are scalability and redundancy. Systems can be easily expanded by adding more machines as needed, and even if one of the machines is unavailable, …