Accessing Solr Cloud on AWS from SolrJ

March 31, 2016 / January 27, 2020 ~ Anjana Shankar

We have a Solr Cloud installation on AWS EC2 instances, and we use the SolrJ client from our Java application. Until now, we tested the code against a Solr Cloud installation on our local machine. As the team grew, we realized that we should have a way to access …

Configuring Hive on Ubuntu

December 23, 2014 / January 28, 2020 ~ Anjana Shankar

Hive facilitates querying and managing large datasets residing in distributed storage. It is built on top of Hadoop. Hive defines a simple query language, Hive Query Language (HQL), which enables users familiar with SQL to query the data. Hive converts your HQL queries into a series of MapReduce jobs for …

Configuring Hadoop on Ubuntu in pseudo-distributed mode

December 16, 2014 / March 11, 2020 ~ Anjana Shankar

Hadoop is an open-source Apache project that enables processing of extremely large datasets in a distributed computing environment. There are three modes in which it can be run: 1. Standalone Mode, 2. Pseudo-Distributed Mode, 3. Fully-Distributed Mode. This post covers setting up Hadoop 2.5.1 in pseudo-distributed mode on an Ubuntu machine. For setting up …

Configuring Hadoop on Mac OS X in pseudo-distributed cluster mode

November 7, 2014 / January 28, 2020 ~ Anjana Shankar

Hadoop is an open-source Apache project that enables processing of extremely large datasets in a distributed computing environment. There are three modes in which it can be run: 1. Standalone Mode, 2. Pseudo-Distributed Mode, 3. Fully-Distributed Mode. This post covers setting up Hadoop 2.5.1 in pseudo-distributed mode. A pseudo-distributed mode is one where each …

An Introduction to Zookeeper – Part I of the Zookeeper series

August 21, 2014 / January 28, 2020 ~ Anjana Shankar

A distributed system consists of multiple computers that communicate through a computer network and interact with each other to achieve a common goal. The major benefits that distributed systems offer over centralized systems are scalability and redundancy. Systems can be easily expanded by adding more machines as needed, and even if one of the machines is unavailable, …

Configuration and Coordination with Zookeeper

August 3, 2014 / January 28, 2020 ~ Anjana Shankar

It took me a while to understand the concept of Zookeeper, and a while longer to understand how to use it for the task I had begun with. This post is intended to help others cross the bridge faster. Dynamic configuration management for today's systems comes with all the nitty-gritty details that are involved …