Configuring Hive on Ubuntu

Hive facilitates querying and managing large datasets residing in distributed storage. It is built on top of Hadoop. Hive defines a simple, SQL-like query language called Hive Query Language (HQL), which lets users familiar with SQL query the data. Hive converts your HQL queries into a series of MapReduce jobs for execution on a Hadoop cluster. In this post we will configure Hive on our machine.

Download Hive from the Apache Hive site. Unpack the tarball to the location of your choice and assign ownership of the unpacked directory to the user who will run Hive. At the time of this writing, the latest version available is 0.14.0.
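The unpack-and-assign-ownership step could look roughly like the sketch below. The function name install_hive, the archive name in the usage comment and the target /user/hive are assumptions for illustration; use the file you actually downloaded and the directory you prefer.

```shell
# Sketch: unpack a Hive tarball into a target directory and hand it to one user.
# Usage (archive name and target path are assumptions, adjust to your download):
#   install_hive apache-hive-0.14.0-bin.tar.gz /user/hive "$USER"
install_hive() {
    tarball=$1   # path to the downloaded archive
    target=$2    # directory that will become HIVE_HOME
    owner=$3     # user who will run Hive

    mkdir -p "$target" || return 1
    # --strip-components=1 drops the archive's top-level folder so that
    # bin/, conf/ and lib/ land directly under $target.
    tar -xf "$tarball" -C "$target" --strip-components=1 || return 1
    # Handing the tree to a different user may require running this with sudo.
    chown -R "$owner" "$target"
}
```

Stripping the top-level folder is a design choice that keeps HIVE_HOME independent of the version number embedded in the archive name.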

Prerequisites:
Java: 1.6 or higher; 1.7 is preferred.
Hadoop: 2.x. For Hadoop installation you can refer to this post.
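Before going further, it may help to confirm that both prerequisites are reachable from your shell. The helper below is a minimal sketch; check_tool is a hypothetical name, and it only checks that a command is on the PATH, not which version is installed.

```shell
# Sketch: report whether a tool Hive depends on is on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}
# Usage (the version commands print the installed versions):
#   check_tool java   && java -version
#   check_tool hadoop && hadoop version
```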

Installation

Set the environment variable HIVE_HOME to point to the installation directory. You can set this in your .bashrc:

export HIVE_HOME=/user/hive

Then add $HIVE_HOME/bin to your PATH.

$ export PATH=$HIVE_HOME/bin:$PATH
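If you keep these settings in .bashrc, re-running a setup script can easily append the same export twice. One possible way to avoid that is sketched below; append_once is a hypothetical helper, not part of Hive or Hadoop.

```shell
# Sketch: append a line to a file only if that exact line is not there yet.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -x matches whole lines, -F treats the text literally, -q stays silent.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
# Usage (single quotes keep $HIVE_HOME unexpanded until .bashrc is read):
#   append_once 'export HIVE_HOME=/user/hive' ~/.bashrc
#   append_once 'export PATH=$HIVE_HOME/bin:$PATH' ~/.bashrc
#   . ~/.bashrc   # reload the settings in the current shell
```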

Setting HADOOP_HOME in hive-config.sh
Append the following line to the file $HIVE_HOME/bin/hive-config.sh so that Hive can locate your Hadoop installation.

export HADOOP_HOME=/user/hadoop

Running Hive
You must create /tmp and /user/hive/warehouse in HDFS and make them group-writable before you can create any table in Hive.

$ hadoop fs -mkdir -p /user/hive/warehouse
$ hadoop fs -chmod g+w /user/hive/warehouse
$ hadoop fs -mkdir -p /tmp
$ hadoop fs -chmod g+w /tmp
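To confirm that /tmp and /user/hive/warehouse now exist in HDFS, something along these lines might be useful. It is a sketch only: hdfs_dirs_ready and the HADOOP_CMD override are hypothetical names introduced here, and the check relies on the exit status of hadoop fs -test -d.

```shell
# Sketch: check that each HDFS directory Hive needs exists.
# HADOOP_CMD is a hypothetical override; it defaults to the hadoop launcher.
hdfs_dirs_ready() {
    for dir in "$@"; do
        # "fs -test -d" exits with 0 only when the path is a directory.
        if ! "${HADOOP_CMD:-hadoop}" fs -test -d "$dir"; then
            echo "missing HDFS directory: $dir"
            return 1
        fi
    done
    echo "all directories present"
}
# Usage:
#   hdfs_dirs_ready /tmp /user/hive/warehouse
```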

Start the Hive shell:

$ hive

The shell should look something like this:

Logging initialized using configuration in jar:file:/user/hive/lib/hive-common-0.14.0.jar!/hive-log4j.properties
hive>
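Once the prompt appears, a quick non-interactive check is also possible using the -e option of the hive command, which runs a quoted query string and exits. The wrapper below is a minimal sketch; run_hql and the HIVE_CMD override are hypothetical names used only for illustration.

```shell
# Sketch: run one HQL statement without entering the interactive shell.
# HIVE_CMD is a hypothetical override; it defaults to the hive launcher.
run_hql() {
    "${HIVE_CMD:-hive}" -e "$1"
}
# Usage:
#   run_hql 'SHOW DATABASES;'
```

If the statement returns without errors, the PATH, HADOOP_HOME and HDFS directory settings above are most likely in place.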

Reference: https://cwiki.apache.org/confluence/display/Hive/Home
