Hive facilitates querying and managing large datasets residing in distributed storage, and is built on top of Hadoop. Hive defines a simple SQL-like query language called HiveQL (often abbreviated HQL), which enables users familiar with SQL to query the data. Hive converts HiveQL queries into a series of MapReduce jobs for execution on a Hadoop cluster. In this post we will configure Hive on our machine.
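To give a feel for HiveQL before we start, here is a minimal sketch of running a query non-interactively with `hive -e`; the table name `logs` is hypothetical and assumes you have already loaded some data:

```shell
# Hypothetical example: count the rows in a table named 'logs'.
# Hive compiles this statement into one or more MapReduce jobs.
hive -e 'SELECT COUNT(*) FROM logs;'
```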
Download Hive from the Apache Hive site. Unpack the tarball to a location of your choice and assign ownership to the user who will run Hive. At the time of this writing, the latest version available is 0.14.0.
Java: 1.6 or higher; 1.7 is preferred.
Hadoop: 2.x. For Hadoop installation you can refer to this post.
Set the environment variable HIVE_HOME to point to the installation directory. You can set this in your .bashrc.
Finally, add $HIVE_HOME/bin to your PATH.
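The two steps above amount to appending a couple of lines to your .bashrc. The install path below is an assumption — adjust it to wherever you unpacked the tarball:

```shell
# Append to ~/.bashrc (path is an assumed example install location)
export HIVE_HOME=/usr/local/hive-0.14.0
# Make the hive command available on the command line
export PATH=$PATH:$HIVE_HOME/bin
```

After editing, run `source ~/.bashrc` (or open a new terminal) so the changes take effect.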
Setting the Hadoop path in Hive's config.sh
Append the following line to the file $HIVE_HOME/bin/config.sh.
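The line to append points Hive at your Hadoop installation. A sketch, assuming Hadoop is unpacked at /usr/local/hadoop (a hypothetical path — use your actual Hadoop directory):

```shell
# Assumed path: point this at your Hadoop installation directory
export HADOOP_HOME=/usr/local/hadoop
```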
You must create the /tmp and /user/hive/warehouse directories in HDFS and set appropriate permissions on them before you can create any table in Hive.
$ hadoop fs -mkdir /user/hive/warehouse
$ hadoop fs -chmod g+w /user/hive/warehouse
$ hadoop fs -mkdir /tmp
$ hadoop fs -chmod g+w /tmp
Start the Hive shell by running the hive command:
$ hive
The shell should look something like:
Logging initialized using configuration in jar:file:/user/hive/lib/hive-common-0.14.0.jar!/hive-log4j.properties
hive>
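As a quick sanity check that the setup works end to end, you can run a trivial statement non-interactively; on a fresh install the list should come back empty:

```shell
# List tables in the default database; verifies Hive can reach HDFS
# and the warehouse directory created above.
hive -e 'SHOW TABLES;'
```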