Apache Ignite - Spark, Hadoop, and File System Documentation

Ignite and Apache Hive

IGFS and the Hadoop accelerator are to be discontinued soon: https://issues.apache.org/jira/browse/IGNITE-11942

Contact the Ignite community for alternate solutions. Some of the solutions will be documented later.

This article explains how to configure and start Hive over Hadoop accelerated by Ignite. It also shows how to start HiveServer2 and connect a remote client with this configuration.

Prerequisites

We assume that Hadoop is already installed and configured to run over Ignite, and that the Ignite node(s) providing the IGFS file system and map-reduce job tracker functionality are up and running.

You will also need to install Hive: http://hive.apache.org/.

Starting Hive

Here are the steps required to run Hive over "Ignited" Hadoop:

  • Provide the location of the correct hadoop executable. You can do this either by adding the path to the executable to the PATH environment variable (note that the executable must be located in a folder named bin/ in either case), or by setting the HADOOP_HOME environment variable.
  • Provide the location of the configuration files (core-site.xml, hive-site.xml, mapred-site.xml). To do this, put all these files in one directory and set the HIVE_CONF_DIR environment variable to the path of that directory.
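
The two requirements above can be checked before Hive is launched. The following is a minimal sketch, not part of Hive or Ignite: missing_conf_files and find_hadoop are hypothetical helper names, and preferring HADOOP_HOME over PATH is an assumption of this sketch.

```shell
# Hypothetical helper: print each required configuration file that is
# missing from the given directory; return non-zero if any is missing.
missing_conf_files() {
    conf_dir="$1"
    rc=0
    for name in core-site.xml hive-site.xml mapred-site.xml; do
        if [ ! -f "${conf_dir}/${name}" ]; then
            echo "${name}"
            rc=1
        fi
    done
    return "${rc}"
}

# Hypothetical helper: print the hadoop executable that would be used,
# preferring HADOOP_HOME/bin/hadoop (an assumption) over the one on PATH.
find_hadoop() {
    if [ -n "${HADOOP_HOME:-}" ] && [ -x "${HADOOP_HOME}/bin/hadoop" ]; then
        echo "${HADOOP_HOME}/bin/hadoop"
    else
        command -v hadoop
    fi
}
```

For example, `missing_conf_files "${HIVE_CONF_DIR}"` prints nothing and returns zero when all three files are in place.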

Configuration Template

We recommend using the Hive template configuration file <IGNITE_HOME>/config/hadoop/hive-site.ignite.xml to get the Ignite-specific settings.
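
One way to put the template to use is to copy it into the configuration directory under the name hive-site.xml. The sketch below assumes that this rename is what you want; install_hive_template is a hypothetical helper, and it overwrites any existing hive-site.xml in the target directory.

```shell
# Hypothetical helper: copy Ignite's Hive template into the configuration
# directory as hive-site.xml. Overwrites an existing hive-site.xml.
install_hive_template() {
    ignite_home="$1"
    conf_dir="$2"
    template="${ignite_home}/config/hadoop/hive-site.ignite.xml"

    if [ ! -f "${template}" ]; then
        echo "template not found: ${template}" >&2
        return 1
    fi
    mkdir -p "${conf_dir}" && cp "${template}" "${conf_dir}/hive-site.xml"
}
```

A typical call would be `install_hive_template "${IGNITE_HOME}" "${HIVE_CONF_DIR}"`.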

There is a potential issue caused by different jline library versions in Hive and Hadoop. It can be resolved by setting the HADOOP_USER_CLASSPATH_FIRST=true environment variable.
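
To see whether the two installations actually ship different jline versions, you can list the jline jars in each of them. This is a small sketch; list_jline_jars is a hypothetical helper, and the jar naming pattern jline*.jar is an assumption.

```shell
# Hypothetical helper: list jline jars found under each given directory,
# sorted so that jars from different installations are easy to compare.
list_jline_jars() {
    for dir in "$@"; do
        find "${dir}" -type f -name 'jline*.jar' 2>/dev/null
    done | sort
}
```

Running `list_jline_jars "${HADOOP_HOME}" "${HIVE_HOME}"` and comparing the version suffixes of the printed files shows whether the classpath setting is needed.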

For convenience, you can create a simple script that sets all the required variables and runs Hive, like this:

# Specify Hive home directory:
export HIVE_HOME=<Hive installation directory>

# Specify configuration files location:
export HIVE_CONF_DIR=<Path to our configuration folder>

# If the hadoop executable is not in PATH, specify Hadoop home explicitly:
export HADOOP_HOME=<Hadoop installation folder>

# Avoid problem with different 'jline' library in Hadoop: 
export HADOOP_USER_CLASSPATH_FIRST=true

${HIVE_HOME}/bin/hive "${@}"

This script can be used to start the Hive interactive console (the examples below assume it is saved as hive-ig):

$ hive-ig cli
hive> show tables;
OK
u_data
Time taken: 0.626 seconds, Fetched: 1 row(s)
hive> quit;
$

Starting HiveServer2

You may also want to use HiveServer2 for enhanced client features. To start it, you can use the same script created above:

hive-ig --service hiveserver2

After the server is started, you can connect to it with any available client (e.g., beeline). As a remote client, beeline can be run from any host, and it does not require any specific environment to work with "Ignited" Hive. Here is an example:

$ ./beeline 
Beeline version 1.2.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000 scott tiger org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://localhost:10000
Connected to: Apache Hive (version 1.2.1)
Driver: Hive JDBC (version 1.2.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> show tables;
+-----------+--+
| tab_name  |
+-----------+--+
| u_data    |
+-----------+--+
1 row selected (0.957 seconds)
0: jdbc:hive2://localhost:10000>
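
The same check can also be run non-interactively, which is convenient in scripts. The sketch below wraps beeline's -u (JDBC URL), -n (user), -p (password) and -e (query) options. hive2_url and run_hive2_query are hypothetical helper names, and the host, port and credentials are the ones from the session above.

```shell
# Hypothetical helper: build the HiveServer2 JDBC URL for a host and port,
# defaulting to localhost:10000 as in the session above.
hive2_url() {
    host="${1:-localhost}"
    port="${2:-10000}"
    echo "jdbc:hive2://${host}:${port}"
}

# Hypothetical helper: run one statement through beeline and exit.
# Arguments: host, port, user, password, statement.
run_hive2_query() {
    beeline -u "$(hive2_url "$1" "$2")" -n "$3" -p "$4" -e "$5"
}

# Usage (requires beeline on PATH and a running HiveServer2):
#   run_hive2_query localhost 10000 scott tiger 'show tables;'
```

Note that passing the password as an argument exposes it in the process list, so this form is best kept for test setups.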
