Apache Ignite - Spark, Hadoop, and File System Documentation

Installing on Cloudera CDH

This article explains how to install the Apache Ignite Hadoop Accelerator on the Cloudera CDH distribution.

Installation consists of the following main steps:

  • Adding Ignite JARs to Hadoop classpath
  • Starting Ignite node(s)
  • Passing correct configuration to Hadoop

Please read the Apache Ignite Hadoop Accelerator documentation first to get a better understanding of the product's architecture.

Ignite

  1. Download the latest version of the Apache Ignite Hadoop Accelerator and unpack it to a directory of your choice.

  2. Set the IGNITE_HOME environment variable to the directory where you unpacked the Apache Ignite Hadoop Accelerator.

  3. Ensure that the following Hadoop environment variables are set and valid. The exports below assume that CDH is installed into the /usr/lib directory (see the note on parcel-based installations after this block):

export HADOOP_HOME=/usr/lib/hadoop/
export HADOOP_COMMON_HOME=/usr/lib/hadoop/
export HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs/ 
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce/
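
If CDH was installed via Cloudera Manager parcels rather than OS packages, Hadoop typically lives under the parcel directory instead of /usr/lib. The exports below are a sketch assuming the default parcel location; adjust the paths to your installation:

# Paths for a parcel-based CDH install (assumed default parcel location)
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/
export HADOOP_COMMON_HOME=/opt/cloudera/parcels/CDH/lib/hadoop/
export HADOOP_HDFS_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-hdfs/
export HADOOP_MAPRED_HOME=/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/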
  4. Configure a secondary file system if you want to cache data from HDFS. Open $IGNITE_HOME/config/default-config.xml, uncomment the secondaryFileSystem property, and set the correct HDFS URI:
<bean class="org.apache.ignite.configuration.FileSystemConfiguration">
  ...
  <property name="secondaryFileSystem">
    <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
      <property name="fileSystemFactory">
        <bean class="org.apache.ignite.hadoop.fs.CachingHadoopFileSystemFactory">
          <property name="uri" value="hdfs://your_hdfs_host:8020"/>
        </bean>
      </property>
    </bean>
  </property>
</bean>

You can also pass additional Hadoop configuration files to the file system factory if needed:

<bean class="org.apache.ignite.hadoop.fs.CachingHadoopFileSystemFactory">
  <property name="uri" value="hdfs://your_hdfs_host:9000"/>
  <property name="configPaths">
    <list>
      <value>/path/to/core-site.xml</value>
    </list>
  </property>
</bean>
  5. At this point, the Ignite node is ready to be started:
$IGNITE_HOME/bin/ignite.sh
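
If you keep the modified configuration under a different path instead of editing config/default-config.xml in place, you can pass that path to ignite.sh explicitly. The file name below is a hypothetical example:

# Start a node with an explicitly specified Spring configuration file
$IGNITE_HOME/bin/ignite.sh /path/to/my-ignite-config.xml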

CDH

  1. Ensure that the IGNITE_HOME environment variable is set and points to the directory where you unpacked the Apache Ignite Hadoop Accelerator.

  2. Copy or symlink the Ignite JARs to the Hadoop classpath. This is required to let Hadoop load Ignite classes at runtime (a quick verification check follows the commands below):

cd /usr/lib/hadoop/lib
ln -s $IGNITE_HOME/libs/ignite-core-[version].jar
ln -s $IGNITE_HOME/libs/ignite-shmem-1.0.0.jar
ln -s $IGNITE_HOME/libs/ignite-hadoop/ignite-hadoop-[version].jar
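
To verify that the links were created correctly, you can list the Hadoop lib directory used above and check that the Ignite JARs show up:

# Confirm the Ignite JARs (or symlinks) are present on the Hadoop classpath directory
ls -l /usr/lib/hadoop/lib | grep -i ignite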
  3. Create Hadoop configuration

Hadoop determines which file system and which job tracker to use based on its configuration files, core-site.xml and mapred-site.xml respectively.

The recommended way to set up this configuration is to create a separate directory, copy the existing core-site.xml and mapred-site.xml files into it, and then apply the necessary changes. For example:

mkdir ~/ignite_conf
cd ~/ignite_conf
cp /etc/hadoop/conf/core-site.xml .
cp /etc/hadoop/conf/mapred-site.xml .

If you want to use IGFS, add the IGFS file system class name mappings to core-site.xml:

<configuration>
  ...
  <property>
    <name>fs.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
  </property>
  <property>
    <name>fs.AbstractFileSystem.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
  </property> 
  ...
</configuration>
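
With these mappings in place, IGFS can be addressed with an explicit igfs:// URI. A minimal sketch, assuming the default file system name igfs and an Ignite node running on the local host:

# List the IGFS root using an explicit igfs:// URI
hadoop --config ~/ignite_conf fs -ls igfs://igfs@/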

If you want to use IGFS as the default file system (i.e. without the igfs:// prefix), set the fs.defaultFS property in core-site.xml:

<configuration>
  ...
  <property>
    <name>fs.defaultFS</name>
    <value>igfs://igfs@/</value>
  </property>
  ...
</configuration>
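
Once fs.defaultFS points to IGFS, plain paths resolve to IGFS, while HDFS remains reachable through an explicit hdfs:// URI. A small sketch (the host and port are assumptions matching the secondary file system example above):

hadoop --config ~/ignite_conf fs -ls /                            # resolves to the IGFS root
hadoop --config ~/ignite_conf fs -ls hdfs://your_hdfs_host:8020/  # still addresses HDFS directly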

If you want to use the Ignite Hadoop Accelerator for map-reduce jobs, point mapred-site.xml to the proper job tracker:

<configuration>
  ...
  <property>
    <name>mapreduce.framework.name</name>
    <value>ignite</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>[your_host]:11211</value>
  </property>
  ...
</configuration>

Alternatively, you can use the configuration files shipped with the Ignite distribution, located in the $IGNITE_HOME/config/hadoop directory.

Apache Ignite Hadoop Accelerator Usage

At this point, the installation is finished and you can start running map-reduce jobs or working with IGFS.

Query IGFS:

hadoop --config ~/ignite_conf fs -ls /
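
For example, you can copy a local file into IGFS and read it back. The file name below is a hypothetical example:

# Create a small local file, upload it to IGFS, and print its contents
echo "hello ignite" > /tmp/hello.txt
hadoop --config ~/ignite_conf fs -put /tmp/hello.txt /
hadoop --config ~/ignite_conf fs -cat /hello.txt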

Run a job with the default configuration:

hadoop --config ~/ignite_conf jar [your_job]
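
For instance, the word count example shipped with CDH can be submitted through the Ignite job tracker. The examples JAR location and the input/output paths below are assumptions; adjust them to your installation:

# Run the standard word count example against the Ignite map-reduce engine
hadoop --config ~/ignite_conf jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /input /output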