Apache Ignite - Spark, Hadoop, and File System Documentation

Hadoop FileSystem Cache

Delegate operations to another file system.

The Ignite Hadoop Accelerator includes an implementation of the IGFS secondary file system, IgniteHadoopIgfsSecondaryFileSystem, which enables read-through and write-through for any Hadoop FileSystem implementation.

To use the secondary file system, specify it in the IGFS configuration, either in Spring XML or in your Java source code:

XML:

<bean class="org.apache.ignite.configuration.FileSystemConfiguration">
  <property name="secondaryFileSystem">
    <bean class="org.apache.ignite.hadoop.fs.IgniteHadoopIgfsSecondaryFileSystem">
      <constructor-arg value="hdfs://myHdfs:9000"/>
    </bean>
  </property>
</bean>

Java:

FileSystemConfiguration fileSystemCfg = new FileSystemConfiguration();
IgniteHadoopIgfsSecondaryFileSystem hadoopFileSystem = new IgniteHadoopIgfsSecondaryFileSystem("hdfs://myHdfs:9000");
// Register the Hadoop-backed file system as the IGFS secondary file system.
fileSystemCfg.setSecondaryFileSystem(hadoopFileSystem);

By default, Hadoop libraries are not on the classpath when an Apache Ignite node starts. To use HDFS as a secondary file system, complete these steps first:

  1. Use the "Apache Ignite Hadoop Accelerator" edition of the Ignite distribution (pass -Dignite.edition=hadoop if you build the distribution yourself).

  2. Set the HADOOP_HOME environment variable before starting an Apache Ignite node if you use the Apache Hadoop distribution. If you use another Hadoop distribution (HDP, Cloudera, BigTop, etc.), make sure that the /etc/default/hadoop file exists and has appropriate content.

See the Ignite installation guide for your Hadoop distribution for details.
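As a quick sanity check before starting a node, the second step can be sketched in plain Java. This is a minimal illustration, not part of Ignite: the class name HadoopEnvCheck and the method describe are hypothetical, and the check only looks at whether HADOOP_HOME is set or the /etc/default/hadoop file exists. It does not validate the file's content.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not an Ignite API): reports which of the two
// Hadoop location hints described above is available on this machine.
public class HadoopEnvCheck {

    // hadoopHome: value of the HADOOP_HOME variable, or null if unset.
    // defaultsFile: path to the distribution defaults file.
    public static String describe(String hadoopHome, Path defaultsFile) {
        if (hadoopHome != null && !hadoopHome.trim().isEmpty()) {
            return "HADOOP_HOME=" + hadoopHome;
        }
        if (Files.isRegularFile(defaultsFile)) {
            return "defaults file: " + defaultsFile;
        }
        return "not configured";
    }

    public static void main(String[] args) {
        String result = describe(System.getenv("HADOOP_HOME"),
                                 Paths.get("/etc/default/hadoop"));
        System.out.println(result);
    }
}
```

If the program prints "not configured", neither hint is present and the node would most likely start without Hadoop libraries on its classpath.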

Alternatively, you can manually add the necessary Hadoop dependencies to the Ignite node classpath: these are the dependencies with groupId "org.apache.hadoop" listed in modules/hadoop/pom.xml. Currently they are:

  • hadoop-annotations
  • hadoop-auth
  • hadoop-common
  • hadoop-hdfs
  • hadoop-mapreduce-client-common
  • hadoop-mapreduce-client-core
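When the dependencies are added manually, it can help to confirm that they are actually visible to the JVM. The following is a rough sketch in plain Java, assuming one representative class per artifact. The class HadoopClasspathCheck is hypothetical, and the artifact-to-class mapping is an assumption based on Hadoop 2.x packaging (for example, DistributedFileSystem may ship in a different artifact in other Hadoop versions).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not an Ignite API): probes the classpath for one
// representative class from some of the Hadoop artifacts listed above.
public class HadoopClasspathCheck {

    // Returns true if the named class can be located without initializing it.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false,
                          HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed mapping of artifact to a class it contains (Hadoop 2.x).
        Map<String, String> probes = new LinkedHashMap<>();
        probes.put("hadoop-common", "org.apache.hadoop.fs.FileSystem");
        probes.put("hadoop-hdfs", "org.apache.hadoop.hdfs.DistributedFileSystem");
        probes.put("hadoop-mapreduce-client-core", "org.apache.hadoop.mapreduce.Job");

        for (Map.Entry<String, String> probe : probes.entrySet()) {
            String status = isOnClasspath(probe.getValue()) ? "found" : "MISSING";
            System.out.println(probe.getKey() + ": " + status);
        }
    }
}
```

Run it with the same classpath the Ignite node uses. Any artifact reported as MISSING would need to be added before HDFS can serve as the secondary file system.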
