Apache Ignite - Spark, Hadoop, and File System Documentation

The apacheignite-fs Developer Hub

Welcome to the apacheignite-fs developer hub. You'll find comprehensive guides and documentation to help you start working with apacheignite-fs as quickly as possible, as well as support if you get stuck. Let's jump right in!

Apache Ignite integrates with both Hadoop and Spark. The Ignite-Hadoop integration lets you use the Ignite File System as a primary caching layer for HDFS data, while the Ignite-Spark integration lets you share state in memory across multiple Spark jobs through implementations of the Spark RDD and DataFrame abstractions.

Ignite for Spark

Apache Ignite is a distributed memory-centric database and caching platform that is used by Apache Spark users to:

  • Achieve true in-memory performance at scale and avoid data movement from a data source to Spark workers and applications.
  • Boost DataFrame and SQL performance.
  • Share state and data among Spark jobs more easily.

In-Memory File System

One of the unique capabilities of Ignite is a distributed in-memory file system called the Ignite File System (IGFS). IGFS delivers functionality similar to Hadoop HDFS, but operates only in memory. In addition to its own APIs, IGFS implements the Hadoop FileSystem API and can be transparently plugged into Hadoop or Spark deployments.

In-Memory MapReduce

Ignite In-Memory MapReduce lets you efficiently parallelize the processing of data stored in any Hadoop file system. It eliminates the overhead associated with the job tracker and task trackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing.

Hadoop Accelerator

Apache Ignite Hadoop Accelerator provides a set of components for in-memory Hadoop job execution and file system operations. It can be used in combination with the Ignite File System and In-Memory MapReduce, and can be plugged into any Hadoop distribution.
