Apache Ignite integrates with both Hadoop and Spark. The Ignite-Hadoop integration lets you use the Ignite File System (IGFS) as a primary caching layer for HDFS data, while the Ignite-Spark integration lets you share in-memory state across multiple Spark jobs through implementations of the Spark RDD and DataFrame APIs.
Ignite for Spark
Apache Ignite is a distributed memory-centric database and caching platform that is used by Apache Spark users to:
- Achieve true in-memory performance at scale and avoid data movement from a data source to Spark workers and applications.
- Boost DataFrame and SQL performance.
- More easily share state and data among Spark jobs.
In-Memory File System
One of the unique capabilities of Ignite is a distributed in-memory file system called Ignite File System (IGFS). IGFS delivers functionality similar to Hadoop HDFS, but entirely in memory. In addition to its own APIs, IGFS implements the Hadoop FileSystem API and can be transparently plugged into Hadoop or Spark deployments.
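As a rough illustration of what "plugging in" can look like, the fragment below sketches a Hadoop core-site.xml that registers IGFS for the igfs:// URI scheme. It assumes the Ignite Hadoop Accelerator jars are on the Hadoop classpath and that an Ignite node with IGFS enabled is running. The property and class names follow the Ignite Hadoop Accelerator documentation as best recalled, so verify them against the Ignite version you deploy.

```xml
<!-- core-site.xml: a sketch, not a complete configuration. -->
<!-- Assumption: class names match your Ignite Hadoop Accelerator version. -->
<configuration>
  <!-- Maps the igfs:// scheme to Ignite's Hadoop FileSystem (v1 API) implementation. -->
  <property>
    <name>fs.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
  </property>
  <!-- Maps the igfs:// scheme to the AbstractFileSystem (v2 API) implementation. -->
  <property>
    <name>fs.AbstractFileSystem.igfs.impl</name>
    <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
  </property>
</configuration>
```

With a mapping like this in place, Hadoop or Spark jobs that reference igfs:// paths would go through IGFS without changes to the job code itself.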
In-Memory MapReduce
Ignite In-Memory MapReduce lets you effectively parallelize the processing of data stored in any Hadoop file system. It eliminates the overhead associated with the job tracker and task trackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing.
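To make the replacement of the standard job tracker more concrete, the fragment below sketches a mapred-site.xml that points Hadoop job submission at an Ignite node. It assumes the Ignite Hadoop Accelerator is installed and a node is listening on the local machine. The address and port shown are illustrative assumptions, so substitute the endpoint of your own cluster and check the property values against your Ignite version.

```xml
<!-- mapred-site.xml: a sketch, not a complete configuration. -->
<configuration>
  <!-- Selects Ignite as the MapReduce execution framework. -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>ignite</value>
  </property>
  <!-- Assumption: an Ignite node accepts job submissions at this host and port. -->
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>127.0.0.1:11211</value>
  </property>
</configuration>
```

Existing MapReduce jobs would then be submitted with the usual Hadoop tooling, with Ignite handling scheduling and execution.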



