Apache Ignite integrates with both Hadoop and Spark. The Ignite-Hadoop integration lets you use the Ignite File System as a primary caching layer for HDFS data, while the Ignite-Spark integration lets you share in-memory state across multiple Spark jobs through Ignite's implementations of the Spark RDD and DataFrame abstractions.
Apache Ignite is a distributed memory-centric database and caching platform that is used by Apache Spark users to:
- Achieve true in-memory performance at scale and avoid data movement from a data source to Spark workers and applications.
- Boost DataFrame and SQL performance.
- More easily share state and data among Spark jobs.
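The idea of sharing state among jobs can be illustrated with a minimal, library-free sketch. It does not use Ignite or Spark APIs: `SharedStore`, `load_orders`, `job_total`, and `job_max` are hypothetical names invented for this illustration, and a plain dictionary stands in for the distributed in-memory cache. The point is only that the second job reuses data the first job already loaded instead of going back to the data source.

```python
from typing import Callable, Dict, List


class SharedStore:
    """Hypothetical stand-in for a shared in-memory cache (not an Ignite API)."""

    def __init__(self) -> None:
        self._data: Dict[str, List[int]] = {}
        self.source_reads = 0  # counts trips to the slow data source

    def get_or_load(self, key: str, loader: Callable[[], List[int]]) -> List[int]:
        # Load from the source only on the first request for this key.
        if key not in self._data:
            self.source_reads += 1
            self._data[key] = loader()
        return self._data[key]


def load_orders() -> List[int]:
    # Assumption: pretend this is an expensive read from an external source.
    return [120, 80, 200, 40]


def job_total(store: SharedStore) -> int:
    return sum(store.get_or_load("orders", load_orders))


def job_max(store: SharedStore) -> int:
    return max(store.get_or_load("orders", load_orders))


store = SharedStore()
total = job_total(store)   # first job triggers the only source read
largest = job_max(store)   # second job reuses the in-memory copy
```

After both jobs run, `store.source_reads` is still 1, which is the behavior the bullet points describe at a much larger scale.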
One of the unique capabilities of Ignite is a distributed in-memory file system called the Ignite File System (IGFS). IGFS delivers functionality similar to Hadoop HDFS, but keeps the data in memory. In addition to its own APIs, IGFS implements the Hadoop FileSystem API, so it can be transparently plugged into Hadoop or Spark deployments.
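Why implementing a common file-system API makes a store pluggable can be sketched as follows. This is a conceptual illustration only, not the Hadoop FileSystem API or IGFS: the `FileSystem` interface, `InMemoryFileSystem`, and `count_lines` are hypothetical names, and the interface is far smaller than the real one.

```python
from abc import ABC, abstractmethod
from typing import Dict


class FileSystem(ABC):
    """Hypothetical minimal file-system interface (not the Hadoop API)."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def exists(self, path: str) -> bool: ...


class InMemoryFileSystem(FileSystem):
    """Keeps every file in a dictionary; nothing touches disk."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def read(self, path: str) -> bytes:
        if path not in self._files:
            raise FileNotFoundError(path)
        return self._files[path]

    def exists(self, path: str) -> bool:
        return path in self._files


def count_lines(fs: FileSystem, path: str) -> int:
    # Client code depends only on the interface, so any implementation fits.
    return len(fs.read(path).splitlines())


fs = InMemoryFileSystem()
fs.write("/logs/app.log", b"start\nrunning\nstop\n")
line_count = count_lines(fs, "/logs/app.log")
```

Because `count_lines` is written against the interface, a disk-backed implementation could be swapped for the in-memory one without changing the caller, which is the sense in which an API-compatible file system is "transparent" to existing jobs.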
Ignite In-Memory MapReduce lets you efficiently parallelize the processing of data stored in any Hadoop file system. It eliminates the overhead associated with the job tracker and task trackers in a standard Hadoop architecture while providing low-latency, HPC-style distributed processing.
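For readers unfamiliar with the pattern, the map and reduce phases can be sketched in a few lines of plain Python. This is not Ignite or Hadoop code: `map_split`, `merge_counts`, and `word_count` are hypothetical names, a local thread pool stands in for cluster nodes, and the sketch shows only the structure of the computation, not its performance characteristics.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Dict, List


def map_split(split: str) -> Counter:
    # Map phase: count the words in one input split.
    return Counter(split.split())


def merge_counts(left: Counter, right: Counter) -> Counter:
    # Reduce phase: merge two partial counts into one.
    return left + right


def word_count(splits: List[str], workers: int = 4) -> Dict[str, int]:
    # Run the map phase on a thread pool, then fold the partial results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_split, splits))
    return dict(reduce(merge_counts, partials, Counter()))


counts = word_count(["ignite spark", "spark hadoop", "ignite ignite"])
```

Here the work is scheduled directly by the calling process, with no separate coordinator service between submitting the job and running the map tasks; that is a loose analogy for the tracker overhead the paragraph above says is removed.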
The Apache Ignite Hadoop Accelerator provides a set of components for in-memory Hadoop job execution and file system operations. It can be used in combination with the Ignite File System and In-Memory MapReduce, and can be plugged into any Hadoop distribution.