Understanding Apache Spark

Apache Spark is a distributed computing framework, often deployed alongside Apache Hadoop, that provides a wider range of functionality than traditional MapReduce. Its organizing concept is the DataFrame: a Dataset (a distributed collection of data) organized into named columns. A DataFrame is similar in concept to a relational database table, but it comes with built-in operations such as selecting, filtering, grouping, and aggregating.