- Hadoop is an ecosystem of open source components that fundamentally changes the way enterprises store, process, and analyze data.
- Hadoop is available either open-source through the Apache distribution, or through vendors such as Cloudera (the largest Hadoop vendor by size and scope), MapR, or HortonWorks.
- Initial release April 1, 2006;
- Version Series 3.3.x (i.e. 3.3.0 etc) was released on 14 July 2020
- The term Hadoop is often used for both base modules and sub-modules and also the ecosystem, or collection of additional software packages that can be installed on top of or alongside Hadoop, such as Apache Pig, Apache Hive, Apache HBase, Apache Phoenix, Apache Spark, Apache ZooKeeper, Cloudera Impala, Apache Flume, Apache Sqoop, Apache Oozie, and Apache Storm.
Spark: Spark has been found to run 100 times faster in-memory, and 10 times faster on disk. It's also been used to sort 100 TB of data 3 times faster than Hadoop MapReduce on one-tenth of the machines. Spark has particularly been found to be faster on machine learning applications, such as Naive Bayes and k-means.(https://logz.io/blog/hadoop-vs-spark/).