Apache Mahout
Developer(s) | Apache Software Foundation |
---|---|
Initial release | 7 April 2009[1] |
Stable release | 14.1 / 7 October 2020[2] |
Repository | Mahout Repository |
Written in | Java, Scala |
Operating system | Cross-platform |
Type | Machine Learning |
License | Apache License 2.0 |
Website | mahout |
Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms focused primarily on linear algebra. In the past, many of the implementations used the Apache Hadoop platform; today the project focuses primarily on Apache Spark.[3][4] Mahout also provides Java/Scala libraries for common maths operations (focused on linear algebra and statistics) and primitive Java collections. Mahout is a work in progress; a number of algorithms have been implemented.[5]
Features
- The Mahout-Samsara Scala Domain Specific Language (DSL)
- In August 2016, a framework was developed to provide a "Python-like" interface for users who wanted prepackaged algorithms (see MAHOUT PR 246).
- In October 2016, work began on hardware acceleration, specifically using ViennaCL (for GPU-based acceleration) and OpenMP (for CPU-based acceleration outside the JVM); see MAHOUT PR 261.
History
Transition from MapReduce to Apache Spark
While Mahout's core algorithms for clustering, classification and batch-based collaborative filtering were implemented on top of Apache Hadoop using the MapReduce paradigm, the project did not restrict contributions to Hadoop-based implementations. Contributions that ran on a single node or on a non-Hadoop cluster were also welcomed. For example, the 'Taste' collaborative-filtering recommender component of Mahout was originally a separate project and can run stand-alone without Hadoop.
Starting with release 0.10.0, the project shifted its focus to building a backend-independent programming environment, code-named "Samsara".[6][7][8] The environment consists of a backend-independent algebraic optimizer and an algebraic Scala DSL that unifies in-memory and distributed algebraic operators. Supported algebraic platforms are Apache Spark, H2O, and Apache Flink.[citation needed] Support for MapReduce algorithms started being gradually phased out in 2014.[9]
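The idea of a backend-independent algebraic optimizer can be illustrated with a toy sketch. The Java code below is not Mahout's API and does not use the Samsara DSL; every class and method name in it (`AlgebraSketch`, `Leaf`, `Times`, `AtA`, `optimize`, `evalInMemory`) is hypothetical. It assumes only the general pattern described above: an expression is first built as a logical plan, a rewrite rule replaces a recognizable pattern (here, a transposed matrix multiplied by the same matrix) with a fused operator, and a separate backend then evaluates the plan.

```java
import java.util.Arrays;

// Toy illustration only: none of these names come from Apache Mahout.
public class AlgebraSketch {
    // Logical plan nodes describe a computation without running it.
    interface Expr {}

    static final class Leaf implements Expr {
        final double[][] data;
        Leaf(double[][] data) { this.data = data; }
    }

    static final class Transpose implements Expr {
        final Expr arg;
        Transpose(Expr arg) { this.arg = arg; }
    }

    static final class Times implements Expr {
        final Expr left, right;
        Times(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // Fused operator created only by the optimizer; it stands for A'A.
    static final class AtA implements Expr {
        final Expr arg;
        AtA(Expr arg) { this.arg = arg; }
    }

    // Rewrite rule: Times(Transpose(x), x) becomes AtA(x).
    // Operands are compared by identity, so only a shared sub-expression matches.
    static Expr optimize(Expr e) {
        if (e instanceof Transpose) {
            return new Transpose(optimize(((Transpose) e).arg));
        }
        if (e instanceof Times) {
            Expr l = optimize(((Times) e).left);
            Expr r = optimize(((Times) e).right);
            if (l instanceof Transpose && ((Transpose) l).arg == r) {
                return new AtA(r);
            }
            return new Times(l, r);
        }
        return e; // Leaf and AtA nodes are left unchanged
    }

    // One possible backend: plain in-memory evaluation on double[][].
    // A distributed backend would interpret the same plan differently.
    static double[][] evalInMemory(Expr e) {
        if (e instanceof Leaf) return ((Leaf) e).data;
        if (e instanceof Transpose) return transpose(evalInMemory(((Transpose) e).arg));
        if (e instanceof Times) {
            Times t = (Times) e;
            return multiply(evalInMemory(t.left), evalInMemory(t.right));
        }
        if (e instanceof AtA) return gram(evalInMemory(((AtA) e).arg));
        throw new IllegalArgumentException("unknown node");
    }

    static double[][] transpose(double[][] a) {
        double[][] t = new double[a[0].length][a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[0].length; j++)
                t[j][i] = a[i][j];
        return t;
    }

    static double[][] multiply(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int k = 0; k < b.length; k++)
                for (int j = 0; j < b[0].length; j++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    // A'A accumulated row by row as a sum of outer products;
    // the transpose is never materialized.
    static double[][] gram(double[][] a) {
        int n = a[0].length;
        double[][] g = new double[n][n];
        for (double[] row : a)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    g[i][j] += row[i] * row[j];
        return g;
    }

    public static void main(String[] args) {
        Leaf a = new Leaf(new double[][] {{1, 2}, {3, 4}, {5, 6}});
        Expr plan = new Times(new Transpose(a), a); // logical plan for A'A
        Expr optimized = optimize(plan);
        System.out.println(optimized.getClass().getSimpleName());
        System.out.println(Arrays.deepToString(evalInMemory(optimized)));
    }
}
```

In this sketch the caller writes the plan once, and the choice of how to execute it is deferred to the evaluator. Keeping the plan separate from its execution is what would let the same expression run on different engines, which is the property the paragraph above attributes to Samsara.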
Release History
Version | Release Date | Notes |
---|---|---|
0.1 | 2009-04-07 | |
0.2 | 2009-11-18 | |
0.3 | 2010-03-17 | |
0.4 | 2010-10-31 | |
0.5 | 2011-05-27 | |
0.6 | 2012-02-06 | |
0.7 | 2012-05-16 | |
0.8 | 2013-07-25 | |
0.9 | 2014-02-01 | |
0.10.0 | 2015-04-11 | Samsara DSL |
0.10.1 | 2015-05-31 | |
0.10.2 | 2015-08-06 | |
0.11.0 | 2015-08-07 | |
0.11.1 | 2015-11-06 | |
0.11.2 | 2016-03-11 | |
0.12.0 | 2016-04-11 | Added Apache Flink engine |
0.12.1 | 2016-05-19 | |
0.12.2 | 2016-06-13 | |
0.13.0 | 2018-05-04 | |
0.14.0 | 2019-03-07 | Source only (no binaries) |
14.1 | 2020-10-07 | |
References
- "Apache Mahout: First release 0.1 released".
- "Apache Mahout: Scalable machine learning and data mining". Retrieved 6 March 2019.
- "Introducing Apache Mahout". ibm.com. 2011. Retrieved 13 September 2011.
- "InfoQ: Apache Mahout: Highly Scalable Machine Learning Algorithms". infoq.com. 2011. Retrieved 13 September 2011.
- "Algorithms - Apache Mahout - Apache Software Foundation". cwiki.apache.org. 2011. Retrieved 13 September 2011.
- "Mahout-Samsara's In-Core Linear Algebra DSL Reference".
- "Mahout-Samsara's Distributed Linear Algebra DSL Reference".
- "Mahout 0.10.x: first Mahout release as a programming environment". www.weatheringthroughtechdays.com. Archived from the original on 9 October 2016. Retrieved 29 February 2016.
- "MAHOUT-1510 ("Good-bye MapReduce")".