Apache Mahout

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 72.219.92.140 (talk) at 07:32, 31 October 2020 (GPU/CPU accelerators: add viennacl reference as url (apalumbo)). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
Apache Mahout
Developer(s): Apache Software Foundation
Initial release: 7 April 2009[1]
Stable release: 14.1 / 7 October 2020[2]
Repository: Mahout Repository
Written in: Java, Scala
Operating system: Cross-platform
Type: Machine learning
License: Apache License 2.0
Website: mahout.apache.org

Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms, focused primarily on linear algebra. In the past, many of the implementations used the Apache Hadoop platform; today the project focuses primarily on Apache Spark.[3][4] Mahout also provides Java/Scala libraries for common maths operations (focused on linear algebra and statistics) and primitive Java collections. Mahout is a work in progress; a number of algorithms have been implemented.[5]

Features

Samsara

Apache Mahout-Samsara refers to a Scala domain-specific language (DSL) that lets users write in an R-like syntax rather than traditional Scala-like syntax. This allows users to express algorithms concisely and clearly.


val G = B %*% B.t - C - C.t + (ksi dot ksi) * (s_q cross s_q)
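
To illustrate how an expression like the one above can be valid Scala, here is a minimal, self-contained sketch of R-like operators on toy types. It does not use Mahout: `RLikeSketch`, `Mat`, `Vec` and `ScalarOps` are hypothetical names invented for this example, and the real library's types and implicits may differ.

```scala
object RLikeSketch {
  // Toy dense matrix stored as rows; hypothetical, NOT Mahout's matrix type.
  final case class Mat(rows: Vector[Vector[Double]]) {
    // Transpose, spelled `.t` as in the expression above.
    def t: Mat = Mat(rows.transpose)

    // R-like matrix product. A method name starting with `%` has the same
    // precedence as `*`, so it binds tighter than `+` and `-`.
    def %*%(that: Mat): Mat = {
      val cols = that.rows.transpose
      Mat(rows.map(r => cols.map(c => r.zip(c).map { case (a, b) => a * b }.sum)))
    }

    // Element-wise helper shared by + and - (assumes equal shapes).
    private def zipWith(that: Mat)(f: (Double, Double) => Double): Mat =
      Mat(rows.zip(that.rows).map { case (r, s) => r.zip(s).map { case (a, b) => f(a, b) } })

    def +(that: Mat): Mat = zipWith(that)(_ + _)
    def -(that: Mat): Mat = zipWith(that)(_ - _)
  }

  // Toy vector with a dot product and an outer ("cross") product.
  final case class Vec(values: Vector[Double]) {
    def dot(that: Vec): Double = values.zip(that.values).map { case (a, b) => a * b }.sum
    def cross(that: Vec): Mat = Mat(values.map(a => that.values.map(b => a * b)))
  }

  // Lets a Double appear on the left of `*` with a matrix on the right.
  implicit class ScalarOps(x: Double) {
    def *(m: Mat): Mat = Mat(m.rows.map(_.map(x * _)))
  }

  // Evaluates the expression from the text on small made-up inputs.
  def example: Mat = {
    val B = Mat(Vector(Vector(1.0, 2.0), Vector(3.0, 4.0)))
    val C = Mat(Vector(Vector(0.0, 1.0), Vector(2.0, 0.0)))
    val ksi = Vec(Vector(1.0, 1.0))
    val s_q = Vec(Vector(1.0, 0.0))
    B %*% B.t - C - C.t + (ksi dot ksi) * (s_q cross s_q)
  }
}
```

Calling `RLikeSketch.example` evaluates the same expression on 2×2 inputs. The sketch only shows that symbolic method names and infix notation are enough to get R-like syntax; it says nothing about how a distributed engine would execute it.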


Backend Agnostic

Apache Mahout’s code abstracts the domain-specific language from the engine on which the code runs. While active development is done with the Apache Spark engine, users are free to implement any engine they choose; H2O and Apache Flink engines have been implemented in the past, and examples exist in the code base.
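
The separation described above can be sketched as a logical expression tree that is evaluated by an interchangeable engine. This is an illustration only: `BackendSketch`, `Expr`, `Engine` and `InCoreEngine` are hypothetical names, and Mahout's actual engine interfaces are not reproduced here.

```scala
object BackendSketch {
  // Logical operators that DSL code would build; hypothetical names.
  sealed trait Expr
  final case class Lit(rows: Vector[Vector[Double]]) extends Expr
  final case class Transpose(a: Expr) extends Expr
  final case class Plus(a: Expr, b: Expr) extends Expr

  // An engine turns a logical expression into a concrete result.
  trait Engine {
    def name: String
    def eval(e: Expr): Vector[Vector[Double]]
  }

  // Single-machine engine; a distributed engine would implement the same trait.
  object InCoreEngine extends Engine {
    val name = "in-core"
    def eval(e: Expr): Vector[Vector[Double]] = e match {
      case Lit(rows)    => rows
      case Transpose(a) => eval(a).transpose
      case Plus(a, b) =>
        eval(a).zip(eval(b)).map { case (r, s) => r.zip(s).map { case (x, y) => x + y } }
    }
  }

  // "User code": written once against Expr, independent of the engine.
  def symmetrize(a: Expr): Expr = Plus(a, Transpose(a))

  def run(engine: Engine): Vector[Vector[Double]] =
    engine.eval(symmetrize(Lit(Vector(Vector(1.0, 2.0), Vector(3.0, 4.0)))))
}
```

Because `symmetrize` only constructs a description of the computation, swapping engines means passing a different `Engine` to `run` without touching the algorithm.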

GPU/CPU accelerators

Numerical computation on the JVM is often slow compared with native code. To address this, “native solvers” were added, which move in-core and, by extension, distributed BLAS operations out of the JVM onto either the CPU or GPUs via the ViennaCL library, as described in "Extending Mahout Samsara to GPU Clusters".[6]

Recommenders

Apache Mahout features implementations of Alternating Least Squares, Co-Occurrence, and Correlated Co-Occurrence. The last is a recommender algorithm unique to Mahout that extends co-occurrence to multiple dimensions of data.
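
As a rough illustration of the basic co-occurrence idea only, the sketch below counts how often two items appear in the same user's history. It is not Mahout's implementation: `CooccurrenceSketch` and its input format are assumptions made for this example, it works on raw counts without any weighting, and it does not cover the correlated (multi-dimensional) variant.

```scala
object CooccurrenceSketch {
  // userItems: one set of item ids per user (toy input format, assumed here).
  // Returns, for each ordered pair of distinct items, the number of users
  // who interacted with both.
  def cooccurrence(userItems: Seq[Set[String]]): Map[(String, String), Int] =
    userItems
      .flatMap { items =>
        for { a <- items.toSeq; b <- items.toSeq if a != b } yield (a, b)
      }
      .groupBy(identity)
      .map { case (pair, occurrences) => pair -> occurrences.size }
}
```

Items that frequently co-occur with something a user has already interacted with become candidate recommendations for that user.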

History

Transition from MapReduce to Apache Spark

While Mahout's core algorithms for clustering, classification and batch-based collaborative filtering were implemented on top of Apache Hadoop using the MapReduce paradigm, the project did not restrict contributions to Hadoop-based implementations. Contributions that run on a single node or on a non-Hadoop cluster were also welcomed. For example, the 'Taste' collaborative-filtering recommender component of Mahout was originally a separate project and can run stand-alone without Hadoop.

Starting with release 0.10.0, the project shifted its focus to building a backend-independent programming environment, codenamed "Samsara".[7][8][9] The environment consists of an algebraic backend-independent optimizer and an algebraic Scala DSL unifying in-memory and distributed algebraic operators. Supported algebraic platforms are Apache Spark, H2O, and Apache Flink.[citation needed] Support for MapReduce algorithms started being gradually phased out in 2014.[10]


Release History

Version Release Date Notes
0.1 2009-04-07
0.2 2009-11-18
0.3 2010-03-17
0.4 2010-10-31
0.5 2011-05-27
0.6 2012-02-06
0.7 2012-05-16
0.8 2013-07-25
0.9 2014-02-01
0.10.0 2015-04-11 Samsara DSL
0.10.1 2015-05-31
0.10.2 2015-08-06
0.11.0 2015-08-07
0.11.1 2015-11-06
0.11.2 2016-03-11
0.12.0 2016-04-11 Added Apache Flink engine
0.12.1 2016-05-19
0.12.2 2016-06-13
0.13.0 2018-05-04
0.14.0 2019-03-07 Source only (no binaries)
14.1 2020-10-07



References