User:Fularp/sandbox

From Wikipedia, the free encyclopedia

SEMMA is an acronym that stands for Sample, Explore, Modify, Model and Assess. It is a list of sequential steps developed by SAS Institute Inc., one of the largest producers of business intelligence software, and is intended to guide the implementation of data mining applications.[1] Although SEMMA is often considered a general data mining methodology, SAS claims that it is rather a logical organisation of the functional tool set of one of its products, SAS Enterprise Miner, for carrying out the core tasks of data mining.[2]

Background

In the expanding field of data mining, there has been a call for a standard, a methodology or simply a list of best practices for the diversified and iterative process of data mining, one that users can apply to their data mining projects regardless of industry. While the Cross Industry Standard Process for Data Mining (CRISP-DM), developed under the European Strategic Program on Research in Information Technology initiative, aimed to create a neutral methodology, SAS also offered a pattern to follow in its data mining tools.

Phases of SEMMA

The phases of SEMMA and related tasks are the following:[2]

  • Sample. The process starts with data sampling, i.e., selecting the data set for modeling. The data set should be large enough to contain the information that matters, yet small enough to be processed efficiently. This phase also deals with data partitioning.
  • Explore. This phase covers understanding the data by discovering anticipated and unanticipated relationships between the variables, as well as abnormalities, with the help of data visualization.
  • Modify. The Modify phase contains methods to select, create and transform variables in preparation for data modeling.
  • Model. In the Model phase the focus is on applying various modeling and data mining techniques to the prepared variables in order to create models that may provide the desired outcome.
  • Assess. The last phase is Assess. Evaluating the modeling results shows the reliability and usefulness of the created models.
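
The sequence of phases can be illustrated with a minimal Python sketch. It is not SAS Enterprise Miner code and does not reproduce any SAS tool: the function names, the synthetic data set and the choice of a one-variable least-squares model are all assumptions made purely for illustration.

```python
import random
import statistics


def sample(rows, n, train_fraction=0.7, seed=0):
    """Sample: draw n rows at random, then partition them into
    a training set and a validation set."""
    rng = random.Random(seed)
    drawn = rng.sample(rows, n)
    cut = int(n * train_fraction)
    return drawn[:cut], drawn[cut:]


def explore(rows):
    """Explore: summarise the input variable (a stand-in for the
    visual inspection described above)."""
    xs = [x for x, _ in rows]
    return {
        "x_mean": statistics.mean(xs),
        "x_stdev": statistics.stdev(xs),
        "x_min": min(xs),
        "x_max": max(xs),
    }


def modify(rows, x_mean, x_stdev):
    """Modify: standardise the input variable using training-set statistics."""
    return [((x - x_mean) / x_stdev, y) for x, y in rows]


def model(rows):
    """Model: fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / sxx
    return slope, my - slope * mx


def assess(rows, slope, intercept):
    """Assess: mean squared error of the fitted line on held-out rows."""
    errors = [(y - (slope * x + intercept)) ** 2 for x, y in rows]
    return statistics.mean(errors)


def run_semma(rows, n):
    """Run the five phases in order on (x, y) pairs (hypothetical helper)."""
    train, valid = sample(rows, n)
    summary = explore(train)
    train_m = modify(train, summary["x_mean"], summary["x_stdev"])
    valid_m = modify(valid, summary["x_mean"], summary["x_stdev"])
    slope, intercept = model(train_m)
    return {
        "summary": summary,
        "slope": slope,
        "intercept": intercept,
        "validation_mse": assess(valid_m, slope, intercept),
    }


# Synthetic data (an assumption): y = 3x + 2 plus Gaussian noise.
noise = random.Random(1)
data = [(float(x), 3.0 * x + 2.0 + noise.gauss(0, 1)) for x in range(200)]
result = run_semma(data, n=100)
print(result["validation_mse"])
```

The validation set is transformed with statistics taken from the training set only, so the Assess step measures the model on data it has not seen. A real project would typically loop back from Assess to earlier phases rather than run them once.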

Criticism

SEMMA mainly focuses on the statistical, modeling and data manipulation tasks of data mining projects, leaving the business aspects out (unlike, e.g., CRISP-DM and its Business Understanding phase). Additionally, SEMMA is designed to work with the SAS Enterprise Miner software, so how to apply it outside that system can be ambiguous.[3]

References

  1. Azevedo, A. and Santos, M. F. KDD, SEMMA and CRISP-DM: a parallel overview. In Proceedings of the IADIS European Conference on Data Mining 2008, pp. 182-185.
  2. SAS Enterprise Miner website.
  3. Rohanizadeh, S. S. and Moghadam, M. B. A Proposed Data Mining Methodology and its Application to Industrial Procedures. Journal of Industrial Engineering 4 (2009), pp. 37-50.

Category:Applied data mining