User:Fularp/sandbox

SEMMA is an acronym that stands for Sample, Explore, Modify, Model and Assess. It is a list of sequential steps, developed by SAS Institute Inc., one of the largest producers of business intelligence software,^[1] that is intended to guide the implementation of data mining applications. Although SEMMA is often considered a general data mining methodology, SAS claims that it is rather a logical organisation of the functional tool set of one of its products, SAS Enterprise Miner, for carrying out the core tasks of data mining.^[2]
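The sequential character of the five steps can be illustrated with a minimal sketch. It is not SAS Enterprise Miner code and does not use any SAS API: the function names, the toy data set, the centring transformation and the least-squares line are all assumptions chosen for illustration, and the reading of each step given in the comments is only one plausible interpretation of the acronym.

```python
import random
import statistics


def sample(rows, fraction, seed=0):
    """Sample: draw a random subset of the available rows."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


def explore(rows):
    """Explore: summarise the variables before changing anything."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return {"n": len(rows),
            "mean_x": statistics.mean(xs),
            "mean_y": statistics.mean(ys)}


def modify(rows, summary):
    """Modify: transform variables (here: centre x on its mean)."""
    return [(x - summary["mean_x"], y) for x, y in rows]


def model(rows):
    """Model: fit a least-squares line y = a + b * x."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def assess(rows, params):
    """Assess: mean absolute error of the fitted line.

    For brevity the error is computed on the same rows used for
    fitting; a real assessment would use held-out data.
    """
    a, b = params
    return statistics.mean(abs(y - (a + b * x)) for x, y in rows)


def run_semma(rows, fraction=0.5, seed=0):
    """Run the five steps in order, each consuming the previous output."""
    subset = sample(rows, fraction, seed)
    summary = explore(subset)
    prepared = modify(subset, summary)
    params = model(prepared)
    error = assess(prepared, params)
    return summary, params, error


# Hypothetical toy data: a noiseless line y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
summary, (intercept, slope), error = run_semma(data)
```

Each function takes the output of the previous one, which mirrors the description of SEMMA as an ordered list of steps rather than a set of independent tools.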

Major phases

CRISP-DM breaks the process of data mining into six major phases^[3]:

Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment

History

CRISP-DM was conceived in 1996. In 1997 it got underway as a European Union project under the ESPRIT funding initiative. The project was led by four companies: SPSS, Teradata, Daimler AG and OHRA.

This core consortium brought different experiences to the project. ISL was later acquired and merged into SPSS Inc. The computer giant NCR Corporation produced the Teradata data warehouse and its own data mining software. Daimler-Benz had a significant data mining team. OHRA, an insurance company, was just starting to explore the potential use of data mining.

The first version of the methodology was released as CRISP-DM 1.0 in 1999.

CRISP-DM 2.0

In July 2006 the consortium announced that it would begin work towards a second version of CRISP-DM. On 26 September 2006, the CRISP-DM SIG met to discuss potential enhancements for CRISP-DM 2.0 and the subsequent roadmap. However, these efforts appear to have stalled. The SIG has not met, updated the CRISP website, or communicated anything to members since early 2007. As of 22 June 2011, the website redirects to an IBM page about SPSS.

Advantages

Industry neutral
Tool neutral
Closely related to the Knowledge Discovery in Databases Process Model
Anchors the data mining process

References

^ Azevedo, A.; Santos, M. F. "KDD, SEMMA and CRISP-DM: a parallel overview". In Proceedings of the IADIS European Conference on Data Mining 2008, pp. 182–185.
^ SAS Enterprise Miner website
^ Harper, Gavin (2006). "Methods for mining HTS data". Drug Discovery Today. 11 (15–16): 694–699. doi:10.1016/j.drudis.2006.06.006. PMID 16846796.
