User:Lixinso/datamining

From Wikipedia, the free encyclopedia

Fundamentals

Matrices & Linear Algebra Fundamentals

Hash Functions, Binary Tree, O(n)

Relational Algebra, DB Basics

Inner, Outer, Cross, Theta Join

CAP Theorem

Tabular Data

Data Frames & Series

Sharding

OLAP

Multidimensional Data Model

ETL

Reporting vs. BI vs. Analytics

JSON & XML

NoSQL

Regex

Vendor Landscape

Environment Setup

Statistics

Exploratory Data Analysis

Histograms

Percentiles & Outliers

Probability Theory

Bayes' Theorem

Bayes' theorem

Random Variables

Cumulative Distribution Function (CDF)

Continuous Distributions (Normal, Poisson, Gaussian)

ANOVA

Probability Density Function (PDF)

Central Limit Theorem

Monte Carlo Method

Hypothesis Testing

p-Value

Chi-Squared Test

Estimation

Confidence Interval (CI)

MLE

Kernel Density Estimate

Regression

Covariance

Correlation

Pearson Coefficient

Causation

Least-Squares Fit

Euclidean Distance

Programming

Install Packages

Factor Analysis

Functions

Manipulate Data Frames

Subsetting Data

Reading Raw Data

Reading CSV Data

Data Frames

Lists

Factors

Arrays

Matrices

Vectors

Variables

Expressions

R Basics

R Setup

RStudio

Working in Excel

Python Basics

RapidMiner

IBM SPSS

Machine Learning

What's ML

Numerical Variables

Categorical Variables

Supervised Learning

Unsupervised Learning

Concepts, Inputs & Attributes

Training & Testing Data

Classifier

Prediction

Lift

Overfitting

Bias & Variance

Trees & Classification

Classification Rate

Decision Trees

Boosting

Naive Bayes Classifier

K-Nearest Neighbor

Regression

Ranking

Linear Regression

Perceptron

Hierarchical Clustering

K-Means Clustering

Neural Networks

Sentiment Analysis

Collaborative Filtering

Tagging

Vocabulary Mapping

Text Mining / NLP

Classify Text

Using NLTK

Using WEKA

Feature Extraction

Market Basket Analysis

Association Rules

Support Vector Machines

Term Frequency & Weight

Term Document Matrix

UIMA

Text Analysis

Named Entity Recognition

Corpus

Big Data

Data Replication Principles

Setup Hadoop (IBM / Cloudera / HortonWorks)

Name & Data Nodes

Job & Task Tracker

MR Programming

Sqoop: Loading Data into HDFS

Flume, Scribe: For Unstructured Data

SQL with Pig

DWH with Hive

Scribe, Chukwa: For Weblogs

Using Mahout

ZooKeeper, Avro

Storm: Hadoop Realtime

RHadoop, RHIPE

rmr

Cassandra

MongoDB, Neo4j

Visualization

Tableau

IBM ManyEyes

InfoVis

D3.js

Decision Tree

Timeline

Survey Plot

Histogram & Pie (Univariate)

Uni-, Bi- & Multivariate Visualization

ToolBox

MS Excel w/ Analysis ToolPak

Java, Python

R, RStudio, Rattle

Weka, KNIME, RapidMiner

Hadoop Distribution of Choice

Spark, Storm

Flume, Scribe, Chukwa

Nutch, Talend, ScraperWiki

Webscraper, Flume, Sqoop

tm, RWeka, NLTK

RHIPE

D3.js, ggplot2, Shiny

IBM LanguageWare

Cassandra, MongoDB

Data Ingestion

Summary of Data Formats

Data Discovery

Data Sources & Acquisition

Data Integration

Data Fusion

Transformation & Enrichment

Data Survey

Google OpenRefine

How Much Data?

Using ETL

Data Munging

Dimensionality & Numerosity
