Data Mining: Classification VS Clustering (cluster analysis)
For someone who is new to data mining, classification and clustering can seem similar, because both techniques essentially “divide” a dataset into sub-datasets. But there is an important difference between them, and in this blog post we’ll see exactly what it is:
| CLASSIFICATION | CLUSTERING |
|---|---|
| We have a training set containing data that have already been categorized (labelled). | We do not know in advance which data points are similar to each other; there are no predefined categories. |
| Based on this training set, the algorithm determines which category each new data point belongs to. | Using statistical measures of similarity (for example, the distance between data points), the algorithm splits the dataset into sub-datasets (clusters) so that each cluster contains “similar” data. |
| Because a training set exists, this technique is called supervised learning. | Because no training set is used, this technique is called unsupervised learning. |
| Example: We use a training set in which customers are labelled according to whether they churned. Based on this training set, we can classify whether a new customer is likely to churn. | Example: We take a dataset of customers and split it into sub-datasets of customers with “similar” characteristics. This information can then be used to market a product to a specific customer segment identified by the clustering algorithm. |
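To make the classification column concrete, here is a minimal sketch of the churn example using k-nearest neighbours, which is just one of many classification algorithms (the post does not name a specific one). The two features (months as a customer, support calls last month), the labels and all the numbers are made-up assumptions for illustration only. The key point is that every training example already carries a label, and a new customer gets the majority label of its k closest training examples.

```python
import math
from collections import Counter

# Hypothetical, made-up training set: each customer is described by
# (months as a customer, support calls last month) and is already
# labelled with whether they churned. Illustrative numbers only.
TRAINING_SET = [
    ((2, 5), "churned"),
    ((3, 4), "churned"),
    ((1, 6), "churned"),
    ((24, 0), "stayed"),
    ((36, 1), "stayed"),
    ((30, 0), "stayed"),
]


def euclidean(a, b):
    """Straight-line distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(point, training_set, k=3):
    """Return the majority label among the k training examples nearest to `point`."""
    neighbours = sorted(training_set, key=lambda item: euclidean(point, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Classify two new, unlabelled customers using the labelled training set.
    for customer in [(2, 3), (28, 1)]:
        print(customer, "->", knn_classify(customer, TRAINING_SET))
```

In a real dataset the features would usually be scaled first, since otherwise the feature with the largest range (here, months) dominates the distance.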
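For the clustering column, a minimal sketch of the customer-segmentation example using k-means, one common clustering algorithm (the post only says “using statistical concepts”, so k-means is an assumption). The features (annual spend in thousands, store visits per month) and the data are hypothetical. Notice that no labels appear anywhere: the algorithm groups customers purely by distance. The centroids are seeded with a deterministic farthest-first heuristic to keep the sketch reproducible; library implementations typically use random or k-means++ seeding instead.

```python
import math

# Hypothetical, unlabelled customer data: (annual spend in $1,000s, visits per month).
CUSTOMERS = [(1, 2), (2, 1), (1, 1), (9, 8), (8, 9), (9, 9)]


def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def farthest_first_init(points, k):
    """Seed centroids: start at the first point, then repeatedly add the point
    farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def k_means(points, k, max_iter=100):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # assignments are stable: converged
            break
        centroids = new_centroids
    return clusters, centroids


if __name__ == "__main__":
    clusters, centroids = k_means(CUSTOMERS, k=2)
    for i, (members, centre) in enumerate(zip(clusters, centroids)):
        print(f"segment {i}: {members} centre={centre}")
```

The number of clusters k has to be chosen up front here; picking it well is one of the practical questions clustering raises that classification, with its fixed set of known categories, does not.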
If you want to learn more about data mining, check out the free book “Mining of Massive Datasets”, available in PDF format.
