A meta analysis study of outlier detection methods in classification
Acuna, E.; Rodriguez, C. (2004). A meta analysis study of outlier detection methods in classification, in: Proceedings of the International IPSI 2004 Conference: Symposium on Challenges in Internet and Interdisciplinary Research, Venice, Italy, November 10-15, 2004. pp. 1-25
An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism (Hawkins, 1980). Outlier detection has many applications, such as data cleaning, fraud detection and network intrusion detection. Outliers can reveal individuals or groups whose behavior is very different from that of most individuals in the dataset. Outliers are frequently removed to improve the accuracy of estimators, but sometimes an outlier carries a meaning whose explanation is lost if it is deleted. In this work we compare outlier detection techniques based on statistical measures, clustering methods and data mining methods. In particular, we compare outlier detection using the Mahalanobis distance computed with robust estimators of the center and the covariance matrix, outlier detection using partitioning around medoids (PAM), and two data mining techniques: Bay's algorithm for distance-based outliers (Bay, 2003) and LOF, a density-based local outlier algorithm (Breunig et al., 2000). Decisions on doubtful outliers are made by examining two visualization techniques for high-dimensional data: the parallel coordinate plot and the surveyplot. The comparison is carried out on 15 datasets.
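The distance-based idea behind Bay's algorithm can be illustrated with a short sketch. This is a simplified, hypothetical Python example, not the authors' code or Bay's own implementation. It assumes Euclidean distance, scores each point by the distance to its k-th nearest neighbour, keeps the n highest-scoring points, and omits the block-wise processing of the original. The function names and the parameters `k`, `n` and `seed` are illustrative only.

```python
import heapq
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_distance_outliers(data, k=3, n=2, seed=0):
    """Hypothetical sketch of a randomised nested loop with pruning.

    Assumption: the outlier score is the distance to the k-th nearest
    neighbour. Returns the n highest-scoring points as (score, index)
    pairs, sorted by decreasing score.
    """
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)  # random order: close neighbours tend to appear early
    top = []      # min-heap of (score, index) for the current top-n outliers
    cutoff = 0.0  # weakest score in `top` once it holds n points
    for i in order:
        neigh = []  # max-heap (negated distances) of the k closest points seen so far
        pruned = False
        for j in order:
            if i == j:
                continue
            d = euclidean(data[i], data[j])
            if len(neigh) < k:
                heapq.heappush(neigh, -d)
            elif d < -neigh[0]:
                heapq.heapreplace(neigh, -d)
            # The running k-NN distance can only shrink as more points are
            # scanned, so once it drops below the cutoff, point i can never
            # enter the top n and the inner loop stops early.
            if len(neigh) == k and -neigh[0] < cutoff:
                pruned = True
                break
        if pruned or len(neigh) < k:
            continue
        score = -neigh[0]  # exact k-th nearest neighbour distance
        if len(top) < n:
            heapq.heappush(top, (score, i))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, i))
        if len(top) == n:
            cutoff = top[0][0]
    return sorted(top, reverse=True)


if __name__ == "__main__":
    # A 5x5 grid of "normal" points plus two far-away points.
    points = [(x, y) for x in range(5) for y in range(5)] + [(20, 20), (-15, 3)]
    print([i for _, i in top_distance_outliers(points, k=3, n=2)])  # [25, 26]
```

Shuffling is what makes the pruning rule pay off: a typical point meets close neighbours early, so its running score falls below the cutoff after a few comparisons. Only the strongest outlier candidates are compared against the whole dataset.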