IMIS | Flanders Marine Institute

A meta analysis study of outlier detection methods in classification
Acuna, E.; Rodriguez, C. (2004). A meta analysis study of outlier detection methods in classification, in: Proceedings of the International IPSI 2004 Conference: Symposium on Challenges in Internet and Interdisciplinary Research, Venice, Italy, November 10-15, 2004. pp. 1-25
In: (2004). Proceedings of the International IPSI 2004 Conference: Symposium on Challenges in Internet and Interdisciplinary Research, Venice, Italy, November 10-15, 2004. [S.n.]: [s.l.].


Authors
  • Acuna, E.
  • Rodriguez, C.

Abstract
    An outlier is an observation that deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism (Hawkins, 1980). Outlier detection has many applications, such as data cleaning, fraud detection and network intrusion detection. Outliers can indicate individuals or groups whose behavior is very different from that of most individuals in the dataset. Outliers are frequently removed to improve the accuracy of estimators, but sometimes the presence of an outlier carries a meaning whose explanation is lost if the outlier is deleted. In this work we compare outlier detection techniques based on statistical measures, clustering methods and data mining methods. In particular, we compare detection of outliers using the Mahalanobis distance computed with robust estimators of the center and the covariance matrix, detection of outliers using partitioning around medoids (PAM), and two data mining techniques: Bay's algorithm for distance-based outliers (Bay, 2003) and LOF, a density-based local outlier algorithm (Breunig et al., 2000). Doubtful outliers are decided upon by inspecting two visualization techniques for high-dimensional data: the parallel coordinate plot and the surveyplot. The comparison is carried out on 15 datasets.
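The statistical approach named in the abstract flags points whose Mahalanobis distance, computed from robust estimates of the center and covariance matrix, is unusually large. The abstract does not say which robust estimator is used, so the Python sketch below assumes a simplified, single-start version of the concentration steps (C-steps) of the Minimum Covariance Determinant (MCD) estimator: it repeatedly refits the mean and covariance on the h points closest to the current estimate. The helper names (`mean_cov`, `inverse`, `sq_mahalanobis`, `robust_sq_distances`) are hypothetical, and the consistency and reweighting corrections of a full MCD implementation are omitted, so the returned squared distances are meant for ranking points rather than for a direct chi-square cutoff.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def mean_cov(rows: Sequence[Sequence[float]]) -> Tuple[List[float], Matrix]:
    """Sample mean vector and (n - 1)-normalized covariance matrix."""
    n, p = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    return mu, cov


def inverse(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting; assumes m is non-singular."""
    p = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(m)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        lead = aug[c][c]
        aug[c] = [v / lead for v in aug[c]]
        for r in range(p):
            if r != c:
                f = aug[r][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [row[p:] for row in aug]


def sq_mahalanobis(x: Sequence[float], mu: List[float], inv: Matrix) -> float:
    """Squared Mahalanobis distance (x - mu)^T inv (x - mu)."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    return sum(d[a] * inv[a][b] * d[b] for a in range(len(d)) for b in range(len(d)))


def robust_sq_distances(rows: Sequence[Sequence[float]], h: Optional[int] = None,
                        max_steps: int = 50) -> List[float]:
    """Squared distances from a single-start, MCD-style estimate (simplified sketch)."""
    n, p = len(rows), len(rows[0])
    h = h if h is not None else (n + p + 1) // 2   # roughly half of the data
    subset = list(range(n))                         # start from the classical estimate
    for _ in range(max_steps):
        mu, cov = mean_cov([rows[i] for i in subset])
        inv = inverse(cov)
        # C-step: keep the h points closest to the current estimate
        order = sorted(range(n), key=lambda i: sq_mahalanobis(rows[i], mu, inv))
        new_subset = sorted(order[:h])
        if new_subset == subset:                    # fixed point reached
            break
        subset = new_subset
    return [sq_mahalanobis(r, mu, inv) for r in rows]
```

Because the final estimate is fitted only on the h most central points, gross outliers cannot inflate the covariance and mask themselves, which is the usual motivation for robust rather than classical estimates. A production analysis would more likely use a tested implementation with multiple random starts.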
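Bay's algorithm (Bay, 2003) ranks points by their distances to their nearest neighbors and speeds up the naive nested loop by processing points in random order and discarding any point that provably cannot enter the current top n. The Python sketch below illustrates that pruning rule under stated assumptions: Euclidean distance, a score equal to the average distance to the k nearest neighbors, and hypothetical names and parameters (`bay_outliers`, `block_size`, `seed`). It is not the code used by the authors or by Bay.

```python
import bisect
import math
import random
from typing import List, Sequence, Tuple


def bay_outliers(data: Sequence[Sequence[float]], k: int = 5, n_out: int = 3,
                 block_size: int = 10, seed: int = 0) -> List[Tuple[float, int]]:
    """Return (score, index) for the n_out points with the largest average
    distance to their k nearest neighbors, using randomized nested loops
    with a pruning cutoff."""
    if len(data) <= k:
        raise ValueError("need more than k points")
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)      # random order makes pruning effective
    top: List[Tuple[float, int]] = []       # best (score, index) pairs so far
    cutoff = 0.0                            # weakest score in `top` once it is full
    for start in range(0, len(order), block_size):
        block = order[start:start + block_size]
        nearest = {i: [] for i in block}    # sorted k smallest distances seen so far
        active = set(block)
        for j in order:
            if not active:
                break
            for i in list(active):
                if i == j:
                    continue
                d = math.dist(data[i], data[j])
                nb = nearest[i]
                if len(nb) < k or d < nb[-1]:
                    bisect.insort(nb, d)
                    del nb[k:]              # keep only the k smallest distances
                    # The running score can only shrink as closer neighbors are
                    # found, so once it is below the cutoff the point is pruned.
                    if len(nb) == k and sum(nb) / k < cutoff:
                        active.discard(i)
        for i in active:                    # survivors have their exact score
            top.append((sum(nearest[i]) / k, i))
        top = sorted(top, reverse=True)[:n_out]
        if len(top) == n_out:
            cutoff = top[-1][0]
    return top
```

The pruning is safe because the score computed from the neighbors seen so far is an upper bound on the true score; the randomized order tends to find close neighbors early, so most ordinary points are discarded after examining only a small fraction of the data. Using the distance to the k-th nearest neighbor instead of the average would work with the same rule.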
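LOF (Breunig et al., 2000) scores each point by comparing its local density with the densities of its neighbors: values near 1 mean the point is about as dense as its neighborhood, while values well above 1 indicate a local outlier. The Python sketch below follows the definitions of k-distance, reachability distance and local reachability density from that paper, assuming Euclidean distance and a hypothetical `lof_scores` function. It is quadratic in the number of points and assumes no group of more than k identical points, which would make a reachability sum zero.

```python
import math
from typing import List, Sequence


def lof_scores(data: Sequence[Sequence[float]], k: int = 3) -> List[float]:
    """Local outlier factor of every point (brute-force sketch)."""
    n = len(data)
    dist = [[math.dist(a, b) for b in data] for a in data]
    kdist: List[float] = []        # k-distance of each point
    knn: List[List[int]] = []      # k-distance neighborhood (ties included)
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        kdist.append(kd)
        knn.append([j for d, j in others if d <= kd])
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o));
    # lrd(p) = |N_k(p)| / sum of reach-dist_k(p, o) over o in N_k(p)
    lrd = []
    for i in range(n):
        reach = [max(kdist[j], dist[i][j]) for j in knn[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF(p) = mean over o in N_k(p) of lrd(o) / lrd(p)
    return [sum(lrd[j] for j in knn[i]) / (len(knn[i]) * lrd[i]) for i in range(n)]
```

Because the score is a ratio of local densities, LOF can flag a point that sits just outside a dense cluster even when a global distance threshold would not, which is the property that distinguishes it from the distance-based approach.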

All data in IMIS is subject to the VLIZ privacy policy.