FRaC (Feature Regression and Classification) Anomaly Detection Algorithm


Abstract

FRaC is a new, general approach to the anomaly detection problem: the task of identifying instances that come from a different class or distribution than either the majority of the data (unsupervised anomaly detection) or a set of verified "normal" data (semi-supervised anomaly detection).

Traditional approaches typically compare the position of a new data point to the set of "normal" training data points in a chosen representation of the feature space. For some data sets, however, the normal data may not occupy discernible positions in feature space, yet they do exhibit consistent relationships among the features that fail to appear in anomalies.

Our approach is to learn predictive models of the relationships among the features of “normal” data and to identify anomalies as instances whose features fail to match the predictions of the learned models.
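A minimal sketch of this feature-modeling idea is shown below. It is not the authors' implementation: the use of scikit-learn, the choice of a random-forest regressor, and all function names are illustrative assumptions, and it treats all features as numeric (the full method also handles categorical features with classifiers).

```python
# Sketch of the feature-modeling idea: predict each feature from the others.
# Model choice (random forest) and numeric-only features are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def train_feature_models(X_normal):
    """Train one predictor per feature, using the remaining features as inputs."""
    models = []
    for i in range(X_normal.shape[1]):
        others = np.delete(X_normal, i, axis=1)    # all features except feature i
        model = RandomForestRegressor(n_estimators=50)
        model.fit(others, X_normal[:, i])          # learn feature i from the others
        models.append(model)
    return models

def prediction_errors(models, X):
    """Observed-minus-predicted error for every feature of every instance."""
    errors = np.empty_like(X, dtype=float)
    for i, model in enumerate(models):
        others = np.delete(X, i, axis=1)
        errors[:, i] = X[:, i] - model.predict(others)
    return errors
```

Instances whose features are poorly predicted by these models are the candidates for anomalies; the next paragraph describes how that evidence is quantified and combined.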

The key to making this approach work is to precisely quantify the amount of evidence provided by each observation. To this end, we have developed a novel, information-theoretic anomaly measure that combines the contributions of all feature models.
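The sketch below illustrates an anomaly score in the spirit of this measure: for each feature, the surprisal of the observed prediction error under the error distribution estimated on normal data, minus the entropy of that distribution, summed over all feature models. The Gaussian error model used here is an assumption for illustration only; the paper describes the measure in full generality.

```python
# Illustrative information-theoretic anomaly score: per-feature surprisal of the
# observed error minus the entropy of the normal-data error distribution, summed
# over features.  A Gaussian error model per feature is assumed for simplicity.
import numpy as np

def anomaly_scores(train_errors, test_errors):
    """Higher score = observed errors are more surprising than expected."""
    mu = train_errors.mean(axis=0)
    sigma = train_errors.std(axis=0) + 1e-9             # avoid division by zero
    # Surprisal (-log density) of each test error under the Gaussian error model.
    surprisal = 0.5 * ((test_errors - mu) / sigma) ** 2 \
        + np.log(sigma * np.sqrt(2 * np.pi))
    # Differential entropy of a Gaussian: 0.5 * log(2*pi*e*sigma^2).
    entropy = 0.5 * np.log(2 * np.pi * np.e * sigma ** 2)
    # Combine the evidence from all feature models.
    return (surprisal - entropy).sum(axis=1)
```

Subtracting the entropy normalizes each feature's contribution, so inherently unpredictable features contribute little and do not drown out the informative ones.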

Our experimental results consistently show that FRaC is a superior general approach to anomaly detection, and that it is particularly robust against noisy and irrelevant features.

Source Code

Data Mining 2011 Paper

K. Noto, C. E. Brodley, and D. Slonim.
FRaC: A Feature-Modeling Approach for Semi-Supervised and Unsupervised Anomaly Detection.

Data Mining and Knowledge Discovery, 25(1), pp. 109-133, 2011.

ICDM 2010 Paper

K. Noto, C. E. Brodley, and D. Slonim.
Anomaly Detection Using an Ensemble of Feature Models.
Proceedings of the 10th IEEE International Conference on Data Mining (ICDM 2010).
IEEE Computer Society Press.

References

[LOF] M.M. Breunig, H.P. Kriegel, R.T. Ng and J. Sander. LOF: identifying density-based local outliers. ACM SIGMOD Record 29(2). 2000.
[COF] J. Tang, Z. Chen, A.W. Fu and D.W. Cheung. Enhancing effectiveness of outlier detections for low density patterns. Lecture Notes in Computer Science (Springer). 2002.
[1-Class SVMs] B. Schölkopf, A. J. Smola, R. C. Williamson and P. L. Bartlett. New support vector algorithms. Neural Computation 12(5). 2000.
[WEKA] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann and I. H. Witten. The WEKA Data Mining Software: An Update. SIGKDD Explorations 11(1). 2009.
[LIBSVM] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. 2001.
[UCI Repository] A. Asuncion and D.J. Newman. UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. 2007.
[Feature Bagging] A. Lazarevic and V. Kumar. Feature Bagging for Outlier Detection. Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2005.