FRaC is a new general approach to the anomaly detection problem; that is, the task of identifying instances that come from a different class or distribution than the majority (unsupervised anomaly detection) or than a set of verified “normal” data (semi-supervised anomaly detection).
Traditional approaches typically compare the position of a new data point to the set of “normal” training data points in a chosen representation of the feature space. For some data sets, the normal data may not have discernible positions in feature space, but do have consistent relationships among the features that fail to appear in anomalies.
Our approach is to learn predictive models of the relationships among the features of “normal” data and to identify anomalies as instances whose features fail to match the predictions of the learned models.
The key to making this approach work is to precisely quantify the amount of evidence provided by each observation. To this end, we have developed a novel, information-theoretic anomaly measure that combines the contributions of all feature models.
Our experimental results consistently show that FRaC is a superior general approach to anomaly detection, and that it is particularly robust against noisy and irrelevant features.
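The feature-modeling idea described above can be illustrated with a small sketch. This is not the FRaC implementation: it assumes real-valued features, uses ordinary least squares as the only predictor, fits a Gaussian error model to training residuals (rather than cross-validated ones), and estimates each feature's entropy from a Gaussian fit. The function names `fit_feature_models` and `anomaly_scores` are hypothetical. Each feature contributes its surprisal (negative log-likelihood of the prediction error) minus its estimated entropy, and the contributions are summed over features.

```python
import numpy as np

def fit_feature_models(X_train):
    """Fit one linear model per feature, predicting it from all other features."""
    n, d = X_train.shape
    models = []
    for i in range(d):
        others = np.delete(X_train, i, axis=1)
        A = np.hstack([others, np.ones((n, 1))])  # append an intercept column
        w, *_ = np.linalg.lstsq(A, X_train[:, i], rcond=None)
        resid = X_train[:, i] - A @ w
        # Assumption: Gaussian error model; the floor avoids a zero variance.
        sigma = max(resid.std(), 1e-6)
        # Assumption: entropy of the feature from a Gaussian fit to its values.
        feat_sigma = max(X_train[:, i].std(), 1e-6)
        entropy = 0.5 * np.log(2 * np.pi * np.e * feat_sigma ** 2)
        models.append((w, sigma, entropy))
    return models

def anomaly_scores(models, X):
    """Sum over features of (surprisal of the prediction error - feature entropy)."""
    n, _ = X.shape
    scores = np.zeros(n)
    for i, (w, sigma, entropy) in enumerate(models):
        others = np.delete(X, i, axis=1)
        A = np.hstack([others, np.ones((n, 1))])
        resid = X[:, i] - A @ w
        # Negative log-density of the residual under N(0, sigma^2).
        surprisal = 0.5 * np.log(2 * np.pi * sigma ** 2) + resid ** 2 / (2 * sigma ** 2)
        scores += surprisal - entropy
    return scores

# Synthetic "normal" data: feature 1 is about twice feature 0; feature 2 is noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
X_train = np.column_stack([x0, 2 * x0 + 0.1 * rng.normal(size=500), rng.normal(size=500)])
models = fit_feature_models(X_train)

# The first test point respects the relationship; the second breaks it,
# although each of its values is unremarkable on its own.
X_test = np.array([[1.0, 2.0, 0.0], [1.0, -2.0, 0.0]])
scores = anomaly_scores(models, X_test)
print(scores)
```

In this toy setting the second test point receives the higher score because its features contradict the learned relationship, even though its position in feature space is not unusual.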
-N and -p). Also, the procedures are not guaranteed to be identical between the two versions (e.g., the Python version may randomly reorder the training set before doing cross-validation in a different way than the C++ version).