Imbalanced Anomaly Detector



Dealing with imbalanced data distributions is a classic and challenging problem in anomaly detection tasks. When the number of normal instances significantly outweighs the abnormal ones, our machine learning models can easily become biased toward the majority class.

Inspired by Manufacturing Anomaly Detection, I conducted a personal project focused on identifying low-quality wine. In this post, I present an experiment comparing various anomaly and outlier detection techniques on a highly imbalanced dataset. I also explore the characteristics of the dataset, the evaluation metrics employed, and the performance of multiple algorithms to determine which approach yields the best results.
(Figure: pipeline overview)

1. The Dataset

For this experiment, we utilized the Wine Quality Dataset (Red Wine).

To frame this as an anomaly detection problem, we applied the following transformations:

  • Normal Class (0): Wine quality score ≤ 6
  • Abnormal Class (1): Wine quality score > 6
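The labeling rule above can be sketched in a few lines (a minimal sketch using NumPy; the function name is illustrative, and the threshold of 6 follows the rule above):

```python
import numpy as np

def binarize_quality(quality, threshold=6):
    """Map wine quality scores to anomaly labels:
    0 (normal) if quality <= threshold, 1 (abnormal) otherwise."""
    return (np.asarray(quality) > threshold).astype(int)

# A handful of example quality scores
labels = binarize_quality([3, 5, 6, 7, 8])
print(labels)  # [0 0 0 1 1]
```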
(Figure: class distribution)

Class Distribution:

  • Normal (0): 1,382 instances
  • Abnormal (1): 217 instances

This yields a roughly 86:14 class imbalance, with only about 13.6% of instances labeled abnormal.

a) EDA

I visualized the distribution of each independent variable (feature) to check for patterns in the anomalous data. Analyzing pairwise distributions to identify clusters is a highly effective approach across various use cases.

2. Evaluation Metric: Equal Error Rate (EER)

When evaluating models for anomaly detection, relying purely on accuracy is very misleading due to the severe class imbalance. Predicting everything as "Normal" would still net us 86% accuracy!

Instead, we focus on the Equal Error Rate (EER).

  • FAR (False Acceptance Rate): The rate at which abnormal instances are incorrectly classified as normal (with the abnormal class as positive, this is the False Negative Rate, FNR).
  • FRR (False Rejection Rate): The rate at which normal instances are incorrectly classified as abnormal (correspondingly, the False Positive Rate, FPR).

The EER is the point on the ROC curve where FAR = FRR. A lower EER indicates a more stable and accurate model, as it balances the trade-off between falsely accepting anomalies and falsely rejecting normal data.
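As a concrete sketch, the EER can be read off the ROC curve at the point where the two error rates cross (assuming scikit-learn, with the abnormal class treated as positive; the function name is mine):

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(y_true, scores):
    """EER: the point on the ROC curve where the two error
    rates (FPR and FNR = 1 - TPR) are equal."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fpr - fnr))  # closest crossing point
    return (fpr[idx] + fnr[idx]) / 2

# Perfectly separated scores give an EER of 0
print(equal_error_rate([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 0.0
```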


3. The Experiments & Models Tested

We tested an array of classification, clustering, and dedicated outlier detection algorithms on this dataset.

Here are the tested models and their underlying approach:

  1. PCA / Autoencoder: Reconstruction-error-based detection. Normal data is reconstructed with low error, while anomalies produce high reconstruction error.
  2. KNN (K-Nearest Neighbors): Distance-based approach. Anomalies lie far away from their K nearest neighbors.
  3. KMeans: Clustering approach. Anomalies are far from any cluster centroid.
  4. MoG (Mixture of Gaussians): Density-based approach calculating the probability of data points under a fitted distribution.
  5. Isolation Forest: Tree-based approach explicitly isolating anomalies rather than profiling normal data.
  6. One-Class SVM: A classic semi-supervised method capturing the density of the majority class using an RBF kernel.
  7. Weighted KNN: Distance-based approach similar to standard KNN but weighting the votes by inverse distance.
  8. ABOD (Angle-Based Outlier Detection): An approach measuring the variance of angles between data points. Normal points within a cluster have high variance in angles to other points, whereas outliers on the periphery have a much smaller variance.
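To make one of these concrete, here is how the distance-based KNN score in (2) might look (a sketch assuming scikit-learn; k=5 and the mean-of-distances aggregation are illustrative choices, not necessarily what my experiment used):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_anomaly_scores(X_train, X_test, k=5):
    """Anomaly score = mean distance to the k nearest
    training points; far-away points score higher."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    dists, _ = nn.kneighbors(X_test)
    return dists.mean(axis=1)

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(100, 2))  # normal cluster at the origin
X_test = np.array([[0.0, 0.0], [10.0, 10.0]])  # one normal point, one outlier
scores = knn_anomaly_scores(X_train, X_test)
print(scores[1] > scores[0])  # True: the outlier scores higher
```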

4. Results & Findings

After transforming the data via standardization and testing the models side-by-side on the test set, we calculated the EER & Confusion Matrix for each.
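The standardize-then-score step can be sketched like this (assuming scikit-learn; the synthetic Gaussian data here merely stands in for the wine features, and Isolation Forest stands in for any of the scorers above):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X_train = rng.normal(0, 1, size=(200, 4))            # mostly-normal training data
X_test = np.vstack([rng.normal(0, 1, size=(20, 4)),  # normal test points
                    rng.normal(6, 1, size=(5, 4))])  # injected anomalies

scaler = StandardScaler().fit(X_train)
model = IsolationForest(random_state=0).fit(scaler.transform(X_train))

# score_samples is higher for normal points, so negate it
# to get an anomaly score (higher = more anomalous)
scores = -model.score_samples(scaler.transform(X_test))
print(scores[20:].mean() > scores[:20].mean())  # True: anomalies score higher
```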

| Model | Equal Error Rate (EER) | Performance relative to others |
| --- | --- | --- |
| Weighted KNN | 0.1915 | 🏆 Best |
| KNN | 0.2170 | 🥈 Excellent |
| Isolation Forest | 0.4432 | Good |
| PCA (Autoencoder-based) | 0.4468 | Good |
| KMeans | 0.5092 | Average |
| One-Class SVM | 0.5314 | Average |
| ABOD | 0.6264 | Poor |
| Mixture of Gaussians (MoG) | 0.7656 | Worst |

Key Takeaways

  • Distance-Based Methods Won: In this specific dataset, standard distance-based methods like KNN and Weighted KNN dominated the leaderboard. The local distance weighting provided by Weighted KNN pushed its EER score below 0.20, effectively trading off false acceptances and false rejections exceptionally well.
  • Classic Dedicated Outlier Detectors Struggled: Interestingly, algorithms built specifically for anomaly detection, such as One-Class SVM and ABOD, performed only average or poorly. One-Class SVM achieved an EER of 0.5314, suggesting that relying purely on the density estimation of an RBF kernel may struggle to find a good margin inside a highly dense and potentially overlapping feature space.
  • Tree-based Isolation works reasonably well: Isolation Forest achieved a modest EER of 0.4432, maintaining its reputation as a consistent and robust baseline algorithm for out-of-the-box outlier detection tasks.
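This post does not spell out the exact weighting used, but one plausible sketch of an inverse-distance-weighted KNN score (algebraically, the harmonic mean of the neighbor distances) looks like this, assuming scikit-learn; the function name and k are hypothetical:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def weighted_knn_scores(X_train, X_test, k=5, eps=1e-12):
    """Inverse-distance-weighted KNN anomaly score: each neighbor's
    distance is weighted by 1/distance, which down-weights far
    neighbors relative to a plain mean. Hypothetical sketch only."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    dists, _ = nn.kneighbors(X_test)
    weights = 1.0 / (dists + eps)
    return (weights * dists).sum(axis=1) / weights.sum(axis=1)

rng = np.random.default_rng(1)
X_train = rng.normal(0.0, 1.0, size=(100, 2))  # normal cluster at the origin
scores = weighted_knn_scores(X_train, np.array([[0.0, 0.0], [10.0, 10.0]]))
print(scores[1] > scores[0])  # True: the outlier scores higher
```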

Conclusion

Anomaly detection often requires significant trial and error. As highlighted here, complex or specialized models (like ABOD or MoG) do not always outperform simpler distance-based models (like Weighted KNN) depending on the distribution of the dataset at hand. Understanding and utilizing the Equal Error Rate (EER) gives us a reliable lens to compare these models fairly against the heavily skewed class distributions we face in the real world.