User:SummerNightmare2023/Anomaly detection

From Wikipedia, the free encyclopedia

History of Intrusion Detection in Anomaly Detection

The concept of intrusion detection, a critical component of anomaly detection, has evolved significantly over time. Initially, it was a manual process where system administrators would monitor for unusual activities, such as a vacationing user’s account being accessed or unexpected printer activity. This approach was not scalable and was soon superseded by the analysis of audit logs and system logs for signs of malicious behavior.[1]

By the late 1970s and early 1980s, the analysis of these logs was primarily used retrospectively to investigate incidents, as the volume of data made it impractical for real-time monitoring. The affordability of digital storage eventually led to audit logs being analyzed online, with specialized programs being developed to sift through the data. These programs, however, were typically run during off-peak hours due to their computational intensity.[1]

The 1990s brought the advent of real-time intrusion detection systems capable of analyzing audit data as it was generated, allowing for immediate detection of and response to attacks. This marked a significant shift towards proactive intrusion detection.[1]

As the field has continued to develop, the focus has shifted to creating solutions that can be efficiently implemented across large and complex network environments, adapting to the ever-growing variety of security threats and the dynamic nature of modern computing infrastructures.[1]

Definition of Anomalies in High Dimensional Context

In the era of big data, research increasingly focuses on methods that can handle the scale and complexity of modern data sets, going beyond traditional approaches to define and detect anomalies both effectively and efficiently.[2]

  • Anomalies in high-dimensional spaces are harder to identify because the data are sparse and the relative distances between points become less meaningful.[2]
  • Traditional threshold-based methods become less effective as dimensionality increases, often requiring more sophisticated, multidimensional analysis techniques.[2]
  • High-dimensional anomaly detection often requires careful feature selection to reduce dimensionality and improve sensitivity to true anomalies.[2]
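
The loss of distance contrast mentioned above can be illustrated with a small simulation. The following is a minimal sketch, not taken from the cited survey: it assumes points drawn uniformly at random from the unit hypercube, and the function name `relative_contrast`, the sample size and the seed are illustrative choices. It measures how much farther the farthest point is from a query than the nearest one.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest

# Compare the contrast at a few (arbitrarily chosen) dimensionalities.
contrasts = {dim: relative_contrast(dim) for dim in (2, 10, 100, 500)}
for dim, value in contrasts.items():
    print(f"dim={dim:4d}  relative contrast={value:.2f}")
```

Under these assumptions the contrast is large in two dimensions and much smaller in 500 dimensions, which is one reason a fixed distance threshold becomes hard to choose as dimensionality grows.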


Applications

Anomaly detection is applicable in a wide variety of domains and is an important subarea of unsupervised machine learning. It has applications in cyber-security, intrusion detection, fraud detection (including the evolving domain of Fintech, as demonstrated by Stojanović et al.[3]), fault detection, system health monitoring, event detection in sensor networks, detecting ecosystem disturbances, defect detection in images using machine vision, medical diagnosis and law enforcement.[4]

Intrusion detection

Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986.[5] Anomaly detection for IDS is normally accomplished with thresholds and statistics, but can also be done with soft computing and inductive learning.[6] Types of features proposed by 1999 included profiles of users, workstations, networks, remote hosts, groups of users, and programs based on frequencies, means, variances, covariances, and standard deviations.[7] The counterpart of anomaly detection in intrusion detection is misuse detection.
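
The threshold-and-statistics approach can be sketched as follows. This is a minimal illustration rather than the model from any of the cited works: the profile is just the mean and sample standard deviation of one feature, the login counts are made-up data, and the names `build_profile`, `is_anomalous` and the cutoff `k=3.0` are assumptions chosen for the example.

```python
import statistics

def build_profile(history):
    """Summarise past behaviour as a mean and sample standard deviation."""
    return statistics.mean(history), statistics.stdev(history)

def is_anomalous(value, profile, k=3.0):
    """Flag a value lying more than k standard deviations from the profile mean."""
    mean, stdev = profile
    return abs(value - mean) > k * stdev

# Hypothetical daily login counts for a single user (illustrative data only).
logins_per_day = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6]
profile = build_profile(logins_per_day)

typical_day = is_anomalous(5, profile)    # close to the usual behaviour
unusual_day = is_anomalous(40, profile)   # far outside the usual range
print(typical_day, unusual_day)
```

A real system would typically maintain many such profiles (per user, host or program) and update them over time; the single static profile here only shows the basic comparison.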

Fintech Fraud Detection

Anomaly detection is vital in Fintech for fraud prevention.[3][8]

Preprocessing

Preprocessing data to remove anomalies can be an important step in data analysis, and is done for a number of reasons. Statistics such as the mean and standard deviation are more accurate after the removal of anomalies, and the visualisation of data can also be improved. In supervised learning, removing the anomalous data from the dataset often results in a statistically significant increase in accuracy.[9][10]
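
The effect on summary statistics can be shown with a short example. The sketch below is one possible preprocessing rule, not the procedure used in the cited studies: it filters values by a modified z-score based on the median and the median absolute deviation (MAD), using the commonly quoted constant 0.6745 and cutoff 3.5. The sensor readings and the function name `remove_outliers` are illustrative assumptions.

```python
import statistics

def remove_outliers(values, cutoff=3.5):
    """Drop values whose median/MAD-based modified z-score exceeds the cutoff."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return list(values)  # no spread to compare against; keep everything
    return [v for v in values if 0.6745 * abs(v - median) / mad <= cutoff]

# Hypothetical readings with one gross error (illustrative data only).
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 55.0]
cleaned = remove_outliers(readings)

print("mean before:", round(statistics.mean(readings), 2))
print("mean after: ", round(statistics.mean(cleaned), 2))
print("stdev before:", round(statistics.stdev(readings), 2))
print("stdev after: ", round(statistics.stdev(cleaned), 2))
```

The median and MAD are used for the filter because, unlike the mean and standard deviation, they are themselves barely affected by the value being removed.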

Video Surveillance

Anomaly detection has become increasingly vital in video surveillance to enhance security and safety.[11][12] With the advent of deep learning technologies, methods using Convolutional Neural Networks (CNNs) and Simple Recurrent Units (SRUs) have shown significant promise in identifying unusual activities or behaviors in video data.[11] These models can process and analyze extensive video feeds in real-time, recognizing patterns that deviate from the norm, which may indicate potential security threats or safety violations.[11]

IT Infrastructure

In IT infrastructure management, anomaly detection is crucial for ensuring the smooth operation and reliability of services.[13] Practices such as the IT Infrastructure Library (ITIL) and monitoring frameworks are employed to track and manage system performance and user experience.[13] Detecting anomalies can help identify and pre-empt potential performance degradations or system failures, thus maintaining productivity and business process effectiveness.[13]

IoT Systems

Anomaly detection is critical for the security and efficiency of Internet of Things (IoT) systems.[14] It helps in identifying system failures and security breaches in complex networks of IoT devices.[14] The methods must manage real-time data, diverse device types, and scale effectively. Garg et al.[15] have introduced a multi-stage anomaly detection framework that improves upon traditional methods by incorporating spatial clustering, density-based clustering, and locality-sensitive hashing. This tailored approach is designed to better handle the vast and varied nature of IoT data, thereby enhancing security and operational reliability in smart infrastructure and industrial IoT systems.[15]

Petroleum Industry Applications

Anomaly detection is crucial in the petroleum industry for monitoring critical machinery.[16] Martí et al. used a novel segmentation algorithm to analyze sensor data for real-time anomaly detection.[16] This approach helps promptly identify and address any irregularities in sensor readings, ensuring the reliability and safety of petroleum operations.[16]

Oil and Gas Pipeline Monitoring

In the oil and gas sector, anomaly detection is crucial not just for maintenance and safety but also for environmental protection.[17] Aljameel et al. propose an advanced machine learning-based model for detecting minor leaks in oil and gas pipelines, which traditional methods may miss.[17]

Methods

Many anomaly detection techniques have been proposed in the literature.[18][19] The performance of a method usually depends on the data set. For example, some methods may be suited to detecting local outliers and others to detecting global outliers, and no method shows a systematic advantage over the others when compared across many data sets.[20][21] Almost all algorithms also require the setting of non-intuitive parameters that are critical for performance and usually unknown before application. Some of the popular techniques are mentioned below and are broken down into categories:
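
The difference between global and local outliers can be made concrete with a toy example. The sketch below is an illustration only: the one-dimensional data are made up, the "global" score is simply the distance to the k-th nearest neighbour, and the "local" score is a simplified ratio in the spirit of density-based methods such as LOF, not LOF itself. All function names are hypothetical.

```python
def knn(data, i, k):
    """Return (distance, index) pairs for the k nearest neighbours of data[i]."""
    dists = sorted((abs(data[i] - x), j) for j, x in enumerate(data) if j != i)
    return dists[:k]

def kdist(data, i, k):
    """Distance from data[i] to its k-th nearest neighbour (global score)."""
    return knn(data, i, k)[-1][0]

def local_score(data, i, k):
    """k-distance of data[i] relative to the mean k-distance of its neighbours."""
    neighbours = knn(data, i, k)
    mean_neighbour_kdist = sum(kdist(data, j, k) for _, j in neighbours) / k
    return kdist(data, i, k) / mean_neighbour_kdist

# A dense cluster, one point just outside it, and a sparse cluster.
dense = [i / 10 for i in range(10)]      # 0.0, 0.1, ..., 0.9
sparse = [20.0, 25.0, 30.0, 35.0, 40.0]
data = dense + [2.0] + sparse

k = 2
indices = range(len(data))
global_top = data[max(indices, key=lambda i: kdist(data, i, k))]
local_top = data[max(indices, key=lambda i: local_score(data, i, k))]
print("highest global score:", global_top)
print("highest local score: ", local_top)
```

In this toy data the global score ranks a member of the sparse cluster highest, whereas the local score singles out the point at 2.0, which is unremarkable in absolute terms but far from its dense neighbourhood. The choice of k is one of the non-intuitive parameters referred to above.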

Statistical

Parameter-free

Parametric-based

Density

Neural networks

  • Replicator neural networks,[34] autoencoders, variational autoencoders,[35] long short-term memory neural networks[36]
  • Bayesian networks[34]
  • Hidden Markov models (HMMs)[34]
  • Minimum Covariance Determinant[37][38]
  • Deep Learning[11]
    • Convolutional Neural Networks (CNNs): CNNs have shown exceptional performance in unsupervised anomaly detection, especially in image and video data analysis.[11] Their ability to learn feature hierarchies automatically, from low-level to high-level patterns, makes them particularly suited to detecting visual anomalies. For instance, CNNs can be trained on image datasets to identify atypical patterns indicative of defects or out-of-norm conditions in industrial quality control scenarios.[39]
    • Simple Recurrent Units (SRUs): In time-series data, SRUs, a type of recurrent neural network, have been used effectively for anomaly detection by capturing temporal dependencies and sequence anomalies.[11] Unlike traditional RNNs, SRUs are designed to be faster and more parallelizable, making them a better fit for real-time anomaly detection in complex systems such as dynamic financial markets or predictive maintenance in machinery, where identifying temporal irregularities promptly is crucial.[40]

Cluster based

Ensembles

Others

Anomaly Detection in Dynamic Networks

Dynamic networks, such as those representing financial systems, social media interactions, and transportation infrastructure, are subject to constant change, making anomaly detection within them a complex task.[49] Unlike static graphs, dynamic networks reflect evolving relationships and states, requiring adaptive techniques for anomaly detection.[49]

Types of Anomalies in Dynamic Networks

  1. Community anomalies[49]
  2. Compression anomalies[49]
  3. Decomposition anomalies[49]
  4. Distance anomalies[49]
  5. Probabilistic model anomalies[49]
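
As an illustration of one category in the list above, distance anomalies can be read as time steps at which a network differs unusually from its previous state. The following is a minimal sketch under that reading, not a method taken from the cited survey: each snapshot is assumed to be a set of edges, change is measured with the Jaccard distance between consecutive edge sets, and the threshold of 0.5, the function names and the example graphs are all hypothetical. Other distance measures are possible.

```python
def jaccard_distance(edges_a, edges_b):
    """Return 1 - |intersection| / |union| of two edge sets (0 means identical)."""
    union = edges_a | edges_b
    if not union:
        return 0.0  # two empty graphs are treated as identical
    return 1.0 - len(edges_a & edges_b) / len(union)

def flag_changes(snapshots, threshold=0.5):
    """Return indices of snapshots that differ sharply from their predecessor."""
    flagged = []
    for t in range(1, len(snapshots)):
        if jaccard_distance(snapshots[t - 1], snapshots[t]) > threshold:
            flagged.append(t)
    return flagged

# Four hypothetical snapshots of a small network; edges are (node, node) pairs.
snapshots = [
    {("a", "b"), ("b", "c"), ("c", "d")},
    {("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")},               # one edge added
    {("a", "b"), ("x", "y"), ("y", "z"), ("x", "z")},               # mostly rewired
    {("a", "b"), ("x", "y"), ("y", "z"), ("x", "z"), ("a", "x")},   # one edge added
]
flagged = flag_changes(snapshots)
print(flagged)
```

Here only the third snapshot (index 2) is flagged, because it shares a single edge with its predecessor, while the other transitions add just one edge each.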

References

  1. ^ a b c d Kemmerer, R.A.; Vigna, G. (2002-04). "Intrusion detection: a brief history and overview". Computer. 35 (4): supl27 – supl30. doi:10.1109/mc.2002.1012428. ISSN 0018-9162.
  2. ^ a b c d Thudumu, Srikanth; Branch, Philip; Jin, Jiong; Singh, Jugdutt (Jack) (2020-07-02). "A comprehensive survey of anomaly detection techniques for high dimensional big data". Journal of Big Data. 7 (1): 42. doi:10.1186/s40537-020-00320-x. ISSN 2196-1115.
  3. ^ a b Stojanović, Branka; Božić, Josip; Hofer-Schmitz, Katharina; Nahrgang, Kai; Weber, Andreas; Badii, Atta; Sundaram, Maheshkumar; Jordan, Elliot; Runevic, Joel (2021-01). "Follow the Trail: Machine Learning for Fraud Detection in Fintech Applications". Sensors. 21 (5): 1594. doi:10.3390/s21051594. ISSN 1424-8220.
  4. ^ Aggarwal, Charu (2017). Outlier Analysis. Springer Publishing Company, Incorporated. ISBN 978-3319475776.
  5. ^ Denning, D. E. (1987). "An Intrusion-Detection Model" (PDF). IEEE Transactions on Software Engineering. SE-13 (2): 222–232. CiteSeerX 10.1.1.102.5127. doi:10.1109/TSE.1987.232894. S2CID 10028835. Archived (PDF) from the original on June 22, 2015.
  6. ^ Teng, H. S.; Chen, K.; Lu, S. C. (1990). "Adaptive real-time anomaly detection using inductively generated sequential patterns". Proceedings. 1990 IEEE Computer Society Symposium on Research in Security and Privacy (PDF). pp. 278–284. doi:10.1109/RISP.1990.63857. ISBN 978-0-8186-2060-7. S2CID 35632142.
  7. ^ Jones, Anita K.; Sielken, Robert S. (1999). "Computer System Intrusion Detection: A Survey". Technical Report, Department of Computer Science, University of Virginia, Charlottesville, VA. CiteSeerX 10.1.1.24.7802.
  8. ^ Ahmed, Mohiuddin; Mahmood, Abdun Naser; Islam, Md. Rafiqul (2016-02). "A survey of anomaly detection techniques in financial domain". Future Generation Computer Systems. 55: 278–288. doi:10.1016/j.future.2015.01.001. ISSN 0167-739X.
  9. ^ Tomek, Ivan (1976). "An Experiment with the Edited Nearest-Neighbor Rule". IEEE Transactions on Systems, Man, and Cybernetics. 6 (6): 448–452. doi:10.1109/TSMC.1976.4309523.
  10. ^ Smith, M. R.; Martinez, T. (2011). "Improving classification accuracy by identifying and removing instances that should be misclassified" (PDF). The 2011 International Joint Conference on Neural Networks. p. 2690. CiteSeerX 10.1.1.221.1371. doi:10.1109/IJCNN.2011.6033571. ISBN 978-1-4244-9635-8. S2CID 5809822.
  11. ^ a b c d e f "Video anomaly detection system using deep convolutional and recurrent models". Results in Engineering. 18: 101026. 2023-06-01. doi:10.1016/j.rineng.2023.101026. ISSN 2590-1230.
  12. ^ Zhang, Tan; Chowdhery, Aakanksha; Bahl, Paramvir (Victor); Jamieson, Kyle; Banerjee, Suman (2015-09-07). "The Design and Implementation of a Wireless Video Surveillance System". Proceedings of the 21st Annual International Conference on Mobile Computing and Networking. MobiCom '15. New York, NY, USA: Association for Computing Machinery: 426–438. doi:10.1145/2789168.2790123. ISBN 978-1-4503-3619-2.
  13. ^ a b c "Anomaly Detection in Complex Real World Application Systems". ieeexplore.ieee.org. Retrieved 2023-11-08.
  14. ^ a b Chatterjee, Ayan; Ahmed, Bestoun S. (2022-08). "IoT anomaly detection methods and applications: A survey". Internet of Things. 19: 100568. doi:10.1016/j.iot.2022.100568. ISSN 2542-6605.
  15. ^ a b Garg, Sahil; Kaur, Kuljeet; Batra, Shalini; Kaddoum, Georges; Kumar, Neeraj; Boukerche, Azzedine (2020-03-01). "A multi-stage anomaly detection scheme for augmenting the security in IoT-enabled applications". Future Generation Computer Systems. 104: 105–118. doi:10.1016/j.future.2019.09.038. ISSN 0167-739X.
  16. ^ a b c Martí, Luis; Sanchez-Pi, Nayat; Molina, José Manuel; Garcia, Ana Cristina Bicharra (2015-02). "Anomaly Detection Based on Sensor Data in Petroleum Industry Applications". Sensors. 15 (2): 2774–2797. doi:10.3390/s150202774. ISSN 1424-8220.
  17. ^ a b Aljameel, Sumayh S.; Alomari, Dorieh M.; Alismail, Shatha; Khawaher, Fatimah; Alkhudhair, Aljawharah A.; Aljubran, Fatimah; Alzannan, Razan M. (2022-08). "An Anomaly Detection Model for Oil and Gas Pipelines Using Machine Learning". Computation. 10 (8): 138. doi:10.3390/computation10080138. ISSN 2079-3197.
  18. ^ Chandola, V.; Banerjee, A.; Kumar, V. (2009). "Anomaly detection: A survey". ACM Computing Surveys. 41 (3): 1–58. doi:10.1145/1541880.1541882. S2CID 207172599.
  19. ^ Zimek, Arthur; Filzmoser, Peter (2018). "There and back again: Outlier detection between statistical reasoning and data mining algorithms" (PDF). Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 8 (6): e1280. doi:10.1002/widm.1280. ISSN 1942-4787. S2CID 53305944. Archived from the original (PDF) on 2021-11-14. Retrieved 2019-12-09.
  20. ^ Campos, Guilherme O.; Zimek, Arthur; Sander, Jörg; Campello, Ricardo J. G. B.; Micenková, Barbora; Schubert, Erich; Assent, Ira; Houle, Michael E. (2016). "On the evaluation of unsupervised outlier detection: measures, datasets, and an empirical study". Data Mining and Knowledge Discovery. 30 (4): 891. doi:10.1007/s10618-015-0444-8. ISSN 1384-5810. S2CID 1952214.
  21. ^ Anomaly detection benchmark data repository of the Ludwig-Maximilians-Universität München; Mirror Archived 2022-03-31 at the Wayback Machine at University of São Paulo.
  22. ^ Knorr, E. M.; Ng, R. T.; Tucakov, V. (2000). "Distance-based outliers: Algorithms and applications". The VLDB Journal the International Journal on Very Large Data Bases. 8 (3–4): 237–253. CiteSeerX 10.1.1.43.1842. doi:10.1007/s007780050006. S2CID 11707259.
  23. ^ Ramaswamy, S.; Rastogi, R.; Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. Proceedings of the 2000 ACM SIGMOD international conference on Management of data – SIGMOD '00. p. 427. doi:10.1145/342009.335437. ISBN 1-58113-217-4.
  24. ^ Angiulli, F.; Pizzuti, C. (2002). Fast Outlier Detection in High Dimensional Spaces. Principles of Data Mining and Knowledge Discovery. Lecture Notes in Computer Science. Vol. 2431. p. 15. doi:10.1007/3-540-45681-3_2. ISBN 978-3-540-44037-6.
  25. ^ Breunig, M. M.; Kriegel, H.-P.; Ng, R. T.; Sander, J. (2000). LOF: Identifying Density-based Local Outliers (PDF). Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. SIGMOD. pp. 93–104. doi:10.1145/335191.335388. ISBN 1-58113-217-4.
  26. ^ Liu, Fei Tony; Ting, Kai Ming; Zhou, Zhi-Hua (December 2008). "Isolation Forest". 2008 Eighth IEEE International Conference on Data Mining. pp. 413–422. doi:10.1109/ICDM.2008.17. ISBN 9780769535029. S2CID 6505449.
  27. ^ Liu, Fei Tony; Ting, Kai Ming; Zhou, Zhi-Hua (March 2012). "Isolation-Based Anomaly Detection". ACM Transactions on Knowledge Discovery from Data. 6 (1): 1–39. doi:10.1145/2133360.2133363. S2CID 207193045.
  28. ^ Schubert, E.; Zimek, A.; Kriegel, H. -P. (2012). "Local outlier detection reconsidered: A generalized view on locality with applications to spatial, video, and network outlier detection". Data Mining and Knowledge Discovery. 28: 190–237. doi:10.1007/s10618-012-0300-z. S2CID 19036098.
  29. ^ Kriegel, H. P.; Kröger, P.; Schubert, E.; Zimek, A. (2009). Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data. Advances in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science. Vol. 5476. p. 831. doi:10.1007/978-3-642-01307-2_86. ISBN 978-3-642-01306-5.
  30. ^ Kriegel, H. P.; Kroger, P.; Schubert, E.; Zimek, A. (2012). Outlier Detection in Arbitrarily Oriented Subspaces. 2012 IEEE 12th International Conference on Data Mining. p. 379. doi:10.1109/ICDM.2012.21. ISBN 978-1-4673-4649-8.
  31. ^ Fanaee-T, H.; Gama, J. (2016). "Tensor-based anomaly detection: An interdisciplinary survey". Knowledge-Based Systems. 98: 130–147. doi:10.1016/j.knosys.2016.01.027. S2CID 16368060.
  32. ^ Zimek, A.; Schubert, E.; Kriegel, H.-P. (2012). "A survey on unsupervised outlier detection in high-dimensional numerical data". Statistical Analysis and Data Mining. 5 (5): 363–387. doi:10.1002/sam.11161. S2CID 6724536.
  33. ^ Schölkopf, B.; Platt, J. C.; Shawe-Taylor, J.; Smola, A. J.; Williamson, R. C. (2001). "Estimating the Support of a High-Dimensional Distribution". Neural Computation. 13 (7): 1443–71. CiteSeerX 10.1.1.4.4106. doi:10.1162/089976601750264965. PMID 11440593. S2CID 2110475.
  34. ^ a b c Hawkins, Simon; He, Hongxing; Williams, Graham; Baxter, Rohan (2002). "Outlier Detection Using Replicator Neural Networks". Data Warehousing and Knowledge Discovery. Lecture Notes in Computer Science. Vol. 2454. pp. 170–180. CiteSeerX 10.1.1.12.3366. doi:10.1007/3-540-46145-0_17. ISBN 978-3-540-44123-6. S2CID 6436930.
  35. ^ J. An and S. Cho, "Variational autoencoder based anomaly detection using reconstruction probability", 2015.
  36. ^ Malhotra, Pankaj; Vig, Lovekesh; Shroff, Gautman; Agarwal, Puneet (22–24 April 2015). Long Short Term Memory Networks for Anomaly Detection in Time Series. European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium).
  37. ^ Hubert, Mia; Debruyne, Michiel; Rousseeuw, Peter J. (2018). "Minimum covariance determinant and extensions". WIREs Computational Statistics. 10 (3). doi:10.1002/wics.1421. ISSN 1939-5108. S2CID 67227041.
  38. ^ Hubert, Mia; Debruyne, Michiel (2010). "Minimum covariance determinant". WIREs Computational Statistics. 2 (1): 36–43. doi:10.1002/wics.61. ISSN 1939-0068. S2CID 123086172.
  39. ^ Alzubaidi, Laith; Zhang, Jinglan; Humaidi, Amjad J.; Al-Dujaili, Ayad; Duan, Ye; Al-Shamma, Omran; Santamaría, J.; Fadhel, Mohammed A.; Al-Amidie, Muthana; Farhan, Laith (2021-03-31). "Review of deep learning: concepts, CNN architectures, challenges, applications, future directions". Journal of Big Data. 8 (1): 53. doi:10.1186/s40537-021-00444-8. ISSN 2196-1115. PMC 8010506. PMID 33816053.
  40. ^ Belay, Mohammed Ayalew; Blakseth, Sindre Stenen; Rasheed, Adil; Salvo Rossi, Pierluigi (2023-01). "Unsupervised Anomaly Detection for IoT-Based Multivariate Time Series: Existing Solutions, Performance Analysis and Future Directions". Sensors. 23 (5): 2844. doi:10.3390/s23052844. ISSN 1424-8220.
  41. ^ He, Z.; Xu, X.; Deng, S. (2003). "Discovering cluster-based local outliers". Pattern Recognition Letters. 24 (9–10): 1641–1650. Bibcode:2003PaReL..24.1641H. CiteSeerX 10.1.1.20.4242. doi:10.1016/S0167-8655(03)00003-5.
  42. ^ Campello, R. J. G. B.; Moulavi, D.; Zimek, A.; Sander, J. (2015). "Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection". ACM Transactions on Knowledge Discovery from Data. 10 (1): 5:1–51. doi:10.1145/2733381. S2CID 2887636.
  43. ^ Lazarevic, A.; Kumar, V. (2005). "Feature bagging for outlier detection". Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. pp. 157–166. CiteSeerX 10.1.1.399.425. doi:10.1145/1081870.1081891. ISBN 978-1-59593-135-1. S2CID 2054204.
  44. ^ Nguyen, H. V.; Ang, H. H.; Gopalkrishnan, V. (2010). Mining Outliers with Ensemble of Heterogeneous Detectors on Random Subspaces. Database Systems for Advanced Applications. Lecture Notes in Computer Science. Vol. 5981. p. 368. doi:10.1007/978-3-642-12026-8_29. ISBN 978-3-642-12025-1.
  45. ^ Kriegel, H. P.; Kröger, P.; Schubert, E.; Zimek, A. (2011). Interpreting and Unifying Outlier Scores. Proceedings of the 2011 SIAM International Conference on Data Mining. pp. 13–24. CiteSeerX 10.1.1.232.2719. doi:10.1137/1.9781611972818.2. ISBN 978-0-89871-992-5.
  46. ^ Schubert, E.; Wojdanowski, R.; Zimek, A.; Kriegel, H. P. (2012). On Evaluation of Outlier Rankings and Outlier Scores. Proceedings of the 2012 SIAM International Conference on Data Mining. pp. 1047–1058. doi:10.1137/1.9781611972825.90. ISBN 978-1-61197-232-0.
  47. ^ Zimek, A.; Campello, R. J. G. B.; Sander, J. R. (2014). "Ensembles for unsupervised outlier detection". ACM SIGKDD Explorations Newsletter. 15: 11–22. doi:10.1145/2594473.2594476. S2CID 8065347.
  48. ^ Zimek, A.; Campello, R. J. G. B.; Sander, J. R. (2014). Data perturbation for outlier detection ensembles. Proceedings of the 26th International Conference on Scientific and Statistical Database Management – SSDBM '14. p. 1. doi:10.1145/2618243.2618257. ISBN 978-1-4503-2722-0.
  49. ^ a b c d e f g Ranshous, Stephen; Shen, Shitian; Koutra, Danai; Harenberg, Steve; Faloutsos, Christos; Samatova, Nagiza F. (2015-05). "Anomaly detection in dynamic networks: a survey". WIREs Computational Statistics. 7 (3): 223–247. doi:10.1002/wics.1347. ISSN 1939-5108.