Comparative Analysis of Supervised Learning and Unsupervised Anomaly Detection in Security Log Analysis for Post-Incident Digital Forensic Investigation
DOI:
https://doi.org/10.59261/jbt.v7i2.605
Keywords:
anomaly detection, digital forensics, logistic regression, machine learning, security log analysis
Abstract
Background: Post-incident digital forensic investigation of large-scale security logs generated by enterprise firewalls and servers poses a range of challenges. As the data grows larger and more complex, manual analysis is no longer feasible. Few empirical studies have directly compared supervised and unsupervised paradigms within a post-incident forensic framework on operational-scale, real-world logs.
Objective: This paper compares the classification performance of supervised and unsupervised machine learning methods for forensic analysis of security logs, as well as the prioritization of various security anomalies using both approaches.
Methods: The analysis used a dataset of more than 359,000 firewall and server logs collected over a 30-day period. Labeled events were used to train a supervised model, Logistic Regression. The unsupervised anomaly detection method was Isolation Forest, which performed best among the models trained on normal baseline logs. Evaluation metrics included accuracy, precision, recall, ROC-AUC, and ranking-based anomaly assessment.
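The threshold-based and ranking-based metrics named in the Methods can be illustrated with a minimal sketch. It is not the authors' evaluation code: the labels, scores, and the 0.5 threshold below are made-up toy values, and the function names are hypothetical. ROC-AUC is computed here in its pairwise form, as the probability that a randomly chosen suspicious event scores higher than a randomly chosen normal one.

```python
# Toy illustration of accuracy, precision, recall, and ROC-AUC.
# All data and names are assumptions for illustration only.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = suspicious."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def precision(y_true, y_pred):
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(y_true, y_pred):
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def roc_auc(y_true, scores):
    """Pairwise ROC-AUC: fraction of (positive, negative) pairs in which
    the positive has the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = suspicious) and model scores for eight events.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.4, 0.2, 0.1, 0.05, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(accuracy(y_true, y_pred))   # 0.75
print(precision(y_true, y_pred))  # 2 of 3 flagged events are suspicious
print(recall(y_true, y_pred))     # 2 of 3 suspicious events are flagged
print(roc_auc(y_true, scores))    # 13 of 15 pairs ranked correctly
```

Accuracy, precision, and recall depend on the chosen threshold, whereas ROC-AUC depends only on how the scores order the events.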
Results: Logistic Regression achieved an accuracy of 0.99 and a ROC-AUC of 0.9998, with precision and recall of 1.00 and 0.99 for suspicious events, demonstrating near-perfect discriminability of labeled behavioral features within a 24-hour period. Isolation Forest reached 86% overall accuracy, 93% precision, and 59% recall, and showed a strong forensic triage property: 197 of the top 200 anomaly-ranked entries (92.5%) were confirmed suspicious events. Sensitivity analysis of the contamination parameter showed that ranking precision at the top 200 remained stable across the 0.05 to 0.30 range (Fig. 7A, 7B), demonstrating the robustness of rank-based prioritization despite variability in global recall across contamination values.
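One plausible reading of the contamination result can be sketched as follows. The sketch assumes an implementation in which the contamination value only sets the cut-off applied to the anomaly scores and does not change the scores themselves (this is how scikit-learn's IsolationForest behaves; the tooling used in the study is not stated here). The scores, labels, and function names are hypothetical toy values, not data from the study.

```python
# Toy illustration: precision at the top k depends only on the score ranking,
# while recall depends on how many events the contamination cut-off flags.
# All data and names are assumptions for illustration only.

def rank_by_score(y_true, scores):
    """Return labels ordered from most to least anomalous."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    return [label for _, label in ranked]

def precision_at_k(y_true, scores, k):
    """Fraction of the k highest-scored events that are truly suspicious."""
    top = rank_by_score(y_true, scores)[:k]
    return sum(top) / len(top)

def recall_at_contamination(y_true, scores, contamination):
    """Recall when the top `contamination` fraction of events is flagged."""
    n_flag = max(1, int(round(contamination * len(scores))))
    flagged = rank_by_score(y_true, scores)[:n_flag]
    return sum(flagged) / sum(y_true)

# Twenty hypothetical events, already listed from highest to lowest score.
scores = [i / 20 for i in range(20, 0, -1)]
y_true = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

p_at_5 = precision_at_k(y_true, scores, k=5)
recalls = {c: recall_at_contamination(y_true, scores, c) for c in (0.10, 0.20, 0.30)}

print(p_at_5)   # 0.8, regardless of the contamination value
print(recalls)  # recall rises as the cut-off flags more events
```

Under this assumption, an investigator who reviews a fixed-size list of top-ranked entries sees the same list whatever contamination value is set, while threshold-dependent metrics such as global recall move with the cut-off.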
Conclusion: Our results demonstrate high predictive performance for supervised classification and efficient forensic triage through low false-positive rates in unsupervised anomaly detection of both time-series logs and free-text security event logs.
License
Copyright (c) 2026 Iwan Indramana, Asto Purwanto

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.


