Vortragsprogramm/2011/Vortrag Web Application Forensics
- Wann/When: 2013-06-11 20:00
- Wo/Where: das Labor
- Wer/Who: Jens
- Sprache/Language: Deutsch, Englisch auf Anfrage / German, English on request
- Code: https://github.com/jensvoid/lorg
summary
Logfiles are the primary source of information to reconstruct events when network services are compromised.
details
Logfiles are the primary source of information to reconstruct events when network services are compromised. However, extracting the relevant information from huge files can be a difficult task. In this talk, we compare various state-of-the-art approaches to detect attacks against web applications within real-world HTTP traffic logs, including signature-based, statistical and machine learning techniques. Detected incidents are subsequently classified as either hand-crafted or automated, to distinguish whether the attacker is a human or a machine. Furthermore, we present a new approach to quantifying attacks in terms of success or failure, based on anomalies in the size of HTTP responses, which can be derived from the logfiles alone.
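As a minimal sketch of the signature-based baseline mentioned above (this is an illustration only, not LORG's actual rule set; file name and patterns are made up, and real deployments use much larger filter collections):

```python
import re

# Minimal illustration of signature-based detection: match a tiny blacklist
# of attack patterns against the request line of an Apache access log.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

SIGNATURES = {
    "sql injection":  re.compile(r"union\s+select|'\s*or\s+1=1", re.I),
    "xss":            re.compile(r"<script|%3cscript", re.I),
    "path traversal": re.compile(r"\.\./\.\./|%2e%2e%2f", re.I),
}

def scan(logfile="access.log"):          # hypothetical file name
    with open(logfile, errors="replace") as fh:
        for lineno, line in enumerate(fh, 1):
            m = LOG_PATTERN.match(line)
            if not m:
                continue
            for name, sig in SIGNATURES.items():
                if sig.search(m.group("request")):
                    print(f"{lineno}: {m.group('ip')} -> possible {name}: "
                          f"{m.group('request')[:80]}")

if __name__ == "__main__":
    scan()
```

As discussed in the talk, such blacklists are robust against known exploits but, by construction, miss anything the patterns do not cover.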
Much has been said about the latest rise of cyber attacks and the need for programs to reliably detect them. Post-attack forensics is fundamental for web server administrators to reconstruct the process and impact of intrusions and thereby understand and fix programming errors within web applications (instead of just setting them up again). While there has been active research in the field of detecting attacks against web applications in recent years (compare [KVR05], [CAG09] or [GJ12]), few practical implementations have appeared so far. In this talk, we share experiences made with 'LORG' [Mül12], a soon-to-be-published open source framework for web log anomaly detection. We have implemented various existing approaches from the scientific community and tested them against log data derived from real-world intrusions.

We give an overview of their advantages and drawbacks, with a special focus on the comparison of signature-based and learning-based techniques: while blacklisting exploit signatures or regular expressions that are considered malicious has proven to be a robust practice in intrusion detection, this approach necessarily falls short at identifying previously unseen attacks. Machine-learning-based methods, which could be expected to close this gap, are said to be difficult to adopt and to require intensive manual training. Our observations, however, show that given enough log data, unsupervised learning-based methods such as hidden Markov models scale very well in terms of detection and false-positive rates and have proven practicable in real-world scenarios.

What is definitely missing in present web log forensic tools is context: a system administrator might not want to care about yet another random scan from some botnet, but 'a targeted attack from Seattle yesterday, 7pm' is a reason to take a closer look. We therefore aim to focus on classification and quantification of attacks. We classify all sessions of a potentially malicious client as either automated or hand-crafted, based on multi-feature traffic pattern analysis, and quantify attacks in terms of success or failure, based on the outlyingness of response sizes. Our tests with the LOF algorithm [BKNS00] show that rating attack severity using data found solely within the web log files is possible. In addition, we present various ways of visualizing detected incidents and enrich auto-generated reports with geolocation and botnet membership information using geotargeting and DNSBL lookups.
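To give a feeling for the unsupervised, learn-from-the-logs idea, here is a drastically simplified stand-in for the hidden Markov model approach discussed in the talk: a first-order character-level Markov chain trained on benign query strings, scoring new ones by mean log-likelihood. This is not LORG's actual model, only a sketch of the principle that unusual input structure produces low likelihood.

```python
import math
from collections import defaultdict

class CharMarkovScorer:
    """Simplified stand-in for the HMM-based approach: a first-order
    character-level Markov chain learned from benign query strings.
    A low per-character log-likelihood indicates an anomalous request."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, samples):
        for s in samples:
            for prev, nxt in zip("^" + s, s + "$"):   # ^ and $ mark start/end
                self.counts[prev][nxt] += 1

    def score(self, s):
        logp = 0.0
        for prev, nxt in zip("^" + s, s + "$"):
            total = sum(self.counts[prev].values())
            # add-one smoothing over a nominal 256-symbol alphabet
            logp += math.log((self.counts[prev][nxt] + 1) / (total + 256))
        return logp / (len(s) + 1)                    # mean log-likelihood

# toy training data standing in for benign traffic extracted from a log
benign = ["id=42&page=2", "q=forensics", "user=alice&lang=de"]
scorer = CharMarkovScorer()
scorer.train(benign)

for query in ["id=17&page=3", "id=1 UNION SELECT password FROM users--"]:
    print(f"{scorer.score(query):8.3f}  {query}")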
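The quantification idea can be illustrated in a few lines: if the responses to a client's suspicious requests are outliers with respect to the sizes a resource normally returns (e.g. a dumped table inflates the response), the attack may have succeeded. The sketch below uses scikit-learn's LocalOutlierFactor, i.e. the LOF algorithm [BKNS00]; the response sizes are invented and this is not LORG's own implementation.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# hypothetical response sizes (bytes) for one URL, as extracted from a log
normal_sizes = np.random.default_rng(0).normal(5300, 120, size=200)
suspect_sizes = np.array([5310.0, 48200.0])    # second request returned far more data

X = np.concatenate([normal_sizes, suspect_sizes]).reshape(-1, 1)
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                    # -1 marks outliers
scores = -lof.negative_outlier_factor_         # larger = more outlying

for size, label, score in zip(suspect_sizes, labels[-2:], scores[-2:]):
    verdict = "likely successful" if label == -1 else "probably failed"
    print(f"response of {size:8.0f} bytes -> LOF {score:5.2f} ({verdict})")
```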
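The multi-feature traffic pattern analysis used for the automated-vs-hand-crafted decision is covered in the talk; as a rough, purely illustrative heuristic (not LORG's feature set), inter-request timing and User-Agent variety alone already separate many tools from humans:

```python
from statistics import mean, pstdev

def classify_session(timestamps, user_agents):
    """Toy heuristic: very fast, very regular inter-request timing with a
    single static User-Agent looks automated; slow, irregular timing looks
    hand-crafted. Real classification uses many more features."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not gaps:
        return "unknown"
    fast = mean(gaps) < 1.0                              # sub-second average gap
    regular = pstdev(gaps) < 0.2 * max(mean(gaps), 1e-9)
    single_ua = len(set(user_agents)) == 1
    return "automated" if (fast and regular) or (fast and single_ua) else "hand-crafted"

# example: 50 requests fired every 0.1 s by the same client
print(classify_session([i * 0.1 for i in range(50)], ["sqlmap/1.7"] * 50))
```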
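Finally, a small sketch of the report-enrichment step: a DNSBL lookup reverses the octets of the attacker's IP, prepends them to a blacklist zone and checks whether the name resolves. The zone below (Spamhaus ZEN) is only an example and subject to its operator's usage policy; geolocation via a GeoIP database is left out here.

```python
import socket

def dnsbl_listed(ip, zone="zen.spamhaus.org"):
    """Check an IPv4 address against a DNS blacklist: an answer means
    'listed', NXDOMAIN means 'not listed'."""
    query = ".".join(reversed(ip.split("."))) + "." + zone
    try:
        socket.gethostbyname(query)
        return True
    except socket.gaierror:
        return False

for attacker in ["127.0.0.2", "192.0.2.1"]:     # 127.0.0.2 is the standard DNSBL test entry
    print(attacker, "listed" if dnsbl_listed(attacker) else "not listed")
```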