Anomaly Detection for Security
In recent years, large organizations, Sony among them, have increasingly fallen victim to attacks on their web services. Most of these attacks originate from organized crime and are aimed at financial fraud. Some of the attacks, often the large-scale ones that make the news headlines, are a preparation step for the fraud, for example stealing account credentials or credit card details. The actual money collection then happens in subsequent attacks that are typically more continuous in nature. These attacks are difficult to catch as they happen. When malicious actions are detected and blocked, adversaries respond by modifying their attack strategy. If they keep getting blocked on one service, they try their luck elsewhere.
Analysis of historical traffic data reveals that some of the malicious actions can be identified by searching for anomalies in the web traffic. These anomalies show up as sudden changes in the time series of successive requests. Such anomalies are only visible in well-chosen subsets of requests, e.g., all requests originating from accounts in the same email domain. We know some of these subsets but want to identify more. For this purpose, we are interested in using feature extraction algorithms known from machine learning.
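To make the idea of a request subset and a sudden change concrete, the following is a minimal sketch, not the method prescribed by this proposal. It groups requests by email domain, counts them per time bucket, and flags buckets whose count deviates strongly from the preceding window. The function names, the input layout of (timestamp, email address) pairs, the 60-second bucket size, the window length, the threshold, and the choice of a median/MAD-based score are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from statistics import median


def email_domain(address):
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def counts_per_subset(requests, bucket_seconds=60):
    """Count requests per email domain and per time bucket.

    `requests` is an iterable of (unix_timestamp, email_address) pairs;
    this input layout is an assumption for the sketch.
    """
    series = defaultdict(Counter)
    for timestamp, address in requests:
        bucket = int(timestamp // bucket_seconds)
        series[email_domain(address)][bucket] += 1
    return series


def sudden_changes(counter, first_bucket, last_bucket, window=10, threshold=5.0):
    """Return the buckets whose count deviates strongly from the preceding window.

    The score is a robust z-score: distance from the window median,
    divided by the scaled median absolute deviation (MAD).
    """
    counts = [counter.get(b, 0) for b in range(first_bucket, last_bucket + 1)]
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        med = median(history)
        mad = median([abs(c - med) for c in history])
        # 1.4826 makes the MAD comparable to a standard deviation for
        # normally distributed data; the floor of 1.0 avoids division
        # by zero when the history is flat.
        scale = max(1.4826 * mad, 1.0)
        if abs(counts[i] - med) / scale > threshold:
            flagged.append(first_bucket + i)
    return flagged


# Synthetic example: a steady 2 requests per minute, then a burst of 50.
demo_requests = [(b * 60 + k, "user@corp.example") for b in range(20) for k in range(2)]
demo_requests += [(20 * 60 + k, "user@corp.example") for k in range(50)]
demo_series = counts_per_subset(demo_requests)
print(sudden_changes(demo_series["corp.example"], 0, 20))
```

On real traffic the grouping key is the interesting part: replacing `email_domain` with another key function yields a different request subset, which is where feature extraction would come in.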
We want to learn which feature extraction techniques or algorithms work best in our problem domain. Anonymized production test data will be available for verification. We also want to learn which of the anomaly detection algorithms described in the literature give good results.
The primary expected outcome of the thesis is to show which feature extraction algorithms and which anomaly detection algorithms work best in this problem domain. The student will confirm this with prototype implementations.