Master Thesis

The increasing diversity and volume of malware traffic is pushing researchers to find better detection methods. When security practitioners analyze such large amounts of traffic, they are usually overwhelmed and, as a result, analyze progressively less traffic with less accuracy. This data overload occurs even when companies filter out part of their outgoing traffic. Given that users inside a company mostly need web services to work, it is common to allow only web traffic to leave the enterprise. However, malware authors are aware of this filtering, and in recent years we have witnessed a shift in malware towards using web services for its connections. For analyzing HTTP/S traffic, the default unit of analysis is the weblog, a log record of web traffic. Weblogs are used to find threats in the network, but doing so requires a significant amount of expertise. The required knowledge ranges from looking up domains that have been reported as malicious, to analyzing patterns in the URLs, to using the WHOIS information of the domains. These techniques depend heavily on humans. All in all, analyzing millions of weblogs with speed and accuracy, balancing the amount of information, and finding threats is a daunting task, to say the least. Security analysts need a tool to help them organize their work and a machine learning algorithm that can improve detection and speed up the analysis. It is in this context that we researched and created a new tool to assist network security analysts in finding threats: the ManaTI project. This project has two primary goals. First, to help analysts evaluate weblogs through a web interface, so that they can better find and process the information. Second, to create a machine learning method that can identify domains that share some similarity in their WHOIS information. Our algorithm works as a WHOIS-based classification of similar domains, also called the WHOIS Similarity Distance.
The conclusions of our research are as follows. First, ManaTI can increase the speed of security analysts by a factor of 3.4. Second, the WHOIS information of related domains has quantifiable similarities that make an accurate comparison possible. Third, some WHOIS fields are more important than others for relating domains. Finally, the accuracy of finding related domains using a linear model classifier based on the WHOIS Similarity Distance algorithm is around 98%.
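
To make the idea of a WHOIS-based distance more concrete, the following is a minimal sketch and not the algorithm developed in the thesis. It assumes that each WHOIS record is a dictionary of strings, compares a handful of fields with a generic string-similarity measure from the Python standard library, and combines the per-field distances with weights to reflect that some fields matter more than others. The field names, the weights, the threshold, and the function names are all hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical field names and weights, chosen only for illustration.
# They are not the fields or weights used in the thesis.
FIELD_WEIGHTS = {
    "registrar": 1.0,
    "registrant_name": 2.0,
    "registrant_email": 2.0,
    "name_servers": 1.5,
}


def field_distance(a, b):
    """Return a distance in [0, 1]; 0 means the two strings are identical."""
    if not a or not b:
        # Assumption: a missing value is treated as maximally distant.
        return 1.0
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def whois_distance(rec_a, rec_b, weights=FIELD_WEIGHTS):
    """Weighted mean of the per-field distances between two WHOIS records."""
    total = sum(weights.values())
    weighted = sum(
        w * field_distance(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, w in weights.items()
    )
    return weighted / total


def related(rec_a, rec_b, threshold=0.3):
    """Flag two domains as related when their distance is under a threshold.

    The threshold is an arbitrary placeholder, not a value from the thesis.
    """
    return whois_distance(rec_a, rec_b) <= threshold
```

In a setup like this, the per-field distances could also be fed as features to a linear classifier instead of being combined with fixed weights, which is closer in spirit to the linear model mentioned above.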

MANATI: WEB ASSISTANCE FOR THE THREAT ANALYSIS SUPPORTED BY DOMAIN SIMILARITY
