The Stratosphere Research Laboratory currently hosts many students who are working on their bachelor's and master's theses in collaboration with our group.
Currently Ongoing Thesis Projects
Should I click project
The great majority of attacks, including targeted attacks, start with a link in an email or chat. When you don't have time to check or you don't know how to check it, should you click on it or not? Malicious websites can be used for phishing, exploits, crypto mining, or drive-by downloads and they are difficult to detect. Meet www.shouldiclick.org
Our solution is the technology behind our website www.shouldiclick.org. When you enter a link into the site, it can tell you whether the link is safe to click or raises security concerns. We researched and implemented security-focused machine learning methods to determine whether the information and content of a webpage are safe.
Behind the simplicity of this web interface we use convolutional and recurrent neural networks, classical machine learning algorithms, and statistical measurements. These algorithms analyze the content of HTML files from the website, its file structure, DNS information, certificate information and all the features provided by our partner service www.urlscan.io.
The shouldiclick.org project is a tool created in the Stratosphere Laboratory (www.stratosphereips.org) at the Czech Technical University in Prague to help protect civil society for free. This research is ongoing as part of the diploma thesis of František Střasák, which will be finished this year.
Analysis and comparison of the characteristics of high performance systems and botnets
The goal of this master's thesis is to study botnets as HPC systems and to demonstrate that they can solve similar problems.
To achieve this objective, the characteristics of a traditional HPC system and those of a botnet will be measured to compare them.
To perform the comparative analysis, the thesis will study a botnet called Geost, which was discovered in the Stratosphere Laboratory. This botnet was discovered while another botnet called Htbot, which provides proxy services, was being analyzed. The discovery came about because the Geost botmasters were using Htbot as a proxy service. Through the analysis of network traffic and the Threat Intelligence process, it was possible to discover the servers (domains and IPs) used by this botnet. At the same time, the Android applications (APKs) used to distribute the botnet were found. Geost spreads through fake APKs (Android Application Packages), which are installed by their victims. Through these installed APKs, attackers capture data from victims' bank accounts. Geost will be used as the main example botnet, and its HPC characteristics will be compared to those of other systems.
By studying the HPC characteristics of botnets, the thesis intends to measure the performance of botnets as high-performance systems and to study which characteristics can be obtained from a botnet through the Threat Intelligence process and the reverse engineering of malicious applications.
Profiling and Detection of IoT Attacks in Telnet Traffic
In the last five years, the prevalence of IoT devices has opened the door to a myriad of attacks on unprotected home devices. These devices come from the factory with several vulnerabilities that cannot be fixed without replacing the device. The protocol most used by these IoT devices is Telnet. However, no tool, research, or methodology exists to protect the devices by studying the Telnet protocol.
As part of the research task, the thesis will work out how to analyze the Telnet protocol in order to better protect the devices. It will profile the behavior of connections in the network and build models of users and attackers (including automated attackers), with the aim of finding the best way to stop attacks through methodologies that rely on behavioral techniques.
The analysis of the Telnet protocol, together with the new methodologies, should help improve the detection of attacks coming from both external and internal networks, including rogue users and bots.
Detection of security attacks on networks using ensembling techniques
Detecting malware and attacks by analyzing network traffic remains a challenge. Although there are several well-known detection mechanisms to accurately separate malicious behavior from normal behavior, it is still extremely difficult to build a detection system that can handle all the situations that arise in the network. These known mechanisms include machine learning techniques, static signatures, and rules based on experience. In particular, the most widely used method today relies on rules contributed by a large community of analysts. The most important impediments to good detection are the following. First, normal traffic is extremely complex, diverse, and changing. Second, malicious actions change continuously, adapting, migrating, and hiding as normal traffic. Third, the amount of data to analyze is huge, forcing analysts to discard data in favor of speed. Fourth, detection must occur in near real time to be of any use.
To solve some of these problems, the security community began to implement ensemble algorithms, or ensemble learning, in their systems. These algorithms are techniques for combining and summarizing the information from several different detectors into a single final decision. They allow analysts to use weak detectors in series, vote on the maliciousness of a domain, and decide on a better blocking action based on contradictory data.
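As a rough illustration of voting on the maliciousness of a domain, the sketch below combines the verdicts of several detectors with a weighted vote. The detector names, the weights, and the 0.5 threshold are assumptions made for this example only; they are not taken from the thesis.

```python
from typing import Dict


def weighted_vote(
    verdicts: Dict[str, bool],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> bool:
    """Return True (malicious) when the weighted share of detectors
    that flagged the item is strictly above the threshold."""
    total = sum(weights[name] for name in verdicts)
    if total == 0:
        return False  # no usable detector weight, so no alarm
    flagged = sum(weights[name] for name, hit in verdicts.items() if hit)
    return flagged / total > threshold


# Hypothetical detectors that disagree about one domain.
verdicts = {"signature": False, "ml_model": True, "community_rule": True}
equal_weights = {"signature": 1.0, "ml_model": 1.0, "community_rule": 1.0}
is_malicious = weighted_vote(verdicts, equal_weights)
```

With equal weights, two of the three detectors outvote the third. Raising the weight of a trusted detector changes the final decision without modifying any individual detector.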
Although there have been some good proposals for ensembling techniques applied to network security, two aspects of these learning algorithms have not been fully studied. First, the application of ensemble learning algorithms to community Threat Intelligence data. Second, there are no ensemble learning algorithms that work as a function of time in the detection of the same hosts. These two problems form the basis and objectives of this thesis.
Finished Thesis Projects
Identifying Malicious Hosts by Aggregation of Partial Detections
Due to the variety of possible ways to attack a computer system, network intrusion detection has always been a very complex task. The main problem of detection tools is to balance the detection ratio with the errors. The cost of generating a false alarm can be prohibitive and should be avoided when possible. The increasing number of attacks witnessed in the last few years makes it necessary to have a detection tool for protecting the network. Stratosphere IPS is a free-software network intrusion detection tool that uses machine learning algorithms to identify infected devices in the network. One of the downsides of the first version of Stratosphere IPS is that it detects individual connections and therefore generates a lot of false alarms. This thesis proposes to design, implement, and test a machine learning improvement of Stratosphere IPS that aggregates the partial detections of hosts and classifies them using the XGBoost algorithm to improve the overall performance of the tool. Our method is based on an additional layer of abstraction called the Source Address layer, which collects the partial data and pre-processes it for the classifier. Compared to the first version of Stratosphere IPS, the proposed extension results in a 40% increase in accuracy and a 26% improvement in the False Positive rate.
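The aggregation idea can be sketched as follows: per-connection detections are grouped by source address and summarized into one feature record per host, which a classifier could then consume. This is only a minimal sketch; the feature names (connections, flagged, flagged_ratio) are hypothetical, and the XGBoost classification step used in the thesis is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_by_source(
    detections: List[Tuple[str, bool]],
) -> Dict[str, Dict[str, float]]:
    """Group (source_ip, flagged) connection verdicts per source address
    and summarize them into simple per-host features."""
    counts = defaultdict(lambda: [0, 0])  # source_ip -> [total, flagged]
    for source_ip, flagged in detections:
        counts[source_ip][0] += 1
        counts[source_ip][1] += int(flagged)
    return {
        source_ip: {
            "connections": total,
            "flagged": hits,
            "flagged_ratio": hits / total,
        }
        for source_ip, (total, hits) in counts.items()
    }


# Hypothetical partial detections for two hosts.
partial = [
    ("10.0.0.1", True),
    ("10.0.0.1", False),
    ("10.0.0.2", False),
    ("10.0.0.1", True),
]
features = aggregate_by_source(partial)
```

Classifying one record per host rather than one per connection means a single suspicious connection among many benign ones no longer has to raise its own alarm.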
Identification of network users by profiling their behavior
The precise identification of users in the network at different moments in time is a well-known and difficult problem. Identifying users by their actions (and not their IP addresses) allows administrators to apply policy controls on users, to find intruders that are impersonating legitimate users, and to find anomalous user behaviors that could be due to malware infections. More importantly, the behavioral analysis of users' actions raises important moral questions about the power to identify users in unknown networks. This thesis explores this question by trying to identify users by converting each user's behavior into user profiles. These profiles are time-dependent and have dozens of features. By using the traffic of known past users in our dataset, it was possible to create and store their behavioral profiles. The profiles were created by extracting features from NetFlow data, and therefore no payload was used. The decision to use only NetFlows made this research much more challenging, since there was less data. After studying the behaviors, we designed a comparison model that is a similarity metric between user profiles. The profiles are compared one-to-one and also in sequential groups. The comparison of groups of profiles is the basis for the user-to-user classifier. These methods were verified in experiments that used one of the largest labeled datasets currently available in the area, consisting of more than one month of real traffic from 19 known and verified normal users. All our tools were published online, including the tools to visualize and compare users. Results show that we can identify our users with 60% accuracy and 90% precision. The success of this method mostly depends on how well we can compare two user profiles. A small improvement in the comparison can lead to an improvement in user detection.
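To make the idea of a similarity metric between user profiles concrete, the sketch below compares two profiles one-to-one using cosine similarity. Cosine similarity is swapped in here purely for illustration, since the metric designed in the thesis is not specified above, and the feature names are hypothetical.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two feature dictionaries.
    A feature missing from one profile is treated as 0."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical NetFlow-derived profiles for two time windows.
monday = {"flows_per_hour": 120.0, "distinct_dst_ports": 14.0}
tuesday = {"flows_per_hour": 110.0, "distinct_dst_ports": 15.0}
score = cosine_similarity(monday, tuesday)
```

A score close to 1 suggests the two windows may belong to the same user. In practice the features would need to be normalized first so that no single large-valued feature dominates the comparison.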
Graph-Based Analysis of Malware Network Behaviors
There are many malware families, and each of them has some unique features. The aim of this work is to detect malicious behavior using outgoing network communication. Our hypothesis is that this malicious communication has sequential behavioral patterns. We present a new graph representation of outgoing network communication using (IP address, port, protocol) triplets as vertices. There is an edge between two vertices if they come one after the other in the record of the outgoing communication of the inspected host. We think this representation might prove useful for detecting the patterns both by a program and by the naked eye. The Random Forest algorithm was used for prediction. Testing was done against datasets of normal users, infected hosts, and normal users that are later infected. We were able to detect malicious communication with up to 97% accuracy.
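The graph representation described above can be sketched in a few lines: each (IP address, port, protocol) triplet is a vertex, and a directed edge links two triplets that appear consecutively in a host's communication record. Counting how often each edge occurs is an assumption added for this example; the text only states that an edge exists.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[str, int, str]  # (IP address, port, protocol)


def build_graph(flows: List[Node]) -> Dict[Node, Dict[Node, int]]:
    """Build a directed graph in which an edge u -> v means that
    triplet v directly followed triplet u in the record.
    Edge values count how many times the transition occurred."""
    graph = defaultdict(lambda: defaultdict(int))
    for current, following in zip(flows, flows[1:]):
        graph[current][following] += 1
    return {src: dict(dsts) for src, dsts in graph.items()}


# Hypothetical communication record of one inspected host.
dns = ("8.8.8.8", 53, "udp")
web = ("93.184.216.34", 443, "tcp")
irc = ("203.0.113.7", 6667, "tcp")
graph = build_graph([dns, web, dns, web, irc])
```

Features such as the number of vertices, the number of edges, or the presence of cycles could then be extracted from the graph and passed to a classifier.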