Conferences

Ensembling to improve infected hosts detection

In this paper we describe the main ensemble learning techniques and their application in the cybersecurity threats detection. The state of the art in the use of ensemble learning techniques is presented here as an alternative to the current intrusion detection mechanisms, analyzing their advantages and disadvantages. We propose to incorporate ensemble learning to SLIPS [3], a behavioral-based intrusion detection and prevention system that uses machine learning algorithms to detect malicious behaviors, to obtain better results, taking advantage of the benefits of the SLIPS classifiers and modules. As part of this work we extend ensembling by considering algorithms from different domains (not machine learning domains), as Thread Intelligence. As a first stage of this project, performance tests of ensemble learning algorithms were performed to detect malware from flows evaluating its accuracy. The results of these tests are presented here, as well as the conclusions obtained and the future work.

Detecting DNS Threats: A Deep Learning Model to Rule Them All

Domain Name Service is a central part of Internet regular operation. Such importance has made it a common target of different malicious behaviors such as the application of Domain Generation Algorithms (DGA) for command and control a group of infected computers or Tunneling techniques for bypassing system administrator restrictions. A common detection approach is based on training different models detecting DGA and Tunneling capable of performing a lexicographic discrimination of the domain names. However, since both DGA and Tunneling showed domain names with observable lexicographical differences with normal domains, it is reasonable to apply the same detection approach to both threats. In the present work, we propose a multi-class convolutional network (MC-CNN) capable of detecting both DNS threats. The resulting MC-CNN is able to detect correctly 99% of normal domains, 97% of DGA and 92% of Tunneling, with a False Positive Rate of 2.8%, 0.7% and 0.0015% respectively and the advantage of having 44% fewer trainable parameters than similar models applied to DNS threats detection.

Geost Botnet: Operational security failures lead to a new Android banking threat

This paper describes the rare discovery of a new Android banking botnet, named Geost, from the operational security failures of its botmaster. They made many mistakes, including using the illegal proxy network of the HtBot malware, not encrypting their Command and Control servers, re-using security services, trusting other attackers with less operational security, and not encrypting chat sessions.

Machete: Dissecting the Operations of a Cyber Espionage Group in Latin America

Reports on cyber espionage operations have been on the rise in the last decade. However, operations in Latin America are heavily under researched and potentially underestimated. In this paper we analyze and dissect a cyber espionage tool known as Machete. Our research shows that Machete is operated by a highly coordinated and organized group who focuses on Latin American targets. We describe the five phases of the APT operations from delivery to exfiltration of information and we show why Machete is considered a cyber espionage tool. Furthermore, our analysis indicates that the targeted victims belong to military, political, or diplomatic sectors. The review of almost six years of Machete operations show that it is likely operated by a single group, and their activities are possibly state-sponsored. Machete is still active and operational to this day.

Deep Convolutional Neural Networks for DGA Detection

A Domain Generation Algorithm (DGA) is an algorithm to generate domain names in a deterministic but seemly random way. Malware use DGAs to generate the next domain to access the Command & Control (C&C) communication server. Given the simplicity of the generation process and speed at which the domains are generated, a fast and accurate detection method is required. Convolutional neural network (CNN) are well known for performing real-time detection in fields like image and video recognition. Therefore, they seemed suitable for DGA detection. The present work provides an analysis and comparison of the detection performance of a CNN for DGA detection. A CNN with a minimal architecture complexity was evaluated on a dataset with 51 DGA malware families and normal domains. Despite its simple architecture, the resulting CNN model correctly detected more than 97% of total DGA domains with a false positive rate close to 0.7%.

An Analysis of Convolutional Neural Networks for detecting DGA

A Domain Generation Algorithm (DGA) is an algorithm to generate domain names in a deterministic but seemly random way. Malware use DGAs to generate the next domain to access the Command Control (C&C) communication channel. Given the simplicity and velocity associated to the domain generation process, machine learning detection methods emerged as suitable detection solution. However, since the periodical retraining becomes mandatory, a fast and accurate detection method is needed. Convolutional neural network (CNN) are well known for performing real-time detection in fields like image and video recognition. Therefore, they seem suitable for DGA detection. The present work is a preliminary analysis of the detection performance of CNN for DGA detection. A CNN with a minimal architecture complexity was evaluated on a dataset with 51 DGA malware families as well as normal domains. Despite its simple architecture, the resulting CNN model correctly detected more than 97% of total DGA domains with a false positive rate close to 0.7%.

Bringing a GAN to a Knife-Fight: Adapting Malware Communication to Avoid Detection.

Generative Adversarial Networks (GANs) have been successfully used in a large number of domains. This paper proposes the use of GANs for generating network traffic in order to mimic other types of traffic. In particular, our method modifies the network behavior of a real malware in order to mimic the traffic of a legitimate application, and therefore avoid detection. By modifying the source code of a malware to receive parameters from a GAN, it was possible to adapt the behavior of its Command and Control (C2) channel to mimic the behavior of Facebook chat network traffic. In this way, it was possible to avoid the detection of new-generation Intrusion Prevention Systems that use machine learning and behavioral characteristics. A real-life scenario was successfully implemented using the Stratosphere behavioral IPS in a router, while the malware and the GAN were deployed in the local network of our laboratory, and the C2 server was deployed in the cloud. Results show that a GAN can successfully modify the traffic of a malware to make it undetectable. The modified malware also tested if it was being blocked and used this information as a feedback to the GAN. This work envisions the possibility of self-adapting malware and self-adapting IPS.

Reliable Machine Learning for Networking: Key Issues and Approaches.

Machine learning has become one of the go-to methods for solving problems in the field of networking. This development is driven by data availability in large-scale networks and the commodification of machine learning frameworks. While this makes it easier for researchers to implement and deploy machine learning solutions on networks quickly, there are a number of vital factors to account for when using machine learning as an approach to a problem in networking and translate testing performance to real networks deployments successfully. This paper, rather than presenting a particular technical result, discusses the necessary considerations to obtain good results when using machine learning to analyze network-related data.

Detecting DGA malware traffic through behavioral models

Some botnets use special algorithms to generate the domain names they need to connect to their command and control servers. They are refereed as Domain Generation Algorithms. Domain Generation Algorithms generate domain names and tries to resolve their IP addresses. If the domain has an IP address, it is used to connect to that command and control server. Otherwise, the DGA generates a new domain and keeps trying to connect. In both cases it is possible to capture and analyze the special behavior shown by those DNS packets in the network. The behavior of Domain Generation Algorithms is difficult to automatically detect because each domain is usually randomly generated and therefore unpredictable. Hence, it is challenging to separate the DNS traffic generated by malware from the DNS traffic generated by normal computers. In this work we analyze the use of behavioral detection approaches based on Markov Models to differentiate Domain Generation Algorithms traffic from normal DNS traffic. The evaluation methodology of our detection models has focused on a real-time approach based on the use of time windows for reporting the alerts. All the detection models have shown a clear differentiation between normal and malicious DNS traffic and most have also shown a good detection rate. We believe this work is a further step in using behavioral models for network detection and we hope to facilitate the development of more general and better behavioral detection methods of malware traffic.