The goal of the NoMaD project is to collect, label, organize and make available a large, verified and labeled dataset of normal and malicious HTTPS connections. This dataset is designed to support the research team at Cisco Prague as well as to support the research activities and publications of the CVUT University. The project will give Cisco Systems an evolving dataset to generate better and faster analysis; and will give the CTU University the opportunity to research about the HTTPSbehaviors in the network as part of its Stratosphere Project.
Some days ago we finally made public two tools that were very important for starting this project. The tools are CCDetector and BotnetDetectorComparer. With these tools we created the experiments in the paper “An empirical comparison of botnet detection methods”. You can download them and use them to verify the paper and test more ideas. Please contact us if you need assistance.
After considering several request we decided to extend the previous CTU-13 dataset to include truncated versions of the original pcap files. The pcap files include now all the traffic: Normal, Botnet and Background. The pcap files where however truncated to protect the privacy of the users, but in such a way that it is still possible to read the complete TCP, UDP and ICMP headers.
This blog post is a comparison and analysis of the differences in the behavioral patterns found in the DNS traffic of malware and normal connections. We captured malware and normal traffic in the MCFP project and we extracted the DNS behavior with the stf tool. The captures correspond to DNStraffic of a SPAM malware, DGA-based malware and a normal computer. The idea is to analyze the differences in the behaviors as they are shown by the stf program. For an explanation of how the stfprogram is generating this data see this explanation.
Working as security researchers is common to create a new machine learning algorithm that we want to evaluate. It may be that we are trying to detect malware, identify attacks or analyze IDS logs, but at some point we figure it out that we need a good dataset to complete our task. But not any dataset; in fact we need a labeled dataset. The dataset will be used not only to learn the features of, for example, malware traffic, but also to verify how good our algorithm is. Since getting a dataset is difficult and time consuming, the most common solution is to get a third-party dataset; although some researchers with time and resources may create their own. Either way, most usually we obtain a dataset of malware traffic (continuing with the malware traffic detection example) and we assign the label Malware to all of its instances. This looks good, so we make our training and testing, we obtain results and we publish. However, there are important problems in this approach that can jeopardize the results of our algorithm and the verification process. Let’s analyze each problem in turn.
The Stratosphere IPS Project officially started around January 2015 with a huge effort of development and planning. We are glad to see that after these two months we were able to start the project website and social relationships, to boost the collaboration with others and to develop the first part of the project: The Stratosphere Testing Framework.
This is an analysis of the traffic generated by the APK sample 46a2468a6ee9a7740191747d1b7b16a5 that was downloaded from the CopperDroid project. Virus Total detects this sample as a malware called Kaishi (probably related with banks attacks) and although the malware is from May 2014, in Mar 12, 2015 only two Antivirus engines detect it.
This is an install guide to run the Argus Sniffer in the Raspberry PI using Raspbian for use in the Stratosphere Project.
While analyzing our capture CTU-Malware-Capture-Botnet-89-1 we found out that there were some strange issues with the periodicity of the C&C channels. In this capture there were a lot of HTTP connections, but few of them were periodic. During the analysis of the network capture we usually start looking at the NetFlows and then we move to the payload data. What we found is that several periodic HTTP connections had non-periodic NetFlows. This was strange for us so we took a deeper look.
Yesterday we found out a malware that uses a DGA algorithm to find out the domains of their C&C servers. It become interesting when we noted that the DGA communication is not using periodic requests on purpose. In fact it seems to be specifically generating the requests times of its DGA packets in order to avoid being periodic.