Datasets Overview

The Stratosphere IPS relies on models built from real malware traffic captures. By studying how malware actually behaves, we ensure that the models we create are accurate and that our performance measurements are realistic. Our sister project, the Malware Capture Facility Project, continuously monitors the threat landscape for emerging threats, retrieves malicious samples and runs them in our facilities to capture their traffic.

Malware captures

The Malware Capture Facility Project, the sister project of the Stratosphere IPS Project, is responsible for making the long-term malware captures. We continually obtain malware and normal data to feed the Stratosphere IPS.

Normal captures

To verify the machine learning algorithms correctly, it is paramount to have good datasets. Captures of normal traffic are key to accurately calculating the true numbers of False Positives and True Negatives.
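
As an illustration of why normal-only captures matter for these metrics, here is a minimal sketch of how they could be computed. It assumes a detector has already produced one verdict per flow; the function name and the boolean input format are hypothetical and are not part of the Stratosphere IPS code.

```python
def evaluate_on_normal_capture(verdicts):
    """Count False Positives and True Negatives on a normal-only capture.

    `verdicts` is an iterable of booleans, one per flow, where True means
    "the detector flagged this flow as malicious" (hypothetical format).
    Every flow in a normal capture is benign, so each True is a False
    Positive and each False is a True Negative.
    """
    verdicts = list(verdicts)
    if not verdicts:
        raise ValueError("no flows to evaluate")
    false_positives = sum(1 for flagged in verdicts if flagged)
    true_negatives = len(verdicts) - false_positives
    false_positive_rate = false_positives / len(verdicts)
    return false_positives, true_negatives, false_positive_rate
```

Because the capture contains no malicious flows, the False Positive Rate reduces to FP / (FP + TN), which is what the last returned value represents.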

Mixed captures

The mixed captures provide a real scenario in which a machine is first not infected, then infected, and after some time the infection is cleaned up. This type of scenario facilitates the testing of the Stratosphere IPS machine learning algorithms and models.


SPECIAL DATASET CTU-13

The CTU-13 dataset consists of a group of 13 different malware captures done in a real network environment. The captures include Botnet, Normal and Background traffic. The Botnet traffic comes from the infected hosts, the Normal traffic comes from the verified normal hosts, and the Background traffic is all the remaining traffic, whose nature we cannot determine for sure. The dataset is labeled on a flow-by-flow basis, making it one of the largest and most thoroughly labeled botnet datasets available. The files that can be downloaded are:

  • Binetflow files
    • For Botnet, Normal and Background traffic.
    • Text files with bidirectional flows generated by Argus.
  • Biargus files
    • For Botnet, Normal and Background traffic.
    • Binary files with bidirectional flows generated by Argus.
  • Complete Pcap files
    • For Botnet traffic.
    • Pcap files with all the payload data.
  • Truncated Pcap files
    • For Botnet, Normal and Background traffic.
    • Pcap files with only the header information.
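
Since the binetflow files are text files labeled flow by flow, the three traffic classes can be counted with a few lines of code. The following is a minimal sketch, not an official tool. It assumes the file is comma-separated with a header row, that the label sits in a column named `Label`, and that label strings contain the words "Botnet" or "Normal" for those classes; check these assumptions against the files you download. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def classify_label(label):
    """Map a flow label to one of the three traffic classes.

    Assumption: labels contain "Botnet" or "Normal" for those classes;
    everything else is treated as Background.
    """
    if "Botnet" in label:
        return "botnet"
    if "Normal" in label:
        return "normal"
    return "background"


def count_traffic_classes(lines, label_column="Label"):
    """Count flows per class from an iterable of CSV lines (e.g. an open file)."""
    reader = csv.DictReader(lines)
    return Counter(classify_label(row[label_column]) for row in reader)


# Made-up sample with a reduced set of columns, for illustration only.
sample = io.StringIO(
    "SrcAddr,DstAddr,Label\n"
    "10.0.0.1,10.0.0.2,flow=From-Botnet-Example\n"
    "10.0.0.3,10.0.0.4,flow=From-Normal-Example\n"
    "10.0.0.5,10.0.0.6,flow=Background-Example\n"
    "10.0.0.7,10.0.0.8,flow=Background-Example\n"
)
print(count_traffic_classes(sample))
```

For a real file, pass an open file object instead of the in-memory sample, for example `with open(path, newline="") as f: count_traffic_classes(f)`, where `path` points to a downloaded binetflow file.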

Download the CTU-13 Dataset

The CTU-13 dataset is published under the Creative Commons CC-BY license and can be downloaded from the following link:

  • CTU-13-Dataset: large dataset of 13 captures with Malware, Normal and Background traffic.

Backup site for the CTU-13 dataset: in case our main repository of files is not working, you can still find the files of the CTU-13 dataset HERE.