The CTU-13 Dataset. A Labeled Dataset with Botnet, Normal and Background traffic.

The CTU-13 is a dataset of botnet traffic that was captured at the CTU University, Czech Republic, in 2011. The goal of the dataset was to have a large capture of real botnet traffic mixed with normal traffic and background traffic. The CTU-13 dataset consists of thirteen captures (called scenarios) of different botnet samples. In each scenario we executed a specific malware, which used several protocols and performed different actions. Table 2 shows the characteristics of the botnet scenarios.

Table 2. Characteristics of botnet scenarios

Each scenario was captured in a pcap file that contains all the packets of the three types of traffic. These pcap files were processed to obtain other types of information, such as NetFlows, WebLogs, etc. The first analysis of the CTU-13 dataset, which was described and published in the paper "An empirical comparison of botnet detection methods" (see Citation below), used unidirectional NetFlows to represent the traffic and to assign the labels. These unidirectional NetFlows should not be used because they were outperformed by our second analysis of the dataset, which used bidirectional NetFlows. The bidirectional NetFlows have several advantages over the unidirectional ones: first, they solve the issue of differentiating between the client and the server; second, they include more information; and third, they include much more detailed labels. The second analysis of the dataset with the bidirectional NetFlows is the one published here.
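To illustrate why bidirectional flows make the client and server explicit, the following is a minimal sketch of how two unidirectional flow records could be paired into one bidirectional record. The `Flow` record, its field names and the rule "the side that sent the first flow is the client" are assumptions made for this example only; they are not the Argus schema or the procedure used to build the dataset.

```python
from collections import namedtuple

# Hypothetical, simplified unidirectional flow record (not the Argus schema).
Flow = namedtuple("Flow", "start src sport dst dport proto packets bytes")


def merge_bidirectional(flows):
    """Pair each flow with its reverse direction.

    Assumption for this sketch: the endpoint that sent the earliest flow
    of a conversation is treated as the client.
    """
    merged = {}
    for f in sorted(flows, key=lambda flow: flow.start):
        forward = (f.src, f.sport, f.dst, f.dport, f.proto)
        reverse = (f.dst, f.dport, f.src, f.sport, f.proto)
        if reverse in merged:
            # Reply direction of a conversation we have already seen.
            rec = merged[reverse]
            rec["dst_packets"] += f.packets
            rec["dst_bytes"] += f.bytes
        else:
            # First (or repeated) flow in the initiating direction.
            rec = merged.setdefault(forward, {
                "client": f.src, "server": f.dst,
                "src_packets": 0, "src_bytes": 0,
                "dst_packets": 0, "dst_bytes": 0,
            })
            rec["src_packets"] += f.packets
            rec["src_bytes"] += f.bytes
    return merged
```

With unidirectional records, a DNS query and its answer appear as two unrelated flows. After merging, they form a single record whose `client` and `server` fields state which side initiated the conversation.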

The relationship between the duration of the scenario, the number of packets, the number of NetFlows and the size of the pcap file is shown in Table 3. This table also shows the malware used to create the capture and the number of infected computers in each scenario.

Table 3. Amount of data on each botnet scenario

The distinctive characteristic of the CTU-13 dataset is that we manually analyzed and labeled each scenario. The labeling was done inside the NetFlow files. Table 4 shows the number of labels for Background, Botnet, C&C Channels and Normal in each scenario.

Table 4. Distribution of labels in the NetFlows for each scenario in the dataset.
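A label distribution like the one in Table 4 can be recomputed from the label column of a NetFlow file. The sketch below groups label strings into the four categories by keyword. The example label strings, the keyword rules (including treating "-CC" as the marker of a C&C channel) and the function names are assumptions for illustration; check them against the labels actually present in the files before relying on the counts.

```python
from collections import Counter


def label_category(label):
    """Map one label string to a coarse category (keyword rules are assumed)."""
    text = label.lower()
    if "botnet" in text:
        # Assumption: C&C channel labels carry a "-CC" token.
        return "C&C Channels" if "-cc" in text else "Botnet"
    if "normal" in text:
        return "Normal"
    if "background" in text:
        return "Background"
    return "Unknown"


def label_distribution(labels):
    """Count how many flows fall into each category."""
    return Counter(label_category(label) for label in labels)
```

Anything that ends up in the `Unknown` bucket signals that the keyword rules do not cover a label and should be extended.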

TYPE OF FILES AND DOWNLOAD

Each of the scenarios in the dataset was processed to obtain different files. For privacy reasons, the complete pcap file containing all the background, normal and botnet data is not available. However, the rest of the files are available. Each scenario contains:

The pcap file for the botnet capture only. The files have the extension .pcap.
The bidirectional NetFlow files (generated with Argus) of all the traffic, including the labels. The files have the extension .biargus.
The original executable file.
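As a quick sanity check on a downloaded botnet capture, the packets in a .pcap file can be counted using only the standard library. This is a minimal sketch that assumes the file uses the classic libpcap format (a 24-byte global header followed by 16-byte record headers), not pcapng. The function name `count_pcap_packets` is chosen for this example.

```python
import struct

# Magic numbers of the classic pcap format, mapped to struct byte order.
_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def count_pcap_packets(stream):
    """Return (packet_count, captured_bytes) for a binary pcap stream."""
    header = stream.read(24)
    if len(header) < 24 or header[:4] not in _MAGICS:
        raise ValueError("not a classic pcap file")
    endian = _MAGICS[header[:4]]
    packets = 0
    captured = 0
    while True:
        record = stream.read(16)
        if len(record) < 16:
            break  # end of file (or truncated record header)
        _ts_sec, _ts_frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", record)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            break  # truncated final packet is not counted
        packets += 1
        captured += incl_len
    return packets, captured
```

It can be called with a file opened in binary mode, for example `with open(path, "rb") as fh: count_pcap_packets(fh)`, where `path` points to one of the botnet .pcap files.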

The CTU-13 dataset can be downloaded as one big tar file containing all the data, or it can be downloaded capture by capture. If you are accessing each capture independently, remember that the bidirectional NetFlow files are in the folder detailed-bidirectional-flow-labels.

Here you can download the single file with the whole dataset: CTU-13-Dataset.tar.bz2 (1.9GB)
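Once the archive is downloaded, its contents can be inspected without unpacking everything. The sketch below uses Python's standard `tarfile` module to list the files with a given extension. The helper name `list_members` is made up for this example, and the internal directory layout of the archive is not assumed, only the file extensions named above.

```python
import tarfile


def list_members(archive_path, suffix):
    """Return the sorted names of regular files in a .tar.bz2 ending in suffix."""
    with tarfile.open(archive_path, "r:bz2") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(suffix)
        )
```

For example, `list_members("CTU-13-Dataset.tar.bz2", ".biargus")` would return the labeled bidirectional NetFlow files, assuming the archive was saved under that name in the current directory.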

And here you can access each scenario individually:

BACKUP DOWNLOAD

In case the main site for downloading the files is down, you can try downloading the files here. However, keep in mind that the main site is the authoritative copy.

CITATION

To cite the dataset, please cite the paper "An empirical comparison of botnet detection methods", Sebastian Garcia, Martin Grill, Jan Stiborek and Alejandro Zunino. Computers and Security Journal, Elsevier. 2014. Vol 45, pp 100-123. http://dx.doi.org/10.1016/j.cose.2014.05.011
