The CTU-13 Dataset. A Labeled Dataset with Botnet, Normal and Background traffic.
The CTU-13 is a dataset of botnet traffic that was captured at the CTU University, Czech Republic, in 2011. The goal of the dataset was to have a large capture of real botnet traffic mixed with normal traffic and background traffic. The CTU-13 dataset consists of thirteen captures (called scenarios) of different botnet samples. In each scenario we executed a specific malware, which used several protocols and performed different actions. Table 2 shows the characteristics of the botnet scenarios.
Each scenario was captured in a pcap file that contains all the packets of the three types of traffic. These pcap files were processed to obtain other types of information, such as NetFlows, WebLogs, etc. The first analysis of the CTU-13 dataset, which was described and published in the paper "An empirical comparison of botnet detection methods" (see Citation below), used unidirectional NetFlows to represent the traffic and to assign the labels. These unidirectional NetFlows should not be used because they were outperformed by our second analysis of the dataset, which used bidirectional NetFlows. The bidirectional NetFlows have several advantages over the unidirectional ones. First, they solve the issue of differentiating between the client and the server. Second, they include more information. Third, they include much more detailed labels. The second analysis of the dataset with the bidirectional NetFlows is the one published here.
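To illustrate the client/server point, the following is a minimal sketch of how two unidirectional flow records can be merged into one bidirectional record. It is not the procedure Argus uses. The record fields (src, sport, dst, dport, proto, bytes) and the rule "the first direction seen is the client" are assumptions made only for this example.

```python
def merge_unidirectional(flows):
    """Merge unidirectional flow records into bidirectional ones.

    Each input record is a dict with the keys proto, src, sport, dst,
    dport and bytes. These field names are illustrative assumptions,
    not the Argus schema.
    """
    merged = {}  # dicts keep insertion order, so output follows first-seen order
    for f in flows:
        forward = (f["proto"], f["src"], f["sport"], f["dst"], f["dport"])
        reverse = (f["proto"], f["dst"], f["dport"], f["src"], f["sport"])
        if forward in merged:
            # Same direction as the record that opened the conversation.
            merged[forward]["src_bytes"] += f["bytes"]
        elif reverse in merged:
            # Reply direction of a conversation that is already known.
            merged[reverse]["dst_bytes"] += f["bytes"]
        else:
            # First record of a conversation: assume its source is the client.
            merged[forward] = {
                "proto": f["proto"],
                "src": f["src"], "sport": f["sport"],
                "dst": f["dst"], "dport": f["dport"],
                "src_bytes": f["bytes"], "dst_bytes": 0,
            }
    return list(merged.values())
```

With unidirectional records, the request and the reply of one connection appear as two unrelated rows. After merging, each conversation is a single row with an explicit source (client) and destination (server) and byte counts for both directions.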
The relationship between the duration of the scenario, the number of packets, the number of NetFlows and the size of the pcap file is shown in Table 3. This table also shows the malware used to create the capture, and the number of infected computers in each scenario.
The distinctive characteristic of the CTU-13 dataset is that we manually analyzed and labeled each scenario. The labeling process was done inside the NetFlow files. Table 4 shows the relationship between the number of labels for the Background, Botnet, C&C Channels and Normal traffic in each scenario.
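A per-category label count like the one summarized in Table 4 could be reproduced roughly as follows. This is only a sketch under several assumptions: that the labeled flows have been exported to comma-separated text with a header row, that the label column is named Label, and that the category can be recognized by a substring of the label. The sample rows in the usage example are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

def label_category(label):
    """Map a raw flow label to a coarse category (assumed naming scheme)."""
    # C&C is checked first because such labels are assumed to mention the botnet too.
    if "CC" in label or "C&C" in label:
        return "C&C"
    if "Botnet" in label:
        return "Botnet"
    if "Normal" in label:
        return "Normal"
    if "Background" in label:
        return "Background"
    return "Other"

def count_labels(text_stream, label_column="Label"):
    """Count flows per category in a CSV text stream that has a header row."""
    reader = csv.DictReader(text_stream)
    return Counter(label_category(row[label_column]) for row in reader)

# Usage with made-up rows (hypothetical addresses and labels):
sample = io.StringIO(
    "SrcAddr,Label\n"
    "10.0.0.1,flow=Background\n"
    "10.0.0.2,flow=From-Botnet-TCP-CC1\n"
    "10.0.0.3,flow=From-Normal\n"
)
print(count_labels(sample))
```

For a real file, the same function can be called on an open text file handle. The substring rules should be checked against the labels that actually occur in the scenario before the counts are trusted.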
TYPE OF FILES AND DOWNLOAD
Each of the scenarios in the dataset was processed to obtain different files. For privacy reasons, the complete pcap file containing all the background, normal and botnet data is not available. However, the rest of the files are available. Each scenario contains:
- The pcap file for the botnet capture only. The files have the extension .pcap.
- The bidirectional NetFlow files (generated with Argus) of all the traffic, including the labels. The files have the extension .biargus.
- The original executable file.
The CTU-13 dataset can be downloaded as one big tar file containing all the data, or it can be downloaded capture by capture. If you are accessing each capture independently, remember that the bidirectional NetFlow files are in the folder detailed-bidirectional-flow-labels.
Here you can download the single file containing the whole dataset: CTU-13-Dataset.tar.bz2 (1.9GB)
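Once the archive has been downloaded, it can be unpacked with any tool that understands bzip2-compressed tar files. As one possible approach, the sketch below uses Python's standard tarfile module. The function name extract_dataset and the destination directory are assumptions for this example. Keep in mind that the archive contains real malware executables, so it should be unpacked in an isolated environment.

```python
import tarfile
from pathlib import Path

def extract_dataset(archive_path, dest_dir):
    """Extract a bzip2-compressed tar archive and return its sorted member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:bz2") as tar:
        names = sorted(tar.getnames())
        # Use the safer "data" extraction filter where this Python version has it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
    return names
```

For example, extract_dataset("CTU-13-Dataset.tar.bz2", "ctu13") would unpack the archive into a directory named ctu13 (a hypothetical name) and return the list of files it contains.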
And here you can access each scenario individually:
In case the main site for downloading the files is down, you can try downloading the files here. However, keep in mind that the main site is the authoritative copy.
To cite the dataset, please cite the paper "An empirical comparison of botnet detection methods", Sebastian Garcia, Martin Grill, Jan Stiborek and Alejandro Zunino. Computers and Security Journal, Elsevier. 2014. Vol. 45, pp. 100-123. http://dx.doi.org/10.1016/j.cose.2014.05.011