New dataset, CTU-13-Extended, now includes pcap files of normal traffic

This blog post was originally published on 17 July 2015, by Sebastian Garcia, at https://stratosphereips.org/new-dataset-ctu-13-extended-now-includes-pcap-files-of-normal-traffic.html.

After several requests, we decided to extend the previous CTU-13 dataset to include truncated versions of the original pcap files. The pcap files now include all the traffic: Normal, Botnet and Background. However, they were truncated to protect the privacy of the users, in such a way that the complete TCP, UDP and ICMP headers can still be read.

How the dataset was truncated

Each original pcap file was truncated following this methodology:

$ tcpdump -n -s0 -r originalcapturefile.pcap -w originalcapturefile.tcp.pcap tcp
$ editcap -s 54 originalcapturefile.tcp.pcap originalcapturefile.tcp.truncated.pcap
$ tcpdump -n -s0 -r originalcapturefile.pcap -w originalcapturefile.udp.pcap udp
$ editcap -s 42 originalcapturefile.udp.pcap originalcapturefile.udp.truncated.pcap
$ tcpdump -n -s0 -r originalcapturefile.pcap -w originalcapturefile.icmp.pcap icmp
$ editcap -s 66 originalcapturefile.icmp.pcap originalcapturefile.icmp.truncated.pcap
$ mergecap -w originalcapturefile.truncated.pcap originalcapturefile.tcp.truncated.pcap originalcapturefile.udp.truncated.pcap originalcapturefile.icmp.truncated.pcap
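The same steps can be wrapped in a small reusable shell function, which is convenient when truncating several captures. This is a minimal sketch, not the script actually used to build the dataset. The function names `snaplen_for` and `truncate_capture` are hypothetical, and the sketch assumes that the input file name ends in `.pcap` and that tcpdump, editcap and mergecap are on the PATH.

```shell
#!/bin/sh
# Sketch: the truncation commands above wrapped in reusable functions.
# The function names are hypothetical; tcpdump, editcap and mergecap are
# the same tools used in the post.

# Print the snap length (bytes kept per packet) for a protocol.
snaplen_for() {
  case "$1" in
    tcp)  echo 54 ;;
    udp)  echo 42 ;;
    icmp) echo 66 ;;
    *)    return 1 ;;
  esac
}

# truncate_capture FILE.pcap writes FILE.truncated.pcap next to it.
truncate_capture() {
  local in base proto len
  in="$1"
  base="${in%.pcap}"
  for proto in tcp udp icmp; do
    len="$(snaplen_for "$proto")" || return 1
    # 1. Keep only this protocol's packets, at full length (-s0).
    tcpdump -n -s0 -r "$in" -w "$base.$proto.pcap" "$proto" || return 1
    # 2. Cut every packet down to its first $len bytes.
    editcap -s "$len" "$base.$proto.pcap" "$base.$proto.truncated.pcap" || return 1
  done
  # 3. Merge the three truncated captures into one file.
  mergecap -w "$base.truncated.pcap" \
    "$base.tcp.truncated.pcap" \
    "$base.udp.truncated.pcap" \
    "$base.icmp.truncated.pcap"
}
```

After sourcing the file, `truncate_capture originalcapturefile.pcap` runs the same seven commands listed above and leaves the per-protocol intermediate files in place. mergecap merges its inputs in timestamp order by default, so the merged file keeps the chronological order of the packets.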

The values of 54 bytes for TCP, 42 bytes for UDP and 66 bytes for ICMP ensured that the complete headers were kept while no information about the payload was included. (Technically speaking, a few bytes of payload may be included, but they are insignificant.)
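The TCP and UDP values follow from standard header sizes. The sketch below adds them up, assuming untagged Ethernet II frames and IPv4 and TCP headers without options (when IP or TCP options are present, the headers are longer than these totals). The post does not break down the ICMP value; under the same assumptions, 66 bytes keep the 8-byte ICMP header plus 24 more bytes. For ICMP error messages, which quote the original IPv4 header followed by the start of the original datagram, those 24 bytes would cover a 20-byte quoted IPv4 header and the first 4 bytes of the quoted TCP or UDP header, that is, its source and destination ports. That reading of the ICMP value is ours, not the author's.

```shell
#!/bin/sh
# Header sizes in bytes, assuming untagged Ethernet II frames and IPv4/TCP
# headers without options (our assumptions, not stated in the post).
ETH=14
IPV4=20
TCP=20
UDP=8
ICMP=8

tcp_snap=$((ETH + IPV4 + TCP))    # Ethernet + IPv4 + TCP
udp_snap=$((ETH + IPV4 + UDP))    # Ethernet + IPv4 + UDP
icmp_hdr=$((ETH + IPV4 + ICMP))   # Ethernet + IPv4 + ICMP header
icmp_extra=$((66 - icmp_hdr))     # bytes kept after the ICMP header

echo "TCP:  $tcp_snap bytes"
echo "UDP:  $udp_snap bytes"
echo "ICMP: $icmp_hdr bytes of headers + $icmp_extra bytes"
```

Running it prints 54 bytes for TCP, 42 bytes for UDP, and 42 bytes of headers plus 24 extra bytes for ICMP.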

Content of the CTU-13-Extended dataset

The CTU-13-Extended dataset consists of all the files of the previous CTU-13 dataset plus the truncated pcap files of the complete traffic. Remember that both CTU-13 and CTU-13-Extended are composed of 13 different experiments, or scenarios. Each scenario already had its own folder with all its files, and we added the new truncated pcap file to that folder. Therefore, you can either download the complete new dataset as a single compressed file, or download only the new truncated pcap file from each scenario folder.
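If you download the single compressed file, a quick loop can confirm that each of the 13 scenario folders listed in this post (CTU-Malware-Capture-Botnet-42 to CTU-Malware-Capture-Botnet-54) contains its truncated capture. This is a sketch under stated assumptions: the helper name `check_scenarios` is hypothetical, the folders are assumed to sit directly under the extraction directory, and the `*truncated*.pcap` file name pattern is inferred from the output names in the commands above rather than confirmed for the published files.

```shell
#!/bin/sh
# Hypothetical check over an extracted copy of the dataset: report, for each
# of the 13 scenario folders, whether it contains a truncated pcap file.
# The "*truncated*.pcap" name pattern is an assumption based on the output
# names used by the truncation commands.
check_scenarios() {
  local root n dir found f
  root="$1"
  n=42
  while [ "$n" -le 54 ]; do
    dir="$root/CTU-Malware-Capture-Botnet-$n"
    found=""
    for f in "$dir"/*truncated*.pcap; do
      # Without a match the glob stays literal, so check that the file exists.
      if [ -e "$f" ]; then
        found="${f##*/}"
        break
      fi
    done
    if [ -n "$found" ]; then
      echo "$n ok $found"
    else
      echo "$n missing"
    fi
    n=$((n + 1))
  done
}
```

For example, `check_scenarios /path/to/extracted/dataset` (a placeholder path) prints one line per scenario number, either `ok` followed by the first matching file name or `missing`.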

Download the Single File Compressed Version of the CTU-13-Extended Dataset

Like the CTU-13 dataset, the new CTU-13-Extended dataset is also available as a single compressed file for your convenience. The file is here:

Scenario CTU-Malware-Capture-Botnet-42

This folder includes all the previous files plus the new truncated pcap file:

Scenario CTU-Malware-Capture-Botnet-43

This folder includes all the previous files plus the new truncated pcap file:

Scenario CTU-Malware-Capture-Botnet-44

This folder includes all the previous files plus the new truncated pcap file:

Scenario CTU-Malware-Capture-Botnet-45

This folder includes all the previous files plus the new truncated pcap file:

Scenario CTU-Malware-Capture-Botnet-46

This folder includes all the previous files plus the new truncated pcap file:

Scenario CTU-Malware-Capture-Botnet-47

This folder includes all the previous files plus the new truncated pcap file:

Scenario CTU-Malware-Capture-Botnet-48

This folder includes all the previous files plus the new truncated pcap file:

Scenario CTU-Malware-Capture-Botnet-49

This folder includes all the previous files plus the new truncated pcap file:

Scenario CTU-Malware-Capture-Botnet-50

This folder includes all the previous files plus the new truncated pcap file:

Scenario CTU-Malware-Capture-Botnet-51

This folder includes all the previous files plus the new truncated pcap file:

Scenario CTU-Malware-Capture-Botnet-52

This folder includes all the previous files plus the new truncated pcap file:

Scenario CTU-Malware-Capture-Botnet-53

This folder includes all the previous files plus the new truncated pcap file:

Scenario CTU-Malware-Capture-Botnet-54

This folder includes all the previous files plus the new truncated pcap file:

What is not included in these new files?

The only traffic present in the original pcap files but missing from these new files is some ARP and IPX packets. Since there were only a few of them, we decided to exclude them.
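To see how much traffic this affects in a capture you have locally, you can count the packets that none of the three filters used for the truncation (`tcp`, `udp` and `icmp`) selects, and break them down by protocol. A minimal sketch, assuming tcpdump is installed: `count_packets` is a hypothetical helper, while `arp` and `ipx` are standard pcap filter primitives. It counts lines of tcpdump output, which assumes tcpdump prints one summary line per packet, as it normally does at default verbosity.

```shell
#!/bin/sh
# count_packets FILE.pcap FILTER: number of packets in FILE.pcap that match
# a pcap filter expression. The helper name is hypothetical; it assumes
# tcpdump prints one summary line per packet at default verbosity.
count_packets() {
  # stderr ("reading from file ...") is discarded so only packets are counted.
  tcpdump -n -r "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Packets that none of the three truncation filters selects:
#   count_packets originalcapturefile.pcap 'not (tcp or udp or icmp)'
# Breakdown for the protocols mentioned in the post:
#   count_packets originalcapturefile.pcap arp
#   count_packets originalcapturefile.pcap ipx
```

If the post's description holds for a given capture, the first count should equal the sum of the ARP and IPX counts.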