Team Learning: Python Introduction for Network Traffic Visualisation

At Stratosphere, we like to keep learning and sharing knowledge among team members. For this purpose, we hold regular learning sessions on different topics. Today's topic was 'Python Introduction for Network Traffic Visualisation', taught by Sebastian (aka @eldracote).

The goals of today's session were:

  • To start with Python

  • To get a working template

  • To analyse binetflow files (obtained from a malware traffic capture) and plot the relationships

The template and the binetflow file used are hosted here: https://github.com/stratosphereips/Basic-Python-Learning. The session outline can be seen here: Google Docs Basic Python Session

IMG_3295.JPG

Revisiting concepts

Before getting started, we reviewed some basic Python concepts and some of the changes introduced in Python 3: functions, conditionals, loops, data types (strings, ints, floats, lists, dictionaries), the __main__ definition, parsing arguments, calling functions, opening files and reading lines, looping through the content, and splitting strings.

Basic Python template

The template is really simple, but it's designed to save time when getting started. It has a basic function and reads parameters from the command line.

The source code for this template is located here: https://github.com/stratosphereips/Basic-Python-Learning

The new way of parsing arguments is great because it handles everything for us: parameter names, long option formats, help text, and the allowed input values of each option.
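As an illustration, here is a minimal sketch of what such a template could look like. It assumes that the "new way" of parsing arguments is the standard library's argparse module; the option names (-f/--file, -v/--verbose) and the function names are hypothetical and are not taken from the repository.

```python
#!/usr/bin/env python3
"""Minimal script template: one basic function plus command-line parsing."""
import argparse


def build_parser():
    """Create the argument parser (option names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Basic Python template")
    # Short name, long name and help text are declared in a single call.
    parser.add_argument("-f", "--file", default=None, help="input file to read")
    # 'choices' restricts the values that the option accepts.
    parser.add_argument("-v", "--verbose", type=int, choices=[0, 1, 2],
                        default=0, help="verbosity level")
    return parser


def count_lines(filename, verbose=0):
    """Read the file line by line and return how many lines it has."""
    count = 0
    with open(filename) as handle:
        for line in handle:
            count += 1
            if verbose > 1:
                print(line.rstrip("\n"))
    return count


def main(argv=None):
    """Parse the arguments and call the basic function."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.file is None:
        # No input given: show the automatically generated help.
        parser.print_help()
        return
    print(count_lines(args.file, args.verbose))


if __name__ == "__main__":
    main()
```

Running the script with -h prints the help text that argparse builds from the declarations above, and passing a value outside 0, 1 or 2 to -v is rejected with an error message.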

IMG_3297.JPG

The DOT graph description language

DOT is a simple and pleasant graph description language. It is easy to generate from any script, and the result can then be passed to other tools to produce an image.

A simple DOT file representing traffic connections between devices is shown below:

 

digraph graphname{

    "192.168.1.102" -> "192.168.1.2"
    "192.168.1.102" -> "239.255.255.250"
    "192.168.1.102" -> "239.255.255.250"
    "192.168.1.102" -> "239.255.255.250"
    "192.168.1.102" -> "8.8.8.8"

}

 

With this file, we can use the 'dot' program (included in the Graphviz package) to create a 'png' image with the following command: cat test.dot | dot -Tpng -o test.png

The dot program reads the relationships defined in the 'graphname' graph and automatically generates an image of it. The visualisation for the previous graph is shown below.

test.png
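Because DOT is plain text, producing it from a script takes only a few lines. The following is a minimal Python sketch, not the code from the session: the function name to_dot is hypothetical, and the edge list is a shortened version of the example file above.

```python
def to_dot(edges, name="graphname"):
    """Return DOT text for a directed graph given (source, destination) pairs."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        # Node names are quoted so that dotted IP addresses are valid DOT IDs.
        lines.append(f'    "{src}" -> "{dst}"')
    lines.append("}")
    return "\n".join(lines) + "\n"


edges = [
    ("192.168.1.102", "192.168.1.2"),
    ("192.168.1.102", "239.255.255.250"),
    ("192.168.1.102", "8.8.8.8"),
]
print(to_dot(edges), end="")
```

Redirecting the output of this script to test.dot gives a file that can be rendered with the dot command shown earlier.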

Analysing binetflow files and plotting relationships

In this session, we worked with a binetflow file that was generated from a malware capture pcap using the Argus program. Here is an example of the first 10 lines of that file:

StartTime,Dur,Proto,SrcAddr,Sport,Dir,DstAddr,Dport,State,sTos,dTos,TotPkts,TotBytes,SrcBytes,Label
1970/01/01 01:00:00.000000,0.000000,llc,00:00:00:00:00:00,0, ->,00:00:00:00:00:00,0,INT,,,1,60,60,
1970/01/01 01:00:07.155617,2256.163086,arp,192.168.1.102,, who,192.168.1.2,,CON,,,54,2268,1134,
1970/01/01 01:00:07.337532,1.992893,arp,0.0.0.0,, who,192.168.1.102,,INT,,,3,126,126,
1970/01/01 01:00:10.346739,0.000000,igmp,192.168.1.102,, ->,239.255.255.250,,INT,0,,1,46,46,
1970/01/01 01:00:10.542317,7.711234,udp,192.168.1.102,51743, ->,239.255.255.250,1900,INT,0,,8,1400,1400,
1970/01/01 01:00:10.832908,0.000000,igmp,192.168.1.102,, ->,239.255.255.250,,INT,0,,1,46,46,
1970/01/01 01:00:13.039304,0.001188,udp,192.168.1.102,59458, <->,8.8.8.8,53,CON,0,0,2,168,76,
1970/01/01 01:00:13.041007,0.001193,udp,192.168.1.102,65071, <->,8.8.8.8,53,CON,0,0,2,180,76,
1970/01/01 01:00:18.050956,2302.088135,arp,192.168.1.1,, who,192.168.1.102,,CON,,,18,918,540,

In Python, we created a parser for this file and generated a .dot graph file with the relationships between the IPs. The file is just like the one seen above, but with hundreds of entries. The program is simple: it reads the binetflow file, splits each line, takes the source and destination IPs, and prints them in the DOT format.
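The steps just described can be sketched as follows. This is not the program written in the session: it uses the standard csv module instead of manual string splitting, the function names are hypothetical, and it relies on the SrcAddr and DstAddr column names from the header shown above. Repeated connections are kept as repeated edges.

```python
import csv
import sys


def read_edges(lines):
    """Yield (source, destination) address pairs from binetflow lines."""
    # DictReader uses the first line (the header) as the column names.
    for row in csv.DictReader(lines):
        src = (row.get("SrcAddr") or "").strip()
        dst = (row.get("DstAddr") or "").strip()
        if src and dst:
            yield src, dst


def write_dot(edges, out=None, name="flows"):
    """Write the edges as a DOT directed graph to 'out' (stdout by default)."""
    out = sys.stdout if out is None else out
    out.write(f"digraph {name} {{\n")
    for src, dst in edges:
        out.write(f'    "{src}" -> "{dst}"\n')
    out.write("}\n")


# Two lines copied from the binetflow example above.
sample = [
    "StartTime,Dur,Proto,SrcAddr,Sport,Dir,DstAddr,Dport,State,sTos,dTos,"
    "TotPkts,TotBytes,SrcBytes,Label",
    "1970/01/01 01:00:13.039304,0.001188,udp,192.168.1.102,59458, <->,"
    "8.8.8.8,53,CON,0,0,2,168,76,",
]
write_dot(read_edges(sample))
```

For a real capture, read_edges can be given an open file object instead of the sample list, since it only needs something that yields one line at a time.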

Once we generate the .dot file, we can visualise it in the same way. Below you can see an example of the big graph created for a malware capture. Note that there are so many nodes that the generated chart looks like a horizontal dividing line, but it is actually a chart.

There are so many nodes that the graph is totally useless. But there were more than 13000 nodes!

If we zoom in to the center of the chart, we can see that DOT was actually able to plot everything, but it is not the best type of visualisation we could choose.

That graph has too many nodes, so we took only the first 400 connections and then used another tool similar to 'dot'. The new tool is called 'sfdp' and is used in the same way as 'dot': cat test2.dot | sfdp -Tpng -o test3.png. It can generate other types of charts that are more convenient for this kind of data.
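Limiting the graph to the first connections is easy to do before the DOT text is written. The sketch below is one possible way, with hypothetical function names and synthetic edges used only for the demonstration. The render helper simply pipes the DOT text into the same sfdp command shown above and assumes that Graphviz is installed; it is defined but not called here.

```python
import itertools
import subprocess


def first_connections(edges, limit=400):
    """Return at most 'limit' edges, preserving their original order."""
    return list(itertools.islice(edges, limit))


def to_dot(edges, name="flows"):
    """Return DOT text for a directed graph given (source, destination) pairs."""
    body = "".join(f'    "{src}" -> "{dst}"\n' for src, dst in edges)
    return f"digraph {name} {{\n{body}}}\n"


def render(dot_text, png_path, layout="sfdp"):
    """Pipe DOT text into a Graphviz layout program (assumed to be installed)."""
    subprocess.run([layout, "-Tpng", "-o", png_path],
                   input=dot_text, text=True, check=True)


# Synthetic edges for the demonstration only (not data from the capture).
all_edges = ((f"10.0.{i // 256}.{i % 256}", "8.8.8.8") for i in range(1000))
subset = first_connections(all_edges, limit=400)
dot_text = to_dot(subset)
print(len(subset))
```

Changing the limit argument to 2000 or 5000 gives the larger graphs, and render(dot_text, "test3.png") would produce the image.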

There are 400 connections in this graph.

There are 2000 connections in this graph.

There are 5000 connections in this graph (low resolution due to the file size limit on Squarespace).

Conclusion

The session was quite good and fast-paced. It certainly got all of us hooked on creating visualisations! Time to continue practicing and playing with these graphs and Python!