Stratosphere Testing Framework

The Stratosphere Testing Framework (stf) is a network security research framework for analyzing the behavioral patterns of network connections in the Stratosphere Project. Its goal is to help researchers find new malware behaviors, label those behaviors, create their traffic models and verify the detection algorithms. Once the best malware behavioral models are created and verified, they will be used in the Stratosphere IPS for detection. Stf works by applying machine learning algorithms to the behavioral models.

The goal of the Stratosphere Project is to create a behavioral IPS (Intrusion Prevention System) that can detect and block malicious behaviors in the network. As part of this project, stf is used to generate high-confidence models of malicious traffic by allowing automated verification of the detection performance.

The current features of the alpha release of stf are:

Management of datasets.
- Load pcap files (including large pcap files of more than 5GB).
- Load biargus files.
- Load binetflow files (text flows files from argus).
- Load PE executable files.
- Add notes to datasets.
- Give information about the files (md5, capinfos integration, number of packets over time, etc.).
Extract the network connections (4-tuples of source IP, destination IP, destination port and protocol; the source port is ignored).
Generate the behavioral models of each connection.
Assist the analyst in identifying those models by:
- Visualizing the payload in the traffic.
- Plotting histograms on the features of the models.
- Visualizing the behavioral models.
- Adding notes to the models.
Interactive console of commands with auto completion.
Option to use a local database or a remote distributed database.
Concurrent use of several instances of stf on the same database (local or remote), which allows several researchers to work on the same dataset simultaneously.
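
To make the idea of 4-tuple connections more concrete, here is a minimal Python sketch of how flows could be grouped into connections while ignoring the source port. It is not the stf implementation: the function names, the dictionary keys and the sample flows are all hypothetical and only serve as illustration. The connection identifier follows the srcIP-dstIP-dstPort-proto format used in the behavioral model example later in this document.

```python
from collections import defaultdict


def connection_id(src_ip, dst_ip, dst_port, proto):
    """Build a 4-tuple identifier; the source port is deliberately left out."""
    return "{}-{}-{}-{}".format(src_ip, dst_ip, dst_port, proto)


def group_flows(flows):
    """Group flow records (hypothetical dicts) by their 4-tuple identifier."""
    connections = defaultdict(list)
    for flow in flows:
        key = connection_id(
            flow["src_ip"], flow["dst_ip"], flow["dst_port"], flow["proto"]
        )
        connections[key].append(flow)
    return dict(connections)


# Made-up sample flows, for illustration only.
flows = [
    {"src_ip": "10.0.0.5", "src_port": 1025, "dst_ip": "10.0.0.9", "dst_port": 80, "proto": "tcp"},
    {"src_ip": "10.0.0.5", "src_port": 1031, "dst_ip": "10.0.0.9", "dst_port": 80, "proto": "tcp"},
    {"src_ip": "10.0.0.5", "src_port": 1040, "dst_ip": "10.0.0.9", "dst_port": 53, "proto": "udp"},
]

connections = group_flows(flows)
for key, grouped in sorted(connections.items()):
    print(key, len(grouped))
```

The first two sample flows differ only in their source port, so they end up in the same connection. This is what lets a behavioral model describe everything a host does against one service over time.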

INSTALLATION AND DEPENDENCIES

First clone the git repository and cd into it. Before installing stf, you need to install some dependencies. So far the stf program is designed to work on Linux environments. You can use the pip program to install the dependencies:

sudo pip install -r dependencies.txt

Or you can install them one by one by hand:

prettytable: apt-get install python-prettytable
transaction: apt-get install python-transaction
zodb: apt-get install python-zodb

You also need to have argus (and the argus client tools) installed. They are used to generate the netflows from the traffic.

Download argus: http://qosient.com/argus/dev/argus-latest.tar.gz
Download argus-clients: http://qosient.com/argus/dev/argus-clients-latest.tar.gz

Optionally, if you also want to use the histogram commands (which give you awesome ascii-art histograms of the data), you need to manually install the following program:

https://github.com/philovivero/distribution

Then you are ready to use stf.py!

THE MEANING OF THE BEHAVIORAL MODELS

The core of the Stratosphere IPS is composed of what we call network behavioral models and detection algorithms. A behavioral model represents what a specific connection does in the network during its lifetime. The behavior is constructed by analyzing the periodicity, size and duration of each flow. Based on these features each flow is assigned a letter, and the resulting group of letters characterizes the behavior of the connection. The criteria for assigning the letters are shown in the image at the end of this section.

For example, the connection identified with the 4-tuple 192.168.0.253-166.78.144.80-80-tcp, that was assigned the label From-Botnet-V1-TCP-CC12-HTTP had the following behavioral model:

88*y*y*i*H*H*H*y*0yy*H*H*H*y*y*y*y*H*h*y*h*h*H*H*h*H*y*y*y*H*
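
As a rough illustration of how such a string can be handled programmatically, the following Python sketch separates the per-flow state symbols from the ‘*’ time separators and counts how often each state appears. It is only a sketch and not part of stf: the function name is hypothetical, and it assumes that ‘*’ is the only separator present, which holds for this example but may not hold for other models.

```python
from collections import Counter

# Behavioral model of the connection 192.168.0.253-166.78.144.80-80-tcp.
MODEL = "88*y*y*i*H*H*H*y*0yy*H*H*H*y*y*y*y*H*h*y*h*h*H*H*h*H*y*y*y*H*"

# Assumption: '*' is the only time separator that needs to be recognized here.
SEPARATORS = "*"


def split_model(model, separators=SEPARATORS):
    """Return (state symbols, separator symbols) in their order of appearance."""
    states = [symbol for symbol in model if symbol not in separators]
    seps = [symbol for symbol in model if symbol in separators]
    return states, seps


states, seps = split_model(MODEL)
counts = Counter(states)

# Print the states from most to least frequent.
for symbol, amount in counts.most_common():
    print(symbol, amount)
```

Counting the symbols this way gives a quick overview of which states dominate a connection before looking at the flows themselves.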

This chain of states, which we call the behavioral model, highlights some of the characteristics of the C&C channel. In this case it tells us that the flows are highly periodic (letters ‘h’, ‘i’), with some lost periodicity near the beginning (letters ‘y’). The flows also have a large size and a medium duration. The symbols between the letters are related to the time elapsed between flows. In this case the ‘*’ symbol means that the flows are separated by less than one hour. Looking at the letters, it can be seen that this is a rather periodic connection, and checking its flows indeed confirms that hypothesis. Using this type of model we are able to generate the behavioral characteristics of a large number of malicious actions. The following image shows the letter assignment criteria for the behavioral models: