Tuesday, April 10, 2007

Other paper: Remodeling and Simulation of Intrusion Detection Evaluation Dataset

In proceedings of the 2006 International Conference on Security & Managment (SAM'06) I have found this paper: Remodeling and Simulation of Intrusion Detection Evaluation Dataset
In the paper, the authors describe how they simulate network traffic (both innocent and malicious traffic) for testing intrusion detection systems.

They want to improve on the MIT LL dataset that is widely thought to have major drawbacks. The drawbacks make it less useful for testing intrusion detection systems.

The paper's main contribution is to create personalized simulations of users' web browsing behavior while MIT's dataset had only rough distribution of the overall behavior. They model real users' behavior as probabilistic transition diagrams for sessions of browsing that are complemented with daily connection distributions, daily connection cumulative densities and session length distributions. Then browsing traffic is generated from the collection user models either with a one to one mapping from a user model to a simulated user or by generating more simulated users than there are user models

Email traffic is simulated using a public corpus of emails while the MIT dataset used a combination of filtered real emails and automatically generated emails. The emails are clustered into four classes but it is not clear what the classes are used for. It is neither clear if the class in the cluster relates to the classes created from the source and destination addresses mentioned earlier. As well, it is not quite clear how the emails are used in the simulation.

Then they claim to have a larger set of attacks than in the MIT datset, such as DDoS, probes, WWW attacks, RPC, etc.

Finally they show that their simulated web browser behavior more resembles their reference network than the MIT dataset simulation that lacks certain characteristics.

Comment: I would like to be able to use the generated traffic as basis for my research - too bad there is no link to a public data set.

No comments: