Showing posts with label signature-based detection. Show all posts
Showing posts with label signature-based detection. Show all posts

Tuesday, April 3, 2007

"Signatures are usually based on vulnerabilities rather than exploits"

This is interesting, when I started to read about signature-based intrusion detection systems, I thought that signatures were created by using patterns from the exploit. However, as I noticed in a previous entry and learned from the post below (that I found via TaoSecurity), this is not the case.

Errata Security: ANI 0day vs. intrusion detection providers
signatures are usually based on vulnerabilities rather than exploits
This means that learning systems, like Polygraph, that generates signatures from exploits are not automating the signature generation properly. Though, they are able to block worms exploiting unknown vulnerabilities.

Friday, March 30, 2007

Background reading: Polygraph - Automatically Generating Signatures for Polymorphic Worms

The next paper from RAID 2006 I will comment is about manipulating Polygraph. Thus it seemed natural that I looked at the original publication Polygraph: Automatic Signature Generation for Polymorphic Worms (2005).

Polygraph is a program that automatically generates signatures for Polymorphic worms; that are worms that change (obfuscate) their appearance from time to time between attacks. Existing worm blocking solutions (before 2005) assumes that worms have the same content from time to time. Thus it is easy to automatically generate signatures (simple single strings of bytes) that filter out worms. However, this assumption does not apply for polymorphic worms.

Since however, the polymorphic worms are targeting specific vulnerabilities some of the payload must be same between all worms, so Polygraph collects suspicious and innocuous payloads, classified using a simple flow classifier, and then extract content signatures from them. Instead of just extracting one single string of bytes, as in previous algorithms, Polygraph extracts sets of byte sequences.

The extracted byte sequences are used in three different ways for detecting worms :

  • All byte sequences must be present in payload to indicate an worm
  • All byte sequences must be present in correct order to indicate an worm
  • All byte sequences are weighed together using a Naïve Bayes Classifier:
    1. A byte sequence has probability being in a worm or not: P(seq | worm) and P(seq | ~worm)
    2. A score is computed for a payload being a worm were {seq} means all sequences in a payload: score = P({seq} | worm) / P({seq} | ~worm)
    3. Then the score is compared to a threshold and if true, the payload is believed to be an worm: score > tau
To handle the case that there are more than one type of polymorphic worms among the suspicious payloads, Polygraph uses hierarchical clustering of the byte sequences of the payloads. The sequences are merged into clusters by minimizing the false positives tested of the innocuous payloads.

Comment: First of all I think this an interesting paper, since I have a background in machine learning and Bayesian learning. However, the learning algorithms could probably be improved, for instance, by applying a more fully Bayesian approach than the used Naïve Bayes Classifier.

In addition I found an interesting comment at
Mohit's security blog: IPS algorithms... that is as follows:
Most signatures in good products are vulnerability based so even if you change the attack it still gets stopped.

Thus, Polygraph might not be needed! Or what should we believe?

Friday, March 9, 2007

Paper 4: Allergy Attack Against Automatic Signature Generation

This paper practically shows how to do what Can machine learning be secure? describes. In the paper, they show how to attack systems that uses Automatic Signature Generation (ASG). A typical ASG first detects an intrusion or attack, thereafter automatically generates a signature from the attack data and then filter out all future traffic matching the signature.

By using the fact that many ASG system does not use the same method to detect the attack and then create the signature they are able to fool the system into creating signatures for non-malicious traffic. Also, by not using the full context of an attack, such as the steps leading to the attack, ATG systems are easier fooled.

An ATG system seems to be a kind of unsupervised learning system, using anomaly detection to detect suspicious traffic. Then a signature is created from the traffic based on comparison between many suspicious traffic instances. The signature is often computed from the longest common byte sequence.