Friday, March 30, 2007

Background reading: Polygraph - Automatically Generating Signatures for Polymorphic Worms

The next paper from RAID 2006 I will comment is about manipulating Polygraph. Thus it seemed natural that I looked at the original publication Polygraph: Automatic Signature Generation for Polymorphic Worms (2005).

Polygraph is a program that automatically generates signatures for Polymorphic worms; that are worms that change (obfuscate) their appearance from time to time between attacks. Existing worm blocking solutions (before 2005) assumes that worms have the same content from time to time. Thus it is easy to automatically generate signatures (simple single strings of bytes) that filter out worms. However, this assumption does not apply for polymorphic worms.

Since however, the polymorphic worms are targeting specific vulnerabilities some of the payload must be same between all worms, so Polygraph collects suspicious and innocuous payloads, classified using a simple flow classifier, and then extract content signatures from them. Instead of just extracting one single string of bytes, as in previous algorithms, Polygraph extracts sets of byte sequences.

The extracted byte sequences are used in three different ways for detecting worms :

  • All byte sequences must be present in payload to indicate an worm
  • All byte sequences must be present in correct order to indicate an worm
  • All byte sequences are weighed together using a Naïve Bayes Classifier:
    1. A byte sequence has probability being in a worm or not: P(seq | worm) and P(seq | ~worm)
    2. A score is computed for a payload being a worm were {seq} means all sequences in a payload: score = P({seq} | worm) / P({seq} | ~worm)
    3. Then the score is compared to a threshold and if true, the payload is believed to be an worm: score > tau
To handle the case that there are more than one type of polymorphic worms among the suspicious payloads, Polygraph uses hierarchical clustering of the byte sequences of the payloads. The sequences are merged into clusters by minimizing the false positives tested of the innocuous payloads.

Comment: First of all I think this an interesting paper, since I have a background in machine learning and Bayesian learning. However, the learning algorithms could probably be improved, for instance, by applying a more fully Bayesian approach than the used Naïve Bayes Classifier.

In addition I found an interesting comment at
Mohit's security blog: IPS algorithms... that is as follows:
Most signatures in good products are vulnerability based so even if you change the attack it still gets stopped.

Thus, Polygraph might not be needed! Or what should we believe?


Anonymous said...

I can't comment on whether good products use vulnerability- or exploit- based signatures, but one advantage of Polygraph is that it AUTOMATICALLY generates signatures, whereas COTS products typically rely on manual generation of signatures (by human experts.) The manual generation takes significantly longer (hours to days) than the automated approaches. The faster you can get a signature out, the sooner the worm is stopped.

Tomas said...

Yes, that is true. But when creating signatures only based on exploits, we do not stop new attacks targeting the same vulnerability. However, when a vulnerability is unknown this might be a good solution.

123 123 said...

Interesting story you got here. I'd like to read something more about that topic.
BTW look at the design I've made myself Companionship in London

priya said...

This is a great tutorial …one of the best I’ve seen from you yet. I really appreciate you sharing your inside tips and tricks… Intrusion Detection