We introduce a technique for detecting
anomalous patterns in a categorical feature (one
that takes values from a finite alphabet). It
differs from most anomaly detection methods
used to date in that it does not require attack-free
training data. It improves upon previous
methods known to us in two ways: it is aware when
it is adequately trained to generate meaningful
alerts, and it models data not as normal or
anomalous but as falling into one of a number of
modes discovered by competitive learning. We
apply the technique to port patterns in TCP
sessions (the alphabet being the port numbers)
and highlight interesting patterns detected in
simulated and real traffic.
We propose extensions where the learned pattern
library can be seeded and some patterns of
interest can be labeled, so that certain patterns
generate an alert no matter how frequently they
are observed, while others labeled benign do not
generate alerts even if rarely seen. Finally, we
outline a hybrid system approach to closely
integrate anomaly and misuse detection, arguing
that the historical dichotomy with which many
researchers approach these techniques is now
artificial.
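
The labeling extension described above can be illustrated with a minimal sketch. The names used here (`Label`, `should_alert`, `rarity_threshold`) are hypothetical, and the simple relative-frequency cutoff is only an assumed stand-in for whatever rarity criterion the detector actually applies; the sketch shows only how labels might override a frequency-based decision.

```python
from enum import Enum
from typing import Optional


class Label(Enum):
    """Hypothetical analyst-assigned labels for library patterns."""
    ALERT = "alert"    # always alert, however often the pattern is seen
    BENIGN = "benign"  # never alert, however rarely the pattern is seen


def should_alert(frequency: float,
                 label: Optional[Label] = None,
                 rarity_threshold: float = 0.01) -> bool:
    """Decide whether a matched pattern raises an alert.

    frequency: observed relative frequency of the pattern (0.0 to 1.0).
    label: optional label that overrides the rarity test.
    rarity_threshold: assumed cutoff below which an unlabeled pattern
        counts as rare enough to alert on (illustrative value only).
    """
    if label is Label.ALERT:
        return True
    if label is Label.BENIGN:
        return False
    # Unlabeled patterns fall back to the rarity-based decision.
    return frequency < rarity_threshold
```

Under these assumptions, a pattern labeled `Label.ALERT` raises an alert even when it is common, a pattern labeled `Label.BENIGN` stays silent even when it is rare, and unlabeled patterns are judged on frequency alone.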