From Fall 1996 through Summer 1999, I was at the Parallel and Distributed Intelligent Systems Laboratory (PI: Sal Stolfo), Computer Science Department, Columbia University. We developed JAM (Java Agents for Meta-learning), an infrastructure that supports collaborative learning over distributed databases. We applied JAM technologies to fraud and intrusion detection.
Ph.D. Thesis: A Data Mining Framework for Constructing Features and Models for Intrusion Detection Systems
My thesis research automates the development
process for Intrusion Detection Systems (IDSs). I designed and
developed a data mining framework for adaptively building intrusion
detection models. The central idea is to use system audit programs to
extract an extensive set of features that describe each network
connection or host session, and apply data mining programs to learn
rules that accurately capture the behavior of intrusions and normal
activities. These rules are then automatically converted into
executable modules for real-time intrusion detection. Detection models
for new intrusions or specific (new) components of a network system
are incorporated into an existing IDS through a meta-learning (or
co-operative learning) process, which produces a meta detection model
that combines evidence from multiple models. To efficiently compute
only the "useful" patterns from the large amount of audit data, I
modified the basic association rules and frequent episodes algorithms
to use axis attribute(s) and reference attribute(s)
as forms of item constraints to encode domain knowledge, and an
iterative level-wise approximate mining procedure as a means
to uncover low-frequency but important patterns.
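As a rough illustration of the axis-attribute idea (not the thesis implementation), the sketch below runs a level-wise frequent-itemset search in which every itemset must contain a value of one designated axis attribute. It assumes that this is what the item constraint means; the function name `frequent_itemsets_with_axis`, the record fields, and the support threshold are all hypothetical. Reference attributes and the iterative approximate mining step are not covered.

```python
def frequent_itemsets_with_axis(records, axis, min_support):
    """Level-wise search for frequent itemsets that all contain an item
    of the `axis` attribute (an assumed reading of the item constraint)."""
    # Each record becomes a set of (attribute, value) items.
    transactions = [frozenset(r.items()) for r in records]
    n = len(transactions)

    def support(itemset):
        # Fraction of records that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    all_items = set().union(*transactions)
    axis_items = {item for item in all_items if item[0] == axis}
    other_items = all_items - axis_items

    # Level 1 is seeded with axis items only, so every later itemset
    # inherits an axis value and unconstrained patterns are never counted.
    level = {frozenset([item]) for item in axis_items}
    frequent = {}
    while level:
        kept = {}
        for itemset in level:
            sup = support(itemset)
            if sup >= min_support:
                kept[itemset] = sup
        frequent.update(kept)
        # Extend each surviving itemset by one item of an attribute it lacks.
        level = {
            itemset | {item}
            for itemset in kept
            for item in other_items
            if item[0] not in {attr for attr, _ in itemset}
        }
    return frequent


# Hypothetical connection records; "service" plays the role of the axis attribute.
records = [
    {"service": "http", "flag": "SF", "dst": "A"},
    {"service": "http", "flag": "SF", "dst": "B"},
    {"service": "http", "flag": "REJ", "dst": "A"},
    {"service": "ftp", "flag": "SF", "dst": "A"},
]
result = frequent_itemsets_with_axis(records, axis="service", min_support=0.5)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

In this toy data the pair (flag=SF, dst=A) appears in half of the records, yet it is never reported because it carries no service value. Seeding the search with axis items is one simple way to keep such patterns from being counted at all.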
We participated in the 1998 DARPA Intrusion Detection Evaluation program. The results showed that our system was one of the best IDSs submitted to the evaluation, with performance comparable to that of the best knowledge-engineered system. The detection models (classification rules) automatically constructed by our data mining framework were very effective, with high detection rates and low false positive rates, at detecting both "known" intrusions (with instances in the training data) and "new" intrusions (with no instances in the training data) in several attack categories.
In Summer 1997, I was at the IBM T. J. Watson Research Center, conducting research on the information economy. I implemented a prototype multi-agent system to simulate the market dynamics of information filtering.
In Summer 1996, I was at the Network Services Research Lab, AT&T Labs - Research, Murray Hill, New Jersey, where I did research in distributed data visualization environments. I designed and implemented a Java-based system for drawing and viewing directed acyclic graphs (DAGs).
From Fall 1994 through Spring 1996, I was at the Programming Systems Laboratory (PI: Gail Kaiser), Computer Science Department, Columbia University. I did research in software development environments and collaborative workflow systems. I developed several modules of Oz, a workflow system, and applied Oz technologies to healthcare.