Masquerading User Data

We have collected a data set with seeded masquerading users to compare various intrusion detection methods. The data set is available here.

The data consist of 50 files, one per user. Each file contains 15,000 commands (audit data generated with acct). The first 5000 commands for each user contain no masqueraders and are intended as training data. The remaining 10,000 commands can be thought of as 100 blocks of 100 commands each. These blocks are seeded with masquerading users, i.e., with data from other users who are not among the 50.
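The split into training data and test blocks can be sketched as follows. This is a minimal sketch that assumes each user's file stores one command per line; the file name in the usage line ("User1") and the helper names are hypothetical, chosen only for illustration.

```python
from pathlib import Path

TRAIN_LEN = 5000   # first 5000 commands: clean training data
BLOCK_LEN = 100    # test data is viewed as blocks of 100 commands
N_BLOCKS = 100     # 10,000 test commands / 100 commands per block


def load_user(path):
    """Read one user's commands, assuming one command per line (hypothetical layout)."""
    text = Path(path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def split_user_commands(commands):
    """Split 15,000 commands into 5000 training commands and 100 test blocks."""
    expected = TRAIN_LEN + N_BLOCKS * BLOCK_LEN
    if len(commands) != expected:
        raise ValueError(f"expected {expected} commands, got {len(commands)}")
    train = commands[:TRAIN_LEN]
    test = commands[TRAIN_LEN:]
    # Consecutive, non-overlapping blocks of BLOCK_LEN commands each.
    blocks = [test[i:i + BLOCK_LEN] for i in range(0, len(test), BLOCK_LEN)]
    return train, blocks
```

For example, `train, blocks = split_user_commands(load_user("User1"))` would yield one training sequence and a list of 100 test blocks for that user, each of which can then be scored separately by a detector.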
In any given block after the initial 5000 commands, a masquerade starts with a probability of 1%. If the previous block was a masquerade, the next block will also be a masquerade with a probability of 80%. About 5% of the test data contain masquerades.
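The seeding rule behaves like a two-state Markov chain over blocks (clean or masquerade). The sketch below assumes the 1% start probability applies only when the previous block was clean, and that the chain begins in the clean state after the training commands; under that reading the long-run fraction of masquerade blocks is 0.01 / (0.01 + 0.20), about 4.8%, which agrees with the "about 5%" quoted above. The function names are hypothetical and this is not the procedure actually used to build the data set.

```python
import random

P_START = 0.01     # assumed: chance a masquerade starts when the previous block is clean
P_CONTINUE = 0.80  # chance a masquerade continues into the next block


def simulate_labels(n_blocks=100, rng=None):
    """Return one boolean per block: True if that block is a masquerade."""
    rng = rng or random.Random()
    labels = []
    prev = False  # assumed: the chain starts clean after the 5000 training commands
    for _ in range(n_blocks):
        p = P_CONTINUE if prev else P_START
        prev = rng.random() < p
        labels.append(prev)
    return labels


def stationary_fraction(p_start=P_START, p_continue=P_CONTINUE):
    """Long-run fraction of masquerade blocks: p_start / (p_start + (1 - p_continue))."""
    return p_start / (p_start + (1.0 - p_continue))
```

With the default parameters, `stationary_fraction()` returns roughly 0.048. The mean length of a masquerade run is 1 / (1 - 0.8) = 5 blocks, so masquerades tend to appear as short bursts rather than isolated blocks.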

This data set is used in an article in Statistical Science. For further information, please consult this article or contact me.

Partial list of theses and papers based on this data set:

If you have published a paper that uses this data set, please let me know; I'd like to add a reference here.


