Addressing Insider Threat through Cost-Sensitive Document Classification

  • Young-Woo Seo
  • Katia Sycara
Part of the Integrated Series In Information Systems book series (ISIS, volume 18)

Most organizations use computerized security systems to manage and protect their confidential information. While security is mostly concerned with preventing attacks from outsiders, security breaches by insiders have recently gained increasing attention from the security community. In this chapter, we describe a cost-sensitive document classification scheme that forms the basis for determining the legitimacy of confidential access by insiders. Our scheme enforces compliance with the “need to know” security principle, namely that a request for access is authorized only if the content of the requested information is relevant to the requester’s current information analysis project. First, we formulate such content-based authorization, i.e., whether to accept or reject an access request, as a binary classification problem. Second, we implement this problem in a cost-sensitive learning framework in which the cost of an incorrect decision depends on the relative importance of the two error types: false positives and false negatives. In particular, the cost of a false positive (i.e., accepting a security-violating request) is considered higher than that of a false negative (i.e., rejecting a valid request). The former is a serious security problem because confidential information that should not be revealed can be accessed. We experimentally compared various cost-sensitive classifiers with conventional error-minimizing classifiers. Our results indicate that costing with logistic regression showed the best performance in terms of the smallest cost paid, the lowest false positive rate, and a relatively low false negative rate.
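The asymmetric-cost decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cost values, the function names, and the assumption that a classifier already supplies a probability that the requested document is relevant are all hypothetical. The last function sketches cost-proportionate rejection sampling, one common way to make an error-minimizing learner cost-sensitive, under the assumption that each training example carries its own misclassification cost.

```python
import random

# Hypothetical costs, chosen only for illustration (not taken from the chapter).
COST_FP = 10.0  # accepting a request that violates "need to know"
COST_FN = 1.0   # rejecting a valid request


def accept_threshold(cost_fp, cost_fn):
    """Probability of relevance above which accepting has the lower expected cost."""
    return cost_fp / (cost_fp + cost_fn)


def decide(p_relevant, cost_fp=COST_FP, cost_fn=COST_FN):
    """Return True (accept) only when accepting has the lower expected cost."""
    expected_cost_accept = (1.0 - p_relevant) * cost_fp  # risk of a false positive
    expected_cost_reject = p_relevant * cost_fn          # risk of a false negative
    return expected_cost_accept < expected_cost_reject


def total_cost(decisions, labels, cost_fp=COST_FP, cost_fn=COST_FN):
    """Sum misclassification costs; a label is True for a legitimate request."""
    cost = 0.0
    for accepted, legitimate in zip(decisions, labels):
        if accepted and not legitimate:
            cost += cost_fp
        elif not accepted and legitimate:
            cost += cost_fn
    return cost


def rejection_sample(examples, costs, rng):
    """Keep each example with probability cost / max(costs) (assumed scheme)."""
    z = max(costs)
    return [ex for ex, c in zip(examples, costs) if rng.random() < c / z]


if __name__ == "__main__":
    print(accept_threshold(COST_FP, COST_FN))
    print(decide(0.95), decide(0.60))
    print(total_cost([True, False, True], [False, True, True]))
    print(rejection_sample(["a", "b", "c"], [10.0, 1.0, 10.0], random.Random(0)))
```

With these illustrative costs, a request is accepted only when the estimated probability of relevance exceeds 10/11, so the decision rule deliberately trades more rejected valid requests for fewer accepted violations.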


Keywords: Confidential Information, Document Classification, Misclassification Cost, Rejection Sampling, Insider Threat





Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Young-Woo Seo, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
  • Katia Sycara, Robotics Institute, Carnegie Mellon University, Pittsburgh, USA
