Data Loss Prevention Using Document Semantic Signature

  • Hanan Alhindi
  • Issa Traore
  • Isaac Woungang
Conference paper
Part of the Lecture Notes on Data Engineering and Communications Technologies book series (LNDECT, volume 27)

Abstract

Data protection and insider threat detection and prevention are significant steps that organizations should take to enhance their internal security. Data loss prevention (DLP) is an emerging mechanism that organizations currently use to detect and block unauthorized data transfers. Existing DLP approaches, however, face several practical challenges that limit their effectiveness. In this chapter, we present a new DLP approach that addresses many of these challenges by extracting and analyzing the semantics of document content. We introduce the notion of a document semantic signature as a summarized representation of a document's semantics. Experiments on a public dataset show that the semantic signature can be used to detect data leaks, with encouraging detection effectiveness: an average false positive rate (FPR) of 6.71% and an average detection rate (DR) of 84.47%.
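The abstract does not spell out how a signature is built or compared. As a rough illustration only, the sketch below assumes that a signature is an L2-normalized vector of ontology-concept weights, and that an outgoing document is flagged when its cosine similarity to any protected document's signature reaches a threshold. The term-to-concept table `CONCEPTS`, the functions `semantic_signature`, `cosine`, and `is_leak`, and the 0.8 threshold are all hypothetical and are not taken from the paper.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical term-to-concept table standing in for a domain ontology;
# the paper's actual concept extraction is not described in the abstract.
CONCEPTS = {
    "loan": "Credit", "credit": "Credit", "mortgage": "Credit",
    "stock": "Equity", "shares": "Equity", "equity": "Equity",
    "salary": "Compensation", "bonus": "Compensation",
}


def semantic_signature(text: str) -> dict[str, float]:
    """Map known words to concepts and return L2-normalized concept weights."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(CONCEPTS[w] for w in words if w in CONCEPTS)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {k: v / norm for k, v in counts.items()} if norm else {}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse unit vectors (a plain dot product)."""
    return sum(w * b.get(k, 0.0) for k, w in a.items())


def is_leak(outgoing: str, sensitive_sigs: list[dict[str, float]],
            threshold: float = 0.8) -> bool:
    """Flag the outgoing text if it is close to any protected signature."""
    sig = semantic_signature(outgoing)
    return any(cosine(sig, s) >= threshold for s in sensitive_sigs)
```

For example, with `protected = [semantic_signature("Q3 bonus and salary adjustments for executives")]`, `is_leak("Attached: salary and bonus figures", protected)` returns `True` because both texts map to the same concept, whereas `is_leak("The stock price rose today", protected)` returns `False`. Comparing concepts rather than exact strings means a reworded excerpt can still match; a practical system would need a full ontology and a tuned threshold.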

Keywords

Data loss prevention, Document semantic, Document semantic signature, Insider threat detection, Ontology

Abbreviations

BM: Boyer–Moore algorithm
CBSD: Component-based software development
CF: Concept vector file
CM: Concept map
CS: Cosine similarity
CT: Concept tree
DCT: Document concept tree
DL: Ontology description logics
DLP: Data loss prevention
DR: Detection rate
DSS: Document semantic signature
FDR: False discovery rate
FIBO: Financial Industry Business Ontology
FNR: False negative rate
FPR: False positive rate
IDF: Inverse document frequency
IDS: Intrusion detection system
KB: Knowledge base
KDE: Kernel density estimation
NIDS: Network-based intrusion detection system
NTAC: National Threat Assessment Center
OWL: Web Ontology Language
RDF: Resource Description Framework
RNCVM: Relevancy nodes-based concept vector model
SIDD: Sensitive information dissemination detection
SVM: Support vector machines
SW: Smith–Waterman algorithm
TF: Term frequency
TF-IDF: Term frequency–inverse document frequency
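The detection metrics abbreviated above (DR, FPR, FNR, FDR) follow the standard confusion-matrix definitions, treating a leak as the positive class: DR = TP/(TP + FN), FPR = FP/(FP + TN), FNR = FN/(TP + FN) = 1 − DR, and FDR = FP/(FP + TP). The minimal Python sketch below computes them; the `Confusion` class and the counts in the example are hypothetical and do not reproduce the paper's experiment.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical confusion-matrix counts for a leak detector."""
    tp: int  # leaks correctly flagged
    fp: int  # legitimate transfers wrongly flagged
    tn: int  # legitimate transfers correctly allowed
    fn: int  # leaks missed

    def detection_rate(self) -> float:
        """DR (true positive rate): share of actual leaks that were flagged."""
        return self.tp / (self.tp + self.fn)

    def false_positive_rate(self) -> float:
        """FPR: share of legitimate transfers that were wrongly flagged."""
        return self.fp / (self.fp + self.tn)

    def false_negative_rate(self) -> float:
        """FNR: share of actual leaks that were missed (equals 1 - DR)."""
        return self.fn / (self.fn + self.tp)

    def false_discovery_rate(self) -> float:
        """FDR: share of raised alerts that were not actual leaks."""
        return self.fp / (self.fp + self.tp)
```

With `Confusion(tp=84, fp=7, tn=93, fn=16)`, the detection rate is 0.84, the false positive rate 0.07, the false negative rate 0.16, and the false discovery rate about 0.077. FPR and FDR differ in their denominators: FPR is normalized by all legitimate transfers, FDR by all raised alerts.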

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Electrical and Computer Engineering Department, University of Victoria, Victoria, Canada
  2. Department of Computer Science, Ryerson University, Toronto, Canada
