Identification of Sensitive Unclassified Information

Taghva, Kazem

doi:10.1007/978-3-642-01141-2_6

Chapter
First Online: 01 January 2009

Summary

Sensitive unclassified information is defined as any unclassified information that may cause adverse consequences for government facilities. In this chapter, we explore the use of categorization techniques and information extraction to discover this kind of information in scanned documents.

We show that the combined use of a K-Dependence Bayesian categorization engine and a semi-automated review application reduces by nearly 95% the number of man-hours required to redact sensitive unclassified information. We also discuss, and provide statistics on, how OCR errors can affect information extraction tasks.

Author information

Authors and Affiliations

Information Science Research Institute, University of Nevada, Las Vegas, NV, USA
Kazem Taghva

Corresponding author

Correspondence to Kazem Taghva.

About this chapter

Cite this chapter

Taghva, K. (2009). Identification of Sensitive Unclassified Information. In: Argamon, S., Howard, N. (eds) Computational Methods for Counterterrorism. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01141-2_6

DOI: https://doi.org/10.1007/978-3-642-01141-2_6
Published: 20 May 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01140-5
Online ISBN: 978-3-642-01141-2
eBook Packages: Computer Science, Computer Science (R0)
