Private Data Discovery for Privacy Compliance in Collaborative Environments

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5220)

Abstract

With the growing use of computers and the Internet, it has become difficult for organizations to locate and effectively manage sensitive personally identifiable information (PII). The problem is even more evident in collaborative computing environments. PII may be hidden anywhere within a computer's file system. In addition, in the course of different activities, collaborative or not, PII may migrate from computer to computer, which makes meeting organizational privacy requirements all the more complex. Our particular interest is to develop technology that automatically discovers workflows across organizational collaborators that involve private data. Because in this context it is important to understand where and when private data is discovered, this paper focuses on PII discovery, i.e., automatically identifying private data in semi-structured and unstructured (free-text) documents. The first part of the process identifies PII via named entity recognition. The second part determines relationships between those entities using a supervised machine learning method. We present test results of our methods on publicly available data generated from different collaborative activities to assess scalability in cooperative computing environments.
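As a rough illustration of the first step only, the sketch below flags PII candidates in free text and records where each one occurs. It is not the authors' method: the abstract does not describe their recognizer, so simple regular expressions stand in for named entity recognition, and the pattern definitions and the names `EMAIL_RE`, `CARD_RE`, `luhn_valid`, and `find_pii` are assumptions made for this example. Card-number candidates are filtered with the Luhn checksum, a standard validity check for payment card numbers.

```python
import re
from typing import List, Tuple

# Hypothetical patterns, chosen for illustration only; a real recognizer
# would cover many more entity types (names, addresses, phone numbers, ...).
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")  # 13 to 19 digits, optional separators


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> List[Tuple[str, str, int]]:
    """Return (label, matched text, character offset) for each PII candidate."""
    hits = [("EMAIL", m.group(), m.start()) for m in EMAIL_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        # Keep only digit runs that pass the checksum, to cut false positives.
        if luhn_valid(digits):
            hits.append(("CARD", m.group(), m.start()))
    return sorted(hits, key=lambda h: h[2])


if __name__ == "__main__":
    sample = ("Contact jane.doe@example.com, card 4111 1111 1111 1111, "
              "ref 1234 5678 9012 3456.")
    for label, value, offset in find_pii(sample):
        print(label, value, offset)
```

The second step described in the abstract, relating the discovered entities to one another with a supervised learner, would take candidates like these as input and is not sketched here.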

National Research Council Paper Number 50386.



Editor information

Yuhua Luo


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Korba, L. et al. (2008). Private Data Discovery for Privacy Compliance in Collaborative Environments. In: Luo, Y. (ed.) Cooperative Design, Visualization, and Engineering. CDVE 2008. Lecture Notes in Computer Science, vol 5220. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88011-0_18

  • DOI: https://doi.org/10.1007/978-3-540-88011-0_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88010-3

  • Online ISBN: 978-3-540-88011-0

  • eBook Packages: Computer Science, Computer Science (R0)
