Name Extraction and Formal Concept Analysis

  • Conference paper
Conceptual Structures for Discovering Knowledge (ICCS 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6828)

Abstract

Many applications of Formal Concept Analysis (FCA) start with a set of structured data such as objects and their properties. In practice, most readily available data is in the form of unstructured or semistructured text. A typical application of FCA therefore assumes that objects and their properties have been extracted by some other method or technique. For example, in the 2003 Los Alamos National Lab (LANL) project on Advanced Knowledge Integration In Assessing Terrorist Threats, a data extraction tool was used to mine the text for the structured data. In this paper, we provide a detailed description of our approach to the extraction of personal names for possible subsequent use in FCA. Our basic approach is to integrate statistics on names and other words into an adaptation of a Hidden Markov Model (HMM). We use lists of names and their relative frequencies compiled from U.S. Census data. We also use a list of non-name words along with their frequencies in a training set from our collection of documents. These lists are compiled into one master list to be used as a part of the design.
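
The abstract does not spell out the model, so the following is only a minimal sketch of the general idea: merge a name-frequency list and a non-name word-frequency list into one master list, then use those frequencies as emission probabilities in a two-state HMM decoded with the Viterbi algorithm. The frequencies, the state set (NAME, OTHER), the start and transition probabilities, the smoothing floor, and all function names are illustrative assumptions, not the authors' actual adaptation or data.

```python
import math

# Hypothetical toy frequencies. The paper draws name frequencies from U.S. Census
# lists and non-name frequencies from its own training documents.
NAME_FREQ = {"john": 0.03, "smith": 0.01, "mary": 0.026}
WORD_FREQ = {"the": 0.05, "met": 0.001, "with": 0.007, "yesterday": 0.0002}

STATES = ("NAME", "OTHER")
# Assumed start and transition probabilities (not taken from the paper).
START = {"NAME": 0.1, "OTHER": 0.9}
TRANS = {
    "NAME": {"NAME": 0.5, "OTHER": 0.5},
    "OTHER": {"NAME": 0.1, "OTHER": 0.9},
}
FLOOR = 1e-6  # assumed smoothing value for tokens missing from a list


def build_master_list(name_freq, word_freq):
    """Merge both lists into one table: token -> (freq as name, freq as non-name)."""
    master = {}
    for token in set(name_freq) | set(word_freq):
        master[token] = (name_freq.get(token, 0.0), word_freq.get(token, 0.0))
    return master


def emission(master, state, token):
    """Look up the token's frequency for the given state, floored to avoid log(0)."""
    p_name, p_other = master.get(token.lower(), (0.0, 0.0))
    return max(p_name if state == "NAME" else p_other, FLOOR)


def viterbi(tokens, master):
    """Return the most probable NAME/OTHER label sequence for the tokens."""
    if not tokens:
        return []
    # Log-probability of the best path ending in each state at the first token.
    scores = [{
        s: math.log(START[s]) + math.log(emission(master, s, tokens[0]))
        for s in STATES
    }]
    back = []  # back[i][s] = best predecessor of state s at token i + 1
    for token in tokens[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + math.log(TRANS[r][s]))
            cur[s] = (prev[best] + math.log(TRANS[best][s])
                      + math.log(emission(master, s, token)))
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    master = build_master_list(NAME_FREQ, WORD_FREQ)
    tokens = "John Smith met with Mary yesterday".split()
    print(list(zip(tokens, viterbi(tokens, master))))
```

With these toy numbers the decoder labels "John", "Smith", and "Mary" as NAME and the remaining tokens as OTHER. Tokens tagged NAME could then serve as FCA objects, with properties supplied by a later extraction step.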

International Workshop on the Concept Formation and Extraction in Under-Traversed Domains (CFEUTD-2011).

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Taghva, K., Beckley, R., Coombs, J. (2011). Name Extraction and Formal Concept Analysis. In: Andrews, S., Polovina, S., Hill, R., Akhgar, B. (eds) Conceptual Structures for Discovering Knowledge. ICCS 2011. Lecture Notes in Computer Science, vol 6828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22688-5_28

  • DOI: https://doi.org/10.1007/978-3-642-22688-5_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22687-8

  • Online ISBN: 978-3-642-22688-5

  • eBook Packages: Computer Science, Computer Science (R0)
