Pair Hidden Markov Model for Named Entity Matching

  • Conference paper
Innovations and Advances in Computer Sciences and Engineering

Abstract

This paper introduces a pair Hidden Markov Model (pair-HMM) for evaluating the similarity between bilingual named entities. The pair-HMM is adapted from Mackay and Kondrak [1], who used it for cognate identification; it was later adapted by Wieling et al. [5] for Dutch dialect comparison. Unlike most named-entity similarity measurement systems, our use of the pair-HMM skips the phonetic representation step. Instead, we work with the original orthographic representation of the input data and extend the pair-HMM with a representation for diacritics and accents to accommodate pronunciation variation in the input. We first applied the pair-HMM to measuring the similarity between named entities from languages that share a writing system (French and English, both written in the Roman alphabet) and from languages that use different writing systems (English and Russian). The results are encouraging, and we propose to extend the pair-HMM to more application-oriented named-entity recognition and generation tasks.
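As a rough illustration of how a pair-HMM turns two name strings into a similarity score, the following is a minimal sketch that assumes the standard three-state pair-HMM of Durbin et al. [11] (a match state M and two gap states X and Y), not the exact topology of Mackay and Kondrak [1] or of this paper. All parameters (`match_prob`, `delta`, `epsilon`, `tau`, `eta`) and the function names are hypothetical placeholders; in practice emission and transition probabilities would be estimated from training name pairs. The score is the forward probability (summed over all alignments) as a log-odds ratio against a random model in which the two strings are generated independently. Because the sketch works directly on orthographic symbols, an accented letter such as "é" is simply one more symbol; it does not reproduce the paper's diacritic representation.

```python
import math


def forward_log_prob(x, y, alphabet_size, match_prob=0.8,
                     delta=0.1, epsilon=0.3, tau=0.05):
    """log P(x, y | pair-HMM), summed over all alignments (forward algorithm).

    Hypothetical parameters: match_prob (emission mass on identical pairs),
    delta (gap open), epsilon (gap extend), tau (transition to end).
    """
    A = alphabet_size
    if A < 2:
        raise ValueError("alphabet_size must be at least 2")

    def p_pair(a, b):
        # Match-state emission: identical symbols share match_prob, the
        # A*(A-1) mismatched pairs share the rest, so the total is 1.
        if a == b:
            return match_prob / A
        return (1.0 - match_prob) / (A * (A - 1))

    q = 1.0 / A  # gap-state emission: uniform over the alphabet

    n, m = len(x), len(y)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0  # the begin state behaves like M

    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                # M emits the aligned pair (x_i, y_j)
                fM[i][j] = p_pair(x[i - 1], y[j - 1]) * (
                    (1 - 2 * delta - tau) * fM[i - 1][j - 1]
                    + (1 - epsilon - tau) * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:
                # X emits x_i against a gap in y
                fX[i][j] = q * (delta * fM[i - 1][j] + epsilon * fX[i - 1][j])
            if j > 0:
                # Y emits y_j against a gap in x
                fY[i][j] = q * (delta * fM[i][j - 1] + epsilon * fY[i][j - 1])

    return math.log(tau * (fM[n][m] + fX[n][m] + fY[n][m]))


def random_log_prob(x, y, alphabet_size, eta=0.05):
    """log P(x, y | R): both strings generated independently, uniform symbols."""
    total_len = len(x) + len(y)
    return (2 * math.log(eta) + total_len * math.log(1 - eta)
            + total_len * math.log(1.0 / alphabet_size))


def log_odds(x, y, alphabet_size):
    """Similarity score: higher means x and y are more likely to correspond."""
    return forward_log_prob(x, y, alphabet_size) - random_log_prob(x, y, alphabet_size)


print(log_odds("Londres", "London", alphabet_size=30))
print(log_odds("Moscou", "London", alphabet_size=30))
```

The single `match_prob` parameter only rewards identical characters, so it cannot capture cross-script correspondences such as Cyrillic "Л" and Latin "L"; for a pair like English and Russian, the emission table would have to be learned from data rather than fixed as here.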

References

  1. W. Mackay and G. Kondrak, “Computing Word Similarity and Identifying Cognates with Pair Hidden Markov Models,” Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL), pp. 40-47, Ann Arbor, Michigan, 2005.

  2. W. Lam, S-K. Chan and R. Huang, “Named Entity Translation Matching and Learning: With Application for Mining Unseen Translations,” ACM Transactions on Information Systems, vol. 25, issue 1, article 2, 2007.

  3. N. Chinchor, “MUC-7 Named Entity Task Definition,” Proceedings of the 7th Message Understanding Conference (MUC-7), 2007.

  4. C-J. Lee, J.S. Chang and J-S.R. Juang, “Alignment of Bilingual Named Entities in Parallel Corpora Using Statistical Models and Multiple Knowledge Sources,” ACM Transactions on Asian Language Information Processing, vol. 5, issue 2, 2006, pp. 121-145.

  5. M. Wieling, T. Leinonen and J. Nerbonne, “Inducing Sound Segment Differences using Pair Hidden Markov Models,” in J. Nerbonne, M. Ellison and G. Kondrak (eds.), Computing and Historical Phonology: 9th Meeting of the ACL Special Interest Group for Computational Morphology and Phonology Workshop, Prague, pp. 48-56, 2007.

  6. C-C. Hsu, C-H. Chen, T-T. Shih and C-K. Chen, “Measuring Similarity between Transliterations against Noise and Data,” ACM Transactions on Asian Language Information Processing, vol. 6, issue 2, article 5, 2005.

  7. C.M. Grinstead and J.L. Snell, Introduction to Probability, 2nd Edition, AMS, 1997.

  8. B. Pouliquen, R. Steinberger, C. Ignat, I. Temnikova, A. Widiger, W. Zaghouani and J. Žižka, “Multilingual Person Name Recognition and Transliteration,” Revue CORELA, Cognition, Représentation, Langage, 2005.

  9. L.R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, vol. 77, issue 2, pp. 257-286, 1989.

  10. W. Mackay, Word Similarity using Pair Hidden Markov Models, Master's thesis, University of Alberta, 2004.

  11. R. Durbin, S.R. Eddy, A. Krogh and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.

  12. A. Arribas-Gil, E. Gassiat and C. Matias, “Parameter Estimation in Pair-hidden Markov Models,” Scandinavian Journal of Statistics, vol. 33, issue 4, pp. 651-671, 2006.

  13. E.M. Voorhees and D.M. Tice, “The TREC-8 Question Answering Track Report,” Proceedings of the Eighth Text REtrieval Conference (TREC-8), 2000.

  14. D. Jurafsky and J.H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd Edition, Pearson Prentice Hall, 2009.

  15. C-J. Lee, J.S. Chang and J-S.R. Juang, “A Statistical Approach to Chinese-to-English Back Transliteration,” Proceedings of the 17th Pacific Asia Conference, 2003.

  16. G. Kondrak and T. Sherif, “Evaluation of Several Phonetic Algorithms on the Task of Cognate Identification,” Proceedings of the Workshop on Linguistic Distances, pp. 43-50, Association for Computational Linguistics, Sydney, 2006.

  17. D. Durand and R. Hoberman, HMM Lecture Notes, Carnegie Mellon School of Computer Science. Retrieved from http://www.cs.cmu.edu/~durand/03711/Lectures/hmm3-05.pdf on 14 Oct. 2008.

  18. S. Bergsma and G. Kondrak, “Alignment-Based Discriminative String Similarity,” Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 656-663, Czech Republic, June 2007.

Author information

Correspondence to Peter Nabende.

Copyright information

© 2010 Springer Science+Business Media B.V.

About this paper

Cite this paper

Nabende, P., Tiedemann, J., Nerbonne, J. (2010). Pair Hidden Markov Model for Named Entity Matching. In: Sobh, T. (eds) Innovations and Advances in Computer Sciences and Engineering. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-3658-2_87

  • DOI: https://doi.org/10.1007/978-90-481-3658-2_87

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-3657-5

  • Online ISBN: 978-90-481-3658-2

  • eBook Packages: Engineering, Engineering (R0)
