Syntactic and Semantic Disambiguation of Numeral Strings Using an N-Gram Method

  • Kyongho Min
  • William H. Wilson
  • Yoo-Jin Moon
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3809)


This paper describes the interpretation of numerals, and of strings that include numerals, in which a number is combined with words or symbols that indicate the category of the string, such as SPEED or LENGTH. Interpretation is performed at three levels: lexical, syntactic, and semantic. The system employs three interpretation processes: a word trigram constructor with a tokeniser, a rule-based processor of number strings, and n-gram-based disambiguation of meanings. We extracted numeral strings from 378 online newspaper articles and found that, on average, they made up about 2.2% of the words in the articles. We chose 287 of these articles to provide unseen test data (3251 numeral strings) and used the remaining 91 articles to provide 886 numeral strings, from which n-gram constraints for disambiguating the meanings of numeral strings were extracted manually. We implemented six different disambiguation methods, based on category frequency statistics collected from the sample data and on the number of word trigram constraints of each category. Precision ratios for the six methods when applied to the test data ranged from 85.6% to 87.9%.
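
The trigram-and-constraint idea in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the whitespace tokeniser, the constraint words, the frequency counts, the MONEY category, and the NUMBER fallback are all assumptions made for illustration; only SPEED and LENGTH are category names taken from the abstract. The sketch builds a word trigram around each token that contains a digit, matches the neighbouring words against per-category constraints, and breaks ties with category frequency, which is only one of several possible disambiguation strategies.

```python
import re
from collections import Counter

# A token counts as a numeral string here if it contains any digit (assumption).
NUMERAL = re.compile(r"\d")


def word_trigrams(text):
    """Yield (left, numeral_token, right) for each token containing a digit."""
    # Whitespace tokenisation stands in for the paper's tokeniser (assumption).
    padded = ["<s>"] + text.split() + ["</s>"]
    for i in range(1, len(padded) - 1):
        if NUMERAL.search(padded[i]):
            yield padded[i - 1].lower(), padded[i], padded[i + 1].lower()


# Hypothetical trigram constraints: words that may appear to the left or
# right of a numeral for each category. Not taken from the paper.
CONSTRAINTS = {
    "SPEED": {"left": set(), "right": {"km/h", "mph"}},
    "LENGTH": {"left": set(), "right": {"km", "m"}},
    "MONEY": {"left": {"usd"}, "right": {"dollars", "m"}},
}

# Hypothetical category frequencies, standing in for statistics that would
# be collected from sample data.
CATEGORY_FREQ = Counter({"MONEY": 5, "LENGTH": 3, "SPEED": 1})


def disambiguate(trigram):
    """Return the most frequent category whose constraints match the trigram."""
    left, _, right = trigram
    candidates = [
        category
        for category, rule in CONSTRAINTS.items()
        if left in rule["left"] or right in rule["right"]
    ]
    if not candidates:
        return "NUMBER"  # fallback label for a bare number (assumption)
    # Several categories may match (e.g. "m" as metres or millions);
    # prefer the one seen most often in the hypothetical sample counts.
    return max(candidates, key=lambda category: CATEGORY_FREQ[category])


if __name__ == "__main__":
    for trigram in word_trigrams("The car reached 120 km/h after 3 km"):
        print(trigram, disambiguate(trigram))
```

In this sketch an ambiguous right-hand word such as "m" matches both LENGTH and MONEY, and the frequency table decides between them; a method that instead weighted categories by their number of trigram constraints would replace only the `max` key.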


Keywords: Semantic Category, Sample Dataset, Lexical Category, Precision Ratio, Disambiguation Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Kyongho Min (1)
  • William H. Wilson (2)
  • Yoo-Jin Moon (3)
  1. School of Computer and Information Sciences, AUT, Auckland, New Zealand
  2. School of Computer Science and Engineering, UNSW, Sydney, Australia
  3. Department of Management Information Systems, HUFS, YongIn, Kyonggi, Korea
