Web-Assisted Detection and Correction of Joint and Disjoint Malapropos Word Combinations

  • Igor A. Bolshakov
  • Sofia N. Galicia-Haro
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3513)


An experiment on Web-assisted detection and correction of malapropisms is reported. Malapropos words semantically destroy the collocations they occur in, usually while retaining their syntactic links with other words. A hundred English malapropisms were gathered, each supplied with its correction candidates, i.e. word combinations in which one word is an editing variant of the corresponding word in the malapropism. Google statistics of occurrences and co-occurrences were gathered for each malapropism and correction candidate. The collocation components may be adjacent or separated by other words in a sentence, so statistics were accumulated for the most probable distance between them. The raw Google occurrence statistics are then converted to numeric values of a specially defined Semantic Compatibility Index (SCI). Heuristic rules are proposed to signal a malapropism when its SCI value falls below a predetermined threshold and to retain a few of the highest SCI-ranked correction candidates. Within certain limitations, the experiment gave promising results.
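
The detect-then-rank procedure described above can be illustrated with a minimal Python sketch. The abstract does not give the SCI formula, so the score below is a stand-in: a log ratio in the style of pointwise mutual information, not the authors' definition. The function names, the threshold of 0, the assumed collection size, and all counts are hypothetical values invented for illustration. They are not Google statistics and not data from the experiment.

```python
import math

# Assumed size of the indexed collection (hypothetical, for illustration only).
TOTAL_PAGES = 10**10

def compatibility_score(n_pair, n_first, n_second, total=TOTAL_PAGES):
    """PMI-like stand-in for a compatibility index (NOT the paper's SCI).

    n_pair   -- co-occurrence count of the two words
    n_first  -- occurrence count of the first word
    n_second -- occurrence count of the second word
    """
    if n_pair == 0 or n_first == 0 or n_second == 0:
        return float("-inf")
    return math.log(n_pair * total / (n_first * n_second))

def check_combination(pair, candidates, counts, threshold=0.0, keep=2):
    """Flag `pair` if its score is below `threshold`; if flagged, return
    up to `keep` candidates scoring at or above the threshold, best first."""
    def score(p):
        return compatibility_score(*counts[p])

    if score(pair) >= threshold:
        return False, []
    passing = [c for c in candidates if score(c) >= threshold]
    return True, sorted(passing, key=score, reverse=True)[:keep]

# Invented counts: (co-occurrences, count of word 1, count of word 2).
counts = {
    ("hysterical", "center"): (20, 2_000_000, 50_000_000),
    ("historical", "center"): (400_000, 30_000_000, 50_000_000),
    ("hysterical", "renter"): (0, 2_000_000, 1_000_000),
}
suspect = ("hysterical", "center")
candidates = [("historical", "center"), ("hysterical", "renter")]

flagged, corrections = check_combination(suspect, candidates, counts)
print(flagged, corrections)
```

With these invented counts the suspect pair scores below the threshold and is flagged, and only the candidate with a non-negative score is retained. A real system would replace the count table with queries to a search engine or corpus, and would also have to choose the word distance at which co-occurrences are counted.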


Content Word; Editing Operation; Semantic Error; Word Combination; Probable Distance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Igor A. Bolshakov (1)
  • Sofia N. Galicia-Haro (2)
  1. Center for Computing Research (CIC), National Polytechnic Institute (IPN), Mexico City, Mexico
  2. Faculty of Sciences, National Autonomous University of Mexico (UNAM), Mexico City, Mexico
