Hascheck - The Croatian Academic Spelling Checker

  • Sandor Dembitz
  • Petar Knezevic
  • Mladen Sokele

Abstract

The Croatian Academic Spelling Checker, or Hascheck, is a telematic service embedded in e-mail. Users send their text to an address and wait for an automatic reply in the form of a Hascheck report. As a program, Hascheck is a learning semiautomaton. First, it evaluates the unrecognised strings in a text in a fuzzy manner: some are extremely peculiar, others are very or moderately peculiar, and the rest are almost non-peculiar strings, i.e. almost certainly words. Then the less peculiar strings are processed by a tagger. Finally, after minor human intervention, a collection of words to be learned is obtained. This paper briefly describes the string classifying algorithm and its selectivity, as well as the tagging algorithm and its efficiency. Experience gained during four years of service operation, accompanied by two analytic functions describing the learning process, is also presented. Finally, we discuss project costs and benefits.
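The abstract does not specify how peculiarity is measured, so the following is only a minimal sketch of one plausible approach, not the authors' algorithm. It assumes that peculiarity can be approximated by the share of letter trigrams in an unknown string that never occur in a known lexicon, and that four classes are obtained by thresholding that share. The function names, the threshold values and the toy lexicon (Croatian-like forms written without diacritics) are all hypothetical.

```python
from collections import Counter


def trigrams(word):
    """Return letter trigrams of a word, padded with boundary markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_trigram_counts(lexicon):
    """Count every trigram occurring in the known words (assumed lexicon)."""
    counts = Counter()
    for known_word in lexicon:
        counts.update(trigrams(known_word))
    return counts


def peculiarity(word, counts):
    """Share of the word's trigrams never seen in the lexicon (0.0 to 1.0)."""
    grams = trigrams(word)
    if not grams:
        return 1.0  # an empty string is treated as maximally peculiar
    unseen = sum(1 for gram in grams if counts[gram] == 0)
    return unseen / len(grams)


def classify(word, counts, thresholds=(0.25, 0.5, 0.75)):
    """Map a peculiarity score to one of four classes (thresholds are assumed)."""
    low, mid, high = thresholds
    score = peculiarity(word, counts)
    if score >= high:
        return "extremely peculiar"
    if score >= mid:
        return "very peculiar"
    if score >= low:
        return "moderately peculiar"
    return "almost non-peculiar"


# Hypothetical toy lexicon; a real service would use a full word list.
LEXICON = ["kuca", "kucama", "rijec", "rijeci", "knjiga"]
COUNTS = build_trigram_counts(LEXICON)

for candidate in ["kuca", "kucami", "xqzt"]:
    print(candidate, "->", classify(candidate, COUNTS))
```

In a pipeline like the one the abstract outlines, only strings in the less peculiar classes would be passed on to the tagger and then to human review; the extremely peculiar ones would be treated as probable misspellings.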

Keywords

Word Type; Unknown Word; Spell Checker; Learning Index; Inflected Language
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag London 1999

Authors and Affiliations

  • Sandor Dembitz (1)
  • Petar Knezevic (1)
  • Mladen Sokele (2)

  1. Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
  2. Croatian Post and Telecommunication, Zagreb, Croatia
