Efficient text summarization method for blind people using text mining techniques

Abstract

Owing to the phenomenal growth in communication technology, most of us hardly have time to read books, and the habit of reading is slowly diminishing as people's lives grow busier. For visually challenged people, the situation is even worse. To address this impediment, we develop a methodology that is more accurate than the existing ones. To spare readers the effort of reading the complete text every time, we modify the Weighted TF-IDF (Term Frequency-Inverse Document Frequency) algorithm to summarize books into relevant keywords. We then compare the modified algorithm with the existing TextRank, Luhn’s, LexRank, and Latent Semantic Analysis (LSA) algorithms. From the comparative analysis, we find that Weighted TF-IDF is an efficient algorithm for automating text summarization and producing an effective summary, which is then converted from text to speech. Thus, the proposed algorithm would be highly useful for blind people.
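
As a rough illustration of the kind of TF-IDF scoring the abstract refers to, the sketch below ranks sentences by their average TF-IDF weight and keeps the top few in their original order. It is plain TF-IDF under simplifying assumptions, not the authors' modified Weighted TF-IDF: each sentence is treated as a "document", the splitter and tokenizer are naive regular expressions, and the names `split_sentences`, `tokenize`, `summarize`, and the parameter `k` are hypothetical. The text-to-speech step is omitted.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase alphabetic tokens only; no stemming or stop-word removal.
    return re.findall(r"[a-z']+", sentence.lower())


def summarize(text, k=2):
    """Return the k highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    tokenized = [tokenize(s) for s in sentences]
    n = len(sentences)

    # Document frequency: number of sentences that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # Sum of (relative term frequency * inverse document frequency).
        score = sum(
            (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        )
        scores.append(score)

    # Pick the k best sentence indices, then restore document order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sample = "Cats sleep. Cats sleep. Dogs bark loudly at night."
    print(summarize(sample, k=1))
```

The sentences returned by `summarize` could then be handed to any text-to-speech engine. Normalising by sentence length is one design choice among several; without it, long sentences would tend to dominate the ranking.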

Acknowledgements

This research was supported by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University through the Fast-track Research Funding Program.

Author information

Corresponding author

Correspondence to M. Anbarasi.

About this article

Cite this article

Basheer, S., Anbarasi, M., Sakshi, D.G. et al. Efficient text summarization method for blind people using text mining techniques. Int J Speech Technol (2020). https://doi.org/10.1007/s10772-020-09712-z

Keywords

  • Summarizer
  • Text
  • Text ranking algorithm
  • Text-to-speech