Towards the Development of an Integrated Framework for Enhancing Enterprise Search Using Latent Semantic Indexing

  • Conference paper
Conceptual Structures for Discovering Knowledge (ICCS 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6828)

Abstract

While web search has seen significant success, enterprise search has not yet been widely investigated, and as a result the benefits it could bring to the enterprise are not fully realized. In this paper, we present an integrated framework for enhancing enterprise search. The framework is built on open source technologies, including Apache Hadoop, Tika, Solr and Lucene. Importantly, it also uses a Latent Semantic Indexing (LSI) algorithm to improve the quality of search results. LSI is a mathematical model used to discover semantic relationship patterns in a document collection. We envisage that the proposed framework will benefit various enterprises, improving their productivity by meeting information needs effectively.
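To make the LSI idea in the abstract concrete, here is a minimal sketch of the general technique, assuming NumPy is available. It is not the authors' implementation and does not involve Hadoop, Tika, Solr or Lucene. The toy corpus, the function names and the choice of k = 2 latent dimensions are all hypothetical, chosen only for illustration. The sketch builds a term-document count matrix, truncates its singular value decomposition, and compares a query with each document in the reduced space.

```python
import numpy as np

# Toy corpus (hypothetical, not taken from the paper).
DOCS = [
    "car engine repair",
    "automobile engine repair",
    "banana fruit salad",
    "fruit salad recipe",
    "fruit salad bowl",
]

def term_document_matrix(docs):
    """Return (vocab, A) where A[i, j] counts term i in document j."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    row = {word: i for i, word in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            A[row[word], j] += 1.0
    return vocab, A

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def lsi_similarities(docs, query, k=2):
    """Cosine similarity between the query and each document in a
    k-dimensional latent space obtained from a truncated SVD."""
    vocab, A = term_document_matrix(docs)
    # Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    # Each document as a k-dimensional row: (diag(s_k) @ Vt_k).T
    doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T
    # Query as a term-count vector, projected onto the same latent axes.
    q = np.array([float(query.split().count(word)) for word in vocab])
    q_k = U_k.T @ q
    return [cosine(q_k, d) for d in doc_vecs]

if __name__ == "__main__":
    for doc, sim in zip(DOCS, lsi_similarities(DOCS, "car")):
        print(f"{sim:+.3f}  {doc}")
```

In this toy corpus the query "car" shares no term with "automobile engine repair", so a plain term-matching cosine between them is zero. In the two-dimensional latent space the two land on the same axis, because both co-occur with "engine" and "repair". That is the kind of semantic relationship LSI is meant to surface.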


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Alhabashneh, O., Iqbal, R., Shah, N., Amin, S., James, A. (2011). Towards the Development of an Integrated Framework for Enhancing Enterprise Search Using Latent Semantic Indexing. In: Andrews, S., Polovina, S., Hill, R., Akhgar, B. (eds) Conceptual Structures for Discovering Knowledge. ICCS 2011. Lecture Notes in Computer Science, vol 6828. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22688-5_29

  • DOI: https://doi.org/10.1007/978-3-642-22688-5_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22687-8

  • Online ISBN: 978-3-642-22688-5

  • eBook Packages: Computer Science, Computer Science (R0)
