A Search Engine Development Utilizing Unsupervised Learning Approach

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 516)

Abstract

This article reports the development of a generic search engine using an unsupervised learning approach. This approach has become increasingly important because the volume of data has grown tremendously and challenges our capacity to write software algorithms and implementations around it. It was advocated as a means to better understand the flow of the algorithm in an uncontrolled environment. The engine uses a Depth-First Search (DFS) retrieval strategy to retrieve pages with topical searching. An inverted indexing technique is then applied to store a mapping from content to its location in a database. These techniques require a proper approach to avoid a flood of irrelevant links, which can cause a poorly designed and constructed search engine to crash. The main idea of this research is to learn how to crawl, index, search and rank the output accordingly in an uncontrolled environment. This contrasts with supervised learning conditions, which could lead to less information overload.
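The crawl, index, search and rank steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which is not shown on this page. The in-memory `PAGES` graph, the function names `dfs_crawl`, `build_index` and `search`, the `max_depth` limit, and the term-count ranking are all assumptions made for illustration; a real crawler would fetch pages over the network and store the index in a database.

```python
from collections import defaultdict

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example": ("search engine crawler basics", ["b.example", "c.example"]),
    "b.example": ("inverted index maps words to pages", ["d.example"]),
    "c.example": ("crawler ranking and search", ["a.example"]),
    "d.example": ("search search index", []),
}

def dfs_crawl(start, max_depth=3):
    """Visit pages depth-first from `start`, following links up to max_depth."""
    visited, order = set(), []
    stack = [(start, 0)]
    while stack:
        url, depth = stack.pop()
        # Skip pages already seen, unknown pages, and pages that are too deep.
        if url in visited or url not in PAGES or depth > max_depth:
            continue
        visited.add(url)
        order.append(url)
        # Push links in reverse so the first link is explored first.
        for link in reversed(PAGES[url][1]):
            stack.append((link, depth + 1))
    return order

def build_index(urls):
    """Build an inverted index: word -> {url: occurrence count}."""
    index = defaultdict(lambda: defaultdict(int))
    for url in urls:
        for word in PAGES[url][0].lower().split():
            index[word][url] += 1
    return index

def search(index, query):
    """Rank pages by the summed counts of the query words (assumed scoring)."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda u: (-scores[u], u))

if __name__ == "__main__":
    crawled = dfs_crawl("a.example")
    index = build_index(crawled)
    print(crawled)
    print(search(index, "search index"))
```

The visited set and the depth limit are one simple way to keep the crawl from looping or wandering into unrelated links, which is the kind of safeguard the abstract calls for. The ranking here is plain term counting, chosen only to keep the sketch short.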

Author information

Corresponding author

Correspondence to Mohd Noah Abdul Rahman.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rahman, M.N.A., Seyal, A.H., Omar, M.S., Maidin, S.A. (2015). A Search Engine Development Utilizing Unsupervised Learning Approach. In: Intan, R., Chi, CH., Palit, H., Santoso, L. (eds) Intelligence in the Era of Big Data. ICSIIT 2015. Communications in Computer and Information Science, vol 516. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46742-8_21

  • DOI: https://doi.org/10.1007/978-3-662-46742-8_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46741-1

  • Online ISBN: 978-3-662-46742-8

  • eBook Packages: Computer Science, Computer Science (R0)
