
Domain-Specific Crawler Design

  • Debajyoti Mukhopadhyay
  • Sukanta Sinha
Chapter
Part of the Cognitive Intelligence and Robotics book series (CIR)

Abstract

A domain-specific crawler builds a domain-specific Web-page repository by collecting domain-specific resources from the Internet [1, 2, 3, 4]. A domain-specific Web search engine then searches for domain-specific Web pages in that repository.
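The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' design. It assumes a hypothetical domain vocabulary (`DOMAIN_TERMS`), a simple keyword-count relevance score, and an injected `fetch` function standing in for real HTTP downloads; all of these names and the threshold value are assumptions made for illustration only.

```python
from collections import deque

# Hypothetical domain vocabulary; the chapter does not list these terms.
DOMAIN_TERMS = {"cricket", "wicket", "batsman", "bowler"}

def relevance(text, terms=DOMAIN_TERMS):
    """Count how many words of the page text belong to the domain vocabulary."""
    words = text.lower().split()
    return sum(1 for w in words if w.strip(".,;:!?()") in terms)

def crawl(seed, fetch, threshold=2):
    """Breadth-first crawl that stores only pages scoring at or above threshold.

    fetch(url) must return (text, outgoing_links); it is injected so the
    sketch runs without network access.
    """
    repository, seen, queue = {}, {seed}, deque([seed])
    while queue:
        url = queue.popleft()
        text, links = fetch(url)
        if relevance(text) < threshold:
            continue  # off-domain page: neither stored nor expanded
        repository[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return repository

# Toy in-memory "Web" standing in for real HTTP fetches.
TOY_WEB = {
    "a": ("The bowler took a wicket.", ["b", "c"]),
    "b": ("Stock prices fell today.", ["d"]),
    "c": ("A batsman faced the bowler in a cricket match.", ["a"]),
    "d": ("Cricket wicket report.", []),
}

repo = crawl("a", lambda url: TOY_WEB[url])
```

In this toy run, only pages that pass the relevance check enter the repository, and links on off-domain pages are never followed. A domain-specific search engine would then index the contents of `repo` instead of the whole Web.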

References

  1. T. Berners-Lee, M. Fischetti, Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor (HarperBusiness, New York, 1999)
  2. B.M. Leiner, V.G. Cerf, D.D. Clark, R.E. Kahn, L. Kleinrock, D.C. Lynch, J. Postel, L.G. Roberts, S. Wolff, A brief history of internet. ACM Comput. Commun. 35(1), 22–31 (2009). https://doi.org/10.1145/1629607.1629613
  3. W. Willinger, R. Govindan, S. Jamin, V. Paxson, S. Shenker, Scaling phenomena in the internet, in Proceedings of the National Academy of Sciences (New York, 2002), pp. 2573–2580
  4. J.J. Rehmeyer, Mapping a Medusa: the internet spreads its tentacles. Sci. News 171(25), 387–388 (2007). https://doi.org/10.1002/scin.2007.5591712503
  5. M.E. Bates, D. Anderson, Free, fee-based and value-added information services Factiva, in The Factiva 2002 White Paper Series (Dow-Jones Reuters Business Interactive, LLC, 2002)
  6. D. Hawking, N. Craswell, P. Bailey, K. Griffiths, Measuring search engine quality. Inf. Retrieval 4(1), 33–59 (2001) (Elsevier)
  7. T. Joachims, Optimizing search engines using clickthrough data, in Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD02 (Edmonton, Alberta, Canada, 2002), pp. 133–142
  8. D. Mukhopadhyay, S.R. Singh, Two Novel Methodologies for Searching the Web: Confidence Based and Hyperlink-Content Based. Haldia Institute of Technology, Department of Computer Science & Engineering Research Report (2003)
  9. R. Baeza-Yates, C. Hurtado, M. Mendoza, G. Dupret, Modeling user search behavior, in Proceedings of the Third Latin American Web Congress, LA-WEB2005 (Buenos Aires, Argentina, 2005), pp. 242–251
  10. O. Hoeber, Web information retrieval support systems: the future of Web search, in IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT08 (IEEE Computer Society, 2008), pp. 29–32
  11. T.P.C. Silva, E.S. de Moura, J.M.B. Cavalcanti, A.S. da Silva, M.G. de Carvalho, M.A. Gonçalves, An evolutionary approach for combining different sources of evidence in search engines. Inf. Syst. 34, 276–289 (2009) (Elsevier)
  12. J.L. Hong, E.G. Siew, S. Egerton, Information extraction for search engines using fast heuristic techniques. Data Knowl. Eng. 69, 169–196 (2010) (Elsevier)
  13. M. Zimmer, Web search studies: multidisciplinary perspectives on web search engines, in International Handbook of Internet Research (Springer, 2010), pp. 507–521
  14. R. Ozcan, I.S. Altingovde, Ö. Ulusoy, Exploiting navigational queries for result presentation and caching in Web search engines. J. Am. Soc. Inform. Sci. Technol. 62(4), 714–726 (2011)
  15. B.B. Cambazoglu, I.S. Altingovde, R. Ozcan, O. Ulusoy, Cache-based query processing for search engines. ACM Trans. Web 6(4), 24 (2012) (Article 14). https://doi.org/10.1145/2382616.2382617
  16. A. Papagelis, C. Zaroliagis, A collaborative decentralized approach to web search. IEEE Trans. Syst. Man Cybern. Part A: Syst. Humans 42(5), 1271–1290 (2012)
  17. E. Manica, C.F. Dorneles, R. Galante, Handling temporal information in Web search engines. SIGMOD Rec. 41(3), 15–23 (2012)
  18. D. Fuentes-Lorenzo, N. Fernández, J.A. Fisteus, L. Sánchez, Improving large-scale search engines with semantic annotations. Exp. Syst. Appl. 40, 2287–2296 (2013) (Elsevier)
  19. J.C. Prates, E. Fritzen, S.W.M. Siqueira, M.H.L.B. Braz, L.C.V. de Andrade, Contextual web searches in Facebook using learning materials and discussion messages. Comput. Hum. Behav. 29, 386–394 (2013) (Elsevier)
  20. J.B. Killoran, How to use search engine optimization techniques to increase Website visibility. IEEE Trans. Pers. Commun. 56(1), 50–66 (2013)
  21. H. Yan, J. Wang, X. Li, L. Gu, Architectural design and evaluation of an efficient Web-crawling system. J. Syst. Softw. 60(3), 185–193 (2002)
  22. J.Y. Yang, J.B. Kang, J.M. Choi, A focused crawler with document segmentation, in Intelligent Data Engineering and Automated Learning Ideal. Lecture Notes in Computer Science, vol. 3578 (2005), pp. 94–101
  23. P. Srinivasan, F. Menczer, G. Pant, A general evaluation framework for topical crawlers. Inf. Retrieval 8(3), 417–447 (2005). https://doi.org/10.1007/s10791-005-6993-5 (Elsevier)
  24. D. Mukhopadhyay, S. Mukherjee, S. Ghosh, S. Kar, Y. Kim, Architecture of a scalable dynamic parallel WebCrawler with high speed downloadable capability for a web search engine, in The 6th International Workshop MSPT 2006 Proceedings (Youngil Publication, Republic of Korea, 2006), pp. 103–108

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Web Intelligence and Distributed Computing Research Lab, Computer Engineering Department, NHITM of Mumbai University, Kavesar, Thane (W), India
  2. Wipro Limited, Brisbane, Australia
