Domain-Specific Crawler Design

Part of the book series: Cognitive Intelligence and Robotics (CIR)

Abstract

A domain-specific crawler builds a domain-specific Web-page repository by collecting resources relevant to that domain from the Internet [1,2,3,4]. A domain-specific Web search engine then answers queries by searching the Web pages held in that repository.
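
The two roles described above, a crawler that fills a repository and a search engine that reads from it, can be illustrated with a minimal sketch. It is not the chapter's design: the in-memory `TOY_WEB`, the `DOMAIN_TERMS` set, the term-count `relevance` score, and the `threshold` parameter are all assumptions made for illustration, and no real network access is performed.

```python
from collections import deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example/start": ("cricket match score and batting news",
                        ["a.example/bowl", "b.example/stock"]),
    "a.example/bowl": ("bowling and wicket statistics for cricket",
                       ["c.example/recipe"]),
    "b.example/stock": ("stock market prices and trading",
                        ["a.example/bowl"]),
    "c.example/recipe": ("a recipe for lentil soup", []),
}

# Assumed vocabulary that defines the domain of interest.
DOMAIN_TERMS = {"cricket", "batting", "bowling", "wicket"}


def relevance(text, terms):
    """Count how many words on the page are domain terms."""
    return sum(1 for word in text.lower().split() if word in terms)


def crawl(web, seeds, terms, threshold=1):
    """Breadth-first crawl; store only pages scoring at least `threshold`."""
    repository = {}
    seen = set(seeds)
    frontier = deque(seeds)
    while frontier:
        url = frontier.popleft()
        if url not in web:
            continue  # unreachable page in the toy Web
        text, links = web[url]
        if relevance(text, terms) >= threshold:
            repository[url] = text
        # Links are followed even from off-domain pages in this sketch.
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return repository


def search(repository, query):
    """Return repository URLs whose text contains every query word."""
    words = query.lower().split()
    return sorted(
        url for url, text in repository.items()
        if all(w in text.lower().split() for w in words)
    )


repo = crawl(TOY_WEB, ["a.example/start"], DOMAIN_TERMS)
print(search(repo, "cricket"))
```

Because the search function only ever sees the repository, off-domain pages such as the stock-market page cannot appear in results, whatever the query. A real crawler would typically also decide whether to follow links from irrelevant pages; this sketch follows all of them for simplicity.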

References

  1. T. Berners-Lee, M. Fischetti, Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor (HarperBusiness, New York, 1999)

  2. B.M. Leiner, V.G. Cerf, D.D. Clark, R.E. Kahn, L. Kleinrock, D.C. Lynch, J. Postel, L.G. Roberts, S. Wolff, A brief history of the Internet. ACM SIGCOMM Comput. Commun. Rev. 39(5), 22–31 (2009). https://doi.org/10.1145/1629607.1629613

  3. W. Willinger, R. Govindan, S. Jamin, V. Paxson, S. Shenker, Scaling phenomena in the internet, in Proceedings of the National Academy of Sciences (New York, 2002), pp. 2573–2580

  4. J.J. Rehmeyer, Mapping a Medusa: the internet spreads its tentacles. Sci. News 171(25), 387–388 (2007). https://doi.org/10.1002/scin.2007.5591712503

  5. M.E. Bates, D. Anderson, Free, fee-based and value-added information services Factiva, in The Factiva 2002 White Paper Series (Dow-Jones Reuters Business Interactive, LLC, 2002)

  6. D. Hawking, N. Craswell, P. Bailey, K. Griffiths, Measuring search engine quality. Inf. Retrieval 4(1), 33–59 (2001) (Elsevier)

  7. T. Joachims, Optimizing search engines using clickthrough data, in Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD02 (Edmonton, Alberta, Canada, 2002), pp. 133–142

  8. D. Mukhopadhyay, S.R. Singh, Two Novel Methodologies for Searching the Web: Confidence Based and Hyperlink-Content Based. Haldia Institute of Technology, Department of Computer Science & Engineering Research Report (2003)

  9. R. Baeza-Yates, C. Hurtado, M. Mendoza, G. Dupret, Modeling user search behavior, in Proceedings of the Third Latin American Web Congress, LA-WEB2005 (Buenos Aires, Argentina, 2005), pp. 242–251

  10. O. Hoeber, Web information retrieval support systems: the future of Web search, in IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT08 (IEEE Computer Society, 2008), pp. 29–32

  11. T.P.C. Silva, E.S. de Moura, J.M.B. Cavalcanti, A.S. da Silva, M.G. de Carvalho, M.A. Gonçalves, An evolutionary approach for combining different sources of evidence in search engines. Inf. Syst. 34, 276–289 (2009) (Elsevier)

  12. J.L. Hong, E.G. Siew, S. Egerton, Information extraction for search engines using fast heuristic techniques. Data Knowl. Eng. 69, 169–196 (2010) (Elsevier)

  13. M. Zimmer, Web search studies: multidisciplinary perspectives on web search engines, in International Handbook of Internet Research (Springer, 2010), pp. 507–521

  14. R. Ozcan, I.S. Altingovde, Ö. Ulusoy, Exploiting navigational queries for result presentation and caching in Web search engines. J. Am. Soc. Inform. Sci. Technol. 62(4), 714–726 (2011)

  15. B.B. Cambazoglu, I.S. Altingovde, R. Ozcan, O. Ulusoy, Cache-based query processing for search engines. ACM Trans. Web 6(4), 24 (2012) (Article 14). https://doi.org/10.1145/2382616.2382617

  16. A. Papagelis, C. Zaroliagis, A collaborative decentralized approach to web search. IEEE Trans. Syst. Man Cybern. Part A: Syst. Humans 42(5), 1271–1290 (2012)

  17. E. Manica, C.F. Dorneles, R. Galante, Handling temporal information in Web search engines. SIGMOD Rec. 41(3), 15–23 (2012)

  18. D. Fuentes-Lorenzo, N. Fernández, J.A. Fisteus, L. Sánchez, Improving large-scale search engines with semantic annotations. Exp. Syst. Appl. 40, 2287–2296 (2013) (Elsevier)

  19. J.C. Prates, E. Fritzen, S.W.M. Siqueira, M.H.L.B. Braz, L.C.V. de Andrade, Contextual web searches in Facebook using learning materials and discussion messages. Comput. Hum. Behav. 29, 386–394 (2013) (Elsevier)

  20. J.B. Killoran, How to use search engine optimization techniques to increase Website visibility. IEEE Trans. Prof. Commun. 56(1), 50–66 (2013)

  21. H. Yan, J. Wang, X. Li, L. Gu, Architectural design and evaluation of an efficient Web-crawling system. J. Syst. Softw. 60(3), 185–193 (2002)

  22. J.Y. Yang, J.B. Kang, J.M. Choi, A focused crawler with document segmentation, in Intelligent Data Engineering and Automated Learning, IDEAL 2005. Lecture Notes in Computer Science, vol. 3578 (2005), pp. 94–101

  23. P. Srinivasan, F. Menczer, G. Pant, A general evaluation framework for topical crawlers. Inf. Retrieval 8(3), 417–447 (2005). https://doi.org/10.1007/s10791-005-6993-5 (Elsevier)

  24. D. Mukhopadhyay, S. Mukherjee, S. Ghosh, S. Kar, Y. Kim, Architecture of a scalable dynamic parallel WebCrawler with high speed downloadable capability for a web search engine, in The 6th International Workshop MSPT 2006 Proceedings (Youngil Publication, Republic of Korea, 2006), pp. 103–108

Corresponding author

Correspondence to Debajyoti Mukhopadhyay.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Mukhopadhyay, D., Sinha, S. (2019). Domain-Specific Crawler Design. In: Mukhopadhyay, D. (eds) Web Searching and Mining. Cognitive Intelligence and Robotics. Springer, Singapore. https://doi.org/10.1007/978-981-13-3053-7_7

  • DOI: https://doi.org/10.1007/978-981-13-3053-7_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-3052-0

  • Online ISBN: 978-981-13-3053-7

  • eBook Packages: Computer Science, Computer Science (R0)
