Domain-Specific Crawler Design

Mukhopadhyay, Debajyoti; Sinha, Sukanta

doi:10.1007/978-981-13-3053-7_7

Part of the book series: Cognitive Intelligence and Robotics (CIR)

Abstract

A domain-specific crawler builds a domain-specific Web-page repository by collecting domain-specific resources from the Internet [1,2,3,4]. A domain-specific Web search engine then answers queries by searching only the Web-pages held in that repository.
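
The two roles described in the abstract can be illustrated with a minimal sketch. It is not the chapter's design: the in-memory TOY_WEB, the keyword-overlap relevance test, the min_hits threshold, and the choice to stop following links from off-domain pages are all assumptions made for illustration, and no network access is performed.

```python
from collections import deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a": ("cricket match score and batting average", ["b", "c"]),
    "b": ("stock market prices fall", ["d"]),
    "c": ("cricket bowling figures for the test match", []),
    "d": ("cricket world cup final", []),
}

# Hypothetical vocabulary standing in for a description of the domain.
DOMAIN_TERMS = {"cricket", "match", "batting", "bowling", "score"}


def is_domain_page(text, domain_terms, min_hits=2):
    """Assumed relevance test: the page shares at least min_hits terms with the domain."""
    words = set(text.lower().split())
    return len(words & domain_terms) >= min_hits


def crawl(seed, web, domain_terms):
    """Breadth-first crawl from seed that stores only pages passing the relevance test."""
    repository = {}
    seen = {seed}
    frontier = deque([seed])
    while frontier:
        url = frontier.popleft()
        text, links = web[url]
        if not is_domain_page(text, domain_terms):
            continue  # off-domain page: not stored, and its links are not followed
        repository[url] = text
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                frontier.append(link)
    return repository


def search(repository, query):
    """Return the URLs of repository pages that contain any query term."""
    terms = set(query.lower().split())
    return sorted(
        url for url, text in repository.items()
        if terms & set(text.lower().split())
    )


if __name__ == "__main__":
    repo = crawl("a", TOY_WEB, DOMAIN_TERMS)
    print(sorted(repo))
    print(search(repo, "bowling"))
```

In this sketch, page "d" is on-topic but is linked only from the off-domain page "b", so it is never reached. A crawler that should tolerate such gaps would need a softer pruning rule than the one assumed here.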

References

1. T. Berners-Lee, M. Fischetti, Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor (HarperBusiness, New York, 1999)
2. B.M. Leiner, V.G. Cerf, D.D. Clark, R.E. Kahn, L. Kleinrock, D.C. Lynch, J. Postel, L.G. Roberts, S. Wolff, A brief history of internet. ACM Comput. Commun. 35(1), 22–31 (2009). https://doi.org/10.1145/1629607.1629613
3. W. Willinger, R. Govindan, S. Jamin, V. Paxson, S. Shenker, Scaling phenomena in the internet, in Proceedings of the National Academy of Sciences (New York, 2002), pp. 2573–2580
4. J.J. Rehmeyer, Mapping a Medusa: the internet spreads its tentacles. Sci. News 171(25), 387–388 (2007). https://doi.org/10.1002/scin.2007.5591712503
5. M.E. Bates, D. Anderson, Free, fee-based and value-added information services Factiva, in The Factiva 2002 White Paper Series (Dow-Jones Reuters Business Interactive, LLC, 2002)
6. D. Hawking, N. Craswell, P. Bailey, K. Griffiths, Measuring search engine quality. Inf. Retrieval 4(1), 33–59 (2001) (Elsevier)
7. T. Joachims, Optimizing search engines using clickthrough data, in Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’02 (Edmonton, Alberta, Canada, 2002), pp. 133–142
8. D. Mukhopadhyay, S.R. Singh, Two Novel Methodologies for Searching the Web: Confidence Based and Hyperlink-Content Based. Haldia Institute of Technology, Department of Computer Science & Engineering Research Report (2003)
9. R. Baeza-Yates, C. Hurtado, M. Mendoza, G. Dupret, Modeling user search behavior, in Proceedings of the Third Latin American Web Congress, LA-WEB’2005 (Buenos Aires, Argentina, 2005), pp. 242–251
10. O. Hoeber, Web information retrieval support systems: the future of Web search, in IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT’08 (IEEE Computer Society, 2008), pp. 29–32
11. T.P.C. Silva, E.S. de Moura, J.M.B. Cavalcanti, A.S. da Silva, M.G. de Carvalho, M.A. Gonçalves, An evolutionary approach for combining different sources of evidence in search engines. Inf. Syst. 34, 276–289 (2009) (Elsevier)
12. J.L. Hong, E.G. Siew, S. Egerton, Information extraction for search engines using fast heuristic techniques. Data Knowl. Eng. 69, 169–196 (2010) (Elsevier)
13. M. Zimmer, Web search studies: multidisciplinary perspectives on web search engines, in International Handbook of Internet Research (Springer, 2010), pp. 507–521
14. R. Ozcan, I.S. Altingovde, Ö. Ulusoy, Exploiting navigational queries for result presentation and caching in Web search engines. J. Am. Soc. Inform. Sci. Technol. 62(4), 714–726 (2011)
15. B.B. Cambazoglu, I.S. Altingovde, R. Ozcan, Ö. Ulusoy, Cache-based query processing for search engines. ACM Trans. Web 6(4), 24 (2012) (Article 14). https://doi.org/10.1145/2382616.2382617
16. A. Papagelis, C. Zaroliagis, A collaborative decentralized approach to web search. IEEE Trans. Syst. Man Cybern. Part A: Syst. Humans 42(5), 1271–1290 (2012)
17. E. Manica, C.F. Dorneles, R. Galante, Handling temporal information in Web search engines. SIGMOD Rec. 41(3), 15–23 (2012)
18. D. Fuentes-Lorenzo, N. Fernández, J.A. Fisteus, L. Sánchez, Improving large-scale search engines with semantic annotations. Exp. Syst. Appl. 40, 2287–2296 (2013) (Elsevier)
19. J.C. Prates, E. Fritzen, S.W.M. Siqueira, M.H.L.B. Braz, L.C.V. de Andrade, Contextual web searches in Facebook using learning materials and discussion messages. Comput. Hum. Behav. 29, 386–394 (2013) (Elsevier)
20. J.B. Killoran, How to use search engine optimization techniques to increase Website visibility. IEEE Trans. Pers. Commun. 56(1), 50–66 (2013)
21. H. Yan, J. Wang, X. Li, L. Gu, Architectural design and evaluation of an efficient Web-crawling system. J. Syst. Softw. 60(3), 185–193 (2002)
22. J.Y. Yang, J.B. Kang, J.M. Choi, A focused crawler with document segmentation, in Intelligent Data Engineering and Automated Learning Ideal. Lecture Notes in Computer Science, vol. 3578 (2005), pp. 94–101
23. P. Srinivasan, F. Menczer, G. Pant, A general evaluation framework for topical crawlers. Inf. Retrieval 8(3), 417–447 (2005). https://doi.org/10.1007/s10791-005-6993-5 (Elsevier)
24. D. Mukhopadhyay, S. Mukherjee, S. Ghosh, S. Kar, Y. Kim, Architecture of a scalable dynamic parallel WebCrawler with high speed downloadable capability for a web search engine, in The 6th International Workshop MSPT 2006 Proceedings (Youngil Publication, Republic of Korea, 2006), pp. 103–108

Author information

Authors and Affiliations

Web Intelligence and Distributed Computing Research Lab, Computer Engineering Department, NHITM of Mumbai University, Kavesar, Thane (W), 400615, India
Debajyoti Mukhopadhyay
Wipro Limited, Brisbane, Australia
Sukanta Sinha

Corresponding author

Correspondence to Debajyoti Mukhopadhyay.

Editor information

Editors and Affiliations

NHITM, University of Mumbai, Thane (West), Maharashtra, India
Debajyoti Mukhopadhyay

About this chapter

Cite this chapter

Mukhopadhyay, D., Sinha, S. (2019). Domain-Specific Crawler Design. In: Mukhopadhyay, D. (ed.) Web Searching and Mining. Cognitive Intelligence and Robotics. Springer, Singapore. https://doi.org/10.1007/978-981-13-3053-7_7

DOI: https://doi.org/10.1007/978-981-13-3053-7_7
Published: 13 December 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-3052-0
Online ISBN: 978-981-13-3053-7
eBook Packages: Computer Science, Computer Science (R0)
