Applied Methods and Techniques for Modeling and Control on Micro-Blog Data Crawler

Part of the book series: Lecture Notes in Control and Information Sciences (LNCIS, volume 452)

Abstract

Models can provide mechanisms for improving system performance. This chapter presents applied methods and techniques for modeling and controlling a micro-blog data crawler. With the rapid development of social studies and social networks, millions of people post, comment on, and share opinions on these platforms every day, and in doing so produce and spread opinions and sentiments on many topics. Micro-blogs have become an effective platform for learning about and mining social opinions, and doing so requires crawling the relevant micro-blog data. A traditional web crawler struggles with this task, because Web 2.0 techniques such as AJAX generate micro-blog data dynamically and rapidly. As most official micro-blog platforms do not offer suitable tools or RPC interfaces for collecting large volumes of data effectively and efficiently, we present an algorithm for modeling and controlling a micro-blog data crawler based on simulating browser behavior. The approach requires analyzing the browser's behavior to obtain the request URLs to simulate, and then parsing and analyzing the sent URL requests according to the order of the data sequence. The experimental results and analysis show the feasibility of the approach. Directions for further work are presented at the end.
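
The browser-simulation idea in the abstract can be illustrated with a minimal sketch. It is not the chapter's algorithm, only one plausible reading of it: rebuild the paginated AJAX URLs a browser would request, issue them in order, and parse each JSON response. The endpoint `BASE_URL`, the query parameters `uid` and `page`, the `posts`/`id`/`text` response fields, and the helper `fake_fetch` are all hypothetical names chosen for illustration; real values would have to be observed in actual browser traffic.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; a real one would be found by inspecting the
# requests a browser sends while the page loads more posts.
BASE_URL = "https://microblog.example/ajax/timeline"


def build_request_urls(user_id, pages):
    """Rebuild, in order, the URLs a browser would request while paging."""
    return [
        BASE_URL + "?" + urlencode({"uid": user_id, "page": page})
        for page in range(1, pages + 1)
    ]


def parse_posts(payload):
    """Extract (id, text) pairs from one JSON response body (assumed shape)."""
    data = json.loads(payload)
    return [(item["id"], item["text"]) for item in data.get("posts", [])]


def crawl(user_id, pages, fetch):
    """Send the requests in sequence and collect posts, skipping duplicates."""
    seen, posts = set(), []
    for url in build_request_urls(user_id, pages):
        for post_id, text in parse_posts(fetch(url)):
            if post_id not in seen:
                seen.add(post_id)
                posts.append((post_id, text))
    return posts


def fake_fetch(url):
    """Stand-in for an authenticated HTTP GET; returns canned JSON."""
    page = int(url.rsplit("page=", 1)[1])
    items = [{"id": page * 10 + i, "text": f"post {page}-{i}"} for i in range(2)]
    return json.dumps({"posts": items})


if __name__ == "__main__":
    for post in crawl("42", 2, fake_fetch):
        print(post)
```

Passing the fetch function as a parameter keeps the URL sequencing and parsing logic testable without network access; a real crawler would supply a function that performs the HTTP request with the session cookies of a logged-in browser.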

Acknowledgments

Some of the earlier work was done at Beijing Institute of Technology with the help of Dr. Hua-ping Zhang and Prof. Yin-ping Zhao. This work is sponsored by the National Science Foundation of Hebei Province (No. F2013208105) and the National Science Foundation of China (No. 61272362). It is also sponsored by the Hebei Province Scientific and Technical Key Task (No. 12213516D).

Author information

Corresponding author

Correspondence to Kai Gao.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Gao, K., Zhou, EL., Grover, S. (2014). Applied Methods and Techniques for Modeling and Control on Micro-Blog Data Crawler. In: Liu, L., Zhu, Q., Cheng, L., Wang, Y., Zhao, D. (eds) Applied Methods and Techniques for Mechatronic Systems. Lecture Notes in Control and Information Sciences, vol 452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36385-6_9

  • DOI: https://doi.org/10.1007/978-3-642-36385-6_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36384-9

  • Online ISBN: 978-3-642-36385-6

  • eBook Packages: Engineering, Engineering (R0)
