Method of Deep Web Collection for Mobile Application Store Based on Category Keyword Searching
With the rapid development of the mobile Internet, the industry has entered the era of big data. Demand for data analysis of mobile applications keeps growing, which places higher demands on how mobile application information is collected. Because there are so many applications, almost all third-party app stores display only a small fraction of them, and most application information is hidden in Deep Web databases behind query forms. Existing crawler strategies cannot meet this demand. To solve these problems, this paper proposes a collection method based on category keyword queries that improves the crawl rate and the completeness of information collected from mobile app stores. First, a vertical crawler collects the information on application pages covering all categories of applications. Then, the TF-IDF algorithm extracts keywords that represent each application category from the application names and descriptions. Finally, incremental crawling is performed with a keyword query-based acquisition method. Results show that this collection method effectively improves information completeness and acquisition efficiency.
Keywords: Deep Web; TF-IDF algorithm; Incremental crawling
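The abstract names three steps but not their exact parameters. The following minimal Python sketch illustrates the keyword-extraction and incremental-crawling steps under explicit assumptions and is not the authors' implementation: each category's concatenated app names and descriptions are treated as one TF-IDF document, so terms shared by every category score zero; tokenization is a simple lowercase regex (Chinese store text would need a word segmenter instead); and the store's search form is replaced by a hypothetical `search` callable backed by a toy in-memory index. The names `tokenize`, `category_keywords`, `incremental_crawl`, and `store_index`, and all data, are illustrative only.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List, Set


def tokenize(text: str) -> List[str]:
    # Assumption: English-style tokenization (lowercased alphanumeric runs).
    # Chinese app names/descriptions would need a word segmenter here.
    return re.findall(r"[a-z0-9]+", text.lower())


def category_keywords(apps_by_category: Dict[str, List[str]],
                      top_k: int = 3) -> Dict[str, List[str]]:
    """Return up to top_k TF-IDF terms per category.

    Assumption: each category's concatenated app names and descriptions
    form one "document", so IDF down-weights terms shared by many categories.
    """
    tf = {cat: Counter(tok for text in texts for tok in tokenize(text))
          for cat, texts in apps_by_category.items()}
    n_docs = len(tf)
    # Document frequency: number of categories whose text contains the term.
    df = Counter(term for counts in tf.values() for term in counts)
    result: Dict[str, List[str]] = {}
    for cat, counts in tf.items():
        total = sum(counts.values())
        if total == 0:  # empty category: no keywords
            result[cat] = []
            continue
        scores = {term: (c / total) * math.log(n_docs / df[term])
                  for term, c in counts.items()}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        # Drop zero-score terms (those present in every category).
        result[cat] = [t for t in ranked if scores[t] > 0][:top_k]
    return result


def incremental_crawl(keywords: Iterable[str],
                      search: Callable[[str], List[str]],
                      collected: Set[str]) -> List[str]:
    """Submit each keyword to the store's search form (`search` is a
    hypothetical callable) and keep only app IDs not yet collected."""
    new_ids: List[str] = []
    for kw in keywords:
        for app_id in search(kw):
            if app_id not in collected:
                collected.add(app_id)
                new_ids.append(app_id)
    return new_ids


if __name__ == "__main__":
    # Toy data standing in for pages returned by the vertical crawler.
    seed = {
        "games": ["Chess Master chess app", "Puzzle Quest puzzle game"],
        "finance": ["Budget Pal budget app", "Stock Watch stock quotes"],
    }
    kws = category_keywords(seed, top_k=3)
    print(kws)  # {'games': ['chess', 'puzzle', 'game'], 'finance': ['budget', 'stock', 'pal']}

    # Hypothetical in-memory index simulating the store's search results.
    store_index = {
        "chess": ["g1", "g2"], "puzzle": ["g2", "g3"], "game": ["g3", "g4"],
        "budget": ["f1"], "stock": ["f2", "f3"], "pal": ["f1"],
    }
    collected = {"g1", "f1"}  # apps already found by the vertical crawler
    queue = [kw for cat in sorted(kws) for kw in kws[cat]]
    print(incremental_crawl(queue, lambda kw: store_index.get(kw, []), collected))
    # ['f2', 'f3', 'g2', 'g3', 'g4']
```

Keeping `collected` as a set shared across queries is what makes the crawl incremental in this sketch: an app reachable through several keywords (such as `g2` above) is stored only once, and apps already found by the vertical crawler are skipped. With two categories, a word appearing in both (here `app`) gets IDF log(2/2) = 0 and is never used as a query keyword.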
This research is supported by the National Key R&D Program of China (No. 2018YFC0806900), the Beijing Engineering Laboratory for Security Emulation & Hacking and Defense of IoV, and the National Secrecy Scientific Research Program of China (No. BMKY2018802-1).