Abstract
Almost everything is stored on the Internet nowadays, and entrusting data to the Internet has become common in recent years, which directly increases the value of data retrieval. Data scientists can now access the data available on the Internet and turn it into useful information. Because people entrust so much data to the Internet, they sometimes overlook the fact that this data can be easily extracted, even when they believe their information is safe or unavailable. In this article, we propose a system in which several data extraction techniques are analysed in order to give an overview of how much data about a person can be extracted from the Internet, and of how that data is turned into information with added value. The proposed system is capable of retrieving large amounts of data about a person and processing it with Artificial Intelligence in order to classify its content and generate a personal profile containing all the information once it has been analysed. This research is based on profile generation for people from Spain, but it could be implemented for any other country. The proposed system has been implemented and tested on different people, and the results were quite satisfactory.
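The classify-then-aggregate step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword rules stand in for the Artificial Intelligence classifier, and the names `CATEGORY_KEYWORDS`, `Profile`, `classify` and `build_profile`, as well as the sample snippets, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical keyword rules standing in for the AI classifier (assumption, not from the paper).
CATEGORY_KEYWORDS = {
    "employment": {"engineer", "company", "works"},
    "education": {"university", "degree", "studied"},
    "location": {"lives", "madrid", "salamanca"},
}


@dataclass
class Profile:
    """Collects text snippets about one person, grouped by category."""
    name: str
    facts: dict = field(default_factory=lambda: defaultdict(list))


def classify(text):
    """Return the category whose keywords overlap most with the text, or 'other'."""
    words = set(text.lower().replace(".", " ").replace(",", " ").split())
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def build_profile(name, snippets):
    """Classify each already-retrieved snippet and file it under its category."""
    profile = Profile(name)
    for snippet in snippets:
        profile.facts[classify(snippet)].append(snippet)
    return profile


if __name__ == "__main__":
    # Fictional sample data; retrieval from the Internet is out of scope here.
    sample = [
        "Jane Doe studied at the University of Salamanca.",
        "She works as an engineer.",
        "Nice weather today.",
    ]
    profile = build_profile("Jane Doe", sample)
    for category, items in profile.facts.items():
        print(category, items)
```

In a system like the one proposed, the keyword lookup would presumably be replaced by trained text and image classifiers, with the grouping step staying structurally similar.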
Acknowledgements
This research has been partially supported by the European Regional Development Fund (FEDER) within the framework of the Interreg program V-A Spain-Portugal 2014–2020 (PocTep) under the IOTEC project grant 0123 IOTEC 3 E.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Bartolomé, Á., García-Retuerta, D., Pinto-Santos, F., Chamoso, P. (2020). Internet Data Extraction and Analysis for Profile Generation. In: Novais, P., Lloret, J., Chamoso, P., Carneiro, D., Navarro, E., Omatu, S. (eds) Ambient Intelligence – Software and Applications – 10th International Symposium on Ambient Intelligence. ISAmI 2019. Advances in Intelligent Systems and Computing, vol 1006. Springer, Cham. https://doi.org/10.1007/978-3-030-24097-4_14
DOI: https://doi.org/10.1007/978-3-030-24097-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-24096-7
Online ISBN: 978-3-030-24097-4
eBook Packages: Intelligent Technologies and Robotics (R0)