Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1006)

Abstract

Almost everything is stored on the Internet nowadays, and entrusting data to the Internet has become routine in recent years, which directly increases the value of data retrieval. Through the Internet, data scientists can now access all the available data stored there and turn it into useful information. Because people entrust so much data to the Internet, they sometimes overlook the fact that it can be easily extracted, even when they believe their information is safe or unavailable. In this article, we propose a system in which several data extraction techniques are analysed in order to give an overview of how much data about a person can be extracted from the Internet, and of how that data is turned into information with added value. The proposed system is capable of retrieving large volumes of data about a person and processing it using Artificial Intelligence, classifying its content to generate a personal profile containing all the information once it has been analysed. This research is based on personal profile generation for people from Spain, but it could be implemented for any other country. The proposed system has been implemented and tested on different people, and the results were quite satisfactory.

Acknowledgements

This research has been partially supported by the European Regional Development Fund (FEDER) within the framework of the Interreg program V-A Spain-Portugal 2014–2020 (PocTep) under the IOTEC project grant 0123 IOTEC 3 E.

Author information

Corresponding author

Correspondence to Álvaro Bartolomé.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Bartolomé, Á., García-Retuerta, D., Pinto-Santos, F., Chamoso, P. (2020). Internet Data Extraction and Analysis for Profile Generation. In: Novais, P., Lloret, J., Chamoso, P., Carneiro, D., Navarro, E., Omatu, S. (eds) Ambient Intelligence – Software and Applications –,10th International Symposium on Ambient Intelligence. ISAmI 2019. Advances in Intelligent Systems and Computing, vol 1006. Springer, Cham. https://doi.org/10.1007/978-3-030-24097-4_14
