Abstract
The Internet has established itself as the universal data and information infrastructure. It is used not only for providing and retrieving data in a diverse range of application domains involving human and machine actors, but also as a distributed processing platform. We investigate Internet data and information with a number of technological questions in mind: In which format should data be represented? Where and how should it be stored? How can it be managed? How can the data relevant to a specific need be found in the vast space of the Internet? How can it be accessed by human and machine users, and how can it be processed into information? We cover the Web, including the processing of textual, user-generated, and Linked Data, as well as the Internet as a processing platform for exchanging data between services and for handling Big Data.
Notes
1. Google. https://www.google.com/. Accessed: 2016-09-12.
2. Facebook. https://www.facebook.com/. Accessed: 2016-09-12.
3. Amazon. https://www.amazon.com/. Accessed: 2016-09-12.
4. Please note that we follow the widely used distinction between data and information, where information is structured and interpreted data (Manning et al. 2008).
5. In a very strict sense, unstructured data should rather be called semi-structured, because some structure is inevitably present; e.g., natural language text follows the language’s inherent grammatical rules (Manning et al. 2008).
6. CERN. https://home.cern/. Accessed: 2016-09-12.
7. W3C. https://www.w3.org/. Accessed: 2016-09-12.
8. Apache Tomcat. http://tomcat.apache.org/. Accessed: 2016-09-12.
9. Bing. https://www.bing.com/. Accessed: 2016-09-12.
10. Wikipedia. https://en.wikipedia.org/. Accessed: 2016-09-12.
11. Text REtrieval Conference (TREC). http://trec.nist.gov/. Accessed: 2016-09-12.
12. WordPress. https://wordpress.com/. Accessed: 2016-09-12.
13. IMDb. http://www.imdb.com/. Accessed: 2016-09-12.
14. Stack Overflow. http://stackoverflow.com/. Accessed: 2016-09-12.
15. Twitter. https://twitter.com/. Accessed: 2016-09-12.
16. Google Translate. https://translate.google.com/. Accessed: 2016-09-12.
17. WolframAlpha. https://www.wolframalpha.com/. Accessed: 2016-09-12.
18. Siri. http://www.apple.com/ios/siri/. Accessed: 2016-09-12.
19. Cortana. https://www.microsoft.com/en-us/windows/cortana. Accessed: 2016-09-12.
20. Hadoop. https://hadoop.apache.org/. Accessed: 2016-09-12.
21. Spark. http://spark.apache.org/. Accessed: 2016-09-12.
22. Open Data, USA. https://www.data.gov/. Accessed: 2016-09-12.
23. Open Data, EU. http://open-data.europa.eu. Accessed: 2016-09-12.
24. Open Data, UK. https://data.gov.uk. Accessed: 2016-09-12.
25. Open Data, Switzerland. https://opendata.swiss/. Accessed: 2016-09-12.
26. W3C, Semantic Web. https://www.w3.org/standards/semanticweb/ontology. Accessed: 2016-09-12.
27. Protege Editor. http://protege.stanford.edu. Accessed: 2016-09-12.
References
Agichtein E, Castillo C, Donato D, Gionis A, Mishne G (2008) Finding high-quality content in social media. In: Proceedings of the 2008 International conference on web search and data mining (WSDM’08). ACM, New York, pp 183–194
Agichtein E, Carmel D, Harman D, Pelleg D, Pinter Y (2015) Overview of the TREC 2015 LiveQA track. In: Proceedings of the twenty-fourth text retrieval conference (TREC’15). National Institute of Standards and Technology (NIST). Available via http://trec.nist.gov/pubs/trec24/trec2015.html
Armstrong A, Hagel J (2000) The real value of online communities. Knowl Commun 74(3):85–95
Auer S, Bühmann L, Dirschl C, Erling O, Hausenblas M, Isele R, Lehmann J, Martin M, Mendes PN, van Nuffelen B (2012) Managing the life-cycle of linked data with the LOD2 stack. In: Proceedings of the 11th International semantic web conference. Springer, Berlin, pp 1–16
Baeza-Yates R, Hurtado C, Mendoza M (2004) Query recommendation using query logs in search engines. In: Lindner W, Mesiti M, Türker C, Tzitzikas Y, Vakali AI (eds) Current trends in database technology–EDBT 2004 workshops. Springer, Berlin/Heidelberg, pp 588–596
Barrett DJ (2008) MediaWiki (Wikipedia and Beyond). O’Reilly Media, Farnham
Barth A (2011) HTTP State Management Mechanism (Internet Request For Comments, IETF, RFC-6265). Available via https://tools.ietf.org/html/rfc6265
Bauer F, Kaltenböck M (2011) Linked open data: the essentials. Edition mono/monochrom, Vienna
Berners-Lee T (1989) Information management: a proposal. (Historical document written in March 1989 by Tim Berners-Lee). Available via https://www.w3.org/History/1989/proposal.html
Berners-Lee T, Hendler J, Lassila O (2001) The semantic Web. Sci Am 284(5):28–37
Berners-Lee T, Chen Y, Chilton L, Connolly D, Dhanaraj R, Hollenbach J, Lerer A, Sheets D (2006) Tabulator: exploring and analyzing linked data on the semantic Web. In: Proceedings of the 3rd International semantic web user interaction workshop, Athens, p 159
Berners-Lee T, Fielding RT, Masinter L (2015) Uniform Resource Identifier (URI): Generic Syntax (Internet Request For Comments, IETF, RFC-3986). Available via https://tools.ietf.org/html/rfc3986
Bing L (2015) Sentiment analysis: mining opinions, sentiments, and emotions. Cambridge University Press, Cambridge
Bobadilla J, Ortega F, Hernando A, Gutiérrez A (2013) Recommender systems survey. Knowl Bas Syst 46:109–132
Bojar O, Buck C, Federmann C, Haddow B, Koehn P et al (2014) Findings of the 2014 workshop on statistical machine translation. In: Proceedings of the 9th workshop on statistical machine translation. Association for Computational Linguistics, Baltimore, pp 12–58
Bonchi F, Castillo C, Gionis A, Jaimes A (2011) Social network analysis and mining for business applications. ACM Trans Intell Syst Tech 2(3):22:1–22:37
Bray T (2014) The JavaScript Object Notation (JSON) Data Interchange Format (Internet Request For Comments, IETF, RFC-7159). Available via https://tools.ietf.org/html/rfc7159
Broder A (2002) A taxonomy of web search. SIGIR Forum 36(2):3–10
Brooks DR (2011) Guide to HTML, JavaScript and PHP. Springer, London
Bruza PD, Dennis S (1997) Query reformulation on the Internet: empirical data and the hyperindex search engine. In: Proceedings of computer-assisted information searching on Internet (RIAO’97). Le Centre De Hautes Etudes Internationale D’Informatique Documentaire, pp 488–499
Brynjolfsson E, Hu YJ, Smith MD (2006) From niches to riches: anatomy of the long tail. Sloan Manag Rev 47(4):67–71
Burner M (1997) Crawling towards eternity: building an archive of the World Wide Web. Web Tech Mag 2(5):37–40
Ceri S, Bozzon A, Brambilla M, Valle ED, Fraternali P, Quarteroni S (2013) Web information retrieval. Data-centric systems and applications. Springer, Heidelberg
Coroama V, Langheinrich M (2006) Personalized vehicle insurance rates–a case for client-side personalization in ubiquitous computing. In: Proceedings of human factors in computing systems. Workshop on privacy-enhanced personalization (CHI’06). ACM, pp 56–59
Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
Eckersley P (2010) How unique is your web browser? In: Proceedings of the 10th International conference on privacy enhancing technologies (PETS’10). Springer, Berlin/Heidelberg, pp 1–18
Elmasri R, Navathe SB (2016) Fundamentals of database systems. Pearson Education, Hoboken
Erl T, Puttini R, Mahmood Z (2013) Cloud computing: concepts, technology & architecture. Prentice Hall Press, Upper Saddle River
Feldman R (2013) Techniques and applications for sentiment analysis. Commun ACM 56(4):82–89
Fensel D, Facca FM, Simperl E, Toma I (2011) Semantic web services. Springer, Heidelberg/New York
Fielding RT, Reschke JF (2014) Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing (Internet Request For Comments, IETF, RFC-7230). Available via https://tools.ietf.org/html/rfc7230
Flynn P (2016) The XML FAQ. Silmaril, Edition v.6.4. Available via http://xml.silmaril.ie/
Forte A, Bruckman A (2005) Why do people write for Wikipedia? Incentives to contribute to open-content publishing. In: Proceedings of GROUP – International conference on supporting group work, vol 5, pp 6–9
Frischmuth P, Klímek J, Auer S, Tramp S, Unbehauen J, Holzweissig K, Marquardt CM (2012) Linked data in enterprise information integration. Semantic Web. IOS Press, Amsterdam, NL, pp 1–17
Garcia-Gomez S, Escriche-Vicente M, Arozarena-Llopis P, Lelli F, Taher Y et al (2012) 4CaaSt: comprehensive management of cloud services through a PaaS. In: Proceedings of the 2012 IEEE 10th International symposium on parallel and distributed processing with applications, Washington, DC, pp 494–499
Guy I (2015) Social recommender systems. In: Ricci F, Rokach L, Shapira B (eds) Recommender systems handbook. Springer, New York, pp 511–543
Halfaker A, Geiger RS, Morgan JT, Riedl J (2012) The rise and decline of an open collaboration system: how Wikipedia’s reaction to popularity is causing its decline. Am Behav Sci 57(5):664–688
Herlocker JL, Konstan JA, Terveen LG, Riedl JT (2004) Evaluating collaborative filtering recommender systems. ACM Trans Inf Syst 22(1):5–53
Hitzler P, Krötzsch M, Rudolph S (2009) Foundations of semantic web technologies. Chapman and Hall/CRC, Boca Raton
ITU (2016) ICT Facts and Figures 2016. Estimates for key telecommunication/ICT indicators. Available via http://www.itu.int/en/ITU-D/Statistics/Documents/facts/ICTFactsFigures2016.pdf
Johanan J, Khan T, Zea R (2016) Web developer’s reference guide. Packt Publishing, Birmingham
Jurafsky D, Martin JH (2009) Speech and language processing. Prentice Hall, Upper Saddle River
Lawrence DB (2012) The economic value of information. Springer, New York
Lecue F, Mehandjiev N, Vogel J, Un P, Neu B (2012) KPI-based service composition modeling and optimization with design time user interaction. In: Proceedings of the 2012 IEEE 9th International conference on services computing (SCC). IEEE, Los Alamitos, pp 692–693
Lerman K, Ghosh R (2010) Information contagion: an empirical study of the spread of news on Digg and Twitter social networks. arXiv:1003.2664 [cs.CY]. Available via https://arxiv.org/abs/1003.2664
Leskovec J, Rajaraman A, Ullman J (2014) Mining of massive datasets. Cambridge University Press, Cambridge
Linden G, Smith B, York J (2003) Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Comput 7(1):76–80
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, Cambridge
McCallum A, Nigam K (1998) A comparison of event models for naive Bayes text classification. In: Proceedings of AAAI/ICML-98 workshop on learning for text categorization. AAAI Press, pp 41–48
McSherry F, Mironov I (2009) Differentially private recommender systems: building privacy into the net. In: Proceedings of the 15th ACM SIGKDD International conference on knowledge discovery and data mining (KDD’09). ACM, New York, pp 627–636
Menychtas A, Vogel J, Giessmann A, Gatzioura A, Gomez SG, Moulos V, Junker F, Müller M, Kyriazis D, Stanoevska-Slabeva K (2014) 4CaaSt marketplace: an advanced business environment for trading cloud services. Futur Gener Comput Syst 41:104–120
Miner D, Shook A (2012) MapReduce design patterns: building effective algorithms and analytics for Hadoop and other systems. O’Reilly Media, Beijing
Mislove A, Marcon M, Gummadi KP, Druschel P, Bhattacharjee B (2007) Measurement and analysis of online social networks. In: Proceedings of the 7th ACM SIGCOMM conference on Internet measurement (IMC’07). ACM, New York, pp 29–42
Muller M, Ehrlich K, Matthews T, Perer A, Ronen I, Guy I (2012) Diversity among enterprise online communities: collaborating, teaming, and innovating through social media. In: Proceedings of the ACM SIGCHI conference on human factors in computing systems (CHI’12). ACM, New York
Nadeau D, Sekine S (2007) A survey of named entity recognition and classification. Lingvisticae Investigationes 30(1):3–26
OASIS (2006) Web Services Security: SOAP Message Security 1.1 (Technical Report). Available via http://docs.oasis-open.org/wss/v1.1/
Ohm JR (2016) Multimedia content analysis. Signals and communication technology. Springer, Berlin
O’Reilly T (2007) What is Web 2.0: design patterns and business models for the next generation of software. Commun Strateg 1:17
Page L, Brin S, Motwani R, Winograd T (1999) The PageRank citation ranking: Bringing order to the Web. (Technical Report, 1999-66) Stanford InfoLab. Available via http://ilpubs.stanford.edu:8090/422/
Pang B, Lee L, Vaithyanathan S (2002) Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 conference on empirical methods in natural language processing (EMNLP’02), vol 10. Association for Computational Linguistics, Stroudsburg, pp 79–86
Papazoglou M (2012) Web services and SOA: principles and technology, 2nd edn. Pearson Education, Harlow
Pautasso C, Zimmermann O, Leymann F (2008) RESTful Web services vs. “Big” Web services: making the right architectural decision. In: Proceedings of the 17th International conference on World Wide Web (WWW’08). ACM, New York, pp 805–814
Pedrinaci C, Domingue J (2010) Toward the next wave of services: linked services for the Web of data. J Univers Comput Sci 16(13):1694–1719
Prahalad CK, Ramaswamy V (2004) Co-creation experiences: the next practice in value creation. J Interact Mark 18(3):5–14
Resnick P, Varian HR (1997) Recommender systems. Commun ACM 40(3):56–58
Resnick P, Iacovou N, Suchak M, Bergstrom P, Riedl J (1994) GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of the 1994 ACM conference on computer supported cooperative work (CSCW’94). ACM, New York, pp 175–186
Roesner F, Kohno T, Wetherall D (2012) Detecting and defending against third-party tracking on the Web. In: Proceedings of the 9th USENIX conference on networked systems design and implementation (NSDI’12). USENIX Association, Berkeley, p 12
Rose J, Rehse O, Röber B (2012) The value of our digital identity. The Boston Consulting Group. Available via http://www.libertyglobal.com/PDF/public-policy/The-Value-of-Our-Digital-Identity.pdf
Ruppenhofer J, Somasundaran S, Wiebe J (2008) Finding the sources and targets of subjective expressions. In: Proceedings of the 6th International language resources and evaluation (LREC’08)
Russell S, Norvig P (2009) Artificial intelligence: a modern approach, 3rd edn. Prentice Hall Press, Upper Saddle River
Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manag 24(5):513–523
Sarwar B, Karypis G, Konstan J, Riedl J (2001) Item-based collaborative filtering recommendation algorithms. In: Proceedings of the 10th International conference on World Wide Web (WWW’01). ACM, New York, pp 285–295
Schein AI, Popescul A, Ungar LH, Pennock DM (2002) Methods and metrics for cold-start recommendations. In: Proceedings of the 25th annual International ACM SIGIR conference on research and development in information retrieval (SIGIR’02). ACM, New York, pp 253–260
Schmidt A, Otto B, Österle H (2010) Integrating information systems: case studies on current challenges. Electron Mark 20(2):161–174
Sebesta RW (2015) Programming the World Wide Web. Pearson Education, Boston
Sedgewick R, Wayne K (2011) Algorithms. Pearson Education, Boston
Sommerville I (2016) Software engineering. Pearson, Boston
Soriano J, Heitz C, Hutter HP, Fernández R, Hierro JJ, Vogel J, Edmonds A, Bohnert TM (2013) Internet of services. In: Bertin E, Crespi N, Magedanz T (eds) Evolution of telecommunication services: the convergence of telecom and Internet: technologies and ecosystems. Springer, Berlin/Heidelberg, pp 283–325
Tache N (ed) (2016) Big data now, 2015 edn. O’Reilly, Sebastopol
Tanenbaum AS, Wetherall DJ (2011) Computer networks. Pearson Prentice Hall, Boston
Tobin A (2015) Is Google translate good enough for commercial websites? A machine translation evaluation of text from English websites into four different languages. Reitaku Rev 21:94–116
Tumasjan A, Sprenger TO, Sandner PG, Welpe IM (2010) Predicting elections with Twitter: what 140 characters reveal about political sentiment. In: Proceedings of the fourth International AAAI conference on weblogs and social media, vol 10, pp 178–185
Ullrich C, Borau K, Luo H, Tan X, Shen L, Shen R (2008) Why Web 2.0 is good for learning and for research: principles and prototypes. In: Proceedings of the 17th International conference on World Wide Web (WWW’08). ACM, New York, pp 705–714
Van der Aalst WMP, Ter Hofstede AHM, Weske M (2003) Business process management: a survey. In: Proceedings of the International conference on business process management. Springer, Berlin/New York, pp 1–12
Vetter RJ, Spell C, Ward C (1994) Mosaic and the World Wide Web. Computer 27(10):49–57
Viégas FB, Wattenberg M, Dave K (2004) Studying cooperation and conflict between authors with history flow visualizations. In: Proceedings of the ACM SIGCHI conference on human factors in computing systems (CHI’04). ACM, New York, pp 575–582
Vogel J, Widmer J (2008) Robustness in network protocols and distributed applications of the Internet. In: Schuster A (ed) Robust intelligent systems. Springer, London, pp 61–86
Wahlster W (2007) Smartweb: multimodal web services on the road. In: Proceedings of the 15th ACM International conference on multimedia (MM’07). ACM, New York, p 16
Weiss SM, Indurkhya N, Zhang T (2015) Fundamentals of predictive text mining. Texts in computer science. Springer, London
White T (2015) Hadoop: the definitive guide. O’Reilly Media, Sebastopol
Whitmore A, Agarwal A, Da Xu L (2015) The Internet of things: a survey of topics and trends. Inf Syst Front 17(2):261–274
Wiegand M, Balahur A, Roth B, Klakow D, Montoyo A (2010) A survey on the role of negation in sentiment analysis. In: Proceedings of the workshop on negation and speculation in natural language processing. Association for Computational Linguistics, Stroudsburg, pp 60–68
Wilson C, Boe B, Sala A, Puttaswamy KPN, Zhao BY (2009) User interactions in social networks and their implications. In: Proceedings of the 4th ACM European conference on computer systems (EuroSys’09). ACM, New York, pp 205–218
Witten IH, Frank E, Hall MA (2011) Data mining: practical machine learning tools and techniques, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco
W3C (2001) Web Services Description Language (WSDL) (W3C, Technical Report). Available via https://www.w3.org/TR/wsdl
W3C (2006) Extensible Markup Language (XML) (W3C, Technical Report). Available via https://www.w3.org/TR/xml11/
W3C (2007) SOAP Version 1.2 Part 1: Messaging Framework, 2nd edn. (W3C, Technical Report). Available via https://www.w3.org/TR/soap12/
W3C (2011) Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification (W3C, Technical Report). Available via https://www.w3.org/TR/CSS2/
W3C (2012) XML Schema Definition Language (XSD) (W3C, Technical Report). Available via https://www.w3.org/TR/xmlschema11-1/
W3C (2013) SPARQL 1.1 Overview (W3C, Technical Report). Available via https://www.w3.org/TR/sparql11-overview/
W3C (2014a) HTML5 (W3C, Technical Report). Available via https://www.w3.org/TR/html5/
W3C (2014b) RDF 1.1 Primer (W3C, Technical Report). Available via https://www.w3.org/TR/rdf11-primer/
W3C (2014c) RDF Schema 1.1 (W3C, Technical Report). Available via https://www.w3.org/TR/rdf-schema/
W3C (2015) RDFa 1.1 Primer (W3C, Technical Report). Available via https://www.w3.org/TR/rdfa-primer/
Zhu H, Kraut R, Kittur A (2012) Effectiveness of shared leadership in online communities. In: Proceedings of the ACM 2012 conference on computer supported cooperative work (CSCW’12). ACM, New York, pp 407–416
Ziegler CN, McNee SM, Konstan JA, Lausen G (2005) Improving recommendation lists through topic diversification. In: Proceedings of the 14th International conference on World Wide Web (WWW’05). ACM, New York, pp 22–32
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Vogel, J. (2017). Distributed and Connected Information in the Internet. In: Schuster, A. (ed) Understanding Information. Advanced Information and Knowledge Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-59090-5_8
DOI: https://doi.org/10.1007/978-3-319-59090-5_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59089-9
Online ISBN: 978-3-319-59090-5
eBook Packages: Computer Science, Computer Science (R0)