Big Network Data
The rise of big data brings extraordinary benefits and opportunities to businesses and governments. Enterprise users can analyze the data they generate in near real time and promptly extract business value, such as useful correlations, customer preferences, and hidden patterns. Such big data is usually generated or collected from a wide range of networks, including social networks, communication networks, transportation networks, the World Wide Web (WWW), biological networks, and citation networks. To ensure that such big network data is processed in real time, big data analytics must be performed on networks of computing nodes, using frameworks such as Hadoop and TensorFlow. In this entry, we first give the definition of big network data. We then describe the historical background of big network data, which parallels the evolution of large-scale distributed systems. Next, we elaborate on the foundations of big network data in networking technologies, such as wireless networks, cloud networks, social networks, and network monitoring. Finally, we present key applications of big network data in the areas of the Internet of Things, network and cloud services, trading promotion, and next-generation networks.
Big network data refers to big data that is generated or collected in networks ranging from social networks, communication networks, and transportation networks to the World Wide Web (WWW), biological networks, and citation networks. In other words, it comprises network datasets so voluminous and complex that traditional data processing methods are inadequate to handle them and advanced data analytics methods are needed to extract value from them. The five dimensions of big data, known as Volume, Variety, Velocity, Veracity, and Value, still apply to big network data.
The evolution of big network data parallels the evolution of large-scale distributed systems (Akyildiz et al., 2002; Armbrust et al., 2009; Fernando et al., 2013), the Internet, and networking technologies, which have made processing big data and extracting valuable information from it increasingly challenging. The scale and complexity of network data began to grow with the invention of computer networks designed to exchange data in the 1950s. From the establishment of Wide Area Networks (WANs) in the 1960s and the development of TCP/IP in the 1970s to the most recent 5G communication networks, the volume, variety, and generation speed of network data have increased dramatically.
One of the first milestones in the history of network data was the arrival of Salesforce.com in 1999, a pioneer in delivering enterprise applications through a simple website hosted on servers. As a result, data was generated and stored on those servers in relational databases. The next development was Amazon Web Services in 2002, which provided a series of cloud-based services including storage and computation. Meanwhile, by 2006, Facebook and Twitter had both become available to users throughout the world, accelerating the generation not only of user data but also of large-scale graph data that characterizes user relations. With the growing number and sophistication of such cloud services and online social networks, more and more enterprises, governments, and individual users rely on cloud services, either for efficiency in their daily business or for lower costs. As these cloud services run continuously, large volumes of data generated at unprecedented rates from sensors, emails, Internet transactions, click streams, users' purchase preferences, etc. are attracting the attention of many enterprises, since such data contains valuable patterns and business insights.
In the past decade, the availability of, and growing expectation for, ubiquitous connectivity has been one of the most transformative technological drivers of big network data. Whether checking email, carrying on a voice conversation, or browsing web pages, mobile users can access online services regardless of location, time, or circumstance: at the office, on a subway, in flight, etc. Such ubiquitous mobile access allows users to generate data anywhere, at any time, and at an unprecedented rate.
Most recently, 5G, Software-Defined Networking (SDN), and Network Function Virtualization (NFV) are pushing big network data into a new age. On the one hand, these techniques further enlarge network capacity by supporting more users, thereby contributing further to the volume and diversity of big network data. On the other hand, they make big data analytics more efficient by enabling more efficient data transfers in networks. For example, SDN, which separates the network control logic (the control plane) from the data forwarding plane, is a promising technology for efficient data transfers through flexible management of network resources.
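The separation of control and data planes in SDN can be illustrated with a toy model. The sketch below is a simplified, hypothetical match-action table written for illustration only; it is not the OpenFlow protocol or the API of any real SDN controller. Switches only look packets up in their flow tables, while a central controller decides policy and installs rules; the names `FlowRule`, `Switch`, `Controller`, and `steer` are all invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlowRule:
    """A match-action rule: every field in `match` must equal the packet's field."""
    priority: int
    match: Dict[str, str]
    action: str  # e.g. "forward:2" or "drop"


@dataclass
class Switch:
    """Data plane: forwards packets by table lookup only, with no routing logic."""
    name: str
    table: List[FlowRule] = field(default_factory=list)

    def install(self, rule: FlowRule) -> None:
        self.table.append(rule)
        # Highest-priority rules are checked first
        self.table.sort(key=lambda r: r.priority, reverse=True)

    def forward(self, packet: Dict[str, str]) -> str:
        for rule in self.table:
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return "send_to_controller"  # table miss: ask the control plane


class Controller:
    """Control plane: holds the global view and pushes rules to switches."""

    def __init__(self) -> None:
        self.switches: Dict[str, Switch] = {}

    def register(self, switch: Switch) -> None:
        self.switches[switch.name] = switch

    def steer(self, dst_ip: str, out_port: int) -> None:
        """Hypothetical policy: send all traffic for dst_ip out of out_port."""
        for switch in self.switches.values():
            switch.install(FlowRule(10, {"dst_ip": dst_ip}, f"forward:{out_port}"))


ctrl = Controller()
s1 = Switch("s1")
ctrl.register(s1)
print(s1.forward({"dst_ip": "10.0.0.5"}))  # send_to_controller
ctrl.steer("10.0.0.5", out_port=2)
print(s1.forward({"dst_ip": "10.0.0.5"}))  # forward:2
```

Because forwarding behavior changes only through `steer`, a controller with a global view could, in principle, reroute bulk data transfers around congested links without reprogramming each switch by hand.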
Big data in wireless networks: Given the myriad use cases and applications of wireless networks, there exist dozens of wireless networking technologies, each with its own performance characteristics and optimized for a specific task and context. For example, over a dozen wireless technologies are in widespread use: WiFi, Bluetooth, ZigBee, NFC, WiMAX, LTE, HSPA, EV-DO, 3G standards, satellite services, etc. Big network data in different wireless networks is diverse in how it is collected, stored, and processed, since these networks are designed for different usage scenarios and handle user data in different ways. For example, wireless sensor networks focus on collecting multidimensional (e.g., location and time) sensing data in an energy-efficient and bandwidth-saving manner (Akyildiz et al., 2002; Cardei et al., 2002; Liang, 2006), and various data collection, data compression, data transmission, and coding methods have therefore been proposed. Internet-of-Things (IoT) networks are another major source of big network data. Given that sensors are used in nearly all industries, the IoT is expected to produce a huge amount of data (Ahmed et al., 2017), whose volume and diversity depend on the specific IoT application.
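As one illustration of bandwidth-saving compression in sensor networks, the sketch below uses delta encoding, a simple lossless scheme for slowly changing readings. It is an example of the general idea, not a method from the works cited above, and it assumes integer readings in fixed-point units (here hypothetical temperatures in hundredths of a degree Celsius).

```python
from itertools import accumulate
from typing import List


def delta_encode(readings: List[int]) -> List[int]:
    """Keep the first reading, then only the change from each previous reading."""
    if not readings:
        return []
    return [readings[0]] + [cur - prev for prev, cur in zip(readings, readings[1:])]


def delta_decode(encoded: List[int]) -> List[int]:
    """Running sums restore the original readings exactly (lossless)."""
    return list(accumulate(encoded))


def bits_per_value(values: List[int]) -> int:
    """Bits needed to store each value as a signed integer (magnitude plus sign bit)."""
    return max(abs(v) for v in values).bit_length() + 1


# Hypothetical readings in hundredths of a degree Celsius (21.53, 21.54, ...)
raw = [2153, 2154, 2154, 2156, 2155]
enc = delta_encode(raw)
print(enc)  # [2153, 1, 0, 2, -1]
print(bits_per_value(raw), bits_per_value(enc[1:]))  # 13 3
assert delta_decode(enc) == raw
```

Only the first value needs the full width; every later delta fits in 3 bits instead of 13, which is where the saving in radio transmissions (and hence energy) would come from.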
Big data in cloud networks: With the exponential increase in cloud services for smart device users, the amount of data generated by these services is growing rapidly. Most service providers, including Google, Amazon, and Microsoft, have deployed large numbers of geographically distributed data centers interconnected by wide-area networks, known as cloud networks, to process huge amounts of big network data so that users get quick responses. Research on big data processing in such cloud networks follows two directions. One is the design of efficient parallel processing frameworks and tools that achieve real-time processing of big data within a single data center; Hadoop and Spark, which are widely used by cloud service providers, are examples (Singh and Reddy, 2014; Reyes-Ortiz et al., 2015; Zaharia et al., 2016). The other is the design of novel network architectures, communication protocols, routing algorithms, etc. that alleviate bottlenecks in the underlying network infrastructure, such as the congestion caused by big data transfers (Gu et al., 2014; Jiao et al., 2014; Xia et al., 2016, 2017a,b).
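The parallel processing model popularized by Hadoop is MapReduce. The sketch below imitates its three phases (map, shuffle, reduce) for a word count in plain Python, with a thread pool standing in for the worker nodes of a data center. It is a single-machine illustration of the programming model only and does not use the Hadoop or Spark APIs; the input partitions are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(partition: List[str]) -> List[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input partition."""
    return [(word.lower(), 1) for line in partition for word in line.split()]


def shuffle_phase(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between stages."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine all values that share a key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(partitions: List[List[str]], workers: int = 4) -> Dict[str, int]:
    # Each partition is mapped independently, so the map phase parallelizes freely
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, partitions))
    return reduce_phase(shuffle_phase(mapped))


partitions = [["big data", "network data"], ["Big network"]]
print(word_count(partitions))  # {'big': 2, 'data': 2, 'network': 2}
```

In a real cluster the shuffle moves intermediate pairs between machines over the data center network, which is one reason the second research direction above, relieving network bottlenecks, matters for big data processing.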
Big graph data: Social networking sites (e.g., Facebook, Twitter, and LinkedIn) keep expanding in terms of both active users and the data those users generate. The data in such sites can usually be modeled as graphs, with nodes representing users and edges representing relations. Such graph data is being generated at an unprecedented rate; it is now measured in terabytes and heading toward petabytes, with billions of nodes and edges. For example, the WWW reportedly contains more than 50 billion web pages and more than one trillion URLs (WWW, 2018), and a recent snapshot of the Facebook friendship network contains 800 million nodes and over 100 billion links (Facebook, 2018). One of the most fundamental problems for such big graph data is finding cohesive groups that can be regarded as communities, such as friends, co-workers, or neighbors (Zhou et al., 2009); in web graphs, for example, communities are likely to be groups of web pages about the same or related topics. Another fundamental problem is identifying highly “influential” communities and individuals in such large networks, because they govern the behavior of the entire network and can thus bring enormous social and economic benefits (Ding et al., 2007).
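To make community finding concrete, the sketch below applies label propagation, one simple and widely used community detection heuristic, to a small adjacency-set graph. It is not the method of Zhou et al. (2009). The published form of label propagation visits nodes in random order and breaks ties randomly; here a fixed visiting order and a "keep current label, otherwise smallest" tie-break are assumptions chosen so the sketch is deterministic.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Undirected graph as adjacency sets (nodes = users, edges = relations)."""
    graph: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def label_propagation(graph: Graph, max_iter: int = 100) -> Dict[int, int]:
    labels = {node: node for node in graph}  # each node starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):  # fixed visiting order keeps the sketch deterministic
            counts = Counter(labels[nbr] for nbr in graph[node])
            if not counts:
                continue  # an isolated node keeps its own label
            best = max(counts.values())
            tied = [lab for lab, c in counts.items() if c == best]
            # Keep the current label if it is among the most frequent, else take the smallest
            new_label = labels[node] if labels[node] in tied else min(tied)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:  # a full pass with no change: converged
            break
    return labels


def communities(labels: Dict[int, int]) -> List[List[int]]:
    groups: Dict[int, List[int]] = defaultdict(list)
    for node, lab in labels.items():
        groups[lab].append(node)
    return sorted(sorted(members) for members in groups.values())


# Two tightly knit groups of four users joined by a single "bridge" relation (3-7)
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 7)]
print(communities(label_propagation(build_graph(edges))))
# [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Each pass touches every edge a constant number of times, which is why label propagation is often considered for graphs with billions of edges, though its results can vary with visiting order.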
Big monitoring data: The complexity of large-scale networks is constantly increasing. With more services being offered and bandwidth-hungry video-streaming services growing continuously, network and server infrastructures are becoming extremely difficult to monitor, since the volume and complexity of monitoring data are increasing at an unprecedented rate. Network Traffic Monitoring and Analysis (NTMA) processes big, heterogeneous, and high-speed data describing the various states of a network (Bär et al., 2014). Although big monitoring data exhibits all the “V”s of big data, techniques designed for general-purpose big data cannot be applied directly to NTMA, because they lack efficient and effective methods for real-time traffic analysis (Bär et al., 2014; Liu et al., 2014).
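Real-time traffic analysis typically relies on continuous aggregation over a sliding time window rather than batch jobs over stored data. The sketch below keeps per-source byte counts for the last `window` seconds and reports heavy hitters. It is only in the spirit of the rolling analysis described by Bär et al. (2014) and does not reproduce DBStream; the `(timestamp, src_ip, bytes)` record format and the class name are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class SlidingWindowMonitor:
    """Tracks bytes per source IP over the time window (now - window, now]."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.records: Deque[Tuple[float, str, int]] = deque()  # (timestamp, src_ip, bytes)
        self.bytes_by_src: Dict[str, int] = defaultdict(int)

    def add(self, ts: float, src_ip: str, nbytes: int) -> None:
        """Ingest one record (timestamps assumed non-decreasing) and expire old ones."""
        self.records.append((ts, src_ip, nbytes))
        self.bytes_by_src[src_ip] += nbytes
        self._expire(ts)

    def _expire(self, now: float) -> None:
        # Records are in time order, so expired ones are always at the left end
        while self.records and self.records[0][0] <= now - self.window:
            _, src, n = self.records.popleft()
            self.bytes_by_src[src] -= n
            if self.bytes_by_src[src] == 0:
                del self.bytes_by_src[src]

    def heavy_hitters(self, threshold: int) -> List[str]:
        """Sources whose traffic in the current window reaches the threshold."""
        return sorted(s for s, b in self.bytes_by_src.items() if b >= threshold)


m = SlidingWindowMonitor(window=10.0)
m.add(0.0, "10.0.0.1", 500)
m.add(1.0, "10.0.0.2", 3000)
m.add(5.0, "10.0.0.1", 700)
print(m.heavy_hitters(1000))  # ['10.0.0.1', '10.0.0.2']
m.add(11.0, "10.0.0.3", 100)  # records at t <= 1.0 fall out of the window
print(m.heavy_hitters(500))   # ['10.0.0.1']
```

Each record is added and expired exactly once, so the cost per record stays constant regardless of how long the monitor runs, which is the property that batch-oriented big data tools lack for this workload.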
Efficient and effective analysis of big network data has wide applications in many areas, ranging from Internet-of-Things (IoT) industrial services to every economic aspect of people’s daily lives. The key applications of big network data and its analytic techniques are summarized below from four perspectives: the Internet of Things, network and cloud services, trading promotion and decision-making, and innovative networking technologies.
Internet of Things: Efficient and continuous processing of big network data in IoT networks can contribute to the correct operation and development of IoT applications. For example, data collection efficiency in IoT networks can improve dramatically if analysis of big network data reveals where sensors should be placed to capture important values. In addition, to benefit from the IoT, enterprises need analytic techniques for big network data with which they can collect, manage, and analyze massive volumes of sensor data in a scalable and cost-effective manner. In this context, big network data analytics can help enterprises ingest, read, and analyze the diverse data they collect.
Network and cloud services: Big network data is a significant part of network and cloud services, and its efficient analysis can in turn advance those services. For example, users’ preferences for cloud services can be learned by analyzing user access information, which can inform the design of new cloud services. From the perspective of network service providers, analyzing the locations of mobile users based on big network data enables strategic deployment of infrastructure that maximizes the providers’ revenue.
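As a rough illustration of location-driven deployment, the sketch below clusters hypothetical mobile-user coordinates with k-means (Lloyd's algorithm) and treats each cluster center as a candidate infrastructure site. This is an assumed simplification: real deployment planning would weigh coverage, capacity, and cost rather than distance alone, and the source does not prescribe a particular algorithm. Initialization uses the first k points for determinism; k-means++ is the more usual choice.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[Point]:
    """Lloyd's algorithm: returns k centers, each the mean of its assigned points."""
    centers: List[Point] = list(points[:k])  # naive init: first k points
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            # Assign each user to the nearest current center (Euclidean distance)
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments stable: converged
            break
        centers = new_centers
    return centers


# Hypothetical user positions (km) in two dense neighborhoods
users = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
print(kmeans(users, k=2))  # approximately [(0.33, 0.33), (10.33, 10.33)]
```

Minimizing the distance from users to their nearest site is only a proxy for revenue; in practice the objective would be weighted by traffic demand or subscriber value.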
Trading promotion and decision-making: Identifying highly influential communities and individuals from big graph data in large-scale networks brings significant social and economic benefits. For example, identifying highly influential communities in a large collaboration network can help find world-leading research teams and promising research directions. Likewise, identifying highly influential communities in trading and economic networks enables enterprises to promote products more effectively and create business opportunities. Identifying the most influential individuals, in turn, matters for disparate applications, including accelerating information propagation, controlling rumors and diseases, designing search engines, and understanding the hierarchical organization of social and biological networks.
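One standard way to rank individuals by influence is PageRank, which scores a node highly when many, or highly ranked, nodes point to it. The sketch below is a plain power-iteration implementation on a small directed "follows" graph; the graph is hypothetical, and PageRank is offered as one common centrality measure, not as the method of Ding et al. (2007) or any other cited work.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 100) -> Dict[str, float]:
    """graph maps each node to the nodes it points to (e.g. 'follows' or 'cites')."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no out-links (dangling nodes) is spread uniformly
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in graph.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Hypothetical "follows" graph: a, b and d all point to c; c points back to a
follows = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(follows)
print(max(ranks, key=ranks.get))  # c
```

The scores always sum to 1, so they can be read as the long-run share of attention each node receives under a random-surfer model; at web or social-network scale the same iteration is normally run on a distributed framework.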
Innovative networking technologies: Big monitoring data characterizes the running status of a network built on a specific networking technology. For example, by continuously monitoring and analyzing security attacks in networks, designers can develop novel security protocols that prevent future breaches. Likewise, analytic results based on monitoring data about network resource allocation can strongly influence the design of resource allocation strategies and mechanisms.
- Ahmed E, Yaqoob I, Hashem IAT, Khan I, Ahmed AIA, Imran M, Vasilakos AV (2017) The role of big data analytics in internet of things. Comput Netw 129:459–471. https://doi.org/10.1016/j.comnet.2017.06.013, http://www.sciencedirect.com/science/article/pii/S1389128617302591. Special Issue on 5G Wireless Networks for IoT and Body Sensors
- Akyildiz I, Su W, Sankarasubramaniam Y, Cayirci E (2002) Wireless sensor networks: a survey. Comput Netw 38(4):393–422. https://doi.org/10.1016/S1389-1286(01)00302-4, http://www.sciencedirect.com/science/article/pii/S1389128601003024
- Armbrust M, Fox A, Griffith R, Joseph AD, Katz RH, Konwinski A, Lee G, Patterson DA, Rabkin A, Zaharia M (2009) Above the clouds: a Berkeley view of cloud computing. Technical report UCB/EECS-2009-28, University of California, Berkeley
- Bär A, Finamore A, Casas P, Golab L, Mellia M (2014) Large-scale network traffic monitoring with DBStream, a system for rolling big data analysis. In: 2014 IEEE international conference on big data (Big Data), pp 165–170. https://doi.org/10.1109/BigData.2014.7004227
- Cardei M, MacCallum D, Cheng MX, Min M, Jia X, Li D, Du DZ (2002) Wireless sensor networks with energy efficient organization. J Interconnection Netw 03(03n04):213–229. https://doi.org/10.1142/S021926590200063X
- Ding B, Yu JX, Wang S, Qin L, Zhang X, Lin X (2007) Finding top-k min-cost connected trees in databases. In: 2007 IEEE 23rd international conference on data engineering, pp 836–845. https://doi.org/10.1109/ICDE.2007.367929
- Facebook (2018) Facebook. http://www.facebook.com/press/info.php?statistics
- Fernando N, Loke SW, Rahayu W (2013) Mobile cloud computing: a survey. Futur Gener Comput Syst 29(1):84–106. https://doi.org/10.1016/j.future.2012.05.023, http://www.sciencedirect.com/science/article/pii/S0167739X12001318. Including Special section: AIRCC-NetCoM 2009 and Special section: Clouds and Service-Oriented Architectures
- Jiao L, Li J, Du W, Fu X (2014) Multi-objective data placement for multi-cloud socially aware services. In: IEEE INFOCOM 2014 – IEEE conference on computer communications, pp 28–36. https://doi.org/10.1109/INFOCOM.2014.6847921
- Reyes-Ortiz JL, Oneto L, Anguita D (2015) Big data analytics in the cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf. Proc Comput Sci 53:121–130. https://doi.org/10.1016/j.procs.2015.07.286, http://www.sciencedirect.com/science/article/pii/S1877050915017895. INNS Conference on Big Data 2015 Program, San Francisco, 8–10 Aug 2015
- WWW (2018) The size of the World Wide Web. http://www.worldwidewebsize.com/