Research challenges of big data
Big data is commonly characterised by the 3Vs (Volume, Velocity, Variety) or the 5Vs (Volume, Velocity, Variety, Veracity, and Value). Because of these distinguishing characteristics, big data is typically stored and processed using NoSQL (Not Only SQL) database systems. It has been utilised in various applications and services, ranging from e-commerce and social media to public sector and governmental organisations. The goal of this editorial note is to provide a concise summary of big data characteristics, models, and technologies and to identify some of the crucial challenges that remain open for further research.
1 Characteristics of big data
Volume This refers to the massive amount of data being generated, gathered, and processed, measured for example in petabytes, exabytes, and zettabytes. For instance, Twitter receives and processes millions of tweets on a regular basis. Similarly, Facebook routinely handles millions of posts and images. Google receives more than a billion search queries. Further, millions of data records are gathered from sensor technologies associated with transportation, weather, environmental systems, and so on.
Velocity This refers to the speed at which data are generated, processed, and moved between different systems and devices. Examples include the speed of social media posts; online transactions and fraud checking; live transportation data received from buses, trains, aeroplanes, etc.
Variety This refers to the different types of data that can be used together to obtain the desired information or results. Big data types and formats include structured, semi-structured, and unstructured data.
Veracity This refers to the quality of data in terms of correctness, consistency, trust, security, and reliability. For example, data should not be stale or out of date for a given purpose. Similarly, data should be correct and consistent, and should be generated by a trusted system.
Value This refers to the different types of benefits that can be derived from processing and analysing big data. Examples include monetary value, social value, research/education value, and so on.
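The veracity criteria above (freshness, a trusted source, and well-formed values) can be illustrated with a minimal Python sketch. The source names, the five-minute freshness window, and the record layout are assumptions made purely for illustration; they are not taken from any particular system.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical configuration: which sources are trusted and how old data may be.
TRUSTED_SOURCES = {"sensor-a", "sensor-b"}
MAX_AGE = timedelta(minutes=5)


def is_veracious(record, now):
    """Return True if the record is fresh, from a trusted source, and numeric."""
    fresh = now - record["timestamp"] <= MAX_AGE
    trusted = record["source"] in TRUSTED_SOURCES
    well_formed = isinstance(record.get("value"), (int, float))
    return fresh and trusted and well_formed


# Illustrative records (made-up values).
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh_record = {"source": "sensor-a", "timestamp": now - timedelta(minutes=1), "value": 21.5}
stale_record = {"source": "sensor-a", "timestamp": now - timedelta(hours=1), "value": 21.5}
untrusted_record = {"source": "unknown", "timestamp": now, "value": 21.5}
```

In practice the acceptable age of a record depends on the purpose for which the data are used, so a threshold such as `MAX_AGE` would be set per application rather than globally.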
2 Big data models and technologies
Classical relational models and SQL technologies do not adequately cater for the needs of big data because of the distinguishing characteristics described above. Thus, storing and processing big data require new data models and new technologies. The most commonly used data models for big data are the document, key-value, column, and graph models.
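To make the four data models more concrete, the following Python sketch represents the same small data set in each of them using plain in-memory structures. It is a simplified illustration only: the records, keys, and the `followers_of` helper are hypothetical, and real NoSQL systems add persistence, distribution, and query languages on top of these basic shapes.

```python
import json

# Document model: one self-contained, possibly nested document per entity.
document_store = {
    "user:1": {"name": "Alice", "city": "Oxford", "follows": ["user:2"]},
    "user:2": {"name": "Bob", "city": "Rabat", "follows": []},
}

# Key-value model: the store sees only a key and an opaque value
# (here, the document serialised as a JSON string).
key_value_store = {key: json.dumps(doc) for key, doc in document_store.items()}

# Column model (simplified): values are grouped by column rather than by row.
column_store = {
    "name": {"user:1": "Alice", "user:2": "Bob"},
    "city": {"user:1": "Oxford", "user:2": "Rabat"},
}

# Graph model: nodes plus explicit, labelled edges between them.
graph_nodes = {"user:1": {"name": "Alice"}, "user:2": {"name": "Bob"}}
graph_edges = [("user:1", "FOLLOWS", "user:2")]


def followers_of(node_id):
    """Return the ids of nodes with a FOLLOWS edge pointing at node_id."""
    return [src for src, label, dst in graph_edges
            if label == "FOLLOWS" and dst == node_id]
```

The choice of model shapes which queries are cheap: reading every city touches a single entry in the column layout, whereas finding followers is a direct edge lookup in the graph layout.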
3 Big data challenges
NoSQL databases are predominantly used to store and process big data. Such databases provide key benefits such as efficiency, scalability, and availability in storing and processing big data. However, they do not provide appropriate support for transactions, data normalisation, and integrity constraints, and this lack of support affects the consistency of big data [4, 5]. Thus, the current models and techniques implemented in NoSQL databases should be re-examined so that they can be used in applications and services that demand strong data consistency in addition to high efficiency, scalability, and availability.
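One consequence is that integrity constraints which a relational schema would enforce automatically often have to be checked in application code. The sketch below illustrates this with a plain Python dictionary standing in for a schemaless store; the `put_order` function, the `ConstraintViolation` exception, and the order/customer layout are all hypothetical and do not correspond to any specific NoSQL product.

```python
class ConstraintViolation(Exception):
    """Raised when a write would break an application-level constraint."""


def put_order(store, order_id, order):
    """Insert an order only if it passes checks a relational schema would enforce."""
    if order_id in store["orders"]:
        raise ConstraintViolation("duplicate order id")      # primary-key style check
    if order["customer_id"] not in store["customers"]:
        raise ConstraintViolation("unknown customer")        # foreign-key style check
    if order["quantity"] <= 0:
        raise ConstraintViolation("quantity must be positive")  # value check
    store["orders"][order_id] = order


# Illustrative in-memory store (made-up data).
store = {"customers": {"c1": {"name": "Alice"}}, "orders": {}}
put_order(store, "o1", {"customer_id": "c1", "quantity": 2})
```

Note that checks of this kind are not atomic with the write unless the underlying store offers transactions or conditional updates, which is precisely the gap identified above.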
A number of NoSQL databases have been designed and developed. Different NoSQL systems are implemented using different big data models and technologies. They also provide varying levels of quality of service (QoS) with respect to performance, availability, and scalability. This makes it difficult to decide which NoSQL system should be chosen for a particular big data application. A new benchmark is therefore needed that users and developers can use to select a NoSQL database that effectively meets their needs.
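As a rough illustration of how such a benchmark might be used, the Python sketch below ranks candidate systems by a weighted average of normalised scores. It assumes that a benchmark has already produced a score between 0 and 1 per criterion; the system names, scores, and weights are invented for the example and are not measurements of real products.

```python
def rank_systems(scores, weights):
    """Rank systems by the weighted average of their per-criterion scores."""
    total_weight = sum(weights.values())
    ranked = {
        name: sum(weights[c] * per_criterion[c] for c in weights) / total_weight
        for name, per_criterion in scores.items()
    }
    # Highest weighted score first.
    return sorted(ranked.items(), key=lambda item: item[1], reverse=True)


# Hypothetical benchmark results, normalised to the range 0..1.
scores = {
    "system_a": {"performance": 0.9, "availability": 0.6, "scalability": 0.7},
    "system_b": {"performance": 0.6, "availability": 0.9, "scalability": 0.8},
}

# Hypothetical application profile that values availability most.
weights = {"performance": 1, "availability": 3, "scalability": 1}

ranking = rank_systems(scores, weights)
```

With these illustrative weights, `system_b` ranks first because availability dominates; a performance-heavy profile would reverse the order, which is why the weights should reflect the needs of the specific application.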
Data as a Service (DaaS) has emerged as a new platform in order to facilitate the provisioning of data over the Internet and cloud. DaaS is generally based on web services and service-oriented computing (SOC) technologies. DaaS aims to consolidate and organise data in a centralised place in order to enable location transparency as well as sharing of data across different systems and services. However, existing models and architectures of web services and SOC may fall short of meeting the requirements of DaaS provisioning over the Internet and cloud. Thus, new models, methods, and architectures should be developed in order to further materialise the benefits of DaaS.
The Internet of Things (IoT) is one of the major platforms for, and sources of, big data, given that millions of things or devices generate and consume large volumes of data. However, resource scarcity is one of the major issues associated with IoT devices, as they often lack the capability to collect, store, analyse, and share big data in real time. Thus, new solutions need to be developed in order to combine IoT and big data effectively.
- 1. Younas M (2018) Transactional services for NoSQL big data systems. In: Keynote talk at the 6th international conference on multimedia computing and systems (ICMCS 2018), Rabat, Morocco, 10–12 May 2018
- 2. Nguyen TL (2018) A framework for five big v’s of big data and organizational culture in firms. In: Proceedings of the IEEE international conference on big data (Big Data 2018), Seattle, WA, USA, 10–13 Dec 2018