Abstract
Social media now carries communication for almost every activity, including business, knowledge sharing, and personal updates. This generates large amounts of data across these activities, and social media has become a vital part of daily life. Analyzing data at this scale is tedious and complex. Data reduction, indexing, and sorting are among the solutions, and their output can in turn be used for visualization, recommendation, and similar tasks. Indexing techniques for highly repetitive data collections have become a relevant topic; they are used to accelerate queries with value and dimension subsetting conditions. Different types of indexing suit different data types, data sizes, dimensions, representations, storage schemes, and so on. Indexing is vital because most available electronic text collections are large scale and heterogeneous. The aim is therefore an improved approach to text search, which is used everywhere, starting from the help services built into operating systems to locate files on a computer. Tree-based indexing, multidimensional indexing, and hashing are a few of the indexing approaches used, depending on the data structures and on big data analysis (BDA). The purpose of an index is to speed up search, so the index should be a fraction of the size of the original data and should be built at the speed of data generation to avoid delays in results. Here, a few indexing techniques and search structures are discussed in terms of data structure, framework, space requirements, simplified implementations, and applications.
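As a rough illustration of the idea that an index is built as data arrives and then answers text queries without rescanning the collection, the following minimal sketch shows a hash-based inverted index. It is not taken from the paper: the class name `InvertedIndex`, its methods, and the sample posts are hypothetical and chosen only for this example.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy hash-based inverted index (hypothetical): term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Index one document as it arrives, so the index grows with the data stream.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        # Return the ids of documents that contain every query term (AND semantics).
        terms = query.lower().split()
        if not terms:
            return set()
        sets = [self.postings.get(term, set()) for term in terms]
        return set.intersection(*sets)


# Made-up sample posts, used only to exercise the sketch.
index = InvertedIndex()
posts = {
    1: "big data indexing survey",
    2: "social media data stream",
    3: "tree based indexing of big data",
}
for doc_id, text in posts.items():
    index.add(doc_id, text)

print(sorted(index.search("big data")))  # [1, 3]
```

A lookup touches only the posting sets of the query terms rather than every document, which is the speed benefit the abstract refers to. This toy version makes no attempt to keep the index smaller than the data; practical systems compress the posting lists for that purpose.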
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Nashipudimath, M.M., Shinde, S.K. (2019). Indexing in Big Data. In: Iyer, B., Nalbalwar, S., Pathak, N. (eds) Computing, Communication and Signal Processing. Advances in Intelligent Systems and Computing, vol 810. Springer, Singapore. https://doi.org/10.1007/978-981-13-1513-8_15
DOI: https://doi.org/10.1007/978-981-13-1513-8_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1512-1
Online ISBN: 978-981-13-1513-8
eBook Packages: Engineering, Engineering (R0)