Indexing in Big Data

Nashipudimath, Madhu M.; Shinde, Subhash K.

doi:10.1007/978-981-13-1513-8_15

Conference paper
First Online: 13 September 2018

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 810)

Abstract

Nowadays, social media is used for communication in almost every activity, including business, knowledge sharing, and personal updates. This generates large amounts of data related to these activities, and social media has become a vital part of our lives. Analyzing this huge volume of data, however, is a tedious and complex task. Data reduction, indexing, and sorting are among the possible solutions, and their output can in turn be used for visualization, recommendation, and similar applications. Indexing techniques for highly repetitive data have become a relevant topic of discussion; these techniques are used to accelerate queries with value and dimension subsetting conditions. Different types of indexing suit different data types, data sizes, dimensionalities, representations, and storage schemes. Indexing is vital because most available electronic text collections are large scale and heterogeneous. Hence, the aim is to find an improved approach to text search, which is used everywhere from the help services built into operating systems to locating files on computers. Tree-based indexing, multidimensional indexing, and hashing are a few of the available indexing approaches, chosen depending on the data structures involved and the big data analysis (BDA) task. The purpose of an index is to speed up search, so the index should be a fraction of the size of the original data and should be built at the rate at which the data is generated, to avoid delays in results. This paper discusses several indexing techniques and search structures with respect to their data structures, frameworks, space requirements, simplified implementations, and applications.
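The abstract singles out text search over large, heterogeneous collections. A common structure for this is an inverted index, which maps each term to the set of documents that contain it, so a query only touches the posting sets of its own terms. The sketch below is a minimal in-memory illustration under simplifying assumptions (lower-cased alphanumeric tokens, Boolean AND queries); the helper names `tokenize`, `build_inverted_index`, and `search_all` are hypothetical, and this is not the authors' implementation.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_all(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Intersect posting sets starting from the smallest one,
    # which keeps the intermediate result small.
    postings = sorted((index.get(t, set()) for t in terms), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


if __name__ == "__main__":
    docs = {
        1: "Big data needs fast indexing",
        2: "Social media generates big data",
        3: "Hashing speeds up search",
    }
    idx = build_inverted_index(docs)
    print(sorted(search_all(idx, "big data")))  # [1, 2]
```

At big-data scale the posting sets would normally be stored sorted and compressed on disk rather than as Python sets; the in-memory version only shows the lookup logic.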
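Tree-based and multidimensional indexing can be illustrated with a k-d tree, which splits points on one coordinate per level so that a range query (a "dimension subsetting" condition) can skip whole subtrees that cannot intersect the query box. This is a minimal sketch assuming all points fit in memory and share the same dimensionality; `Node`, `build_kdtree`, and `range_query` are hypothetical names, and the paper may discuss other tree structures.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

Point = tuple[float, ...]


@dataclass
class Node:
    point: Point
    axis: int                       # coordinate this node splits on
    left: Optional["Node"] = None   # points with point[axis] <= split value
    right: Optional["Node"] = None  # points with point[axis] >= split value


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a balanced k-d tree by splitting on the median, cycling through axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(
        point=ordered[mid],
        axis=axis,
        left=build_kdtree(ordered[:mid], depth + 1),
        right=build_kdtree(ordered[mid + 1:], depth + 1),
    )


def range_query(node: Optional[Node], lo: Point, hi: Point,
                out: Optional[list[Point]] = None) -> list[Point]:
    """Return all points p with lo[i] <= p[i] <= hi[i] in every dimension i."""
    if out is None:
        out = []
    if node is None:
        return out
    p, a = node.point, node.axis
    if all(low <= x <= high for low, x, high in zip(lo, p, hi)):
        out.append(p)
    # Prune: descend only into halves that can overlap the query box.
    if lo[a] <= p[a]:
        range_query(node.left, lo, hi, out)
    if p[a] <= hi[a]:
        range_query(node.right, lo, hi, out)
    return out


if __name__ == "__main__":
    pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
    tree = build_kdtree(pts)
    print(sorted(range_query(tree, (3, 1), (8, 5))))  # [(5, 4), (7, 2), (8, 1)]
```

Splitting on the median keeps the tree balanced for a static point set; a dynamic or disk-resident index would need a different design.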
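For highly repetitive data, such as a column with only a few distinct values, a bitmap index is one well-known way to accelerate queries with value subsetting conditions: each distinct value gets a bit vector with one bit per row, and multi-condition queries become bitwise AND/OR operations. The sketch below assumes Python integers serve as uncompressed bit vectors and uses hypothetical function names (`build_bitmap_index`, `bits_to_rows`); real bitmap indexes often compress these vectors (for example with run-length encoding) so the index stays a fraction of the data size.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def build_bitmap_index(column: Sequence[Hashable]) -> dict[Hashable, int]:
    """Build one bitmap per distinct value; bit i is set when row i holds that value."""
    bitmaps: dict[Hashable, int] = defaultdict(int)
    for row, value in enumerate(column):
        bitmaps[value] |= 1 << row
    return dict(bitmaps)


def bits_to_rows(bitmap: int) -> list[int]:
    """Decode a bitmap into the sorted list of row numbers whose bits are set."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


if __name__ == "__main__":
    colors = ["red", "blue", "red", "green", "blue", "red"]
    sizes = ["S", "L", "L", "S", "L", "L"]
    color_idx = build_bitmap_index(colors)
    size_idx = build_bitmap_index(sizes)
    # color == "red" AND size == "L": intersect the two bitmaps.
    print(bits_to_rows(color_idx["red"] & size_idx["L"]))  # [2, 5]
    # color IN ("blue", "green"): union of the two bitmaps.
    print(bits_to_rows(color_idx["blue"] | color_idx["green"]))  # [1, 3, 4]
```

The fewer distinct values a column has, the fewer bitmaps are needed, which is why this structure suits repetitive data.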
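The abstract also names hashing as an indexing approach. For high-dimensional data, one option is locality-sensitive hashing with random hyperplanes: a vector's signature records which side of several random hyperplanes it falls on, so vectors separated by a small angle tend to land in the same bucket. This is an illustrative choice, not necessarily the hashing scheme the paper covers; the class `HyperplaneLSH` and its parameters (`n_bits`, `seed`) are hypothetical.

```python
import random
from collections import defaultdict
from typing import Sequence

Vector = Sequence[float]


class HyperplaneLSH:
    """Random-hyperplane LSH: vectors at a small angle tend to share a bucket."""

    def __init__(self, dim: int, n_bits: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each hyperplane is represented by a random Gaussian normal vector.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_bits)]
        # Bucket signature -> ids of the vectors hashed into it.
        self.buckets: dict[int, list[int]] = defaultdict(list)

    def signature(self, v: Vector) -> int:
        """Set bit i when v lies on the non-negative side of hyperplane i."""
        sig = 0
        for i, plane in enumerate(self.planes):
            if sum(p * x for p, x in zip(plane, v)) >= 0:
                sig |= 1 << i
        return sig

    def add(self, item_id: int, v: Vector) -> None:
        """Index a vector under its signature."""
        self.buckets[self.signature(v)].append(item_id)

    def candidates(self, q: Vector) -> list[int]:
        """Return the ids stored in the query's bucket (likely, not guaranteed, neighbors)."""
        return list(self.buckets.get(self.signature(q), []))


if __name__ == "__main__":
    lsh = HyperplaneLSH(dim=3, n_bits=8, seed=42)
    lsh.add(1, (1.0, 2.0, 3.0))
    lsh.add(2, (-1.0, -2.0, -3.0))
    # A positively scaled copy has the same angle, hence the same signature.
    print(lsh.candidates((2.0, 4.0, 6.0)))  # [1]
```

Candidates from a bucket are only likely neighbors; a full query would rank them by exact distance, and practical systems usually build several hash tables to reduce missed matches.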

Author information

Authors and Affiliations

Faculty of Computer Engineering, Pacific Academy of Higher Education and Research University, Udaipur, India
Madhu M. Nashipudimath
Computer Engineering, Lokmanya Tilak College of Engineering, Navi Mumbai, India
Subhash K. Shinde

Corresponding author

Correspondence to Madhu M. Nashipudimath.

Editor information

Editors and Affiliations

Department of Electronics and Telecommunication Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere, Raigad, Maharashtra, India
Brijesh Iyer
Department of Electronics and Telecommunication Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere, Raigad, Maharashtra, India
S.L. Nalbalwar
Department of Electronics and Communication Engineering, Indian Institute of Technology Roorkee, Roorkee, Uttarakhand, India
Nagendra Prasad Pathak

About this paper

Cite this paper

Nashipudimath, M.M., Shinde, S.K. (2019). Indexing in Big Data. In: Iyer, B., Nalbalwar, S., Pathak, N. (eds) Computing, Communication and Signal Processing. Advances in Intelligent Systems and Computing, vol 810. Springer, Singapore. https://doi.org/10.1007/978-981-13-1513-8_15

DOI: https://doi.org/10.1007/978-981-13-1513-8_15
Published: 13 September 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1512-1
Online ISBN: 978-981-13-1513-8
eBook Packages: Engineering, Engineering (R0)
