Indexing in Big Data

  • Madhu M. Nashipudimath
  • Subhash K. Shinde
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 810)

Abstract

Social media is now used for almost every kind of communication, including business, knowledge sharing, and personal updates. This generates large amounts of data about many different activities, and social media has become a vital part of our lives. Going through such a large volume of data for analysis, however, is a tedious and complex task. Data reduction, indexing, and sorting are among the possible solutions, and their results can in turn be used for visualization, recommendation, and similar tasks. Indexing techniques for highly repetitive data groups have become a relevant topic of discussion. These techniques are used to accelerate queries with value and dimension subsetting conditions. Different types of indexing suit different data types, data sizes, dimensions, representations, and storage arrangements. Indexing is vital because the electronic text collections that are available are mostly large scale and heterogeneous. The aim, therefore, is to find an improved approach to text search, which is used everywhere from the help services built into operating systems to locating files on computers. Tree-based indexing, multidimensional indexing, and hashing are a few of the indexing approaches used, depending on the data structures involved and on the big data analysis (BDA) to be performed. The purpose of indexing is to speed up search. The index should therefore be a fraction of the size of the original data and should be built at the speed of data generation to avoid delays in results. Here, a few indexing techniques and search structures are discussed in terms of data structure, framework, space requirements, simplified implementations, and applications.
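As an illustration of how an index can speed up text search, the following is a minimal sketch of an inverted index in Python. It is not taken from the paper: the function names `build_inverted_index` and `search`, the whitespace tokenization, and the sample documents are all assumptions chosen for brevity.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids containing it.

    `docs` is a hypothetical dict of {doc_id: text}; tokens are split on
    whitespace only, which is a simplification.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Start from the postings of the first token, then intersect the rest.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Made-up sample documents, for illustration only.
docs = {
    1: "big data indexing with hashing",
    2: "tree based indexing for text search",
    3: "social media data analysis",
}
index = build_inverted_index(docs)
print(sorted(search(index, "indexing")))       # [1, 2]
print(sorted(search(index, "data indexing")))  # [1]
```

A query here looks up only the postings for its own tokens instead of scanning every document, which is the kind of saving the abstract attributes to indexing. Keeping the index small relative to the data and building it as fast as the data arrives are separate concerns that this sketch does not address.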

Keywords

Algorithms, Big data analysis, Data indexing, Data structure, Querying

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Faculty of Computer Engineering, Pacific Academy of Higher Education and Research University, Udaipur, India
  2. Computer Engineering, Lokmanya Tilak College of Engineering, Navi Mumbai, India