
Indexing in Big Data

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 810)

Abstract

Nowadays, communication for almost all activities, such as business, knowledge sharing, and personal updates, takes place through social media, and social media have thus become a vital part of our lives. This generates large amounts of data related to these different activities, and going through such huge data for analysis is a tedious and complex task. Data reduction, indexing, and sorting are among the possible solutions, and their output can then be used for visualization, recommendation, and similar tasks. Indexing techniques for highly repetitive data collections have become an active topic of discussion; they are used to accelerate queries with value and dimension subsetting conditions. Different types of indexing suit different data types, data sizes, dimensionalities, representations, and storage schemes. Indexing is essential because most available electronic text collections are large-scale and heterogeneous. Hence, the aim is to find improved approaches to text search, which is used everywhere from the help services built into operating systems to locating files on computers. Tree-based indexing, multidimensional indexing, and hashing are a few of the indexing approaches in use, chosen according to the underlying data structures and the big data analysis (BDA) task. The purpose of indexing is to speed up search, so the index should be a fraction of the size of the original data and should be built at the speed of data generation to avoid delays in results. This paper discusses a few indexing techniques and search structures in terms of their data structures, frameworks, space requirements, simplified implementations, and applications.
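The abstract names hashing and tree-based indexing as ways to speed up text search, and asks that an index be built at the speed of data generation. The following Python sketch illustrates those ideas under simplifying assumptions; it is not the method of this paper. The class `InvertedIndex` and its methods are hypothetical names, tokenization is a naive lowercase alphanumeric split, and a sorted Python list searched with `bisect` stands in for a real tree-based index such as a B-tree.

```python
import re
from bisect import bisect_left, insort
from collections import defaultdict


class InvertedIndex:
    """Hypothetical incremental text index, for illustration only.

    Hash-based part: a dict from each term to a sorted list of document IDs
    (its posting list), giving average O(1) exact-term lookup.
    Ordered part: a sorted list of distinct terms, a simple stand-in for a
    tree-based index, so prefix queries can use binary search.
    """

    def __init__(self) -> None:
        self.postings: dict[str, list[int]] = defaultdict(list)
        self.terms: list[str] = []  # sorted vocabulary

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Assumption: lowercase alphanumeric tokens; real systems use richer analyzers.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id: int, text: str) -> None:
        # Index each document as soon as it arrives, so the index keeps pace with the stream.
        for term in set(self.tokenize(text)):
            if term not in self.postings:  # `in` does not create a defaultdict entry
                insort(self.terms, term)
            insort(self.postings[term], doc_id)

    def search(self, query: str) -> list[int]:
        # AND query: intersect the posting lists of all query terms.
        terms = self.tokenize(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], []))
        for term in terms[1:]:
            result &= set(self.postings.get(term, []))
        return sorted(result)

    def prefix(self, prefix: str) -> list[str]:
        # Binary-search to the first candidate term, then scan while the prefix matches.
        prefix = prefix.lower()
        out = []
        for term in self.terms[bisect_left(self.terms, prefix):]:
            if not term.startswith(prefix):
                break
            out.append(term)
        return out


if __name__ == "__main__":
    index = InvertedIndex()
    index.add(1, "Flood warning issued for the river valley")
    index.add(2, "Road closed due to flooding near the river")
    index.add(3, "Stock markets rally on tech earnings")

    print(index.search("river"))        # [1, 2]
    print(index.search("river flood"))  # [1]
    print(index.prefix("flo"))          # ['flood', 'flooding']
```

Because each post is indexed on arrival, queries see new data immediately. A production system would replace the sorted list with a balanced tree or B-tree to avoid linear-time insertions, and would typically compress posting lists to keep the index a small fraction of the data size.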



Author information

Corresponding author

Correspondence to Madhu M. Nashipudimath.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Nashipudimath, M.M., Shinde, S.K. (2019). Indexing in Big Data. In: Iyer, B., Nalbalwar, S., Pathak, N. (eds) Computing, Communication and Signal Processing. Advances in Intelligent Systems and Computing, vol 810. Springer, Singapore. https://doi.org/10.1007/978-981-13-1513-8_15


  • DOI: https://doi.org/10.1007/978-981-13-1513-8_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-1512-1

  • Online ISBN: 978-981-13-1513-8

  • eBook Packages: Engineering, Engineering (R0)
