Abstract
In the chapters so far, you have relied on HDFS as your storage medium. It has two major advantages for the type of processing we wanted to do: it excels at storing large files, and it enables distributed processing of those files with the help of MapReduce. HDFS is most efficient for tasks that require a pass through all the data in a file (or a set of files). If you only need to access a particular element in a dataset (an operation sometimes called a point query) or a contiguous range of elements (sometimes called a range query), HDFS does not give you an efficient tool for the task. You are forced to scan over all elements to pick out the ones you are interested in.
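The difference between scanning and querying can be illustrated with a minimal in-memory sketch. It does not use HDFS or any NoSQL API. The `records` list, the key layout, and all function names are hypothetical, and the sorted key list merely stands in for the kind of index that a key-ordered store might maintain.

```python
from bisect import bisect_left, bisect_right

# Hypothetical dataset: (key, value) records sorted by key,
# standing in for the contents of one large flat file.
records = [(k, f"value-{k}") for k in range(0, 1000, 5)]


def scan_point(records, key):
    """Point query by full scan: inspect records one by one (O(n))."""
    for k, v in records:
        if k == key:
            return v
    return None


def scan_range(records, lo, hi):
    """Range query by full scan: every record must be inspected (O(n))."""
    return [v for k, v in records if lo <= k <= hi]


# Assumed index: a sorted list of keys, aligned with `records`.
keys = [k for k, _ in records]


def indexed_point(records, keys, key):
    """Point query by binary search over the sorted keys (O(log n))."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i][1]
    return None


def indexed_range(records, keys, lo, hi):
    """Range query: locate both ends by binary search, then slice."""
    start = bisect_left(keys, lo)
    end = bisect_right(keys, hi)
    return [v for _, v in records[start:end]]
```

Both variants return the same results. The scan touches every record, while the indexed versions only touch the records that match plus a logarithmic number of keys during the search.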
© 2019 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Wiktorski, T. (2019). NOSQL Databases. In: Data-intensive Systems. Advanced Information and Knowledge Processing. Springer, Cham. https://doi.org/10.1007/978-3-030-04603-3_8
DOI: https://doi.org/10.1007/978-3-030-04603-3_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04602-6
Online ISBN: 978-3-030-04603-3
eBook Packages: Computer Science, Computer Science (R0)