NotaQL Is Not a Query Language! It’s for Data Transformation on Wide-Column Stores

Schildgen, Johannes; Deßloch, Stefan

doi:10.1007/978-3-319-20424-6_14


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9147)

Included in the following conference series:

British International Conference on Databases

Abstract

Querying a relational database is simple because all columns of the tables are known and SQL is readily applicable. In NoSQL stores, there is usually no fixed schema and no query language. In this article, we present NotaQL, a data-transformation language for wide-column stores. NotaQL is easy to use yet powerful: many MapReduce algorithms, such as filtering, grouping, and aggregation, and even breadth-first search, PageRank, and other graph and text algorithms, can be expressed in two or three short lines of code.
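
To make the abstract's claim more concrete, the following is a minimal Python sketch of the kind of filter, group, and aggregate transformation over a wide-column table that such a language targets. It is not NotaQL syntax and not the authors' implementation. The table contents, the `transform` helper, and all parameter names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical wide-column table: row id -> {column name -> value}.
# Rows are free to have different sets of columns (no fixed schema).
people = {
    "Peter": {"born": 1967, "salary": 50000, "bonus": 5000},
    "Kate": {"born": 1968, "salary": 80000},
    "Lisa": {"born": 1972, "salary": 60000, "bonus": 1000},
}


def transform(table, row_filter, cell_filter, out_key, aggregate):
    """Filter rows and cells, group cell values by an output key, aggregate.

    This mirrors a map step (filter and emit key/value pairs) followed by
    a reduce step (aggregate the values collected under each key).
    """
    groups = defaultdict(list)
    for row_id, columns in table.items():
        if not row_filter(columns):
            continue  # row-level filter
        for column, value in columns.items():
            if cell_filter(column, value):
                groups[out_key(row_id, column)].append(value)
    return {key: aggregate(values) for key, values in groups.items()}


def is_income(column, value):
    """Cell-level filter: keep only the (assumed) income columns."""
    return column in {"salary", "bonus"}


# Aggregation within each row: total income per person.
income_by_person = transform(
    people,
    row_filter=lambda columns: True,
    cell_filter=is_income,
    out_key=lambda row_id, column: row_id,
    aggregate=sum,
)

# Filtering plus grouping by column name: totals for people born after 1967.
totals_by_column = transform(
    people,
    row_filter=lambda columns: columns.get("born", 0) > 1967,
    cell_filter=is_income,
    out_key=lambda row_id, column: column,
    aggregate=sum,
)

print(income_by_person)
print(totals_by_column)
```

The same helper covers both cases because only the output key changes: keying by row id aggregates across the columns of one row, while keying by column name aggregates across rows.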



Author information

Authors and Affiliations

University of Kaiserslautern, Kaiserslautern, Germany
Johannes Schildgen & Stefan Deßloch

Corresponding author

Correspondence to Johannes Schildgen.

Editor information

Editors and Affiliations

University of Edinburgh, Edinburgh, United Kingdom
Sebastian Maneth

About this paper

Cite this paper

Schildgen, J., Deßloch, S. (2015). NotaQL Is Not a Query Language! It’s for Data Transformation on Wide-Column Stores. In: Maneth, S. (ed.) Data Science. BICOD 2015. Lecture Notes in Computer Science, vol 9147. Springer, Cham. https://doi.org/10.1007/978-3-319-20424-6_14


DOI: https://doi.org/10.1007/978-3-319-20424-6_14
Published: 11 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20423-9
Online ISBN: 978-3-319-20424-6
eBook Packages: Computer Science, Computer Science (R0)
