Data Provisioning

Grossmann, Wilfried; Rinderle-Ma, Stefanie

doi:10.1007/978-3-662-46531-8_3

Part of the book series: Data-Centric Systems and Applications (DCSA)

Abstract

This chapter elaborates on the data provisioning process, ranging from data collection and extraction to a solid description of concepts and methods for transforming transactional data into analytical data formats. By the term transactional data, we also encompass data with a specific temporal structure, which will later be used in process analysis. Additional focus will be put on big data and data quality.

Author information

Authors and Affiliations

University of Vienna, Vienna, Austria
Wilfried Grossmann & Stefanie Rinderle-Ma

About this chapter

Cite this chapter

Grossmann, W., Rinderle-Ma, S. (2015). Data Provisioning. In: Fundamentals of Business Intelligence. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46531-8_3

DOI: https://doi.org/10.1007/978-3-662-46531-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46530-1
Online ISBN: 978-3-662-46531-8
eBook Packages: Computer Science, Computer Science (R0)
