Abstract
This chapter elaborates on the data provisioning process, ranging from data collection and extraction to a thorough description of concepts and methods for transforming transactional data into analytical data formats. By the term transactional data, we also encompass data with a specific temporal structure, which will later be used in process analysis. Additional focus is put on big data and data quality.
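The abstract does not prescribe a concrete method, so the following is only a minimal Python sketch, under an assumed record layout, of the two kinds of transformation it mentions. Hypothetical transactional records (case identifier, activity, timestamp, amount) are turned into (a) an aggregated analytical view, here a day-by-activity summary, and (b) time-ordered traces per case, the kind of temporally structured view used in process analysis. The names TRANSACTIONS, to_fact_table and to_traces are illustrative assumptions, not taken from the chapter.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transactional records, one row per event, as an operational
# system might store them: (case identifier, activity, timestamp, amount).
TRANSACTIONS = [
    ("order-1", "create", "2014-05-19T09:00", 120.0),
    ("order-2", "create", "2014-05-19T09:30", 80.0),
    ("order-1", "ship", "2014-05-20T14:00", 0.0),
    ("order-2", "cancel", "2014-05-19T10:15", -80.0),
]


def to_fact_table(rows):
    """Aggregate events into a (day, activity) summary: an analytical view."""
    facts = defaultdict(lambda: {"count": 0, "amount": 0.0})
    for _case, activity, ts, amount in rows:
        day = datetime.fromisoformat(ts).date().isoformat()
        cell = facts[(day, activity)]
        cell["count"] += 1          # number of events per day and activity
        cell["amount"] += amount    # summed measure per day and activity
    return dict(facts)


def to_traces(rows):
    """Group events per case, ordered by timestamp: an event-log view."""
    traces = defaultdict(list)
    ordered = sorted(rows, key=lambda r: (r[0], datetime.fromisoformat(r[2])))
    for case, activity, _ts, _amount in ordered:
        traces[case].append(activity)
    return dict(traces)


if __name__ == "__main__":
    print(to_fact_table(TRANSACTIONS))
    print(to_traces(TRANSACTIONS))
```

The same records thus feed two different analytical formats. The summary discards the case identity and keeps aggregated measures, whereas the traces keep the case identity and the temporal order of activities, which is the information process analysis depends on.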
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Grossmann, W., Rinderle-Ma, S. (2015). Data Provisioning. In: Fundamentals of Business Intelligence. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46531-8_3
DOI: https://doi.org/10.1007/978-3-662-46531-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46530-1
Online ISBN: 978-3-662-46531-8
eBook Packages: Computer Science, Computer Science (R0)