Part of the book series: Data-Centric Systems and Applications (DCSA)

Abstract

This chapter elaborates on the data provisioning process, ranging from data collection and extraction to a solid description of concepts and methods for transforming transactional data into analytical data formats. The term transactional data here also encompasses data with a specific temporal structure, which is used later in process analysis. Additional attention is given to big data and data quality.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Grossmann, W., Rinderle-Ma, S. (2015). Data Provisioning. In: Fundamentals of Business Intelligence. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46531-8_3

  • DOI: https://doi.org/10.1007/978-3-662-46531-8_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-46530-1

  • Online ISBN: 978-3-662-46531-8

  • eBook Packages: Computer Science, Computer Science (R0)
