Transparent Forecasting Strategies in Database Management Systems

Business Intelligence (eBISS 2013)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 172)

Abstract

Traditional data warehouse systems assume that data is complete or has been carefully preprocessed, yet a growing share of data is imprecise, incomplete, and inconsistent. This is especially true in the context of big data, where massive amounts of data arrive continuously and in real time from a vast number of data sources. At the same time, modern data analysis involves sophisticated statistical algorithms that go well beyond traditional BI and is increasingly performed by non-expert users. Both trends require transparent data mining techniques that efficiently handle missing data and present a complete view of the database to the user. Time series forecasting estimates future, not yet available, values of a time series and represents one way of dealing with missing data. Moreover, it enables queries that retrieve a view of the database at any point in time: past, present, and future. This article presents an overview of forecasting techniques in database management systems. After discussing possible application areas for time series forecasting, we give a short mathematical background of the main forecasting concepts. We then outline general strategies for integrating time series forecasting inside a database and discuss selected individual techniques from the database community. We conclude by introducing a novel forecasting-enabled database management architecture that natively and transparently integrates forecast models.

Author information

Corresponding author

Correspondence to Ulrike Fischer.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Fischer, U., Lehner, W. (2014). Transparent Forecasting Strategies in Database Management Systems. In: Zimányi, E. (eds) Business Intelligence. eBISS 2013. Lecture Notes in Business Information Processing, vol 172. Springer, Cham. https://doi.org/10.1007/978-3-319-05461-2_5

  • DOI: https://doi.org/10.1007/978-3-319-05461-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05460-5

  • Online ISBN: 978-3-319-05461-2

  • eBook Packages: Computer Science (R0)
