Abstract
The performance of data processing in distributed information systems depends strongly on the efficient scheduling of the applications that access data at remote sites. This work assumes a typical model of a distributed information system in which a central site is connected to a number of highly autonomous remote sites. An application started by a user at the central site is decomposed into several data processing tasks that are processed independently at the remote sites. The objective of this work is to find a method for optimizing task processing schedules at the central site. We define an abstract model of data and a system of operations that implements the data processing tasks. Our abstract data model is general enough to represent many specific data models. We show how an entirely parallel schedule can be transformed into a more efficient hybrid schedule in which certain tasks are processed simultaneously while the other tasks are processed sequentially. The transformations proposed in this work are guided by a cost-based optimization model whose objective is to reduce the total data transmission time between the remote sites and the central site. We show how the properties of data integration expressions can be used to find more efficient schedules of data processing tasks in distributed information systems.
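The trade-off between a fully parallel and a hybrid schedule can be illustrated with a toy cost comparison. The sketch below is not the paper's model: the `Task` type, the `selectivity` factor, the single shared link with a fixed bandwidth, and the assumption that chaining two tasks forwards the first result to the second site are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Task:
    name: str
    result_mb: float  # assumed estimate of the result size sent to the central site


def parallel_time(tasks, bandwidth_mb_s):
    """Every task runs simultaneously and ships its full result."""
    return sum(t.result_mb for t in tasks) / bandwidth_mb_s


def hybrid_time(tasks, first, second, selectivity, bandwidth_mb_s):
    """Run `second` after `first`; all remaining tasks stay parallel.

    Assumption: the result of `first` travels to the central site and is then
    forwarded to the site of `second`, where it shrinks the result of
    `second` to `selectivity` times its original size.
    """
    others = sum(t.result_mb for t in tasks if t not in (first, second))
    chained = 2 * first.result_mb + selectivity * second.result_mb
    return (others + chained) / bandwidth_mb_s


def best_schedule(tasks, selectivities, bandwidth_mb_s):
    """Pick the cheapest of the parallel schedule and all one-pair hybrids."""
    best = ("parallel", None, parallel_time(tasks, bandwidth_mb_s))
    for first, second in permutations(tasks, 2):
        sel = selectivities.get((first.name, second.name))
        if sel is None:  # no known reduction for this ordering
            continue
        cost = hybrid_time(tasks, first, second, sel, bandwidth_mb_s)
        if cost < best[2]:
            best = ("hybrid", (first.name, second.name), cost)
    return best


tasks = [Task("A", 10.0), Task("B", 100.0), Task("C", 20.0)]
selectivities = {("A", "B"): 0.25}  # hypothetical estimate
print(parallel_time(tasks, 10.0))
print(best_schedule(tasks, selectivities, 10.0))
```

Under these assumptions, chaining pays off only when the extra transfer of the first result is smaller than the reduction it achieves in the second result, which is the kind of condition a cost-based model would have to check before transforming a schedule.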
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Getta, J.R. (2011). Optimization of Task Processing Schedules in Distributed Information Systems. In: Abd Manaf, A., Sahibuddin, S., Ahmad, R., Mohd Daud, S., El-Qawasmeh, E. (eds) Informatics Engineering and Information Science. ICIEIS 2011. Communications in Computer and Information Science, vol 253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25462-8_29
DOI: https://doi.org/10.1007/978-3-642-25462-8_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25461-1
Online ISBN: 978-3-642-25462-8
eBook Packages: Computer Science (R0)