Exploring Data Locality for Clustered Enterprise Applications

  • Stoyan Garbatov
  • João Cachopo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8055)


Exploring data locality is crucial for achieving good performance in a distributed system. For many complex, constantly evolving applications, relying on programmers to write their code so as to explore data locality often results in sub-par performance. We propose an automatic approach to this problem. Instead of expecting programmers to identify data locality, the solution developed here relies on a stochastic analysis of the data-access patterns exhibited by the application at run-time. The analysis makes it possible to correlate not only domain data but application functionality as well. This information is used to explore data locality in clustered enterprise applications by combining two orthogonal and complementary approaches. The first approach reduces the memory footprint by using a more compact in-memory representation for the application’s domain classes and, furthermore, by delaying the loading of less frequently accessed data. The second approach generates a new request distribution policy. It employs the Latent Dirichlet Allocation partitioning algorithm to generate subsets of highly correlated application functionality. Every cluster node is responsible for processing requests belonging to a single subset. The combination of these approaches allows cluster nodes to make better use of their memory, thereby increasing the computational efficiency of the system. The work has been validated on the TPC-W benchmark, demonstrating significant performance improvements.
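
As an illustration of the second approach described above, the following is a minimal sketch of a request distribution policy in which each cluster node serves exactly one subset of correlated functionality. It is not the authors' implementation. The partition is hard-coded here rather than computed with Latent Dirichlet Allocation, and the grouping of request types, the names `PARTITION`, `build_routing_table` and `route`, and the fallback behaviour are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical partition of request types into subsets of correlated
# functionality. In the paper the subsets are produced by Latent Dirichlet
# Allocation; this grouping is invented purely for illustration.
PARTITION: List[List[str]] = [
    ["home", "search_request", "search_results", "product_detail"],
    ["shopping_cart", "buy_request", "buy_confirm"],
    ["admin_request", "admin_confirm"],
]


def build_routing_table(partition: List[List[str]]) -> Dict[str, int]:
    """Map every request type to the index of the node that owns its subset."""
    table: Dict[str, int] = {}
    for node_id, subset in enumerate(partition):
        for request_type in subset:
            table[request_type] = node_id
    return table


def route(request_type: str, table: Dict[str, int], fallback_node: int = 0) -> int:
    """Return the node for a request; unknown types go to an assumed fallback node."""
    return table.get(request_type, fallback_node)


ROUTING_TABLE = build_routing_table(PARTITION)
```

Because all requests in a subset reach the same node, that node only needs to keep the domain data touched by its own subset in memory, which is the locality effect the abstract describes.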


Keywords

heap management, in-memory object representation, persistence, clustered web servers, load balance, locality awareness, Latent Dirichlet Allocation, scalability, performance





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Stoyan Garbatov (1)
  • João Cachopo (1)
  1. INESC-ID Lisboa / Instituto Superior Técnico, Universidade Técnica de Lisboa, Lisboa, Portugal
