Abstract
In this paper, we propose and investigate a dynamic heuristic mapping algorithm with lookup table optimizations. Distributed and parallel computing are becoming prevalent because of the performance requirements of modern applications. Application mapping in a multiprocessor system is therefore critical, given the dynamic and unpredictable nature of the applications. We analyse the communication delay among the tasks of an application. A fundamental algorithm is analysed that optimizes the average delay of the mapping region. We discuss and evaluate the effectiveness of the algorithm in terms of average intra-application latency. Results from synthetic applications show that the average latency of the mapping regions produced by the fundamental algorithm is reduced by up to 23 % compared with incremental mapping. Because the larger search space of the algorithm introduces a time overhead, we introduce a mechanism based on lookup tables to speed up the search for optimized mapping regions. The lookup table is examined in terms of both size and construction time. Experiments show that the lookup table is small enough to fit into the cache, and that it can be constructed in milliseconds in most practical cases. Results from real applications show that the average execution time of applications under the proposed algorithm is reduced by 15.2 % compared with the first fit algorithm.
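The idea of precomputing mapping regions can be illustrated with a minimal sketch. It is not the LUTMap algorithm itself, whose details are not given in the abstract. The sketch assumes a 2D mesh of cores, uses Manhattan (hop) distance as a stand-in for communication delay, and considers only regions filled row by row at a fixed width. All function names (`average_pairwise_distance`, `best_region`, `build_lookup_table`, `map_application`) are hypothetical.

```python
from itertools import combinations
from math import ceil


def average_pairwise_distance(cores):
    """Mean Manhattan (hop) distance over all pairs of (x, y) cores."""
    pairs = list(combinations(cores, 2))
    if not pairs:
        return 0.0
    total = sum(abs(ax - bx) + abs(ay - by) for (ax, ay), (bx, by) in pairs)
    return total / len(pairs)


def best_region(n_tasks, mesh_w, mesh_h):
    """Try every row-filled region width that fits the mesh and keep
    the one with the lowest average pairwise distance.

    Assumption: candidate regions are restricted to row-by-row fills;
    a real search could consider other shapes.
    """
    best_cores, best_avg = None, float("inf")
    for width in range(1, mesh_w + 1):
        if ceil(n_tasks / width) > mesh_h:
            continue  # region would be taller than the mesh
        cores = [(i % width, i // width) for i in range(n_tasks)]
        avg = average_pairwise_distance(cores)
        if avg < best_avg:
            best_cores, best_avg = cores, avg
    return best_cores


def build_lookup_table(max_tasks, mesh_w, mesh_h):
    """Precompute one region (as relative offsets) per task count.

    Assumes max_tasks <= mesh_w * mesh_h.
    """
    return {n: best_region(n, mesh_w, mesh_h) for n in range(1, max_tasks + 1)}


def map_application(table, n_tasks, origin):
    """Place an application by translating the stored offsets to origin."""
    ox, oy = origin
    return [(ox + dx, oy + dy) for dx, dy in table[n_tasks]]
```

With `table = build_lookup_table(16, 4, 4)`, calling `map_application(table, 4, (2, 2))` returns a 2 x 2 block anchored at core (2, 2). The search over region shapes happens once at table construction, so each mapping request reduces to a dictionary lookup and a translation, which is the kind of saving the lookup table mechanism is meant to provide.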
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Xu, T.C., Leppänen, V. (2016). LUTMap: A Dynamic Heuristic Application Mapping Algorithm Based on Lookup Tables. In: Li, W., et al. Internet and Distributed Computing Systems. IDCS 2016. Lecture Notes in Computer Science, vol 9864. Springer, Cham. https://doi.org/10.1007/978-3-319-45940-0_12
DOI: https://doi.org/10.1007/978-3-319-45940-0_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45939-4
Online ISBN: 978-3-319-45940-0
eBook Packages: Computer Science, Computer Science (R0)