GpDL: A Spatially Aggregated Data Layout for Long-Term Astronomical Observation Archive
Many important astronomical research results are built on historical observation data, so long-term astronomical observation archives are of great significance to astronomical research. At the observation site, data from different sky areas captured during a consecutive time period are stored on one disk, so the original data layout is temporally aggregated and spatially scattered. After an observation cycle, the data are backed up into the long-term astronomical observation archive, from which astronomers request data. However, the original data layout does not match the spatial locality of these requests: a request typically focuses on a specific sky area over a time period. As a result, an archive that keeps the original data layout must open many disks to serve a request, which consumes a lot of energy and shortens disk life. A reorganized, spatially aggregated data layout is therefore indispensable for the archive. The challenge is how to aggregate observation data from nearby sky areas onto one disk while keeping disk capacity utilization high. In this paper, we propose GpDL, a spatially aggregated data layout for long-term astronomical observation archives based on HEALPix and graph partitioning. GpDL is generated from the original data layout, whose distribution is known, before the observation data are backed up into the archive. GpDL saves considerable resources for the archive while keeping disk capacity utilization as high as 91%. In simulation experiments, compared with TaDL (the original, temporally aggregated data layout) and AmrDL (another spatially aggregated data layout based on the idea of Adaptive Mesh Refinement), GpDL effectively reduces the number of open disks and the energy cost for the same requests.
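To make the contrast between a temporally aggregated and a spatially aggregated layout concrete, the toy model below counts how many disks a single spatial request has to open under each. It is a minimal sketch under loudly stated assumptions, not GpDL itself. A square grid of sky cells stands in for HEALPix pixels, and greedy breadth-first region growing over the cell adjacency graph stands in for a real graph partitioner. Every cell is assumed to be observed once per night, with one equal-sized file per observation. All function and variable names (`temporal_layout`, `spatial_layout`, `disks_opened`, `utilization`) are hypothetical.

```python
from collections import deque

# Toy model (assumption): the sky is a W x H grid of cells standing in for
# HEALPix pixels; each cell yields one equal-sized file per observation night.


def temporal_layout(cells, nights, files_per_disk):
    """TaDL-like toy: fill disks in observation order, night after night."""
    layout, idx = {}, 0
    for night in range(nights):
        for cell in cells:
            layout[(cell, night)] = idx // files_per_disk
            idx += 1
    return layout


def grid_neighbors(cell, width, height):
    """4-connected neighbours of a grid cell (stand-in for HEALPix neighbours)."""
    x, y = cell
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def spatial_layout(width, height, nights, files_per_disk):
    """Spatially aggregated toy: greedy BFS region growing over the cell
    adjacency graph (NOT a real graph partitioner). All nights of a cell go
    to the same disk; a disk is closed once it holds cells_per_disk cells."""
    cells_per_disk = max(1, files_per_disk // nights)
    unassigned = {(x, y) for x in range(width) for y in range(height)}
    cell_disk, disk, used = {}, 0, 0
    for seed in sorted(unassigned, key=lambda c: (c[1], c[0])):
        if seed not in unassigned:
            continue
        queue = deque([seed])
        unassigned.discard(seed)
        while queue:
            cell = queue.popleft()
            cell_disk[cell] = disk
            used += 1
            if used == cells_per_disk:  # disk full: open the next one
                disk, used = disk + 1, 0
            for nb in grid_neighbors(cell, width, height):
                if nb in unassigned:
                    unassigned.discard(nb)
                    queue.append(nb)
    return {(cell, n): d for cell, d in cell_disk.items() for n in range(nights)}


def disks_opened(layout, region, night_range):
    """Distinct disks touched by a request for `region` over `night_range`."""
    return {d for (cell, n), d in layout.items() if cell in region and n in night_range}


def utilization(layout, files_per_disk):
    """Fraction of the capacity of all used disks that is actually filled."""
    return len(layout) / (len(set(layout.values())) * files_per_disk)


W, H, NIGHTS, PER_DISK = 8, 8, 30, 240
cells = [(x, y) for y in range(H) for x in range(W)]
temporal = temporal_layout(cells, NIGHTS, PER_DISK)
spatial = spatial_layout(W, H, NIGHTS, PER_DISK)
region = {(0, 0), (1, 0), (0, 1), (1, 1)}  # a small patch of sky
request_nights = range(NIGHTS)             # the whole observation cycle

print("temporal:", len(disks_opened(temporal, region, request_nights)), "disks")
print("spatial: ", len(disks_opened(spatial, region, request_nights)), "disks")
```

In this toy, both layouts fill every disk completely, but the request for a 2 x 2 patch of sky opens all 8 disks under the temporal layout and only one under the spatial one. The hard part the paper addresses, keeping disks full when real HEALPix pixels carry uneven amounts of data, is exactly what this simplification hides.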
Keywords: Spatially aggregated · Data layout · Astronomical observation · Long-term archive · Energy cost
This work is supported by the Joint Research Fund in Astronomy (U1531111, U1731423, U1731125) under a cooperative agreement between the National Natural Science Foundation of China (NSFC) and the Chinese Academy of Sciences (CAS), and by the National Natural Science Foundation of China (11573019, 61602336).