An Efficient Retrieval Method for Astronomical Catalog Time Series Data

  • Bingyao Li
  • Ce Yu
  • Xiaoteng Hu
  • Jian Xiao
  • Shanjiang Tang
  • Lianmeng Li
  • Bin Ma
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11334)


Astronomical catalog time series data are data collected at different times; they provide a comprehensive understanding of the attributes of celestial objects and reveal various astronomical phenomena. Retrieving such data is indispensable to astronomical research. However, existing time series retrieval methods involve a great deal of manual work and are extremely time-consuming, and the difficulty increases with the exponential growth of observation data. In this paper, we propose an automatic and efficient retrieval method for astronomical catalog time series data. To identify the time series data of the same celestial object automatically, we design a cross-match scheme that assigns a unique MatchID to each record matched with the datum catalog. To accelerate the matching process, we design an in-memory index structure based on Redis, which makes matching 1.67 times faster than with MySQL on massive amounts of data. Moreover, we present Catalog-Mongo, an improved database built on MongoDB, in which a Data Blocking Algorithm improves MongoDB's data partitioning and accelerates queries. The experimental results show that its queries are about 2 times faster than MongoDB's and 7.6 to 8.7 times faster than MySQL's.
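The MatchID labeling idea can be illustrated with a minimal sketch. It is not the authors' implementation: a plain Python dict stands in for the Redis-based in-memory index, the index is keyed by a simple declination zone, and the names `ZONE_HEIGHT_DEG`, `MATCH_RADIUS_DEG`, `build_index`, and `label` are hypothetical choices made for illustration only.

```python
import math
from collections import defaultdict

ZONE_HEIGHT_DEG = 0.1            # hypothetical declination bin height
MATCH_RADIUS_DEG = 1.0 / 3600.0  # hypothetical 1-arcsecond match tolerance


def zone_of(dec_deg):
    """Map a declination in degrees to an integer zone number."""
    return int(math.floor((dec_deg + 90.0) / ZONE_HEIGHT_DEG))


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def build_index(datum):
    """Index datum rows (match_id, ra, dec) by zone.

    A dict is used here as a stand-in for an in-memory store such as Redis.
    """
    index = defaultdict(list)
    for match_id, ra, dec in datum:
        index[zone_of(dec)].append((match_id, ra, dec))
    return index


def label(record, index):
    """Return the MatchID of the nearest datum object within the radius, else None."""
    ra, dec = record
    best_id, best_sep = None, MATCH_RADIUS_DEG
    z = zone_of(dec)
    # The match radius is smaller than one zone, so adjacent zones suffice.
    for zone in (z - 1, z, z + 1):
        for match_id, d_ra, d_dec in index.get(zone, ()):
            sep = angular_sep_deg(ra, dec, d_ra, d_dec)
            if sep <= best_sep:
                best_id, best_sep = match_id, sep
    return best_id


datum = [(1, 10.0, 20.0), (2, 150.0, -30.0)]
index = build_index(datum)
```

Under these assumptions, records from different observation epochs that receive the same MatchID can be grouped to form one object's time series, while records for which `label` returns `None` have no counterpart in the datum catalog.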


Keywords: Astronomical catalog, Cross-match, Distributed retrieval method, MongoDB, Time series data



This work is supported by the Joint Research Fund in Astronomy (U1531111, U1731243, U1731125) under a cooperative agreement between the National Natural Science Foundation of China (NSFC) and the Chinese Academy of Sciences (CAS), and by the National Natural Science Foundation of China (11573019, 61602336).



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. College of Intelligence and Computing, Tianjin University, Tianjin, China
  2. National Astronomical Observatories, Chinese Academy of Sciences, Beijing, China
