Big Data New Frontiers: Mining, Search and Management of Massive Repositories of Solar Image Data and Solar Events

  • Juan M. Banda
  • Michael A. Schuh
  • Rafal A. Angryk
  • Karthik Ganesan Pillai
  • Patrick McInerney
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 241)


This work presents one of the many emerging research domains in which big data analysis has become an immediate need for processing the massive amounts of data generated each day: solar physics. While building a content-based image retrieval system for NASA’s Solar Dynamics Observatory mission, we discovered research problems that can be addressed with big data processing techniques and that, in some cases, require the development of novel techniques. With over one terabyte of solar data generated each day, and more missions on the horizon that are expected to generate petabytes of data each year, solar physics presents many exciting opportunities. This paper presents the current status of our work with solar image data and events, our shift towards big data methodologies, and future directions for big data processing in solar physics.


Image Retrieval, Solar Physics, Solar Dynamics Observatory, Solar Image, CBIR System
These keywords were added by machine and not by the authors.





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Juan M. Banda (1)
  • Michael A. Schuh (1)
  • Rafal A. Angryk (1)
  • Karthik Ganesan Pillai (1)
  • Patrick McInerney (1)

  1. Montana State University, Bozeman, USA
