Classification and Anomaly Detection for Astronomical Survey Data

Part of the book series: Springer Series in Astrostatistics (SSIA, volume 1)

Abstract

We present two statistical techniques for astronomical problems: a star-galaxy separator for the UKIRT Infrared Deep Sky Survey (UKIDSS) and a novel anomaly detection method for cross-matched astronomical datasets. The star-galaxy separator is a statistical classification method that outputs class membership probabilities rather than class labels and allows the use of prior knowledge about the source populations. Deep Sloan Digital Sky Survey (SDSS) data from the multiply imaged Stripe 82 region are used to check the results from our classifier, which compares favourably with the UKIDSS pipeline classification algorithm. The anomaly detection method addresses the problem that objects in cross-matched datasets have different sets of recorded variables. This rules out methods that cannot handle missing values and makes direct comparison between objects difficult. For each source, our method computes anomaly scores in subspaces of the observed feature space and combines them into an overall anomaly score. The proposed technique is very general and can easily be used in applications other than astronomy. The properties and performance of our method are investigated using both real and simulated datasets.
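The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the abstract does not specify the likelihood models, the base anomaly score, the choice of subspaces, or the combination rule. The sketch assumes a two-class Bayes rule for the class membership probability, a k-th-nearest-neighbour distance as the per-subspace score, all pairs of observed variables as subspaces, and a plain average as the combination. All function and parameter names are hypothetical, and missing values are encoded as `None`.

```python
import math
from itertools import combinations


def posterior_star_probability(like_star, like_galaxy, prior_star=0.5):
    """Two-class Bayes rule: P(star | data) from class likelihoods and a prior.

    The likelihood values are assumed to come from some model of the data;
    that model is outside the scope of this sketch.
    """
    numerator = prior_star * like_star
    return numerator / (numerator + (1.0 - prior_star) * like_galaxy)


def knn_distance(point, others, k):
    """Euclidean distance from `point` to its k-th nearest neighbour in `others`."""
    distances = sorted(math.dist(point, other) for other in others)
    return distances[min(k, len(distances)) - 1]


def subspace_anomaly_score(index, data, subspace_dim=2, k=2):
    """Average a k-NN distance score over subspaces observed for data[index].

    Assumptions of this sketch (not taken from the chapter): subspaces are all
    combinations of `subspace_dim` observed variables, and subspace scores are
    combined by a simple mean. Variables are assumed to be on comparable scales.
    """
    target = data[index]
    observed = [j for j, value in enumerate(target) if value is not None]
    scores = []
    for dims in combinations(observed, subspace_dim):
        # Compare only against objects that record every variable in this subspace.
        peers = [
            [row[j] for j in dims]
            for i, row in enumerate(data)
            if i != index and all(row[j] is not None for j in dims)
        ]
        if len(peers) >= k:
            scores.append(knn_distance([target[j] for j in dims], peers, k))
    # No usable subspace means no score can be computed for this object.
    return sum(scores) / len(scores) if scores else float("nan")


# Toy catalogue: three variables, the third missing for most objects.
catalogue = [
    [0.0, 0.0, 0.0],
    [0.1, 0.0, None],
    [0.0, 0.1, 0.1],
    [0.1, 0.1, None],
    [5.0, 5.0, None],
]
anomaly_scores = [subspace_anomaly_score(i, catalogue) for i in range(len(catalogue))]
```

In this toy catalogue the last object lies far from the others in the only subspace it shares with them, so it receives the largest score. Because each object is scored only in subspaces where it and enough peers have recorded values, no imputation of missing values is needed. A lower `prior_star` pulls the class probability towards the galaxy class for the same likelihoods, which is one way prior knowledge about the source populations can enter.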

Acknowledgments

The results presented here would not have been possible without the efforts of the many people involved in the SDSS and UKIDSS projects.

Marc Henrion was supported by an EPSRC research studentship, and David Hand was partially supported by a Royal Society Wolfson Research Merit Award.

Author information

Corresponding author

Correspondence to Marc Henrion.

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Henrion, M., Mortlock, D.J., Hand, D.J., Gandy, A. (2013). Classification and Anomaly Detection for Astronomical Survey Data. In: Hilbe, J. (eds) Astrostatistical Challenges for the New Astronomy. Springer Series in Astrostatistics, vol 1. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3508-2_8
