Abstract
We present two statistical techniques for astronomical problems: a star-galaxy separator for the UKIRT Infrared Deep Sky Survey (UKIDSS) and a novel anomaly detection method for cross-matched astronomical datasets. The star-galaxy separator is a statistical classification method that outputs class membership probabilities rather than class labels and allows the use of prior knowledge about the source populations. Deep Sloan Digital Sky Survey (SDSS) data from the multiply imaged Stripe 82 region are used to check the results from our classifier, which compares favourably with the UKIDSS pipeline classification algorithm. The anomaly detection method addresses the problem posed by objects having different sets of recorded variables in cross-matched datasets. This mismatch prevents the use of methods that cannot handle missing values and makes direct comparison between objects difficult. For each source, our method computes anomaly scores in subspaces of the observed feature space and combines them into an overall anomaly score. The proposed technique is not specific to astronomy and can be used in other applications. The properties and performance of our method are investigated using both real and simulated datasets.
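The idea of scoring each source only in the subspaces where it has recorded values can be illustrated with a minimal sketch. This is not the authors' algorithm: the choice of k-nearest-neighbour distances as the per-subspace score, the rank normalisation, the averaging used to combine scores, and all names (`subspace_anomaly_scores`, `subspace_dim`, `k`) are assumptions made purely for illustration.

```python
import itertools
import numpy as np

def subspace_anomaly_scores(X, subspace_dim=2, k=5):
    """Illustrative subspace anomaly score (hypothetical, not the chapter's method).

    X: (n_objects, n_features) array; np.nan marks an unrecorded variable.
    Returns one score per object in (0, 1], or nan if no subspace was usable.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    score_sum = np.zeros(n)
    n_used = np.zeros(n)
    for cols in itertools.combinations(range(d), subspace_dim):
        cols = list(cols)
        # Only objects with every variable of this subspace recorded take part.
        idx = np.flatnonzero(~np.isnan(X[:, cols]).any(axis=1))
        if idx.size <= k:
            continue  # too few objects to define k neighbours
        sub = X[idx][:, cols]
        # Standardise each variable so no single one dominates the distance.
        std = sub.std(axis=0)
        std[std == 0] = 1.0
        sub = (sub - sub.mean(axis=0)) / std
        # Pairwise Euclidean distances within the subspace.
        diff = sub[:, None, :] - sub[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1))
        # Distance to the k-th nearest neighbour (column 0 is the object itself).
        knn = np.sort(dist, axis=1)[:, k]
        # Rank-normalise so scores from different subspaces are comparable.
        ranks = knn.argsort().argsort()
        score_sum[idx] += (ranks + 1) / idx.size
        n_used[idx] += 1
    # Combine by averaging over the subspaces in which each object was scored.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(n_used > 0, score_sum / n_used, np.nan)

# Usage: simulated catalogue with missing values and one planted outlier.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[rng.random(X.shape) < 0.3] = np.nan   # variables missing at random
X[0] = [8.0, 8.0, np.nan, np.nan]       # outlier recorded in two variables only
scores = subspace_anomaly_scores(X)
print(int(np.nanargmax(scores)))
```

Because each object is compared only with objects that share the same recorded variables, no imputation of missing values is needed in this sketch; other per-subspace scores or combination rules could be substituted.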
Acknowledgments
The results presented here would not have been possible without the efforts of the many people involved in the SDSS and UKIDSS projects.
Marc Henrion was supported by an EPSRC research studentship, and David Hand was partially supported by a Royal Society Wolfson Research Merit Award.
Copyright information
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Henrion, M., Mortlock, D.J., Hand, D.J., Gandy, A. (2013). Classification and Anomaly Detection for Astronomical Survey Data. In: Hilbe, J. (ed.) Astrostatistical Challenges for the New Astronomy. Springer Series in Astrostatistics, vol 1. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-3508-2_8
DOI: https://doi.org/10.1007/978-1-4614-3508-2_8
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-3507-5
Online ISBN: 978-1-4614-3508-2
eBook Packages: Mathematics and Statistics (R0)