Exploratory Learning

Dalvi, Bhavana; Cohen, William W.; Callan, Jamie

doi:10.1007/978-3-642-40994-3_9

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8190)

Abstract

In multiclass semi-supervised learning (SSL), the number of classes present in the data is sometimes unknown, so some classes have no labeled examples. In this paper we present variants of well-known semi-supervised multiclass learning methods that remain robust when the data contains an unknown number of classes. In particular, we present an “exploratory” extension of expectation-maximization (EM) that explores different numbers of classes while learning. “Exploratory” SSL greatly improves performance on three datasets in terms of F1 on the classes with seed examples, i.e., the classes that are expected to be in the data. Our Exploratory EM algorithm also outperforms an SSL method based on non-parametric Bayesian clustering.
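
The abstract does not say how new classes are introduced during learning, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a hard-assignment (k-means-style) EM in which a point opens a new class whenever its posterior over the current classes is nearly uniform. The function name `exploratory_hard_em`, the `max_ratio` threshold, the softmax-over-distance posterior, and the toy data are all hypothetical choices made for illustration.

```python
import math


def posterior(point, centroids):
    """Class posterior as a softmax over negative Euclidean distances.

    This scoring rule is an assumption of the sketch, not taken from the paper.
    """
    scores = {label: -math.dist(point, mu) for label, mu in centroids.items()}
    top = max(scores.values())
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(xs) / len(points) for xs in zip(*points))


def exploratory_hard_em(points, seeds, max_ratio=2.0, max_iter=20):
    """Hard EM that may add classes while learning (illustrative only).

    points:    unlabeled points, each a tuple of floats
    seeds:     dict mapping a known class label to its labeled seed points
    max_ratio: hypothetical knob; a point starts a new class when its largest
               posterior is less than max_ratio times its smallest one
    """
    centroids = {label: mean(examples) for label, examples in seeds.items()}
    assignment = {}
    n_new = 0
    for _ in range(max_iter):
        # E-step: assign each point, creating a class if no class stands out.
        new_assignment = {}
        for i, x in enumerate(points):
            post = posterior(x, centroids)
            best = max(post, key=post.get)
            near_uniform = max(post.values()) < max_ratio * min(post.values())
            if len(post) > 1 and near_uniform:
                best = f"new_{n_new}"
                n_new += 1
                centroids[best] = x
            new_assignment[i] = best
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # M-step: re-estimate centroids from seeds plus assigned points.
        # Seeds keep their labels; unseeded classes with no members vanish.
        members = {label: list(examples) for label, examples in seeds.items()}
        for i, label in assignment.items():
            members.setdefault(label, []).append(points[i])
        centroids = {label: mean(pts) for label, pts in members.items()}
    return assignment, centroids


# Toy data: two seeded classes plus a group of points far from both.
seeds = {"A": [(0.0, 0.0), (0.2, -0.1)], "B": [(10.0, 0.0), (9.9, 0.2)]}
points = [(0.5, 0.2), (-0.3, 0.4), (9.6, 0.1), (10.3, -0.2),
          (5.0, 8.0), (5.2, 8.1), (4.9, 7.8)]
assignment, centroids = exploratory_hard_em(points, seeds)
print(sorted(centroids))
```

On this toy input the points near the seeds keep the seeded labels, while the three points that are roughly equidistant from both seeded classes end up in a single newly created class. A real system would also need some way to keep the number of classes from growing without bound, which this sketch does not attempt.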

Author information

Authors and Affiliations

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 15213, USA
Bhavana Dalvi, William W. Cohen & Jamie Callan

Editor information

Editors and Affiliations

Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001, Leuven, Belgium
Hendrik Blockeel
Fraunhofer IAIS, Department of Knowledge Discovery, Schloss Birlinghoven, University of Bonn, 53754, Sankt Augustin, Germany
Kristian Kersting
LIACS, Universiteit Leiden, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
Siegfried Nijssen
Department of Computer Science and Engineering, Czech Technical University, Technicka 2, 16627, Prague 6, Czech Republic
Filip Železný

About this paper

Cite this paper

Dalvi, B., Cohen, W.W., Callan, J. (2013). Exploratory Learning. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol 8190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40994-3_9

DOI: https://doi.org/10.1007/978-3-642-40994-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40993-6
Online ISBN: 978-3-642-40994-3
eBook Packages: Computer Science, Computer Science (R0)
