Abstract
In this chapter, we briefly touch on topics that may grow in importance for text mining but are not yet central to prediction: summarization, active learning, learning with unlabeled data, learning with multiple samples or models, online learning, cost-sensitive learning, unbalanced samples and rare events, distributed text mining, rank learning, and question answering.
Copyright information
© 2010 Springer-Verlag London Limited
Cite this chapter
Weiss, S.M., Indurkhya, N., Zhang, T. (2010). Emerging Directions. In: Fundamentals of Predictive Text Mining. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-84996-226-1_9
Publisher Name: Springer, London
Print ISBN: 978-1-84996-225-4
Online ISBN: 978-1-84996-226-1