Abstract
Multitask learning, the concept of solving multiple related tasks in parallel, promises to improve generalization performance over the traditional divide-and-conquer approach in machine learning. The training signals of related tasks induce an inductive bias that helps the learner find better hypotheses. This paper reviews the concept of multitask learning and prior work on it, and presents an experimental evaluation on a large-scale text classification problem. A deep neural network is trained to classify English newswire stories by their overlapping topics in parallel, and the results are compared to the traditional approach of training a separate deep neural network for each topic. The results confirm the initial hypothesis that multitask learning improves generalization.
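As a rough illustration of the setup the abstract describes (not the authors' actual architecture; the vocabulary size, layer sizes, and topic count below are illustrative assumptions), a multitask model in Keras-style code shares its hidden layers across all topics and attaches one sigmoid output head per topic, so the overlapping topic labels are learned in parallel from a common representation:

```python
from tensorflow.keras import layers, models

# Hypothetical dimensions, chosen for illustration only.
VOCAB_SIZE = 10_000   # size of the bag-of-words input features
NUM_TOPICS = 4        # number of overlapping topic labels learned jointly

inputs = layers.Input(shape=(VOCAB_SIZE,))

# Shared hidden layers: the common representation that multitask
# learning exploits across the related topic-classification tasks.
shared = layers.Dense(512, activation="relu")(inputs)
shared = layers.Dropout(0.5)(shared)
shared = layers.Dense(256, activation="relu")(shared)

# One sigmoid head per topic: topics overlap, so each head makes an
# independent binary decision rather than a softmax over classes.
outputs = [
    layers.Dense(1, activation="sigmoid", name=f"topic_{i}")(shared)
    for i in range(NUM_TOPICS)
]

model = models.Model(inputs=inputs, outputs=outputs)
# One binary cross-entropy loss per head; their sum is minimized jointly.
model.compile(optimizer="adam", loss="binary_crossentropy")
```

The single-task baseline the abstract compares against would correspond to training one such network per topic (the same trunk with a single output head), so any generalization gain in the multitask variant comes from the shared hidden layers being shaped by the training signals of all topics at once.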
Acknowledgments
We thank Microsoft Research for supporting this work by providing a grant for the Microsoft Azure cloud platform.
© 2016 Springer International Publishing AG
Cite this paper
Noushahr, H.G., Ahmadi, S. (2016). Multitask Learning for Text Classification with Deep Neural Networks. In: Bramer, M., Petridis, M. (eds) Research and Development in Intelligent Systems XXXIII. SGAI 2016. Springer, Cham. https://doi.org/10.1007/978-3-319-47175-4_8
Print ISBN: 978-3-319-47174-7
Online ISBN: 978-3-319-47175-4