Abstract
Distinguishing uncertain information from factual information in online texts is essential for information extraction, because uncertain information can mislead systems into extracting useless or even false information. In this paper, we propose a method for detecting uncertain sentences with multiple instance learning (MIL). Based on the basic assumption of MIL, we derive two new constraints for estimating the weight vector by defining a probability margin, which is used in an online learning algorithm known as the Passive-Aggressive algorithm. To demonstrate the effectiveness of our method, we conduct experiments on a biomedical corpus. Compared with an intuitive baseline that uses conventional single instance learning (SIL), our method raises performance from 79.07% to 82.55%, an improvement of more than 3 percentage points.
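For readers unfamiliar with the two ingredients named in the abstract, the following is a minimal sketch of how a Passive-Aggressive update might be combined with a multiple-instance view of the data. It is not the method of the paper: the probability margin and the two derived constraints are not reproduced here. The sketch assumes the standard PA-I hinge-loss update and a common MIL heuristic in which the highest-scoring instance of a bag stands in for the whole bag. The function names, the dense-list feature representation, and the choice of what counts as a bag or an instance are all hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def pa_update(w: Sequence[float], x: Sequence[float], y: int, C: float = 1.0) -> Vector:
    """One standard PA-I step for a single instance x with label y in {-1, +1}.

    Passive: if the hinge loss is zero, w is returned unchanged.
    Aggressive: otherwise w moves just far enough to satisfy the margin,
    with the step size capped by the aggressiveness parameter C.
    """
    loss = max(0.0, 1.0 - y * dot(w, x))
    sq_norm = dot(x, x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)
    tau = min(C, loss / sq_norm)
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


def bag_update(w: Sequence[float], bag: Sequence[Sequence[float]], y: int, C: float = 1.0) -> Vector:
    """Assumed MIL heuristic (not the paper's constraints).

    The highest-scoring instance represents the bag: a positive bag needs
    at least one positive instance, and a negative bag is violated first
    by its highest-scoring instance.
    """
    witness = max(bag, key=lambda x: dot(w, x))
    return pa_update(w, witness, y, C)
```

In this sketch a single-instance learner would call `pa_update` on every labeled instance, whereas the multiple-instance variant only needs a label per bag and calls `bag_update`.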
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ji, F., Qiu, X., Huang, X. (2010). Mining Uncertain Sentences with Multiple Instance Learning. In: Cao, L., Feng, Y., Zhong, J. (eds) Advanced Data Mining and Applications. ADMA 2010. Lecture Notes in Computer Science, vol 6440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17316-5_49
DOI: https://doi.org/10.1007/978-3-642-17316-5_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17315-8
Online ISBN: 978-3-642-17316-5
eBook Packages: Computer Science (R0)