On Text Preprocessing for Opinion Mining Outside of Laboratory Environments

Petz, Gerald; Karpowicz, Michał; Fürschuß, Harald; Auinger, Andreas; Winkler, Stephan M.; Schaller, Susanne; Holzinger, Andreas

doi:10.1007/978-3-642-35236-2_62

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7669)

Included in the following conference series:

International Conference on Active Media Technology

Abstract

Opinion mining applies scientific methods to find, extract, and systematically analyze subjective information. When opinion mining is performed on Web content, challenges arise that usually do not occur in laboratory environments, where prepared and preprocessed texts are used. This paper discusses preprocessing approaches that help cope with the problems of sentiment analysis in real-world situations. After outlining the identified shortcomings and presenting a general process model for opinion mining, it discusses promising solutions for language identification, content extraction, and dealing with Internet slang.
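
The three preprocessing concerns named in the abstract (content extraction, language identification, Internet slang) can be illustrated with a minimal sketch. It is not the authors' method: the tag-stripping extractor, the stopword-overlap language guess, and the slang lookup table are simple stand-ins, and every name and word list below (`STOPWORDS`, `SLANG`, `preprocess`) is a hypothetical choice made for illustration only.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of script and style elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Crude content extraction: drop markup, keep visible text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical, deliberately tiny stopword lists (assumption, not from the paper).
STOPWORDS = {
    "en": {"the", "and", "is", "this", "of", "to", "it", "not"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
}


def identify_language(text):
    """Guess the language by counting stopword hits; None if nothing matches."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    scores = {lang: sum(t in words for t in tokens)
              for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical slang dictionary (assumption, not from the paper).
SLANG = {"gr8": "great", "u": "you", "thx": "thanks"}


def normalize_slang(text):
    """Replace known slang tokens, leaving all other words untouched."""
    return re.sub(r"\w+", lambda m: SLANG.get(m.group(0).lower(), m.group(0)), text)


def preprocess(html):
    """Run extraction, language guess, and slang normalization in sequence."""
    text = extract_text(html)
    return identify_language(text), normalize_slang(text)
```

Calling `preprocess` on a small HTML snippet returns a pair of the guessed language code and the cleaned text. A real system would need far larger dictionaries and a more robust extractor; the sketch only shows how the three steps chain together.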

Author information

Authors and Affiliations

University of Applied Sciences Upper Austria, Campus Steyr, Austria
Gerald Petz, Michał Karpowicz, Harald Fürschuß & Andreas Auinger
University of Applied Sciences Upper Austria, Campus Hagenberg, Austria
Stephan M. Winkler & Susanne Schaller
Medical Informatics, Statistics and Documentation, Medical University Graz, Austria
Andreas Holzinger

Editor information

Editors and Affiliations

Faculty of Computer and Information Sciences, Hosei University, 3-7-2, Kajino-cho, 184-8584, Koganei-shi, Tokyo, Japan
Runhe Huang
Faculty of Computer Science, University of New Brunswick, Box 440, E3B 5A3, Fredericton, NB, Canada
Ali A. Ghorbani
Department of Informatics, Systems and Communication, University of Milano Bicocca, Viale Sarca 336, 20126, Milano, Italy
Gabriella Pasi
Department of Administration Engineering, Keio University, 3-14-1 Hiyoshi, 223-8522, Kohoku-ku, Yokohama, Japan
Takahira Yamaguchi
University of Aizu, 102-C, Research Quadrangles, Tsuruga, Ikki-machi, 965-8580, Aizu-Wakamatsu City, Fukushima, Japan
Neil Y. Yen
Institute of Software, Chinese Academy of Sciences, no.4 Nan Sie Jie, Zhong Guan Cun, Hai Dian Qu, 100190, Beijing, China
Beihong Jin

Cite this paper

Petz, G. et al. (2012). On Text Preprocessing for Opinion Mining Outside of Laboratory Environments. In: Huang, R., Ghorbani, A.A., Pasi, G., Yamaguchi, T., Yen, N.Y., Jin, B. (eds) Active Media Technology. AMT 2012. Lecture Notes in Computer Science, vol 7669. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35236-2_62

DOI: https://doi.org/10.1007/978-3-642-35236-2_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35235-5
Online ISBN: 978-3-642-35236-2
eBook Packages: Computer Science, Computer Science (R0)
