Data Sources for Prediction: Databases, Hybrid Data and the Web

Weiss, Sholom M.; Indurkhya, Nitin; Zhang, Tong

doi:10.1007/978-1-84996-226-1_7

Part of the book series: Texts in Computer Science (TCS)

Abstract

Data for automated prediction comes from many sources. In previous chapters, discussions centered on pure text mining. Here, we expand our horizons to encompass both text and structured numerical data. Initially, we review the ideal data representations for prediction using either numerical or text data. We consider numerous sources of data including databases, the web, and hybrid forms of text and numerical data. Prototypical examples of blended numerical and text data are given. Using the web as a source of data for prediction is examined. Among the examples presented of web-sourced data are downloaded scientific publications formatted in XML, stock price data, and related newswire headlines. Sentiment and opinion analysis are considered with examples from online product reviews. Predictive mining of electronic medical records is presented as an example of mixed-data mining.

Author information

Authors and Affiliations

T.J. Watson Research Center, IBM Corporation, Kitchawan Road 1101, Yorktown Heights, 10598, NY, USA
Sholom M. Weiss
School of Computer Science & Engg., University of New South Wales, Sydney, 2052, NSW, Australia
Nitin Indurkhya
Dept. Statistics, Hill Center, Rutgers University, Piscataway, 08854-8019, NJ, USA
Tong Zhang

Corresponding author

Correspondence to Sholom M. Weiss.

About this chapter

Cite this chapter

Weiss, S.M., Indurkhya, N., Zhang, T. (2010). Data Sources for Prediction: Databases, Hybrid Data and the Web. In: Fundamentals of Predictive Text Mining. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-84996-226-1_7

DOI: https://doi.org/10.1007/978-1-84996-226-1_7
Publisher Name: Springer, London
Print ISBN: 978-1-84996-225-4
Online ISBN: 978-1-84996-226-1
eBook Packages: Computer Science, Computer Science (R0)