Abstract
Online social media generate massive numbers of messages related to social events, which makes filtering and screening them a great challenge. To obtain high-quality messages, we propose a high-quality information extraction framework based on kernel principal component analysis and wavelet transformation (KPCA-WT). First, based on multiple-feature fusion, we design an algorithm for extracting high-quality microblogs that transforms the features into the wavelet domain to capture the detailed differences between the feature signals. The weights of the features are then estimated by the EM algorithm, and the weighted features are fused to obtain a comprehensive value for each message. In addition, to reduce the effect of noisy features and speed up computation, the features are processed by kernel principal component analysis before being transformed into the wavelet domain. Experimental results show that the proposed framework extracts information of higher quality with less redundancy and greatly reduces time consumption.
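The first two stages described above (kernel PCA, then a wavelet transform of the reduced features) can be illustrated with a minimal sketch. It is not the authors' implementation. The RBF kernel, the value of `gamma`, the single-level Haar wavelet, the choice to treat each reduced feature across messages as one signal, and the random input data are all assumptions made for illustration. The EM-based weighting and the final fusion step are not shown.

```python
import numpy as np


def rbf_kernel_pca(X, n_components, gamma):
    """Project the rows of X onto the leading kernel principal components.

    Assumption: an RBF kernel; the abstract does not name the kernel used.
    """
    # Pairwise squared Euclidean distances between samples.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))

    # Center the kernel matrix in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Projection of each training sample: eigenvector scaled by sqrt(eigenvalue).
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


def haar_dwt(signal):
    """Single-level Haar transform: returns (approximation, detail) coefficients."""
    signal = np.asarray(signal, dtype=float)
    if signal.size % 2:
        # Pad odd-length signals by repeating the last sample.
        signal = np.append(signal, signal[-1])
    even, odd = signal[0::2], signal[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail


# Hypothetical data: 8 messages described by 5 raw features.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 5))

# Stage 1: reduce the noisy features with kernel PCA.
reduced = rbf_kernel_pca(features, n_components=2, gamma=0.1)

# Stage 2: treat each reduced feature as a signal over the messages
# and move it into the wavelet domain.
coeffs = [haar_dwt(reduced[:, j]) for j in range(reduced.shape[1])]
```

The detail coefficients are the part of each signal that varies between neighboring messages, which is one way to read "detailed differences between the feature signals". Reducing the features first means fewer signals need to be transformed, consistent with the speed-up the abstract reports.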
This research is supported by the Natural Science Foundation of China under contract No. 61472291, and Natural Science Foundation of Hubei Province, China under contract No. ZRY2014000901.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Peng, M. et al. (2016). KPCA-WT: An Efficient Framework for High Quality Microblog Extraction in Time-Frequency Domain. In: Cui, B., Zhang, N., Xu, J., Lian, X., Liu, D. (eds) Web-Age Information Management. WAIM 2016. Lecture Notes in Computer Science, vol 9659. Springer, Cham. https://doi.org/10.1007/978-3-319-39958-4_24
DOI: https://doi.org/10.1007/978-3-319-39958-4_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39957-7
Online ISBN: 978-3-319-39958-4
eBook Packages: Computer Science (R0)