Two Phase Extraction Method for Extracting Real Life Tweets Using LDA

  • Shuhei Yamamoto
  • Tetsuji Satoh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7808)


Nowadays, many Twitter users tweet about their personal affairs. Some of these posts can be quite beneficial for real life, covering aspects such as Eating, Appearance, Living, and Disasters. In this paper, we propose a two-phase extraction method for selecting beneficial tweets. In the first phase, many topics are extracted from a sea of tweets using Latent Dirichlet Allocation (LDA). In the second phase, associations between the many topics and a smaller number of aspects are built using a small set of labeled tweets. To enhance accuracy, the weights of feature words are calculated by information gain. Our prototype system demonstrates that the proposed method can extract the aspects of each unknown tweet.
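The abstract does not give the exact formulas, so the following is only a minimal sketch of the second phase under stated assumptions: each labeled tweet is assumed to have already been assigned a single dominant LDA topic in the first phase, the topic-aspect association is assumed to be a simple co-occurrence estimate of P(aspect | topic), and information gain is computed for the presence or absence of a word. The function names and the toy tweets are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of aspect labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, word):
    """Information gain of the presence/absence of `word` w.r.t. the labels."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = 0.0
    for subset in (with_word, without_word):
        if subset:  # skip empty splits to avoid dividing by zero
            conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def topic_aspect_association(topic_ids, labels):
    """Estimate P(aspect | topic) from labeled tweets by counting (assumption)."""
    counts = defaultdict(Counter)
    for topic, aspect in zip(topic_ids, labels):
        counts[topic][aspect] += 1
    return {
        topic: {a: c / sum(cnt.values()) for a, c in cnt.items()}
        for topic, cnt in counts.items()
    }


# Hypothetical toy data: each tweet is a set of words with one aspect label.
docs = [
    {"ramen", "lunch"},
    {"lunch", "cafe"},
    {"quake", "shelter"},
    {"quake", "train"},
]
labels = ["Eating", "Eating", "Disasters", "Disasters"]
# Assumed output of phase 1: the dominant LDA topic of each labeled tweet.
topic_ids = [0, 0, 1, 1]

weights = {w: information_gain(docs, labels, w) for w in set().union(*docs)}
associations = topic_aspect_association(topic_ids, labels)
```

In this sketch, a word such as "quake" that separates the two aspects perfectly receives a higher weight than a word such as "ramen" that appears in only one tweet, and an unknown tweet could then be mapped to an aspect through the association of its inferred topic.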


Twitter, Real Life, LDA, Two Phase Extraction





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Shuhei Yamamoto (1)
  • Tetsuji Satoh (1)
  1. Graduate School of Library, Information and Media Studies, University of Tsukuba, Japan
