Automatic Document Tagging in Social Semantic Digital Library

  • Conference paper
Neural Information Processing (ICONIP 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5864)

Abstract

The emergence of Web 2.0 has produced a large amount of annotation and personalization information about web resources. Extracting and using this information to improve the quality of services is a key goal of modern digital libraries. In this paper, we present a novel Automatic Document Tagging (ADT) approach for digital libraries. In our approach, the ADT problem is formulated as a variant of the multi-class classification problem. Unlike standard classification, however, the training data for ADT is collected from the user's historical tags and is only partially labeled. This incompleteness makes training more challenging. To overcome this problem, we propose an efficient randomized online training algorithm (RPL). The RPL algorithm has two phases: (i) random exploitation and (ii) classifier update. Experimental results on both synthetic and real-world data demonstrate the effectiveness of the approach.
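The abstract describes RPL only at the level of its two phases, so the following is not the paper's algorithm. It is a hypothetical sketch, assuming a linear multi-class tagger over sparse bag-of-words features, of how a randomized step and a perceptron-style classifier update can be combined when each training document carries only an incomplete set of observed tags. The class name `RandomizedPartialLabelTagger`, the rate `epsilon`, and the update rule are all illustrative assumptions.

```python
import random
from collections import defaultdict


class RandomizedPartialLabelTagger:
    """Hypothetical sketch (NOT the paper's RPL): an online multi-class
    perceptron over tags with a random step, trained from documents whose
    observed tag sets may be incomplete."""

    def __init__(self, tags, epsilon=0.1, seed=0):
        self.tags = list(tags)
        self.epsilon = epsilon          # assumed probability of a random guess
        self.rng = random.Random(seed)  # seeded for reproducibility
        # One sparse weight vector per tag: tag -> {feature: weight}.
        self.w = {t: defaultdict(float) for t in self.tags}

    def score(self, tag, x):
        """Dot product between the tag's weights and sparse features x."""
        wt = self.w[tag]
        return sum(wt.get(f, 0.0) * v for f, v in x.items())

    def predict(self, x):
        """Return the highest-scoring tag (ties go to the earliest tag)."""
        return max(self.tags, key=lambda t: self.score(t, x))

    def update(self, x, observed_tags):
        """Process one document online. observed_tags is assumed to be a
        non-empty subset of the document's true tags, all drawn from
        self.tags. Returns the tag that was guessed."""
        if not observed_tags:
            raise ValueError("observed_tags must be non-empty")
        # Phase (i): random step. With probability epsilon, guess a tag
        # uniformly at random instead of the current best-scoring tag.
        if self.rng.random() < self.epsilon:
            guess = self.rng.choice(self.tags)
        else:
            guess = self.predict(x)
        if guess in observed_tags:
            return guess  # consistent with the partial label: no update
        # Phase (ii): classifier update. Promote the best-scoring observed
        # tag and demote the guessed tag (a multi-class perceptron step).
        # sorted() makes tie-breaking independent of set iteration order.
        target = max(sorted(observed_tags), key=lambda t: self.score(t, x))
        for f, v in x.items():
            self.w[target][f] += v
            self.w[guess][f] -= v
        return guess


if __name__ == "__main__":
    tagger = RandomizedPartialLabelTagger(["ml", "web", "db"], epsilon=0.1, seed=42)
    stream = [
        ({"neural": 1.0, "training": 1.0}, {"ml"}),
        ({"html": 1.0, "browser": 1.0}, {"web"}),
        ({"sql": 1.0, "index": 1.0}, {"db"}),
    ]
    for _ in range(20):
        for x, tags in stream:
            tagger.update(x, tags)
    print(tagger.predict({"sql": 1.0}))  # prints "db"
```

Demoting a guessed tag that is absent from the observed set is a simplification: under partial labeling that tag may be relevant but simply unobserved, which is exactly the difficulty the abstract points to. A method designed for this setting would need to treat such negative feedback more cautiously.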

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xu, X., Niu, Z. (2009). Automatic Document Tagging in Social Semantic Digital Library. In: Leung, C.S., Lee, M., Chan, J.H. (eds) Neural Information Processing. ICONIP 2009. Lecture Notes in Computer Science, vol 5864. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10684-2_38

  • DOI: https://doi.org/10.1007/978-3-642-10684-2_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10682-8

  • Online ISBN: 978-3-642-10684-2

  • eBook Packages: Computer Science, Computer Science (R0)
