Towards Next Generation CiteSeer: A Flexible Architecture for Digital Library Deployment

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4172)

Abstract

CiteSeer began as the first search engine for scientific literature to incorporate Autonomous Citation Indexing. It has since grown into a widely used, open archive for computer and information science publications, currently indexing over 730,000 academic documents. However, CiteSeer now faces significant challenges that must be overcome to improve the quality of the service and to ensure that it remains a valuable, up-to-date resource well into the future. This paper describes a new architectural framework for deploying the CiteSeer system, named CiteSeer Plus. The framework supports distributed indexing and storage for load balancing and fault tolerance, as well as modular service deployment to increase system flexibility and reduce maintenance costs. To support novel approaches to information extraction, the architecture also incorporates a blackboard framework.
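The abstract names a blackboard framework as the mechanism for information extraction but gives no implementation details. The following is a minimal sketch of the generic blackboard pattern, not the CiteSeer Plus design: independent knowledge sources read from and write to a shared workspace, and a controller repeatedly fires whichever source is currently applicable until none can contribute. All names here (`Blackboard`, `KnowledgeSource`, `run_controller`, the three toy extractors, and the `" | "` header format) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Blackboard:
    """Shared workspace (hypothetical): each key holds the current hypothesis for one field."""
    entries: Dict[str, str] = field(default_factory=dict)


@dataclass
class KnowledgeSource:
    """An independent extractor (hypothetical) that fires once its inputs are on the blackboard."""
    name: str
    requires: List[str]  # fields that must already be present before this source can run
    produces: str        # field this source writes
    action: Callable[[Dict[str, str]], str]

    def can_fire(self, bb: Blackboard) -> bool:
        # Applicable only if all inputs exist and its output has not been written yet.
        return all(k in bb.entries for k in self.requires) and self.produces not in bb.entries


def run_controller(bb: Blackboard, sources: List[KnowledgeSource]) -> List[str]:
    """Fire eligible sources opportunistically until a full pass makes no progress.

    Returns the names of the sources in the order they fired.
    """
    fired: List[str] = []
    progress = True
    while progress:
        progress = False
        for ks in sources:
            if ks.can_fire(bb):
                bb.entries[ks.produces] = ks.action(bb.entries)
                fired.append(ks.name)
                progress = True
    return fired


# Toy extractors (hypothetical): assume the header line looks like "Title | Authors".
SOURCES = [
    KnowledgeSource("title_extractor", ["header"], "title",
                    lambda e: e["header"].split(" | ")[0]),
    KnowledgeSource("header_splitter", ["text"], "header",
                    lambda e: e["text"].splitlines()[0]),
    KnowledgeSource("author_extractor", ["header"], "authors",
                    lambda e: e["header"].split(" | ")[1]),
]

bb = Blackboard({"text": "Towards Next Generation CiteSeer | Councill, Giles\nAbstract ..."})
print(run_controller(bb, SOURCES))
print(bb.entries["title"], "/", bb.entries["authors"])
```

Note that `title_extractor` is registered first but cannot fire until `header_splitter` has posted a header; the controller discovers that dependency at run time rather than following a fixed pipeline order, which is the flexibility a blackboard design is meant to provide when new extractors are added.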

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Councill, I.G., Giles, C.L., Di Iorio, E., Gori, M., Maggini, M., Pucci, A. (2006). Towards Next Generation CiteSeer: A Flexible Architecture for Digital Library Deployment. In: Gonzalo, J., Thanos, C., Verdejo, M.F., Carrasco, R.C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2006. Lecture Notes in Computer Science, vol 4172. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11863878_10

  • DOI: https://doi.org/10.1007/11863878_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44636-1

  • Online ISBN: 978-3-540-44638-5

  • eBook Packages: Computer Science, Computer Science (R0)
