Building an Open Document Management System with Components for Trust

  • Gábor Knapp
  • Gábor Magyar
  • Gergely Németh
Conference paper


The Web can now be regarded as the largest database or, more precisely, the largest document archive ever built. Although many accessible information resources lie in the deep Web (principally in structured databases) and cannot be reached by a simple free-text search, Google's crawlers still index more than three billion pages. As technology and networking become more accessible and affordable, the number of documents increases exponentially.


Application Programming Interface, Copy Detection, Document Management, Portable Document Format, Metadata Schema





Copyright information

© Springer Science+Business Media New York 2004

Authors and Affiliations

  • Gábor Knapp (1)
  • Gábor Magyar (2)
  • Gergely Németh (2)
  1. Metainfo Co., Érd, Hungary
  2. Budapest University of Technology and Economics, Budapest, Hungary
