Abstract
On the Web and in network environments, fast and precise data transmission is essential, and many supporting elements are required to achieve it. Data discovery is one of these elements. This paper focuses on data discovery strategies, and in particular on the main factors that affect discovery strategies for markup data, which accounts for a considerable share of Web and network data. For the evaluation, we defined these factors and simulated them on our data discovery system to show how much each one affects performance.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moon, HJ., Yeom, SH., Choi, J., Yoo, CW. (2009). Data Discovery and Related Factors of Documents on the Web and the Network. In: Gervasi, O., Taniar, D., Murgante, B., Laganà, A., Mun, Y., Gavrilova, M.L. (eds) Computational Science and Its Applications – ICCSA 2009. ICCSA 2009. Lecture Notes in Computer Science, vol 5592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02454-2_46
DOI: https://doi.org/10.1007/978-3-642-02454-2_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02453-5
Online ISBN: 978-3-642-02454-2
eBook Packages: Computer Science, Computer Science (R0)