Learning Domain-Specific Ontologies from the Web

  • Wenkai Mo
  • Peng Wang
  • Haiyue Song
  • Jianyu Zhao
  • Xiang Zhang
Part of the Communications in Computer and Information Science book series (CCIS, volume 406)

Abstract

This paper proposes an approach to learning domain-specific ontologies from the Web. First, a webpage is segmented into text blocks by analyzing its visual features and DOM structure. Second, the text blocks are labeled by a Conditional Random Fields (CRFs) model. Third, a local ontology for the webpage is constructed from the vision tree and the labeled text blocks. Finally, the ontology for a website is generated by merging the local ontologies. Experimental results on real-world datasets show that the proposed method is effective and efficient for domain-specific ontology learning, achieving an average F-measure of 0.91 for concepts, instances, and subclass-of relations.
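To make the last step and the evaluation metric concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a local ontology can be reduced to three sets (concepts, instance-of pairs, and subclass-of pairs) and that merging is a plain union after case-insensitive name matching; the abstract does not say how merging or averaging is actually done. The names `LocalOntology`, `merge`, and `f_measure` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LocalOntology:
    """Hypothetical container for what a single webpage contributes."""
    concepts: set = field(default_factory=set)
    instances: set = field(default_factory=set)    # (instance, concept) pairs
    subclass_of: set = field(default_factory=set)  # (child, parent) pairs


def merge(ontologies):
    """Assumed merge rule: union of elements whose concept names match
    after trimming whitespace and lower-casing."""
    def norm(name):
        return name.strip().lower()

    merged = LocalOntology()
    for onto in ontologies:
        merged.concepts |= {norm(c) for c in onto.concepts}
        merged.instances |= {(i.strip(), norm(c)) for i, c in onto.instances}
        merged.subclass_of |= {(norm(a), norm(b)) for a, b in onto.subclass_of}
    return merged


def f_measure(predicted, gold):
    """Balanced F-measure: harmonic mean of precision and recall."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Two made-up page-level ontologies from the same site.
page_a = LocalOntology(
    concepts={"Person", "Professor"},
    instances={("Alice", "Professor")},
    subclass_of={("Professor", "Person")},
)
page_b = LocalOntology(
    concepts={"person", "Student"},
    subclass_of={("Student", "Person")},
)
site = merge([page_a, page_b])

# Score the merged subclass-of relations against a made-up gold standard.
gold_subclass = {("professor", "person"), ("student", "person"), ("dean", "person")}
score = f_measure(site.subclass_of, gold_subclass)
```

In this toy run the two pages agree on `person` once names are normalized, so the site-level ontology holds three concepts. A real merge would likely need ontology matching rather than string equality, which this sketch deliberately leaves out.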

Keywords

ontology; ontology learning; conditional random fields; page segmentation

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Wenkai Mo (2)
  • Peng Wang (1, 2)
  • Haiyue Song (2)
  • Jianyu Zhao (2)
  • Xiang Zhang (1, 2)

  1. School of Computer Science and Engineering, Southeast University, Nanjing, China
  2. College of Software Engineering, Southeast University, Nanjing, China
