Learning Domain-Specific Ontologies from the Web

  • Wenkai Mo
  • Peng Wang
  • Haiyue Song
  • Jianyu Zhao
  • Xiang Zhang
Part of the Communications in Computer and Information Science book series (CCIS, volume 406)


This paper proposes an approach to learning domain-specific ontologies from the Web. First, a webpage is segmented into text blocks by analyzing its visual features and DOM structure. Second, the text blocks are labeled by a Conditional Random Fields (CRF) model. Third, a local ontology of the webpage is constructed from the vision tree and the labeled text blocks. Finally, the ontology for a website is generated by merging the local ontologies of its pages. Experimental results on real-world datasets show that the proposed method is effective and efficient for domain-specific ontology learning, achieving an average F-measure of 0.91 for concepts, instances and subclass-of relations.
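The final merging step and the F-measure evaluation can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the `LocalOntology` container, the `merge` and `f_measure` functions, and the example names are all hypothetical, and merging is done by a naive set union on exact name matches, whereas the method described here may use a more elaborate matching strategy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LocalOntology:
    """Hypothetical container for what a single webpage contributes."""
    concepts: set[str] = field(default_factory=set)
    # (instance, concept) pairs
    instances: set[tuple[str, str]] = field(default_factory=set)
    # (child concept, parent concept) pairs
    subclass_of: set[tuple[str, str]] = field(default_factory=set)


def merge(ontologies: list[LocalOntology]) -> LocalOntology:
    """Naive merge: union of all elements, matched by exact name (an assumption)."""
    merged = LocalOntology()
    for onto in ontologies:
        merged.concepts |= onto.concepts
        merged.instances |= onto.instances
        merged.subclass_of |= onto.subclass_of
    return merged


def f_measure(predicted: set, gold: set) -> float:
    """Harmonic mean of precision and recall over two sets of items."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Two made-up local ontologies, as if extracted from two pages of one site.
page_a = LocalOntology(
    concepts={"Person", "Professor"},
    instances={("Alice", "Professor")},
    subclass_of={("Professor", "Person")},
)
page_b = LocalOntology(
    concepts={"Person", "Student"},
    instances={("Bob", "Student")},
    subclass_of={("Student", "Person")},
)

site = merge([page_a, page_b])
gold_concepts = {"Person", "Professor", "Student", "Course"}
score = f_measure(site.concepts, gold_concepts)
```

In this toy case the merged ontology holds three concepts, all of which appear in the four-concept gold set, so precision is 1.0 and recall is 0.75. The same `f_measure` function applies unchanged to the instance and subclass-of sets.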


Keywords: ontology, ontology learning, conditional random fields, page segmentation





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Wenkai Mo (2)
  • Peng Wang (1, 2)
  • Haiyue Song (2)
  • Jianyu Zhao (2)
  • Xiang Zhang (1, 2)

  1. School of Computer Science and Engineering, Southeast University, Nanjing, China
  2. College of Software Engineering, Southeast University, Nanjing, China
