Finding, Extracting, and Building Academic Linked Data

Wang, Peng; Zhang, Xiang

doi:10.1007/978-1-4614-6880-6_3

Part of the book series: Springer Proceedings in Complexity (SPCOM)

Abstract

This paper addresses the problem of finding and extracting academic information from conference Web pages, organizing that information as ontologies, and generating academic linked data by matching these ontologies. The main contributions are as follows. (1) A topic-crawling method and a lightweight crawling method based on search engines are presented, and crawling seeds, the filter for relevant websites, and the crawling update strategy are discussed. (2) A new vision-based approach for extracting academic information is proposed. It first segments Web pages into text blocks and then classifies these text blocks into predefined categories. The initial classification results are improved by post-processing. Finally, academic information is extracted from the classified text blocks. (3) A global ontology is used to describe the background domain knowledge, and the academic information extracted from each website is organized as a local ontology. Finally, academic linked data is generated by matching all local ontologies.
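
The classify, post-process, and extract steps of contribution (2) can be illustrated with a minimal sketch. It assumes the page has already been segmented into text blocks, and it substitutes a simple keyword-rule classifier and a neighbour-smoothing rule for the paper's vision-based method, whose details the abstract does not give. The category names, cue words, and the functions classify, post_process, and extract are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical categories and cue words; the abstract does not list the
# paper's actual categories or describe its classifier.
KEYWORDS = {
    "date": ("deadline", "submission due", "notification"),
    "committee": ("chair", "committee"),
    "topic": ("topics of interest", "ontology", "semantic"),
}


@dataclass
class Block:
    """One text block produced by an (assumed) earlier segmentation step."""
    text: str
    label: str = "other"


def classify(block: Block) -> str:
    """Return the first category whose cue words occur in the block text."""
    lowered = block.text.lower()
    for label, cues in KEYWORDS.items():
        if any(cue in lowered for cue in cues):
            return label
    return "other"


def post_process(blocks: list[Block]) -> None:
    """Relabel an 'other' block when both neighbours share one category.

    Works from a snapshot of the initial labels so that one correction
    cannot trigger another.
    """
    initial = [block.label for block in blocks]
    for i in range(1, len(blocks) - 1):
        prev_label, next_label = initial[i - 1], initial[i + 1]
        if initial[i] == "other" and prev_label == next_label != "other":
            blocks[i].label = prev_label


def extract(page_blocks: list[str]) -> dict[str, list[str]]:
    """Classify blocks, smooth the labels, and group texts by category."""
    blocks = [Block(text) for text in page_blocks]
    for block in blocks:
        block.label = classify(block)
    post_process(blocks)
    result: dict[str, list[str]] = {}
    for block in blocks:
        if block.label != "other":
            result.setdefault(block.label, []).append(block.text)
    return result
```

In this sketch, a block of committee member names that contains no cue word is initially labelled "other" and is corrected to "committee" when the blocks before and after it are both committee blocks. That is one plausible reading of how post-processing can improve initial classification results, not a description of the paper's procedure.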

Acknowledgments

This work is supported by the NSF of China (61003156 and 61003055) and the Natural Science Foundation of Jiangsu Province (BK2009136 and BK2011335).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Southeast University, Nanjing, China
Peng Wang & Xiang Zhang

Corresponding author

Correspondence to Peng Wang.

Editor information

Editors and Affiliations

Dept. of Computer Science and Technology, Tsinghua University, Room 10-206, East main building, Beijing, 100084, China, People's Republic
Juanzi Li
School of Comp. Sci. & Eng., Southeast University, Dongda Road 2, Nanjing, 211189, Jiangsu, China, People's Republic
Guilin Qi
Peking University, Inst. of Computer Science & Tech., North Zhongguancun Street 128, Beijing, 100871, China, People's Republic
Dongyan Zhao
L3S Research Center, Leibniz University Hannover, Appelstr. 4, Hannover, 30167, Germany
Wolfgang Nejdl
Tsinghua Campus H202B, Shenzhen City, 518055, China, People's Republic
Hai-Tao Zheng

About this paper

Cite this paper

Wang, P., Zhang, X. (2013). Finding, Extracting, and Building Academic Linked Data. In: Li, J., Qi, G., Zhao, D., Nejdl, W., Zheng, HT. (eds) Semantic Web and Web Science. Springer Proceedings in Complexity. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6880-6_3

DOI: https://doi.org/10.1007/978-1-4614-6880-6_3
Published: 02 May 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6879-0
Online ISBN: 978-1-4614-6880-6
eBook Packages: Computer Science, Computer Science (R0)
