An Involuntary Data Extraction and Information Summarization Expending Ontology

  • R. Deepa
  • R. Manicka Chezian
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 394)


The World Wide Web is a vast repository of data in the form of web pages, which are retrieved in response to a query given by the user. These pages are often unstructured and non-uniform. The main objective of this study is information extraction and summarization using an ontology. The study proposes a new method, Structural Semantic Domain Ontology (SSDO), for effective information retrieval. The proposed system automatically extracts unstructured information from the repository and stores it in a search buffer; the extraction is performed using a domain ontology. The main disadvantage of existing systems is that information extracted from various sources is not properly aligned, so the system may fail to determine where on a website the required information is located. The proposed approach overcomes this problem by adopting pair alignment, top-down alignment, and loop structure algorithms. In the proposed system, a user who needs a piece of data types a known detail, called a label, and the system then extracts the matching information from the web page together with a proper description and additional details.
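The label-driven lookup described above can be illustrated with a minimal sketch. It is not the SSDO method or the paper's pair alignment, top-down alignment, or loop structure algorithms, whose details the abstract does not give. It only assumes that a domain ontology maps label variants to shared concepts, so that values extracted from different pages under different labels can be grouped and retrieved by a user-supplied label. The ontology contents, the sample records, and the function names `concept_for`, `align_records`, and `lookup` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy domain ontology: each concept lists the label
# variants that different web pages might use for it.
ONTOLOGY = {
    "price": {"price", "cost", "amount"},
    "title": {"title", "name", "product"},
    "author": {"author", "writer", "by"},
}


def concept_for(label):
    """Return the ontology concept whose variants include the label, else None."""
    key = label.strip().lower()
    for concept, variants in ONTOLOGY.items():
        if key in variants:
            return concept
    return None


def align_records(records):
    """Group (label, value) pairs from several pages under shared concepts."""
    aligned = defaultdict(list)
    for label, value in records:
        concept = concept_for(label)
        if concept is not None:  # labels unknown to the ontology are skipped
            aligned[concept].append(value)
    return dict(aligned)


def lookup(aligned, user_label):
    """Return all aligned values for the concept matching the user's label."""
    concept = concept_for(user_label)
    return aligned.get(concept, []) if concept else []


# Made-up (label, value) pairs standing in for data extracted from pages.
records = [("Cost", "$12"), ("Price", "$15"), ("Writer", "A. Rao"), ("Colour", "red")]
aligned = align_records(records)
print(lookup(aligned, "amount"))  # prints ['$12', '$15']
```

In this sketch, the user's label "amount" resolves to the same concept as the page labels "Cost" and "Price", which is the sense in which values from differently labelled sources end up aligned.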


SSDO, Ontology domain storage, DELTA, Information retrieval, Pair Alignment Technologies



Copyright information

© Springer India 2016

Authors and Affiliations

  1. Research Department of Computer Science, NGM College, Pollachi, Coimbatore, India
