Mining Ambiguities Using Pixel-Based Content Extraction

  • Conference paper
Proceedings of the International Conference on Soft Computing Systems

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 398)

Abstract

Internet and mobile computing have become a major societal force, in that everyday issues, from online shopping to obtaining driving directions in unfamiliar places, are now addressed and resolved through them. The main concern in such communication is that Web content should reach the user quickly. Information extraction therefore needs to work at a basic level and be easy to implement without depending on any major software. The present study focuses on extracting information from available text and media-type data after it has been converted into digital form. The approach takes the basic pixel-map representation of the data and converts it to numerical form, so that language, text script and format do not pose problems. From the numerically converted data, key clusters, analogous to the keywords used in a search method, are developed, and content is extracted through different approaches that rely on computation for ease of use. In one approach, statistical features are extracted from the pixel map of each image and presented to a fuzzy clustering algorithm, with Euclidean distance as the similarity metric; the accuracy is compared and presented. The paper introduces the concept of ambiguity by comparing objects such as ‘computer,’ for which an explicit content representation is possible, with an abstract subject such as ‘soft-computing,’ whose representation can be vague and ambiguous. With this as the objective, the approaches used for content extraction are compared, and the paper examines how, within certain bounds, the content can be extracted.
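
The pipeline outlined in the abstract (statistical features taken from a pixel map, then fuzzy clustering with Euclidean distance) can be illustrated with a minimal Python sketch. The abstract does not name the features, the clustering variant or its parameters, so everything specific here is an assumption made for illustration: mean and standard deviation as the features, standard fuzzy c-means as the algorithm, a fuzzifier of m = 2, and the hypothetical function names `pixel_features` and `fuzzy_c_means`. It is not the authors' implementation.

```python
import math
import random


def pixel_features(pixels):
    """Return (mean, standard deviation) of a 2-D grayscale pixel map.

    Assumption: the paper's "statistical features" are not specified;
    these two are chosen only as a simple example.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    var = sum((p - mean) ** 2 for p in flat) / len(flat)
    return (mean, math.sqrt(var))


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuzzy_c_means(points, c=2, m=2.0, iters=50, seed=0):
    """Standard fuzzy c-means (assumed variant); returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Cluster centres: membership-weighted means of the points.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ))
        # Memberships: inverse-distance ratios under the Euclidean metric.
        for i, p in enumerate(points):
            dists = [max(euclidean(p, ctr), 1e-12) for ctr in centers]
            for j in range(c):
                u[i][j] = 1.0 / sum(
                    (dists[j] / dists[k]) ** (2.0 / (m - 1.0))
                    for k in range(c)
                )
    return centers, u


# Usage: two dark and two bright toy "images" (made-up pixel values).
images = [
    [[10, 12], [11, 9]],
    [[14, 15], [13, 16]],
    [[240, 238], [241, 239]],
    [[250, 249], [251, 248]],
]
features = [pixel_features(img) for img in images]
centers, memberships = fuzzy_c_means(features, c=2)
for img_features, row in zip(features, memberships):
    print(img_features, [round(v, 3) for v in row])
```

Each image receives a degree of membership in every cluster rather than a single hard label, which is one way to read the abstract's point that an ambiguous subject may only be placed "within certain bounds".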

Author information

Corresponding author

Correspondence to B. S. Charulatha.

Copyright information

© 2016 Springer India

About this paper

Cite this paper

Charulatha, B.S., Rodrigues, P., Chitralekha, T., Rajaraman, A. (2016). Mining Ambiguities Using Pixel-Based Content Extraction. In: Suresh, L., Panigrahi, B. (eds) Proceedings of the International Conference on Soft Computing Systems. Advances in Intelligent Systems and Computing, vol 398. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2674-1_50

  • DOI: https://doi.org/10.1007/978-81-322-2674-1_50

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2672-7

  • Online ISBN: 978-81-322-2674-1

  • eBook Packages: Engineering, Engineering (R0)
