Abstract
Internet and mobile computing have become a major societal force, addressing everyday needs that range from online shopping to obtaining driving directions in unfamiliar places. The major concern in such communication is that Web content should reach the user quickly. Information extraction therefore needs to work at a basic level and be easy to implement without depending on any major software. The present study focuses on extracting information from text and media-type data after conversion to digital form. The approach takes the basic pixel-map representation of the data and converts it by numerical means, so that language, script, and format do not pose problems. From the numerically converted data, key clusters, similar to the keywords used in a search method, are developed, and content is extracted through different approaches that trade additional computation for ease of implementation. In one approach, statistical features are extracted from the pixel map of each image and presented to a fuzzy clustering algorithm that uses Euclidean distance as the similarity metric; the resulting accuracy is compared and presented. The paper also introduces the concept of ambiguity by comparing an object such as ‘computer,’ whose content can be represented explicitly, with an abstract subject such as ‘soft-computing,’ whose representation can be vague and ambiguous. With this objective, the content-extraction approaches are compared, and the paper examines how, within certain bounds, the content could be extracted.
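The pipeline named in the abstract (statistical features from a pixel map, then fuzzy clustering with Euclidean distance) can be illustrated with a minimal sketch. It is not the authors' implementation: the feature set (mean, standard deviation, skewness, share of dark pixels), the function names `pixel_features` and `fuzzy_c_means`, the fuzzifier `m = 2`, and the synthetic images are all assumptions made for illustration, and standard fuzzy c-means is used because the abstract does not name the specific algorithm.

```python
import numpy as np


def pixel_features(pixel_map):
    """Return simple statistical features of a 2-D grayscale pixel map.

    The chosen features are an assumption; the abstract does not list them.
    """
    p = np.asarray(pixel_map, dtype=float).ravel()
    mean, std = p.mean(), p.std()
    # Skewness is undefined for a constant image, so report 0 in that case.
    skew = 0.0 if std == 0 else float(np.mean(((p - mean) / std) ** 3))
    dark = float(np.mean(p < 128))  # share of pixels darker than mid-gray
    return np.array([mean, std, skew, dark])


def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means with Euclidean distance.

    Returns (centers, U); U[i, k] is the membership of sample i in cluster k.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # memberships of each sample sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every cluster center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Hypothetical data: five dark and five bright 16x16 images.
rng = np.random.default_rng(1)
images = [rng.integers(0, 60, size=(16, 16)) for _ in range(5)]
images += [rng.integers(200, 256, size=(16, 16)) for _ in range(5)]
X = np.array([pixel_features(img) for img in images])
centers, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)  # hard assignment taken from the fuzzy memberships
```

Because each row of `U` holds graded memberships rather than a single label, an input that sits between clusters shows up as a row with no dominant value, which is one way the vagueness discussed in the abstract could be made visible. The features here are left unscaled, so the mean intensity dominates the distance; a real application would likely normalize them first.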
Copyright information
© 2016 Springer India
About this paper
Cite this paper
Charulatha, B.S., Rodrigues, P., Chitralekha, T., Rajaraman, A. (2016). Mining Ambiguities Using Pixel-Based Content Extraction. In: Suresh, L., Panigrahi, B. (eds) Proceedings of the International Conference on Soft Computing Systems. Advances in Intelligent Systems and Computing, vol 398. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2674-1_50
DOI: https://doi.org/10.1007/978-81-322-2674-1_50
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2672-7
Online ISBN: 978-81-322-2674-1
eBook Packages: Engineering, Engineering (R0)