Mining Ambiguities Using Pixel-Based Content Extraction

Charulatha, B. S.; Rodrigues, Paul; Chitralekha, T.; Rajaraman, Arun

doi:10.1007/978-81-322-2674-1_50


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 398)


Abstract

The Internet and mobile computing have become a major societal force: everyday problems, from online shopping to getting driving directions in unfamiliar places, are now addressed through them. The main concern in such communication is that Web content should reach the user quickly, so information extraction needs to work at a basic level and be easy to implement without depending on any major software. The present study focuses on extracting information from available text and media-type data after it has been converted into digital form. The approach takes the basic pixel map representation of the data and converts it into numerical form, so that language, text script and format do not pose problems. From the numerically converted data, key clusters, analogous to the keywords used in any search method, are developed, and content is extracted through different approaches that are computation-intensive but easy to implement. In one approach, statistical features are extracted from the pixel map of each image and presented to a fuzzy clustering algorithm. Euclidean distance serves as the similarity metric, and the resulting accuracy is compared and presented. The paper introduces the concept of ambiguity by comparing objects such as ‘computer,’ for which an explicit content representation is possible, with an abstract subject such as ‘soft-computing,’ whose representation can be vague and ambiguous. With this as the objective, the content extraction approaches are compared, and the study examines how, within certain bounds, the content can still be extracted.
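
The pipeline outlined in the abstract (statistical features from a pixel map, then fuzzy clustering under Euclidean distance) might be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the feature set (mean, standard deviation, skewness), the use of standard fuzzy c-means, the fuzzifier `m = 2`, the function names `pixel_features` and `fuzzy_c_means`, and the synthetic test images are all assumptions, since the abstract does not specify them.

```python
import numpy as np

def pixel_features(img):
    """Summarize a 2-D grayscale pixel map by simple statistics.

    The choice of mean, standard deviation and skewness is an assumption;
    the abstract only says "statistical features".
    """
    px = np.asarray(img, dtype=float).ravel()
    mean, std = px.mean(), px.std()
    # Skewness as the third standardized moment; defined as 0 for a flat image.
    skew = 0.0 if std == 0 else float(np.mean(((px - mean) / std) ** 3))
    return np.array([mean, std, skew])

def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means with Euclidean distance.

    Returns (centers, U), where U[i, j] is the membership of sample i
    in cluster j and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships, normalized per sample.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Synthetic stand-ins for digitized content: five dark and five bright images.
rng = np.random.default_rng(1)
dark = [rng.integers(0, 60, size=(16, 16)) for _ in range(5)]
bright = [rng.integers(180, 256, size=(16, 16)) for _ in range(5)]

X = np.array([pixel_features(im) for im in dark + bright])
centers, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)  # hard labels from the largest membership
print(labels)
print(U.round(3))
```

Because memberships are graded rather than hard, a sample whose row of `U` is spread almost evenly across clusters can be read as ambiguous, which is one way the contrast between an explicit object and a vague subject could show up numerically.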



Author information

Authors and Affiliations

JNTUK, Kakinada, Andhra Pradesh, India
B. S. Charulatha
DMI College of Engineering, Thiruvallur, Chennai, Tamil Nadu, India
Paul Rodrigues
Central University Puduchery, Puduchery, India
T. Chitralekha
IIT Madras, Chennai, Tamil Nadu, India
Arun Rajaraman


Corresponding author

Correspondence to B. S. Charulatha.

Editor information

Editors and Affiliations

Electrical & Electronics Engineering, Noorul Islam College of Engineering, Kumaracoil, Tamil Nadu, India
L. Padma Suresh
Electrical Engineering, IIT Delhi, New Delhi, India
Bijaya Ketan Panigrahi


About this paper

Cite this paper

Charulatha, B.S., Rodrigues, P., Chitralekha, T., Rajaraman, A. (2016). Mining Ambiguities Using Pixel-Based Content Extraction. In: Suresh, L., Panigrahi, B. (eds) Proceedings of the International Conference on Soft Computing Systems. Advances in Intelligent Systems and Computing, vol 398. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2674-1_50


DOI: https://doi.org/10.1007/978-81-322-2674-1_50
Published: 08 December 2015
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2672-7
Online ISBN: 978-81-322-2674-1
eBook Packages: Engineering, Engineering (R0)
