Incremental Discovery of Sequential Pattern from Semi-structured Document Using Grammatical Inference

Thakur, Ramesh; Jain, Suresh; Chaudhari, Narendra S.

doi:10.1007/978-3-642-28073-3_30

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7154)

Included in the following conference series:

International Conference on Distributed Computing and Internet Technology

Abstract

A large amount of information on the World Wide Web is available in semi-structured form. Knowledge discovery in semi-structured documents has been recognized as a promising task. Such data is typically hidden within HTML formatting intended for human viewing. The details of this formatting vary widely from site to site and change frequently, so a global schema cannot be constructed, and discovering interesting rules from the data is a complex and tedious process. Most existing systems use hand-coded wrappers to extract information, which is monotonous and time-consuming. An intelligent, automated method is needed for processing these documents. Learning grammatical information from a given sample of semi-structured documents has attracted a lot of attention in the past decades. To understand “what the data say”, it is necessary to know the structure of the data and to discover the syntactic-semantic knowledge of its language.

The problem of learning the correct grammar for an unknown language from a finite set of examples of that language is known as the grammatical inference problem. In automated grammar learning, the task is to infer grammar rules from given information about the target language. An example that belongs to the target language is called a positive example; otherwise it is called a negative example. In this paper we propose a grammar inference methodology to automate the construction of grammar rules and facilitate the process of information extraction. We use a hybrid technique of association analysis and a sequential algorithm to generate context-free grammar rules from semi-structured (HTML) documents.
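To make the input to such a method concrete, the following is a minimal sketch of how an HTML document might be reduced to a tag sequence and how a support value could be computed for adjacent tag pairs. It is an illustration under stated assumptions, not the procedure from the paper: the names `TagSequencer`, `tag_sequence`, and `pair_support` are hypothetical, and defining support as the fraction of documents that contain a pair is one common convention from association analysis that the abstract does not confirm.

```python
from collections import Counter
from html.parser import HTMLParser


class TagSequencer(HTMLParser):
    """Hypothetical helper: records tag names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Attributes and text content are ignored; only structure is kept.
        self.tags.append(tag)

    def handle_endtag(self, tag):
        # Closing tags are marked with a leading slash.
        self.tags.append("/" + tag)


def tag_sequence(html):
    """Return the list of opening and closing tags found in `html`."""
    parser = TagSequencer()
    parser.feed(html)
    parser.close()
    return parser.tags


def pair_support(sequences):
    """Assumed definition: fraction of sequences containing each adjacent pair."""
    if not sequences:
        return {}
    counts = Counter()
    for seq in sequences:
        # set() so a pair is counted at most once per sequence.
        counts.update(set(zip(seq, seq[1:])))
    return {pair: n / len(sequences) for pair, n in counts.items()}
```

Each tag sequence plays the role of a positive example of the target language. A pair such as `("li", "/li")` that occurs in most documents would receive a high support value and becomes a candidate for a grammar rule.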

Our algorithm infers a sequential pattern from a sequence of discrete HTML tags. The basic insight is that a sub-string is selected on the basis of a high support factor, taking entire sentences into account. A sub-string that appears frequently in the string can be replaced by a grammatical rule that generates it, and this process is repeated many times, producing single-length rules for the sequence. The result is strictly a set of context-free grammar rules, which provides a compact summary of corpora that aids understanding of their properties.
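The replace-and-repeat idea can be sketched as follows. This is not the authors' algorithm, whose details are in the full chapter; it is a simplified pair-replacement scheme in the style of Re-Pair that uses raw occurrence counts in place of the support factor. The function names `infer_grammar` and `expand`, the `min_count` threshold, and the rule names `R1`, `R2`, ... and `S` are all assumptions made for illustration.

```python
from collections import Counter


def infer_grammar(tokens, min_count=2):
    """Repeatedly replace the most frequent adjacent pair with a new rule.

    Assumes no input token is named "S" or "R<number>", since those names
    are reserved here for the generated nonterminals.
    """
    seq = list(tokens)
    rules = {}
    while True:
        counts = Counter(zip(seq, seq[1:]))
        if not counts:
            break
        pair, n = counts.most_common(1)[0]
        if n < min_count:
            break
        name = f"R{len(rules) + 1}"
        rules[name] = pair
        # Rewrite the sequence left to right, replacing non-overlapping
        # occurrences of the chosen pair with the new nonterminal.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    # Whatever remains becomes the body of the start rule.
    rules["S"] = tuple(seq)
    return rules


def expand(rules, symbol="S"):
    """Expand a symbol back into the terminal sequence it generates."""
    if symbol not in rules:
        return [symbol]
    result = []
    for part in rules[symbol]:
        result.extend(expand(rules, part))
    return result
```

For the tag sequence `ul li /li li /li /ul`, this sketch introduces one rule `R1 -> li /li` and the start rule `S -> ul R1 R1 /ul`, and `expand` recovers the original sequence. Every rule has a single nonterminal on its left-hand side, so the result is a context-free grammar that generates exactly the input.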

Author information

Authors and Affiliations

IIPS, DAVV, Indore, India
Ramesh Thakur
KCB Technical Academy, Indore, India
Suresh Jain
Indian Institute of Technology, Indore, India
Narendra S. Chaudhari

Editor information

Editors and Affiliations

Institute of Mathematical Sciences, C.I.T. Campus, Taramani, 600113, Chennai, India
R. Ramanujam
Industrial Software Systems, ABB Corporate Research Center, Bangalore, India
Srini Ramaswamy

Cite this paper

Thakur, R., Jain, S., Chaudhari, N.S. (2012). Incremental Discovery of Sequential Pattern from Semi-structured Document Using Grammatical Inference. In: Ramanujam, R., Ramaswamy, S. (eds) Distributed Computing and Internet Technology. ICDCIT 2012. Lecture Notes in Computer Science, vol 7154. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28073-3_30

DOI: https://doi.org/10.1007/978-3-642-28073-3_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28072-6
Online ISBN: 978-3-642-28073-3
eBook Packages: Computer Science, Computer Science (R0)
