Abstract
Publishing data on the web in the form of documents is becoming increasingly popular. Segment-search is a semantic search engine for web documents. It provides a query language suited to skilled and semi-skilled domain experts who are adept at using a specific collection of documents. It returns documents selected on the basis of document fragments that satisfy the user's query. In contrast to the knowledge graph approach, the technique is based on segmenting web pages according to user-perceived objects. Thus, it allows users to query without knowing complex query languages or learning the data organization schemes. The proposed system is scalable and can cater to large-scale web document sources.
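The idea of answering a query from page segments rather than whole pages can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes, purely for illustration, that a "user-perceived object" can be approximated by the text under an HTML heading, and every name in it (`HeadingSegmenter`, `segment`, `search`, `SAMPLE_PAGES`) is hypothetical.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingSegmenter(HTMLParser):
    """Split a page into (heading, body_text) segments at each heading tag.

    Assumption: headings delimit the objects a reader perceives on the page.
    """

    def __init__(self):
        super().__init__()
        self.segments = []       # finished (heading, body_text) pairs
        self._heading = ""       # heading of the segment being built
        self._body = []          # text chunks of the segment being built
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._flush()        # a new heading closes the previous segment
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_heading:
            self._heading = (self._heading + " " + text).strip()
        else:
            self._body.append(text)

    def _flush(self):
        if self._heading or self._body:
            self.segments.append((self._heading, " ".join(self._body)))
        self._heading, self._body = "", []

    def close(self):
        super().close()
        self._flush()            # emit the last open segment


def segment(html):
    """Return the list of (heading, body_text) segments of one page."""
    parser = HeadingSegmenter()
    parser.feed(html)
    parser.close()
    return parser.segments


def search(pages, heading_term, keywords):
    """Return (url, heading) for segments whose heading contains
    heading_term and whose body contains every keyword (case-insensitive)."""
    hits = []
    for url, html in pages.items():
        for heading, body in segment(html):
            body_lower = body.lower()
            if heading_term.lower() in heading.lower() and all(
                k.lower() in body_lower for k in keywords
            ):
                hits.append((url, heading))
    return hits


# Hypothetical sample collection, invented for this sketch.
SAMPLE_PAGES = {
    "a": "<h1>Asthma</h1><p>intro</p><h2>Treatment</h2>"
         "<p>Inhaled corticosteroids are used.</p>"
         "<h2>Symptoms</h2><p>Wheezing and cough.</p>",
    "b": "<h1>Flu</h1><h2>Symptoms</h2>"
         "<p>Fever; corticosteroids are not advised.</p>"
         "<h2>Treatment</h2><p>Rest and fluids.</p>",
}
```

Calling `search(SAMPLE_PAGES, "treatment", ["corticosteroids"])` matches only page `a`. Page `b` also mentions the keyword, but under a different heading, which is the distinction a page-level keyword search would miss.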
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Sarode, R.P., Sachdeva, S., Chu, W., Bhalla, S. (2019). Segment-Search vs Knowledge Graphs: Making a Key-Word Search Engine for Web Documents. In: Madria, S., Fournier-Viger, P., Chaudhary, S., Reddy, P. (eds) Big Data Analytics. BDA 2019. Lecture Notes in Computer Science, vol 11932. Springer, Cham. https://doi.org/10.1007/978-3-030-37188-3_6
DOI: https://doi.org/10.1007/978-3-030-37188-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37187-6
Online ISBN: 978-3-030-37188-3
eBook Packages: Computer Science (R0)