Automatic Generation of Sitemaps Based on Navigation Systems

  • Conference paper
Machine Learning, Optimization, and Big Data (MOD 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10122)

Abstract

In this paper we present a method to automatically discover sitemaps from websites. Given a website, existing automatic solutions extract only a flat list of URLs that does not show the hierarchical structure of its content. Manual approaches, performed by webmasters, produce deeper sitemaps than automatic methods do. However, partly because website content naturally evolves over time, these manually generated sitemaps often stop reflecting the actual content and soon become unhelpful and confusing for users. We propose a different approach that is both automatic and effective. Our solution combines an algorithm that extracts frequent patterns from the navigation systems contained in a website (e.g. menus, nav-bars, content lists) with a hierarchy extraction algorithm able to discover rich hierarchies that unveil relationships among web pages (e.g. super-/sub-topic relationships). Experimental results show that our approach discovers high-quality sitemaps that have a deep hierarchy and are complete with respect to the extracted URLs.
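The abstract describes the method only at a high level. As a rough illustration of the general idea of turning repeated navigation lists into a hierarchy, here is a minimal Python sketch; it is not the authors' algorithm, which uses frequent pattern mining followed by a dedicated hierarchy extraction step. It assumes that navigation lists have already been extracted from each page and are supplied in a hypothetical `EXAMPLE_PAGES` dict, that a list counts as "frequent" if it appears on at least `min_support` pages, and that each URL is attached to the first page (in breadth-first order from the homepage) whose frequent lists link to it. The names `frequent_nav_lists`, `build_sitemap`, and `render` are illustrative only.

```python
from collections import Counter, deque

# Hypothetical input: each page URL maps to the navigation lists (menus,
# nav-bars, content lists) already extracted from that page, each list
# being an ordered sequence of target URLs.
EXAMPLE_PAGES = {
    "/": [["/", "/products", "/about"]],
    "/products": [["/", "/products", "/about"], ["/products/a", "/products/b"]],
    "/products/a": [["/", "/products", "/about"], ["/products/a", "/products/b"]],
    "/products/b": [
        ["/", "/products", "/about"],
        ["/products/a", "/products/b"],
        ["/blog-post-42"],
    ],
    "/about": [["/", "/products", "/about"]],
}


def frequent_nav_lists(pages, min_support):
    """Return the navigation lists (as tuples) that occur on at least
    `min_support` distinct pages."""
    counts = Counter()
    for nav_lists in pages.values():
        # Count each distinct list at most once per page.
        for nav in {tuple(lst) for lst in nav_lists}:
            counts[nav] += 1
    return {nav for nav, count in counts.items() if count >= min_support}


def build_sitemap(pages, root, min_support=2):
    """Greedy breadth-first hierarchy (a simplifying assumption): starting
    from `root`, each URL found in a frequent navigation list becomes a child
    of the first page, in BFS order, that links to it.
    Returns a dict mapping each placed URL to its list of children."""
    frequent = frequent_nav_lists(pages, min_support)
    children = {root: []}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for lst in pages.get(page, []):
            if tuple(lst) not in frequent:
                continue  # rare list: treated as content links, not navigation
            for url in lst:
                if url not in children:  # URL not yet placed in the tree
                    children[page].append(url)
                    children[url] = []
                    queue.append(url)
    return children


def render(children, node, depth=0):
    """Render the sitemap tree as indented lines (two spaces per level)."""
    lines = ["  " * depth + node]
    for child in children[node]:
        lines.extend(render(children, child, depth + 1))
    return lines


if __name__ == "__main__":
    sitemap = build_sitemap(EXAMPLE_PAGES, "/", min_support=2)
    print("\n".join(render(sitemap, "/")))
```

With `min_support=2`, the site-wide menu places `/products` and `/about` under the homepage, the product sidebar places `/products/a` and `/products/b` under `/products`, and the one-off link to `/blog-post-42` is discarded as content rather than navigation; with `min_support=1` it would instead be attached under `/products/b`. The greedy first-parent rule keeps the sketch short and always yields a tree, but it cannot recover the richer super/sub-topic relationships that the approach in the abstract targets.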

Notes

  1. http://slickplan.com/, http://www.screamingfrog.co.uk, https://www.xml-sitemaps.com/.

Acknowledgment

This project has received funding from the European Commission through the project MAESTRA - Learning from Massive, Incompletely annotated, and Structured Data (Grant number ICT-2013-612944).

Author information

Corresponding author

Correspondence to Pasqua Fabiana Lanotte.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Lanotte, P.F., Fumarola, F., Malerba, D., Ceci, M. (2016). Automatic Generation of Sitemaps Based on Navigation Systems. In: Pardalos, P., Conca, P., Giuffrida, G., Nicosia, G. (eds) Machine Learning, Optimization, and Big Data. MOD 2016. Lecture Notes in Computer Science, vol 10122. Springer, Cham. https://doi.org/10.1007/978-3-319-51469-7_18

  • DOI: https://doi.org/10.1007/978-3-319-51469-7_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-51468-0

  • Online ISBN: 978-3-319-51469-7

  • eBook Packages: Computer Science, Computer Science (R0)