Practical Web Scraping for Data Science

Best Practices and Examples with Python

  • Seppe vanden Broucke
  • Bart Baesens

Table of contents

  1. Front Matter
    Pages i-xvi
  2. Web Scraping Basics

    1. Front Matter
      Pages 1-1
    2. Seppe vanden Broucke, Bart Baesens
      Pages 3-23
    3. Seppe vanden Broucke, Bart Baesens
      Pages 25-48
    4. Seppe vanden Broucke, Bart Baesens
      Pages 49-77
  3. Advanced Web Scraping

    1. Front Matter
      Pages 79-79
    2. Seppe vanden Broucke, Bart Baesens
      Pages 81-126
    3. Seppe vanden Broucke, Bart Baesens
      Pages 127-154
    4. Seppe vanden Broucke, Bart Baesens
      Pages 155-172
  4. Managerial Concerns and Best Practices

    1. Front Matter
      Pages 173-173
    2. Seppe vanden Broucke, Bart Baesens
      Pages 175-186
    3. Seppe vanden Broucke, Bart Baesens
      Pages 187-195
    4. Seppe vanden Broucke, Bart Baesens
      Pages 197-298
  5. Back Matter
    Pages 299-306

About this book

Introduction

This book provides a complete and modern guide to web scraping, using Python as the programming language, without glossing over important details or best practices. Written with a data science audience in mind, the book explores both scraping and the larger context of web technologies in which it operates, to ensure full understanding. The authors recommend web scraping as a powerful tool for any data scientist’s arsenal, as many data science projects start by obtaining an appropriate data set.

Starting with a brief overview of scraping and real-life use cases, the authors explore the core concepts of HTTP, HTML, and CSS to provide a solid foundation. Along with a quick Python primer, they cover the requests and Beautiful Soup libraries, Selenium for JavaScript-heavy sites, and web crawling in detail. The book finishes with a recap of best practices and a collection of examples that bring together everything you've learned and illustrate various data science use cases.

Keywords

HTTP, HTML, CSS, JavaScript, Beautiful Soup, Selenium, Web Crawling, JSON, Cookies, Python Primer, Networking, Python

Authors and affiliations

  • Seppe vanden Broucke (1)
  • Bart Baesens (2)
  1. KU Leuven, Leuven, Belgium
  2. Dept of Decision Sci & Info Managem, KU Leuven, Leuven, Belgium
