Abstract
In the practical examples so far we have read data from CSV files, placed it into SQL queries, and inserted it into a database. In general, this three-step process is known as ETL, for Extract, Transform, Load. Extract means getting data out of some non-database file. Transform means converting it to match our ontology and type system. Load means inserting it into the database. ETL is often performed at massive scale, with many computers working on the various steps across multiple data sources simultaneously, for example when a transport client sends you a hard disc holding a terabyte of traffic sensor data. In the “big data” movement the transformation step may matter less, because the philosophy there is to store the data in whatever form you can manage when it arrives and to worry about ontology only at runtime.
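The three ETL steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the chapter's own code: the CSV contents, the column names, the `detection` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for whatever database is actually used.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw sensor export; sites, times and counts are invented.
RAW_CSV = """site,timestamp,vehicle_count
A1,2017-03-01 08:00,42
A1,2017-03-01 08:15,57
B7,2017-03-01 08:00,
"""

def extract(text):
    """Extract: read rows out of a non-database source (here, CSV text)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert raw strings to the types the schema expects."""
    clean = []
    for row in rows:
        if not row["vehicle_count"]:
            continue  # skip readings with a missing count
        when = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M")
        clean.append((row["site"], when.isoformat(), int(row["vehicle_count"])))
    return clean

def load(conn, records):
    """Load: insert the cleaned records using a parameterised query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS detection "
        "(site TEXT, ts TEXT, vehicle_count INTEGER)"
    )
    conn.executemany("INSERT INTO detection VALUES (?, ?, ?)", records)
    conn.commit()

def run_etl(text, conn):
    """Run the three steps in order."""
    load(conn, transform(extract(text)))

conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in database
run_etl(RAW_CSV, conn)
total = conn.execute("SELECT SUM(vehicle_count) FROM detection").fetchone()[0]
```

Keeping each step in its own function makes it easier to run the steps on different machines or data sources, and to weaken or skip the transform step when data is stored as it arrives.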
Notes
- 1. Pandas has many powerful features for performing SQL-like operations from inside Python and for assisting with data munging. For direct code translations between SQL and pandas, see http://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html . Alternatively, converters such as pandasql let you run actual SQL syntax on pandas data without using a database. See also the table in Chap. 2 for some useful pandas commands for transport applications.
- 2. Pronounced “print F”, from the printf command in C-like languages.
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Fox, C. (2018). Data Preparation. In: Data Science for Transport. Springer Textbooks in Earth Sciences, Geography and Environment. Springer, Cham. https://doi.org/10.1007/978-3-319-72953-4_4
DOI: https://doi.org/10.1007/978-3-319-72953-4_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72952-7
Online ISBN: 978-3-319-72953-4
eBook Packages: Earth and Environmental Science, Earth and Environmental Science (R0)