The Essential Toolbox of Data Science: Python, R, Git, and Docker

  • Protocol

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2104))

Abstract

The daily work of data science relies on a set of essential tools: the programming languages Python and R, the version control tool Git, and the virtualization tool Docker. Proficiency in at least one programming language is required for data science. R is tied to a computing environment focused on statistics, in which many new algorithms in genomics and biomedicine are first published. Python has its roots in system administration and is a superb language for general programming. Version control is critical to managing complex projects, even when software development is not involved. Docker containers are becoming a key tool for deployment, portability, and reproducibility. This chapter provides a self-contained practical guide to these topics so that readers can use it as a reference and to plan their training.

Acknowledgments

This work has been funded in part by the US National Institutes of Health via grants UH2 AI132345 (Li), U2C ES030163 (Jones, Li, Morgan, Miller), U01 CA235493 (Li, Xia, Siuzdak), U2C ES026560 (Miller), P30 ES019776 (Marsit), P50 ES026071 (McCauley), and the US EPA grant 83615301 (McCauley).

Author information

Corresponding author

Correspondence to Shuzhao Li.

Copyright information

© 2020 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Pittard, W.S., Li, S. (2020). The Essential Toolbox of Data Science: Python, R, Git, and Docker. In: Li, S. (eds) Computational Methods and Data Analysis for Metabolomics. Methods in Molecular Biology, vol 2104. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-0239-3_15

  • DOI: https://doi.org/10.1007/978-1-0716-0239-3_15

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-0238-6

  • Online ISBN: 978-1-0716-0239-3

  • eBook Packages: Springer Protocols
