R Tutorial for Oceanographers

Oceanographic Analysis with R

Abstract

R comes with an excellent tutorial that, like many fine tutorials, tends to be ignored by people with little patience for material presented in a general manner. This is why the present chapter uses oceanographic examples to explain R concepts, and why code makes up so much of the text. The early examples are designed to encourage readers to become comfortable whilst navigating the R documentation, because this skill can be the key to moving from simple examples to real-world applications. The main concepts of R data types and language features are illustrated here in practical terms, with many of the explanations involving graphical representation. Since experienced R users are unlikely to study this chapter in great depth, specialized methods of oceanographic analysis are mainly deferred to succeeding chapters.

Notes

  1. http://cran.r-project.org/.
  2. The package count was inferred from web archives at http://wayback.archive.org/.
  3. http://github.com.
  4. The indented lines of the Makefile must start with a tab character, not spaces.
  5. R displays a prompt before the input, but this is omitted throughout this book.
  6. A subtle point is that R does not always look up the values of variables until they are needed. This is related to the R concepts of “lazy evaluation” and “promises”.
  7. www.iana.org/time-zones.
  8. Alternative time origins may be specified to as.POSIXlt(), and this can be helpful in working with times represented in other systems such as SPSS and SAS.
  9. This profile results from a nonlinear regression (Sect. 2.5.5.2) of the oxygen profile at station 112 of the section dataset in the ocedata package.
  10. See http://www.gnu.org/software/gsl/ for more on GSL.
  11. Note the use of the UNESCO equation of state here; with the GSW equation, longitude and latitude would also have to be supplied; see Sect. 5.2.1 and Appendix D.
  12. Readers who wish to learn more details of object orientation in R might start with Chambers (2008) or Wickham (2014).
  13. For more on performance issues, see Appendix E.
  14. See Sect. 5.7 for more on Argo floats.
  15. Several coastline resolutions are provided in the ocedata and oce packages.
  16. Home electricity provides a dramatic illustration. Although voltage measurements may give a confidence interval on the mean that barely departs from 0 V, the measurement uncertainty will indicate that any given measurement could easily be of order 100 V. That is why electrical outlets must be covered up in houses with young children.
  17. Note the use of set.seed() to let readers reconstruct the example.
  18. It is unwise to use hypothesis tests without considering their limitations. Some issues of misapplication are outlined by, e.g., Johnson and Omland (2004) and Hauer (2004), and deep concerns about the misuse of p values are raised in a highly influential editorial in The American Statistician (Wasserstein and Lazar 2016).
  19. If p < 2.2 × 10^-16, R regression summaries simply report “p-value: <2.2e-16”.
  20. See also the NISTnls package, which provides data and code for statistical test suites developed by researchers at the U.S. National Institute for Standards and Technology.
  21. An alternative to stack() is melt(), from the reshape2 package (Wickham 2007). If this is used, then aov() must use value for values and variable for ind.
  22. It is not strictly necessary to use as.ctd() to create a "ctd" object, but it makes it easier to create a standardized plot with isopycnals.
  23. A test with a 90 Mb file on the author’s machine revealed read_csv() to be nearly 6 times faster than read.csv().
  24. http://www.cgd.ucar.edu/cas/catalog/climind/SOI.signal.ascii.
  25. http://www.cgd.ucar.edu/cas/catalog/climind/SOI.signal.ascii.
  26. https://www.nodc.noaa.gov/OC5/woa13/woa13data.html.
  27. https://www.rstudio.com/products/shiny.
  28. http://shiny.rstudio.com/gallery/plot-interaction-exclude.html.
  29. RStudio has a variety of other helpful features, e.g. a code-completing editor and a code-analysis tool that can recommend alterations that may make code more robust.
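
Notes 6, 8 and 17 can be illustrated with a few lines of base R. This is a minimal sketch and is not taken from the chapter itself: the function f, the variable names, and the use of 1960-01-01 (the origin of SAS datetime values) are assumptions chosen for illustration.

```r
# Note 6: lazy evaluation. An argument is held as a "promise" and is
# evaluated only if the function body actually uses it.
f <- function(x, y) {
    x * 2                              # y is never touched, so never evaluated
}
lazy <- f(10, stop("never reached"))   # no error: the stop() promise is not forced

# Note 8: an alternative time origin. Here 86400 seconds (one day) are
# counted from 1960-01-01, the origin assumed for SAS datetimes.
t_sas <- as.POSIXlt(86400, origin = "1960-01-01", tz = "UTC")
day <- format(t_sas, "%Y-%m-%d")       # "1960-01-02"

# Note 17: set.seed() makes pseudo-random results repeatable.
set.seed(1)
a <- rnorm(3)
set.seed(1)
b <- rnorm(3)
same <- identical(a, b)                # TRUE: same seed, same numbers
```

Removing the call to set.seed() before the second rnorm() would, in general, give different values for b, which is why a fixed seed helps readers reconstruct an example.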

References

  • Albert, J., 2009. Bayesian computation with R. Use R! Springer, New York, NY, USA, 2nd edition.

  • Bååth, R., 2012. The state of naming conventions in R. The R Journal, 4(2):74–75.

  • Becker, R. A. and Chambers, J. M., 1984. S: an interactive environment for data analysis and graphics. Wadsworth statistics/probability series. Wadsworth Advanced Book Program, Belmont, CA, USA.

  • Becker, R. A., Chambers, J. M., and Wilks, A. R., 1988. The new S language. Wadsworth & Brooks/Cole, Pacific Grove, CA, USA.

  • Borcard, D., Gillet, F., and Legendre, P., 2011. Numerical Ecology with R. Use R! Springer-Verlag, New York, NY, USA.

  • Boyer, T. P., Antonov, J. I., Baranova, O. K., Garcia, H. E., Johnson, D. R., Locarnini, R. A., Mishonov, A. V., O’Brien, T. D., Seidov, D., Smolyar, V., and Zweng, M. M., 2009. World ocean atlas 2009. Technical report, US Government Printing Office.

  • Carr, D. B., 1991. Looking at large data sets using binned data plots. In Buja, A. and Tukey, P. A., editors, Computing and Graphics in Statistics, pages 7–39. Springer-Verlag New York, Inc., New York, NY, USA.

  • Chambers, J. M., 2008. Software for data analysis: programming with R. Statistics and computing. Springer-Verlag, New York, NY, USA.

  • Chambers, J. M. and Hastie, T. J., 1992. Statistical models in S. Wadsworth & Brooks/Cole, Pacific Grove, CA, USA.

  • Clarke, A. J. and Van Gorder, S., 2012. On fitting a straight line to data when the “noise” in both variables is unknown. Journal of Atmospheric and Oceanic Technology, 30(1):151–158.

  • Cleveland, W. S. and McGill, R., 1984. Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association, 79(387):531–554.

  • Dalgaard, P., 2002. Introductory Statistics with R. Statistics and Computing. Springer, New York, NY, USA.

  • De Veaux, R. D., Velleman, P. R., and Bock, D. E., 2006. Intro Stats. Pearson Addison Wesley, Boston, MA, USA, 2nd edition.

  • deYoung, B., Barange, M., Beaugrand, G., Harris, R., Perry, R. I., Scheffer, M., and Werner, F., 2008. Regime shifts in marine ecosystems: detection, prediction and management. Trends in Ecology & Evolution, 23(7):402–409.

  • Estivill-Castro, V., 2002. Why so many clustering algorithms—a position paper. ACM SIGKDD Explorations Newsletter, 4(1):65–75.

  • Faraway, J. J., 2002. Practical regression and ANOVA using R. The Comprehensive R Archive Network (online).

  • Faraway, J. J., 2005. Linear models with R. Texts in statistical science. Chapman & Hall/CRC, Boca Raton, FL, USA.

  • Gallant, A. R., 1975. Nonlinear regression. The American Statistician, 29(2):73–81.

  • Garratt, J. R., 1977. Review of drag coefficients over oceans and continents. Monthly Weather Review, 105:915–927.

  • Gentleman, R. and Ihaka, R., 2000. Lexical scope and statistical computing. Journal of Computational and Graphical Statistics, 9(3):491–508.

  • Grant, H. L., Stewart, R. W., and Moilliet, A., 1962. Turbulence spectra from a tidal channel. Journal of Fluid Mechanics, 12(2):241–268.

  • Grolemund, G. and Wickham, H., 2011. Dates and times made easy with lubridate. Journal of Statistical Software, 40(3):1–25.

  • Hansen, J., Ruedy, R., Sato, M., and Lo, K., 2010. Global surface temperature change. Reviews of Geophysics, 48(4):RG4004.

  • Hartigan, J. A. and Wong, M. A., 1979. A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1):100–108.

  • Hauer, E., 2004. The harm done by tests of significance. Accident Analysis and Prevention, 36:495–500.

  • Horton, N. J. and Kleinman, K. P., 2007. Much ado about nothing: a comparison of missing data methods and software to fit incomplete data regression models. The American Statistician, 61(1):79–90.

  • Hothorn, T., Hornik, K., and Zeileis, A., 2006. Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3):651–674.

  • Ihaka, R., 2003. Colour for presentation graphics. In Proceedings of the 3rd international workshop on distributed statistical computing, Technische Universität Wien, Vienna, Austria.

  • Ihaka, R. and Gentleman, R., 1996. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3):299–314.

  • Johnson, J. B. and Omland, K. S., 2004. Model selection in ecology and evolution. Trends in Ecology & Evolution, 19(2):101–108.

  • Killick, R. and Eckley, I. A., 2014. changepoint: An R package for changepoint analysis. Journal of Statistical Software, 58(3):1–19.

  • Killick, R., Haynes, K., and Eckley, I. A., 2016. changepoint: An R package for changepoint analysis.

  • Lämmel, R., 2008. Google’s MapReduce programming model–revisited. Science of Computer Programming, 70(1):1–30.

  • Legendre, P., 2014. lmodel2: Model II Regression. Comprehensive R Archive Network.

  • Legendre, P. and Legendre, L., 1998. Numerical Ecology. Developments in environmental modeling 20. Elsevier, Amsterdam, 2nd English edition.

  • Leisch, F., 2002. Sweave: Dynamic generation of statistical reports using literate data analysis. In Härdle, W. and Rönz, B., editors, Compstat 2002 — Proceedings in Computational Statistics, pages 575–580. Physica Verlag, Heidelberg. ISBN 3-7908-1517-9.

  • Lindegren, M., Dakos, V., Gröger, J. P., Gårdmark, A., Kornilovs, G., Otto, S. A., and Möllmann, C., 2012. Early detection of ecosystem regime shifts: A multiple method evaluation for management application. PLoS ONE, 7(7):e38410.

  • Marsden, R. F., 1999. A proposal for a neutral regression. Journal of Atmospheric and Oceanic Technology, 16(7):876–883.

  • McArdle, B. H., 2003. Lines, models, and errors: regression in the field. Limnology and Oceanography, 48(3):1363–1366.

  • Miller, A. J., Cayan, D. R., Barnett, T. P., Graham, N. E., and Oberhuber, J. M., 1994. The 1976–77 climate shift of the Pacific Ocean. Oceanography, 7(1):21–26.

  • Muggeo, V. M. R., 2008. segmented: An R package to fit regression models with broken-line relationships. R News, 8(1):20–25.

  • Murrell, P., 2006. R Graphics. Chapman & Hall/CRC, Boca Raton, FL, USA.

  • R Core Team, 2017. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

  • Ricker, W. E., 1973. Linear regressions in fishery research. Journal of the Fisheries Research Board of Canada, 30:409–434.

  • Ripley, B. D., 1996. Pattern recognition and neural networks. Cambridge University Press, Cambridge, UK.

  • Ripley, B. D. and Hornik, K., 2001. Date-time classes. R News, 1(2):8–11.

  • Rudnick, D. L. and Davis, R. E., 2003. Red noise and regime shifts. Deep Sea Research Part I: Oceanographic Research Papers, 50(6):691–699.

  • Shumway, R. H. and Stoffer, D. S., 2006. Time Series Analysis and its Applications: With R Examples. Springer-Verlag, New York, 2nd edition.

  • Taylor, B. N. and Kuyatt, C. E., 1994. Guidelines for evaluating and expressing the uncertainty of NIST measurement results. NIST Technical Note 1297, U.S. Department of Commerce Technology Administration: National Institute of Standards and Technology, Gaithersburg, MD, USA.

  • Tukey, J. W., 1977. Exploratory Data Analysis. Addison-Wesley, Reading, MA, USA.

  • Venables, W. N. and Ripley, B. D., 1999. Modern applied statistics with S-plus. Springer-Verlag, New York, NY, USA, 3rd edition.

  • Warton, D. I., Duursma, R. A., Falster, D. S., and Taskinen, S., 2012. smatr 3–an R package for estimation and inference about allometric lines. Methods in Ecology and Evolution, 3:257–259.

  • Warton, D. I., Wright, I. J., Falster, D. S., and Westoby, M., 2006. Bivariate line-fitting methods for allometry. Biological Reviews, 81(2):259–291.

  • Wasserstein, R. L. and Lazar, N. A., 2016. The ASA’s statement on p-values: Context, process, and purpose. The American Statistician, 70(2):129–133.

  • Wessel, P., Smith, W. H. F., Scharroo, R., Luis, J. F., and Wobbe, F., 2013. Generic mapping tools: improved version released. Transactions, American Geophysical Union, 94:409–410.

  • Wickham, H., 2007. Reshaping data with the reshape package. Journal of Statistical Software, 21(12):1–20.

  • Wickham, H., 2009. ggplot2: elegant graphics for data analysis. Springer, New York, USA.

  • Wickham, H., 2011. The split-apply-combine strategy for data analysis. Journal of Statistical Software, 40(1):1–29.

  • Wickham, H., 2014. Advanced R. The R Series. Chapman and Hall/CRC.

  • Wood, S. N., 2001. mgcv: GAMs and generalized ridge regression for R. R News, 1(2):20–25.

  • Zeileis, A., Hornik, K., and Murrell, P., 2009. Escaping RGBland: Selecting colors for statistical graphics. Computational Statistics and Data Analysis, 53(9):3259–3270.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this chapter

Cite this chapter

Kelley, D.E. (2018). R Tutorial for Oceanographers. In: Oceanographic Analysis with R. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-8844-0_2
