New Ways to Think about Documents
This chapter explores ways of thinking about documents that differ from the traditional write–format–publish cycle. We treat documents in some of the same ways we treat software. We want to verify programmatically that a document is “correct,” i.e., unit testing for documents. We want modular components that we can reuse across documents and contexts. We want to express relationships between elements of a document that map to our actual workflow, e.g., tasks and alternative approaches, not just sections, subsections, and code chunks. We go beyond dynamic documents and report generation and try to capture the actual research process, concepts, and workflow, not just their output. This requires semantic markup in the document. While we discuss XML, many of the ideas apply to any markup format that we can query and modify programmatically with convenient tools. Some aspects, however, exploit the richer hierarchical structure of XML, as opposed to the linear format of most document markup systems. The goal of this chapter is not to convert people to the system we use to author documents, although it is available and quite powerful. Instead, we want to encourage people to demand more of the tools they use to author documents and to think about authoring documents in new, richer ways.
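As a rough illustration of what "unit testing for documents" might look like, the following sketch parses a small XML document and checks a few structural properties with path queries. It is not the authors' system: the element and attribute names (`task`, `altApproach`, `xref`, `linkend`, and so on), the sample document, and the function `check_document` are all assumptions made up for this example, and Python's standard `xml.etree.ElementTree` module stands in for a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every element and attribute name here is invented
# for illustration and does not come from the chapter's own markup.
DOC = """
<article>
  <section id="intro"><title>Introduction</title>
    <para>See <xref linkend="fit"/> for the model.</para>
  </section>
  <section id="analysis"><title>Analysis</title>
    <task id="fit">
      <code language="r">fit = lm(y ~ x, data)</code>
      <altApproach><code language="r">fit = rlm(y ~ x, data)</code></altApproach>
    </task>
    <para>Compare with <xref linkend="diagnostics"/>.</para>
  </section>
</article>
"""


def check_document(root):
    """Return a list of human-readable problems found in the document."""
    problems = []

    # Collect every id declared anywhere in the tree.
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}

    # Each cross-reference must point at a declared id.
    for xref in root.iter("xref"):
        target = xref.get("linkend")
        if target not in ids:
            problems.append(f"dangling xref: {target}")

    # Each section must carry a non-empty title.
    for section in root.findall(".//section"):
        title = section.find("title")
        if title is None or not (title.text or "").strip():
            problems.append(f"section without title: {section.get('id')}")

    # Each code chunk must contain some code.
    for code in root.findall(".//code"):
        if not (code.text or "").strip():
            problems.append("empty code chunk")

    return problems


root = ET.fromstring(DOC)
problems = check_document(root)
print(problems)
```

Because the markup is hierarchical, the same tree can also answer workflow questions, such as which tasks record an alternative approach (`root.findall(".//task/altApproach")`), which a purely linear markup format would make harder to express.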
Keywords: Regular Expression, XPath Query, XPath Expression, Reproducible Research, Actual Research Process