06/30/2014 @ useR! 2014

About me

  • http://yihui.name
  • first language: Chinese
  • second language: R (10 years, authored a few R packages)
  • third language: English
  • graduated from Iowa State U
  • working for RStudio

Abstract

This is an intermediate-level tutorial on dynamic reporting with R and the knitr package (http://yihui.name/knitr). It covers the basic idea of literate programming as well as its role in reproducible research.

A variety of document formats will be introduced, including R LaTeX (.Rnw) and R Markdown (.Rmd). We will show useful features of knitr, such as creating tables and plots from data, caching, and cross references.

We will also provide examples of advanced features such as chunk hooks and calling foreign languages (shell scripts, Python, C++, Julia, …).

Goals

  • learn the basic idea of literate programming and apply it to data analysis using dynamic documents
  • know the basics about knitr
  • understand the meanings of common chunk options
  • learn how to extend knitr using chunk hooks and other languages
  • create projects and build (web) applications based on knitr

Who cares…

Learning new tools

Overview and Introduction

I know you click, click, copy and paste

But imagine you hear these words after you have finished a project

Please do that again! (sorry we made a mistake in the data, want to change a parameter, and yada yada)

Basic ideas of dynamic documents

  • code + narratives = report
  • i.e. computing languages + authoring languages

    We built a linear regression model.
    
    ```{r}
    fit <- lm(dist ~ speed, data = cars)
    b   <- coef(fit)
    plot(fit)
    ```
    
    The slope of the regression is `r b[2]`.
  • an example
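
A quick check of the model in this example, as a minimal sketch in plain R: coef() returns the intercept first, so the slope of dist on speed is the second element, which can also be selected by name. The variable name `slope` is only for illustration.

```r
fit <- lm(dist ~ speed, data = cars)
b <- coef(fit)

names(b)               # "(Intercept)" "speed"
slope <- b[["speed"]]  # the same value as b[[2]]
round(slope, 2)        # 3.93
```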

Automation! Automation! Automation!

Tools

  • WEB (Donald Knuth, Literate Programming)
    • Pascal + LaTeX
  • Noweb (Norman Ramsey)
  • Sweave (Friedrich Leisch and R-core)
    • R + LaTeX
    • extensible, in the sense that you copy 700 lines of code to extend 3 lines
    • cacheSweave, pgfSweave, weaver, …
  • knitr (me and contributors)
  • odfWeave (Max Kuhn, OpenOffice)
  • rapport/pander (Aleksandar Blagotić and Gergely Daróczi, Markdown/Pandoc)
  • slidify (Ramnath Vaidyanathan, knitr/R Markdown)

Tools (cont'd)

  • Org-mode (Emacs)
  • SASweave
  • Python tools
    • IPython
    • Dexy
    • PythonTeX

knitr

  • an R package (install.packages('knitr'))
  • document formats
    • .Rnw (R + LaTeX)
    • .Rmd (R + Markdown)
    • any computing language + any authoring language (in theory)
  • editors
    • RStudio
    • LyX (example later)
  • resources

The knitr book

As an R package

    if (!require("knitr")) install.packages("knitr")
    library(knitr)
    knit("your-document.Rmd")  # compiles a document
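
To try knit() without creating a file, the document can also be passed as a character vector through the text argument. The sketch below assumes knitr is installed; the `fence` variable is just a helper for this example, used to build the three-backtick chunk delimiters.

```r
library(knitr)

fence <- strrep("`", 3)  # the three-backtick chunk delimiter

src <- c(
  "We built a linear regression model.",
  "",
  paste0(fence, "{r}"),
  "fit <- lm(dist ~ speed, data = cars)",
  "b <- coef(fit)",
  fence,
  "",
  "The slope is `r round(b[2], 2)`."
)

# With text = ..., knit() returns the compiled Markdown instead of writing a file
out <- knit(text = src, quiet = TRUE)
cat(out)
```

The inline expression in the last line is replaced by its value in the output.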

Let's try some examples!

Minimal examples

The spin() function

  • give me an R script, and I give you a report
  • internally converted to R Markdown/LaTeX/…

    #' today I built a model
    fit = lm(dist ~ speed, data = cars)
    
    #' and I got the slope `r coef(fit)[2]`
    
    #+ dist-speed, fig.width=5, fig.height=4
    plot(cars)
    abline(fit)
  • demo: knitr::spin('05-knitr-spin.R')
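
To look at the intermediate R Markdown that spin() generates, one option is to call it with knit = FALSE. This is a sketch that assumes knitr is installed and writes a throwaway script to a temporary file; the script content mirrors the example above.

```r
library(knitr)

# Write a small script to a temporary file
script <- tempfile(fileext = ".R")
writeLines(c(
  "#' today I built a model",
  "fit = lm(dist ~ speed, data = cars)",
  "",
  "#+ dist-speed, fig.width=5, fig.height=4",
  "plot(cars)",
  "abline(fit)"
), script)

# Convert only; the .Rmd file is written next to the script
spin(script, knit = FALSE)
rmd <- sub("\\.R$", ".Rmd", script)
rmd_lines <- readLines(rmd)
cat(rmd_lines, sep = "\n")
```

Lines starting with #' become text, and the #+ line becomes the chunk header.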

The stitch() function

  • insert an R script in a predefined template
  • LaTeX, HTML or Markdown
  • demo: knitr::stitch('06-stitch-test.R')

Features

Be patient!

Text output

  • echo: TRUE/FALSE, c(i1, i2, …), -i3
  • results: markup, hide, hold, asis
  • collapse: TRUE/FALSE
  • warning, error, message
  • strip.white
  • include
  • demo: 07-test.Rmd
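
A small sketch of two of these options, assuming knitr is installed (the chunk labels and the `fence` helper are made up for this example): echo=FALSE hides the source but keeps the output, while results='hide' keeps the source but drops the output.

```r
library(knitr)

fence <- strrep("`", 3)  # the three-backtick chunk delimiter

src <- c(
  paste0(fence, "{r quiet, echo=FALSE}"),
  "1 + 1",
  fence,
  "",
  paste0(fence, "{r silent, results='hide'}"),
  "2 + 2",
  fence
)

out <- knit(text = src, quiet = TRUE)
cat(out)
```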

Graphics

  • dev, dev.args, fig.ext
    • PDF, PNG, …
    • tikz
  • fig.width, fig.height
  • out.width, out.height, fig.retina
  • fig.cap
  • fig.path
  • fig.keep
  • fig.show
  • demo: 08-graphics.Rmd
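
The following sketch shows how dev, fig.width, fig.height and fig.path fit together, assuming knitr is installed. The temporary figure directory, the chunk label `scatter` and the `fence` helper are choices made for this example only.

```r
library(knitr)

fence <- strrep("`", 3)  # the three-backtick chunk delimiter

# Figures go to a temporary directory; fig.path is used as a file name prefix
figdir <- tempfile("figs")
dir.create(figdir)
fig_path <- paste0(normalizePath(figdir, winslash = "/"), "/")

src <- c(
  paste0(fence, "{r scatter, dev='pdf', fig.width=5, fig.height=4, fig.path='",
         fig_path, "'}"),
  "plot(cars)",
  fence
)

out <- knit(text = src, quiet = TRUE)
figs <- list.files(figdir)
figs  # a PDF named after the chunk label
```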

Caching

  • basic idea: use cache if source is not modified
  • implementation: digest (a hash of the chunk's code and options)
  • demo: 09-cache.Rmd
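
The basic idea can be sketched in a few lines of plain R. This is not how knitr implements its cache: the function `run_cached()` is hypothetical, and it uses the source text itself as the lookup key where a real implementation would use a digest.

```r
cache <- new.env()
n_evals <- 0  # counts how often code is actually evaluated

run_cached <- function(code) {
  key <- paste(code, collapse = "\n")  # stand-in for a hash of the source
  if (!exists(key, envir = cache, inherits = FALSE)) {
    n_evals <<- n_evals + 1
    assign(key, eval(parse(text = code)), envir = cache)
  }
  get(key, envir = cache)
}

first  <- run_cached("mean(cars$speed)")  # evaluated
second <- run_cached("mean(cars$speed)")  # unchanged source: served from the cache
third  <- run_cached("mean(cars$dist)")   # modified source: evaluated again
n_evals
```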

Chunk references

  • embed chunks with <<label>>
  • reuse whole chunks using the same label
  • use ref.label
  • demo: 10-references.Rmd
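
A minimal sketch of ref.label, assuming knitr is installed (chunk labels and the `fence` helper are invented for this example): the first chunk shows only its result, and a second, empty chunk reuses the same code to display the source without running it again.

```r
library(knitr)

fence <- strrep("`", 3)  # the three-backtick chunk delimiter

src <- c(
  paste0(fence, "{r calc, echo=FALSE}"),
  "x <- 6 * 7",
  "x",
  fence,
  "",
  "The code behind the result:",
  "",
  paste0(fence, "{r calc-code, ref.label='calc', eval=FALSE}"),
  fence
)

out <- knit(text = src, quiet = TRUE)
cat(out)
```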

Child documents

  • the chunk option child
  • the function knit_child()
  • demo: 11-main.Rmd
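
One way to try the child option, as a sketch that assumes knitr is installed: the child document is written to a temporary file and pulled into a parent document through an empty chunk. File names, labels and the `fence` helper are hypothetical.

```r
library(knitr)

fence <- strrep("`", 3)  # the three-backtick chunk delimiter

# The child document lives in its own file
child <- tempfile(fileext = ".Rmd")
writeLines(c(
  "This paragraph comes from the child document.",
  "",
  paste0(fence, "{r child-sum}"),
  "sum(1:10)",
  fence
), child)
child_path <- normalizePath(child, winslash = "/")

# The parent includes it via the child option of an empty chunk
parent <- c(
  "Parent document.",
  "",
  paste0(fence, "{r, child='", child_path, "'}"),
  fence
)

out <- knit(text = parent, quiet = TRUE)
cat(out)
```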

Chunk hooks

  • additional tasks attached to chunk options

    knit_hooks$set(opt_name = function(before, options) {
        if (before) {
            do_something()       # runs before the chunk is evaluated
        } else {
            do_something_else()  # runs after the chunk is evaluated
        }
    })
  • set the chunk option opt_name to a non-NULL value to activate the chunk hook
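
A concrete hook might look like the sketch below, which assumes knitr is installed. The hook name `banner`, the function `banner_hook()` and the `fence` helper are made up for illustration. Character values returned by a hook are written into the output, so the hook wraps the chunk in two lines of text.

```r
library(knitr)

fence <- strrep("`", 3)  # the three-backtick chunk delimiter

# A hook that returns text before and after the chunk
banner_hook <- function(before, options) {
  if (before) {
    paste0("Start of chunk ", options$label, "\n\n")
  } else {
    paste0("\nEnd of chunk ", options$label, "\n")
  }
}

src <- c(
  paste0(fence, "{r setup, include=FALSE}"),
  "knitr::knit_hooks$set(banner = banner_hook)",
  fence,
  "",
  paste0(fence, "{r demo, banner=TRUE}"),
  "1 + 1",
  fence
)

out <- knit(text = src, quiet = TRUE)
cat(out)
```

The setup chunk does not set the banner option, so the hook runs only for the demo chunk.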

Foreign language engines

  • the chunk option engine
  • shell scripts
  • Python
  • Julia (experimental)

    names(knitr::knit_engines$get())
    ##  [1] "awk"       "bash"      "coffee"    "gawk"      "haskell"  
    ##  [6] "perl"      "python"    "Rscript"   "ruby"      "sas"      
    ## [11] "scala"     "sed"       "sh"        "zsh"       "highlight"
    ## [16] "Rcpp"      "tikz"      "dot"       "c"         "fortran"  
    ## [21] "asy"       "cat"       "asis"
  • the runr package (experimental)
  • demo: 12-python.Rmd, 14-julia.Rmd
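
As a small sketch (assuming knitr is installed), the engine option can be tried with a shell chunk. The example first checks that the engine is registered and only knits if an sh executable is found on the system; the chunk label `greet` and the `fence` helper are hypothetical.

```r
library(knitr)

fence <- strrep("`", 3)  # the three-backtick chunk delimiter

# Engines known to this version of knitr
engines <- names(knit_engines$get())
c("python", "sh") %in% engines

# Run a shell chunk only when sh is available
if (nzchar(Sys.which("sh"))) {
  src <- c(
    paste0(fence, "{r greet, engine='sh'}"),
    "echo hello from sh",
    fence
  )
  cat(knit(text = src, quiet = TRUE))
}
```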

Applications

Websites and blogs

R Package vignettes

Docco style (knitr's Gangnam style)

    # see list of vignettes
    help(package = "knitr", help_type = "html")
    # Docco Classic Style

The knitr book

Learn what other cool kids are doing

Miscellaneous

Reproducible research

Repeat: not easy

packrat

  • Reproducible package management for R
  • J.J. Allaire, Packrat - A Dependency Management System for R, Session 5 Wednesday, 16:00, Kaleidoscope

Markdown or LaTeX?

  • LaTeX: precise control, full complexity, horrible readability

    \section{Introduction}
    
    We did a \emph{cool} study,
    and our main findings:
    
    \begin{enumerate}
    \item You can never remember
      how to escape backslashes.
    \item A dollar sign is \$,
      an ampersand \&, and
      a \textbackslash{}.
    \item How about ~? Use $\sim$.
    \end{enumerate}

  • Markdown: simple, simple, simple

    # Introduction
    
    We did a _cool_ study, and
    our main findings:
    
    1. You do not need to remember
      a lot of rules.
    2. A dollar sign is $,
      an ampersand is &, and
      a backslash \.
    3. A tilde is ~.
    
    Write content instead of
    markup languages.

Or a graphical comparison

LaTeX

Markdown

RPubs

  • http://rpubs.com
  • forget about reproducible research, because you are doing it unconsciously

Is Markdown too simple?

probably, but the real question is

How much do you want?

do you really want that word to be in \textbf{\textsf{}}?

To print or not to print, that is the question

  • LaTeX is for printing
  • HTML is not as powerful in terms of typesetting (not bad either!), but is excellent for interaction
  • examples

R Markdown v2

  • the power of Pandoc and its Markdown extensions
  • Jeff Allen, The Next Generation of R Markdown, Session 6 Thursday, 10:00, Kaleidoscope

The goal

You have done the hard work of research, data collection, and analysis, etc. We hope the last step can be easier.

Open source, open science, open mind