Tutorial 1: Getting Started

DPI R Bootcamp

Jared Knowles

Overview

R

What Does it Look Like?

The R workspace in RStudio

A Bit of HistoRy

The Philosophy

John Chambers, describing the logic behind the S language, said:

[W]e wanted users to be able to begin in an interactive environment, where they did not consciously think of themselves as programming. Then as their needs became clearer and their sophistication increased, they should be able to slide gradually into programming, when the language and system aspects would become more important.

R is Born

Why Use R

Thoughts on Free

  1. R can be run and used for any purpose, commercial or non-commercial, profit or not-for-profit.
  2. R's source code is freely available, so you can study how it works and adapt it to your needs.
  3. R is free to redistribute, so you can share it with your friends (or your enemies).
  4. R is free to modify, and those modifications are free to redistribute and may be adopted by the rest of the community!

R Advantages Continued

R Can Complement Other Tools

R's Popularity

R has recently passed Stata in Google Scholar hits and is catching up to the two major players, SPSS and SAS.

R Has an Active Web Presence

R is linked to from more and more sites

R Extensions

These links come from the explosion of add-on packages to R

R Has an Active Community

Usage of the R listserv for help has really exploded recently

Data from Bob Muenchen available online

R's Drawbacks

R Vocabulary

Components of an R Setup

Advanced R Setup

Open Source Toolchain

Some Notes about Maintaining R

Self-help

Self-help (2)

foo <- c(1, "b", 5, 7, 0)
bar <- c(1, 2, 3, 4, 5)
foo + bar
Error: non-numeric argument to binary operator
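
The error arises because c() stores a single type per vector: mixing the string "b" with numbers turns every element of foo into a character string. A minimal sketch of how one might diagnose this in base R (the names msg, foo_num, and total are illustrative and not part of the original slides):

```r
foo <- c(1, "b", 5, 7, 0)
bar <- c(1, 2, 3, 4, 5)

# c() coerced every element to character because of "b"
class(foo)

# Capture the error message instead of stopping the script
msg <- tryCatch(foo + bar, error = function(e) conditionMessage(e))
msg

# Convert back to numbers; "b" cannot be converted and becomes NA
foo_num <- suppressWarnings(as.numeric(foo))
total <- foo_num + bar
total
```

Checking class() on an object is often the quickest first step when an arithmetic operation fails unexpectedly.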

Let's Look at RStudio

The Data Frame

data(mtcars)
mtcars
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
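
Printing the whole data frame works for small data, but a few base R functions give a quicker overview. The sketch below uses only the built-in mtcars data; the object names first_rows, mpg, and four_cyl are made up for illustration.

```r
data(mtcars)

dim(mtcars)                     # number of rows and columns
names(mtcars)                   # column names
first_rows <- head(mtcars, 8)   # the eight rows shown above
str(mtcars)                     # type of each column

# Pull one column as a vector, then keep only rows meeting a condition
mpg <- mtcars$mpg
four_cyl <- mtcars[mtcars$cyl == 4, ]
nrow(four_cyl)
```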

A Data Frame

R As A Calculator

2 + 2  # add numbers
[1] 4
2 * pi  #multiply by a constant
[1] 6.283
7 + runif(1, min = 0, max = 1)  #add a random variable
[1] 7.375
4^4  # powers
[1] 256
sqrt(4^4)  # functions
[1] 16
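
Because runif() draws a new random number on every call, the 7.375 above will differ from run to run. One way to make such a result repeatable, sketched here with an arbitrary seed value of 42 and illustrative names first and second, is to call set.seed() before drawing:

```r
set.seed(42)                               # any integer works; 42 is arbitrary
first <- 7 + runif(1, min = 0, max = 1)

set.seed(42)                               # reset to the same starting point
second <- 7 + runif(1, min = 0, max = 1)

identical(first, second)                   # same seed, same draw
```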

Arithmetic Operators

2 + 2
[1] 4
2/2
[1] 1
2 * 2
[1] 4
2^2
[1] 4
2 == 2
[1] TRUE
23%/%2
[1] 11
23%%2
[1] 1
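
The last two operators are less familiar: %/% is integer division and %% is the remainder (modulo). A short sketch of how they fit together, using the illustrative names n, d, quotient, and remainder:

```r
n <- 23
d <- 2

quotient  <- n %/% d    # integer division
remainder <- n %% d     # what is left over

# The two pieces always rebuild the original number
quotient * d + remainder == n

# Operators are vectorized: test which of 1 to 10 are even
(1:10) %% 2 == 0
```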

Other Key Symbols

foo <- 3
foo
[1] 3
1:10
 [1]  1  2  3  4  5  6  7  8  9 10
# it increments by one
a <- 100:120
a
 [1] 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116
[18] 117 118 119 120
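
The colon operator always steps by one. For other increments, base R's seq() is one option; the sketch below uses illustrative object names.

```r
by_one  <- 1:10                         # colon operator, step of one
by_two  <- seq(1, 10, by = 2)           # step of two
five_pt <- seq(0, 1, length.out = 5)    # five evenly spaced values
rev_seq <- 10:1                         # the colon also counts down

by_two
five_pt
```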

Comments in R

# Something I want to keep from R
# Like my secret from the R engine
# Maybe intended for a human and not the computer
# Like: Look at this cool plot!

myplot(readSS,mathSS,data=df)

R Advanced Math

Using the Workspace

Using the Workspace (2)

x <- 5  #store a variable with <-
x  #print the variable
[1] 5
z <- 3
ls()  #list all variables
[1] "x" "z"
ls.str()  #list and describe variables
x :  num 5
z :  num 3
rm(x)  # delete a variable
ls()
[1] "z"

R as a Language

  1. Case sensitivity matters
a <- 3
A <- 4
print(c(a, A))
[1] 3 4
  2. What happens if I type print(a,A)?

c is our friend

A <- c(3, 4)
print(A)
[1] 3 4
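
c() combines its arguments into a single vector, which is why print() can show both values at once. As a sketch of the contrast (both and out are illustrative names), passing two separate arguments to print() behaves differently, because for a number the second positional argument is matched to digits:

```r
a <- 3
A <- 4

both <- c(a, A)     # one vector holding both values
length(both)

# print(a, A) treats A as the `digits` setting, so only a is shown
out <- capture.output(print(a, A))
out
```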

Language

a <- runif(100)  # Generate 100 random numbers
b <- runif(100)  # 100 more
c <- NULL  # Setup for loop (declare variables)
for (i in 1:100) {
    # Loop just like in Java or C
    c[i] <- a[i] * b[i]
}
d <- a * b
identical(c, d)  # Test equality
[1] TRUE
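
The loop and the vectorized line give the same answer, but the vectorized form is shorter and usually faster. If a loop is needed, a common habit is to preallocate the result rather than grow it from NULL. A minimal sketch, with an arbitrary seed and the illustrative names prod_loop and prod_vec (chosen to avoid reusing the name c):

```r
set.seed(1)                        # arbitrary seed, for reproducibility
a <- runif(100)
b <- runif(100)

prod_loop <- numeric(length(a))    # preallocate a vector of zeros
for (i in seq_along(a)) {
  prod_loop[i] <- a[i] * b[i]
}

prod_vec <- a * b                  # vectorized form, no loop needed
identical(prod_loop, prod_vec)
```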

More Language Features (or Bugs)

Objects

summary(df[, 28:31])  #summary look at df object
   schoollow         readSS        mathSS           proflvl    
 Min.   :0.000   Min.   :252   Min.   :210   advanced   : 788  
 1st Qu.:0.000   1st Qu.:430   1st Qu.:418   basic      : 523  
 Median :0.000   Median :495   Median :480   below basic: 210  
 Mean   :0.242   Mean   :496   Mean   :483   proficient :1179  
 3rd Qu.:0.000   3rd Qu.:562   3rd Qu.:543                     
 Max.   :1.000   Max.   :833   Max.   :828                     
summary(df$readSS)  #summary of a single column
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    252     430     495     496     562     833 

The $ operator tells R to look for the object readSS inside the object df.
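
The same column can be reached in more than one way. The sketch below uses the built-in mtcars data as a stand-in, since the bootcamp's df is not defined in this section; the object names are illustrative.

```r
data(mtcars)                      # built-in data, standing in for df

by_dollar  <- mtcars$mpg          # look up column mpg inside mtcars
by_bracket <- mtcars[["mpg"]]     # same column, name given as a string
by_index   <- mtcars[, "mpg"]     # same column via row/column indexing

identical(by_dollar, by_bracket)
identical(by_dollar, by_index)
summary(by_dollar)
```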

Graphics too

library(ggplot2)  # Load graphics package
library(eeptools)  # Provides theme_dpi()
qplot(readSS, mathSS, data = df, geom = 'point', alpha = I(0.3)) + theme_dpi() +
  labs(title = 'Test Score Relationship') +  # labs() replaces the defunct opts()
  geom_smooth()