31 October 2017

1. Matrices, Data.Frames, Arrays and Tables

1.1 Matrix

Syntax

matrix(data=.., nrow=.., ncol=.., byrow=..)

Example

my_matrix <- matrix(data=1:20, nrow=5, ncol=4, byrow=T)
my_matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]    5    6    7    8
## [3,]    9   10   11   12
## [4,]   13   14   15   16
## [5,]   17   18   19   20

1.1 Matrix (Select)

Selecting rows, columns or specific values from a matrix:

my_matrix[1,]      # Select the first row.
## [1] 1 2 3 4
my_matrix[,3]      # Select the third column.
## [1]  3  7 11 15 19
my_matrix[1,3]     # Select the value that is on the first row and third column.
## [1] 3
my_matrix[4:3,2:3] # Any combination will do. Even changing the order works!
##      [,1] [,2]
## [1,]   14   15
## [2,]   10   11
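
A further selection sketch, not taken from the original slides: negative indices drop rows or columns, and drop=FALSE keeps a single row as a matrix instead of collapsing it to a vector. The matrix is rebuilt here so the block runs on its own; the names no_first_row, row_as_vector and row_as_matrix are only illustrative.

```r
# Rebuild the 5x4 example matrix so this block is self-contained.
my_matrix <- matrix(data=1:20, nrow=5, ncol=4, byrow=TRUE)

no_first_row  <- my_matrix[-1, ]              # Negative index: everything except row 1.
row_as_vector <- my_matrix[1, ]               # A single row collapses to a plain vector...
row_as_matrix <- my_matrix[1, , drop=FALSE]   # ...unless drop=FALSE keeps it a 1x4 matrix.

dim(no_first_row)    # 4 4
dim(row_as_vector)   # NULL
dim(row_as_matrix)   # 1 4
```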

1.2 Data.frames

Syntax

data.frame(Col1=.., Col2=.., Col3=.., Coln=..)

Example

my_frame <- data.frame(months=factor(x=month.abb,levels=month.abb,ordered=T), 
                       Temperature=rnorm(n=12, mean=15, sd=5), 
                       Precipitation=rnorm(n=12, mean=300, sd=100))
my_frame
##    months Temperature Precipitation
## 1     Jan   20.960114    248.648934
## 2     Feb   18.573364    170.906698
## 3     Mar   13.466823    148.408431
## 4     Apr   22.303799    176.502949
## 5     May   14.264720    281.423981
## 6     Jun   10.712264    292.135588
## 7     Jul    8.155556      5.243982
## 8     Aug   14.257900    380.249401
## 9     Sep   21.628892    293.033401
## 10    Oct   11.228825    393.139266
## 11    Nov   22.831847    465.816874
## 12    Dec   15.577507    333.001194

1.2 Data.frames (Select)

Columns of a data.frame can be selected with [] or $.

my_frame[,2]         # Selecting the second column using []
##  [1] 20.960114 18.573364 13.466823 22.303799 14.264720 10.712264  8.155556
##  [8] 14.257900 21.628892 11.228825 22.831847 15.577507
names(my_frame)
## [1] "months"        "Temperature"   "Precipitation"
my_frame$Temperature # Selecting the second column using $
##  [1] 20.960114 18.573364 13.466823 22.303799 14.264720 10.712264  8.155556
##  [8] 14.257900 21.628892 11.228825 22.831847 15.577507
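
A few more ways to select from a data.frame, as a minimal sketch. The frame demo_frame and the seed are assumptions made only so the block is repeatable and self-contained; they are not part of the original example.

```r
set.seed(1)  # Assumed seed, only to make this sketch repeatable.
demo_frame <- data.frame(months=month.abb,
                         Temperature=rnorm(n=12, mean=15, sd=5),
                         Precipitation=rnorm(n=12, mean=300, sd=100))

temp_by_name <- demo_frame[, "Temperature"]               # Same column, selected by name.
temp_by_list <- demo_frame[["Temperature"]]               # [[ ]] also returns the column as a vector.
two_columns  <- demo_frame[, c("months", "Temperature")]  # Several columns stay a data.frame.
first_rows   <- demo_frame[1:3, ]                         # Rows are selected before the comma.
```

Selecting by name is usually safer than selecting by position, because it keeps working when columns are reordered.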

1.3 Arrays

Syntax

array(data=...,dim=..., dimnames=...)

Example

my_array <- array(data=rnorm(3*5*2), dim=c(3,5,2))
my_array
## , , 1
## 
##            [,1]      [,2]      [,3]      [,4]      [,5]
## [1,]  0.9628125 0.9174549 0.8079921  1.599454 0.2388015
## [2,] -0.6624656 0.4226749 0.3820714 -1.216029 0.8143577
## [3,]  0.3649159 1.5566189 0.3993461 -1.304832 0.1618946
## 
## , , 2
## 
##              [,1]       [,2]       [,3]       [,4]       [,5]
## [1,] -0.002752986  2.1382743  0.2575696 -0.9788402  0.7800519
## [2,]  0.318456376 -0.6697899 -0.1644174 -0.4206419 -1.2127583
## [3,] -0.176875791 -1.8715617 -0.9363993  1.4093132  0.5750295

1.3 Arrays (Select)

Arrays can be selected with []. Just provide an index for each dimension; a position left empty selects everything along that dimension. For example:

my_array[,,1]
##            [,1]      [,2]      [,3]      [,4]      [,5]
## [1,]  0.9628125 0.9174549 0.8079921  1.599454 0.2388015
## [2,] -0.6624656 0.4226749 0.3820714 -1.216029 0.8143577
## [3,]  0.3649159 1.5566189 0.3993461 -1.304832 0.1618946
my_array[2,,2]
## [1]  0.3184564 -0.6697899 -0.1644174 -0.4206419 -1.2127583
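
The syntax above lists a dimnames argument that the example does not use. The following sketch shows one way it could be used; the array named_array and its labels are hypothetical, and 1:30 replaces rnorm() so that the values are easy to check.

```r
# Deterministic values (1:30) instead of rnorm() so the result is predictable.
named_array <- array(data=1:30, dim=c(3,5,2),
                     dimnames=list(paste0("row", 1:3),
                                   paste0("col", 1:5),
                                   c("first", "second")))

named_array["row2", , "second"]        # Same idea as my_array[2,,2], but by name: 17 20 23 26 29
named_array["row2", "col1", "first"]   # A single value: 2
```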

2. Loops and Conditional statements

2.1 Basic loops in R (1)

Syntax

  • for ( variable in sequence ) { expressions }
  • while ( condition ) { expressions }
  • repeat { expressions and break }
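
Since the following slides only use for, here is a minimal sketch of the other two forms. The counters count_while and count_repeat are illustrative names.

```r
# while: the condition is checked before every pass.
count_while <- 0
while (count_while < 3) {
  count_while <- count_while + 1
}

# repeat: there is no condition at all, so break is the only way out.
count_repeat <- 0
repeat {
  count_repeat <- count_repeat + 1
  if (count_repeat >= 3) break
}

count_while    # 3
count_repeat   # 3
```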

2.1 Basic loops in R (2)

To make things simpler let's just focus on the for loop.

Example

for (n in 1:10) { print(paste("This is the ",n," loop",sep="")) }
## [1] "This is the 1 loop"
## [1] "This is the 2 loop"
## [1] "This is the 3 loop"
## [1] "This is the 4 loop"
## [1] "This is the 5 loop"
## [1] "This is the 6 loop"
## [1] "This is the 7 loop"
## [1] "This is the 8 loop"
## [1] "This is the 9 loop"
## [1] "This is the 10 loop"
  • n is the looping variable.
  • print() Prints results to the console.
  • paste() Combines "characters" and variables

2.1 Basic loops in R (3)

This time, instead of using the : expression for looping, let's use the function seq().

Example

nall <- 5
for (n in seq(2,10,2)) {
  nall <- nall * n
  print(nall)
}
## [1] 10
## [1] 40
## [1] 240
## [1] 1920
## [1] 19200

Remember! Every variable that a loop reads before assigning to it must already have a value. That is why we set nall <- 5 at the start of our script. Try commenting out the first line with the symbol # and rerun everything from the top in a fresh R session. Read the error!
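
The error can also be captured instead of stopping the script. This is only a sketch: undefined_total is a hypothetical variable that has deliberately never been created, and tryCatch() is used here purely to show the message.

```r
loop_error <- tryCatch({
  for (n in seq(2,10,2)) {
    undefined_total <- undefined_total * n   # Read before any value was assigned.
  }
  "no error"
}, error=function(e) conditionMessage(e))

loop_error   # The message R reports for the missing object.
```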

2.1 Basic loops in R (4)

Using loops in a 2D matrix just for 1 dimension.

Example

mydata <- matrix(data=c(1:12),nrow=3,ncol=4) # Create a 3*4 matrix
mydata
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
for (i in 1:dim(mydata)[1]) {
  mydata[i,1] <- mydata[i,1]+100
}
mydata
##      [,1] [,2] [,3] [,4]
## [1,]  101    4    7   10
## [2,]  102    5    8   11
## [3,]  103    6    9   12

2.1 Basic loops in R (5)

Using loops in a 2D matrix for 2 dimensions.

Example

mydata <- matrix(data=c(1:8),nrow=2,ncol=4) # Create a 2*4 matrix
mydata                                      # Print the initial matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8
for (i in 1:dim(mydata)[1]) {               # Start the 1st loop i
  for (j in 2:3) {                          # Start the 2nd loop j
    mydata[i,j] <- mydata[i,j]+100          # Do something within the loops
  }                                         # Stop the 2nd loop j
}                                           # Stop the 1st loop i
mydata                                      # Display the altered matrix
##      [,1] [,2] [,3] [,4]
## [1,]    1  103  105    7
## [2,]    2  104  106    8
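
For simple element-wise changes like this one, R can usually do the same work without any loop. A minimal sketch of the equivalent one-line version:

```r
mydata <- matrix(data=c(1:8), nrow=2, ncol=4)   # Same 2*4 matrix as in the loop example.
mydata[, 2:3] <- mydata[, 2:3] + 100            # All rows, columns 2 and 3, in one step.
mydata
```

The loops remain useful when each step depends on the result of the previous one.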

2.2 Filters (1)

Filtering data in R can be done in one line. In most cases a filter is applied to the whole length of an object (for example, the whole length of a vector). Sometimes that is not possible, and a combination of if and for is necessary. The typical operators for filtering and conditions are the following:

  • == Checks if A and B are equal.
  • != Checks if A and B are NOT equal.
  • > Checks if A is greater than B.
  • < Checks if A is lower than B.
  • >= Checks if A is greater than OR equal to B.
  • <= Checks if A is lower than OR equal to B.

Multiple conditions can be tested in one if using the operators & and |.

  • & stands for AND. All conditions must be TRUE.
  • | stands for OR. At least one condition must be TRUE.

2.2 Filters (2)

Example

v <- seq(0,100,10)
v
##  [1]   0  10  20  30  40  50  60  70  80  90 100
v[v==60]
## [1] 60
which(v==60)
## [1] 7
v[v==60]=600 ; v
##  [1]   0  10  20  30  40  50 600  70  80  90 100
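
The other operators work in the same way. A short sketch combining them with & and | (the vector values is a fresh copy, so v is left untouched):

```r
values <- seq(0,100,10)

middle    <- values[values >= 30 & values <= 60]   # AND: both conditions hold.
edges     <- values[values < 20 | values > 80]     # OR: at least one condition holds.
not_fifty <- values[values != 50]                  # Everything except 50.

middle   # 30 40 50 60
edges    # 0 10 90 100
```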

2.2 Filters (3)

Example

mydata <- data.frame(months=factor(x=month.abb,levels=month.abb,ordered=T), 
                     Temperature=rnorm(n=12, mean=15, sd=5), 
                     Precipitation=rnorm(n=12, mean=300, sd=100))
mydata$months[mydata$months=="Jun"]
## [1] Jun
## 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
mydata[mydata$months=="Jun",]
##   months Temperature Precipitation
## 6    Jun    19.49332      177.6891
mydata[mydata$Temperature>=15 & mydata$Precipitation>=300,]
##    months Temperature Precipitation
## 10    Oct     27.2562      345.1603

2.3 Conditional statements in R (1)

Syntax

  • if ( ..conditions.. ) { ..expressions.. }

Example

When the condition is FALSE nothing happens.

if (1 == 2) {5+5}

When it is TRUE, the expression is executed.

if (1 == 1) {5+5}
## [1] 10

Try executing 1==1 and 1==2 in R, one at a time, and look at what each returns.

2.3 Conditional statements in R (2)

Example

Conditions can compare character, numeric or even logical (TRUE or FALSE) variables.

firstname <- "Robb"
surname <- "Stark"
if (firstname == "Jon") {surname <- "Snow"}
surname
## [1] "Stark"
firstname <- "Jon"
surname <- "Stark"
if (firstname == "Jon") {surname <- "Snow"}
surname
## [1] "Snow"

2.3 Conditional statements in R (3)

Example

Multiple conditions can be tested in one if using the operators & and |.

  • & stands for AND. All conditions must be TRUE.
  • | stands for OR. At least one condition must be TRUE.
illegitimacy <- "Bastard"
origin <- "North"
surname <- "?"
if (illegitimacy == "Bastard" & origin == "North") {surname <- "Snow"}
surname
## [1] "Snow"
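
The slides only show a bare if. R also has else and else if, and inside if(), where a single TRUE or FALSE is expected, the operators && and || can be used in place of & and |. A minimal sketch with a hypothetical temperature value:

```r
temperature <- 12   # Hypothetical value in degrees Celsius.

if (temperature < 0) {
  label <- "freezing"
} else if (temperature >= 0 && temperature < 20) {
  label <- "mild"
} else {
  label <- "warm"
}

label   # "mild"
```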

2.4 Loops + Conditions (1)

Example

mydata <- rnorm(10, mean=0, sd=1)
mydata
##  [1] -1.49062801  0.29118044  0.07845462 -0.51827934  0.21852300
##  [6] -1.10461050 -0.14104141 -0.75907411 -0.95328993 -2.81757128
for (n in 1:length(mydata)) {
  if (mydata[n] <= 0) {mydata[n] <- 0}
  if (mydata[n] > 0) {mydata[n] <- 1}
}
mydata
##  [1] 0 1 1 0 1 0 0 0 0 0
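
The same rule can be written with if/else inside the loop, or without a loop using ifelse(). The sketch below compares the two; the seed and the names raw, looped and vectorised are assumptions for illustration only.

```r
set.seed(42)   # Assumed seed so the sketch is repeatable.
raw <- rnorm(10, mean=0, sd=1)

looped <- raw
for (n in seq_along(looped)) {
  if (looped[n] <= 0) {
    looped[n] <- 0
  } else {
    looped[n] <- 1
  }
}

vectorised <- ifelse(raw > 0, 1, 0)   # Same rule, applied to the whole vector at once.
all(looped == vectorised)             # TRUE
```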

2.4 Loops + Conditions (2)

Example

loop_number <- 10
loop_check <- TRUE
if (loop_check) {
  for (n in 1:loop_number) {
    print(paste("We are on the loop ",n," of 10.",sep=""))
    if (n==10) {loop_number <- 12} # This will NOT change how often for() runs,
  }                                # since 1:loop_number was evaluated only once,
}                                  # when the loop started!
## [1] "We are on the loop 1 of 10."
## [1] "We are on the loop 2 of 10."
## [1] "We are on the loop 3 of 10."
## [1] "We are on the loop 4 of 10."
## [1] "We are on the loop 5 of 10."
## [1] "We are on the loop 6 of 10."
## [1] "We are on the loop 7 of 10."
## [1] "We are on the loop 8 of 10."
## [1] "We are on the loop 9 of 10."
## [1] "We are on the loop 10 of 10."
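
A smaller sketch of the same point: the sequence in for() is built once, so changing the variable inside the loop does not change the number of passes. The counter passes is an illustrative name.

```r
loop_number <- 3
passes <- 0
for (n in 1:loop_number) {
  loop_number <- 100     # Has no effect on the loop that is already running.
  passes <- passes + 1
}

passes        # 3
loop_number   # 100
```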

3. Handling the Not Available (NA) Values

3.1 Not Available in functions (1)

Use the na.rm=T argument in functions.

Example

v <- c(1:5,NA,7) # Creating a vector that includes a NA.
v
## [1]  1  2  3  4  5 NA  7
mean(v)          # Mmm R doesn't like (or overlikes) NAs.
## [1] NA
mean(v,na.rm=T)  # Thus we have to force R to ignore them!
## [1] 3.666667

3.2 Not Available in datasets

Use the is.na() and !is.na() functions.

Example

v <- c(1:5,NA,7)               # Creating a vector that includes a NA.
v[!is.na(v)]                   # Get all values that are NOT (!) NA using a filter.
## [1] 1 2 3 4 5 7
mean( v[!is.na(v)] )           # Manually removing NAs
## [1] 3.666667
v[is.na(v)] <- mean(v,na.rm=T) # Replacing all NAs with the mean value of v
v
## [1] 1.000000 2.000000 3.000000 4.000000 5.000000 3.666667 7.000000
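
The same functions work on a data.frame. A minimal sketch on hypothetical toy data (the frame demo is not part of the course material), counting the NAs per column and keeping only complete rows:

```r
demo <- data.frame(a=c(1, NA, 3, 4), b=c(10, 20, NA, 40))   # Hypothetical toy data.

na_per_column <- colSums(is.na(demo))           # How many NAs does each column have?
complete_rows <- demo[complete.cases(demo), ]   # Keep only the rows without any NA.

na_per_column         # a: 1, b: 1
nrow(complete_rows)   # 2
```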

4. Save plots

4.1 Save in png()

Syntax

png(filename=.., width=.., height=.., res=..)

plot(...)

dev.off()

Example

dir.create("MSc_Course", showWarnings=FALSE)
dpi <- 300
png(filename="MSc_Course/test.png", width=5*dpi, height=5*dpi, res=dpi)
plot(1:10,pch=21,bg="yellow")
dev.off()

Check the folder MSc_Course under your working directory. If you don't remember the path to your working directory, use the getwd() function.

4.2 Save in pdf()

Syntax

pdf(file=.., width=.., height=.., ...)

plot(...)

dev.off()

Example

dir.create("MSc_Course", showWarnings=FALSE)
pdf(file="MSc_Course/test.pdf", width=5, height=5) # pdf() sizes are in inches, not pixels
plot(1:10,pch=21,bg="yellow")
dev.off()

Check the folder MSc_Course under your working directory.
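
Unlike png(), pdf() takes its width and height in inches, so no dpi value is involved. The sketch below writes to a temporary folder instead of MSc_Course; the file name test_inches.pdf is hypothetical.

```r
pdf_path <- file.path(tempdir(), "test_inches.pdf")   # Hypothetical file name.

pdf(file=pdf_path, width=5, height=5)   # 5 x 5 inches.
plot(1:10, pch=21, bg="yellow")
invisible(dev.off())                    # Close the device so the file is completed.

file.exists(pdf_path)
```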

5. Read and write ASCII files

5.1 Write ASCII files with write.table()

Syntax

write.table(x=.., file=.., sep=.., quote=.., row.names=.., col.names=.., ...)

Example

head(my_frame)
##   months Temperature Precipitation
## 1    Jan    20.96011      248.6489
## 2    Feb    18.57336      170.9067
## 3    Mar    13.46682      148.4084
## 4    Apr    22.30380      176.5029
## 5    May    14.26472      281.4240
## 6    Jun    10.71226      292.1356
write.table(my_frame, file="MSc_Course/myframe.txt", quote=F, row.names=F, sep=";")

Check the folder MSc_Course under your working directory.

5.2 Read ASCII files with read.table()

Syntax

read.table(file=.., header=.., sep=.., fill=.., ...)

Example

read_myframe <- read.table(file="MSc_Course/myframe.txt", header=TRUE, sep=";")
head(read_myframe)
##   months Temperature Precipitation
## 1    Jan    20.96011      248.6489
## 2    Feb    18.57336      170.9067
## 3    Mar    13.46682      148.4084
## 4    Apr    22.30380      176.5029
## 5    May    14.26472      281.4240
## 6    Jun    10.71226      292.1356
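
One thing the printed output hides: a text file does not store the ordered factor, so the months column comes back as plain text or as an unordered factor, depending on the R version. The sketch below shows a round trip and one way to restore the ordering. The frame demo_frame, its values and the file name are hypothetical.

```r
demo_frame <- data.frame(months=factor(x=month.abb, levels=month.abb, ordered=TRUE),
                         Temperature=seq(5, 27, by=2))   # Hypothetical values.
txt_path <- file.path(tempdir(), "demo_frame.txt")       # Hypothetical file name.

write.table(demo_frame, file=txt_path, quote=FALSE, row.names=FALSE, sep=";")
read_back <- read.table(file=txt_path, header=TRUE, sep=";")

class(read_back$months)   # No longer an ordered factor after the round trip.
read_back$months <- factor(read_back$months, levels=month.abb, ordered=TRUE)
is.ordered(read_back$months)
```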

6. Linear Regression Model

6.1 Linear Regression with lm() (1)

Create a simple linear regression model of the form y=ax+b with lm().

Syntax

lm(formula= y~x)

lm(formula=.., data=..)

Example

my_x <- 1:length(my_frame$Temperature)
my_y <- my_frame$Temperature
my_lm <- lm(my_y~my_x)
my_lm
## 
## Call:
## lm(formula = my_y ~ my_x)
## 
## Coefficients:
## (Intercept)         my_x  
##     17.1294      -0.1486

6.1 Linear Regression with lm() (2)

You can select the coefficients with $. The first one, (Intercept), is b, and the second one, named after the predictor, is the slope a:

Example

names(my_lm)
##  [1] "coefficients"  "residuals"     "effects"       "rank"         
##  [5] "fitted.values" "assign"        "qr"            "df.residual"  
##  [9] "xlevels"       "call"          "terms"         "model"
my_lm$coefficients
## (Intercept)        my_x 
##  17.1293769  -0.1486014
my_lm$coefficients[2]
##       my_x 
## -0.1486014
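
The coefficients can also be read with coef() and used through predict(). The sketch below uses hypothetical noise-free data built from y = 2x + 1, so the fitted a and b are known in advance; demo and demo_lm are illustrative names.

```r
demo <- data.frame(x=1:10)
demo$y <- 2 * demo$x + 1                # Hypothetical noise-free data: a = 2, b = 1.

demo_lm <- lm(y ~ x, data=demo)
b <- coef(demo_lm)[["(Intercept)"]]     # Intercept b.
a <- coef(demo_lm)[["x"]]               # Slope a.

predict(demo_lm, newdata=data.frame(x=11))   # y for a new x, here 2*11 + 1.
```

Passing the data through the data argument, rather than using free-standing vectors, is what lets predict() match the new x column by name.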

7. Meteorological Analysis

7.1 Meteorological Analysis

8. Airquality Analysis

8.1 Air quality dataset

8.2 Questions

  1. Explore the dataset airquality. Understand the structure.
  2. Make sure everything is in SI units.
  3. Do we have NAs?
  4. Check the relationship between ozone and meteorology.

8.2.1 Explore - Understand (1)

First step! Save our dataset with a new name.

mydata <- airquality

Important functions to understand the structure of your dataset: head(), tail(), dim() and names().

head(mydata)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6

8.2.1 Explore - Understand (2)

tail(mydata)
##     Ozone Solar.R Wind Temp Month Day
## 148    14      20 16.6   63     9  25
## 149    30     193  6.9   70     9  26
## 150    NA     145 13.2   77     9  27
## 151    14     191 14.3   75     9  28
## 152    18     131  8.0   76     9  29
## 153    20     223 11.5   68     9  30
dim(mydata)
## [1] 153   6
names(mydata)
## [1] "Ozone"   "Solar.R" "Wind"    "Temp"    "Month"   "Day"
mydata$Year <- 1973

8.2.2 Convert Units (1)

help(airquality)

[,1]  Ozone    numeric  Ozone (ppb)
[,2]  Solar.R  numeric  Solar R (lang) from 0800 to 1200 hours
[,3]  Wind     numeric  Wind (mph)
[,4]  Temp     numeric  Temperature (degrees F)
[,5]  Month    numeric  Month (1-12)
[,6]  Day      numeric  Day of month (1-31)

8.2.2 Convert Units (2)

Let's start by converting the solar radiation from Lang per 4 hours (0800 to 1200) to Watt/m^2.

  • 1 Lang/hour = 11.63 Watt/m2, so the 4-hour total is divided by 4 to get the mean flux.
mydata$SR_SI <- mydata$Solar.R*11.63/4

Then the wind velocity from miles/hour to m/sec.

  • 1 mile/hour = 1609.34/3600 m/sec = 0.44704 m/sec
mydata$Wind_SI <- mydata$Wind*0.44704

And finally the temperature from Fahrenheit to Celsius. \[T(^{\circ}\mathrm{C}) = \left(T(^{\circ}\mathrm{F}) - 32\right) \times \frac{5}{9}\]

mydata$Temp_SI <- (mydata$Temp-32)*(5/9)
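
The three conversions can be wrapped in small functions and checked against values that are known in advance. This is only a sketch; the function names are hypothetical and the factors are the ones used above.

```r
# Hypothetical helper functions wrapping the three conversions.
lang4h_to_wm2 <- function(lang) lang * 11.63 / 4   # Lang per 4 hours -> Watt/m^2
mph_to_ms     <- function(mph) mph * 0.44704       # miles/hour -> m/sec
f_to_c        <- function(f) (f - 32) * 5 / 9      # Fahrenheit -> Celsius

f_to_c(c(32, 212))   # 0 100: freezing and boiling point of water.
mph_to_ms(1)         # 0.44704
lang4h_to_wm2(4)     # 11.63
```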

8.2.2 Convert Units (3)

names(mydata)
##  [1] "Ozone"   "Solar.R" "Wind"    "Temp"    "Month"   "Day"     "Year"   
##  [8] "SR_SI"   "Wind_SI" "Temp_SI"
mydata <- mydata[,c("Year","Month","Day","Ozone","SR_SI","Wind_SI","Temp_SI")]
summary(mydata)
##       Year          Month            Day           Ozone       
##  Min.   :1973   Min.   :5.000   Min.   : 1.0   Min.   :  1.00  
##  1st Qu.:1973   1st Qu.:6.000   1st Qu.: 8.0   1st Qu.: 18.00  
##  Median :1973   Median :7.000   Median :16.0   Median : 31.50  
##  Mean   :1973   Mean   :6.993   Mean   :15.8   Mean   : 42.13  
##  3rd Qu.:1973   3rd Qu.:8.000   3rd Qu.:23.0   3rd Qu.: 63.25  
##  Max.   :1973   Max.   :9.000   Max.   :31.0   Max.   :168.00  
##                                                NA's   :37      
##      SR_SI           Wind_SI         Temp_SI     
##  Min.   : 20.35   Min.   :0.760   Min.   :13.33  
##  1st Qu.:336.54   1st Qu.:3.308   1st Qu.:22.22  
##  Median :596.04   Median :4.336   Median :26.11  
##  Mean   :540.60   Mean   :4.451   Mean   :25.49  
##  3rd Qu.:752.32   3rd Qu.:5.141   3rd Qu.:29.44  
##  Max.   :971.11   Max.   :9.254   Max.   :36.11  
##  NA's   :7

8.2.3 Not Available Values AKA NAs (1)

mydata[which(is.na(mydata$Ozone)),]
##     Year Month Day Ozone    SR_SI  Wind_SI  Temp_SI
## 5   1973     5   5    NA       NA 6.392672 13.33333
## 10  1973     5  10    NA 564.0550 3.844544 20.55556
## 25  1973     5  25    NA 191.8950 7.420864 13.88889
## 26  1973     5  26    NA 773.3950 6.660896 14.44444
## 27  1973     5  27    NA       NA 3.576320 13.88889
## 32  1973     6   1    NA 831.5450 3.844544 25.55556
## 33  1973     6   2    NA 834.4525 4.336288 23.33333
## 34  1973     6   3    NA 703.6150 7.197344 19.44444
## 35  1973     6   4    NA 540.7950 4.112768 28.88889
## 36  1973     6   5    NA 639.6500 3.844544 29.44444
## 37  1973     6   6    NA 767.5800 6.392672 26.11111
## 39  1973     6   8    NA 793.7475 3.084576 30.55556
## 42  1973     6  11    NA 753.0425 4.872736 33.88889
## 43  1973     6  12    NA 726.8750 4.112768 33.33333
## 45  1973     6  14    NA 965.2900 6.169152 26.66667
## 46  1973     6  15    NA 936.2150 5.140960 26.11111
## 52  1973     6  21    NA 436.1250 2.816352 25.00000
## 53  1973     6  22    NA 171.5425 0.759968 24.44444
## 54  1973     6  23    NA 264.5825 2.056384 24.44444
## 55  1973     6  24    NA 726.8750 2.816352 24.44444
## 56  1973     6  25    NA 392.5125 3.576320 23.88889
## 57  1973     6  26    NA 369.2525 3.576320 25.55556
## 58  1973     6  27    NA 136.6525 4.604512 22.77778
## 59  1973     6  28    NA 284.9350 5.140960 26.66667
## 60  1973     6  29    NA  90.1325 6.660896 25.00000
## 61  1973     6  30    NA 401.2350 3.576320 28.33333
## 65  1973     7   4    NA 293.6575 4.872736 28.88889
## 72  1973     7  11    NA 404.1425 3.844544 27.77778
## 75  1973     7  14    NA 846.0825 6.660896 32.77778
## 83  1973     7  22    NA 750.1350 4.336288 27.22222
## 84  1973     7  23    NA 857.7125 5.140960 27.77778
## 102 1973     8  10    NA 645.4650 3.844544 33.33333
## 103 1973     8  11    NA 398.3275 5.140960 30.00000
## 107 1973     8  15    NA 186.0800 5.140960 26.11111
## 115 1973     8  23    NA 741.4125 5.632704 23.88889
## 119 1973     8  27    NA 444.8475 2.548128 31.11111
## 150 1973     9  27    NA 421.5875 5.900928 25.00000

8.2.3 Not Available Values AKA NAs (2)

mydata[which(is.na(mydata$SR_SI)),]
##    Year Month Day Ozone SR_SI  Wind_SI  Temp_SI
## 5  1973     5   5    NA    NA 6.392672 13.33333
## 6  1973     5   6    28    NA 6.660896 18.88889
## 11 1973     5  11     7    NA 3.084576 23.33333
## 27 1973     5  27    NA    NA 3.576320 13.88889
## 96 1973     8   4    78    NA 3.084576 30.00000
## 97 1973     8   5    35    NA 3.308096 29.44444
## 98 1973     8   6    66    NA 2.056384 30.55556
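
Instead of printing every row, the NAs can be counted. A short sketch on the built-in airquality dataset (NAs are not affected by the unit conversion, so the original columns give the same counts):

```r
na_counts <- colSums(is.na(airquality))   # NAs per column of the built-in dataset.
na_counts[c("Ozone", "Solar.R")]          # 37 and 7, matching the summary() output.
sum(complete.cases(airquality))           # Rows with no NA in any column.
```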

8.2.4 Ozone-Meteorology (1)

plot(mydata[4:7])

pairs(mydata[4:7], pch=21, bg=c(NA,NA,NA,NA,"white","yellow","orange","red","white")[airquality$Month])

8.2.4 Ozone-Meteorology (2)

pairs(mydata[4:7], pch=21, 
      bg=c(NA,NA,NA,NA,"white","yellow","orange","red","white")[airquality$Month])
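
The scatterplots can be backed up with numbers. The following sketch computes pairwise correlations on the built-in airquality columns; because the unit conversions only rescale and shift each variable, the correlations are the same as for the SI columns. The choice of use="pairwise.complete.obs" is an assumption about how the NAs should be handled.

```r
ozone_cor <- cor(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")],
                 use="pairwise.complete.obs")   # Skip NAs pair by pair.

round(ozone_cor["Ozone", ], 2)   # Correlation of ozone with each variable.
```

A positive value for Temp and a negative value for Wind would agree with what the pairs() panels suggest.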