Matt Shirley

October 24 2013

- interacting with R
- using R as a calculator
- variables
- data structures
- summarizing data
- loops, flow-control
- apply
- basic stats in R
- reading and writing delimited data
- plotting with base R graphics
- loading and installing packages
- plotting with ggplot2

- command-line interpreter
- GUI interpreter: RStudio

- everyone has one
- just type `R` at your command-line shell:

```
R version 3.0.2 -- "Frisbee Sailing"
Platform: x86_64-apple-darwin13.0.0 (64-bit)
...
Type 'q()' to quit R.
>
```

- The caret (`>`) is your prompt for entering commands. I will omit the caret for the rest of the presentation.

- Download from http://www.rstudio.com/

RStudio is an integrated development environment including:

- interpreter with code completion
- text editor with syntax highlighting and completion
- file browser
- version control manager
- visual object workspace
- command history

```
# This is a comment, which is ignored
```

```
# functions are applied with ()
print("hello")
```

```
[1] "hello"
```

- anything in quotes is a “string”
- anything else is either a number or:
  - function
  - class
  - operator (`+-/?%&=<>|!^*`)

Addition

```
2 + 2
```

```
[1] 4
```

Subtraction

```
5 - 2
```

```
[1] 3
```

Multiplication

```
2 * 2
```

```
[1] 4
```

Division

```
5 / 2
```

```
[1] 2.5
```

Exponents

```
2^4
```

```
[1] 16
```

Logarithms

```
log10(100)
```

```
[1] 2
```

```
log2(4)
```

```
[1] 2
```

Order of operations

```
10 / 2 - 1
```

```
[1] 4
```

```
10 - 5 / 5
```

```
[1] 9
```

```
(10 - 5) / 5
```

```
[1] 1
```

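Be careful. Operators are evaluated by precedence, not strictly left to right: `*` and `/` are applied before `+` and `-`, and only operators of equal precedence among these are evaluated left to right. Use parentheses to make the intended order explicit.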
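A few more cases may help make operator precedence concrete. This is only a sketch; the variable names are illustrative and not part of the original slides.

```r
# Multiplication binds tighter than addition
a <- 2 + 3 * 4      # evaluated as 2 + (3 * 4)

# Parentheses override the default precedence
b <- (2 + 3) * 4    # evaluated as 5 * 4

# Exponentiation binds tighter than unary minus
d <- -2^2           # evaluated as -(2^2)

# Exponentiation groups right to left
e <- 2^3^2          # evaluated as 2^(3^2)

print(c(a, b, d, e))
```

When in doubt, adding parentheses costs nothing and documents the intent.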

```
x <- 1
x
```

```
[1] 1
```

Variables can be assigned (`<-`) a value

```
x <- 1
y <- 2
x <- y
x
```

```
[1] 2
```

```
y
```

```
[1] 2
```

But be **careful** because they can be re-assigned

```
x <- 0
```

```
x > 1 ## x is greater than 1
```

```
[1] FALSE
```

```
x < 1 ## x is less than 1
```

```
[1] TRUE
```

```
x == 1
```

```
[1] FALSE
```

```
x == 0
```

```
[1] TRUE
```

```
x != 0
```

```
[1] FALSE
```

Comparisons result in *boolean* values

```
x <- 3
y <- c(1,2,x)
y
```

```
[1] 1 2 3
```

Vectors can hold elements of the *same type*.
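One consequence of the same-type rule is that `c()` silently converts mixed inputs to a common type. The sketch below uses made-up values to show this.

```r
# Mixing a number, a string and a logical: everything becomes a string
mixed <- c(1, "a", TRUE)
print(class(mixed))
print(mixed)

# Mixing numbers and logicals: TRUE/FALSE become 1/0
nums <- c(1, TRUE, FALSE)
print(class(nums))
print(nums)
```

Checking `class()` after building a vector is a quick way to catch an unintended conversion.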

```
names(y) <- c("one", "two", "three")
y
```

```
one two three
1 2 3
```

Vectors can also have *names* for each element.

```
z <- y * 3
z
```

```
one two three
3 6 9
```

```
sum(z)
```

```
[1] 18
```

Arithmetic can be performed on a vector, which applies that operation to every element and returns a *new vector*.
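The same element-wise behavior applies when both operands are vectors of equal length. A minimal sketch, using illustrative vectors `a` and `b` that do not appear in the slides:

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)

# Element-wise addition of two vectors
total <- a + b

# Element-wise multiplication by a single number
scaled <- a * 2

print(total)
print(scaled)
print(a)  # the original vector is unchanged
```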

```
one two three
3 6 9
```

```
z[1]
```

```
one
3
```

```
z["one"]
```

```
one
3
```

Vectors can be indexed using a *1-based* position, as well as *name*.

```
z
```

```
one two three
3 6 9
```

```
z[2:3]
```

```
two three
6 9
```

*Slicing* a vector is as easy as specifying `start:end`.

```
z[-1]
```

```
two three
6 9
```

```
z[-2:-3]
```

```
one
3
```

Remove elements from a vector using negative indices.
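Note that a negative index returns a new vector and leaves the original untouched. The sketch below, using a hypothetical vector `w`, shows that the result must be assigned back to actually drop the element.

```r
w <- c(one = 3, two = 6, three = 9)

# Returns a copy without the first element
dropped <- w[-1]
print(dropped)
print(length(w))  # w still has all three elements

# Assign the result back to make the removal stick
w <- w[-1]
print(w)
```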

```
q <- list(y, z)
q
```

```
[[1]]
one two three
1 2 3
[[2]]
one two three
3 6 9
```

Lists can contain vectors.

```
q[[1]]
```

```
one two three
1 2 3
```

```
q[[1]][1]
```

```
one
1
```

Index a list with double brackets (`[[ ]]`) to extract an element; the extracted vector can then be indexed as usual.
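The difference between single and double brackets on a list is easy to miss. A small sketch, assuming a named list `lst` invented for illustration:

```r
lst <- list(a = c(1, 2, 3), b = c(3, 6, 9))

# Single brackets return a shorter list
sub <- lst[1]
print(class(sub))

# Double brackets return the element itself
elem <- lst[[1]]
print(class(elem))
print(elem[2])

# Named elements can also be extracted with $
byname <- lst$b
print(byname)
```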

```
v <- seq(1,9) ## or 1:9
v
```

```
[1] 1 2 3 4 5 6 7 8 9
```

Let's construct a sequence of 9 numbers.

```
c(v,v)
```

```
[1] 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9
```

```
rep(v, times=3)
```

```
[1] 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9
```

We can *concatenate* or *repeat* a vector as well.

```
mt <- matrix(v, nrow=3)
mt
```

```
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
```

```
matrix(v, nrow=3, byrow=TRUE)
```

```
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
```

Matrices are created from vectors and are filled column by column by default, or row by row with `byrow=TRUE`.

```
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
```

```
mt[1,1]
```

```
[1] 1
```

```
mt[3,3]
```

```
[1] 9
```

Matrices are indexed as `[row,col]`
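Leaving one side of the comma empty selects a whole row or column. The sketch below uses a matrix `m` built the same way as `mt`; the other names are illustrative.

```r
m <- matrix(1:9, nrow = 3)

# Empty column index: the whole first row
first_row <- m[1, ]

# Empty row index: the whole second column
second_col <- m[, 2]

# Ranges select a sub-matrix
block <- m[1:2, 2:3]

print(first_row)
print(second_col)
print(block)
```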

```
dim(mt)
```

```
[1] 3 3
```

```
nrow(mt)
```

```
[1] 3
```

```
ncol(mt)
```

```
[1] 3
```

Dimensionality, number of rows and number of columns can be computed using these functions.

```
df <- data.frame(y, z)
colnames(df) <- c("first","second")
df
```

```
first second
one 1 3
two 2 6
three 3 9
```

Dataframes are like matrices, but contain more structure: rows and columns are named, and each column can hold a different type.

```
first second
one 1 3
two 2 6
three 3 9
```

```
df$first
```

```
[1] 1 2 3
```

Dataframes can be indexed by name to return a vector.

```
first second
one 1 3
two 2 6
three 3 9
```

```
df["first"]
```

```
first
one 1
two 2
three 3
```

Dataframes can be indexed by name to return another dataframe

```
first second
one 1 3
two 2 6
three 3 9
```

```
df$first[1]
```

```
[1] 1
```

Dataframes can be further indexed to return individual elements
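Dataframes also accept the `[row, col]` form used for matrices, with either positions or names. A sketch, assuming a dataframe `d` rebuilt here to match `df`:

```r
d <- data.frame(first = c(1, 2, 3), second = c(3, 6, 9),
                row.names = c("one", "two", "three"))

# A single cell by row position and column name
cell <- d[2, "second"]

# A whole row by row name (the result is still a dataframe)
row_two <- d["two", ]

# A column as a plain vector, equivalent to d$first
col_first <- d[["first"]]

print(cell)
print(row_two)
print(col_first)
```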

```
first second
one 1 3
two 2 6
three 3 9
```

```
df > 3
```

```
first second
one FALSE FALSE
two FALSE TRUE
three FALSE TRUE
```

Dataframes, just like other structures, can be compared, resulting in *boolean* values.

```
first second
one FALSE FALSE
two FALSE TRUE
three FALSE TRUE
```

```
df[df > 3]
```

```
[1] 6 9
```

Passing the boolean result of comparison as an index returns only elements where the comparison was `TRUE`.
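A common variation is to compare a single column and use the result to keep whole rows. This sketch assumes a dataframe `d` rebuilt to match `df`; `keep` and `big` are illustrative names.

```r
d <- data.frame(first = c(1, 2, 3), second = c(3, 6, 9),
                row.names = c("one", "two", "three"))

# One boolean per row
keep <- d$second > 3
print(keep)

# Boolean vector in the row position, empty column position
big <- d[keep, ]
print(big)
```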

```
first second
one FALSE FALSE
two FALSE TRUE
three FALSE TRUE
```

```
which(df > 3)
```

```
[1] 5 6
```

The `which` function converts a boolean index to a numeric index.
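For two-dimensional data, `which` counts positions down the columns, which is why the result above is `5 6`. The sketch below uses a small illustrative matrix `m` and the `arr.ind` argument to get row and column positions instead.

```r
# On a plain boolean vector, which() gives the TRUE positions
flags <- c(FALSE, TRUE, TRUE, FALSE)
pos <- which(flags)
print(pos)

# On a matrix, positions are counted column by column
m <- matrix(1:6, nrow = 2)
lin <- which(m > 4)
print(lin)

# arr.ind = TRUE reports row and column instead
rc <- which(m > 4, arr.ind = TRUE)
print(rc)
```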

```
first second
one 1 3
two 2 6
three 3 9
```

```
cbind(df, data.frame("third"=c(9,18,27)))
```

```
first second third
one 1 3 9
two 2 6 18
three 3 9 27
```

Dataframe columns can be bound to form a new dataframe.

```
first second
one 1 3
two 2 6
three 3 9
```

```
rbind(df, data.frame("first"=4, "second"=12, row.names="four"))
```

```
first second
one 1 3
two 2 6
three 3 9
four 4 12
```

Dataframe rows can be bound to form a new dataframe.

```
library(datasets)
dim(cars)
```

```
[1] 50 2
```

```
head(cars)
```

```
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
```

```
mean(cars$speed)
```

```
[1] 15.4
```

```
median(cars$speed)
```

```
[1] 15
```

```
sd(cars$speed)
```

```
[1] 5.288
```

Mean, median and standard deviation.

```
summary(cars)
```

```
speed dist
Min. : 4.0 Min. : 2
1st Qu.:12.0 1st Qu.: 26
Median :15.0 Median : 36
Mean :15.4 Mean : 43
3rd Qu.:19.0 3rd Qu.: 56
Max. :25.0 Max. :120
```

Summarizing a dataframe returns the minimum, quartiles, mean and maximum of each column.
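The same percentiles can be requested directly with `quantile`. A minimal sketch on a made-up vector `speeds` (not the `cars` data):

```r
speeds <- c(4, 7, 8, 9, 10)  # hypothetical values for illustration

# Default: minimum, quartiles and maximum
q <- quantile(speeds)
print(q)

# Any other percentile via the probs argument
p90 <- quantile(speeds, probs = 0.9)
print(p90)
```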

```
for (x in 1:10) {
  print(x)
}
```

```
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
```

Use *for loops* to repeat a task a certain number of times.
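Loops are often used to walk over the elements of a vector. The sketch below builds a running total; `values`, `running` and `total` are illustrative names.

```r
values <- c(3, 6, 9)
running <- numeric(length(values))  # pre-allocate the result
total <- 0

# seq_along(values) yields 1, 2, 3
for (i in seq_along(values)) {
  total <- total + values[i]
  running[i] <- total
}

print(running)
```

For this particular task the built-in `cumsum(values)` gives the same result without a loop.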

```
x <- 0
if (x == 0) { print("yes") }
```

```
[1] "yes"
```

```
if (x > 1) { print("yes") } else { print("no") }
```

```
[1] "no"
```

- If statements only execute code if the condition evaluates to `TRUE`.
- Else statements execute when the condition is not satisfied.
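Conditions can be chained with `else if`, and `ifelse` applies a condition to every element of a vector. A sketch, with a hypothetical helper `describe`:

```r
# Chain several conditions; the first TRUE branch wins
describe <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

print(describe(-5))
print(describe(0))
print(describe(2))

# ifelse() works element by element on a vector
labels <- ifelse(c(-1, 0, 2) > 0, "yes", "no")
print(labels)
```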

```
x <- 0
while (x < 5) {
  print(x)
  x <- x + 1
}
```

```
[1] 0
[1] 1
[1] 2
[1] 3
[1] 4
```

Use *while loops* to repeat a task *while* a condition (`x<5`) is true.
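While loops are most useful when the number of repetitions is not known in advance. The sketch below counts how many times a number can be halved before it drops to 1 or below; `n` and `steps` are illustrative names.

```r
n <- 100
steps <- 0

# Keep halving until n is no longer greater than 1
while (n > 1) {
  n <- n / 2
  steps <- steps + 1
}

print(steps)
print(n)
```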

```
first second
one 1 3
two 2 6
three 3 9
```

```
apply(df, 1, sum)
```

```
one two three
4 8 12
```

```
apply(df, 2, sum)
```

```
first second
6 18
```

*Apply* a function over array rows (1) or columns (2).
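The second argument of `apply` selects the margin: 1 for rows, 2 for columns. A sketch on a small illustrative matrix `m`, compared against the built-in `rowSums` and `colSums`:

```r
m <- matrix(1:6, nrow = 2)
print(m)

# Margin 1: one result per row
row_totals <- apply(m, 1, sum)
print(row_totals)

# Margin 2: one result per column
col_totals <- apply(m, 2, sum)
print(col_totals)

# Any function that takes a vector works, not only sum
col_max <- apply(m, 2, max)
print(col_max)
```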

```
first second
one 1 3
two 2 6
three 3 9
```

```
sapply(df, sqrt)
```

```
first second
[1,] 1.000 1.732
[2,] 1.414 2.449
[3,] 1.732 3.000
```

*Simple apply* (`sapply`) applies a function to each element of a list, or each column of a dataframe, and simplifies the result to a vector or matrix when possible.
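The shape of the `sapply` result depends on what the function returns. This sketch, on a dataframe `d` rebuilt to match `df`, contrasts it with `lapply`, which always returns a list.

```r
d <- data.frame(first = c(1, 2, 3), second = c(3, 6, 9))

# lapply: one list element per column
as_list <- lapply(d, mean)
print(class(as_list))

# sapply with a function returning one value: a named vector
as_vec <- sapply(d, mean)
print(as_vec)

# sapply with a function returning several values: a matrix
as_mat <- sapply(d, sqrt)
print(dim(as_mat))
```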

```
write.table(df, file = "example.txt")
write.table(df, file = "example.tsv", sep = "\t")
write.csv(df, file = "example.csv")
```

Write 1) space-delimited, 2) tab-delimited, 3) comma-delimited files containing dataframe `df`.

```
df1 <- read.table("example.txt", header = TRUE)
df2 <- read.delim("example.tsv", sep = "\t")
df3 <- read.csv("example.csv", row.names = 1)
```

```
identical(df1,df2)
```

```
[1] TRUE
```

```
identical(df2,df3)
```

```
[1] TRUE
```

All three files result in equivalent dataframes.

Issues to consider when reading and writing delimited files:

- Do I want/have column names (header)?
- Do I want/have row names?
- What is my delimiter?
- Do I want/have quotes surrounding each value?

Check the **default behavior** of the reading/writing function first.
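The questions above map onto arguments of `write.table` and `read.table`. The sketch below writes a tab-delimited file with a header, no row names and no quotes, then reads it back; the dataframe `d` and the temporary file are illustrative.

```r
d <- data.frame(first = c(1, 2, 3), second = c(3, 6, 9))
path <- tempfile(fileext = ".tsv")

# Delimiter, quoting and row names are all set explicitly
write.table(d, file = path, sep = "\t", quote = FALSE, row.names = FALSE)

# Inspect the raw file to see what was actually written
lines <- readLines(path)
print(lines)

# Read it back, stating the header and delimiter explicitly
back <- read.table(path, header = TRUE, sep = "\t")
print(back)
```

Looking at the raw lines once is a cheap way to confirm the defaults did what you expected.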

```
head(cars)
```

```
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
```

```
plot(cars)
```

`plot` accepts a dataframe with two columns:

- column 1 = x axis
- column 2 = y axis

```
plot(cars, type="l")
```

- valid plot types:
  - “p” for points
  - “l” for lines
  - “b” for both (“o” for overplotted)
  - “h” for ‘histogram’-like lines
  - “s” for stair steps (“S” for other)
  - “n” for no plotting.

```
lmcars <- lm(dist ~ speed, cars)
lmcars
```

```
Call:
lm(formula = dist ~ speed, data = cars)
Coefficients:
(Intercept) speed
-17.58 3.93
```

`lm` fits a linear model: `response ~ terms`

- in this case the response is stopping distance (`dist`), modeled as a function of `speed`
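The fitted coefficients can be pulled out of the model and used for prediction. A sketch, refitting the same model under the name `fit`; the speed of 20 is an arbitrary example value.

```r
fit <- lm(dist ~ speed, data = cars)

# Named vector with the intercept and slope
cf <- coef(fit)
print(cf)

# Prediction by hand: intercept + slope * speed
manual <- cf[["(Intercept)"]] + cf[["speed"]] * 20

# The same prediction using predict()
pred <- predict(fit, newdata = data.frame(speed = 20))

print(manual)
print(pred)
```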

```
plot(cars)
abline(lmcars)
```

`abline` draws a line from slope and intercept
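`abline` also accepts the intercept and slope directly, or a horizontal or vertical position. The sketch below draws to a temporary PDF file so it can run without a display; the file name and the dashed reference lines are illustrative additions.

```r
out <- tempfile(fileext = ".pdf")
pdf(out)  # send the plot to a file instead of the screen

fit <- lm(dist ~ speed, data = cars)
cf <- coef(fit)

plot(cars)
# a = intercept, b = slope: same line as abline(fit)
abline(a = cf[["(Intercept)"]], b = cf[["speed"]])
# Horizontal and vertical dashed lines at the column means
abline(h = mean(cars$dist), lty = 2)
abline(v = mean(cars$speed), lty = 2)

invisible(dev.off())  # close the file
```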

```
plot(cars, main="Speed vs. Distance", xlab="Speed", ylab="Distance", ylim=c(0,100))
abline(lmcars)
```

```
plot(cars, col="red", pch=16, cex=2)
abline(lmcars, col="blue")
```

```
hist(cars$speed)
```