Matt Shirley
October 24 2013
Type R at your command-line shell:
R version 3.0.2 -- "Frisbee Sailing"
Platform: x86_64-apple-darwin13.0.0 (64-bit)
...
Type 'q()' to quit R.
>
The greater-than sign (>) is your prompt for entering commands.
RStudio is an integrated development environment for R.
# This is a comment, which is ignored
# functions are applied with ()
print("hello")
[1] "hello"
Operator characters: + - / ? % & = < > | ! ^ *
Addition
2 + 2
[1] 4
Subtraction
5 - 2
[1] 3
Multiplication
2 * 2
[1] 4
Division
5 / 2
[1] 2.5
Exponents
2^4
[1] 16
Logarithms
log10(100)
[1] 2
log2(4)
[1] 2
Order of operations
10 / 2 - 1
[1] 4
10 - 5 / 5
[1] 9
(10 - 5) / 5
[1] 1
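Be careful. Multiplication and division are evaluated before addition and subtraction; operators of equal precedence are evaluated left to right. Use parentheses to make the intended order explicit.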
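The results above follow from R's operator precedence. The sketch below repeats two of the slide's expressions and adds two illustrative ones (the variable names a, b, c1 and d are only for this example):

```r
# Division binds tighter than subtraction, so 5 / 5 is evaluated first.
a <- 10 - 5 / 5

# Parentheses force the subtraction to happen first.
b <- (10 - 5) / 5

# Operators of equal precedence (here - and +) are evaluated left to right.
c1 <- 10 - 5 + 2

# Exponentiation is the exception: it groups right to left, like 2^(3^2).
d <- 2^3^2

print(c(a, b, c1, d))
```

When in doubt, adding parentheses costs nothing and documents the intent.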
x <- 1
x
[1] 1
Variables can be assigned (<-) a value.
x <- 1
y <- 2
x <- y
x
[1] 2
y
[1] 2
But be careful, because variables can be re-assigned: x now holds the value of y.
x <- 0
x > 1 ## x is greater than 1
[1] FALSE
x < 1 ## x is less than 1
[1] TRUE
x == 1
[1] FALSE
x == 0
[1] TRUE
x != 0
[1] FALSE
Comparisons result in boolean values
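Because a comparison returns an ordinary value, it can be stored in a variable and combined with the logical operators & (and) and | (or). A small sketch; the names is_zero, is_big, either, both and v are only illustrative:

```r
x <- 0

# Store the result of each comparison.
is_zero <- x == 0
is_big  <- x > 1

# Combine logical values with | (or) and & (and).
either <- is_zero | is_big
both   <- is_zero & is_big
print(c(is_zero, is_big, either, both))

# Comparisons are applied to every element of a vector.
v <- c(1, 5, 10)
print(v > 4)
```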
x <- 3
y <- c(1,2,x)
y
[1] 1 2 3
Vectors can hold elements of the same type.
names(y) <- c("one", "two", "three")
y
one two three
1 2 3
Vectors can also have names for each element.
z <- y * 3
z
one two three
3 6 9
sum(z)
[1] 18
Arithmetic can be performed on a vector, which applies that operation to every element and returns a new vector.
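Arithmetic also works between two vectors of the same length, pairing up elements by position. A minimal sketch that rebuilds y and z as on the slides (w and m are names introduced here):

```r
y <- c(one = 1, two = 2, three = 3)
z <- y * 3

# Element-by-element addition of two vectors; names are carried along.
w <- y + z
print(w)

# Summary functions collapse a vector to a single value.
m <- mean(z)
print(m)
```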
z[1]
one
3
z["one"]
one
3
Vectors can be indexed by 1-based position as well as by name.
z
one two three
3 6 9
z[2:3]
two three
6 9
Slicing a vector is as easy as specifying start:end.
z[-1]
two three
6 9
z[-2:-3]
one
3
Remove elements from a vector using negative indices.
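Note that a negative index returns a new vector and leaves the original untouched. A short sketch, with z rebuilt as on the slides and dropped and kept as illustrative names:

```r
z <- c(one = 3, two = 6, three = 9)

# Drop the first element; z itself is not modified.
dropped <- z[-1]
print(dropped)
print(z)

# Several positions can be dropped at once; -c(2, 3) is the same as -2:-3.
kept <- z[-c(2, 3)]
print(kept)
```

To actually shorten z, assign the result back, for example z <- z[-1].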
q <- list(y, z)
q
[[1]]
one two three
1 2 3
[[2]]
one two three
3 6 9
Lists can contain vectors.
q[[1]]
one two three
1 2 3
q[[1]][1]
one
1
Use double brackets ([[ ]]) to extract an element from a list; the extracted vector can then be indexed like any other vector.
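Single and double brackets behave differently on a list, which is a common source of confusion. A minimal sketch using the same q as above (inner and sub are names introduced for this example):

```r
y <- c(one = 1, two = 2, three = 3)
z <- y * 3
q <- list(y, z)

# Double brackets extract the element itself (here a numeric vector).
inner <- q[[2]]
print(class(inner))

# Single brackets return a shorter list that still wraps the element.
sub <- q[2]
print(class(sub))

# Once extracted, the vector can be indexed by position or by name.
print(q[[2]]["two"])
```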
v <- seq(1,9) ## or 1:9
v
[1] 1 2 3 4 5 6 7 8 9
Let's construct a sequence of 9 numbers.
c(v,v)
[1] 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9
rep(v, times=3)
[1] 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9
We can concatenate or repeat a vector as well.
mt <- matrix(v, nrow=3)
mt
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
matrix(v, nrow=3, byrow=TRUE)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
Matrices created from vectors are filled column by column by default, or row by row with byrow=TRUE.
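The two fill orders are transposes of each other, which can be checked with t(). A small sketch; by_col and by_row are illustrative names:

```r
v <- 1:9

# Default: values fill the first column, then the second, and so on.
by_col <- matrix(v, nrow = 3)

# byrow = TRUE: values fill the first row, then the second, and so on.
by_row <- matrix(v, nrow = 3, byrow = TRUE)

# Transposing one layout gives the other.
print(identical(by_row, t(by_col)))
```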
mt[1,1]
[1] 1
mt[3,3]
[1] 9
Matrices are indexed as [row,col]
dim(mt)
[1] 3 3
nrow(mt)
[1] 3
ncol(mt)
[1] 3
The dimensions, number of rows, and number of columns can be computed using these functions.
df <- data.frame(y, z)
colnames(df) <- c("first","second")
df
first second
one 1 3
two 2 6
three 3 9
Dataframes are like matrices, but contain more structure.
df$first
[1] 1 2 3
Dataframes can be indexed by name to return a vector.
df["first"]
first
one 1
two 2
three 3
Dataframes can be indexed by name to return another dataframe
df$first[1]
[1] 1
Dataframes can be further indexed to return individual elements
df > 3
first second
one FALSE FALSE
two FALSE TRUE
three FALSE TRUE
Dataframes, just like other structures, can be compared, resulting in boolean values.
df[df > 3]
[1] 6 9
Passing the boolean result of a comparison as an index returns only the elements where the comparison was TRUE.
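The same idea is often used to keep whole rows rather than individual values. A minimal sketch, rebuilding df as on the slides (keep and rows are names introduced here):

```r
df <- data.frame(first = c(1, 2, 3), second = c(3, 6, 9),
                 row.names = c("one", "two", "three"))

# One logical value per row, from a comparison on a single column.
keep <- df$second > 3
print(keep)

# Use it in the row position of [row, col] to keep the matching rows.
rows <- df[keep, ]
print(rows)
```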
which(df > 3)
[1] 5 6
The which function converts a boolean index to a numeric index.
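The positions 5 and 6 above count down the columns of the comparison result. If row and column positions are more useful, which accepts arr.ind = TRUE. A small sketch with df rebuilt as on the slides (idx is an illustrative name):

```r
df <- data.frame(first = c(1, 2, 3), second = c(3, 6, 9),
                 row.names = c("one", "two", "three"))

# On a plain logical vector, which() gives the positions of the TRUE values.
print(which(c(FALSE, TRUE, TRUE)))

# With arr.ind = TRUE the result has one row per TRUE value,
# with columns named "row" and "col".
idx <- which(df > 3, arr.ind = TRUE)
print(idx)
```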
cbind(df, data.frame("third"=c(9,18,27)))
first second third
one 1 3 9
two 2 6 18
three 3 9 27
Dataframe columns can be bound to form a new dataframe.
rbind(df, data.frame("first"=4, "second"=12, row.names="four"))
first second
one 1 3
two 2 6
three 3 9
four 4 12
Dataframe rows can be bound to form a new dataframe.
library(datasets)
dim(cars)
[1] 50 2
head(cars)
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
mean(cars$speed)
[1] 15.4
median(cars$speed)
[1] 15
sd(cars$speed)
[1] 5.288
Mean, median and standard deviation.
summary(cars)
speed dist
Min. : 4.0 Min. : 2
1st Qu.:12.0 1st Qu.: 26
Median :15.0 Median : 36
Mean :15.4 Mean : 43
3rd Qu.:19.0 3rd Qu.: 56
Max. :25.0 Max. :120
Summarizing a dataframe returns the minimum, quartiles, median, mean, and maximum of each column.
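The individual numbers in the summary can also be computed directly, which is handy when only one of them is needed. A minimal sketch using the same cars dataset (q and r are names introduced here):

```r
library(datasets)

# Minimum, quartiles, and maximum of a single column.
q <- quantile(cars$speed)
print(q)

# Smallest and largest value as a length-2 vector.
r <- range(cars$speed)
print(r)
```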
for (x in 1:10){
print(x)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
Use for loops to repeat a task a certain number of times.
x <- 0
if (x == 0) { print("yes") }
[1] "yes"
if (x > 1) { print("yes") } else { print("no") }
[1] "no"
Use if statements to run code only when a condition is TRUE.
x <- 0
while (x < 5){
print(x)
x <- x + 1
}
[1] 0
[1] 1
[1] 2
[1] 3
[1] 4
Use while loops to repeat a task while a condition (x<5) is true.
apply(df, 1, sum)
one two three
4 8 12
apply(df, 2, sum)
first second
6 18
Apply a function over array rows (1) or columns (2).
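In apply, the second argument selects the margin: 1 runs the function once per row and 2 once per column. A short sketch, rebuilding df as on the slides, that also compares against the built-in rowSums and colSums (row_totals and col_totals are illustrative names):

```r
df <- data.frame(first = c(1, 2, 3), second = c(3, 6, 9),
                 row.names = c("one", "two", "three"))

# Margin 1: one result per row.
row_totals <- apply(df, 1, sum)
print(row_totals)

# Margin 2: one result per column.
col_totals <- apply(df, 2, sum)
print(col_totals)

# For sums specifically, rowSums() and colSums() give the same numbers.
print(all.equal(row_totals, rowSums(df)))
print(all.equal(col_totals, colSums(df)))
```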
sapply(df, sqrt)
first second
[1,] 1.000 1.732
[2,] 1.414 2.449
[3,] 1.732 3.000
sapply ("simplified apply") applies a function to each column of a dataframe, or each element of a list, and simplifies the result to a vector or matrix where possible.
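What sapply returns depends on what the function returns for each column. The sketch below, with illustrative names col_means, as_list and roots, contrasts it with lapply, which always returns a list:

```r
df <- data.frame(first = c(1, 2, 3), second = c(3, 6, 9))

# One number per column: sapply simplifies to a named vector.
col_means <- sapply(df, mean)
print(col_means)

# lapply does the same work but keeps the results in a list.
as_list <- lapply(df, mean)
print(class(as_list))

# Three numbers per column: sapply simplifies to a 3 x 2 matrix.
roots <- sapply(df, sqrt)
print(dim(roots))
```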
write.table(df, file = "example.txt")
write.table(df, file = "example.tsv", sep = "\t")
write.csv(df, file = "example.csv")
Write 1) space-delimited, 2) tab-delimited, 3) comma-delimited files containing dataframe df.
df1 <- read.table("example.txt", header=TRUE)
df2 <- read.delim("example.tsv", sep = "\t")
df3 <- read.csv("example.csv", row.names = 1)
identical(df1,df2)
[1] TRUE
identical(df2,df3)
[1] TRUE
All three files result in equivalent dataframes.
Issues to consider when reading and writing delimited files:
Check the default behavior of the reading/writing function first.
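One way to avoid surprises from defaults is to spell out the separator, header, and row-name handling on both the write and the read. A minimal sketch, assuming a temporary file is acceptable for the example; path and back are names introduced here:

```r
df <- data.frame(first = c(1, 2, 3), second = c(3, 6, 9),
                 row.names = c("one", "two", "three"))

# Write to a temporary tab-delimited file with every option explicit.
# col.names = NA adds an empty header cell above the row names.
path <- tempfile(fileext = ".tsv")
write.table(df, file = path, sep = "\t", quote = FALSE, col.names = NA)

# Read it back, again stating the separator, header, and row-name column.
back <- read.table(path, header = TRUE, sep = "\t", row.names = 1)
print(back)

# Clean up the temporary file.
unlink(path)
```

Note that numbers written as 1, 2, 3 are read back as integers, so comparing with the original via identical can fail even though the values match.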
head(cars)
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
plot(cars)
plot accepts a dataframe with two columns.
plot(cars, type="l")
lmcars <- lm(dist ~ speed, cars)
lmcars
Call:
lm(formula = dist ~ speed, data = cars)
Coefficients:
(Intercept) speed
-17.58 3.93
lm fits a linear model: response ~ terms
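The fitted model object can be queried rather than read off the printout. A minimal sketch using the same formula; the speed of 20 and the names fit, co, pred and manual are only for illustration:

```r
library(datasets)

fit <- lm(dist ~ speed, data = cars)

# Extract the intercept and slope as a named numeric vector.
co <- coef(fit)
print(co)

# Predict stopping distance at a hypothetical speed of 20.
pred <- predict(fit, newdata = data.frame(speed = 20))
print(pred)

# The same prediction by hand: intercept + slope * speed.
manual <- co[["(Intercept)"]] + co[["speed"]] * 20
print(manual)
```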
plot(cars)
abline(lmcars)
abline draws a line from slope and intercept.
plot(cars, main="Speed vs. Distance", xlab="Speed", ylab="Distance", ylim=c(0,100))
abline(lmcars)
plot(cars, col="red", pch=16, cex=2)
abline(lmcars, col="blue")
hist(cars$speed)