You'll need R > 2.15, as well as Python > 2.7. If you have trouble calling R from Rpy2, you might need to compile R yourself using --enable-R-shlib. If you have no idea what this means, read this.
Install rpy2 with easy_install rpy2 or pip install rpy2. If you don't have pip, you'll need to install it yourself.
Rpy2 is a Python interface to the R statistical programming language. Many of Python's core strengths (readable code, flexible I/O, design philosophy) complement R's core strengths (a strong statistical heritage, powerful graphics).
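Before running the examples, it can help to confirm the setup. The following is a minimal sketch, not part of rpy2. It assumes Python 3.5+ (for subprocess.run, shutil.which and importlib.util.find_spec) and an R executable named R on the PATH. It reads the ">" bounds above as "this version or newer". The helpers r_version and check_prerequisites are hypothetical. The sketch cannot tell whether R was built with --enable-R-shlib, because that problem tends to surface only when rpy2 actually tries to load R.

```python
import importlib.util
import re
import shutil
import subprocess
import sys


def r_version():
    """Return R's (major, minor) version as a tuple, or None if R can't be run."""
    exe = shutil.which("R")
    if exe is None:
        return None
    try:
        out = subprocess.run([exe, "--version"], stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT,
                             universal_newlines=True, timeout=60).stdout
    except (OSError, subprocess.SubprocessError):
        return None
    # assumes the output contains a line like "R version 4.3.1 (...)"
    match = re.search(r"R version (\d+)\.(\d+)", out)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_prerequisites():
    """Report which of the tutorial's prerequisites appear to be in place."""
    version = r_version()
    return {
        # "Python > 2.7" read as 2.7 or newer
        "python_ok": sys.version_info >= (2, 7),
        "r_version": version,
        # "R > 2.15" read as 2.15 or newer
        "r_ok": version is not None and version >= (2, 15),
        # only checks that rpy2 can be found, not that it can load R
        "rpy2_installed": importlib.util.find_spec("rpy2") is not None,
    }


if __name__ == "__main__":
    for name, value in sorted(check_prerequisites().items()):
        print(name, value)
```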
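import rpy2.robjects as robjects
r = robjects.r
# IntVector wraps a Python sequence as an R integer vector
x = robjects.IntVector([1, 2, 3])
y = robjects.IntVector([4, 5, 6])
# '+' on rpy2 vectors follows Python sequence semantics and concatenates them
z = x + y
print(z)
# r.mean calls R's mean() on the R vector
z_mean = r.mean(z)
print(z_mean)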
## [1] 1 2 3 4 5 6
##
## [1] 3.5
We import the parts of R that we need and then call R functions as if they were Python functions.
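The output above has six elements and a mean of 3.5. This shows that + joined the two rpy2 vectors end to end, whereas + in R itself adds element-wise. In rpy2, R's element-wise operators are reached through a vector's .ro attribute (x.ro + y). The following stdlib-only sketch contrasts the two readings using plain Python lists. The mean function is a small hypothetical helper.

```python
# Plain-Python analogue of the rpy2 example above (no R required).
x = [1, 2, 3]
y = [4, 5, 6]

# '+' on Python sequences concatenates, which is what the rpy2 example printed
z_concat = x + y

# element-wise addition, which is what '+' means for vectors in R itself
z_elementwise = [a + b for a, b in zip(x, y)]


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / float(len(values))


print(z_concat, mean(z_concat))            # [1, 2, 3, 4, 5, 6] 3.5
print(z_elementwise, mean(z_elementwise))  # [5, 7, 9] 7.0
```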
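import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
# importr exposes an R package as a Python module; R's dev.off becomes dev_off
gd = importr('grDevices')
r = robjects.r
# 5,000 draws from a standard normal distribution
x = r.rnorm(5000)
# 5,000 values resampled from x with replacement
y = r.sample(x, replace=True, size=5000)
# open a PNG device, draw the plot, then close the device to write the file
gd.png(filename="figure1.png")
r.plot(x, y, ylab="y", xlab="x", pch=19)
r.title("5,000 random samples from a normal distribution")
gd.dev_off()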
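The two R calls that build the data can be mirrored in plain Python, which clarifies what the plot shows. Because y is x resampled with replacement, each point pairs an x value with an essentially unrelated draw. The cloud is therefore a round blob centred on the origin and densest in the middle. The following is a stdlib-only sketch assuming Python 3.6+ (for Random.choices). make_samples is a hypothetical helper, and Python's generator will not reproduce R's random numbers.

```python
import random


def make_samples(n=5000, seed=None):
    """Plain-Python analogue of rnorm(n) followed by sample(x, replace=TRUE, size=n)."""
    rng = random.Random(seed)
    # like rnorm(n): n draws from a normal distribution with mean 0 and sd 1
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # like sample(x, replace=TRUE, size=n): n values picked from x with replacement
    y = rng.choices(x, k=n)
    return x, y


x, y = make_samples(seed=42)
print(len(x), len(y))  # 5000 5000
```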
These points are kind of over-plotted. Let's see what we can do about that.
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
gd = importr('grDevices')
# requires the R package hexbin to be installed
hb = importr('hexbin')
r = robjects.r
x = r.rnorm(5000)
y = r.sample(x, replace=True, size=5000)
# hexagonal binning with 50 bins across the range of x
hbin = hb.hexbin(x, y, xbins=50)
gd.png(filename="figure2.png")
r.plot(hbin, ylab="y", xlab="x",
       main="5,000 random samples from a normal distribution")
gd.dev_off()
By binning nearby points into hexagonal cells and shading each cell by its count, we can now see the density peak at the center of the scatter plot.
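hexbin replaces individual points with per-cell counts. The following stdlib-only sketch illustrates the same idea with square cells instead of hexagons. This is a deliberate simplification; bin_counts is a hypothetical helper and not part of hexbin. It assumes Python 3.6+ for Random.choices.

```python
import math
import random
from collections import Counter


def bin_counts(x, y, bin_width):
    """Count points per square cell of side bin_width.

    Cell (i, j) covers [i*w, (i+1)*w) on the x axis and [j*w, (j+1)*w) on y.
    """
    counts = Counter()
    for xi, yi in zip(x, y):
        cell = (math.floor(xi / bin_width), math.floor(yi / bin_width))
        counts[cell] += 1
    return counts


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = rng.choices(x, k=5000)
counts = bin_counts(x, y, bin_width=0.25)
# the busiest cells should lie near the origin, where the density peaks
print(counts.most_common(3))
```

Shading each cell by its count, instead of drawing 5,000 overlapping markers, is what makes the dense center visible.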
Adapted from the Rpy2 documentation:
import rpy2.robjects as ro
import rpy2.robjects.lib.ggplot2 as ggplot2
from rpy2.robjects.packages import importr
gd = importr('grDevices')
stats = importr('stats')
x = stats.rnorm(5000)
y = ro.r.sample(x, replace=True, size=5000)
# build an R data.frame with columns x and y
dataf_rnorm = ro.DataFrame({'x': x, 'y': y})
gd.png(filename="figure3.png")
gp = ggplot2.ggplot(dataf_rnorm)
# semi-transparent points plus contour lines of a 2-D density estimate,
# coloured by density level
pp = gp + \
    ggplot2.aes_string(x='x', y='y') + \
    ggplot2.geom_point(alpha=0.3) + \
    ggplot2.geom_density2d(ggplot2.aes_string(col='..level..')) + \
    ggplot2.ggtitle('point + density')
pp.plot()
gd.dev_off()
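In this plot, alpha = 0.3 makes each point 30% opaque, so overlapping points build up darker regions. geom_density2d then overlays contour lines of a two-dimensional kernel density estimate. The following stdlib-only sketch evaluates such an estimate on a grid. kde2d_at and density_grid are hypothetical helpers that use a fixed isotropic Gaussian bandwidth, whereas ggplot2 chooses its bandwidth from the data by default. Drawing the contour lines themselves is left out.

```python
import math
import random


def kde2d_at(x, y, px, py, bandwidth):
    """Gaussian kernel density estimate of the (x, y) points, evaluated at (px, py)."""
    n = len(x)
    h2 = bandwidth ** 2
    total = 0.0
    for xi, yi in zip(x, y):
        d2 = (px - xi) ** 2 + (py - yi) ** 2
        total += math.exp(-d2 / (2.0 * h2))
    # each kernel is a 2-D normal density with standard deviation `bandwidth`
    return total / (n * 2.0 * math.pi * h2)


def density_grid(x, y, lo, hi, steps, bandwidth):
    """Evaluate the estimate on a steps x steps grid covering [lo, hi] on both axes.

    Rows follow y and columns follow x.
    """
    step = (hi - lo) / (steps - 1)
    coords = [lo + i * step for i in range(steps)]
    return [[kde2d_at(x, y, gx, gy, bandwidth) for gx in coords] for gy in coords]


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = rng.choices(x, k=5000)
grid = density_grid(x, y, -3.0, 3.0, 7, bandwidth=0.5)
# the estimate should be highest at the grid point (0, 0) and fall off toward the corners
for row in grid:
    print(" ".join("%.3f" % v for v in row))
```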
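import numpy
import rpy2.robjects as robjects
r = robjects.r
# ten draws from a standard normal distribution, as an R vector
z = r.rnorm(10)
print(z)
# copy the R vector into a NumPy array
zz = numpy.array(z)
print(zz)
# compute the mean on the Python side
print(zz.mean())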
## [1] -0.8705863 -0.4139811 0.1400550 0.9812354 -0.5370096 0.6348409
## [7] 0.4793611 0.1047836 -0.4170225 -0.1014255
##
## [-0.87058631 -0.41398111 0.14005498 0.9812354 -0.53700964 0.63484085
## 0.47936106 0.10478359 -0.41702254 -0.10142546]
## 2.50828216816e-05
We can process our data in R and then move it into a Python data structure, here a NumPy array, for further computation.
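Once the values are on the Python side, any ordinary Python code can take over. The following stdlib-only sketch applies a hypothetical summarize helper to the ten values NumPy printed above. Because rpy2 vectors are iterable, the same helper could equally be given list(z) without going through NumPy.

```python
import math


def summarize(values):
    """Return the mean and sample standard deviation of an iterable of numbers."""
    data = [float(v) for v in values]
    n = len(data)
    mean = sum(data) / n
    # sample variance uses n - 1 in the denominator, like R's var() and sd()
    variance = sum((v - mean) ** 2 for v in data) / (n - 1)
    return mean, math.sqrt(variance)


# the ten values printed by the NumPy example above
z = [-0.87058631, -0.41398111, 0.14005498, 0.9812354, -0.53700964,
     0.63484085, 0.47936106, 0.10478359, -0.41702254, -0.10142546]
z_mean, z_sd = summarize(z)
print(z_mean, z_sd)
```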
You can write each part of your program in the language best suited to the task. Highlights of R include:
stats
library