Interfacing R and Python using Rpy2

Matt Shirley (matt.shirley@jhmi.edu)

02 May 2013

Installing Rpy2

You'll need R > 2.15, as well as Python > 2.7. If you have trouble calling R from Rpy2, you may need to compile R yourself with the --enable-R-shlib configure option, which builds R as a shared library that Rpy2 can load. If you have no idea what this means, read this.

What is Rpy2?

Rpy2 is a Python interface to the R statistical programming language. Many of Python's core strengths (readable code, flexible I/O, design philosophy) complement R's core strengths (strong statistical heritage, powerful graphics).

A (very) simple example

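import rpy2.robjects as robjects
r = robjects.r

x = robjects.IntVector([1, 2, 3])
y = robjects.IntVector([4, 5, 6])
z = x + y           # on rpy2 vectors, Python's + concatenates (like R's c())
print(z)
z_mean = r.mean(z)  # call R's mean() as if it were a Python function
print(z_mean)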
## [1] 1 2 3 4 5 6
## 
## [1] 3.5

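We import the R interface, build R vectors from Python lists, and then call R functions such as mean() as if they were Python functions. Note that + on two rpy2 vectors concatenates them, just as it does for Python lists, which is why z holds six values and its mean is 3.5.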

A statistics example

import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
gd = importr('grDevices')

r = robjects.r
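x = r.rnorm(5000)                          # 5,000 draws from a standard normal
y = r.sample(x, replace=True, size=5000)   # resample x with replacement
gd.png(filename="figure1.png")             # send the plot to a PNG file
r.plot(x, y, ylab="y", xlab="x", pch=19)
r.title("5,000 random samples from a normal distribution")
gd.dev_off()                               # close the device to write the file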

With 5,000 points, many of them overlap (over-plotting), which hides where the data are most dense. Let's see what we can do about that.

Importing another R package (hexbin)

import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
gd = importr('grDevices')
hb = importr('hexbin')

r = robjects.r
x = r.rnorm(5000)
y = r.sample(x, replace=True, size=5000)
hbin = hb.hexbin(x, y, xbins=50)   # count points in hexagonal cells, 50 bins across x
gd.png(filename="figure2.png")
r.plot(hbin, ylab="y", xlab="x",
       main="5,000 random samples from a normal distribution")
gd.dev_off()

By binning the points into hexagonal cells and shading each cell by its count, we can now see the density pattern at the center of the scatter plot that over-plotting had hidden.

Plotting using ggplot2

Adapted from the Rpy2 documentation.

import rpy2.robjects.lib.ggplot2 as ggplot2
import rpy2.robjects as ro
from rpy2.robjects.packages import importr
gd = importr('grDevices')
stats = importr('stats')
x = stats.rnorm(5000)
y = ro.r.sample(x, replace=True, size=5000)
dataf_rnorm = ro.DataFrame({'x': x, 'y': y})   # ggplot2 works on data frames
gd.png(filename="figure3.png")
gp = ggplot2.ggplot(dataf_rnorm)
pp = (gp +
      ggplot2.aes_string(x='x', y='y') +
      ggplot2.geom_point(alpha=0.3) +                                # semi-transparent points
      ggplot2.geom_density2d(ggplot2.aes_string(col='..level..')) +  # density contours, coloured by level
      ggplot2.ggtitle('point + density'))
pp.plot()
gd.dev_off()

Converting to a python object

import rpy2.robjects as robjects
import numpy
r = robjects.r

z = r.rnorm(10)        # 10 draws from a standard normal, as an R vector
print(z)
zz = numpy.array(z)    # copy the values into a NumPy array
print(zz)
print(zz.mean())
##  [1] -0.8705863 -0.4139811  0.1400550  0.9812354 -0.5370096  0.6348409
##  [7]  0.4793611  0.1047836 -0.4170225 -0.1014255
## 
## [-0.87058631 -0.41398111  0.14005498  0.9812354  -0.53700964  0.63484085
##   0.47936106  0.10478359 -0.41702254 -0.10142546]
## 2.50828216816e-05

We can generate or process data in R and then move the result into a Python data structure (here a NumPy array) for further computation.

Summary

You can write each part of your program in the language best suited to the task. Highlights of R include:

Questions?

If you have any questions, feel free to email me at matt.shirley@jhmi.edu.
