Rcpp - Using C++ and R code Together

This tutorial was presented as part of PH 1930 Statistical Computing. The examples for this session are available on Github at the link.

Overview
Configure the environment
Rcpp and RcppArmadillo
Getting started
First example
Parallelization
MCMC for GLMM

While it is possible to write C or Fortran code for use in R as covered previously in this course, Rcpp provides a clean, approachable API that lets we write high-performance code much easily and joyfully.

##Overview

In general, R is not a fast or most memory efficient language – but it is very easy to use, to share, and makes very pretty output. C++ is very fast, but not so easy to use, and debugging could be painful. Rcpp supports implementing R functions in C++ for high performance computing and to easily interface R with external libraries.

##Configure the environment for using C++ code with R

I only test the tool on Windows and Linux. No matter which platform you prefer, RStudio is highly recommended to do the work. For a Windows user, you will need to make sure you have the latest release of R (3.2.0+) and will also need to install the Rtools library before you can use any packages with C++ code in them. Usually RStudio will help to check whether the Rtools are installed and install it for you when you first use Rcpp. If you are Linux user, everything probably just works. You may also want to visit this blog post which has more information on making C++ work with R under Windows/Mac if you end up encountering any weirdness.

##Rcpp and RcppArmadillo

In this session we’ll learn how to improve our R code performance by rewriting key functions in C++. Typical bottlenecks that C++ can address include:

Loops that can’t be easily vectorized because subsequent iterations depend on previous ones.
Recursive functions, or problems which involve calling functions millions of times. The overhead of calling a function in C++ is much lower than that in R.
Problems that require advanced data structures and algorithms that R doesn’t provide. Through the standard template library (STL), C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues.

To use Rcpp, a working knowledge of C++ is helpful, but not essential. Many good tutorials and references are freely available, including http://www.learncpp.com/ and http://www.cplusplus.com/. You may also enjoy Dirk Eddelbuettel’s Seamless R and C++ integration with Rcpp, which goes into much detail into all aspects of Rcpp.

RcppArmadillo is an add-on package that gives you access to tons of useful linear algebra functionality in C++. In particular it makes passing in and working with arrays, matrices and vectors pretty easy. You can check out the Armadillo docs here: http://arma.sourceforge.net/docs.html. This is where I go to look up functions to see how to use them or if they exist.

Although the base Rcpp package provides its own data structures (e.g., array, matrix, list) that can be passed easily between R and C++, the Armadillo data structures provided by the RcppArmadillo package are really nice and easier to use. Additionally, the Armadillo data structures are native C++ data structures, have multi-thread support and less unpredictable bad errors are encountered in practice. So I suggest using Armadillo data structures in most cases.

##Getting started

Before we start the first example, we install the two R package using


install.packages("Rcpp")
install.packages("RcppArmadillo")

in Rstudio command line.

We start by creating a new file in RStudio by clicking on the icon in the top left corner of the screen and select C++ File from the drop down menu.

A sample timesTwo* C++ program will pop up. We will now need to save that file somewhere. Once we have done so, we can click on the Source button and you will see that RStudio automatically calls Rcpp::sourceCpp() on the file.

This will compile the function and make it so we can access the C++ code from R - much easier than Calling C from R we taught before. The example can help you to check everything needed are correctly installed on your computer.

Some other things to note:

This statement include Rcpp header file in the program. The function is similar as library( ) in R. It also tells us that we can call any Rcpp constructs by their given name without the Rcpp:: prefix.


#include <Rcpp.h>
using namespace Rcpp;

We also put this statement before each function we want to make available to R:


//[[Rcpp::export]]

We can also define multiple C++ functions in the same file (not necessarily recommended unless some of them will be used by the main function), so we can put one in front of each one we want to make visible.

We can include R code blocks in C++ files processed with sourceCpp . The R code will be automatically run after the compilation.


/*** R
timesTwo(42)
*/

##First example We introduce our first example of calculating inner product of two vectors, and use data structure of Armadillo. The C++ code is:


#include <RcppArmadillo.h>
// [[Rcpp::depends( RcppArmadillo)]]
using namespace arma;

// [[Rcpp::export]]
arma::mat inner(arma::vec x, arma::vec y){
  arma::mat ip=x.t()*y;
  return(ip);
}

/*** R
vecA = rnorm(100000)
*/

Note that we no longer use the:

#include <Rcpp.h>

statement as in the example code provided by RStudio, but replace it with:


#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]

Also note that for all of the objects, we have to specify their type as we define the argument. This is a feature of C++ that is different from R where we just create objects without having to specify their type. Some basic C++ Armadillo objects we can pass in:

For decimal numbers like 1.2347 we need to use the double declaration, followed by the name of the argument (e.g.. my_double)
For integers (whole numbers) like 26 we use the int declaration, followed by the argument.
For numeric vectors, we use the arma::vec declaration, followed by the argument. This code should crash if you try to pass in anything other than a numeric vector (can contain doubles or integers). When you refer to the i th elements of a vector such as x , we use x(i) instead of x[i] as in R.
For numeric matrices, we use the arma::mat declaration, followed by the argument. Again, make sure it is just numbers in there.

We then write following R code and run it.


> library(Rcpp)
> library(RcppArmadillo)

> sourceCpp('test.cpp')
> vecA = rnorm(100000)

> inner(vecA,vecA)
         [,1]
[1,] 100317.2

##Parallelization in Rcpp via OpenMP

Now we make a small extension of the example and simply introduce using OpenMP to parallelize our C++ function. In order for the compiler to recognize OpenMP, we need to include the OpenMP header flag. Additionally, Rcpp has a plugin that will automatically add the OpenMP compiler flags. Add the following to the top of your C++ program.


#include <omp.h>
// [[Rcpp::plugins(openmp)]]

We also have to set some compiler flags in R code. Add the following to your R code.


Sys.setenv("PKG_CXXFLAGS" = "-fopenmp")
Sys.setenv("PKG_LIBS" = "-fopenmp")

The example we are going to use is element-wised sum square of a vector. It is a special case of the inner product of two same vectors. In serial way, we can realize this in Rcpp like so:


// [[Rcpp::export]]
double sumsq_serial(vec x)
{
  double sum = 0.0;
  for (int i=0; i<x.size(); i++){
    sum += sq(x(i));
  }  
  return sum;
}

The function sq() is simply squaring a number as


double sq(double x){
  return(x*x);
}

I do not put // [[Rcpp::export]] before this function because I will never call it from R. Calling such a function will slow down the performance of the code, comparing with sum += x(i)*x(i); . I include a function here to ascertain that each thread can correctly recognizes the user defined function when doing parallel computing later.

And in parallel using OpenMP, we just add the basic compiler pragma:


// [[Rcpp::export]]
double sumsq_parallel(vec x, int ncores)
{
  double sum = 0.0;
  omp_set_num_threads(ncores);
  #pragma omp parallel for shared(x) reduction(+:sum)
  for (int i=0; i<x.size(); i++){
	// cout << i << ", "omp_get_thread_num() <<  " of " <<  omp_get_num_threads() << endl;
    sum += sq(x(i));
  }
  return sum;
}

Basically, we identify the loop in which we want to parallelize and add the following lines above it


omp_set_num_threads(int)
#pragma omp parallel for

The first line selects the number of cores to use. This should not exceed the number of cores in your system. If we do not write this line, all the cores available on your computer will be used. The only real trick here in this example is making use of the shared() and reduction() keyword, and it pretty much does what it looks like. We’re telling the compiler that while each iteration of the loop should be run in parallel, they share the same information from vector x , and in the end we want the private (by thread) copies of the variable sum to be added up. The code cout << i << ", "omp_get_thread_num() << " of " << omp_get_num_threads() << endl; is C++ standard way to do print() . This code just tells which thread is currently working on the i th iteration, and the total number of thread are working together.

A small experiment is used to compare different methods proposed. A small set of micro-benchmarks in a variety of methods are conducted.


library(rbenchmark)
bench = benchmark(
		  vecA%*%vecA,
          inner(vecA,vecA),
          sumsq_parallel(vecA,1),
          sumsq_parallel(vecA,2),
          sumsq_parallel(vecA,3),
          sumsq_parallel(vecA,4), order = NULL, replications=1000)

print(bench[,1:4])

# test replications elapsed relative
# 1           vecA %*% vecA         1000   0.721    4.682
# 2       inner(vecA, vecA)         1000   0.229    1.487
# 3 sumsq_parallel(vecA, 1)         1000   0.319    2.071
# 4 sumsq_parallel(vecA, 2)         1000   0.203    1.318
# 5 sumsq_parallel(vecA, 3)         1000   0.154    1.000
# 6 sumsq_parallel(vecA, 4)         1000   0.157    1.019

We can see that C++ functions are much faster than R function %*% . As more cores were used in parallel computing, the execution time becomes shorter. Because I only had 4 cores on my desktop, the maximal number of core I tested was 4. It seems requiring all the resources on your computer to do the work may slow down the overall performance.

##MCMC for GLMM

This tutorial session has only touched on a small part of Rcpp, giving you the basic tools to rewrite poorly performing R code in C++. With the basic knowledge of Rcpp, it will be much easier for you to understand the example of MCMC for generalized linear mixed model using Adaptive rejection Metropolis sampling (ARMS) I provide here . Three methods are compared. The first method is pure R code, using the arms function in R Hi package. The second method is writing log density functions in C++, but sampling process is still coded in R with modified arms function. The third method is coding both log density functions and sampling process in Rcpp/C++. The pure R example takes 36.14 mins. The hybrid method takes 9.80 mins and the C++ code used only 1.17 mins.

The log density function coded in R:


logden.beta < - function(x, myu) {
  myu.long < - as.vector(mapply(rep, myu, nobs))
  temp < - as.vector(design.matrix %*% as.matrix(x, ncol=1)) + myu.long
  return(sum(y*temp - log(1+exp(temp))))
}

logden.u < - function(x, mybeta, myy, mycovariate, mytau) {
  temp <- mycovariate %*% as.matrix(mybeta, ncol=1) + x
  return(sum(myy * temp - log(1+exp(temp))) - 0.5*x^2*mytau)
}

The log density function coded in C++:


// [[Rcpp::export]]
double logden_beta(vec x, vec myu_long, mat design_matrix, vec y){
  vec temp = design_matrix*x + myu_long;
  vec one; one.ones(size(temp));
  vec logden = y%temp - log(one + exp(temp));
  return(sum(logden));
}

// [[Rcpp::export]]
double logden_u(double x, vec mybeta, vec myy, mat mycovariate, double mytau){
  vec x_long(size(myy)); x_long.fill(x);
  vec temp = mycovariate*mybeta + x_long;
  vec one; one.ones(size(temp));
  vec logden = myy%temp - log(one + exp(temp));
  return(sum(logden)-0.5*pow(x,2)*mytau);
}

##Common Pitfalls

The number one pitfall I have made while working with C++ and R is forgetting that while R starts vector and matrix indexes from 1, C++ (along with pretty much every other programming language), starts indexes from zero, so you need to plan accordingly.
In C++, we need to define the variable before use it while R does not require that.
Don’t forget to write the semicolon after each line of the code in C++.

Resources

</span>

For additional information and advanced use, following resources provide a helpful introduction:

Kan Li / 2016-11-07
Published under (CC) BY-NC-SA in categories tutorial tagged with Rcpp RcppArmadillo OpenMP