From FarmShare

Latest revision as of 23:00, 22 June 2018


Which R are you using?

To see which R you are running, try

 which R

and

 R --version

As of 2014-07, we have two versions of R installed. If you do nothing, you'll get the default R that comes with Ubuntu 14.04, R v3.0.2, which includes many R libraries distributed by Ubuntu.

As of 2015-01, a second, newer version, v3.1.2, is available via 'module load r'. This also includes rstudio.

To use the newer version, log in with X11 forwarding or via FarmVNC, then run:

 module load r
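
To confirm which R a given session picks up, the two checks above can be wrapped in a small shell function. This is only a minimal sketch: report_r is a name made up for illustration and is not a FarmShare tool.

```shell
# Report which R executable the shell would run, plus its version line.
# Prints a notice instead of failing when R is not on the PATH.
report_r() {
    if r_path=$(command -v R); then
        echo "R found at: $r_path"
        # The first line of `R --version` carries the version string.
        R --version | head -n 1
    else
        echo "R not found on PATH"
    fi
}

report_r
```

Running report_r once before and once after 'module load r' should show the path and version change if the module took effect.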

Looking at installed packages

You can see the list of installed R libraries by calling library() in R:

 library()


We have a lot of packages already installed. You can ask us to install more, or just install them quickly in your home directory.

You can also use

 .libPaths()

to check which directories R will look in. See https://stat.ethz.ch/R-manual/R-devel/library/base/html/libPaths.html

Installing CRAN Packages

Most CRAN packages can be installed per-user by running install.packages() in an interactive session:

install.packages("package_name", dependencies = TRUE)

R first attempts to install to /usr/local/lib/R. You don't have permission to write there, so it prompts to create a library subdirectory in ~/R (if necessary) and installs there instead. If your package requires dependencies available from the standard Ubuntu repositories, you can e-mail us to request installation.

You can, of course, install R libraries into any arbitrary path and add that path to your R library path. That will probably break the next time R is upgraded to a new version, since your packages were built with the older version.
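
One way to soften the upgrade problem is to keep a separate library directory per R version and point R at it through the standard R_LIBS_USER environment variable. The sketch below is an assumption-laden illustration: the r_user_lib helper and the ~/R/library-X.Y layout are hypothetical, not a FarmShare convention.

```shell
# Print a per-version library directory, e.g. ~/R/library-3.1 for R 3.1.2.
# The directory layout is an assumption made for this example.
r_user_lib() {
    # ${1%.*} strips the patch level: 3.1.2 -> 3.1
    echo "$HOME/R/library-${1%.*}"
}

lib_dir=$(r_user_lib "3.1.2")
echo "would use: $lib_dir"

# To activate it in a session or job script
# (R ignores R_LIBS_USER unless the directory already exists):
#   mkdir -p "$lib_dir"
#   export R_LIBS_USER="$lib_dir"
```

With this layout, packages built under one R series stay out of the way when a newer R is loaded; they would still need to be reinstalled into the new directory.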

If you run into an SSL error, you can explicitly specify an HTTP mirror, e.g.

 install.packages("spatstat", dependencies=TRUE, repos="http://cran.cnr.Berkeley.edu/")

R Sample Job

Here's an example R file that generates a large array, fills it with some random numbers, then sleeps for 5 minutes. This happens to use up almost exactly 8GB of RAM, and you know in advance that it will run for about 5 minutes.

Save this as 8GB.R:

x <- array(1:1073741824, dim=c(1024,1024,1024))
x <- gaussian()
Sys.sleep(300)
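
As a rough sanity check on the memory request, the size of the array can be estimated from the element count. R stores integers in 4 bytes and doubles in 8 bytes; the vector_gib helper below is a made-up name for this sketch, and the figure it gives covers only the data itself, not temporary copies R may make.

```shell
# Size in GiB of a vector with the given element count and bytes per element.
vector_gib() {
    echo $(( $1 * $2 / 1073741824 ))
}

vector_gib 1073741824 4   # integer storage: prints 4
vector_gib 1073741824 8   # double storage: prints 8
```

The measured usage reported by the scheduler remains the number to trust when sizing mem_free.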

Here's an example SGE submit script, r_test.script, that runs that R file.


# use the current directory
#$ -cwd
#$ -S /bin/bash

# mail this address
#$ -M $USER@stanford.edu
# send mail on begin, end, suspend
#$ -m bes

# request 8GB of RAM, not hard-enforced on FarmShare
#$ -l mem_free=8G

# request 6 mins of runtime, is hard-enforced on FarmShare
#$ -l h_rt=00:06:00

R --vanilla --no-save < 8GB.R

You can submit it with just

 qsub r_test.script
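
Because the h_rt limit is hard-enforced, it helps to derive it from the expected runtime plus some padding rather than typing it by hand. The helper below is a sketch under that assumption; minutes_to_hrt is a hypothetical name, not part of SGE.

```shell
# Convert a runtime in whole minutes to the HH:MM:SS form used by h_rt.
minutes_to_hrt() {
    printf '%02d:%02d:00\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

minutes_to_hrt 6    # prints 00:06:00
minutes_to_hrt 90   # prints 01:30:00
```

The result could then be passed on the command line, for example qsub -l h_rt=$(minutes_to_hrt 6) r_test.script, instead of editing the directive in the script.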

The job writes two output files, one for stderr and one for stdout. Here is the stdout file that I get:

$ cat r_test.script.o2029205 

R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> x <- array(1:1073741824, dim=c(1024,1024,1024)) 
> x <- gaussian()
> Sys.sleep(300)

And here's the e-mail I get about the job, which shows the runtime and memory usage:

Job 2029205 (r_test.script) Complete
 User             = chekh
 Queue            = saucy.q@barley12.Stanford.EDU
 Host             = barley12.Stanford.EDU
 Start Time       = 07/10/2014 12:54:31
 End Time         = 07/10/2014 13:00:08
 User Time        = 00:00:29
 System Time      = 00:00:06
 Wallclock Time   = 00:05:37
 CPU              = 00:00:35
 Max vmem         = 8.107G
 Exit Status      = 0

Another R Sample Job

R script, let's call it R-jags.R

library(rjags)
print("Hello World")
#this just loaded some settings from that library

Job script, let's call it R-jags.submit.script


# use the current directory
#$ -cwd
#$ -S /bin/bash

# mail this address
#$ -M $USER@stanford.edu
# send mail on begin, end, suspend
#$ -m bes

R --vanilla --no-save < R-jags.R

Submit it to the test queue with a small memory requirement:

 qsub -l mem_free=200M -l testq=1 R-jags.submit.script

Looking at the output files, it errored out because R can't find the package rjags. You have two alternatives:

  • include the R library from /mnt/glusterfs/software
  • use modules to specify the full R install from /mnt/glusterfs/software

The first way, you would add this line to your R script:

 .libPaths(c("/mnt/glusterfs/software/free/R-2.15.0/lib/R/library", "/usr/lib/R/library"))

The second way, your script will look like this:

$ cat R-jags.submit.script

# use the current directory
#$ -cwd
#$ -S /bin/bash

# mail this address
#$ -M $USER@stanford.edu
# send mail on begin, end, suspend
#$ -m bes

eval `tclsh /mnt/glusterfs/software/free/modules/tcl/modulecmd.tcl sh autoinit`
module load R-2.15.0
R --vanilla --no-save < R-jags.R


Jupyter

R can also be run in a Jupyter notebook on FarmShare servers and used via a web browser.

IRkernel is available as part of the prebuilt Jupyter environment accessible via the Jupyter installation guide.


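Links

Some other departments have more detailed examples:

  • proclus - http://web.stanford.edu/group/proclus/cgi-bin/mediawiki/index.php/Software-R
  • sherlock - http://sherlock.stanford.edu/mediawiki/index.php/R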

building our local R

Here's how I usually do it.


R 3.1.1 was released today; I compiled it as chekh on corn40 (Ubuntu 13.10):

  • cd R-3.1.1
  • ./configure --enable-R-shlib

R is now configured for x86_64-unknown-linux-gnu

  Source directory:          .
  Installation directory:    /usr/local

  C compiler:                gcc -std=gnu99  -g -O2
  Fortran 77 compiler:       gfortran  -g -O2

  C++ compiler:              g++  -g -O2
  C++ 11 compiler:           g++  -std=c++11 -g -O2
  Fortran 90/95 compiler:    gfortran -g -O2
  Obj-C compiler:	     gcc -g -O2 -fobjc-exceptions

  Interfaces supported:      X11, tcltk
  External libraries:        readline, ICU, lzma
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           shared R library, shared BLAS, R profiling

  Recommended packages:      yes
  • make
  • write /farmshare/software/mf/saucy/r/3.1.1.lua
  • also added rstudio


Added R 3.1.2 as above to Ubuntu 14.04.


As chekh on corn25 (oldest CPU)

 cd /farmshare/software/free/r
 wget http://cran.r-project.org/src/base/R-3/R-3.2.1.tar.gz
 tar zxvf R-3.2.1.tar.gz
 cd R-3.2.1
 ./configure --enable-R-shlib

lapack issues

If you see messages like:

  unable to load shared object '/usr/lib/R/modules//lapack.so':

most likely you're mixing R versions and libraries.

Double check that you are not setting the R library path to point to directories with older libraries.
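
A quick way to look for a stale setting is to print the environment variables R consults for library locations. R_HOME, R_LIBS, R_LIBS_USER and R_LIBS_SITE are standard R variables; the show_r_lib_env function name is made up for this sketch.

```shell
# Print the R-related environment variables that can redirect library lookups.
show_r_lib_env() {
    echo "R_HOME=${R_HOME:-(unset)}"
    echo "R_LIBS=${R_LIBS:-(unset)}"
    echo "R_LIBS_USER=${R_LIBS_USER:-(unset)}"
    echo "R_LIBS_SITE=${R_LIBS_SITE:-(unset)}"
}

show_r_lib_env
```

Any value pointing at a directory built for a different R version is a likely culprit. Inside R, .libPaths() shows the directories that are actually searched.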

This test script should run fine if you have everything set correctly

$ cat lapack.r 
zz = lm(Sepal.Length ~., data = iris) 

$ R --no-save < lapack.r 

