- Running RStudio in Your Web Browser
- Running RStudio on Nobel
- Installing R Packages
- GSL Environment Module
- Installing ncdf4 or hdf5r
- Custom Modules
- Development Versions
- Using Conda
- Submitting Jobs to the Batch Scheduler
- Example of Installing Packages, Uploading Files and Running a Job
- Where to Run R Jobs
- Optimizing Performance
- Getting R Packages onto a Secure VM
- Rmpi and HPC R
- Getting Help
VERSION 4.3 IS NOW THE DEFAULT
As of May 2023, the default version of R on Della is 4.3. To continue using an earlier version such as 4.2, run the following command on the command line and/or in your Slurm script:
module load R/4.2.3
To see the available versions of R, run the following command:
module avail R
ONDEMAND RSTUDIO CAN NO LONGER ACCESS THE INTERNET (12/14/2022)
Internet access has been disabled in RStudio sessions due to security and compliance reasons. Previously, it was possible to install R packages and carry out other network operations in RStudio but this is no longer the case. The correct way to install R packages is to do so on the command line (see Installing R Packages below) before starting a session. If your work happens to require internet access in RStudio then use one of the visualization nodes by choosing "Interactive Apps" in the OnDemand main menu and then either "RStudio Server on Della Vis1" or "RStudio Server on Della Vis2". Keep in mind that the visualization nodes are shared between all users.
If you attempt to install R packages from within RStudio you will encounter this error:
> install.packages("microbenchmark")
Warning in install.packages :
  unable to access index for repository https://cran.rstudio.com/src/contrib:
  cannot open URL 'https://cran.rstudio.com/src/contrib/PACKAGES'
Running RStudio via Your Web Browser
Princeton Virtual Desktop
If you are most comfortable with Microsoft Windows and only need a single CPU-core, consider running RStudio using the Princeton Virtual Desktop. Choose "Student Labs" then "RStudio". Central OIT maintains this service, so please open a Support Ticket with issues.
Learn More about Princeton Virtual Desktop.
Research Computing OnDemand
RStudio is available through two web portals. You will need to use a VPN to connect from off-campus (GlobalProtect VPN is recommended). If you have an account on Adroit or Della then browse to https://myadroit.princeton.edu or https://mydella.princeton.edu. To begin a session, click on "Interactive Apps" and then "RStudio Server". For more details see this tutorial from DSS. Complete this form if you need an account on Adroit.
Note that on Adroit and Della, when you save any user data from your RStudio sessions (e.g. your session information, code history, etc.), those files are placed in your /scratch/network (on Adroit) or /scratch/gpfs (on Della) folder.
All R packages must be installed from the command line on the login node of Adroit or Della. To get to the command line on the login node from the OnDemand main menu, click on "Clusters" and then "<Name> Cluster Shell Access". This will take you to a black terminal window where you can install packages by running the appropriate commands, for example:
$ R
> install.packages("dplyr")
See below for more details on installing packages on the command line.
From the MyAdroit/MyDella main menu choose "Files" then "/scratch/network/<YourNetID>" on MyAdroit or "/scratch/gpfs/<YourNetID>" on MyDella. Choose "New Dir" to make a directory with a name you create. Double click on the newly created directory to open it. Choose "Upload" to transfer your files from your local computer to Adroit/Della. If you need to edit a file after uploading then choose "Edit". You can also create new files. See a video demonstration of uploading files and many other operations in OnDemand. Learn more about the different locations to store your files.
Internet Access is Not Available During Running Sessions
RStudio runs on the compute nodes which do not have Internet access. This means that you will not be able to install R packages, download files, clone a repo from GitHub, etc. If you need internet access then in the main OnDemand menu, click on "Clusters" and then "<Name> Cluster Shell Access". This will present you with a black terminal screen on the head node where you can run commands which need internet access. Any files or packages that you download while on the head node will be available on the compute nodes where your OnDemand session runs. If your work happens to require internet access in RStudio then use one of the visualization nodes on Della by choosing "Interactive Apps" in the OnDemand main menu and then either "RStudio Server on Della Vis2" or "RStudio Server on Della Vis3". Keep in mind that the visualization nodes are shared between all users.
Running RStudio on Nobel
If you have an X server like XQuartz or MobaXterm running on your laptop then follow the commands below to run RStudio:
$ ssh -Y <YourNetID>@nobel.princeton.edu
$ rstudio
If you encounter an error message like that below then it may be because you are over your quota:
X11 connection rejected because of wrong authentication.
(xstata-se:42220): Gtk-WARNING **: 20:05:35.756: cannot open display: localhost:12.0
Try removing unnecessary files or complete this form to request more space from OIT. Research Computing does not maintain the filesystems of Nobel. Also, make sure that you satisfy the X server requirements described on this page.
Most users have a 5 GB quota. Run the following command to see how much storage you are using:
$ du -sh ~/.
To see which folders are taking up the most space:
$ du -h --max-depth=1 ~/. | sort -hr
You can remove individual files with:
$ rm <file1> <file2> <file3>
Or remove entire directories with:
$ rm -rf <directory1> <directory2> <directory3>
Your home directory on Nobel is also known as the "H: drive".
Installing R Packages
R packages may be distributed in source form or as compiled binaries. Packages that come in source form must be compiled before they can be installed in your /home directory. The recommended tool suite for doing this is the GNU Compiler Collection (GCC) and specifically g++, which is the C++ compiler. To provide a stable environment for building software on our HPC clusters, the default version of GCC is kept the same for years at a time. To see the current version of g++, run the following command on one of the HPC clusters (e.g., Della):
$ g++ --version
g++ (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4)
On some machines (e.g., Tiger) you may find that the GCC version is 4.8.5. In this case it is necessary to load a newer compiler before installing packages. To do this, run module load rh/devtoolset/8 on the command line before starting R.
Before You Install
Make sure you have enough disk space before installing. This can be done by running the checkquota command:
$ checkquota
          Storage/size quota filesystem report for user: ceisgrub
Filesystem               Mount                 Used   Limit  MaxLim Comment
Adroit home              /home                8.3GB   9.3GB    10GB
Adroit scratch           /scratch                 0       0       0
Adroit scratch network   /scratch/network     8.2GB       0       0

          Storage number of files used report for user: ceisgrub
Filesystem               Mount                 Used   Limit  MaxLim Comment
Adroit home              /home                52.9K    975K    1.0M
Adroit scratch           /scratch                 1       0       0
Adroit scratch network   /scratch/network     39.3K       0       0

For quota increase requests please use this website:
https://forms.rc.princeton.edu/quota
The difference between Limit and Used in the /home row is your available space. In the example above the user has 9.3 - 8.3 = 1 GB available. Most packages require fewer than 0.1 GB. However, if you are installing many packages then disk space should be a concern. If you require more space then follow the link at the bottom of the output of checkquota to request more.
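The subtraction above can be scripted. The sketch below is only an illustration: home_free_gb is a hypothetical helper name, and it assumes the column layout shown in the example output with both sizes reported in GB.

```shell
# home_free_gb: read checkquota-style text on stdin and print Limit - Used
# for the first row whose Mount column is /home.
# Assumption: Used and Limit directly follow the mount point and are in GB.
home_free_gb() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "/home") {
          used = $(i + 1)
          limit = $(i + 2)
          sub(/GB$/, "", used)
          sub(/GB$/, "", limit)
          printf "%.1f\n", limit - used
          exit
        }
      }
    }'
}

# Example using the /home row from the sample output above.
printf 'Adroit home /home 8.3GB 9.3GB 10GB\n' | home_free_gb
```

On a cluster the same helper could be fed directly with checkquota | home_free_gb. Rows reported in other units (MB or TB) would need extra handling.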
Installing the First Package
After connecting to one of the clusters via ssh, start R and then install a package. The first time you do this you will want to answer 'yes' to the first two questions and then choose the value for USA (OH) when asked to select a CRAN mirror. Below is a full example session on Della from 2019:
$ ssh <YourNetID>@della.princeton.edu
$ R

R version 3.6.2 (2019-12-12) -- "Dark and Stormy Night"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)
...
> install.packages("lubridate")
Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)
Warning in install.packages("lubridate") :
  'lib = "/usr/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘~/R/x86_64-redhat-linux-gnu-library/3.6’
to install packages into? (yes/No/cancel) yes
--- Please select a CRAN mirror for use in this session ---
Secure CRAN mirrors

 1: 0-Cloud [https]                    2: Algeria [https]
 3: Australia (Canberra) [https]       4: Australia (Melbourne 1) [https]
 5: Australia (Melbourne 2) [https]    6: Australia (Perth) [https]
 7: Austria [https]                    8: Belgium (Ghent) [https]
 9: Brazil (BA) [https]               10: Brazil (PR) [https]
11: Brazil (RJ) [https]               12: Brazil (SP 1) [https]
13: Brazil (SP 2) [https]             14: Bulgaria [https]
15: Chile (Santiago) [https]          16: China (Hong Kong) [https]
17: China (Lanzhou) [https]           18: China (Shanghai) [https]
19: Colombia (Cali) [https]           20: Czech Republic [https]
21: Denmark [https]                   22: Ecuador (Cuenca) [https]
23: Ecuador (Quito) [https]           24: Estonia [https]
25: France (Lyon 1) [https]           26: France (Lyon 2) [https]
27: France (Marseille) [https]        28: France (Montpellier) [https]
29: Germany (Erlangen) [https]        30: Germany (Göttingen) [https]
31: Germany (Münster) [https]         32: Germany (Regensburg) [https]
33: Greece [https]                    34: Hungary [https]
35: Iceland [https]                   36: Indonesia (Jakarta) [https]
37: Ireland [https]                   38: Italy (Padua) [https]
39: Japan (Tokyo) [https]             40: Japan (Yonezawa) [https]
41: Korea (Busan) [https]             42: Korea (Gyeongsan-si) [https]
43: Korea (Seoul 1) [https]           44: Korea (Ulsan) [https]
45: Malaysia [https]                  46: Mexico (Mexico City) [https]
47: Morocco [https]                   48: Norway [https]
49: Philippines [https]               50: Russia [https]
51: Spain (Madrid) [https]            52: Sweden [https]
53: Switzerland [https]               54: Turkey (Denizli) [https]
55: Turkey (Mersin) [https]           56: UK (Bristol) [https]
57: UK (London 1) [https]             58: USA (CA 1) [https]
59: USA (IA) [https]                  60: USA (KS) [https]
61: USA (MI 1) [https]                62: USA (MI 2) [https]
63: USA (OR) [https]                  64: USA (TN) [https]
65: USA (TX 1) [https]                66: Uruguay [https]
67: (other mirrors)

Selection: 64
Your desired package and its dependencies will be built and installed. To help with organization, you can make different libraries and install your packages into the library of your choosing. After your first session, you will only be asked to select the CRAN mirror when installing a package.
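One way to keep a separate library, sketched below, is to point the R_LIBS_USER environment variable at a directory of your choosing before starting R. The directory name is hypothetical, and the Rscript call is skipped on machines where R is not available.

```shell
# Hypothetical library location; any directory you can write to would do.
export R_LIBS_USER="$HOME/R/project-libs"
mkdir -p "$R_LIBS_USER"

# If R is available, print the library search path. The new directory
# should appear in the list, and install.packages() started from this
# shell can then install into it.
if command -v Rscript >/dev/null 2>&1; then
  Rscript -e '.libPaths()'
fi
```

The same export line would need to appear in any Slurm script that uses packages from this library.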
GSL Environment Module
Some R packages require a newer version of the GNU Scientific Library (GSL). This can be accomplished with:
$ module load gsl/2.6
$ R
> install.packages("<package-name>")
IMPORTANT: If you built a package with the gsl module loaded then you will need to add module load gsl/<version> to your Slurm script. If using OnDemand then enter this into the "Additional environment module(s) to load" field.
You do not need to load any modules to install sf, rgdal, rstan, brms, lwgeom, geojsonio and terra.
Installing ncdf4 and hdf5r
To install ncdf4, you need to load two environment modules before starting R. The full session appears as follows:
$ ssh <YourNetID>@della.princeton.edu
$ module load hdf5/gcc/1.10.6 netcdf/gcc/hdf5-1.10.6/4.7.4
$ R
> install.packages("ncdf4")
You must include the two modules for OnDemand RStudio sessions via the "Additional environment module(s) to load" field. If using sbatch then include the two modules in the Slurm script. The procedure above can be used for hdf5r (in this case include hdf5/gcc/1.10.6 and omit netcdf/gcc/hdf5-1.10.6/4.7.4).
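As a rough sketch of the sbatch case, the helper below writes a minimal serial job script that loads whichever modules were used when the package was built. The function name write_r_job and the file names job_ncdf4.slurm and myscript.R are placeholders chosen for illustration.

```shell
# write_r_job FILE [MODULE...]
# Writes a minimal serial R job script to FILE that loads the given modules.
write_r_job() {
  outfile=$1
  shift
  {
    echo '#!/bin/bash'
    echo '#SBATCH --job-name=R-serial'
    echo '#SBATCH --nodes=1'
    echo '#SBATCH --ntasks=1'
    echo '#SBATCH --cpus-per-task=1'
    echo '#SBATCH --mem-per-cpu=4G'
    echo '#SBATCH --time=00:01:00'
    echo
    echo 'module purge'
    # Load the same modules that were loaded when the package was installed.
    if [ "$#" -gt 0 ]; then
      echo "module load $*"
    fi
    echo 'Rscript myscript.R'
  } > "$outfile"
}

write_r_job job_ncdf4.slurm hdf5/gcc/1.10.6 netcdf/gcc/hdf5-1.10.6/4.7.4
cat job_ncdf4.slurm
```

For hdf5r, the call would pass only hdf5/gcc/1.10.6.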
Custom Modules
You can create your own environment modules which can then be loaded for an OnDemand session. For example, one can create a Conda environment of R packages and load the module for this environment. See the directions for creating custom modules. To get this to work on MyAdroit, add your module files here:
Specify the name of the module in the appropriate field when creating the OnDemand session.
Run the commands below to install RStan:
$ module load rh/devtoolset/8     # tiger only (not della or adroit)
$ export DOWNLOAD_STATIC_LIBV8=1  # adroit only (not della or tiger)
$ R
> install.packages("rstan")
The RStan package compiles models from source at run time. For this reason it is necessary to make a modern compiler suite available. This can be done by including this line in your Slurm script:
module load rh/devtoolset/8 # tiger only (not della or adroit)
Failure to load this module will result in an error such as:
g++: error: unrecognized command line option ‘-std=gnu++14’
make: *** [file396ab327a56f.o] Error 1
This must also be done for packages that depend on RStan such as brms. These directions apply to Della. Tiger is not configured to work with RStan since that cluster is designed for multinode parallel jobs.
Development Versions
You can install the released version of a package such as furrr from CRAN with:
> install.packages("furrr")
In certain cases, such as when you need the bleeding-edge changes and bug fixes, you should install the development version from GitHub with:
# install.packages("devtools")
> devtools::install_github("DavisVaughan/furrr")
Using Conda
One can create an isolated Conda environment composed of R packages and R itself. You can search for these packages on anaconda.org. For instance, to create a Conda environment that includes rmapshaper and other packages:
$ module load anaconda3/2022.5
$ conda create --name mshpr-env --channel conda-forge r-dplyr r-rmapshaper r-sf
$ conda activate mshpr-env
$ R
> q()
Note that a Conda environment composed of R packages comes with its own R executable. Be sure to load the anaconda3/2022.5 module and activate the environment in your Slurm script.
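A minimal sketch of such a Slurm script is written out below with a here-document. It assumes the mshpr-env environment created above. The file name job_conda.slurm and the script name myscript.R are placeholders.

```shell
# Write a serial job script that runs R from the Conda environment.
cat > job_conda.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=R-conda       # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task
#SBATCH --mem-per-cpu=4G         # memory per cpu-core
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)

module purge
module load anaconda3/2022.5
conda activate mshpr-env

Rscript myscript.R               # Rscript now resolves to the environment's R
EOF

# Check the generated script for syntax errors without running it.
bash -n job_conda.slurm && echo "job_conda.slurm written"
```

The job would then be submitted with sbatch job_conda.slurm from the directory containing myscript.R.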
Submitting Jobs to the Batch Scheduler
The following Slurm script could be used to run a serial R job:
#!/bin/bash
#SBATCH --job-name=R-serial      # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=all          # send email on start, end and fault
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
Rscript myscript.R
If you built a package with the gsl module loaded then you will need to add module load gsl/<version> before the Rscript command in the script above.
Follow the commands below to run your first R script on Della, for example:
$ ssh <YourNetID>@della.princeton.edu
$ cd /scratch/gpfs/<YourNetID>
$ git clone https://github.com/PrincetonUniversity/hpc_beginning_workshop
$ cd hpc_beginning_workshop/serial_R
# edit email address in job.slurm
$ sbatch job.slurm
There is a similar example to that above for the Adroit cluster here.
Example of Installing Packages, Uploading Files and Running a Job
1. Install the required R packages
Connect via VPN (if off-campus) then browse to myadroit or mydella. Choose "Clusters" then "Adroit/Della Cluster Shell Access". This will open a black terminal screen. Run these commands (for your specific R packages):
$ module load rh/devtoolset/8  # tiger only (not della or adroit)
$ R
> install.packages(c("dplyr", "lubridate"))
# answer "yes" twice then choose OH as the mirror by entering the appropriate number
> q()
2. Upload your files
Return to your browser tab with the OnDemand main menu. Choose "Files" then "/scratch/network/<YourNetID>" on MyAdroit or "/scratch/gpfs/<YourNetID>" on MyDella. Choose "New Dir" to make a directory with a name you create (below this is referred to as <JobDirectory>). Double click on the newly created directory to open it. Choose "Upload" to transfer your R script, data files and Slurm script (job.slurm) from your local computer to Adroit/Della. If you need to edit a file after uploading then choose "Edit". You can also create new files. See a video demonstration of uploading files and many other operations in OnDemand.
3. Submit the job
Return to the tab with the black terminal. Run these commands:
$ cd /scratch/network/<YourNetID>/<JobDirectory>  # or /scratch/gpfs for della
$ sbatch job.slurm
To monitor the status of the job use:
$ squeue -u <YourNetID>
Once the job is complete you can download the files using the MyAdroit/MyDella GUI. To learn more about Slurm and the Linux command line see this guide.
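Instead of re-running squeue by hand, a small polling loop can wait for the job to leave the queue. This is only a sketch: wait_for_job is a hypothetical helper, and the 30-second default interval is an arbitrary choice.

```shell
# wait_for_job JOBID [INTERVAL]
# Blocks until JOBID no longer appears in the squeue listing.
wait_for_job() {
  jobid=$1
  interval=${2:-30}
  # squeue -h suppresses the header line, so any remaining output means
  # the job is still pending or running.
  while squeue -h -j "$jobid" 2>/dev/null | grep -q .; do
    sleep "$interval"
  done
  echo "job $jobid is no longer in the queue"
}

# Usage on a cluster (sbatch --parsable prints the job ID instead of the
# usual "Submitted batch job" message):
#   jobid=$(sbatch --parsable job.slurm)
#   wait_for_job "$jobid"
```

Leaving the queue only means the job has ended. Check the Slurm output file or the email notification to see whether it succeeded.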
Where to Run R Jobs
Adroit and Della are ideal for R jobs. The Tiger and Stellar clusters were designed for parallel jobs that require multiple nodes making them unfit for R. The scheduler on these clusters has been configured to give serial or single-node jobs the lowest priority. In some cases, squeue will classify the reason that the small job is pending as (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions). This is indicating that the required resources are being used for large jobs. Your serial or single-node job will eventually run, however. If you only have an account on Tiger and you want to run several small R jobs then please write to [email protected] to request an account on Della. Be sure to explain the situation.
Optimizing Performance
The performance of numerically intensive packages such as RStan can be improved through compiler optimizations and vectorization. If you are an advanced user, before installing such a package, you may consider turning on these optimizations by creating a ~/.R/Makevars file containing these lines:
CC = gcc
CXX = g++
FC = gfortran
CFLAGS = -O3 -ffast-math -march=native -fwhole-program -fpic -m64
CXXFLAGS = -O3 -ffast-math -march=native -fwhole-program -fpic -m64
FFLAGS = -O3 -ffast-math -march=native -fwhole-program -fpic -m64
After installing such a package with the Makevars settings above, you must remove or rename the Makevars file to prevent the optimizations from creating incompatibilities with packages to be installed at a later time.
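The create, install, rename cycle described above could be scripted roughly as follows. The .orig and .optimized suffixes are arbitrary names chosen for this sketch, and the package installation itself is left as a comment.

```shell
makevars_dir="$HOME/.R"
makevars="$makevars_dir/Makevars"
mkdir -p "$makevars_dir"

# Set aside any existing Makevars so it can be restored afterwards.
if [ -f "$makevars" ]; then
  mv "$makevars" "$makevars.orig"
fi

# Write the optimization settings listed above.
cat > "$makevars" <<'EOF'
CC = gcc
CXX = g++
FC = gfortran
CFLAGS = -O3 -ffast-math -march=native -fwhole-program -fpic -m64
CXXFLAGS = -O3 -ffast-math -march=native -fwhole-program -fpic -m64
FFLAGS = -O3 -ffast-math -march=native -fwhole-program -fpic -m64
EOF

# Install the numerically intensive package at this point, for example by
# starting R and running install.packages("rstan").

# Rename the optimized file so that later installs do not pick it up, then
# restore the original Makevars if there was one.
mv "$makevars" "$makevars.optimized"
if [ -f "$makevars.orig" ]; then
  mv "$makevars.orig" "$makevars"
fi
```

Keeping the optimized file under a different name makes it easy to reuse the same flags for a later rebuild.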
There may be times when you will need to specify a language standard. This can be done by adding a line to Makevars such as:
CXX14STD = -std=c++14
Getting R Packages onto a Secure VM
Most VMs cannot reach the Internet and cannot be reached from it, for security purposes. In such cases one cannot directly install packages. One solution is to set up a similar environment on another machine which has Internet access and then copy it over (or have someone copy it). We generally suggest getting an account on Adroit, one of our cluster machines, where you can use the head node of the cluster to create the environment. You can request an account here.
Once you have an account on Adroit, and are connected to one of the University VPN services, you can SSH from your computer directly to Adroit. You'll then install the R packages. Here is an example session:
$ ssh <YourNetID>@adroit.princeton.edu
$ mkdir mylibs
$ export R_LIBS_USER=/home/<YourNetID>/mylibs
$ module load rh/devtoolset/8  # tiger only (not della or adroit)
$ R
> .libPaths()  # "/home/<YourNetID>/mylibs" should appear in the list
> install.packages(c("dplyr", "ggplot2", "lubridate", "caret"))
...
Selection: 56  # choose a mirror such as "USA (OH) [https]"
...
When finished, quit R and return to the command line. Then use 'tar' to compress the mylibs directory:
$ tar cvzf mylibs.tar.gz mylibs
On the VM
Transfer mylibs.tar.gz to the VM and unpack it. In some cases this will need to be done by a member of Research Computing. To unpack the file use:
$ tar xzf mylibs.tar.gz
Then set R_LIBS_USER to the location of the unpacked directory and start R:
$ export R_LIBS_USER=<path/to>/mylibs
$ R
You should be able to load the libraries in R. If you encounter problems use .libPaths() to check the paths that are searched for R libraries. You can also look inside the mylibs directory to check for the existence of certain packages and their dependencies.
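To check which packages made it into the archive without unpacking it, the archive listing can be reduced to the package directory names. The helper name list_archived_packages is hypothetical, and the demonstration uses empty stand-in directories rather than real packages.

```shell
# list_archived_packages ARCHIVE
# Prints the names of the entries directly under the archived library directory.
list_archived_packages() {
  tar tzf "$1" | awk -F/ '$2 != "" { print $2 }' | sort -u
}

# Demonstration with a stand-in library (empty directories, for illustration only).
demo=$(mktemp -d)
mkdir -p "$demo/mylibs/dplyr" "$demo/mylibs/ggplot2"
tar czf "$demo/mylibs.tar.gz" -C "$demo" mylibs
list_archived_packages "$demo/mylibs.tar.gz"   # prints dplyr and ggplot2, one per line
```

Running the helper on the real mylibs.tar.gz before and after the transfer is a quick way to confirm that the dependencies were included.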
Rmpi and HPC R
For directions on building Rmpi and approaches to parallelizing R scripts see this workshop. If you are using Rmpi on Della and you find that jobs hang, try adding this line to the end of your R script:
1. I tried to install an R package but the installation failed with this error message: for loop initial declarations are only allowed in C99 mode. What should I do?
This problem can be solved by loading a newer version of GCC. To do this, before starting R, run this command on the command line: module load rh/devtoolset/8. Read the content above for the explanation for this solution.
2. Nothing is working properly and I want to delete all my R packages and start over. How do I do this?
To delete all of your R packages and R files: rm -rf ~/R ~/.R ~/.rstudio ~/.Rhistory ~/.Rprofile ~/.RData. You may also need to remove lines from your ~/.bashrc file if you added or modified environment variables.
3. How do I see where the R packages are installed?
The paths to system and user packages can be seen with this R command: > .libPaths(). To specifically see the path to user packages use: > Sys.getenv("R_LIBS_USER").
4. How do I see my installed packages?
All base and user packages can be listed with the R command:
> installed.packages()
5. How do I see the default packages?
Default packages can be listed with the R command:
> getOption("defaultPackages")
6. I have a list of packages. How do I install them all at once?
> install.packages(c("<package-name-1>", "<package-name-2>", ...))
7. How do I remove a package?
A package can be removed with the command:
> remove.packages("<package-name>")
8. I have the source code for a package I want to install. How do I perform the installation?
For Rmpi, for instance, use this command on the command line:
$ R CMD INSTALL -l ~/.local/lib --no-test-load Rmpi_0.6-9.tar.gz
Then start R and do:
> library("Rmpi", lib.loc="/home/<YourNetID>/.local/lib")
Or one could set an R environment variable in ~/.bashrc:
9. How can I solve the following error?
Installing package into ‘/home/ceisgrub/R/x86_64-redhat-linux-gnu-library/3.6’
(as ‘lib’ is unspecified)
--- Please select a CRAN mirror for use in this session ---
Error in structure(.External(.C_dotTclObjv, objv), class = "tclObj") :
  [tcl] grab failed: window not viewable.
R is trying to display a list of mirrors via X11 forwarding so try unsetting DISPLAY before starting R:
$ unset DISPLAY
$ module load rh/devtoolset/8
$ R
> install.packages("<package-name>")
10. Which BLAS/LAPACK library is R using?
This information is available by running the sessionInfo() command.
11. How should I deal with this error: "ERROR: failed to lock directory '/home/aturing/R/x86_64-redhat-linux-gnu-library/4.0' for modifying. Try removing '/home/aturing/R/x86_64-redhat-linux-gnu-library/4.0/00LOCK-data.table'"?
Follow the suggestion which says to remove a specific directory. This can be done with: rm -rf /home/aturing/R/x86_64-redhat-linux-gnu-library/4.0/00LOCK-data.table. Be sure to use the path from your own case. If you are using RStudio on MyAdroit or MyDella then this command must be run on the command line. From the OnDemand main menu, click on "Clusters" and then "<Name> Cluster Shell Access". This will take you to a black terminal window where you can run the command.
12. How should I deal with this error: "Error: protect(): protection stack overflow."?
This error can occur when working with large data files. We are not aware of the solution for RStudio on MyAdroit or MyDella but if you submit a job to the Slurm scheduler, call Rscript in this way:
Rscript --max-ppsize=500000 myscript.R
13. How do I use a version of R from Anaconda instead of the system R in RStudio?
On the head node, activate your environment then use the "which R" command to get the path. Enter the full path into the PATH field (do not use ~ or $HOME) when creating the RStudio session via MyAdroit/MyDella.
14. What does "Status code 502" mean when using OnDemand RStudio?
This error can arise when a user tries to run two commands at once. Try letting each command finish before running the next. For instance, if the script is executing then do not try to also save the R file. The error can also be an indication of requesting insufficient RAM. Try starting a new session with more RAM.
15. When using OnDemand, I am presented with a prompt reading "Sign in to RStudio" with a username and password field. The line above the prompt reads: "Error: Temporary server error, please try again".
16. In OnDemand, after I click on “Connect to RStudio Server” it takes a very long time to connect. How do I solve this?
This is typically explained by past suspended sessions. Try stopping all OnDemand sessions and then run the command below on the command line:
$ rm -rf /home/<YourNetID>/.local/share/rstudio/sessions/active
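For the lock-directory error in question 11, several leftover lock directories can be cleared in one pass. The sketch below uses a hypothetical helper, remove_stale_locks, and demonstrates it on a throwaway directory. It should only be run when no package installation is in progress, since the lock directories are in use during an install.

```shell
# remove_stale_locks LIBDIR
# Prints and removes leftover 00LOCK* directories directly under LIBDIR.
remove_stale_locks() {
  find "$1" -maxdepth 1 -type d -name '00LOCK*' -print -exec rm -rf {} +
}

# Demonstration in a temporary directory standing in for a personal library.
lib=$(mktemp -d)
mkdir -p "$lib/dplyr" "$lib/00LOCK-data.table"
remove_stale_locks "$lib"
ls "$lib"   # only dplyr remains
```

For a real library, pass the path reported in the error message, for example the directory ending in x86_64-redhat-linux-gnu-library/4.0.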
Getting Help
For Data Analysis
For help on using R with data analysis please see the DSS website. DSS offers online tutorials and training for performing data analysis with R as well as one-on-one appointments.
For R Work on the HPC clusters
If you encounter any difficulties while working with R on the HPC clusters then please send an email to [email protected] or attend a walk-in help session.
Additional R Resources at Princeton
To see other centers with R resources on campus, view the exploringr.princeton.edu website.