Stata on the HPC Clusters and Nobel

MyAdroit and MyDella

Run Stata via Your Web Browser

If you are new to high-performance computing, the simplest way to use Stata on the HPC clusters is through the Open OnDemand web interface. If you have an account on Adroit or Della, browse to https://myadroit.princeton.edu or https://mydella.princeton.edu. If you need an account on Adroit, complete this form. Note that you will need a VPN to connect from off campus. We recommend the GlobalProtect VPN.

To begin a session, click on "Interactive Apps" and then "XStata". You will need to choose the "Stata version", "Number of hours" and "Number of cores". Set "Number of cores" to 1, because only Stata/SE is licensed on the clusters and the multiprocessing version, Stata/MP, is not available (see the Stata/MP section below). Click "Launch" and, when your session is ready, click "Launch XStata". Note that the more resources you request, the longer you will have to wait for your session to become available.

Note that MyAdroit and MyDella run Stata on the compute nodes of the cluster which do not have internet access for security reasons. You will be able to install modules but if you need to download other files then it must be done on the head node.

To reach the head node from the OnDemand dashboard (main menu), click on "Clusters" and then "Adroit/Della Shell Cluster Access". This opens a terminal on the head node, which does have internet access. Use commands such as wget to download the files you require, then return to your MyAdroit/MyDella session.
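
As a minimal sketch of the download step, the function below wraps wget so that files land in a directory of your choice. The helper name fetch_to, the directory ~/stata_data and the URL in the usage line are hypothetical placeholders, not part of the cluster setup.

```shell
#!/bin/bash
# Run this on the head node, which has internet access.

# fetch_to DIR URL: download URL into DIR (hypothetical helper name).
fetch_to() {
    local dir="$1" url="$2"
    mkdir -p "$dir"             # create the target directory if needed
    wget -nc -P "$dir" "$url"   # -P sets the download directory, -nc skips files that already exist
}
```

For example, fetch_to ~/stata_data https://example.com/mydata.dta would save mydata.dta under ~/stata_data, where your MyAdroit/MyDella session can then read it.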

Make the Switch from Nobel to Adroit

There are many advantages to using Stata on Adroit/MyAdroit over Nobel. First, because you can use Stata through your web browser via MyAdroit, you avoid all the pitfalls associated with ssh -Y. Second, Research Computing is directly in control of the Adroit filesystems, so it is easy to check your quota and request additional space (see checkquota). Third, Adroit is configured like all the large clusters at Princeton (e.g., Tiger), so if you become proficient with the filesystems, environment modules and job scheduler of Adroit then you will be comfortable using any supercomputer in the world. Lastly, in addition to using the web browser interface, you can use the batch job scheduler to submit and run a large number of jobs at once. Complete this form to get an account on Adroit.

 

Running Stata on Nobel

If you have an X server like XQuartz or MobaXterm running on your laptop, run the commands below to launch XStata:

$ ssh -Y <YourNetID>@nobel.princeton.edu
$ module load stata/16.0
$ xstata myfile.do

You do not need to use a VPN to connect to Nobel since it is not behind a firewall.

To work with Stata on the command line without the GUI:

$ ssh <YourNetID>@nobel.princeton.edu
$ module load stata/16.0
$ stata myfile.do

Note that OIT maintains the filesystems on Nobel. They can be reached at helpdesk@princeton.edu.

Trouble Connecting and File Quota

If you encounter an error message like that below then it may be because you are over your quota:

X11 connection rejected because of wrong authentication.
(xstata-se:42220): Gtk-WARNING **: 20:05:35.756: cannot open display: localhost:12.0

Try removing unnecessary files or complete this form to request more space from OIT. Research Computing does not maintain the filesystems of Nobel. Also, make sure that you satisfy the X server requirements described on this page.

Most users have a 5 GB quota. Run the following command to see how much storage you are using (it may take a minute):

$ du -sh ~/.

Or to see which folders are taking up the most space run this command (it may take a minute):

$ du -ch --max-depth=1 ~/. | sort -h
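
The two du commands above can be combined into a small check against the quota. This is only a sketch: it assumes the 5 GB figure quoted above applies to your account, and the names QUOTA_KB, usage_kb and quota_percent are made up for illustration.

```shell
#!/bin/bash
# Assumes the 5 GB quota mentioned above; adjust QUOTA_KB if yours differs.
QUOTA_KB=$((5 * 1024 * 1024))

# usage_kb DIR: print the disk usage of DIR in kilobytes.
usage_kb() {
    du -sk "$1" | cut -f1
}

# quota_percent DIR: print usage as a whole-number percentage of the quota.
quota_percent() {
    local used
    used=$(usage_kb "$1")
    echo $(( used * 100 / QUOTA_KB ))
}
```

Running quota_percent ~ prints the share of the assumed quota that your home directory occupies. Like du itself, it may take a minute on a large directory.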

You can remove individual files with:

$ rm <file1> <file2> <file3>

Or remove entire directories with:

$ rm -rf <directory1> <directory2> <directory3>

Your home directory on Nobel, which is also known as the H: drive, has this absolute path: /n/homeserver2/user2a/<YourNetID>.

 

Where to Put ado-files

Run the sysdir command in Stata to see the search path for ado-files:

. sysdir
   STATA:  /usr/licensed/stata-16.1/
    BASE:  /usr/licensed/stata-16.1/ado/base/
    SITE:  /usr/licensed/ado/
    PLUS:  ~/ado/plus/
PERSONAL:  ~/ado/personal/
OLDPLACE:  ~/ado/

You should store your ado-files in /home/<YourNetID>/ado/personal, which is the PERSONAL directory in the listing above. For more see this document.
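
A minimal sketch of putting an ado-file in place from the shell follows. The helper name install_ado and the file myprog.ado in the usage line are hypothetical; the destination is the ~/ado/personal path reported by sysdir.

```shell
#!/bin/bash
# install_ado FILE...: copy ado-files into the PERSONAL directory (~/ado/personal).
install_ado() {
    local dest="$HOME/ado/personal"
    mkdir -p "$dest"       # the directory may not exist yet
    cp -- "$@" "$dest"/
}
```

After running install_ado myprog.ado, typing which myprog inside Stata should report the copy in ~/ado/personal if Stata can find it on the search path.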

 

Submitting Batch Jobs to the Slurm Scheduler

Stata can be run on the HPC clusters, namely, Adroit and Della. These clusters use the Slurm job scheduler and all work must be submitted as batch jobs. Intermediate and advanced Stata users often prefer submitting jobs to the Slurm scheduler over using the web interface (described above). A job consists of two pieces: (1) a Stata script or "do file" and (2) a Slurm script that specifies the needed resources, sets the environment and lists the commands to be run. The example below shows how to run a batch job.

Running a Serial Stata Job

A serial Stata job is one that requires only a single CPU-core. Here is an example of a trivial, one-line serial Stata script (hello_world.do):

display 21+21

The Slurm script (job.slurm) below can be used for serial jobs:

#!/bin/bash
#SBATCH --job-name=stata         # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=all          # send email on job start, end and fault
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
module load stata/16.0

stata -b hello_world.do

To run the Stata script, simply submit the job to the cluster with the following command:

$ sbatch job.slurm

After the job completes, view the output with cat hello_world.log:

  ___  ____  ____  ____  ____ (R)
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/   16.0   Copyright 1985-2019 StataCorp LLC
  Statistics/Data Analysis            StataCorp
                                      4905 Lakeway Drive
                                      College Station, Texas 77845 USA
                                      800-STATA-PC        http://www.stata.com
                                      979-696-4600        stata@stata.com
                                      979-696-4601 (fax)

100-user Stata network perpetual license:
       Serial number:  401606267559
         Licensed to:  Stata/SE 16
                       100-user Network

Notes:
      1.  Stata is running in batch mode.
      2.  Unicode is supported; see help unicode_advice.

. do "hello_world.do" 

. display 21+21
42

. 
end of do-file

Use squeue -u $USER to monitor the progress of queued jobs.
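
If you would rather not check the queue by hand, the submit-and-monitor steps can be sketched as a single shell function. The helper name submit_and_wait and the 30-second default polling interval are assumptions for illustration. The sketch relies on sbatch --parsable printing the job ID and on squeue -h -j printing nothing once the job has left the queue.

```shell
#!/bin/bash
# submit_and_wait SCRIPT [INTERVAL]: submit a Slurm script, then poll until
# the job has left the queue. Prints the job ID. (Hypothetical helper name.)
submit_and_wait() {
    local script="$1" interval="${2:-30}" jobid
    # --parsable makes sbatch print only the job ID (followed by ";cluster" on some setups)
    jobid=$(sbatch --parsable "$script") || return 1
    jobid=${jobid%%;*}
    # -h suppresses the header, so empty output means the job is no longer queued
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "$jobid"
}
```

A typical call would be submit_and_wait job.slurm && cat hello_world.log. Note that the loop only detects that the job has left the queue, not whether it succeeded, so check the log for errors.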

To run the example Stata job above on Adroit, follow these steps:

$ ssh <YourNetID>@adroit.princeton.edu
$ git clone https://github.com/PrincetonUniversity/hpc_beginning_workshop.git
$ cd hpc_beginning_workshop/RC_example_jobs/serial_stata
$ sbatch job.slurm
$ squeue -u $USER        # monitor the status of the job
$ cat hello_world.log    # view the output after the job completes

 

Stata/MP

Research Computing only has a license for Stata/SE. This means that Stata/MP, the multiprocessing version, is not available on the HPC clusters or Nobel. With Stata/MP, a single job can use multiple CPU-cores, which can lead to significant speed-ups. You can, however, use Stata/MP at DSS.

 

FAQ

1. How do I solve this error: "I/O error writing .dta file. Usually such I/O errors are caused by the disk or file system being full."

This may indicate that /tmp is running out of space because of how Stata creates temporary files. Before you start Stata, run the following command so that temporary files are written to your home directory instead:

export STATATMP=$HOME
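
A variation on the export above keeps Stata's temporary files in a dedicated subdirectory so they do not clutter your home directory. This is a sketch only: the helper name use_stata_tmp and the default location $HOME/stata_tmp are assumptions, and the files still count against your quota.

```shell
#!/bin/bash
# use_stata_tmp [DIR]: point Stata's temporary files at DIR, creating it if needed.
# The default, $HOME/stata_tmp, is an assumed location.
use_stata_tmp() {
    local dir="${1:-$HOME/stata_tmp}"
    mkdir -p "$dir" || return 1
    export STATATMP="$dir"
}
```

Call use_stata_tmp before starting Stata, for example on the line before stata -b hello_world.do in a Slurm script.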

 

Getting Help from OIT

If your Stata problem relates to the H: drive filesystem, such as going over your quota, then please contact OIT via email or live chat.

 

Getting Help from Data & Statistical Services (DSS)

For help with data analysis in Stata, please see the DSS website. DSS offers online tutorials and training for performing data analysis with Stata as well as one-on-one appointments.

 

Getting Help from Research Computing

If you encounter any difficulties while working with Stata on the HPC clusters then please send an email to cses@princeton.edu or attend a walk-in help session.