A general-purpose statistical software package on the Research Computing Clusters and Nobel

Running Stata via Your Web Browser

Princeton Virtual Desktop

If you are most comfortable with Microsoft Windows and only need a single CPU-core, consider running Stata using the Princeton Virtual Desktop. Choose "StataSE 17" after the desktop loads. Central OIT maintains this service, so please open a Support Ticket if you encounter issues.

Learn more about Princeton Virtual Desktop.

Research Computing OnDemand

The simplest way to use Stata on the HPC clusters is through the Open OnDemand web interface. You will need to use a VPN to connect from off-campus (GlobalProtect VPN is recommended). If you have an account on Adroit or Della then browse to https://myadroit.princeton.edu or https://mydella.princeton.edu. If you need an account on Adroit then complete the Cluster Account Requests form.

To begin a session, click on "Interactive Apps" and then "XStata". You will need to choose the "Stata version", "Number of hours" and "Number of cores". Set "Number of cores" to 1 unless you are sure that your script has been explicitly parallelized. Click "Launch" and then, when your session is ready, click "Launch XStata". Note that the more resources you request, the longer you will have to wait for your session to become available.

There is No Internet Access on the Compute Nodes

Note that MyAdroit and MyDella run Stata on the compute nodes of the cluster, which do not have internet access. You will be able to install modules, but if you need to download other files then you must do so on the head node.

To get to the head node from the OnDemand dashboard (main menu), click on "Clusters" and then "Adroit/Della Shell Cluster Access". You will be presented with a black terminal on the head node, which does have internet access. Use commands such as wget to download the files you require, then return to your MyAdroit/MyDella session.
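
As a rough sketch of the download step on the head node, the helper below wraps wget so that files land in a directory of your choosing. The function name download_data, the example URL and the destination directory are all placeholders, not part of the cluster setup; /scratch/network/$USER is the Adroit scratch path used elsewhere on this page.

```shell
#!/bin/bash
# Sketch: fetch a file on the head node, which has internet access.
# download_data is a hypothetical helper name.
download_data() {
    local url="$1"    # address of the file to fetch
    local dest="$2"   # directory in which to store it
    mkdir -p "$dest" || return 1
    # -nc skips files that already exist; -P sets the target directory
    wget -nc -P "$dest" "$url"
}

# Example call (placeholder URL; uncomment and edit before use):
# download_data "https://example.com/mydata.dta" "/scratch/network/$USER/data"
```

Once the download finishes, the file is visible from the compute nodes as well, so the Stata session can open it directly.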

Submitting Batch Jobs to the Slurm Scheduler

Stata can be run on the HPC clusters, namely, Adroit and Della. These clusters use a job scheduler and all work must be submitted as batch or interactive jobs. Intermediate and advanced Stata users often prefer submitting jobs to the Slurm scheduler over using the web interface (described above). A job consists of two pieces: (1) a Stata script or "do file" and (2) a Slurm script that specifies the needed resources, sets the environment and lists the commands to be run. The example below will help you understand how to run a batch job.

Running a Serial Stata Job

A serial Stata job is one that requires only a single CPU-core. Here is an example of a trivial, one-line serial Stata script (hello_world.do):

disp 21+21

The Slurm script (job.slurm) below can be used for serial jobs:

#!/bin/bash
#SBATCH --job-name=stata         # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=all          # send email on job start, end and fault
#SBATCH --mail-user=<YourNetID>@princeton.edu
module purge
module load stata/16.1
stata -b hello_world.do

To run the Stata script, simply submit the job to the cluster with the following command:

$ sbatch job.slurm

After the job completes, view the output with cat hello_world.log:

$ cat hello_world.log
  ___  ____  ____  ____  ____ (R)
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/   16.0   Copyright 1985-2019 StataCorp LLC
  Statistics/Data Analysis            StataCorp
                                      4905 Lakeway Drive
                                      College Station, Texas 77845 USA
                                      800-STATA-PC        http://www.stata.com
                                      979-696-4600        stata@stata.com
                                      979-696-4601 (fax)
100-user Stata network perpetual license:
       Serial number:  401606267559
         Licensed to:  Stata/SE 16
                       100-user Network
Notes:
      1.  Stata is running in batch mode.
      2.  Unicode is supported; see help unicode_advice.
. do "hello_world.do" 
. display 21+21
42
. 
end of do-file

Use squeue -u $USER to monitor the progress of queued jobs.
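
Once squeue shows that the job has finished, it can be worth scanning the log for errors, since Slurm may report a job as completed even when the do-file stopped early. Stata prints a return code such as r(199); on its own line when a command fails, so a simple grep is usually enough. The helper name check_stata_log below is hypothetical; treat this as a minimal sketch rather than a complete check.

```shell
#!/bin/bash
# Sketch: look for Stata return codes such as "r(199);" in a batch log.
# check_stata_log is a hypothetical helper name.
check_stata_log() {
    local log="$1"
    if [ ! -f "$log" ]; then
        echo "log file not found: $log"
        return 2
    fi
    # A failed Stata command is followed by a line of the form r(<number>);
    if grep -qE '^r\([0-9]+\);' "$log"; then
        echo "Stata reported an error in $log:"
        grep -nE '^r\([0-9]+\);' "$log"
        return 1
    fi
    echo "no Stata error codes found in $log"
}
```

For the example above, the call would be check_stata_log hello_world.log; a non-zero return value signals that the log deserves a closer look.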

To run the example Stata job above on Adroit, follow these steps:

$ ssh <YourNetID>@adroit.princeton.edu
$ cd /scratch/network/$USER
$ git clone https://github.com/PrincetonUniversity/hpc_beginning_workshop.git
$ cd hpc_beginning_workshop/stata
# edit email address in job.slurm
$ sbatch job.slurm
$ squeue -u $USER  # monitor job status (if blank then job is done)
$ cat hello_world.log

Stata Versions

To see which versions are available, run the following command:

$ module avail stata
--------- /usr/licensed/Modules/modulefiles -----------
stata/11.0  stata/13.0  stata/15.0  stata/16.1(default)  
stata/12.0  stata/14.0  stata/16.0  stata/17.0

Learn more about environment modules.
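
To keep results reproducible, it may help to name the version explicitly rather than rely on the default module. The sketch below assumes you want stata/17.0 from the listing above; the variable names are illustrative only, and the guard simply avoids an error when the snippet is run on a machine without the module command.

```shell
#!/bin/bash
# Sketch: load a specific Stata version instead of the default.
stata_version="17.0"                    # one of the versions listed by "module avail stata"
stata_module="stata/${stata_version}"

if command -v module >/dev/null 2>&1; then
    module purge                        # start from a clean environment
    module load "$stata_module"
else
    echo "module command not available here; run this on the cluster"
fi
```

The same two module lines can replace the module load line in a Slurm script.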

Where to Put ado-files

Run the sysdir command to see the search path for ado files:

. sysdir
   STATA:  /usr/licensed/stata-16.1/
    BASE:  /usr/licensed/stata-16.1/ado/base/
    SITE:  /usr/licensed/ado/
    PLUS:  ~/ado/plus/
PERSONAL:  ~/ado/personal/
OLDPLACE:  ~/ado/

You should store ado files in /home/<YourNetID>/ado/personal. For more see this document.
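
If the personal directory does not exist yet, it can be created from the shell before Stata is started. The following is a minimal sketch; install_ado is a hypothetical helper name and myprogram.ado is a placeholder file.

```shell
#!/bin/bash
# Sketch: create the PERSONAL ado directory and copy an ado file into it.
ado_dir="$HOME/ado/personal"
mkdir -p "$ado_dir"

# install_ado is a hypothetical helper; it copies one ado file into PERSONAL.
install_ado() {
    cp -- "$1" "$ado_dir/"
}

# Example call (placeholder file name; uncomment and edit before use):
# install_ado myprogram.ado

ls "$ado_dir"   # list what is currently installed
```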

Below is an example of using "net install":

. net install st0187, from(https://www.stata-journal.com/software/sj10-2/)

Stata/MP

Research Computing only has a license for Stata/SE. This means that Stata/MP, which is the multiprocessing version, is not available on the HPC clusters or Nobel. Some research groups have their own license so this rule does not apply to them. All users can use Stata/MP at DSS. With Stata/MP one can use multiple CPU-cores to run a single job which can lead to significant speed-ups.

Make the Switch from Nobel to Adroit

There are many advantages to using Stata on Adroit/MyAdroit over Nobel. First, because you can use Stata through your web browser via MyAdroit you avoid all the pitfalls associated with ssh -Y. Second, Research Computing is directly in control of the Adroit filesystems so it is easy to check your quota and request additional space (see checkquota). Third, Adroit is configured like all the large clusters at Princeton (e.g., Tiger) so if you become proficient with the filesystems, environment modules and job scheduler of Adroit then you will be comfortable using any supercomputer in the world. Lastly, in addition to using the web browser interface you can also use the batch job scheduler to submit and run a large number of jobs at once. Complete the Cluster Account Requests form to get an account on Adroit.

Running Stata on Nobel

If you have an X server like XQuartz or MobaXterm running on your laptop then follow the commands below to run xStata:

$ ssh -Y <YourNetID>@nobel.princeton.edu
$ module load stata/16.1
$ xstata myfile.do

You do not need to use a VPN to connect to Nobel since it is not behind a firewall.

To work with Stata on the command line without the GUI:

$ ssh <YourNetID>@nobel.princeton.edu
$ module load stata/16.1
$ stata myfile.do

Note that OIT maintains the filesystems on Nobel. They can be reached at [email protected].

Trouble Connecting and File Quota

If you encounter an error message like the one below, it may be because you are over your quota:

X11 connection rejected because of wrong authentication.
(xstata-se:42220): Gtk-WARNING **: 20:05:35.756: cannot open display: localhost:12.0

Try removing unnecessary files or complete this form to request more space from OIT. Research Computing does not maintain the filesystems of Nobel. Also, make sure that you satisfy the X server requirements.

Most users have a 5 GB quota. Run the following command to see how much storage you are using (it may take a minute):

$ du -sh ~/.

Or to see which folders are taking up the most space run this command (it may take a minute):

$ du -ch --max-depth=1 ~/. | sort -h
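
The du output can also be compared against the quota directly. The sketch below assumes the 5 GB figure mentioned above (your quota may differ) and uses disk usage as reported by du, which is only an approximation of how the file server counts usage.

```shell
#!/bin/bash
# Sketch: compare home-directory usage with an assumed 5 GB quota.
quota_gb=5                                   # assumption: the typical quota stated above
quota_kb=$((quota_gb * 1024 * 1024))

# du -sk prints the total size in kilobytes followed by the path
used_kb=$(du -sk "$HOME" 2>/dev/null | cut -f1)
used_kb=${used_kb:-0}                        # fall back to 0 if du printed nothing

percent=$((100 * used_kb / quota_kb))
echo "Using ${used_kb} KB of ${quota_kb} KB (${percent}%)"

if [ "$used_kb" -gt "$quota_kb" ]; then
    echo "Over quota: remove files or request more space"
fi
```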

You can remove individual files with:

$ rm <file1> <file2> <file3>

Or remove entire directories with:

$ rm -rf <directory1> <directory2> <directory3>

Your home directory on Nobel, which is also known as the H: drive, has this absolute path: /n/homeserver2/user2a/<YourNetID>.

Learning Resources

FAQ

1. How do I solve this error: "I/O error writing .dta file. Usually such I/O errors are caused by the disk or file system being full."

This might be an indication that you are running out of space in /tmp because of how Stata creates temporary files. Before you start Stata, run the following command to use your home directory instead:

export STATATMP=$HOME

2. How do I obtain a license and the software to run Stata on a university-owned machine or my personal machine?

See http://www.princeton.edu/oitstore

3. When using Stata via OnDemand, the session does not load but instead says "noVNC". What is the solution?

It could be that your session requires more memory. When creating the session, be sure to choose a larger value in the field called "Memory allocated for the job, in GBs". The default for this field is 4 GB. Try a large value like 32 GB.
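
For the temporary-file error in the first question, a slightly tidier variant is to point STATATMP at a dedicated subdirectory so that Stata's temporary files do not clutter the top level of your home directory. The directory name stata_tmp is an arbitrary choice for this sketch; the lines can be run in the shell before starting Stata or placed in a Slurm script before the stata command.

```shell
#!/bin/bash
# Sketch: send Stata's temporary files to a dedicated directory.
# stata_tmp is a hypothetical directory name; any writable location works.
export STATATMP="$HOME/stata_tmp"
mkdir -p "$STATATMP"
echo "Stata temporary files will go to: $STATATMP"
```

Keep in mind that files written there count against your home quota, so the directory is worth emptying once a job has finished.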

Getting Help from OIT

If your Stata problem relates to the H: drive filesystem then please contact OIT via email or live chat or complete this form to request a quota increase.

Getting Help from Data & Statistical Services (DSS)

For help with data analysis in Stata, please see the DSS website. DSS offers online tutorials and training for performing data analysis with Stata as well as one-on-one appointments.

Getting Help from Research Computing

If you encounter any difficulties while working with Stata on the HPC clusters then please send an email to [email protected] or attend a help session.