- Running Stata via Your Web Browser
- Running Stata on Nobel
- Where to Put ado-files
- Submitting Batch Jobs to the Slurm Scheduler
- Getting Help from OIT
- Getting Help from Data & Statistical Services
- Getting Help from Research Computing
Running Stata via Your Web Browser
The simplest way to use Stata on the HPC clusters is through the Open OnDemand web interface. You will need to use a VPN to connect from off-campus (GlobalProtect VPN is recommended). If you have an account on Adroit or Della then browse to https://myadroit.princeton.edu or https://mydella.princeton.edu. If you need an account on Adroit then complete this form.
To begin a session, click on "Interactive Apps" and then "XStata". You will need to choose the "Stata version", "Number of hours" and "Number of cores". Set "Number of cores" to 1, since only the serial Stata/SE is available on the clusters (see below). Click "Launch" and then, when your session is ready, click "Launch XStata". Note that the more resources you request, the longer you will have to wait for your session to become available.
Note that MyAdroit and MyDella run Stata on the compute nodes of the cluster, which do not have Internet access for security reasons. You will be able to install modules, but if you need to download other files then you must do so on the head node.
To get to the head node from the OnDemand dashboard (main menu) click on "Clusters" then "Adroit/Della Shell Cluster Access". You will be presented with a black terminal on the head node which does have Internet access. Use commands such as wget to download the files you require then return to your MyAdroit/MyDella session.
Running Stata on Nobel
Make the Switch from Nobel to Adroit
There are many advantages to using Stata on Adroit/MyAdroit over Nobel. First, because you can use Stata through your web browser via MyAdroit, you avoid all the pitfalls associated with ssh -Y. Second, Research Computing is directly in control of the Adroit filesystems, so it is easy to check your quota and request additional space (see checkquota). Third, Adroit is configured like all the large clusters at Princeton (e.g., Tiger), so if you become proficient with the filesystems, environment modules and job scheduler of Adroit then you will be well prepared to use most other HPC clusters. Lastly, in addition to using the web browser interface, you can also use the batch job scheduler to submit and run a large number of jobs at once. Complete this form to get an account on Adroit.
If you have an X server like XQuartz or MobaXterm running on your laptop then follow the commands below to run xStata:
$ ssh -Y <YourNetID>@nobel.princeton.edu
$ module load stata/16.1
$ xstata myfile.do
You do not need to use a VPN to connect to Nobel since it is not behind a firewall.
To work with Stata on the command line without the GUI:
$ ssh <YourNetID>@nobel.princeton.edu
$ module load stata/16.1
$ stata myfile.do
Note that OIT maintains the filesystems on Nobel. They can be reached at email@example.com.
Trouble Connecting and File Quota
If you encounter an error message like that below then it may be because you are over your quota:
X11 connection rejected because of wrong authentication.
(xstata-se:42220): Gtk-WARNING **: 20:05:35.756: cannot open display: localhost:12.0
Try removing unnecessary files or complete this form to request more space from OIT. Research Computing does not maintain the filesystems of Nobel. Also, make sure that you satisfy the X server requirements.
Most users have a 5 GB quota. Run the following command to see how much storage you are using (it may take a minute):
$ du -sh ~/.
Or to see which folders are taking up the most space run this command (it may take a minute):
$ du -ch --max-depth=1 ~/. | sort -h
You can remove individual files with:
$ rm <file1> <file2> <file3>
Or remove entire directories with:
$ rm -rf <directory1> <directory2> <directory3>
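The checks above can be combined into a small script. This is only a sketch, not an OIT tool: the function name report_usage is hypothetical, and the 5 GB default is taken from the typical quota mentioned above, so adjust it if your quota differs.

```shell
#!/bin/bash
# Hypothetical helper: compare a directory's disk usage against a quota.
# Usage: report_usage <directory> [quota in GB]
report_usage() {
    local dir="$1"
    local quota_gb="${2:-5}"                        # assumed default: 5 GB
    local quota_kb=$(( quota_gb * 1024 * 1024 ))    # quota in kilobytes
    local used_kb
    used_kb=$(du -sk "$dir" 2>/dev/null | cut -f1)  # usage in kilobytes
    if [ "$used_kb" -gt "$quota_kb" ]; then
        echo "OVER quota: ${used_kb} KB used of ${quota_kb} KB"
    else
        echo "Within quota: ${used_kb} KB used of ${quota_kb} KB"
    fi
}

# Example call (may take a minute on a large home directory):
#   report_usage "$HOME" 5
```

If the script reports that you are over quota, use the du command above to find the largest folders before deleting anything.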
Your home directory on Nobel, which is also known as the H: drive, has this absolute path: /n/homeserver2/user2a/<YourNetID>.
Where to Put ado-files
Run the sysdir command to see the search path for ado files:
. sysdir
   STATA:  /usr/licensed/stata-16.1/
    BASE:  /usr/licensed/stata-16.1/ado/base/
    SITE:  /usr/licensed/ado/
    PLUS:  ~/ado/plus/
PERSONAL:  ~/ado/personal/
OLDPLACE:  ~/ado/
You should store ado files in /home/<YourNetID>/ado/personal. For more see this document.
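As a minimal sketch, the PERSONAL directory can be created and populated from the shell. This assumes PERSONAL points to ~/ado/personal as in the sysdir output above; the file name myprog.ado is hypothetical.

```shell
# Create the PERSONAL ado directory if it does not exist yet
mkdir -p "$HOME/ado/personal"

# Copy a user-written ado file into it (myprog.ado is a hypothetical name)
if [ -f myprog.ado ]; then
    cp myprog.ado "$HOME/ado/personal/"
fi

# List what Stata will find there
ls "$HOME/ado/personal"
```

Run sysdir inside Stata afterwards to confirm that PERSONAL matches the directory you populated.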
Submitting Batch Jobs to the Slurm Scheduler
Stata can be run on the HPC clusters, namely Adroit and Della. These clusters use a job scheduler and all work must be submitted as batch jobs. Intermediate and advanced Stata users often prefer submitting jobs to the Slurm scheduler over using the web interface (described above). A job consists of two pieces: (1) a Stata script or "do file" and (2) a Slurm script that specifies the needed resources, sets the environment and lists the commands to be run. The example below will help you understand how to run a batch job.
Running a Serial Stata Job
A serial Stata job is one that requires only a single CPU-core. Here is an example of a trivial, one-line serial Stata script (hello_world.do):
display 21+21
The Slurm script (job.slurm) below can be used for serial jobs:
#!/bin/bash
#SBATCH --job-name=stata         # create a short name for your job
#SBATCH --nodes=1                # node count
#SBATCH --ntasks=1               # total number of tasks across all nodes
#SBATCH --cpus-per-task=1        # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G         # memory per cpu-core (4G per cpu-core is default)
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)
#SBATCH --mail-type=all          # send email on job start, end and fault
#SBATCH --mail-user=<YourNetID>@princeton.edu

module purge
module load stata/16.1

stata -b hello_world.do
To run the Stata script, simply submit the job to the cluster with the following command:
$ sbatch job.slurm
After the job completes, view the output with cat hello_world.log:
$ cat hello_world.log

  ___  ____  ____  ____  ____ (R)
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/   16.0   Copyright 1985-2019 StataCorp LLC
  Statistics/Data Analysis            StataCorp
                                      4905 Lakeway Drive
                                      College Station, Texas 77845 USA
                                      800-STATA-PC        http://www.stata.com
                                      979-696-4600        firstname.lastname@example.org
                                      979-696-4601 (fax)

100-user Stata network perpetual license:
       Serial number:  401606267559
         Licensed to:  Stata/SE 16
                       100-user Network

Notes:
      1.  Stata is running in batch mode.
      2.  Unicode is supported; see help unicode_advice.

. do "hello_world.do"

. display 21+21
42

.
end of do-file
Use squeue -u $USER to monitor the progress of queued jobs.
To run the example Stata job above on Adroit, follow these steps:
$ ssh <YourNetID>@adroit.princeton.edu
$ git clone https://github.com/PrincetonUniversity/hpc_beginning_workshop.git
$ cd hpc_beginning_workshop/RC_example_jobs/stata
# edit email address in job.slurm
$ sbatch job.slurm
$ squeue -u $USER  # monitor job status (if blank then job is done)
$ cat hello_world.log
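If you prefer not to clone the repository, the two files can also be created by hand. The sketch below writes both and submits the job only when sbatch is available. The directory name stata_test is hypothetical, the do-file content is inferred from the log output shown above, and the Slurm script is a trimmed copy of the example above without the email options.

```shell
# Work in a fresh directory (the name stata_test is hypothetical)
mkdir -p stata_test && cd stata_test

# The one-line Stata script, inferred from the log shown earlier
cat > hello_world.do << 'EOF'
display 21+21
EOF

# A trimmed copy of the serial Slurm script (no email notifications)
cat > job.slurm << 'EOF'
#!/bin/bash
#SBATCH --job-name=stata
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4G
#SBATCH --time=00:01:00

module purge
module load stata/16.1

stata -b hello_world.do
EOF

# Submit only if the Slurm client is available on this machine
if command -v sbatch > /dev/null 2>&1; then
    sbatch job.slurm
else
    echo "sbatch not found: run this on a cluster login node"
fi
```

The quoted 'EOF' delimiter keeps the shell from expanding anything inside the heredocs, so both files are written exactly as shown.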
Research Computing only has a license for Stata/SE. This means that Stata/MP, which is the multiprocessing version, is not available on the HPC clusters or Nobel. However, you can use Stata/MP at DSS. With Stata/MP one can use multiple CPU-cores to run a single job which can lead to significant speed-ups.
1. How do I solve this error: "I/O error writing .dta file. Usually such I/O errors are caused by the disk or file system being full."
This might be an indication that you are running out of space in /tmp due to how Stata creates temporary files. Before you start Stata, run the following command to use your home directory instead:
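One common way to do this is sketched below. It assumes a Unix build of Stata that reads the STATATMP environment variable to locate its temporary directory; the directory name stata_tmp is hypothetical.

```shell
# Create a private temporary directory in your home directory
# (the name stata_tmp is hypothetical)
mkdir -p "$HOME/stata_tmp"

# Assumption: Stata for Unix reads STATATMP to locate its temporary directory
export STATATMP="$HOME/stata_tmp"

echo "Stata temporary files will go to: $STATATMP"
```

For a batch job, the export line would go in the Slurm script before the stata command. Inside Stata, display c(tmpdir) shows the temporary directory in use.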
2. To obtain a license and the software to run Stata on a university-owned machine or your personal machine, see http://www.princeton.edu/oitstore