Tigressdata

 

Overview

Tigressdata is a remote computer built for processing and visualizing your data. It can also be used for developing, debugging and testing code. Its main advantage over a typical personal computer is its more powerful hardware (more CPUs, a powerful GPU, and more memory) for running visualization software.

Tigressdata mounts the filesystems from various clusters, but it is intended for Tiger cluster users. Users with access to other clusters (e.g., Adroit, Della, Stellar) should use the visualization nodes associated with those clusters instead.

Some Technical Specifications:
Tigressdata is a single computer with 80 hardware threads (40 cores), 768 GB of RAM and an NVIDIA P100 GPU.

 

How to Access Tigressdata

Access to Tigressdata is automatically granted to users with accounts on Della and Tiger.

To use Tigressdata with graphical applications or for general visualization work, see Option 2 in our guide to working with visualizations and graphical user-interface (GUI) applications.

To use Tigressdata for developing, debugging, and testing codes, we recommend connecting through SSH. Instructions can be found under Option 3 of our "Working with Visualizations and GUI Applications on the HPC Clusters" Knowledge Base article (found under the Support menu).

Launching Jobs

There is no batch scheduler running on Tigressdata. Users launch processes directly from the command line, for example:

$ python myscript.py

Because Tigressdata is shared among many users, each user is responsible for limiting the resources that they consume. While there is a mechanism in place to prevent a single user from monopolizing the machine, it does not work perfectly in all scenarios. Please write to [email protected] if a user is abusing Tigressdata by consuming the majority of resources.
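One way a script can keep its own footprint modest is to cap its number of worker processes and lower its scheduling priority. The sketch below assumes a self-imposed cap of 8 workers (a hypothetical figure, not a Tigressdata policy) and uses os.nice so the script yields CPU time to other users; work() is a placeholder for your own CPU-bound function.

```python
import os
from multiprocessing import Pool

MAX_WORKERS = 8  # hypothetical self-imposed cap, not an official Tigressdata limit


def choose_workers(cap: int = MAX_WORKERS, cpu_count=None) -> int:
    """Return a worker count that exceeds neither the cap nor the machine's CPUs."""
    available = cpu_count or os.cpu_count() or 1
    return max(1, min(cap, available))


def work(n: int) -> int:
    """Placeholder CPU-bound task: the sum of squares below n."""
    return sum(i * i for i in range(n))


def run(tasks, cap: int = MAX_WORKERS):
    os.nice(10)  # lower this process's priority; worker processes inherit it
    with Pool(processes=choose_workers(cap)) as pool:
        return pool.map(work, tasks)


if __name__ == "__main__":
    print(run([100_000] * 16))
```

Capping the pool keeps the script from claiming all 80 hardware threads even when it has many tasks queued; the tasks simply run in more rounds.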

Users may not launch a process that will overburden the system. Before launching any memory-intensive or compute-intensive task, please check:

$ cat /proc/loadavg
$ cat /proc/meminfo

The first three values in the loadavg output are running averages of the system load over the last 1, 5 and 15 minutes. The machine supports up to 80 hardware threads, so load values around 10 are small while values around 60 are large. Oversubscribing the system with CPU-bound tasks will severely reduce throughput for all users.

The MemFree field of the meminfo output is the most important one to check before launching a task with a large memory footprint. However, if SwapFree is significantly less than SwapTotal, performance is already compromised, and adding another memory-intensive task will only make the problem worse.
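These checks can also be scripted. The sketch below parses /proc/loadavg and /proc/meminfo and applies a rough rule of thumb; the thresholds (load above 75% of the 80 hardware threads, i.e. 60, or more than 10% of swap in use) are illustrative assumptions, not official limits.

```python
import os


def parse_loadavg(text: str) -> tuple:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo field name to its numeric value (in kB for memory fields)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if rest.strip():
            info[name] = int(rest.split()[0])
    return info


def looks_busy(load1: float, mem: dict, threads: int = 80,
               load_frac: float = 0.75, swap_frac: float = 0.9) -> bool:
    """Heuristic: flag heavy CPU load or noticeable swap use (thresholds are illustrative)."""
    heavy_load = load1 > load_frac * threads  # 0.75 * 80 = 60
    swapping = mem.get("SwapTotal", 0) > 0 and mem["SwapFree"] < swap_frac * mem["SwapTotal"]
    return heavy_load or swapping


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        load1, _, _ = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    print(f"1-min load: {load1:.1f}, MemFree: {mem['MemFree'] / 1024**2:.1f} GiB")
    print("busy, consider waiting" if looks_busy(load1, mem) else "looks OK to start")
```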

Use the following command to see the usage by all users:

$ htop

The RES column indicates the memory usage and the S column indicates the state (R is running, S is sleeping). Processes in a state of S are generally not of interest. The CPU usage is indicated by CPU%. A value of 400 for CPU% means that the process is using four CPU-cores at 100% utilization each. Press the q key to exit from htop.

The processes of a specific user can be seen with the following command:

$ htop -u <NetID>
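To see totals per user rather than per process, a short script can sum the output of ps. This is a sketch assuming the procps version of ps found on Linux; summarize() and report() are illustrative names. Note that ps reports %CPU averaged over each process's lifetime, so its numbers will not match the instantaneous values shown by htop.

```python
import subprocess
from collections import defaultdict


def summarize(ps_text: str) -> dict:
    """Sum %CPU and resident memory (MiB) per user from `ps -eo user,pcpu,rss` output."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for line in ps_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        user, pcpu, rss_kib = fields
        try:
            cpu, mem_mib = float(pcpu), int(rss_kib) / 1024  # ps reports RSS in KiB
        except ValueError:
            continue  # the header row ("USER %CPU RSS") is not numeric
        totals[user][0] += cpu
        totals[user][1] += mem_mib
    return {user: (cpu, mem) for user, (cpu, mem) in totals.items()}


def report(limit: int = 10) -> str:
    """Return the users with the highest total %CPU on this machine, one per line."""
    out = subprocess.run(["ps", "-eo", "user,pcpu,rss"],
                         capture_output=True, text=True, check=True).stdout
    rows = sorted(summarize(out).items(), key=lambda kv: kv[1][0], reverse=True)
    return "\n".join(f"{u:<12} {cpu:7.1f}% CPU {mem:10.1f} MiB" for u, (cpu, mem) in rows[:limit])
```

On Tigressdata, print(report()) lists the ten heaviest users; as with htop, a total near 400 corresponds to roughly four fully busy cores.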

Jupyter Notebooks

Instead of using Tigressdata, consider using alternative options for running Jupyter notebooks.

Batch and Interactive Jobs

Tigressdata is specifically for visualization and data analysis. You should submit batch or interactive jobs from the login nodes of the clusters when possible. This will keep Tigressdata free for those who need it.

 

Hardware Configuration

Tigressdata is a single machine with the following specs:

Processor: 2.4 GHz Intel Skylake
Cores: 40
Hardware threads: 80
Memory: 768 GB
Max instruction set: AVX-512

Tigressdata also offers an NVIDIA P100 GPU with 16 GB of memory.

 

What's Available on Tigressdata?

Available Software and Programming Languages

Note that this list is updated periodically and may not always reflect the latest software, display programs, or languages on Tigressdata. The best way to check what is available is to log into Tigressdata and use the commands described in the Explore All Available Programs section below.

Software Available by Loading Modules

agilent
anaconda
ansys
boost
cadence
cuda
ddt
EDEM
emacs
fftw
gdal
gsl
gurobi
hdf5
IDL
intel compiler
julia
lapack
lumerical
map
mathematica
matlab
netcdf
paraview
paraview-headless
pyferret/gcc/netcdf-4.4.0/hdf5
rh/devtoolset
samtools
spark/hadoop1
stata
synopsys/saed-mc
turbovnc
virtualgl
visit

Display and Visualization Programs

Programs with graphical user interfaces run best within the TurboVNC virtual desktop, as described in Option 2 under How to Access Tigressdata above. See the full instructions for TurboVNC on Tigressdata for help.

X-window tools, such as xmgrace
ImageMagick suite, such as display, composite, montage
evince
eog
ffmpeg, ffplay
firefox
imagej
mate
mathematica
matlab
ncview
mayavi2
paraview
visit
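Because this list may be out of date, you can check which of these commands are actually on your PATH. The sketch below uses Python's shutil.which and assumes that each program's command name matches its name in the list above (mate is omitted because it is a desktop environment rather than a single command); some programs, such as matlab or paraview, may only appear after loading the corresponding module.

```python
import shutil

# Command names taken from the list above (assumed to match the executables).
DISPLAY_PROGRAMS = ["xmgrace", "display", "composite", "montage", "evince", "eog",
                    "ffmpeg", "ffplay", "firefox", "imagej", "mathematica", "matlab",
                    "ncview", "mayavi2", "paraview", "visit"]


def locate(programs):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in programs}


if __name__ == "__main__":
    for name, path in locate(DISPLAY_PROGRAMS).items():
        print(f"{name:<12} {path or 'not on PATH (may need a module load)'}")
```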

Programming languages

C
Fortran
Java
Julia
Python (anaconda)
R
Stata

Explore All Available Programs

Tigressdata runs the Springdale Linux operating system. There is a large collection of software tools for working with files and directories.

Many command-line programs, such as gedit and ssh, are found in the /usr/bin directory. To see all of them, run:

$ ls /usr/bin

You can search for relevant software by keyword with the apropos command, for example:

$ apropos search

To read the on-line manual page for any item in the apropos listing, run man followed by its name, for example:

$ man grep

More information is available in our Getting Started with the Research Computing Clusters guide.

Additional software on Tigressdata is organized in "modules".  
To see the available software modules, run:

$ module avail

Loading a module adds directories to your PATH and sets environment variables.
To see the effect of a module, run, for example:

$ module show anaconda3/2020.7

See our Knowledge Base modules page for more details.

 

Accessing Files from Della and Tiger on Tigressdata

Tigressdata mounts both the /tigress and /projects folders that are found on most of Research Computing's clusters. This means any files saved in these folders from Della or Tiger can be accessed from Tigressdata.

The figure below shows that the /scratch/gpfs filesystems of Della and Tiger, as well as /tigress and /projects, are accessible from Tigressdata:

Figure: HPC clusters and the filesystems that are available to each. Users should write job output to /scratch/gpfs.

For example, to access files on the /scratch/gpfs filesystem of Della from Tigressdata, use the path /della/scratch/gpfs/<YourNetID>:

$ ssh <YourNetID>@tigressdata.princeton.edu
$ ls /della/scratch/gpfs/<YourNetID>

The commands above also apply to Tiger with the appropriate changes. The paths are also shown in the output of the "checkquota" command.
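The path convention can also be captured in a small helper. This sketch follows the /della/scratch/gpfs/<YourNetID> pattern shown above; applying the same pattern to other cluster names (e.g. "tiger") is an assumption, as is using your login name as your NetID. remote_scratch() is an illustrative name.

```python
import getpass
from pathlib import Path
from typing import Optional


def remote_scratch(cluster: str, netid: Optional[str] = None) -> Path:
    """Return the Tigressdata path to a cluster's /scratch/gpfs directory.

    Follows the /<cluster>/scratch/gpfs/<NetID> pattern shown for Della;
    using it for other clusters is an assumption.
    """
    netid = netid or getpass.getuser()  # assumes your login name is your NetID
    return Path("/", cluster, "scratch", "gpfs", netid)


if __name__ == "__main__":
    scratch = remote_scratch("della")
    if scratch.is_dir():
        for entry in sorted(scratch.iterdir())[:20]:
            print(entry.name)
    else:
        print(f"{scratch} is not mounted here")
```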

 

Web Scraping

Follow the directions below to launch a web scraping Python script using Selenium and ChromeDriver:

$ ssh <YourNetID>@tigressdata.princeton.edu
$ mkdir -p software && cd software
$ wget https://chromedriver.storage.googleapis.com/89.0.4389.23/chromedriver_linux64.zip
$ unzip chromedriver_linux64.zip

Create a Conda environment on Tigressdata:

$ module load anaconda3/2020.11
$ conda create --name sel-env selenium -y

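In your Python script, import Selenium and point it at the downloaded driver:

from selenium import webdriver
options = webdriver.ChromeOptions()  # add arguments such as "--headless" if no display is available
DRIVER_PATH = "/home/<YourNetID>/software/chromedriver"
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)  # executable_path was removed in Selenium 4.10; newer releases pass a Service object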

Then run with:

$ conda activate sel-env
$ python myscript.py

If your SSH connection is lost, the process running your Python script will be terminated; use tmux or nohup to keep it running after you disconnect. Note that Chromium 90 has been installed in place of Chrome. ChromeDriver releases are tied to a browser major version, so you may need a ChromeDriver release matching Chromium 90 rather than the 89 release downloaded above. If you need to launch the web browser GUI, the command is "chromium-browser". Be aware that Tigressdata goes offline for maintenance on the second Tuesday of every month, and all processes are killed, including tmux sessions and commands running under nohup.

 

Downloading Large Files over Long Times

Suppose that you want to download one or more large files to Della or Tiger. There are two concerns. First, you should avoid running such tasks on the login nodes, since they are shared by many users; the solution is to use Tigressdata. Second, a download that takes hours or days may outlive your SSH connection to Tigressdata. The solution is tmux, which creates a persistent session on Tigressdata that lasts until the next maintenance downtime (the second Tuesday of every month). If the SSH connection drops, the tmux session keeps running on Tigressdata. The commands below illustrate the procedure:

$ ssh <YourNetID>@tigressdata.princeton.edu
$ tmux
$ cd /della/scratch/gpfs/<YourNetID>
$ wget https://www.bigdata.org/dataset.tar.gz
$ exit  # close tmux when the download completes
$ exit  # close your SSH session on tigressdata
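If a transfer is interrupted anyway, for example by the monthly downtime, wget's -c (--continue) option resumes a partially downloaded file instead of starting over. The sketch below shows the same idea in Python: it sends an HTTP Range request for only the bytes not yet on disk, assuming the server supports range requests. resume_download() and range_header() are illustrative names.

```python
import os
import shutil
import urllib.request


def range_header(dest: str) -> dict:
    """Build a Range header that asks only for bytes not yet saved to dest."""
    done = os.path.getsize(dest) if os.path.exists(dest) else 0
    return {"Range": f"bytes={done}-"} if done else {}


def resume_download(url: str, dest: str, chunk: int = 1 << 20) -> None:
    """Download url to dest, appending to a partial file when the server allows it.

    A server that ignores Range replies 200 with the full file, so dest is
    rewritten from the start; if the file is already complete, the server
    replies 416, which urlopen raises as HTTPError.
    """
    headers = range_header(dest)
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        mode = "ab" if headers and response.status == 206 else "wb"  # 206 = Partial Content
        with open(dest, mode) as out:
            shutil.copyfileobj(response, out, chunk)
```

Run inside a tmux session with the dataset URL and a destination under /della/scratch/gpfs/<YourNetID>, the same call can simply be repeated after an interruption to pick up where it left off.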

Learn more about tmux on the Connecting to the Clusters page.

 

Downloading Large Files using Firefox

In some cases it is useful to download large files using Firefox on Tigressdata. Follow these steps with the appropriate X server software running (e.g., XQuartz):

$ ssh -X <YourNetID>@tigressdata.princeton.edu

Run "firefox" on the command line to launch Firefox. If you are instead working in a graphical desktop (such as the TurboVNC session described in Option 2), you can also launch Firefox by clicking on its icon.

Once Firefox loads, click on the "hamburger" icon (three horizontal lines) in the upper right. Then choose "Preferences" and set "Save files to" by clicking on "Browse...". Click on "Other Locations" then "Computer". The "home", "tigress" and "projects" folders are immediately available. To save to /scratch/gpfs on Della, for instance, choose "della" then "scratch" then "gpfs" and so on.

Once the path is set, start the download.

 

Tigressdata is Not Accessible from Stellar

Please use the Stellar visualization nodes when working on Stellar.

 

Important Guidelines

As its name implies, Tigressdata has fiber connectivity to /tigress (and /projects), which is the large archival storage system. There is also NFS connectivity to selected parallel or scratch storage spaces allocated to the Princeton clusters. Several commercial and open-source packages are installed on Tigressdata.

Please be mindful that Tigressdata is a resource shared by all users and has no job scheduler. Use the htop command and see "Launching Jobs" above to monitor usage.