- How to Access Tigressdata
- Hardware Configuration
- What's Available on Tigressdata?
- Accessing Files from Della and Tiger on Tigressdata
- Web Scraping
- Downloading Large Files over Long Times
- Downloading Large Files using Firefox
- Tigressdata is Not Accessible from Stellar
- Important Guidelines
Tigressdata is a remote computer built for processing and visualizing your data. It can also be used for developing, debugging and testing codes. Its main advantage is having more powerful resources than the average personal computer (more CPUs, a powerful GPU, and more memory) to run visualization software.
Tigressdata mounts the filesystems from various clusters, but it is intended for Tiger cluster users. Users with access to other clusters (e.g. Adroit, Della, Stellar) should use the visualization nodes associated with those clusters instead.
Some Technical Specifications:
Tigressdata is a single computer with 80 hardware threads (40 cores), 768 GB of RAM and an NVIDIA P100 GPU.
How to Access Tigressdata
Access to Tigressdata is automatically granted to users with accounts on Della and Tiger.
To use Tigressdata with graphical applications or general visualization work, see Option 2 in our guide to working with visualizations and graphical user-interface (GUI) applications.
To use Tigressdata for developing, debugging, and testing codes, we recommend connecting through SSH. Instructions can be found under Option 3 of our "Working with Visualizations and GUI Applications on the HPC Clusters" Knowledge Base article (found under the Support menu).
Job Scheduling
There is no batch scheduler running on Tigressdata. Users should launch processes directly from the command line such as:
$ python myscript.py
Because Tigressdata is shared among many users, each user is responsible for limiting the resources that they consume. While there is a mechanism in place to prevent a single user from monopolizing the machine, it does not work perfectly in all scenarios. Please write to [email protected] if a user is abusing Tigressdata by consuming the majority of resources.
Users may not launch a process that will overburden the system. Before launching any memory-intensive or compute-intensive task, please check:
$ cat /proc/loadavg
$ cat /proc/meminfo
The first three values in the loadavg output are running averages of the system load for the last 1, 5 and 15 minutes. The machine can support up to 80 hardware threads so load values around 10 are small while values around 60 are large. Oversubscribing the system with CPU-bound tasks will severely compromise throughput for all users. The MemFree field of the meminfo output is the most important to check before launching a task with a large memory footprint. However, if the SwapFree is significantly less than the SwapTotal value, then performance is already compromised, and adding another memory-intensive task will only exacerbate the problem.
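The checks described above can also be scripted. The following is a minimal Python sketch, not an official tool: it parses the text of /proc/loadavg and /proc/meminfo and applies a rough rule of thumb. The function names, the `looks_busy` helper and the 10% swap threshold are assumptions made for illustration; only the load value of 60 and the 80 hardware threads come from the text above.

```python
HARDWARE_THREADS = 80  # taken from the specifications quoted in this article


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = (float(value) for value in text.split()[:3])
    return one, five, fifteen


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to integer values (kB)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def looks_busy(load_1min, meminfo, load_limit=60, swap_used_fraction=0.1):
    """Hypothetical rule of thumb: high load, or a noticeable share of swap in use.

    load_limit=60 follows the "values around 60 are large" guidance;
    swap_used_fraction=0.1 is an arbitrary assumption, not a site policy.
    """
    swap_total = meminfo.get("SwapTotal", 0)
    swap_free = meminfo.get("SwapFree", 0)
    swapping = (
        swap_total > 0
        and (swap_total - swap_free) / swap_total > swap_used_fraction
    )
    return load_1min >= load_limit or swapping
```

On Tigressdata one would pass in the file contents, for example `parse_loadavg(open("/proc/loadavg").read())`, and inspect the `MemFree` entry of the parsed meminfo dictionary before starting a memory-intensive task.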
Use the following command to see the usage by all users:
$ htop
The RES column indicates the memory usage and the S column indicates the state (R is running, S is sleeping). Processes in a state of S are generally not of interest. The CPU usage is indicated by CPU%. A value of 400 for CPU% means that the process is using four CPU-cores at 100% utilization each. Press the q key to exit from htop.
The processes of a specific user can be seen with the following command:
$ htop -u <NetID>
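For a non-interactive summary, the per-user totals can be approximated from the output of `ps`. The sketch below is an illustration only and is not how htop works: `summarize_by_user` and `current_usage` are hypothetical names, and the CPU percentage reported by `ps` is averaged over each process's lifetime rather than measured at the current moment.

```python
import subprocess
from collections import defaultdict


def summarize_by_user(ps_text):
    """Sum CPU% and resident memory (KiB) per user.

    Expects the output of `ps -eo user,pcpu,rss`, including its header row.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for line in ps_text.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines
        user, pcpu, rss = parts
        totals[user][0] += float(pcpu)
        totals[user][1] += int(rss)
    return {user: (cpu, rss) for user, (cpu, rss) in totals.items()}


def current_usage():
    """Run ps on the local machine and summarize its output per user."""
    result = subprocess.run(
        ["ps", "-eo", "user,pcpu,rss"],
        capture_output=True,
        text=True,
        check=True,
    )
    return summarize_by_user(result.stdout)
```

As with the CPU% column in htop, a summed value of 400 corresponds to roughly four fully used CPU-cores.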
Instead of using Tigressdata, consider using alternative options for running Jupyter notebooks.
Batch and Interactive Jobs
Tigressdata is specifically for visualization and data analysis. You should submit batch or interactive jobs from the login nodes of the clusters when possible. This will keep Tigressdata free for those who need it.
Hardware Configuration
Tigressdata is a single machine with the following specs:
| Processor | Cores | Hardware Threads | Memory | Max Instruction Set |
| 2.4 GHz Intel Skylake | 40 | 80 | 768 GB | AVX-512 |
Tigressdata also offers an NVIDIA P100 GPU with 16 GB of memory.
What's Available on Tigressdata?
Available Software and Programming Languages
Note that this list is updated periodically and may not always reflect the latest software, display programs, or languages on Tigressdata. The best way to check for all available resources is to log into Tigressdata and explore the options described under "Explore All Available Programs" below.
Software Available by Loading Modules
Display and Visualization Programs
Programs with graphical user interfaces will run with best performance within the TurboVNC virtual desktop, as mentioned in access Option 2 above. View the full instructions for TurboVNC on tigressdata for help.
X-window tools, such as xmgrace
ImageMagick suite, such as display, composite, montage
Tigressdata runs the Springdale Linux operating system. There is a large collection of software tools for working with files and directories.
Many command line programs can be found in the /usr/bin directory, such as gedit, ssh, and more.
To see all these, run:
$ ls /usr/bin
You can identify relevant software by running the apropos command, such as:
$ apropos search
You can follow up the apropos listing by checking the on-line manual page for an item by running:
$ man grep
More information is available in our Getting Started with the Research Computing Clusters guide.
Additional software on Tigressdata is organized in "modules".
To see the available software modules, run:
$ module avail
Loading a module will add directories to your path and set environment variables.
To see the effect of a module, run, for example:
$ module show anaconda3/2020.7
See our Knowledge Base modules page for more details.
Accessing Files from Della and Tiger on Tigressdata
Tigressdata mounts both the /tigress and /projects folders that are found on most of Research Computing's clusters. This means any files saved in these folders from Della or Tiger can be accessed from Tigressdata.
The figure below makes it clear that the /scratch/gpfs filesystems of Della and Tiger as well as /tigress and /projects are accessible from Tigressdata:
For example, to access files on the /scratch/gpfs filesystem of Della from Tigressdata use the path /della/scratch/gpfs/<YourNetID>
$ ssh <YourNetID>@tigressdata.princeton.edu
$ ls /della/scratch/gpfs/<YourNetID>
The commands above also apply to Tiger with the appropriate changes. The paths are also shown in the output of the "checkquota" command.
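Scripts that run on Tigressdata can build these paths programmatically. The following is a small sketch with a hypothetical helper, `scratch_on_tigressdata`. The Della path matches the example above; the Tiger path is an assumption based on the statement that the same commands apply "with the appropriate changes", so confirm it with the checkquota command before relying on it.

```python
from pathlib import PurePosixPath

# Clusters whose /scratch/gpfs is described as visible from Tigressdata.
CLUSTERS = ("della", "tiger")


def scratch_on_tigressdata(cluster, netid):
    """Return the path, as seen from Tigressdata, of a user's /scratch/gpfs directory.

    The /<cluster>/scratch/gpfs/<NetID> layout is confirmed for Della in the
    text above and assumed for Tiger.
    """
    if cluster not in CLUSTERS:
        raise ValueError(f"unsupported cluster: {cluster!r}")
    return PurePosixPath("/") / cluster / "scratch" / "gpfs" / netid
```

For example, `scratch_on_tigressdata("della", "aturing")` yields `/della/scratch/gpfs/aturing`, where `aturing` is a placeholder NetID.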
Web Scraping
Follow the directions below to launch a web scraping Python script using Selenium and Chromedriver:
$ ssh <YourNetID>@tigressdata.princeton.edu
$ mkdir -p software && cd software
$ wget https://chromedriver.storage.googleapis.com/89.0.4389.23/chromedriver_linux64.zip
$ unzip chromedriver_linux64.zip
Create a Conda environment on tigressdata:
$ module load anaconda3/2020.11
$ conda create --name sel-env selenium -y
In your Python script:
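from selenium import webdriver
options = webdriver.ChromeOptions()  # add arguments such as "--headless" as needed
DRIVER_PATH = "/home/<YourNetID>/software/chromedriver"
driver = webdriver.Chrome(options=options, executable_path=DRIVER_PATH)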
Then run with:
$ conda activate sel-env
$ python myscript.py
If your SSH connection is lost then the process corresponding to your running Python script will be terminated. You should learn about tmux or nohup for dealing with this. Note that Chromium 90 has been installed in place of Chrome. If for some reason you need to launch the web browser GUI then the command is "chromium-browser". Be aware that Tigressdata goes offline for maintenance on the second Tuesday of every month (all processes are killed including tmux sessions and commands running under nohup).
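Because long-running work is interrupted at each maintenance window, it can help to know the next date in advance. The sketch below computes the second Tuesday of a month with Python's standard library. The function names are illustrative, and the actual maintenance schedule may differ from this simple calendar rule, so treat the result as an estimate.

```python
import datetime as dt


def second_tuesday(year, month):
    """Return the date of the second Tuesday of the given month."""
    first = dt.date(year, month, 1)
    # date.weekday() counts Monday as 0, so Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + dt.timedelta(days=days_to_first_tuesday + 7)


def next_maintenance(today):
    """Return the first second-Tuesday date that falls on or after `today`."""
    candidate = second_tuesday(today.year, today.month)
    if candidate < today:
        if today.month == 12:
            year, month = today.year + 1, 1
        else:
            year, month = today.year, today.month + 1
        candidate = second_tuesday(year, month)
    return candidate
```

Calling `next_maintenance(dt.date.today())` gives the date by which a scraping run should have finished or saved its progress.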
Downloading Large Files over Long Times
Suppose that you want to download one or more large files to Tiger. There are two concerns. First, you should avoid running jobs on the login nodes since they are shared by many users. The solution is to use Tigressdata. Second, you may lose your SSH connection to Tigressdata if the download takes hours or days. The solution is to use tmux, which creates a persistent session on Tigressdata that remains available until the next downtime (the second Tuesday of every month). If the SSH connection drops, the tmux session continues running on Tigressdata. The commands below illustrate the procedure:
$ ssh <YourNetID>@tigressdata.princeton.edu
$ tmux
$ cd /della/scratch/gpfs/<YourNetID>
$ wget https://www.bigdata.org/dataset.tar.gz
$ exit  # close tmux when the download completes
$ exit  # close your SSH session on tigressdata
Learn more about tmux on the Connecting to the Clusters page.
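When the download is one step of a larger Python workflow, the same transfer can be done from a script started inside the tmux session. The following is a minimal sketch using only the standard library; the `download` function and the `.part` naming convention are assumptions for illustration and do not replace wget. It streams the file in chunks so that a large download is never held in memory, and it renames the file only after the transfer completes.

```python
import urllib.request
from pathlib import Path


def download(url, dest, chunk_size=1 << 20):
    """Stream `url` to `dest` in chunks and return the number of bytes written.

    Data is written to a temporary "<name>.part" file first, so an
    interrupted transfer never leaves a truncated file under the final name.
    """
    dest = Path(dest)
    part = dest.with_name(dest.name + ".part")
    total = 0
    with urllib.request.urlopen(url) as response, open(part, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            total += len(chunk)
    part.replace(dest)  # move into place only after a complete transfer
    return total
```

For very large or unreliable transfers, a tool that can resume partial downloads is likely the better choice.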
Downloading Large Files using Firefox
In some cases it is useful to download large files using Firefox on Tigressdata. Follow these steps with the appropriate X server software running (e.g., XQuartz):
$ ssh -X <YourNetID>@tigressdata.princeton.edu
Launch firefox by clicking on the icon when the graphical desktop appears. You can also launch a terminal and run "firefox" on the command line.
Once firefox loads, click on the "hamburger" icon or the three horizontal lines in the upper right. Then choose "Preferences" and set "Save files to" by clicking on "Browse...". Click on "Other Locations" then "Computer". "home", "tigress" and "projects" are immediately available. To save to /scratch/gpfs on Della, for instance, choose "della" then "scratch" then "gpfs" and so on.
Once the path is set, start the download.
Tigressdata is Not Accessible from Stellar
Please use the Stellar visualization nodes when working on Stellar.
Important Guidelines
As its name implies, Tigressdata has fiber connectivity to /tigress (and /projects), which is the large archival storage system. There is also NFS connectivity to selected parallel or scratch storage spaces allocated to the Princeton clusters. Several commercial and open-source packages are installed on Tigressdata.
Please be mindful that Tigressdata is a resource shared by all users and has no job scheduler to arbitrate between them. Use the htop command and see "Job Scheduling" above to monitor usage.