Datasets

AlphaFold

The AlphaFold dataset is available on Della and Tiger, in the /scratch/gpfs/DATASETS folder. To view the dataset on Della, for example:

$ ssh <YourNetID>@della.princeton.edu
$ cd /scratch/gpfs/DATASETS
$ ls -l
total 1
drwxr-xr-x. 4 root root 4096 Oct 21 08:49 alphafold
-rw-r--r--. 1 root root   84 Oct 21 08:50 README.txt
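
The same check can be made from Python, for instance at the top of a job script that should fail early when a dataset is missing. The following is a minimal sketch using only the standard library. The function name list_datasets is illustrative and is not part of any cluster tooling.

```python
from pathlib import Path

# Shared dataset folder named in the text above.
DATASETS_ROOT = Path("/scratch/gpfs/DATASETS")


def list_datasets(root=DATASETS_ROOT):
    """Return {entry name: "dir" or "file"} for everything directly under root."""
    root = Path(root)
    if not root.is_dir():
        # The folder only exists on the clusters that host the datasets.
        raise FileNotFoundError(f"{root} not found; are you logged in to Della or Tiger?")
    return {
        entry.name: ("dir" if entry.is_dir() else "file")
        for entry in sorted(root.iterdir())
    }
```

On Della, calling list_datasets() should report the same entries that ls shows, such as alphafold and README.txt.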

On Della there is also an environment module for AlphaFold. To list the available versions:

$ module avail alphafold

After loading one of the listed versions with module load, you should be able to run the software.

 

ImageNet

On Della, the ImageNet dataset is in the /scratch/gpfs/DATASETS/imagenet folder:

$ ssh <YourNetID>@della.princeton.edu
$ cd /scratch/gpfs/DATASETS/imagenet
$ ls -l
total 2
drwxr-xr-x. 4 root root 4096 Mar  8 13:46 ilsvrc_2012_2017_face_obfuscation
drwxr-xr-x. 4 root root 4096 Mar  8 14:34 ilsvrc_2012_classification_localization
-rw-r--r--. 1 root root  111 Mar  8 14:44 README

For example, to load the training and validation splits with torchvision in PyTorch:

$ conda activate torch-env
(torch-env) $ python
>>> from torchvision import datasets
>>> datasets.ImageNet(root="/scratch/gpfs/DATASETS/imagenet/ilsvrc_2012_classification_localization", split="train")
>>> datasets.ImageNet(root="/scratch/gpfs/DATASETS/imagenet/ilsvrc_2012_classification_localization", split="val")
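
For training, the dataset object is usually wrapped in a DataLoader. The following is a sketch of one way to do that, not a recommended configuration. It assumes the root folder already holds extracted train/ and val/ folders, which is what torchvision needs when the folder is read-only. The helper names, batch size, worker count, and the commonly used normalization constants are all illustrative choices.

```python
from pathlib import Path

IMAGENET_ROOT = "/scratch/gpfs/DATASETS/imagenet/ilsvrc_2012_classification_localization"


def split_dir(root, split):
    """Return the folder for a split, or raise if it is missing.

    Assumption: root holds already-extracted train/ and val/ folders.
    """
    if split not in ("train", "val"):
        raise ValueError(f"split must be 'train' or 'val', got {split!r}")
    path = Path(root) / split
    if not path.is_dir():
        raise FileNotFoundError(f"expected an extracted split at {path}")
    return path


def build_loader(root=IMAGENET_ROOT, split="train", batch_size=256, num_workers=8):
    """Wrap datasets.ImageNet in a DataLoader with typical preprocessing."""
    # Imported here so that split_dir() can be used without PyTorch installed.
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    split_dir(root, split)  # fail early with a clear message

    # Widely used ImageNet channel statistics (an illustrative choice).
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
    if split == "train":
        transform = transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize,
        ])
    else:
        transform = transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            normalize,
        ])

    dataset = datasets.ImageNet(root=root, split=split, transform=transform)
    # Shuffle only the training split; keep validation order fixed.
    return DataLoader(dataset, batch_size=batch_size, shuffle=(split == "train"),
                      num_workers=num_workers, pin_memory=True)
```

Inside the torch-env environment, build_loader(split="val") would return a loader over the validation images. Set num_workers to match the CPU cores requested for the job.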

 

BLOOM

BigScience Large Open-science Open-access Multilingual Language Model (BLOOM)
Version 1.3 / 6 July 2022

Read a description of the model.

The model is stored on Della:

$ ssh <YourNetID>@della.princeton.edu
$ cd /scratch/gpfs/DATASETS/bloom_model_1.3/bloom
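
This page does not say how the files in that folder are laid out, so check with ls first. If, and only if, the folder turns out to be a Hugging Face Transformers checkpoint (a config.json next to the tokenizer and weight files), a sketch like the following could load the tokenizer from the local path. The helper names are illustrative and the Transformers layout is an assumption. The example stops at the tokenizer because the model weights may be too large to load casually.

```python
from pathlib import Path

BLOOM_DIR = "/scratch/gpfs/DATASETS/bloom_model_1.3/bloom"


def looks_like_hf_checkpoint(path):
    """True if path is a directory that contains a config.json file."""
    path = Path(path)
    return path.is_dir() and (path / "config.json").is_file()


def load_tokenizer(path=BLOOM_DIR):
    """Load a tokenizer from a local folder without contacting the Hugging Face Hub."""
    if not looks_like_hf_checkpoint(path):
        raise FileNotFoundError(
            f"no config.json under {path}; the layout differs from the assumption"
        )
    # Imported here so that the layout check works without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(path, local_files_only=True)
```

If looks_like_hf_checkpoint(BLOOM_DIR) returns False, the folder uses a different layout and this sketch does not apply.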

 

Additional Datasets

To request that a dataset be made available, write to cses@princeton.edu. Requests should be for datasets that are of interest to multiple users.