AlphaFold
The AlphaFold dataset is available on Della and Tiger, in the /scratch/gpfs/DATASETS folder. To view the dataset on Della, for example:
$ ssh <YourNetID>@della.princeton.edu
$ cd /scratch/gpfs/DATASETS
$ ls -l
total 1
drwxr-xr-x. 4 root root 4096 Oct 21 08:49 alphafold
-rw-r--r--. 1 root root 84 Oct 21 08:50 README.txt
On Della there is an environment module for alphafold as well:
$ module avail alphafold
After loading the module you should be able to run the software.
BLOOM
BigScience Large Open-science Open-access Multilingual Language Model (BLOOM)
Version 1.3 / 6 July 2022
Read a description of the model.
$ ssh <YourNetID>@della.princeton.edu
$ cd /scratch/gpfs/DATASETS/bloom_model_1.3/bloom
CIFAR
CIFAR-10 and CIFAR-100 are available:
$ ssh <YourNetID>@della.princeton.edu
$ cd /scratch/gpfs/DATASETS/cifar
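The layout of the files under the cifar directory is not described here, so the following is only a sketch of how the shared copy might be used with torchvision. It assumes the data is stored in the standard "python version" layout that torchvision expects (a cifar-10-batches-py folder for CIFAR-10, cifar-100-python for CIFAR-100). The helper names find_torchvision_root and load_cifar10 are hypothetical. Run ls on the directory first to confirm what is actually there.

```python
from pathlib import Path
from typing import Optional

# Shared dataset location from this page.
CIFAR_BASE = Path("/scratch/gpfs/DATASETS/cifar")


def find_torchvision_root(base: Path, marker: str = "cifar-10-batches-py") -> Optional[Path]:
    """Return the directory that contains `marker`, or None if it is not found.

    torchvision looks for `marker` directly under the `root` it is given,
    so the parent of the marker folder is the value to pass as `root`.
    """
    base = Path(base)
    if not base.is_dir():
        return None
    for candidate in sorted(base.rglob(marker)):
        if candidate.is_dir():
            return candidate.parent
    return None


def load_cifar10(train: bool = True):
    """Open the shared CIFAR-10 copy without attempting a download."""
    # Imported lazily so the helper above works without torchvision installed.
    from torchvision import datasets, transforms

    root = find_torchvision_root(CIFAR_BASE)
    if root is None:
        raise FileNotFoundError(
            f"No cifar-10-batches-py folder found under {CIFAR_BASE} (layout is an assumption)"
        )
    # download=False because the shared directory is read-only for users.
    return datasets.CIFAR10(
        root=str(root), train=train, download=False, transform=transforms.ToTensor()
    )
```

For CIFAR-100, the same search with marker="cifar-100-python" and datasets.CIFAR100 should apply, under the same layout assumption.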
Hugging Face
One Hugging Face (HF) dataset is currently available, the "en" variant of the c4 dataset:
$ ssh <YourNetID>@della.princeton.edu
$ cd /scratch/gpfs/DATASETS/hugging_face/c4
$ ls -l en
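As a rough illustration of reading the data without copying it, the sketch below streams a few records from one shard using only the Python standard library. It assumes the en directory holds gzip-compressed JSON-lines shards (files ending in .json.gz) whose records have a "text" field, as in the upstream c4 distribution. Neither the file naming nor the field name is confirmed on this page, so compare against the output of ls -l en. The function name iter_records is hypothetical.

```python
import gzip
import json
from itertools import islice
from pathlib import Path
from typing import Iterator

# Shared dataset location from this page.
C4_EN_DIR = Path("/scratch/gpfs/DATASETS/hugging_face/c4/en")


def iter_records(shard: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of a gzip-compressed JSON-lines file."""
    with gzip.open(shard, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


if __name__ == "__main__" and C4_EN_DIR.is_dir():
    # Assumption: shards are named *.json.gz; adjust the pattern if they are not.
    shards = sorted(C4_EN_DIR.glob("*.json.gz"))
    if shards:
        for record in islice(iter_records(shards[0]), 3):
            # "text" is the assumed field name; fall back to an empty string.
            print(record.get("text", "")[:80])
```

Streaming line by line keeps memory use flat regardless of shard size, which matters when many jobs read from the same shared filesystem.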
ImageNet
$ ssh <YourNetID>@della.princeton.edu
$ cd /scratch/gpfs/DATASETS/imagenet
$ ls -l
total 3
drwxr-xr-x. 4 root root 4096 Mar 8 2022 ilsvrc_2012_2017_face_obfuscation
drwxr-xr-x. 4 root root 4096 Mar 8 2022 ilsvrc_2012_classification_localization
drwxr-xr-x. 5 root root 4096 Jan 13 11:56 imagenet21k_resized
drwxr-xr-x. 7 root root 4096 Feb 28 23:48 imagenet_c
-rw-r--r--. 1 root root 111 Mar 8 2022 README
For example, with PyTorch:
$ conda activate torch-env
(torch-env) $ python
>>> from torchvision import datasets
>>> datasets.ImageNet(root="/scratch/gpfs/DATASETS/imagenet/ilsvrc_2012_classification_localization", split="train")
>>> datasets.ImageNet(root="/scratch/gpfs/DATASETS/imagenet/ilsvrc_2012_classification_localization", split="val")
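In a training script, the dataset object is usually wrapped in a DataLoader. The following is a minimal sketch of that pattern, not a recommended configuration. The function name build_loader, the batch size, the worker count, and the preprocessing steps (224-pixel crops with the commonly used ImageNet mean and standard deviation) are all illustrative assumptions. Only the root path and the split names come from the example above.

```python
from pathlib import Path

# Root path taken from the interactive example on this page.
IMAGENET_ROOT = Path("/scratch/gpfs/DATASETS/imagenet/ilsvrc_2012_classification_localization")


def build_loader(split: str = "train", batch_size: int = 256, num_workers: int = 8):
    """Return a DataLoader over the given ImageNet split ("train" or "val")."""
    if split not in ("train", "val"):
        raise ValueError(f"split must be 'train' or 'val', got {split!r}")

    # Imported lazily so the argument check above does not require PyTorch.
    import torch
    from torchvision import datasets, transforms

    # Commonly used ImageNet channel statistics (an assumption, not from this page).
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    if split == "train":
        steps = [transforms.RandomResizedCrop(224), transforms.RandomHorizontalFlip()]
    else:
        steps = [transforms.Resize(256), transforms.CenterCrop(224)]
    transform = transforms.Compose(steps + [transforms.ToTensor(), normalize])

    dataset = datasets.ImageNet(root=str(IMAGENET_ROOT), split=split, transform=transform)
    return torch.utils.data.DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=(split == "train"),  # shuffle only the training split
        num_workers=num_workers,
        pin_memory=True,
    )


if __name__ == "__main__" and IMAGENET_ROOT.is_dir():
    loader = build_loader("val", batch_size=32, num_workers=4)
    images, labels = next(iter(loader))
    print(images.shape, labels.shape)
```

The num_workers value should not exceed the number of CPU cores requested for the job.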
Additional Datasets
Please write to [email protected] to request that a certain dataset be made available. Requests should be made for datasets that are of interest to multiple users.