Sharing data with other Research Computing users and external collaborators

Introduction

This page presents ways to share data with other users on the Princeton HPC clusters. To share data outside of the Princeton HPC systems see Sharing Data Outside of Princeton. Note that the word "group" on this page always refers to Unix groups and not research groups.

Unix Permissions

In brief, each file on the cluster has read, write and execute permissions for three classes: user, group and other. The user is the owner of the file (usually you), the group is a Unix group that usually corresponds to the department, institute or even research group you are in (e.g., politics, chem, pni, eac, gaussian), and other is every user who is neither the owner nor a member of the group.

Before continuing, you should be familiar with the chmod command for setting the permissions on your directories and files. If you are not then see one of the many online tutorials such as this one.

Permission    Meaning for files                       Meaning for directories
read (r)      Contents of the file can be displayed   Contents of the directory can be listed
write (w)     File can be modified                    Files can be created in or deleted from the directory
execute (x)   File can be run like a program          Directory can be entered (i.e., the cd command works)
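
The octal and symbolic forms accepted by chmod describe the same permission bits. The short sketch below shows the correspondence on a throwaway file. It assumes GNU coreutils (the stat -c option), and the file name example.dat is only a placeholder.

```shell
# Work in a temporary directory so nothing real is modified.
demo=$(mktemp -d)
touch "$demo/example.dat"

# Octal form: 6 = rw- (user), 4 = r-- (group), 0 = --- (other).
chmod 640 "$demo/example.dat"
perm_octal=$(stat -c '%a %A' "$demo/example.dat")
echo "$perm_octal"        # prints: 640 -rw-r-----

# Equivalent symbolic form: each class is assigned explicitly.
chmod u=rw,g=r,o= "$demo/example.dat"
perm_symbolic=$(stat -c '%a %A' "$demo/example.dat")
echo "$perm_symbolic"     # prints: 640 -rw-r-----
```

Each octal digit is the sum of read (4), write (2) and execute (1) for one class, in the order user, group, other.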

World-Readable Directories

Now that you know about the chmod command, to create a directory that everyone on the cluster can access, the user with NetID aturing would use these commands:

$ ssh [email protected]
$ chmod 755 /home/aturing
$ mkdir public
$ chmod 755 public

One can also use the following syntax:

$ chmod u=rwx,go=rx public

In the command above, the read (r) and execute (x) permissions for the group (g) and other (o) are enabled. Next, check that the permissions are correct:

$ ls -ld /home/aturing
drwxr-xr-x. 4 aturing math 9 Jun 23 15:31 /home/aturing
$ ls -ld public
drwxr-xr-x. 2 aturing math 6 Jun 23 15:32 public

We see that the read and execute permissions for group and other are properly set, which will allow anyone on the cluster to see the contents of public. As emphasized below, it is necessary to set the rx permissions for each directory in the path to a file to make it accessible.

To make a file in public readable by everyone on the cluster, use chmod on the file to set its permissions:

$ cd public
$ touch myfile.dat
$ chmod 644 myfile.dat

Now myfile.dat can be read by everyone on the cluster. We can check the permissions on myfile.dat with:

$ ls -l myfile.dat
-rw-r--r--. 1 aturing math 0 Jun 23 15:33 myfile.dat

Note that if public were set to 744 or u=rwx,go=r then myfile.dat would not be accessible. The execute bit must be set on a directory for its contents to be accessible. In general, for a file to be accessible, the rx permissions of every directory in the path to the file must be enabled (in this case for group and other). This explains why the chmod command was applied to both /home/aturing and public. If public were 755 and /home/aturing were 744 then public would not be accessible.

One can see all of the permissions for a given path using the following command:

$ namei -l /home/aturing/public/myfile.dat
f: /home/aturing/public/myfile.dat
dr-xr-xr-x root    root /
drwxr-xr-x root    root home
drwxr-xr-x aturing math aturing
drwxr-xr-x aturing math public
-rw-r--r-- aturing math myfile.dat
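
As a complement to reading the namei output by eye, the sketch below reports which directories in a path would block users in the other class. The function name check_other_access is hypothetical and not a tool installed on the clusters. It assumes GNU coreutils (stat -c and readlink -f). It checks only the execute bit, which is what controls entering a directory; read permission is additionally needed to list a directory's contents.

```shell
# check_other_access is a hypothetical helper name, not a cluster tool.
# It walks from the file's directory up to / and reports every directory
# that users in the "other" class cannot enter.
# Assumes GNU coreutils (stat -c, readlink -f), as found on Linux.
check_other_access() (
    dir=$(dirname "$(readlink -f "$1")")
    while :; do
        mode=$(stat -c '%A' "$dir")   # e.g., drwxr-xr-x
        case "$mode" in
            # 10th character is the execute bit for other (t = sticky plus x)
            ?????????[xt]*) ;;
            *) echo "blocked: $dir ($mode)" ;;
        esac
        [ "$dir" = "/" ] && break
        dir=$(dirname "$dir")
    done
)

# Example usage:
#   check_other_access /home/aturing/public/myfile.dat
```

No output means that every directory on the way to the file can be entered by other.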

Usually one wants to restrict access to directories and files. The next sections show how to do that.

Group-Readable Directories

Let's say you want to make a directory that only people in your Unix group can use. First determine which group you are in and which groups other users are in. Run the groups command to see your group(s):

$ groups
math

To see the groups of another user such as einstein:

$ groups einstein
einstein : physics

To see all the groups and their members:

$ getent group
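
The full getent group listing can be long. Passing a group name to getent group prints only that group's entry, and the small sketch below splits the member field of such an entry into one NetID per line. The function name group_members is hypothetical. Note that on many systems the member field may not list users for whom the group is their primary group, so treat the result as a starting point rather than a complete roster.

```shell
# group_members is a hypothetical helper that reads one line in the
# name:password:GID:member,member,... format printed by getent group
# and prints one member per line.
group_members() {
    cut -d: -f4 | tr ',' '\n'
}

# Example usage on a cluster (math is the group from the examples above):
#   getent group math | group_members
```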

Now that you know about groups, let's return to the task of making a folder in /scratch/gpfs that only members of your group can look at. If your NetID is aturing then the set of commands would look like this:

$ chmod 750 /scratch/gpfs/aturing
$ cd /scratch/gpfs/aturing
$ mkdir groupdata
$ chmod 750 groupdata

Let's check the results:

$ ls -ld /scratch/gpfs/aturing
drwxr-x---. 4 aturing math 9 Jun 23 15:31 /scratch/gpfs/aturing
$ ls -ld /scratch/gpfs/aturing/groupdata
drwxr-x---. 2 aturing math 6 Jun 23 15:32 /scratch/gpfs/aturing/groupdata

The above looks good. Next, to create a file in groupdata that is readable by group members only:

$ cd groupdata
$ touch myfile1.dat
$ chmod 740 myfile1.dat

To create a file in groupdata that is readable and writable by group members only:

$ touch myfile2.dat
$ chmod 760 myfile2.dat

If you create a subdirectory within groupdata then it would need to have read and execute permissions (750 or u=rwx,g=rx,o=) in order for group members to access its contents. That is, for a file to be accessible, the rx permissions of every directory in the path to that file must be enabled (in this case at the group level). This explains why the chmod command was used on both /scratch/gpfs/aturing and groupdata. If the rx permissions on groupdata were enabled but those of /scratch/gpfs/aturing were not then the file in groupdata would not be accessible.
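
When groupdata already contains many subdirectories and files, setting the permissions one at a time becomes tedious. The sketch below applies the same idea to a whole tree using find. The function name set_group_readable is hypothetical. It gives regular files 640 rather than 740 on the assumption that they are data files that do not need to be executable, so adjust the modes if the tree contains scripts or programs.

```shell
# set_group_readable is a hypothetical helper name.
# Usage: set_group_readable DIRECTORY
set_group_readable() {
    # Directories need r and x for the group so they can be listed and entered.
    find "$1" -type d -exec chmod 750 {} +
    # Regular files only need r for the group (this removes any execute bits).
    find "$1" -type f -exec chmod 640 {} +
}

# Example usage:
#   set_group_readable /scratch/gpfs/aturing/groupdata
```

The parent directory /scratch/gpfs/aturing is outside the tree given to the function, so it still needs its own chmod 750 as shown earlier.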

Multiple Groups

If you belong to multiple groups, you can use the chgrp command to change the group associated with a file or directory. For example, if aturing belonged to math and cs then the group associated with myfile.dat can be changed from math to cs as follows:

$ ls -l myfile.dat
-rw-r--r--. 1 aturing math 0 Jun 23 15:33 myfile.dat
$ chgrp cs myfile.dat
$ ls -l myfile.dat
-rw-r--r--. 1 aturing cs   0 Jun 23 15:33 myfile.dat

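Now the group permissions on myfile.dat apply to members of cs instead of math. Because this particular file is also readable by other (rw-r--r--), remove that permission (e.g., chmod 640 myfile.dat) if only members of cs should be able to read it. Note that by default there can be only one group associated with a given file or directory. You can use ACLs (see below) when members of multiple groups need access to a file.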

Access Control Lists (ACLs)

Access control lists or ACLs allow you to go well beyond what is possible with group-readable directories. This is essential when not all members of your group should have access to the data or when you need to share the data with a user in a different group.

To illustrate the use of ACLs, let's imagine that a user with the NetID aturing wants to share files with the user einstein. Below is the sequence of commands that aturing would execute to create a directory and file that only einstein could access.

Begin by connecting to the cluster and checking the permissions on the parent directory:

$ ssh [email protected]
$ ls -ld /scratch/gpfs/aturing
drwx------.  2 aturing   math   512   Jun 23 12:42   /scratch/gpfs/aturing

We see that the permissions on /scratch/gpfs/aturing are rwx------ or 700, which means that only aturing can access the contents of that directory. In general this is the right starting point. If the permissions were other than rwx------ then it may be necessary to use the chmod command to make the permissions more restrictive. The leading "d" in the permissions is an indication that /scratch/gpfs/aturing is a directory and not a simple file.

Next, make the directory and file to be shared:

$ cd /scratch/gpfs/aturing
$ mkdir data4two
$ chmod 700 data4two
$ cd data4two
$ touch file.dat
$ chmod 700 file.dat

The ACLs are set so that only user einstein can access the file:

$ setfacl -m u:einstein:rx /scratch/gpfs/aturing
$ setfacl -m u:einstein:rx /scratch/gpfs/aturing/data4two
$ setfacl -m u:einstein:rw file.dat

The commands above give only einstein read and write access to file.dat. If only read access is needed then replace "rw" with "r" in the last command. Note that the rx permissions of each directory in the path to file.dat had to be enabled. If the rx permissions of data4two were set but those of the parent directory /scratch/gpfs/aturing were not then the file would not be accessible. If another user needs the same access as einstein then aturing can run the three setfacl commands above for that user.
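
Repeating the setfacl commands for several users or for deeper directory trees is error-prone, since every directory between your top-level directory and the file needs an rx entry. The sketch below only prints the commands so that they can be reviewed before being run by hand. The function name print_share_commands and its argument order are hypothetical, and it assumes absolute paths without spaces.

```shell
# print_share_commands is a hypothetical helper: it only prints the
# setfacl commands, it does not run them.
# Usage: print_share_commands USER TOP_DIR FILE [PERMS]
#   TOP_DIR is the highest directory you own (e.g., /scratch/gpfs/aturing)
#   FILE    is an absolute path to a file somewhere below TOP_DIR
#   PERMS   is the permission for the file itself (default: r)
# Paths are printed unquoted, so this assumes they contain no spaces.
print_share_commands() (
    user=$1; top=$2; file=$3; perms=${4:-r}
    dir=$(dirname "$file")
    cmds="setfacl -m u:$user:$perms $file"
    # Walk upward from the file's directory to TOP_DIR, granting rx on each.
    while :; do
        cmds="setfacl -m u:$user:rx $dir
$cmds"
        [ "$dir" = "$top" ] && break
        [ "$dir" = "/" ] && { echo "error: $file is not under $top" >&2; exit 1; }
        dir=$(dirname "$dir")
    done
    printf '%s\n' "$cmds"
)

# Example usage:
#   print_share_commands einstein /scratch/gpfs/aturing \
#       /scratch/gpfs/aturing/data4two/file.dat rw
```

For the example in this section, the function prints the same three setfacl commands shown above, with the file given by its full path.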

To view the ACL settings for file.dat:

$ getfacl file.dat
# file: file.dat
# owner: aturing
# group: math
user::rwx
user:einstein:rw-
group::---
mask::rw-
other::---

Let's return to the permissions on /scratch/gpfs/aturing:

$ ls -ld /scratch/gpfs/aturing
drwx------+  2 aturing   math  512 Jun 23 12:42 /scratch/gpfs/aturing

We see that a "+" character has replaced the "." at the end of the permissions. This is an indication of ACLs.

To remove the ACLs from file.dat:

$ setfacl -b file.dat

For more on ACLs see one of the many online tutorials such as this one.

Note that ACLs can become tedious, and in complicated cases they can produce unintuitive results. If you find yourself in one of these scenarios then consider asking CSES to make a new group as discussed below.

New Groups

When ACLs are insufficient or overly involved, the solution is usually to create a new group. As discussed above, a group is a collection of NetIDs. Directories and files can be configured such that only members of a group can work with them. If you believe a new group is needed then send a request to [email protected].

/projects

Previously, users were given access to /tigress/<YourNetID> for storage. As of fall 2020, the new approach is to create a single directory or fileset on /projects named after the PI of the user's research group. Individual group members can create directories for their work within this directory. Like /tigress, /projects is also backed up so users should be responsible about the number and size of the files they store. The new approach allows group members to more easily share files. Also, when a group member leaves the university, their files on /projects will still be available to the group. PIs can write to [email protected] to request a fileset for their group on /projects.

Research Computer User

In cases where a collaborator needs to work with large amounts of data on the Princeton systems, it may make sense to have a temporary account created so that the collaborator can work directly with the data. This sidesteps the need for a large data transfer. Faculty and approved staff can request a Research Computer User (RCU) account from central OIT in such cases. After the account has been created by central OIT, request a Research Computing account by sending the NetID and cluster name to [email protected]. The request to Research Computing must come from a PI that has already sponsored researchers on the desired cluster.

Sharing Data Outside of Princeton

For a list of resources for sharing data with collaborators see Sharing Data Outside of Princeton.