
Data transfer between Atlas and UL HPC Clusters

A recommended storage pattern is to keep the master copy of the data on Atlas (project folder) and to store data on the UL HPC Clusters only temporarily, for the duration of the computational analysis. The derived data and results should afterwards be transferred back to Atlas. This How-to Card describes the different methods to transfer data between Atlas and the UL HPC Clusters. The three recommended methods are:

  1. Via laptop with scp or rsync
  2. Via dedicated Virtual Machine (VM)
  3. Via Large File Transfer (LFT)

Please refer to the dedicated knowledge bases to see how to connect to UL HPC Clusters and to mount Atlas.

[Figure: data-transfer-flow.png — data transfer flow between Atlas and the UL HPC Clusters]

1. Via laptop using scp or rsync

When using the UL laptop to transfer data between the UL HPC Clusters and Atlas, you must first mount Atlas via SMB on the laptop before using scp or rsync for the transfer. While both commands ensure a secure transfer of data between the UL HPC Clusters and Atlas, rsync may be much faster for large numbers of small files (which are transferred in batches), and for selective incremental updates of large datasets (it automatically transfers only the files that changed, thus saving time).
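As an illustration, mounting Atlas on a Linux laptop might look like the sketch below. The share path, domain and username are placeholders; the exact values for your faculty and project are given in the Atlas mounting knowledge base.

```bash
# Mount the Atlas project share on the laptop via SMB/CIFS (Linux example).
# Share path, domain and username below are placeholders, not the actual values.
sudo mkdir -p /mnt/atlas
sudo mount -t cifs //atlas.uni.lu/PROJECT_SHARE /mnt/atlas \
    -o username=YOUR_UL_USERNAME,domain=YOUR_DOMAIN,uid=$(id -u),gid=$(id -g)
```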

  • scp: transfers all files and directories.
  • rsync: transfers only the files which differ between the source and the destination.

Please visit the UL HPC documentation to see how to use rsync and scp.
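For illustration, a one-off copy with scp and an incremental sync with rsync from the mounted Atlas folder to a cluster could look like the following sketch. The cluster login host and paths are placeholders; see the UL HPC documentation for the actual hostnames and recommended options.

```bash
# One-off copy of a dataset from the mounted Atlas share to the cluster
# (login host and paths are placeholders).
scp -r /mnt/atlas/my-project/raw-data CLUSTER_LOGIN_HOST:/scratch/users/YOUR_LOGIN/my-project/

# Incremental sync: only files that differ are transferred; -a preserves
# permissions and timestamps, -z compresses data in transit.
rsync -avz --progress /mnt/atlas/my-project/raw-data/ \
    CLUSTER_LOGIN_HOST:/scratch/users/YOUR_LOGIN/my-project/raw-data/

# After the analysis, transfer the results back to Atlas the same way:
rsync -avz --progress CLUSTER_LOGIN_HOST:/scratch/users/YOUR_LOGIN/my-project/results/ \
    /mnt/atlas/my-project/results/
```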

2. Via dedicated Virtual Machine (VM) using rsync

Data can be transferred via a dedicated VM, which can be requested via ServiceNow. Instead of routing the transfer between Atlas and the UL HPC Clusters through the laptop as described above, the transfer goes through the dedicated VM. Once you are connected to the VM and Atlas is mounted, the rsync command can be used in the same way as described in the UL HPC documentation. This method is recommended for recurring transfers of very large datasets that benefit from the high-speed network connection between the VM and the HPC.
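A minimal sketch of such a recurring transfer run on the dedicated VM, assuming Atlas is mounted at /mnt/atlas on the VM and using a placeholder cluster login host:

```bash
# On the dedicated VM, with Atlas mounted (path, host and project names are
# placeholders), push the project data to the cluster over the fast link:
rsync -avz --progress /mnt/atlas/my-project/ \
    CLUSTER_LOGIN_HOST:/scratch/users/YOUR_LOGIN/my-project/
```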

Note: For larger transfers between Atlas and the UL HPC Clusters, you may want to run the operations in the background using screen or tmux. These prevent the data transfer from being interrupted if your SSH connection drops.
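For example, a transfer can be wrapped in a named screen or tmux session as sketched below (session name, host and paths are placeholders):

```bash
# Start a named screen session so the transfer keeps running if SSH drops.
screen -S atlas-transfer
rsync -avz --progress /mnt/atlas/my-project/ \
    CLUSTER_LOGIN_HOST:/scratch/users/YOUR_LOGIN/my-project/
# Detach with Ctrl-a d; re-attach later with:
screen -r atlas-transfer

# The tmux equivalent (detach with Ctrl-b d):
tmux new -s atlas-transfer
tmux attach -t atlas-transfer
```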

3. Via Large File Transfer (LFT)

An alternative solution is to use LFT for transferring data between Atlas and the UL HPC Clusters. This method can reliably transfer large data volumes (typically several terabytes). However, LFT can only be used if the data is already on LFT (e.g., received from external collaborators). In that case, you can make a copy of the data and download it directly to the UL HPC Clusters for computational analysis. Note that a master copy of the data must still be manually uploaded to Atlas for internal archival.

Please refer to the dedicated How-to Card on LFT for detailed information.

Note: If the analysis data were not already received via LFT, we strongly recommend using one of the other (simpler) methods instead.