How to move and store data on AWS
=================================

This guide explains how to transfer data between the cluster and S3.

Related
-------

- :doc:`Overview of the available AWS DP HPC cluster storage <../reference/sdp/aws-dp-hpc-cluster>`
- :doc:`How to run a pipeline on AWS using SLURM `
- :doc:`How to start an interactive compute node on AWS `
- `S3 documentation `_
- `Accessing the clusters (Confluence) `_

Prerequisites
-------------

- An account on the AWS DP HPC cluster

Steps
-----

1. Create a project directory on shared storage

   To keep your data organized, create a project-specific directory on the shared storage:

   .. code-block:: bash

      mkdir -p /shared/fsx1//

2. Transfer data between S3 and shared storage

   The ``aws s3`` commands below can be run on the headnode or inside a SLURM job script. Use ``aws s3 sync`` to transfer whole directories and ``aws s3 cp`` for individual files.

   To check the source and destination paths before transferring any data, add ``--dryrun``. The command then prints the operations it would perform without copying anything:

   .. code-block:: bash

      aws s3 cp --dryrun s3://skao-sdp-testdata/path/to/file.ms /shared/fsx1//
      aws s3 sync --dryrun /shared/fsx1//dataset/ s3://skao-sdp-testdata/path/to/dataset/

   a. Copy a single file from S3 to shared storage

      .. code-block:: bash

         aws s3 cp s3://skao-sdp-testdata/path/to/file.ms /shared/fsx1//

   b. Copy a single file from shared storage to S3

      .. code-block:: bash

         aws s3 cp /shared/fsx1//output.fits s3://skao-sdp-testdata/path/to/output.fits

   c. Sync a directory from S3 to shared storage

      This downloads any files in the S3 prefix that are missing from the local directory (or that have changed):

      .. code-block:: bash

         aws s3 sync s3://skao-sdp-testdata/path/to/dataset/ /shared/fsx1//dataset/

   d. Sync a directory from shared storage to S3

      This uploads any files in the local directory that are missing from the S3 prefix (or that have changed):

      .. code-block:: bash

         aws s3 sync /shared/fsx1//dataset/ s3://skao-sdp-testdata/path/to/dataset/

   .. note::

      ``aws s3 sync`` only copies files that are new or modified. It does **not** delete files in the destination that have been removed from the source unless you add the ``--delete`` flag. Use ``--delete`` with care, and run it with ``--dryrun`` first to see which files would be removed.

Verification
------------

To confirm that the transfer completed successfully, list the files at the destination:

- For S3:

  .. code-block:: bash

     aws s3 ls s3://skao-sdp-testdata/path/to/dataset/

- For shared storage:

  .. code-block:: bash

     ls -lh /shared/fsx1//dataset/
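
Listing the destination shows what arrived, but it does not confirm that nothing is missing. The following is a minimal sketch, assuming Bash and an AWS CLI that is already configured for the bucket. It compares the object count that ``aws s3 ls --recursive --summarize`` reports for a prefix with the number of regular files in a local directory. The functions ``count_s3_objects``, ``count_local_files`` and ``compare_counts`` are hypothetical helpers, not part of the AWS CLI. Matching counts are only a quick sanity check: zero-byte "folder" objects that some S3 tools create are counted on the S3 side but have no local file, and file contents are not compared.

.. code-block:: bash

   #!/usr/bin/env bash
   # Sketch: compare object/file counts after an "aws s3 sync".
   # The function names below are hypothetical helpers, not AWS CLI commands.

   # Print the object count from the "Total Objects: N" summary line.
   count_s3_objects() {
       local prefix="$1"   # e.g. s3://skao-sdp-testdata/path/to/dataset/
       aws s3 ls --recursive --summarize "$prefix" \
           | awk '$1 == "Total" && $2 == "Objects:" { print $3 }'
   }

   # Print the number of regular files below a local directory.
   count_local_files() {
       local dir="$1"
       find "$dir" -type f | wc -l | tr -d ' '
   }

   # Report both counts; return 0 if they match, non-zero otherwise.
   compare_counts() {
       local prefix="$1" dir="$2"
       local remote local_count
       remote=$(count_s3_objects "$prefix")
       local_count=$(count_local_files "$dir")
       echo "S3 objects: ${remote}  local files: ${local_count}"
       [ "$remote" = "$local_count" ]
   }

For example, after sourcing these functions, ``compare_counts s3://skao-sdp-testdata/path/to/dataset/ /shared/fsx1//dataset/`` prints both counts and returns a non-zero exit status if they differ.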
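
Because ``--delete`` removes files at the destination, it can help to review only the planned deletions before running it for real. The following is a minimal sketch in Bash. The hypothetical helper ``preview_sync_deletions`` runs ``aws s3 sync --delete --dryrun`` and keeps only the lines that start with ``(dryrun) delete:``. This assumes the CLI's usual dry-run output format for deletions.

.. code-block:: bash

   #!/usr/bin/env bash
   # Sketch: show only the deletions that "aws s3 sync --delete" would make.
   # preview_sync_deletions is a hypothetical helper, not an AWS CLI command.
   preview_sync_deletions() {
       local src="$1" dest="$2"
       # --dryrun prints the planned operations without changing anything;
       # grep keeps the delete lines, and "|| true" treats "none" as success.
       aws s3 sync --delete --dryrun "$src" "$dest" \
           | grep '^(dryrun) delete:' || true
   }

An empty result means that ``--delete`` would not remove anything for that source and destination pair.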
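
Step 2 notes that the ``aws s3`` commands can also run inside a SLURM job script. The following is a minimal sketch of such a script. It assumes that the AWS CLI and its credentials are available on the compute nodes. The ``#SBATCH`` values, the ``S3_PREFIX`` and ``DEST_DIR`` variables and the ``stage_in`` function are illustrative placeholders, not cluster defaults. The SLURM how-to listed under Related is the place to check the settings used on this cluster.

.. code-block:: bash

   #!/bin/bash
   #SBATCH --job-name=s3-sync
   #SBATCH --nodes=1
   #SBATCH --ntasks=1
   #SBATCH --time=01:00:00
   #SBATCH --output=s3-sync-%j.out

   # Sketch: stage a dataset from S3 onto shared storage inside a SLURM job.
   # The resource values above and the variables below are hypothetical.
   S3_PREFIX="${S3_PREFIX:-s3://skao-sdp-testdata/path/to/dataset/}"
   DEST_DIR="${DEST_DIR:-}"   # your project directory on /shared/fsx1

   stage_in() {
       if [ -z "$DEST_DIR" ]; then
           echo "DEST_DIR is not set" >&2
           return 1
       fi
       mkdir -p "$DEST_DIR"
       # Download only objects that are new or changed under the prefix.
       aws s3 sync "$S3_PREFIX" "$DEST_DIR"
   }

   # SLURM sets SLURM_JOB_ID inside a batch job; elsewhere this file only
   # defines stage_in, so it can be sourced on the headnode for testing.
   if [ -n "${SLURM_JOB_ID:-}" ]; then
       stage_in || exit 1
   fi

You could submit it from your project directory with, for example, ``sbatch --export=ALL,DEST_DIR="$PWD/dataset" s3-sync.sh``. The filename ``s3-sync.sh`` is hypothetical.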