How to move and store data on AWS
This guide explains how to transfer data between the cluster and S3.
Prerequisites
An account on the AWS DP HPC cluster
Steps
Create a project directory on shared storage
To keep your data organized, create a project-specific directory on the shared storage:
mkdir -p /shared/fsx1/<your-project>/
Transfer data between S3 and shared storage
The aws s3 commands below can be run on the head node or inside a SLURM job script. Use aws s3
sync to transfer whole directories and aws s3 cp to transfer individual files.
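As a rough illustration of the SLURM option, the sketch below writes a small job script that stages a dataset from S3 onto shared storage, ready to be submitted with sbatch. The file name stage_in.sbatch, the job name, the time limit, the log file name and the my-project directory are placeholders chosen for this example, not cluster requirements. Adjust them to your setup, and add a partition or account directive if your jobs need one.

```shell
# Write a minimal job script; every name below is an illustrative placeholder.
cat > stage_in.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=stage-in
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --output=stage-in-%j.log

# Stop at the first failing command so a failed transfer fails the job.
set -euo pipefail

# Hypothetical project directory; replace with your own.
PROJECT_DIR="/shared/fsx1/my-project"

mkdir -p "${PROJECT_DIR}/dataset"
aws s3 sync s3://skao-sdp-testdata/path/to/dataset/ "${PROJECT_DIR}/dataset/"
EOF

# Submit from the head node with:
#   sbatch stage_in.sbatch
```

Running the transfer as a job keeps long copies off the head node, and the log file records what aws s3 sync reported.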
Before transferring data, run the command with --dryrun to check the source and destination
paths. A dry run lists the operations that would be performed without transferring anything.
aws s3 cp --dryrun s3://skao-sdp-testdata/path/to/file.ms /shared/fsx1/<your-project>/
aws s3 sync --dryrun /shared/fsx1/<your-project>/dataset/ s3://skao-sdp-testdata/path/to/dataset/
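For interactive use, the dry run and the real transfer can be combined into one step. The following is a minimal sketch, not an established tool: sync_with_preview is a hypothetical helper name, and the prompt wording is arbitrary. It shows the dry-run output, asks for confirmation, and only then runs the same sync for real.

```shell
# sync_with_preview is a hypothetical helper name, not part of the AWS CLI.
sync_with_preview() {
    local src="$1"
    local dst="$2"
    local answer

    # Show what would be copied without transferring anything.
    aws s3 sync --dryrun "${src}" "${dst}" || return 1

    # Ask for confirmation before the real transfer (prompt goes to stderr).
    printf 'Proceed with this transfer? [y/N] ' >&2
    read -r answer
    if [ "${answer}" = "y" ]; then
        aws s3 sync "${src}" "${dst}"
    else
        echo "Transfer cancelled."
    fi
}

# Example call (hypothetical project directory):
#   sync_with_preview s3://skao-sdp-testdata/path/to/dataset/ /shared/fsx1/my-project/dataset/
```

Because it waits for keyboard input, this helper suits the head node rather than a SLURM job script.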
Copy a single file from S3 to shared storage
aws s3 cp s3://skao-sdp-testdata/path/to/file.ms /shared/fsx1/<your-project>/
Copy a single file from shared storage to S3
aws s3 cp /shared/fsx1/<your-project>/output.fits s3://skao-sdp-testdata/path/to/output.fits
Sync a directory from S3 to shared storage
This downloads any files in the S3 prefix that are missing from the local directory (or that have changed):
aws s3 sync s3://skao-sdp-testdata/path/to/dataset/ /shared/fsx1/<your-project>/dataset/
Sync a directory from shared storage to S3
This uploads any files in the local directory that are missing from the S3 prefix (or that have changed):
aws s3 sync /shared/fsx1/<your-project>/dataset/ s3://skao-sdp-testdata/path/to/dataset/
Note
aws s3 sync only copies files that are new or modified. It does not delete files in the
destination that have been removed from the source unless you add the --delete flag. Use
--delete with care.
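One cautious way to use --delete is to combine it with --dryrun first and look only at the planned deletions. The sketch below assumes that the CLI prefixes those lines with "(dryrun) delete:"; preview_deletions is a hypothetical helper name introduced for this example.

```shell
# preview_deletions is a hypothetical helper name, not part of the AWS CLI.
preview_deletions() {
    local src="$1"
    local dst="$2"
    local plan

    # Collect the planned actions; --dryrun means nothing is changed.
    plan="$(aws s3 sync --dryrun --delete "${src}" "${dst}")" || return 1

    # Keep only deletion lines (assumes the "(dryrun) delete:" prefix).
    if printf '%s\n' "${plan}" | grep -q '^(dryrun) delete:'; then
        printf '%s\n' "${plan}" | grep '^(dryrun) delete:'
    else
        echo "No deletions planned."
    fi
}

# Example call (hypothetical project directory):
#   preview_deletions /shared/fsx1/my-project/dataset/ s3://skao-sdp-testdata/path/to/dataset/
```

If the list contains anything unexpected, check the source and destination paths before running the sync with --delete for real.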
Verification
To confirm that the transfer completed successfully, list the files at the destination:
For S3:
aws s3 ls s3://skao-sdp-testdata/path/to/dataset/
For shared storage:
ls -lh /shared/fsx1/<your-project>/dataset/
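For large datasets, comparing file counts can be quicker than reading two listings. The following sketch is one possible check, not a full integrity test: compare_counts is a hypothetical helper name, and it relies on the "Total Objects:" line that aws s3 ls prints with --summarize. Counts can differ legitimately, for example if the S3 prefix contains zero-byte folder marker objects.

```shell
# compare_counts is a hypothetical helper name, not part of the AWS CLI.
compare_counts() {
    local local_dir="$1"
    local s3_prefix="$2"
    local local_count s3_count

    # Count regular files under the local directory.
    local_count="$(find "${local_dir}" -type f | wc -l | tr -d ' ')"

    # --summarize appends a "Total Objects: N" line to the recursive listing.
    s3_count="$(aws s3 ls "${s3_prefix}" --recursive --summarize \
        | awk '/Total Objects:/ {print $3}')"

    echo "local files: ${local_count}, S3 objects: ${s3_count}"

    # Return success only when the two counts match.
    [ "${local_count}" = "${s3_count}" ]
}

# Example call (hypothetical project directory):
#   compare_counts /shared/fsx1/my-project/dataset/ s3://skao-sdp-testdata/path/to/dataset/
```

A matching count shows that nothing is obviously missing; it does not prove that file contents are identical.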