How to monitor a pipeline job on AWS
====================================

This guide describes how to monitor a pipeline job running on the AWS DP HPC cluster using the SLURM CLI.

Related
-------

- :doc:`How to run a pipeline on AWS `
- `CIMG pipeline can be monitored using the Prefect UI `_

Prerequisites
-------------

- An account on the AWS DP HPC cluster
- A SLURM job submitted to the cluster (pending or running)

Steps
-----

Use the standard SLURM tools to monitor the submitted job while it is in the queue or running.

1. Identify the job ID from the output of ``sbatch`` or from ``squeue``:

   .. code-block:: bash

      squeue -u "$USER"

   To monitor the queue status of your own jobs continuously, use ``watch``:

   .. code-block:: bash

      watch -n 1 squeue -u "$USER"

2. Inspect the detailed job state and allocated resources:

   .. code-block:: bash

      scontrol show job <jobid>

3. Check the accounting information (state, start and end time, exit code):

   .. code-block:: bash

      sacct -j <jobid> --format=JobID,JobName,Partition,AllocCPUS,State,ExitCode,Elapsed,Start,End

4. Follow the SLURM output log in real time. The log file is only created once the job starts running, so you might need to wait until its state changes from configuring (``CF``) to running (``R``):

   .. code-block:: bash

      tail -f slurm-<jobid>.out

   The output log filename is configurable via ``#SBATCH --output`` (or ``sbatch --output``). Check the relevant pipeline's SLURM script for the exact pattern, for example ``#SBATCH --output=slurm-%x-%j.out``, where ``%x`` expands to the job name and ``%j`` to the job ID.

5. To cancel the job, run ``scancel`` with the job ID:

   .. code-block:: bash

      scancel <jobid>

   Or cancel all of your own jobs:

   .. code-block:: bash

      scancel -u "$USER"

Finishing up
------------

Once the SLURM job has finished, the data products are available in the output directory. Check your pipeline's documentation for the expected output location.
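If you would rather not poll by hand, the monitoring steps above can be combined into a small Bash function that waits until the job leaves the queue and then prints its final accounting record. This is a minimal sketch, not part of the pipeline tooling: the function name ``wait_for_job``, the list of "active" states, and the 30-second default polling interval are hypothetical choices, and it assumes the standard ``squeue`` and ``sacct`` commands are on your ``PATH``.

.. code-block:: bash

   #!/usr/bin/env bash
   # Hypothetical helper (not provided by the pipeline): block until a
   # SLURM job is no longer active, then print its accounting summary.
   #
   # Usage: wait_for_job JOBID [POLL_SECONDS]
   wait_for_job() {
     local jobid="$1"
     local interval="${2:-30}"   # assumed default; adjust to taste
     local state

     # -h drops the header, -o %T prints only the job state, and -t limits
     # the output to states in which the job is still active. Empty output,
     # or an error such as "Invalid job id", ends the loop.
     while state=$(squeue -h -j "$jobid" -o %T \
                     -t PENDING,CONFIGURING,RUNNING,COMPLETING,SUSPENDED 2>/dev/null) \
           && [ -n "$state" ]; do
       echo "Job $jobid: $state"
       sleep "$interval"
     done

     # The job has left the queue: report its final state from accounting.
     # -X shows only the job allocation (no per-step lines), -n drops the
     # header and -P prints '|'-separated fields.
     sacct -j "$jobid" -X -n -P --format=JobID,State,ExitCode,Elapsed
   }

For example, ``wait_for_job 12345 60`` checks the job every 60 seconds. If you submit the job yourself, ``sbatch --parsable`` prints only the job ID (followed by ``;cluster`` on multi-cluster setups), so something like ``jobid=$(sbatch --parsable job.sh)`` captures it for later commands; ``job.sh`` stands in for your pipeline's SLURM script. Because the loop also stops if ``squeue`` itself fails (for example, if the controller is briefly unreachable), check the reported ``State`` rather than assuming the job completed successfully.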