Analyse Core Dumps
When a program crashes, it often generates a core dump file that contains a snapshot of its memory at the time of the crash. This file can be invaluable for debugging purposes.
This guide walks you through viewing and analyzing core dumps with the tooling that the System team provides for containerized applications running in the CI/CD Kubernetes cluster.
SKA Infrastructure has adopted the core-dump-handler tool, which runs a privileged container on each Kubernetes cluster node to capture and store core dumps from containerized applications.
For every core dump, a ZIP file is generated and sent to an object storage bucket (typically AWS S3) for later retrieval and analysis. The ZIP file contains a full description of the crashing process, the OCI image in which it was running, and the pod’s metadata and running state. Its name follows the template “ns-{namespace}-hn-{hostname}-en-{exe_name}-ts-{timestamp}-uuid-{uuid}”.
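Because the file name encodes where the crash happened, it can be split back into its fields before the archive is even downloaded. The following is a minimal sketch, not part of the provided tooling: the function name `parse_dump_name` is hypothetical, the optional `.zip` suffix is an assumption, and the pattern assumes that no field value itself contains one of the marker strings (`-hn-`, `-en-`, `-ts-`, `-uuid-`). The timestamp is returned as a plain string because its format is not stated here.

```python
import re
from typing import Dict, Optional

# The literal markers of the template separate the fields. Field values are
# assumed not to contain those markers themselves (an assumption, see above).
_DUMP_NAME_RE = re.compile(
    r"^ns-(?P<namespace>.+?)"
    r"-hn-(?P<hostname>.+?)"
    r"-en-(?P<exe_name>.+?)"
    r"-ts-(?P<timestamp>.+?)"
    r"-uuid-(?P<uuid>.+?)"
    r"(?:\.zip)?$"  # the ".zip" suffix is assumed and treated as optional
)


def parse_dump_name(name: str) -> Optional[Dict[str, str]]:
    """Return the template fields of a core dump file name, or None if it does not match."""
    match = _DUMP_NAME_RE.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Illustrative name only; real names come from the object storage bucket.
    example = (
        "ns-my-namespace-hn-subarraynode-01-0-en-python3"
        "-ts-1700000000-uuid-123e4567-e89b-12d3-a456-426614174000.zip"
    )
    print(parse_dump_name(example))
```

Namespaces and hostnames usually contain hyphens, which is why the sketch splits on the full marker strings rather than on single hyphens.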
Core dump ZIP files are kept in the object storage for 5 days, after which they are deleted automatically.
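To judge how long a given dump will remain downloadable, the 5-day retention can be applied to the time at which the file was stored. This is only a sketch under assumptions: the helper names are hypothetical, the storage time has to be supplied by the caller, and the actual deletion moment depends on how the storage evaluates its retention rule, so the result should be read as an approximation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=5)  # retention period stated in this guide


def expiry_time(stored_at: datetime) -> datetime:
    """Nominal time after which the dump is expected to be deleted."""
    return stored_at + RETENTION


def is_expired(stored_at: datetime, now: Optional[datetime] = None) -> bool:
    """True if the nominal retention period has already elapsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expiry_time(stored_at)


if __name__ == "__main__":
    stored = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(expiry_time(stored).isoformat())
```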
A web application for working with the core dumps stored in the object storage has been developed and is available here.
It lets users browse, search, and download core dumps through a web interface.
Once downloaded, core dumps can be analyzed using debugging tools such as gdb or lldb, depending on the programming language used to develop the application.
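A manual analysis typically starts by unpacking the downloaded archive and then asking the debugger for a backtrace of every thread. The sketch below illustrates those two steps and is not the provided script: the function names are hypothetical, the names of the files inside the archive are not assumed (the caller has to locate the core file after extraction), and the executable passed to gdb is assumed to be the same binary that crashed, for example taken from the same OCI image.

```python
import subprocess
import tempfile
import zipfile
from pathlib import Path
from typing import List


def extract_dump(zip_path) -> Path:
    """Unpack the archive into a fresh temporary directory and return that directory."""
    target = Path(tempfile.mkdtemp(prefix="core-dump-"))
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


def gdb_backtrace_command(executable: str, core_file: str) -> List[str]:
    """Build a non-interactive gdb call that prints a backtrace of all threads."""
    return ["gdb", "--batch", "-ex", "thread apply all bt", executable, core_file]


def run_backtrace(executable: str, core_file: str) -> str:
    """Run gdb (which must be installed) and return its combined text output."""
    result = subprocess.run(
        gdb_backtrace_command(executable, core_file),
        capture_output=True,
        text=True,
        check=False,  # gdb may exit non-zero and still print useful output
    )
    return result.stdout + result.stderr


if __name__ == "__main__":
    workdir = extract_dump("core-dump.zip")
    for path in sorted(workdir.rglob("*")):
        print(path)  # inspect the contents to find the core file
```

Running gdb on a machine that lacks the original binary and shared libraries usually yields a backtrace without symbols, so matching the crashed container's environment matters.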
A script is also available at this link. It extracts the ZIP file, prints the metadata, and (if possible) runs gdb or pystack to produce a backtrace report. A typical usage of the script is as follows:
$ ./analyse.sh core-dump.zip --docker --pystack --show-pystack
Unable to find image 'artefact.skao.int/ska-tmc-subarraynode@sha256:9a9d28e3e496fbcdd22443fcc0bfeffb2abcaba02ba3133332ada55276450208' locally
artefact.skao.int/ska-tmc-subarraynode@sha256:9a9d28e3e496fbcdd22443fcc0bfeffb2abcaba02ba3133332ada55276450208: Pulling from ska-tmc-subarraynode
7478e0ac0f23: Pull complete
070a91a8e828: Pull complete
98d838d9b7d9: Pull complete
ff1c6cb153f3: Pull complete
60c920007381: Pull complete
d12ebc392b5b: Pull complete
a980f0878a7f: Pull complete
08dadd123c16: Pull complete
62094daab6c9: Pull complete
f966d036089d: Pull complete
dff4cd84dbd0: Pull complete
29d120112833: Pull complete
caf9f9db3f18: Pull complete
6cb6eb99c810: Pull complete
53e78ac39196: Pull complete
54544d35400d: Pull complete
9184e821f24e: Pull complete
d218eec4a8b4: Pull complete
Digest: sha256:9a9d28e3e496fbcdd22443fcc0bfeffb2abcaba02ba3133332ada55276450208
Status: Downloaded newer image for artefact.skao.int/ska-tmc-subarraynode@sha256:9a9d28e3e496fbcdd22443fcc0bfeffb2abcaba02ba3133332ada55276450208
/tmp/core-dump-CgC50H/gdb-report.txt
/tmp/core-dump-CgC50H/gdb-report.docker.txt
/tmp/core-dump-CgC50H/pystack-report.txt
WARNING(process_core): The interpreter is shutting itself down so it is possible that no Python stack trace is available for inspection. You can still use --native-all to force displaying all the threads.
💀 Could not gather enough information to extract the Python frame information 💀
There was not enough information to locate the necessary data structures to
obtain the Python stack trace from the process or the core file. Some scenarios
where this can happen are:
* The Python interpreter is shutting down and the internal structures are not
available or the memory is being freed.
* The process is embedding Python but the embedding machinery is not activated
in the process or is stale (this can happen with processes like mod_wsgi).
* If you are analyzing a core file, is possible that the binary and all the
shared libraries that were used to create the core file are not present in
the machine you are using to do the analysis. In this case the core may
report not enough information to resolve the necessary symbols. If you see
messages reporting missing shared libraries, you can try using the
`--lib-search-root` or `--lib-search-path` options to indicate locations to
search for shared libraries. For example:
$ pystack core $COREFILE $EXECUTABLE --lib-search-root /path/to/shared/libs/
The `--lib-search-root` option is especially useful for self-contained
applications that pack all shared libraries as this can be simply pointed
to the root of the self-contained bundle.
You can still try to call pystack with the `--exhaustive` option to activate
more exhaustive (and slower) methods to obtain the necessary information. This
can take some extra time but it can resolve the Python stack even in some of
the most challenging situations.
Core file information:
state: D zombie: True niceness: 0
pid: 1 ppid: 0 sid: 1
uid: 1000 gid: 1000 pgrp: 1
executable: python3 arguments: python3 /subarray_node_mid.py 01 -ORBendPoint giop:tcp::45450 -ORBendPointPubli
The process died due to receiving a SIGSEGV signal.
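When pystack cannot recover the Python frames, as in the output above, its own message suggests retrying with `--exhaustive`, `--native-all`, or a `--lib-search-root` pointing at the shared libraries. A retry could be scripted roughly as follows. This is a sketch under assumptions: the function name and the example paths are hypothetical, only options quoted in the output above are used, and it is assumed that pystack accepts them in combination.

```python
from typing import List, Optional


def pystack_core_command(
    core_file: str,
    executable: str,
    lib_search_root: Optional[str] = None,
    exhaustive: bool = False,
    native_all: bool = False,
) -> List[str]:
    """Build a `pystack core` call using the options mentioned in pystack's hints."""
    cmd = ["pystack", "core", core_file, executable]
    if lib_search_root:
        # Root directory that holds the shared libraries used by the crashed process.
        cmd += ["--lib-search-root", lib_search_root]
    if exhaustive:
        cmd.append("--exhaustive")  # slower, more thorough lookup
    if native_all:
        cmd.append("--native-all")  # force displaying all threads
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(pystack_core_command(
        "/tmp/dump/core", "/usr/bin/python3",
        lib_search_root="/tmp/dump/rootfs", exhaustive=True,
    )))
```

The resulting list can be passed to `subprocess.run` on a machine where pystack is installed.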