Crash Manager
Linux process crash manager
CrashManager is a new coredump handler and crash manager for Linux systems that adds features to existing coredump handling solutions.
Highlights:
Coredumps are pre-processed while the coredump stream from the kernel is parsed, generating IDs that make crashes easy to categorize (CrashID: based on the Instruction Pointer and Return Address; VectorID: based on the Return Address only)
Coredumps are context aware: crashes in containers can be easily identified (ContextID) and filtered, with automatic container name labelling for LXC containers
Automatic management of coredump database size limits defined in the configuration file
The coredump output is one file: a compressed tarball containing the coredump and the context files (including binary files) that the configuration file defines as per-process context information. See the default configuration file: https://github.com/anpopa/crashmanager/blob/master/config/crashmanager.conf.in
The coredump output is a standard compressed tarball, so no extra tooling is required to extract the information
Dynamic content supports dumping binary files as well. This is very useful for embedding data like screenshots, textures, databases, etc.
Support for cascade crashing, for cases where analysing a process crash requires a peer process coredump for debugging (e.g. generating a server coredump when a client crashes with an IPC timeout)
The component uses libarchive to create the output, so the compression algorithm can be easily changed at build time
A crash journal is created and maintained on target with information like the history of crashes, file transfer states, removed crashdumps, etc.
The component provides a new tool, crashinfo, which can be used on target to extract journal information and/or in the SDK to easily extract crash information (obtaining the backtrace is as easy as crashinfo --bt <crashdump_archive.cdh.tar.gz>)
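To illustrate why two separate IDs are useful for categorization, here is a minimal sketch of how such IDs could be derived. It is not CrashManager's implementation: the hash function (truncated SHA-256), the helper names, and the choice of hashing module names with module-relative file offsets (so that IDs stay stable across address space randomization) are all assumptions made for this example.

```python
import hashlib


def _short_hash(*parts: str) -> str:
    """Hypothetical helper: 16 uppercase hex digits from a SHA-256 digest."""
    digest = hashlib.sha256("\0".join(parts).encode("utf-8")).digest()
    return digest[:8].hex().upper()


def crash_id(ip_module: str, ip_offset: int, ra_module: str, ra_offset: int) -> str:
    # Assumption: the CrashID combines the instruction pointer and the
    # return address locations, so it identifies one exact crash site.
    return _short_hash(ip_module, hex(ip_offset), ra_module, hex(ra_offset))


def vector_id(ra_module: str, ra_offset: int) -> str:
    # Assumption: the VectorID uses the return address only, so crashes
    # reached through the same caller share it even if the IP differs.
    return _short_hash(ra_module, hex(ra_offset))
```

With this scheme, two crashes at different instructions reached from the same caller would get different CrashIDs but the same VectorID, which is the kind of coarse and fine grouping the two IDs make possible.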
Examples
Executing crashinfo without arguments on target lists the crash history with context information about the crashes:
% crashinfo
Idx  Procname   Timestamp            CrashID           VectorID          Context  PID   TRS  REM  FILE
1    crashtest  07:52:01 2019-08-29  4747714566DE87D2  4BD0D5866D3FA284  debian   4356  1    0    crashtest.4356.1567065121.cdh.tar.gz
2    crashtest  08:24:08 2019-08-29  D170728BBE14D94F  D1E0A2C7F0051A0A  debian   4556  1    0    crashtest.4556.1567067048.cdh.tar.gz
3    crashtest  08:24:30 2019-08-29  D170728BBE14D94F  D1E0A2C7F0051A0A  debian   4576  1    0    crashtest.4576.1567067070.cdh.tar.gz
4    crashtest  08:23:02 2019-08-30  D170728BBE14D94F  D1E0A2C7F0051A0A  debian   6184  1    0    crashtest.6184.1567153382.cdh.tar.gz
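In the history above, entries 2 to 4 share the same CrashID, which marks them as the same crash. As a small sketch of how that categorization can be used, the script below groups history rows by CrashID. It assumes the whitespace-separated column layout shown above and process names without spaces; the function name is made up for this example.

```python
from collections import defaultdict

# Rows copied from the crashinfo output shown above.
HISTORY = """\
Idx Procname Timestamp CrashID VectorID Context PID TRS REM FILE
1 crashtest 07:52:01 2019-08-29 4747714566DE87D2 4BD0D5866D3FA284 debian 4356 1 0 crashtest.4356.1567065121.cdh.tar.gz
2 crashtest 08:24:08 2019-08-29 D170728BBE14D94F D1E0A2C7F0051A0A debian 4556 1 0 crashtest.4556.1567067048.cdh.tar.gz
3 crashtest 08:24:30 2019-08-29 D170728BBE14D94F D1E0A2C7F0051A0A debian 4576 1 0 crashtest.4576.1567067070.cdh.tar.gz
4 crashtest 08:23:02 2019-08-30 D170728BBE14D94F D1E0A2C7F0051A0A debian 6184 1 0 crashtest.6184.1567153382.cdh.tar.gz
"""


def group_by_crash_id(text: str) -> dict:
    """Map each CrashID to the list of archive names that carry it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # A data row has 11 fields (the timestamp is split into time and
        # date); the header has 10 and is skipped by this check.
        if len(fields) != 11 or not fields[0].isdigit():
            continue
        crash_id, archive = fields[4], fields[10]
        groups[crash_id].append(archive)
    return dict(groups)


if __name__ == "__main__":
    for crash_id, archives in group_by_crash_id(HISTORY).items():
        print(crash_id, len(archives), "crash(es)")
```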
To see the context information about a crash, use the --info argument with the crash archive name (or path):
% crashinfo --info crashtest.6184.1567153382.cdh.tar.gz
[crashdata]
ProcessName = crashtest
ProcessThread = crashtest
ProcessExe = /usr/local/bin/crashtest
LifecycleState = running
CrashTimestamp = 1567153382
ProcessID = 6184
ResidentID = 6184
CrashSignal = 11
CrashID = D170728BBE14D94F
VectorID = D1E0A2C7F0051A0A
ContextID = F18BDD746CC08FED
ContextName = debian
IP = 0x0000558ce3f57576
RA = 0x00007fffbc00df30
IPFileOffset = 0x0000000000001576
RAFileOffset = 0x000000000002409b
IPModuleName = /usr/local/bin/crashtest
RAModuleName = /usr/lib/x86_64-linux-gnu/libc-2.28.so
CoredumpSize = 380928
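The output above looks like a regular INI section, so it can be post-processed with common tooling. The sketch below parses a shortened copy of it with Python's configparser. That the format is always INI-compatible is an assumption based only on the sample shown here, and the function name is made up for this example.

```python
import configparser
import signal

# Shortened copy of the crashinfo --info output shown above.
SAMPLE = """\
[crashdata]
ProcessName = crashtest
ProcessExe = /usr/local/bin/crashtest
CrashTimestamp = 1567153382
CrashSignal = 11
CrashID = D170728BBE14D94F
IPFileOffset = 0x0000000000001576
IPModuleName = /usr/local/bin/crashtest
"""


def parse_crashdata(text: str) -> dict:
    """Return the [crashdata] section as a plain dict of strings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original key capitalization
    parser.read_string(text)
    return dict(parser["crashdata"])


info = parse_crashdata(SAMPLE)
signal_name = signal.Signals(int(info["CrashSignal"])).name
ip_offset = int(info["IPFileOffset"], 16)

if __name__ == "__main__":
    print(info["ProcessName"], signal_name, hex(ip_offset))
```

The file offset together with the module name is the kind of input that symbolization tools typically need to map an address back to a source location.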
The content of the archive is dynamic. We can list its content with --files and print the content of any file with --print:
% crashinfo --files crashtest.6184.1567153382.cdh.tar.gz
root.proc.6184.cmdline
root.proc.6184.fd
root.proc.6184.ns
root.proc.6184.cgroup
root.proc.6184.stack
root.proc.6184.environ
root.proc.6184.status
root.proc.6184.sched
root.proc.6184.maps
root.proc.6184.stat
root.proc.6184.smaps
core.crashtest.6184.0000
info.crashdata
% crashinfo --print root.proc.6184.fd crashtest.6184.1567153382.cdh.tar.gz
lrwx------ 1 1000 1000 64 0 -> /dev/pts/0
lrwx------ 1 1000 1000 64 1 -> /dev/pts/0
lrwx------ 1 1000 1000 64 2 -> /dev/pts/0
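Because the archive is a standard compressed tarball, the same two operations can also be done without crashinfo. The following is a minimal sketch using Python's tarfile module. The function names are made up for this example, and it assumes the member names inside the archive match what --files prints.

```python
import tarfile


def list_members(archive_path: str) -> list:
    """Return the member names stored in the crash archive."""
    # "r:*" lets tarfile detect the compression format on its own.
    with tarfile.open(archive_path, "r:*") as archive:
        return archive.getnames()


def read_member(archive_path: str, member_name: str) -> bytes:
    """Return the raw bytes of one file stored in the crash archive."""
    with tarfile.open(archive_path, "r:*") as archive:
        member = archive.extractfile(member_name)
        if member is None:
            raise ValueError(f"{member_name} is not a regular file")
        return member.read()
```

For example, read_member("crashtest.6184.1567153382.cdh.tar.gz", "info.crashdata") would return the crash information file as bytes, assuming the archive is in the current directory.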
Because the crashdump now embeds the coredump and the context information, we can easily print the backtrace in the SDK:
% crashinfo --bt crashtest.6184.1567153382.cdh.tar.gz
Extracting coredump with size 380928 ... Done.
New file name: /tmp/.HXMK7Z/crashtest.6184.1567153382.core
Reading symbols from /usr/local/bin/crashtest...done.
[New LWP 6184]
Core was generated by `crashtest -t2'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 main (argc=2, argv=0x7fffbc00e018) at ../testing/crashtest/crashtest.c:146
146 *(int*)0 = 2;
#0 main (argc=2, argv=0x7fffbc00e018) at ../testing/crashtest/crashtest.c:146
Build and Install
The build system is meson so make sure you have meson installed:
% cd crashmanager
% meson setup build
% cd build
% ninja
% ninja install
Configuration
For the complete list of configuration options, use meson configure:
% meson configure
Errata
The repository is currently unavailable because of a potential liability with my current employer.