Crash Manager

Linux process crash manager

CrashManager is a new coredump handler and crash manager for Linux systems that offers more features than existing coredump handling solutions.

Highlights:

  • Coredumps are pre-processed while the coredump stream from the kernel is parsed, generating IDs that make crashes easy to categorize (CrashID: based on the instruction pointer and return address; VectorID: based on the return address only)

  • Coredumps are context-aware: crashes in containers can be easily identified (ContextID) and filtered, with automatic container name labelling for LXC containers

  • The coredump database size is managed automatically, according to the limits defined in the configuration file

  • The coredump output is a single file: a compressed tarball containing the coredump and the context files (including the binary) that the configuration file defines as context information per process. See the default configuration file: https://github.com/anpopa/crashmanager/blob/master/config/crashmanager.conf.in

  • The coredump output is a standard compressed tarball, so no extra tooling is required to extract the information

  • Dynamic content support covers binary files as well. This is very useful for embedding data such as screenshots, textures, databases, etc.

  • Support for cascade crashing, for cases where analysing a process crash requires the coredump of a peer process (e.g. generating a server coredump when a client crashes with an IPC timeout)

  • The component uses libarchive to create the output, so the compression algorithm can be easily changed at build time

  • A crash journal is created and maintained on target with information like the history of crashes, file transfer states, removed crashdumps, etc.

  • The component provides a new tool, crashinfo, which can be used on target to extract journal information and/or in the SDK to easily extract crash information (obtaining the backtrace is as easy as crashinfo --bt <crashdump_archive.cdh.tar.gz>)
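
The ID scheme from the first highlight can be illustrated with a small sketch. This is not CrashManager's actual algorithm: the hash function (64-bit FNV-1a), the choice of inputs (process name plus file offsets rather than absolute addresses), and the names fnv1a_64 and make_ids are all assumptions made for illustration. The only property taken from the text above is that the CrashID depends on both the instruction pointer and the return address, while the VectorID depends on the return address only.

```python
# Hypothetical sketch: the real CrashManager hash and its exact inputs may differ.
from typing import Tuple

FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a, used here only as a stand-in hash function."""
    value = FNV64_OFFSET
    for byte in data:
        value ^= byte
        value = (value * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return value


def make_ids(procname: str, ip_offset: int, ra_offset: int) -> Tuple[str, str]:
    """Return (crash_id, vector_id) as 16-digit uppercase hex strings.

    crash_id mixes the instruction pointer and the return address;
    vector_id uses the return address only.
    """
    crash_key = f"{procname}:{ip_offset:x}:{ra_offset:x}".encode()
    vector_key = f"{procname}:{ra_offset:x}".encode()
    return f"{fnv1a_64(crash_key):016X}", f"{fnv1a_64(vector_key):016X}"
```

Under this scheme, two crashes at different instructions that return to the same caller get different CrashIDs but share a VectorID, which is what makes the second ID useful for coarser grouping.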

Examples

Executing crashinfo without arguments on target lists the crash history with context information about the crashes:

% crashinfo

Idx Procname Timestamp CrashID VectorID Context PID TRS REM FILE

1 crashtest 07:52:01 2019-08-29 4747714566DE87D2 4BD0D5866D3FA284 debian 4356 1 0 crashtest.4356.1567065121.cdh.tar.gz

2 crashtest 08:24:08 2019-08-29 D170728BBE14D94F D1E0A2C7F0051A0A debian 4556 1 0 crashtest.4556.1567067048.cdh.tar.gz

3 crashtest 08:24:30 2019-08-29 D170728BBE14D94F D1E0A2C7F0051A0A debian 4576 1 0 crashtest.4576.1567067070.cdh.tar.gz

4 crashtest 08:23:02 2019-08-30 D170728BBE14D94F D1E0A2C7F0051A0A debian 6184 1 0 crashtest.6184.1567153382.cdh.tar.gz
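
Since the CrashID is meant for categorization, a listing like the one above can be grouped by that column. The following is a minimal sketch, assuming the output is whitespace-separated with exactly eleven fields per row (the timestamp takes two) and that process and context names contain no spaces; group_by_crash_id is a hypothetical helper, not part of crashinfo.

```python
from collections import defaultdict

# Rows copied from the crashinfo listing above.
LISTING = """\
Idx Procname Timestamp CrashID VectorID Context PID TRS REM FILE
1 crashtest 07:52:01 2019-08-29 4747714566DE87D2 4BD0D5866D3FA284 debian 4356 1 0 crashtest.4356.1567065121.cdh.tar.gz
2 crashtest 08:24:08 2019-08-29 D170728BBE14D94F D1E0A2C7F0051A0A debian 4556 1 0 crashtest.4556.1567067048.cdh.tar.gz
3 crashtest 08:24:30 2019-08-29 D170728BBE14D94F D1E0A2C7F0051A0A debian 4576 1 0 crashtest.4576.1567067070.cdh.tar.gz
4 crashtest 08:23:02 2019-08-30 D170728BBE14D94F D1E0A2C7F0051A0A debian 6184 1 0 crashtest.6184.1567153382.cdh.tar.gz
"""


def group_by_crash_id(listing):
    """Map each CrashID to the list of archive names that carry it."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and anything that is not a data row.
        if len(fields) != 11 or not fields[0].isdigit():
            continue
        crash_id, archive = fields[4], fields[10]
        groups[crash_id].append(archive)
    return dict(groups)


groups = group_by_crash_id(LISTING)
for crash_id, archives in groups.items():
    print(crash_id, len(archives))
```

For the four rows shown, this produces two groups: one CrashID seen once and one seen three times, so the last three archives most likely describe the same defect.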

To see the context information about a crash, use the --info argument with the crash archive name (or path):

% crashinfo --info crashtest.6184.1567153382.cdh.tar.gz

[crashdata]

ProcessName = crashtest

ProcessThread = crashtest

ProcessExe = /usr/local/bin/crashtest

LifecycleState = running

CrashTimestamp = 1567153382

ProcessID = 6184

ResidentID = 6184

CrashSignal = 11

CrashID = D170728BBE14D94F

VectorID = D1E0A2C7F0051A0A

ContextID = F18BDD746CC08FED

ContextName = debian

IP = 0x0000558ce3f57576

RA = 0x00007fffbc00df30

IPFileOffset = 0x0000000000001576

RAFileOffset = 0x000000000002409b

IPModuleName = /usr/local/bin/crashtest

RAModuleName = /usr/lib/x86_64-linux-gnu/libc-2.28.so

CoredumpSize = 380928
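
The output above has the shape of an INI file with a single [crashdata] section, so it can be read with a standard INI parser. The sketch below assumes that shape holds for every field; parse_crashdata is a hypothetical helper and only a few of the keys shown above are included.

```python
import configparser

# A subset of the --info output shown above.
INFO_TEXT = """\
[crashdata]
ProcessName = crashtest
CrashSignal = 11
IP = 0x0000558ce3f57576
IPFileOffset = 0x0000000000001576
IPModuleName = /usr/local/bin/crashtest
"""


def parse_crashdata(text):
    """Return the [crashdata] section as a plain dict of strings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original key capitalisation
    parser.read_string(text)
    return dict(parser["crashdata"])


data = parse_crashdata(INFO_TEXT)
ip = int(data["IP"], 16)
ip_offset = int(data["IPFileOffset"], 16)
print(f"{data['ProcessName']}: signal {data['CrashSignal']} "
      f"at {data['IPModuleName']}+{ip_offset:#x}")
```

The file offset is the more useful value for comparing crashes across runs, because the absolute IP changes whenever the module is loaded at a different address.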

The content of the archive is dynamic. Use --files to list the files it contains and --print to print the content of any one of them:

% crashinfo --files crashtest.6184.1567153382.cdh.tar.gz

root.proc.6184.cmdline

root.proc.6184.fd

root.proc.6184.ns

root.proc.6184.cgroup

root.proc.6184.stack

root.proc.6184.environ

root.proc.6184.status

root.proc.6184.sched

root.proc.6184.maps

root.proc.6184.stat

root.proc.6184.smaps

core.crashtest.6184.0000

info.crashdata


% crashinfo --print root.proc.6184.fd crashtest.6184.1567153382.cdh.tar.gz

lrwx------ 1 1000 1000 64 0 -> /dev/pts/0

lrwx------ 1 1000 1000 64 1 -> /dev/pts/0

lrwx------ 1 1000 1000 64 2 -> /dev/pts/0
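
Because the crashdump is a standard compressed tarball, the same two operations can be reproduced without crashinfo. The following is a minimal sketch using Python's tarfile module; list_members and read_member are hypothetical helper names, and it is an assumption that the member names inside the archive match the ones printed by --files.

```python
import tarfile


def list_members(archive_path):
    """Return the names of all entries in the crashdump archive."""
    # "r:*" lets tarfile detect the compression used at build time.
    with tarfile.open(archive_path, "r:*") as archive:
        return archive.getnames()


def read_member(archive_path, member_name):
    """Return the raw bytes of one regular file stored in the archive."""
    with tarfile.open(archive_path, "r:*") as archive:
        member = archive.extractfile(member_name)
        if member is None:
            raise ValueError(f"{member_name} is not a regular file")
        with member:
            return member.read()
```

Calling list_members("crashtest.6184.1567153382.cdh.tar.gz") would then correspond to --files, and read_member with a name such as info.crashdata to --print. Reading with "r:*" keeps the helper independent of the compression algorithm chosen for libarchive.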

Because the crashdump now embeds both the coredump and the context information, printing the backtrace in the SDK is straightforward:

% crashinfo --bt crashtest.6184.1567153382.cdh.tar.gz

Extracting coredump with size 380928 ... Done.

New file name: /tmp/.HXMK7Z/crashtest.6184.1567153382.core

Reading symbols from /usr/local/bin/crashtest...done.

[New LWP 6184]

Core was generated by `crashtest -t2'.

Program terminated with signal SIGSEGV, Segmentation fault.

#0 main (argc=2, argv=0x7fffbc00e018) at ../testing/crashtest/crashtest.c:146

146 *(int*)0 = 2;

#0 main (argc=2, argv=0x7fffbc00e018) at ../testing/crashtest/crashtest.c:146

Build and Install

The build system is Meson, so make sure Meson is installed:

% cd crashmanager

% meson setup build

% cd build

% ninja

% ninja install

Configuration

For the complete list of configuration options, use meson configure:

% meson configure

Errata

The repository is currently unavailable because of a potential liability with my current employer.