# Breakpad Client Libraries

## Objective

The Breakpad client libraries are responsible for monitoring an application for
crashes (exceptions), handling them when they occur by generating a dump, and
providing a means to upload dumps to a crash reporting server. These tasks are
divided between the “handler” (short for “exception handler”) library linked in
to an application being monitored for crashes, and the “sender” library,
intended to be linked in to a separate external program.

## Background

As one of the chief tasks of the client handler is to generate a dump, an
understanding of [dump files](processor_design.md) will aid in understanding the
handler.

## Overview

Breakpad provides client libraries for each of its target platforms. Currently,
these exist for Windows on x86 and Mac OS X on both x86 and PowerPC. A Linux
implementation has been written and is currently under review.

Because the mechanisms for catching exceptions and the methods for obtaining the
information that a dump contains vary between operating systems, each target
operating system requires a completely different handler implementation. Where
multiple CPUs are supported for a single operating system, the handler
implementation will likely also require separate code for each processor type to
extract CPU-specific information. One of the goals of the Breakpad handler is to
provide a prepackaged cross-platform system that masks many of these
system-level differences and quirks from the application developer. Although the
underlying implementations differ, the handler library for each system follows
the same set of principles and exposes a similar interface.

Code that wishes to take advantage of Breakpad should be linked against the
handler library, and should, at an appropriate time, install a Breakpad handler.
For applications, it is generally desirable to install the handler as early in
the start-up process as possible. Developers of library code using Breakpad to
monitor itself may wish to install a Breakpad handler when the library is
loaded, or may only want to install a handler when calls are made into the
library.

The handler can be triggered to generate a dump either by catching an exception
or at the request of the application itself. The latter case may be useful in
debugging assertions or other conditions where developers want to know how a
program got into a specific non-crash state. After generating a dump, the
handler calls a user-specified callback function. The callback function may
collect additional data about the program’s state, quit the program, launch a
crash reporter application, or perform other tasks. Allowing for this
functionality to be dictated by a callback function preserves flexibility.

The sender library also has a separate implementation for each supported
platform, because of the varying interfaces for accessing network resources on
different operating systems. The sender transmits a dump along with other
application-defined information to a crash report server via HTTP. Because dumps
may contain sensitive data, the sender allows for the use of HTTPS.

The canonical example of the entire client system would be for a monitored
application to link against the handler library, install a Breakpad handler from
its main function, and provide a callback to launch a small crash reporter
program. The crash reporter program would be linked against the sender library,
and would send the crash dump when launched. A separate process is recommended
for this function because of the unreliability inherent in doing any significant
amount of work from a crashed process.

## Detailed Design

### Exception Handler Installation

The mechanisms for installing an exception handler vary between operating
systems. On Windows, it’s a relatively simple matter of making one call to
register a [top-level exception
filter](http://msdn.microsoft.com/library/en-us/debug/base/setunhandledexceptionfilter.asp)
callback function. On most Unix-like systems such as Linux, processes are
informed of exceptions by the delivery of a signal, so an exception handler
takes the form of a signal handler. The native mechanism to catch exceptions on
Mac OS X requires a large amount of code to set up a Mach port, identify it as
the exception port, and assign a thread to listen for an exception on that port.
Just as the preparation of exception handlers differs, the manner in which they
are called differs as well. On Windows and most Unix-like systems, the handler
is called on the thread that caused the exception. On Mac OS X, the thread
listening to the exception port is notified that an exception has occurred. The
different implementations of the Breakpad handler libraries perform these tasks
in the appropriate ways on each platform, while exposing a similar interface on
each.

A Breakpad handler is embodied in an `ExceptionHandler` object. Because it’s a
C++ object, `ExceptionHandler`s may be created as local variables, allowing them
to be installed and removed as functions are called and return. This provides
one possible way for a developer to monitor only a portion of an application for
crashes.

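The scoped install-and-remove pattern can be illustrated with a minimal sketch
built on `std::signal` rather than on Breakpad itself. `ScopedSignalHandler`,
`RecordSignal`, `g_signal_seen`, and `RunMonitoredSection` are hypothetical
names chosen for this example, and `SIGINT` stands in for a real crash signal so
that the code can be run safely.

```cpp
#include <csignal>

// Flag written by the signal handler. sig_atomic_t is the object type the
// C++ standard allows a signal handler to write.
volatile std::sig_atomic_t g_signal_seen = 0;

void RecordSignal(int) { g_signal_seen = 1; }

// Hypothetical stand-in for an ExceptionHandler-like object: the constructor
// installs a handler and the destructor restores the previous disposition.
class ScopedSignalHandler {
 public:
  explicit ScopedSignalHandler(int signo)
      : signo_(signo), previous_(std::signal(signo, RecordSignal)) {}

  ~ScopedSignalHandler() {
    // Only restore if installation succeeded in the first place.
    if (previous_ != SIG_ERR) std::signal(signo_, previous_);
  }

  ScopedSignalHandler(const ScopedSignalHandler&) = delete;
  ScopedSignalHandler& operator=(const ScopedSignalHandler&) = delete;

 private:
  int signo_;
  void (*previous_)(int);
};

// Only the body of this function is monitored. Returns true if the signal
// raised inside the scope was seen by the scoped handler.
bool RunMonitoredSection() {
  g_signal_seen = 0;
  ScopedSignalHandler handler(SIGINT);
  std::raise(SIGINT);  // Simulated fault inside the monitored region.
  return g_signal_seen == 1;
}
```

Once `RunMonitoredSection()` returns, the destructor has put back whatever
disposition was in effect before the call. A real `ExceptionHandler` does far
more work, but its lifetime as a local variable can follow the same rule.
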
### Exception Basics

Once an application encounters an exception, it is in an indeterminate and
possibly hazardous state. Consequently, any code that runs after an exception
occurs must take extreme care to avoid performing operations that might fail,
hang, or cause additional exceptions. This task is not at all straightforward,
and the Breakpad handler library seeks to do it properly, accounting for all of
the minute details while allowing other application developers, even those with
little systems programming experience, to reap the benefits. All of the Breakpad
handler code that executes after an exception occurs has been written according
to the following guidelines for safety at exception time:

* Use of the application heap is forbidden. The heap may be corrupt or
  otherwise unusable, and allocators may not function.
* Resource allocation must be severely limited. The handler may create a new
  file to contain the dump, and it may attempt to launch a process to continue
  handling the crash.
* Execution on the thread that caused the exception is significantly limited.
  The only code permitted to execute on this thread is the code necessary to
  transition handling to a dedicated preallocated handler thread, and the code
  to return from the exception handler.
* Handlers shouldn’t handle crashes by attempting to walk stacks themselves,
  as stacks may be in inconsistent states. Dump generation should be performed
  by interfacing with the operating system’s memory manager and code module
  manager.
* Library code, including runtime library code, must be avoided unless it
  provably meets the above guidelines. For example, this means that the STL
  string class may not be used, because it performs operations that attempt to
  allocate and use heap memory. It also means that many C runtime functions
  must be avoided, particularly on Windows, because of heap operations that
  they may perform.

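As a small illustration of the no-heap and no-runtime-library rules above, the
following sketch formats an integer into storage supplied by the caller.
`FormatHex` is a hypothetical helper written for this example, not a Breakpad
function; it simply shows the style of code that the guidelines call for in
place of `std::string` or `printf`-family formatting.

```cpp
#include <cstddef>
#include <cstdint>

// Writes `value` as lowercase hexadecimal into `buffer`, NUL-terminated,
// using only the storage the caller provides. Returns the number of digits
// written, or 0 if the buffer is missing or too small. Nothing here
// allocates, locks, or calls into the C runtime.
std::size_t FormatHex(std::uint64_t value, char* buffer, std::size_t size) {
  static const char kDigits[] = "0123456789abcdef";

  // Count the digits first so they can be stored most-significant first.
  std::size_t digits = 1;
  for (std::uint64_t v = value; v >= 16; v /= 16) ++digits;

  if (buffer == nullptr || size < digits + 1) return 0;

  for (std::size_t i = digits; i > 0; --i) {
    buffer[i - 1] = kDigits[value % 16];
    value /= 16;
  }
  buffer[digits] = '\0';
  return digits;
}
```

The buffer would typically be a fixed-size array that exists before the
exception occurs, for example a member of the handler object.
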
A dedicated handler thread is used to preserve the state of the exception thread
when an exception occurs: during dump generation, it is difficult if not
impossible for a thread to accurately capture its own state. Performing all
exception-handling functions on a separate thread is also critical when handling
stack-limit-exceeded exceptions. It would be hazardous to run out of stack space
while attempting to handle an exception. Because of the rule against allocating
resources at exception time, the Breakpad handler library creates its handler
thread when it installs its exception handler. On Mac OS X, this handler thread
is created during the normal setup of the exception handler, and the handler
thread will be signaled directly in the event of an exception. On Windows and
Linux, the handler thread is signaled by a small amount of code that executes on
the exception thread. Because the code that executes on the exception thread in
this case is small and safe, this does not pose a problem. Even when an
exception is caused by exceeding stack size limits, this code is sufficiently
compact to execute entirely within the stack’s guard page without causing an
exception.

The handler thread may also be triggered directly by a user call, even when no
exception occurs, to allow dumps to be generated at any point deemed
interesting.

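The handler-thread arrangement described above can be sketched, under stated
assumptions, with a thread that is created up front and woken through a pipe.
This is not Breakpad’s implementation: `HandlerThread`, `RequestDump`, and
`dumps_written` are hypothetical names, the sketch is POSIX-only, it does not
retry interrupted system calls, and having the requesting thread block until
the handler thread acknowledges is an assumption made for the example.

```cpp
#include <unistd.h>

#include <atomic>
#include <thread>

class HandlerThread {
 public:
  // "Install time": every resource (two pipes, one thread) is created here,
  // long before any exception occurs.
  HandlerThread() {
    ready_ = pipe(request_) == 0 && pipe(done_) == 0;
    if (ready_) thread_ = std::thread([this] { Run(); });
  }

  ~HandlerThread() {
    if (ready_) {
      Send(request_[1], 'q');  // Ask the handler thread to exit.
      thread_.join();
    }
    ClosePair(request_);
    ClosePair(done_);
  }

  // The only work done on the requesting (or faulting) thread: one write()
  // and one read() on descriptors that already exist.
  bool RequestDump() {
    if (!ready_ || !Send(request_[1], 'd')) return false;
    char ack = 0;
    return read(done_[0], &ack, 1) == 1;
  }

  int dumps_written() const { return dumps_.load(); }

 private:
  static bool Send(int fd, char byte) { return write(fd, &byte, 1) == 1; }

  static void ClosePair(int* fds) {
    if (fds[0] >= 0) close(fds[0]);
    if (fds[1] >= 0) close(fds[1]);
  }

  // Runs on the dedicated thread, with its own stack.
  void Run() {
    char command = 0;
    while (read(request_[0], &command, 1) == 1 && command == 'd') {
      dumps_.fetch_add(1);  // A real handler would write the dump here.
      Send(done_[1], 'k');  // Let the requesting thread continue.
    }
  }

  int request_[2] = {-1, -1};
  int done_[2] = {-1, -1};
  bool ready_ = false;
  std::atomic<int> dumps_{0};
  std::thread thread_;
};
```

The same `RequestDump()` entry point serves both cases in the text: it could be
called from a signal handler on the faulting thread, or directly by application
code that wants a dump when no exception has occurred.
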
### Filter Callback

When the handler thread begins handling an exception, it calls an optional
user-defined filter callback function, which is responsible for judging whether
Breakpad’s handler should continue handling the exception or not. This mechanism
is provided for the benefit of library or plug-in code, whose developers may not
be interested in reports of crashes that occur outside of their modules but
within processes hosting their code. If the filter callback indicates that it is
not interested in the exception, the Breakpad handler arranges for it to be
delivered to any previously installed handler.

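The decision described above can be modeled in a few lines. The types and names
below (`ExceptionInfo`, `FilterCallback`, `HandlerConfig`, `Dispatch`,
`FaultIsInRange`) are hypothetical and do not reproduce Breakpad’s actual
callback signatures; the sketch assumes that a plug-in identifies its own
crashes by checking whether the faulting address lies inside its module.

```cpp
#include <cstdint>

// Minimal description of an exception for the purposes of this sketch.
struct ExceptionInfo {
  std::uintptr_t fault_address;
};

using FilterCallback = bool (*)(void* context, const ExceptionInfo& info);
using PreviousHandler = void (*)(const ExceptionInfo& info);

enum class Outcome { kDumpRequested, kForwarded, kIgnored };

struct HandlerConfig {
  FilterCallback filter;     // Optional; nullptr means "always handle".
  void* filter_context;      // Passed back to the filter unchanged.
  PreviousHandler previous;  // Handler installed before this one, if any.
};

// Consults the filter; if it declines, hands the exception to the previously
// installed handler instead of generating a dump.
Outcome Dispatch(const HandlerConfig& config, const ExceptionInfo& info) {
  const bool wanted =
      config.filter == nullptr || config.filter(config.filter_context, info);
  if (wanted) return Outcome::kDumpRequested;
  if (config.previous == nullptr) return Outcome::kIgnored;
  config.previous(info);
  return Outcome::kForwarded;
}

// Example filter for plug-in code: accept only faults inside [begin, end).
struct AddressRange {
  std::uintptr_t begin;
  std::uintptr_t end;
};

bool FaultIsInRange(void* context, const ExceptionInfo& info) {
  const AddressRange* range = static_cast<const AddressRange*>(context);
  return info.fault_address >= range->begin && info.fault_address < range->end;
}
```

Because the filter runs at exception time, it is limited to the same kind of
allocation-free work as the rest of the handler; a range comparison like the
one above fits that constraint.
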
### Dump Generation

Assuming that the filter callback approves (or does not exist), the handler
writes a dump in a directory specified by the application developer when the
handler was installed, using a previously generated unique identifier to avoid
name collisions. The mechanics of dump generation also vary between platforms,
but in general, the process involves enumerating each thread of execution, and
capturing its state, including processor context and the active portion of its
stack area. The dump also includes a list of the code modules loaded into the
application, and an indicator of which thread generated the exception or
requested the dump. In order to avoid allocating memory during this process, the
dump is written in place on disk.

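Writing “in place on disk” can be pictured as streaming fixed-size records
straight to a file descriptor. The sketch below is a deliberately simplified,
POSIX-only illustration: `ThreadRecord` and `WriteThreadRecords` are
hypothetical, the record layout is invented for the example and is not the real
dump format, and the output path is assumed to have been composed from the dump
directory and the unique identifier before any exception occurred.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cstdint>

// Hypothetical fixed-size record; a real dump carries far more detail.
struct ThreadRecord {
  std::uint32_t thread_id;
  std::uint32_t is_requesting_thread;  // 1 for the crashing/requesting thread.
  std::uint64_t stack_pointer;
  std::uint64_t instruction_pointer;
};

// Streams `count` records to `path` using only open/write/close. No buffer is
// allocated: each record goes from caller-owned storage directly to the file.
// O_EXCL refuses to overwrite an existing dump with the same name.
bool WriteThreadRecords(const char* path, const ThreadRecord* records,
                        int count) {
  const int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
  if (fd < 0) return false;

  bool ok = true;
  for (int i = 0; i < count && ok; ++i) {
    const ssize_t written = write(fd, &records[i], sizeof(ThreadRecord));
    ok = written == static_cast<ssize_t>(sizeof(ThreadRecord));
  }

  const bool closed = close(fd) == 0;
  return ok && closed;
}
```

Generating the file name ahead of time matters as much as the write loop:
building a path at exception time would usually involve string handling that
the safety guidelines rule out.
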
### Post-Dump Behavior

Upon completion of writing the dump, a second callback function is called. This
callback may be used to launch a separate crash reporting program or to collect
additional data from the application. The callback may also be used to influence
whether Breakpad will treat the exception as handled or unhandled. Even after a
dump is successfully generated, Breakpad can be made to behave as though it
didn’t actually handle an exception. This function may be useful for developers
who want to test their applications with Breakpad enabled but still retain the
ability to use traditional debugging techniques. It also allows a
Breakpad-enabled application to coexist with a platform’s native crash reporting
system, such as Mac OS X’s [CrashReporter](http://developer.apple.com/technotes/tn2004/tn2123.html)
and [Windows Error Reporting](http://msdn.microsoft.com/isv/resources/wer/).

Typically, when Breakpad handles an exception fully and no debuggers are
involved, the crashed process will terminate.

Authors of both callback functions that execute within a Breakpad handler are
cautioned that their code will be run at exception time, and that as a result,
they should observe the same programming practices that the Breakpad handler
itself adheres to. Notably, if a callback is to be used to collect additional
data from an application, it should take care to read only “safe” data. This
might involve accessing only static memory locations that are updated
periodically during the course of normal program execution.

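One way to read only “safe” data is sketched below. The names `AppStatus`,
`g_status`, `StatusSnapshot`, and `OnDumpWritten` are hypothetical, and the
callback signature is invented for this example rather than taken from
Breakpad. The sketch assumes that `std::atomic<int>` is lock-free on the target
platform, so that reading it never takes a lock.

```cpp
#include <atomic>

// Application state kept in static storage and updated during normal
// execution, for example after each user command.
struct AppStatus {
  std::atomic<int> documents_open{0};
  std::atomic<int> last_command_id{0};
};

AppStatus g_status;

// Plain-data snapshot, preallocated by the application and handed to the
// callback through its context pointer.
struct StatusSnapshot {
  int documents_open;
  int last_command_id;
  bool dump_succeeded;
};

// Post-dump callback sketch: copies a few integers out of static storage and
// nothing else. The return value stands for "treat the exception as handled",
// which this example reports only when the dump was written.
bool OnDumpWritten(void* context, bool succeeded) {
  StatusSnapshot* snapshot = static_cast<StatusSnapshot*>(context);
  snapshot->documents_open =
      g_status.documents_open.load(std::memory_order_relaxed);
  snapshot->last_command_id =
      g_status.last_command_id.load(std::memory_order_relaxed);
  snapshot->dump_succeeded = succeeded;
  return succeeded;
}
```

Returning `false` from such a callback corresponds to the case in which
Breakpad behaves as though it had not handled the exception, leaving it to a
debugger or to the platform’s native crash reporter.
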
### Sender Library

The Breakpad sender library provides a single function to send a crash report to
a crash server. It accepts a crash server’s URL, a map of key-value parameters
that will accompany the dump, and the path to a dump file itself. Each of the
key-value parameters and the dump file are sent as distinct parts of a multipart
HTTP POST request to the specified URL using the platform’s native HTTP
facilities. On Linux, [libcurl](http://curl.haxx.se/) is used for this function,
as it is the closest thing to a standard HTTP library available on that
platform.

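The shape of such a request body can be shown without any network code. The
helper below, `BuildMultipartBody`, is a hypothetical function written for this
illustration and is not part of the sender library; the parameter keys, the
file field name, and the boundary used when calling it are likewise
assumptions. It does not escape quotes in names or check that the boundary is
absent from the payload, both of which a production sender would need to
handle. Because this code would run in the separate crash reporter process,
ordinary heap-using types such as `std::string` are acceptable here.

```cpp
#include <map>
#include <string>

// Builds a multipart/form-data body: one part per key-value parameter,
// followed by one part carrying the dump file's bytes.
std::string BuildMultipartBody(
    const std::map<std::string, std::string>& parameters,
    const std::string& file_field, const std::string& file_name,
    const std::string& file_contents, const std::string& boundary) {
  std::string body;

  for (const auto& entry : parameters) {
    body += "--" + boundary + "\r\n";
    body += "Content-Disposition: form-data; name=\"" + entry.first +
            "\"\r\n\r\n";
    body += entry.second + "\r\n";
  }

  body += "--" + boundary + "\r\n";
  body += "Content-Disposition: form-data; name=\"" + file_field +
          "\"; filename=\"" + file_name + "\"\r\n";
  body += "Content-Type: application/octet-stream\r\n\r\n";
  body += file_contents + "\r\n";

  body += "--" + boundary + "--\r\n";  // Closing delimiter.
  return body;
}
```

The request that carries this body would declare
`Content-Type: multipart/form-data; boundary=...` with the same boundary string,
and would be handed to whichever HTTP facility the platform provides.
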
## Future Plans

Although we’ve had great success with in-process dump generation by following
our guidelines for safe code at exception time, we are exploring options for
allowing dumps to be generated in a separate process, to further enhance the
handler library’s robustness.

On Windows, we intend to offer tools to make it easier for Breakpad’s settings
to be managed by the native group policy management system.

We also plan to offer tools that many developers would find desirable in the
context of handling crashes, such as a mechanism to determine at launch if the
program last terminated in a crash, and a way to calculate “crashiness” in terms
of crashes over time or the number of application launches between crashes.

We are also investigating methods to capture crashes that occur early in an
application’s launch sequence, including crashes that occur before a program’s
main function begins executing.