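Understanding the Java Crash and Native Crash Handling Flows on Android P
Application crashes are most commonly caused by either a Java crash or a native crash. Based on the Android P source code, this article walks through how each of the two is handled.
I. Java Crash
When an exception in Java code is not caught by any try/catch block, the virtual machine calls Thread#dispatchUncaughtException to handle it: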

    // libcore/ojluni/src/main/java/java/lang/Thread.java
    public final void dispatchUncaughtException(Throwable e) {
        Thread.UncaughtExceptionHandler initialUeh =
                Thread.getUncaughtExceptionPreHandler();
        if (initialUeh != null) {
            try {
                initialUeh.uncaughtException(this, e);
            } catch (RuntimeException | Error ignored) {
                // Throwables thrown by the initial handler are ignored
            }
        }
        getUncaughtExceptionHandler().uncaughtException(this, e);
    }

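Two UncaughtExceptionHandlers take part in this flow: the pre-handler, which runs first, and the handler, which runs afterwards. In both cases the work is done by the handler's own uncaughtException implementation.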
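The ordering can be reproduced outside Android. The following is a minimal sketch, assuming nothing beyond the standard JDK: the class DispatchSketch and its two static fields are hypothetical stand-ins for the framework's pre-handler and handler, and dispatch only mirrors the shape of the method quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: DispatchSketch and its fields are hypothetical names, not framework code.
class DispatchSketch {
    // Stand-ins for the pre-handler and the regular handler.
    static Thread.UncaughtExceptionHandler preHandler;
    static Thread.UncaughtExceptionHandler defaultHandler;

    // Same shape as Thread#dispatchUncaughtException: pre-handler first, then the handler.
    static void dispatch(Thread t, Throwable e) {
        Thread.UncaughtExceptionHandler initial = preHandler;
        if (initial != null) {
            try {
                initial.uncaughtException(t, e);
            } catch (RuntimeException | Error ignored) {
                // A failing pre-handler must not stop the regular handler from running.
            }
        }
        defaultHandler.uncaughtException(t, e);
    }

    // Runs one dispatch and returns the order in which the handlers were called.
    static List<String> demo() {
        List<String> calls = new ArrayList<>();
        preHandler = (t, e) -> {
            calls.add("pre");
            throw new IllegalStateException("thrown by the pre-handler, then ignored");
        };
        defaultHandler = (t, e) -> calls.add("default");
        dispatch(Thread.currentThread(), new RuntimeException("boom"));
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [pre, default]
    }
}
```

Even though the pre-handler throws, the regular handler still runs, which is the property the try/catch in dispatchUncaughtException is there to provide.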
Android provides default implementations of both. Application processes are forked from the Zygote process; when a newly forked process starts up, zygoteInit calls RuntimeInit.commonInit:

    // frameworks/base/core/java/com/android/internal/os/ZygoteInit.java
    /**
     * The main function called when started through the zygote process...
     */
    public static final Runnable zygoteInit(int targetSdkVersion, String[] argv,
            ClassLoader classLoader) {
        // ...
        RuntimeInit.commonInit();
        ZygoteInit.nativeZygoteInit();
        return RuntimeInit.applicationInit(targetSdkVersion, argv, classLoader);
    }

RuntimeInit.commonInit installs the default UncaughtExceptionHandlers:

    // frameworks/base/core/java/com/android/internal/os/RuntimeInit.java
    protected static final void commonInit() {
        // ...
        /*
         * set handlers; these apply to all threads in the VM. Apps can replace
         * the default handler, but not the pre handler.
         */
        LoggingHandler loggingHandler = new LoggingHandler();
        Thread.setUncaughtExceptionPreHandler(loggingHandler);
        Thread.setDefaultUncaughtExceptionHandler(new KillApplicationHandler(loggingHandler));
        // ...
    }

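Two objects are instantiated, a LoggingHandler and a KillApplicationHandler. Both implement Thread.UncaughtExceptionHandler and override uncaughtException:
- LoggingHandler prints the exception details, including the process name, pid, and Java stack trace.
  - For the system process, the log starts with "*** FATAL EXCEPTION IN SYSTEM PROCESS: "
  - For an application process, the log starts with "FATAL EXCEPTION: "
- KillApplicationHandler notifies AMS (ActivityManagerService) and kills the process. The code is as follows: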

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            // 1. Make sure LoggingHandler has printed the crash info (new in Android 9.0)
            ensureLogging(t, e);
            // 2. Notify AMS so it can handle the crash, e.g. show the crash dialog
            ActivityManager.getService().handleApplicationCrash(
                    mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
        } catch (Throwable t2) {
            // ...
        } finally {
            // 3. Make sure the process dies
            Process.killProcess(Process.myPid()); // sends signal 9 (SIGKILL) to itself
            System.exit(10); // the standard Java way to shut down the VM
        }
    }

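The try/finally structure is what guarantees that the process dies even when the notification to AMS fails. A minimal sketch of that structure in plain Java follows. Everything in it is an assumption made for illustration: the class names, the triggered flag used to decide whether logging already happened, and the two Runnable parameters that stand in for the AMS call and for killing the process.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: all names here are hypothetical; this is not the framework implementation.
class KillHandlerSketch {
    static final List<String> events = new ArrayList<>();

    // Logs the crash and remembers that it did so.
    static class LoggingSketch implements Thread.UncaughtExceptionHandler {
        volatile boolean triggered = false;

        @Override
        public void uncaughtException(Thread t, Throwable e) {
            triggered = true;
            events.add("log: " + t.getName() + ": " + e.getMessage());
        }
    }

    // Logs if nobody has yet, notifies, and always runs the kill step.
    static class KillSketch implements Thread.UncaughtExceptionHandler {
        private final LoggingSketch logging;
        private final Runnable notifyStep; // stand-in for the call into AMS
        private final Runnable killStep;   // stand-in for killProcess + System.exit

        KillSketch(LoggingSketch logging, Runnable notifyStep, Runnable killStep) {
            this.logging = logging;
            this.notifyStep = notifyStep;
            this.killStep = killStep;
        }

        @Override
        public void uncaughtException(Thread t, Throwable e) {
            try {
                if (!logging.triggered) {
                    logging.uncaughtException(t, e);
                }
                notifyStep.run();
            } catch (Throwable ignored) {
                // A failed notification must not keep the process alive.
            } finally {
                killStep.run();
            }
        }
    }

    // Calls the kill handler directly with a notification step that fails.
    static List<String> demo() {
        events.clear();
        LoggingSketch logging = new LoggingSketch();
        KillSketch kill = new KillSketch(
                logging,
                () -> { throw new IllegalStateException("notification failed"); },
                () -> events.add("kill"));
        kill.uncaughtException(new Thread(() -> { }, "worker"), new RuntimeException("boom"));
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [log: worker: boom, kill]
    }
}
```

In the demo the logging handler was never called beforehand and the notification throws, yet the crash is logged once and the kill step still runs.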
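Note 1:
- Thread#setDefaultUncaughtExceptionHandler is a public API. An app can call it to install its own UncaughtExceptionHandler in place of KillApplicationHandler and handle the exception with custom logic, which avoids the crash.
- Thread#setUncaughtExceptionPreHandler is a hidden API. Apps cannot call it, so they cannot replace LoggingHandler.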

    /**
     * ......
     * @hide only for use by the Android framework (RuntimeInit) b/29624607
     */
    public static void setUncaughtExceptionPreHandler(UncaughtExceptionHandler eh) {
        uncaughtExceptionPreHandler = eh;
    }
    ....
    public static void setDefaultUncaughtExceptionHandler(UncaughtExceptionHandler eh) {
        defaultUncaughtExceptionHandler = eh;
    }

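As a result, the following situation is common:
After an app throws an uncaught exception, LoggingHandler prints the "FATAL EXCEPTION" message to the log, but because the app has replaced KillApplicationHandler, the process does not exit and AMS is not notified. The app keeps running.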
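Replacing the default handler uses only the public Thread API, so it can be tried on any JVM. Below is a minimal sketch under that assumption; the class name CustomHandlerSketch and the choice to merely record the failure are illustrative, not a recommendation for production code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch only: CustomHandlerSketch is a hypothetical name.
class CustomHandlerSketch {
    // Installs a process-wide handler, lets a worker thread fail, and returns what the handler saw.
    static String demo() {
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
        AtomicReference<String> seen = new AtomicReference<>();
        // The replacement records the failure and does not delegate to the previous handler.
        Thread.setDefaultUncaughtExceptionHandler((thread, error) ->
                seen.set(thread.getName() + ": " + error.getMessage()));
        try {
            Thread worker = new Thread(() -> {
                throw new IllegalStateException("boom");
            }, "worker");
            worker.start();
            worker.join(); // the handler has finished by the time join returns
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        } finally {
            Thread.setDefaultUncaughtExceptionHandler(previous); // restore the earlier handler
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints worker: boom
    }
}
```

One caveat worth checking before relying on this in an app: if the exception escaped from the main thread, that thread's message loop has already unwound, so swallowing the exception may leave the process alive but unresponsive.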
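Note 2:
By default, after an uncaught exception KillApplicationHandler calls System.exit(10) to shut down the process's Java VM. If some code in the process tries to start a new thread at that point, an InternalError is thrown with the message "Thread starting during runtime shutdown", as follows: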

    java.lang.InternalError: Thread starting during runtime shutdown
        at java.lang.Thread.nativeCreate(Native Method)
        at java.lang.Thread.start(Thread.java:733)

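If you see this error in the log, first look for the real cause of the abnormal process exit; the error itself is only a consequence of the shutdown.
II. Native Crash
When a native fault occurs, the CPU raises an exception, and the Linux kernel turns it into a signal delivered to the process. (Some signals, such as SIGABRT, are raised in software rather than by the CPU.) A process can register handlers to receive these signals.
On Android P, the default signal handlers are registered from bionic/linker/linker_main.cpp, which calls debuggerd_init. The linker_main.cpp code is as follows: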

    // bionic/linker/linker_main.cpp
    /*
     * This code is called after the linker has linked itself and
     * fixed it's own GOT. It is safe to make references to externs
     * and other non-local data at this point.
     */
    static ElfW(Addr) __linker_init_post_relocation(KernelArgumentBlock& args) {
        // ...
        debuggerd_init(&callbacks);
    }

debuggerd_init registers the signal handler:

    // system/core/debuggerd/handler/debuggerd_handler.cpp
    void debuggerd_init(debuggerd_callbacks_t* callbacks) {
        // ...
        struct sigaction action;
        memset(&action, 0, sizeof(action));
        sigfillset(&action.sa_mask);
        action.sa_sigaction = debuggerd_signal_handler;
        action.sa_flags = SA_RESTART | SA_SIGINFO;

        // Use the alternate signal stack if available so we can catch stack overflows.
        action.sa_flags |= SA_ONSTACK;
        debuggerd_register_handlers(&action);
    }

As shown above, the default handler is debuggerd_signal_handler. Which signals is it registered for? That is decided in debuggerd_register_handlers:

    // system/core/debuggerd/include/debuggerd/handler.h
    static void __attribute__((__unused__)) debuggerd_register_handlers(struct sigaction* action) {
        sigaction(SIGABRT, action, nullptr);
        sigaction(SIGBUS, action, nullptr);
        sigaction(SIGFPE, action, nullptr);
        sigaction(SIGILL, action, nullptr);
        sigaction(SIGSEGV, action, nullptr);
    #if defined(SIGSTKFLT)
        sigaction(SIGSTKFLT, action, nullptr);
    #endif
        sigaction(SIGSYS, action, nullptr);
        sigaction(SIGTRAP, action, nullptr);
        sigaction(DEBUGGER_SIGNAL, action, nullptr);
    }

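Through sigaction, the handler is registered for SIGABRT, SIGBUS, SIGFPE, SIGILL, SIGSEGV, SIGSTKFLT (only where it is defined), SIGSYS, SIGTRAP, and DEBUGGER_SIGNAL: nine signals in total.
When a native crash then occurs, it is handled as follows:
- The app's default signal handler, debuggerd_signal_handler, is called. Its main job is to clone a pseudothread from the crashing process to run debuggerd_dispatch_pseudothread. Because the clone flags include CLONE_THREAD and CLONE_VM, this is a thread sharing the process's address space rather than a separate child process. It exits when that function returns. The code is as follows: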

    // system/core/debuggerd/handler/debuggerd_handler.cpp
    // Handler that does crash dumping by forking and doing the processing in the child.
    // Do this by ptracing the relevant thread, and then execing debuggerd to do the actual dump.
    static void debuggerd_signal_handler(int signal_number, siginfo_t* info, void* context) {
        // ...
        // 1. Log one "Fatal signal" line containing the basic crash information
        log_signal_summary(info);
        // 2. clone the pseudothread
        pid_t child_pid =
            clone(debuggerd_dispatch_pseudothread, pseudothread_stack,
                  CLONE_THREAD | CLONE_SIGHAND | CLONE_VM | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID,
                  &thread_info, nullptr, nullptr, &thread_info.pseudothread_tid);
        // ...
    }

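log_signal_summary prints one "Fatal signal" line to the log. Judging by the comments in the source, the intent is that at least this basic record of the native crash survives even if the later steps fail. For example: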
12-16 14:30:17.067 10177 4780 4780 F libc : Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x74 in tid 4780 (com.kevin.test), pid 4780 (com.kevin.test)
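When triaging logs it can help to pull the individual fields out of that line. The following is a minimal sketch in Java, assuming the message follows the layout of the sample line above. The class FatalSignalLine and its fields are hypothetical, and the pattern treats the fault address as optional on the assumption that not every signal reports one; other variants of the message may need a different pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: FatalSignalLine is a hypothetical helper, not part of Android.
class FatalSignalLine {
    // Derived from the sample log line; the ", fault addr ..." part is optional.
    private static final Pattern PATTERN = Pattern.compile(
            "Fatal signal (\\d+) \\((\\w+)\\), code (-?\\d+) \\((\\w+)\\)"
            + "(?:, fault addr (\\S+))? in tid (\\d+) \\(([^)]*)\\), pid (\\d+) \\(([^)]*)\\)");

    final int signal;
    final String signalName;
    final String codeName;
    final String faultAddr; // null when the line carries no fault address
    final int tid;
    final int pid;
    final String processName;

    private FatalSignalLine(Matcher m) {
        signal = Integer.parseInt(m.group(1));
        signalName = m.group(2);
        codeName = m.group(4);
        faultAddr = m.group(5);
        tid = Integer.parseInt(m.group(6));
        pid = Integer.parseInt(m.group(8));
        processName = m.group(9);
    }

    // Returns the parsed fields, or null if the line is not a "Fatal signal" line.
    static FatalSignalLine parse(String logLine) {
        Matcher m = PATTERN.matcher(logLine);
        return m.find() ? new FatalSignalLine(m) : null;
    }

    public static void main(String[] args) {
        FatalSignalLine line = parse(
                "12-16 14:30:17.067 10177 4780 4780 F libc : Fatal signal 11 (SIGSEGV), "
                + "code 1 (SEGV_MAPERR), fault addr 0x74 in tid 4780 (com.kevin.test), "
                + "pid 4780 (com.kevin.test)");
        System.out.println(line.signalName + " at " + line.faultAddr + " in " + line.processName);
    }
}
```

For the sample line this yields signal 11 (SIGSEGV), code name SEGV_MAPERR, fault address 0x74, and tid and pid 4780 for com.kevin.test.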
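- The cloned pseudothread runs debuggerd_dispatch_pseudothread, whose main job is to start /system/bin/crash_dump32 or /system/bin/crash_dump64 via execle, passing arguments that include:
  - main_tid: the id of the thread in the target process where the native crash occurred
  - pseudothread_tid: the id of the pseudothread; from a first reading of the code it appears to be related to obtaining the backtrace (needs further investigation)
  - debuggerd_dump_type: there are 4 dump types; for a native crash the type is kDebuggerdTombstone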

    static int debuggerd_dispatch_pseudothread(void* arg) {
        // ...
        execle(CRASH_DUMP_PATH, CRASH_DUMP_NAME, main_tid, pseudothread_tid,
               debuggerd_dump_type, nullptr, nullptr);
        // ...
    }

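Note: execle does not create a process by itself. On Linux it replaces the program image of the calling process with the new program and runs that program's main function, while the pid stays the same. In the AOSP source the pseudothread forks first, in the part elided above, and the forked child is what calls execle, so crash_dump runs in that child rather than replacing the crashed app.
- The main function of crash_dump.cpp (system/core/debuggerd/crash_dump.cpp) then runs. This is arguably the core of native crash handling. It does the following:
  - Attaches to the app with ptrace (the source loops over every thread of the app and calls ReadCrashInfo for the thread that crashed), reads the registers and other state, and finally collects all of the crash information, including device model and build version, ABI, signal, registers, and backtrace, and writes it to the log
  - Contacts the tombstoned process over a socket so that all of the crash information is written to a /data/tombstones/tombstone_xx file
  - Notifies the system_server process over a socket (the NativeCrashListener thread listens on it), which eventually calls AMS#handleApplicationCrashInner; from this point the logic is the same as for a Java crash
The main code for this logic is as follows: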

    // system/core/debuggerd/crash_dump.cpp
    int main(int argc, char** argv) {
        // ...
        // 1. Attach to the app with ptrace and collect the crash information
        ATRACE_NAME("ptrace");
        for (pid_t thread : threads) {
            // ...
            ThreadInfo info;
            info.pid = target_process;
            info.tid = thread;
            info.process_name = process_name;
            info.thread_name = get_thread_name(thread);

            if (!ptrace_interrupt(thread, &info.signo)) {
                PLOG(WARNING) << "failed to ptrace interrupt thread " << thread;
                ptrace(PTRACE_DETACH, thread, 0, 0);
                continue;
            }

            if (thread == g_target_thread) {
                // Read the thread's registers along with the rest of the crash info out of the pipe.
                ReadCrashInfo(input_pipe, &siginfo, &info.registers, &abort_address);
                info.siginfo = &siginfo;
                info.signo = info.siginfo->si_signo;
            } else {
                info.registers.reset(Regs::RemoteGet(thread));
                if (!info.registers) {
                    PLOG(WARNING) << "failed to fetch registers for thread " << thread;
                    ptrace(PTRACE_DETACH, thread, 0, 0);
                    continue;
                }
            }
            // ...
        }
        // ...
        // 2. Connect to tombstoned over a socket so that it writes the crash
        //    information to a /data/tombstones/tombstone_xx file
        {
            ATRACE_NAME("tombstoned_connect");
            LOG(INFO) << "obtaining output fd from tombstoned, type: " << dump_type;
            g_tombstoned_connected = tombstoned_connect(g_target_thread, &g_tombstoned_socket,
                                                        &g_output_fd, dump_type);
        }
        // ...
        // 3. Notify the system_server process over a socket
        activity_manager_notify(target_process, signo, amfd_data);
        // ...
    }

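- Finally, the AMS side. In the system_server process, AMS calls startObservingNativeCrashes during startup, which starts a new thread, NativeCrashListener. It loops listening on a socket (path: /data/system/ndebugsocket) and receives native crash information from the debuggerd side (as analyzed above, the peer is the process running crash_dump). The main code is as follows: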

    // frameworks/base/services/core/java/com/android/server/am/NativeCrashListener.java
    final class NativeCrashListener extends Thread {
        // ...
        @Override
        public void run() {
            // ...
            try {
                FileDescriptor serverFd = Os.socket(AF_UNIX, SOCK_STREAM, 0);
                final UnixSocketAddress sockAddr = UnixSocketAddress.createFileSystem(
                        DEBUGGERD_SOCKET_PATH);
                Os.bind(serverFd, sockAddr);
                Os.listen(serverFd, 1);
                Os.chmod(DEBUGGERD_SOCKET_PATH, 0777);

                while (true) {
                    FileDescriptor peerFd = null;
                    try {
                        if (MORE_DEBUG) Slog.v(TAG, "Waiting for debuggerd connection");
                        peerFd = Os.accept(serverFd, null /* peerAddress */);
                        if (MORE_DEBUG) Slog.v(TAG, "Got debuggerd socket " + peerFd);
                        if (peerFd != null) {
                            // ...
                            consumeNativeCrashData(peerFd);
                        }
                        // ...
                    }
                    // ...
    }

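Each time native crash information is received, consumeNativeCrashData starts a new thread that calls ActivityManagerService#handleApplicationCrashInner. From here the handling is the same as for a Java crash: AMS is informed that a native crash occurred, the log is written, the force-close dialog is shown, and so on.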
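The listen, accept, and read pattern used by NativeCrashListener can be tried on a desktop JVM. The sketch below is an approximation under stated assumptions: it uses the standard JDK Unix domain socket API (Java 16 or later) instead of android.system.Os, binds to a temporary path instead of /data/system/ndebugsocket, and plays both the reporting side and the listening side in a single thread. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only (Java 16+): hypothetical names, JDK sockets instead of android.system.Os.
class CrashSocketSketch {
    // Sends one report over a Unix domain socket and returns what the listening side received.
    static String roundTrip(String report) {
        try {
            Path dir = Files.createTempDirectory("crash-sketch");
            Path socketPath = dir.resolve("ndebugsocket");
            UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketPath);
            try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
                server.bind(address);
                // Reporting side: connect, send the report, close.
                try (SocketChannel client = SocketChannel.open(address)) {
                    ByteBuffer data = ByteBuffer.wrap(report.getBytes(StandardCharsets.UTF_8));
                    while (data.hasRemaining()) {
                        client.write(data);
                    }
                }
                // Listening side: accept the queued connection and read until the peer closes.
                try (SocketChannel peer = server.accept()) {
                    return readAll(peer);
                }
            } finally {
                Files.deleteIfExists(socketPath);
                Files.deleteIfExists(dir);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String readAll(SocketChannel peer) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        while (peer.read(buffer) != -1) {
            buffer.flip();
            out.write(buffer.array(), 0, buffer.limit());
            buffer.clear();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("pid=4780 signal=11"));
    }
}
```

Running both sides in one thread keeps the example deterministic. The real listener differs in that it loops on accept in its own thread and hands each connection to a separate worker.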
If anything here is incorrect or incomplete, corrections are welcome. Let's learn together. Thanks :-)
Author: kevin song, 2018-12-18, Jianye District, Nanjing